Glenn's Notepad

Inquire

TLDR

During 2024 and 2025 I led the design and development of Inquire, a centralized user data service powering personalization across more than 100 local newspapers.

  • I implemented an entity-attribute-value (EAV) data model to avoid the friction of schema changes and let teams onboard new attributes quickly.
  • I built attribute catalogs and an intuitive management dashboard to enable a high degree of self-service and to reduce platform bottlenecks.
  • I added metrics to provide data owners with high-resolution insights into usage patterns.
  • I integrated user consent data, ensuring that consumers can be confident in data usage compliance.
  • I implemented in-flight type checking to guarantee the correctness of delivered data.

Introduction

I was recently challenged to write about work I was proud of. This post is an attempt at that.

Background

At Amedia, we host and develop for more than 100 local newspapers across Norway. A portion of Amedia's success is owed to partially personalized front pages, along with personalized marketing and communications. Amedia's backend is a collection of (micro) services called in a variety of situations, e.g., to serve the contents of an article, a front-page teaser, or marketing messages.

Since each page load is partially personalized to the logged-in user, many of the backend services require attributes of the user to function optimally. Much of this data lives in our data lake, which is not optimized for fast, random-access queries for individual users.

To support in-flight requests for user analytics, Amedia has over the course of several years developed multiple services that host selections of user attributes for fast access. Inquire is the third iteration of a user data service, and was made to address some challenges that were revealed along the way:

Need for high-resolution usage metrics:
Without usage metrics at the attribute level, retiring an attribute meant searching our codebases to find out whether it was still in use. Since each request also returned all of a user's data, attribute-level usage information was hard to come by.
Out-of-band consent data:
Whether the user had consented to the use of attributes for personalized ads, communications, marketing, or editorial content had to be requested from separate services. This fragmented setup made life harder for data consumers, who had to piece together user context from multiple services.
Platform bottleneck:
Adding new user attributes required significant work from the maintaining platform team. One column per user attribute meant schema migrations and accompanying application code changes whenever, e.g., a marketing team wanted to test new personalized messages. The fast pace of these teams is important for the organization.

While the previous iterations of the user data services have provided tremendous value for Amedia, Inquire was built to confront the challenges listed above.

System overview

Inquire consists of a central Postgres database, an HTTP/JSON service that serves user data to downstream consumers, and a suite of support components:

Batch ingestion pipeline
for periodic loading from the data lake.
Stream ingestion pipeline
for real-time updates (e.g., user consent).
Management dashboard
for attribute onboarding and observability.

Almost all user analytics data lives in Google BigQuery and is updated at most daily. The batch ingestion pipeline regularly checks source tables for updates and makes sure that the latest data is imported into the Inquire database. Other data, such as user consent, is retrieved from message queues via the stream ingestion pipeline immediately after a user changes their preferences.
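As a very rough sketch of the batch side (the post doesn't detail the implementation, so source_last_modified and import_table below are hypothetical stand-ins for the BigQuery metadata lookup and the Postgres load), the polling could look something like this in Rust:

    // Hypothetical sketch of watermark-based polling; source_last_modified and
    // import_table stand in for the real BigQuery metadata lookup and the
    // import into the Inquire Postgres database.
    use std::collections::HashMap;
    use std::time::SystemTime;

    fn poll_sources(watermarks: &mut HashMap<String, SystemTime>, sources: &[String]) {
        for table in sources {
            let modified = source_last_modified(table);
            // Only re-import when the source table has changed since the last run.
            if watermarks.get(table).map_or(true, |seen| modified > *seen) {
                import_table(table);
                watermarks.insert(table.clone(), modified);
            }
        }
    }

    fn source_last_modified(_table: &str) -> SystemTime {
        // In reality: ask BigQuery when the table was last modified.
        SystemTime::now()
    }

    fn import_table(_table: &str) {
        // In reality: load the latest rows into the Inquire database.
    }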

I initially wrote the main HTTP/JSON service in Rust and the support components in Python. Since then, all the support components have been migrated to Rust to achieve a homogeneous code base for the system.

Key design decisions and challenges

What follows is a selection of key challenges and areas where I made important contributions and learned the most.

Flexible data modeling with EAV

A key design choice in Inquire is an entity-attribute-value data model. Data is stored in a table with columns for the user ID, an attribute ID referencing an attribute catalog, metadata describing when the data was updated and when it becomes stale, and the value itself, stored as JSON. The attribute catalog contains a human-readable name and expected type information. This lets us avoid constant schema migrations and unblocks teams who want to test new attributes quickly.
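As a rough illustration (field names and types here are my guesses for this post, not the actual schema), a row in the data table and its catalog entry could be modeled along these lines in the Rust service:

    // Illustrative only; names and types are assumptions, not the real schema.
    use serde_json::Value;
    use std::time::SystemTime;

    // One row in the EAV data table: one attribute value for one user.
    struct UserAttributeRow {
        user_id: i64,      // indexed, so all rows for a user can be fetched quickly
        attribute_id: i32, // references the attribute catalog
        updated_at: SystemTime,
        stale_at: SystemTime, // when the value should be considered stale
        value: Value,         // the attribute value itself, stored as JSON
    }

    // One entry in the attribute catalog.
    struct AttributeCatalogEntry {
        attribute_id: i32,
        name: String, // human-readable, e.g. "subscription_plan_batch"
        expected_type: ExpectedType,
    }

    // The JSON type we expect values of this attribute to have.
    enum ExpectedType {
        Boolean,
        Number,
        Text,
        Array,
        Object,
    }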

Composed attributes

Some attributes are derived from multiple data sources. For instance, a user changing their newspaper subscription plan is a rare occurrence; the subscription_plan_batch attribute is therefore based on a table in our data lake, which is updated and ingested nightly. However, we also receive events when this happens, and these create or update the subscription_plan_stream attribute.

To streamline things for consumers, they only see virtual user attributes, which are functions of the physical attributes described so far. A virtual attribute (e.g., subscription_plan) has an ordered list of physical attributes (e.g., [subscription_plan_stream, subscription_plan_batch]). A virtual attribute's value resolves to one of the physical values depending on the selection strategy chosen for that virtual attribute:

Coalesce:
The first non-null value in the ordered list of physical attributes.
Newest:
The newest non-null value among the list of physical attributes. Order is ignored.
Special:
Value selection or composition is hard-coded in the application, i.e., not configurable at runtime.

Virtual attributes let us present a clean and stable interface to consumers, even when underlying data sources change or vary in freshness or reliability.
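To make the strategies concrete, here is a hedged sketch of how resolution could work; the types and function are illustrative, and the Special strategy is application-specific, so it is only stubbed out:

    // Illustrative sketch of virtual attribute resolution; not the production code.
    use serde_json::Value;
    use std::time::SystemTime;

    enum SelectionStrategy {
        Coalesce,
        Newest,
        Special, // hard-coded per attribute in the application, not shown here
    }

    struct PhysicalValue {
        value: Option<Value>,
        updated_at: SystemTime,
    }

    // Resolve a virtual attribute from its ordered list of physical values.
    fn resolve(strategy: &SelectionStrategy, physical: &[PhysicalValue]) -> Option<Value> {
        match strategy {
            // First non-null value, in the configured order.
            SelectionStrategy::Coalesce => physical.iter().find_map(|p| p.value.clone()),
            // Newest non-null value, ignoring the configured order.
            SelectionStrategy::Newest => physical
                .iter()
                .filter(|p| p.value.is_some())
                .max_by_key(|p| p.updated_at)
                .and_then(|p| p.value.clone()),
            // Selection or composition is specific to the attribute.
            SelectionStrategy::Special => None,
        }
    }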

Easy-to-use management dashboard

Our flexible data model gives us the power to add attributes, modify their composition or type, and associate them with purposes and consent data. To make this easy, we have developed a management dashboard that streamlines common operations and gives us an overview of attributes and data freshness.

The management dashboard is a server-side rendered website with htmx for dynamic content.

The dashboard has list pages for the physical and virtual attribute catalogs, purposes, and ingest jobs. Each attribute entry has a detail page and an edit page. Each purpose can also be edited, and each job has a detail page. It's also possible to add new attributes via the dashboard. Every developer in our organization can view all data, but only data platform developers can POST to the edit and new endpoints.

The management dashboard allows us to swiftly carry out necessary modifications and lets us get fast overviews of the data in our service.

Type checking of attribute values

The EAV data model stores attribute values as JSON in the database. This is great for flexibility, but sacrifices type safety at rest. To simplify the batch ingestion pipeline, we do not type check the data imported from our data lake. (The data in the data lake is, however, typed.) Instead, we type check the JSON data on every query from consumers and log fatal errors if a value deviates from the expected type stored in the attribute catalog.
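A minimal sketch of what that per-query check could look like, reusing the ExpectedType idea from the data-model sketch above (again illustrative, and the logging call is just a placeholder):

    // Illustrative per-query type check against the catalog's expected type.
    use serde_json::Value;

    enum ExpectedType {
        Boolean,
        Number,
        Text,
        Array,
        Object,
    }

    fn matches_expected(value: &Value, expected: &ExpectedType) -> bool {
        match expected {
            ExpectedType::Boolean => value.is_boolean(),
            ExpectedType::Number => value.is_number(),
            ExpectedType::Text => value.is_string(),
            ExpectedType::Array => value.is_array(),
            ExpectedType::Object => value.is_object(),
        }
    }

    // Called on every consumer query; mismatches are logged as fatal errors
    // instead of being silently returned.
    fn check_value(attribute: &str, value: &Value, expected: &ExpectedType) {
        if !matches_expected(value, expected) {
            eprintln!("FATAL: attribute {attribute} has unexpected JSON type: {value}");
        }
    }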

To date, after nearly one year of operation with approximately 250 requests per second on average over 30 attributes, we have not had a single instance of wrong typing. This has given us confidence that trading rigidity for flexibility has been worthwhile.

High-resolution metrics

We have metrics recording the use of each virtual attribute per consumer per purpose. Additionally, we record whether a request for an attribute returned a null value or actual data. This allows us to answer questions like these:

  • Which attributes are currently in use by any consumer?
  • Which consumers use a certain attribute?
  • For what purpose does a certain consumer use data?
  • Which attributes does a certain consumer use?
  • Why did an attribute's hit rate suddenly drop to 0%?

These metrics have been invaluable. We can now confidently deprecate attributes, catch stale data early, and prove compliance when needed.
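The post doesn't say which metrics stack backs this, but assuming Prometheus-style counters, recording a lookup could look roughly like this (the metric and label names are made up for illustration):

    // Assumed Prometheus-style counter; metric and label names are illustrative.
    use prometheus::{IntCounterVec, Opts};

    fn attribute_requests_counter() -> IntCounterVec {
        IntCounterVec::new(
            Opts::new(
                "inquire_attribute_requests_total",
                "Attribute lookups by attribute, consumer, purpose and outcome",
            ),
            &["attribute", "consumer", "purpose", "result"],
        )
        .expect("valid metric definition")
    }

    // Record one lookup; "result" distinguishes null values from actual data.
    fn record_lookup(
        counter: &IntCounterVec,
        attribute: &str,
        consumer: &str,
        purpose: &str,
        was_null: bool,
    ) {
        let result = if was_null { "null" } else { "hit" };
        counter
            .with_label_values(&[attribute, consumer, purpose, result])
            .inc();
    }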

Performance considerations

The EAV data model places multiple rows per user in our data table. This bars us from using a single ID (e.g., the user ID) as the lookup value when fetching data for a user. Instead, we have to fetch the rows corresponding to all relevant attributes for that user. We also observe that multiple requests for the same user (but for different attributes) are likely to occur within a short span of time, since multiple backends query Inquire during every page load.

To make the EAV model viable at scale, I added an index on user ID to speed up these multi-row lookups.

To avoid hitting the database for every request for the same user within the same page load, we added an in-memory cache to the API application. When any user data is requested, we fetch all attribute values for that user and store them in the cache for a couple of seconds. Subsequent requests for that user are served from in-memory attribute values. We see a cache hit rate of approximately 30%. This is quite low, but expected: the service runs multiple replicas, so subsequent requests may hit a different process. Each cache hit still saves a database round-trip.
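A minimal, standard-library-only sketch of such a short-TTL cache (the real service may well use a dedicated cache crate; names are illustrative):

    // Illustrative short-TTL cache keyed by user ID, standard library only.
    use serde_json::Value;
    use std::collections::HashMap;
    use std::sync::Mutex;
    use std::time::{Duration, Instant};

    struct UserCache {
        ttl: Duration, // "a couple of seconds"
        entries: Mutex<HashMap<i64, (Instant, HashMap<String, Value>)>>,
    }

    impl UserCache {
        fn new(ttl: Duration) -> Self {
            Self { ttl, entries: Mutex::new(HashMap::new()) }
        }

        // All cached attribute values for a user, if the entry is still fresh.
        fn get(&self, user_id: i64) -> Option<HashMap<String, Value>> {
            let entries = self.entries.lock().unwrap();
            entries.get(&user_id).and_then(|(stored_at, values)| {
                (stored_at.elapsed() < self.ttl).then(|| values.clone())
            })
        }

        // Store everything fetched for a user in one database round-trip.
        fn insert(&self, user_id: i64, values: HashMap<String, Value>) {
            self.entries
                .lock()
                .unwrap()
                .insert(user_id, (Instant::now(), values));
        }
    }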

Impact

Inquire is the central analytics user data API for personalization across Amedia's product ecosystem, serving hundreds of millions of requests monthly. By addressing flexibility, governance, and observability, it has empowered product and marketing teams to iterate faster.

My role in leading the design and implementation of the system's architecture, data modeling, and APIs has given me valuable experience in building scalable, developer-friendly platforms.