Integrate MongoDB with Feast

Feast provides a high-level FeatureStore API that allows you to define features and groups of features (feature views), online and offline storage, and the ability to dynamically move data from offline to online storage (materialization). The MongoDB integration allows you to use MongoDB as both an online and offline store for Feast, so you can define features once and serve them consistently across model training and online inference without maintaining separate storage systems.

MongoDB's flexible document model and the MongoDB Query Language (MQL) allow it to handle the complex query patterns required for the offline store. For the online store, MongoDB is optimized for web-scale access patterns: fast reads and writes, horizontal scaling, and flexible schemas that minimize joins and round trips.

In this integration overview, you can find:

  • An introduction to MongoDB as Feast's online and offline store.

  • How Feast concepts map to MongoDB.

  • Detailed explanations of the MongoDB offline and online store designs.

  • Configuration examples for setting up the MongoDB stores in Feast.

The integration provides two stores:

  • The online store is a key-value store backed by a single MongoDB collection, optimized for low-latency retrieval of the latest features per entity during online inference.

  • The offline store is a compute and translation layer that queries rows of historical feature data stored in a MongoDB collection (typically named feature_history) for training datasets, scoring, and materialization (promoting data to the online store).

A typical end-to-end workflow looks like this:

  1. Define entities, feature views, and data sources that point to MongoDB-backed collections.

  2. Ingest feature data into the offline store via offline_write_batch, which accepts a PyArrow table as input and inserts the data into the feature_history MongoDB collection following the offline store schema.

  3. Generate training data using get_historical_features, which runs an efficient point-in-time join over historical feature rows stored in MongoDB.

  4. Materialize the latest feature values from the offline store into the online store using pull_latest_from_table_or_query and online_write_batch.

  5. Serve features online via Feast's online APIs, which read from a single MongoDB collection keyed by a serialized entity key.

The MongoDB integration follows Feast's standard conceptual model but maps those abstractions to a MongoDB schema designed for entity-centric online documents and append-only historical events.

Feast Concept | Role in Feast | MongoDB Representation
--- | --- | ---
Entity | Domain object that features describe (e.g. driver, user). | Encoded into a serialized entity key; stored as _id in the online store and entity_id in the offline store.
Join key | Column(s) used to identify an entity row in a dataframe. | Fed into serialize_entity_key; the resulting bytes are used as the entity identifier in MongoDB.
Serialized EntityKey | Deterministic binary encoding of join key names and values. | Online: _id: serialized_entity_key (primary key). Offline: entity_id: Binary(...) field in feature_history documents.
Feature | Named, typed measurement at a point in time. | A field inside the features subdocument (offline) or features.<feature_view>.<feature_name> (online).
FeatureView | Binds features to entities, data source, and TTL; unit of organization. | Offline: feature_view discriminator string on each historical document. Online: groups nested under features.<feature_view> and per-FV timestamps in event_timestamps.
DataSource | Metadata pointer to where historical features live. | MongoDBSource pointing at a MongoDB collection (database, collection, connection_string) plus timestamps.
OfflineStore | Read/write interface for historical features and PIT joins. | MongoDBOfflineStore implementation running MQL aggregations over a shared feature_history collection with a compound index.
OnlineStore | Low-latency store of latest feature values per entity. | Single MongoDB collection of entity documents keyed by _id = serialized_entity_key, with nested features and event_timestamps subdocuments.
TTL | FeatureView-level freshness window. | Enforced in offline queries and Python post-filtering when computing historical features; may also be combined with created_timestamp or updated_at in indexes.
FeatureService | Named list of feature references for a model. | No direct MongoDB representation; used by Feast to decide which features.<feature_view>.<feature_name> paths to read from the online store.
Registry | Metadata store for entities, feature views, and services. | Unchanged; MongoDB integration does not replace the Feast registry.
RetrievalJob | Deferred execution wrapper returning feature tables. | For MongoDB offline store, encapsulates an MQL aggregation and exposes Arrow exports backed by cursor-to-Arrow conversion.
Materialization | Scheduled propagation of latest offline features into the online store. | Implemented via pull_latest_from_table_or_query over feature_history then online_write_batch into the online MongoDB collection.
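
The mapping above relies on a serialized entity key that is deterministic: the same join keys must always produce the same bytes. The sketch below illustrates only that property. toy_serialize_entity_key is a hypothetical helper, not Feast's serialize_entity_key, and its byte layout is an assumption made for illustration; the real encoding differs.

```python
import struct


def toy_serialize_entity_key(join_keys: dict) -> bytes:
    """Hypothetical stand-in for Feast's serialize_entity_key.

    Sorts join keys by name, then length-prefixes each name and value so the
    output is deterministic and unambiguous. Values are stringified, so this
    toy version does not distinguish 1001 from "1001".
    """
    out = bytearray()
    for name in sorted(join_keys):
        for part in (name, str(join_keys[name])):
            raw = part.encode("utf-8")
            out += struct.pack("<I", len(raw))  # 4-byte little-endian length
            out += raw
    return bytes(out)


# The same entity yields the same bytes regardless of key order.
key_a = toy_serialize_entity_key({"driver_id": 1001, "region": "emea"})
key_b = toy_serialize_entity_key({"region": "emea", "driver_id": 1001})
key_other = toy_serialize_entity_key({"driver_id": 1002, "region": "emea"})
```

Because the bytes are stable, they can serve as _id in the online collection and as entity_id in feature_history without a separate lookup table.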

The MongoDB offline store uses a single shared collection (by default feature_history) that stores append-only historical feature rows for all feature views.

Each document represents one observation of one entity for one FeatureView at a specific event timestamp:

{
  "entity_id": "Binary(...)",
  "feature_view": "driver_stats",
  "event_timestamp": "ISODate(2024-01-15T12:00:00Z)",
  "created_at": "ISODate(2024-01-15T12:01:00Z)",
  "features": {
    "conv_rate": 0.72,
    "acc_rate": 0.91,
    "avg_daily_trips": 14
  }
}

Key properties:

  • Append-only: historical data is treated as immutable; corrections are written as new rows with newer created_at timestamps rather than in-place updates.

  • Time-series friendly: event_timestamp represents when the feature value was observed; created_at is used as a tie-breaker when multiple observations share the same event timestamp.

  • Feature grouping by FeatureView: feature_view identifies which FeatureView the row belongs to, so a single collection can host multiple FVs.
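
To make the document shape concrete, here is a minimal sketch that turns flat feature rows into feature_history-style documents. rows_to_history_docs is a hypothetical helper, not part of Feast, and it uses a UTF-8 encoding of the join key value as a placeholder for the real serialized entity key.

```python
from datetime import datetime, timezone


def rows_to_history_docs(rows, feature_view, join_key, feature_names, now=None):
    """Convert flat rows into append-only feature_history-style documents."""
    now = now or datetime.now(timezone.utc)
    docs = []
    for row in rows:
        docs.append({
            # Placeholder: the real store writes the serialized entity key here.
            "entity_id": str(row[join_key]).encode("utf-8"),
            "feature_view": feature_view,
            "event_timestamp": row["event_timestamp"],
            # Write time; later corrections get a newer created_at.
            "created_at": now,
            "features": {name: row[name] for name in feature_names},
        })
    return docs


rows = [{
    "driver_id": 1001,
    "event_timestamp": datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc),
    "conv_rate": 0.72,
    "acc_rate": 0.91,
    "avg_daily_trips": 14,
}]
docs = rows_to_history_docs(
    rows, "driver_stats", "driver_id",
    ["conv_rate", "acc_rate", "avg_daily_trips"],
)
```

Each call produces new documents rather than modifying existing ones, which mirrors the append-only property described above.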

A single compound index supports all major query patterns:

(entity_id ASC, feature_view ASC, event_timestamp DESC, created_at DESC)

This index enables efficient range scans over entities and feature views, while ensuring that the most recent observation per (entity_id, feature_view) is seen first during aggregation.
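
The effect of this ordering can be sketched in plain Python, without a database. The example sorts a few sample documents in the same order as the index and keeps the first document per (entity_id, feature_view). The sample data and the index_order helper are illustrative assumptions.

```python
from datetime import datetime, timezone

UTC = timezone.utc
docs = [
    {"entity_id": b"d1", "feature_view": "driver_stats",
     "event_timestamp": datetime(2024, 1, 14, 12, tzinfo=UTC),
     "created_at": datetime(2024, 1, 14, 12, 1, tzinfo=UTC),
     "features": {"conv_rate": 0.65}},
    {"entity_id": b"d1", "feature_view": "driver_stats",
     "event_timestamp": datetime(2024, 1, 15, 12, tzinfo=UTC),
     "created_at": datetime(2024, 1, 15, 12, 1, tzinfo=UTC),
     "features": {"conv_rate": 0.70}},
    # Correction: same event_timestamp, newer created_at.
    {"entity_id": b"d1", "feature_view": "driver_stats",
     "event_timestamp": datetime(2024, 1, 15, 12, tzinfo=UTC),
     "created_at": datetime(2024, 1, 15, 12, 5, tzinfo=UTC),
     "features": {"conv_rate": 0.72}},
    {"entity_id": b"d2", "feature_view": "driver_stats",
     "event_timestamp": datetime(2024, 1, 13, 12, tzinfo=UTC),
     "created_at": datetime(2024, 1, 13, 12, 1, tzinfo=UTC),
     "features": {"conv_rate": 0.50}},
]


def index_order(doc):
    """Sort key mirroring (entity_id ASC, feature_view ASC,
    event_timestamp DESC, created_at DESC)."""
    return (
        doc["entity_id"],
        doc["feature_view"],
        -doc["event_timestamp"].timestamp(),
        -doc["created_at"].timestamp(),
    )


latest = {}
for doc in sorted(docs, key=index_order):
    # The first document seen per group is the most recent observation.
    latest.setdefault((doc["entity_id"], doc["feature_view"]), doc)
```

For entity d1, the corrected row wins because created_at breaks the tie between the two rows that share an event timestamp.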

The MongoDB offline store implements the standard Feast offline store interface:

  • offline_write_batch - Writes a pyarrow.Table of feature data into the underlying MongoDB collection, using the configured MongoDBSource metadata to determine connection_string, database, and collection.

  • get_historical_features - Given an entity_df of entities and event timestamps plus a set of FeatureViews, returns a widened table in which each row includes point-in-time correct feature values. For each (entity_id, event_timestamp) pair, it selects the most recent feature row whose event_timestamp is at or before the entity row's timestamp and still within the FeatureView's TTL.

  • pull_latest_from_table_or_query - Returns one row per entity containing the latest feature values in a time window, used by Feast's materialization engine to seed the online store.

  • pull_all_from_table_or_query - Retrieves all rows from a data source in a specified date range for export or inspection, backed by the same feature_history schema and index.

  • persist (via RetrievalJob.persist) - Writes the result of a historical feature query to a separate collection or external sink via SavedDatasetStorage, distinct from feature_history.
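
The point-in-time rule used by get_historical_features can be sketched for a single entity row as follows. This is a simplified illustration, not the store's implementation: point_in_time_lookup is a hypothetical helper, and treating both ends of the TTL window as inclusive is an assumption.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc


def point_in_time_lookup(history, entity_id, as_of, ttl):
    """Return the features of the latest row at or before as_of and
    no older than as_of - ttl, or None if no row qualifies."""
    candidates = [
        row for row in history
        if row["entity_id"] == entity_id
        and as_of - ttl <= row["event_timestamp"] <= as_of
    ]
    if not candidates:
        return None
    # Latest event wins; created_at breaks ties between equal event timestamps.
    best = max(candidates, key=lambda r: (r["event_timestamp"], r["created_at"]))
    return best["features"]


history = [
    {"entity_id": "d1",
     "event_timestamp": datetime(2024, 1, 10, tzinfo=UTC),
     "created_at": datetime(2024, 1, 10, 0, 1, tzinfo=UTC),
     "features": {"conv_rate": 0.60}},
    {"entity_id": "d1",
     "event_timestamp": datetime(2024, 1, 15, tzinfo=UTC),
     "created_at": datetime(2024, 1, 15, 0, 1, tzinfo=UTC),
     "features": {"conv_rate": 0.72}},
]
ttl = timedelta(days=2)

hit = point_in_time_lookup(history, "d1", datetime(2024, 1, 16, tzinfo=UTC), ttl)
no_future = point_in_time_lookup(history, "d1", datetime(2024, 1, 11, tzinfo=UTC), ttl)
expired = point_in_time_lookup(history, "d1", datetime(2024, 1, 13, tzinfo=UTC), ttl)
```

The second lookup ignores the January 15 row because it lies in the future relative to the entity timestamp, and the third returns nothing because the only earlier row is older than the TTL.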

The recommended offline implementation is the aggregation-based MongoDB offline store, named MongoDBOfflineStore.

Key characteristics:

  • Uses a single feature_history collection shared by all FeatureViews, distinguished by feature_view.

  • Relies on the compound index (entity_id, feature_view, event_timestamp, created_at) for all queries, avoiding full collection scans.

  • Uses a server-side $group stage with $first for "scoring" workloads (one row per entity) and pd.merge_asof for "training" workloads with repeated entity IDs, balancing correctness and performance.

  • Bounds memory usage by processing entity_df in chunks, so large inputs do not exhaust RAM.

Benchmarks show this implementation provides the best combination of throughput and memory efficiency compared to alternative MongoDB offline approaches.
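
The chunking idea can be sketched generically. The store's actual chunk size and strategy are not described here, so iter_chunks and the chunk size of 4 are illustrative assumptions.

```python
from itertools import islice


def iter_chunks(rows, chunk_size):
    """Yield successive lists of at most chunk_size rows."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


entity_rows = [{"driver_id": i} for i in range(10)]
# Each chunk would be joined against feature_history independently,
# so only one chunk's worth of results is held in memory at a time.
chunk_sizes = [len(chunk) for chunk in iter_chunks(entity_rows, 4)]
```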

Capability | Supported? | Notes
--- | --- | ---
get_historical_features (PIT join) | Yes | Implemented via MongoDBOfflineStore using indexed aggregations and Pandas merge-asof.
pull_latest_from_table_or_query | Yes | Uses $match + $sort + $group $first over (entity_id, feature_view, event_timestamp, created_at).
pull_all_from_table_or_query | Yes | Full historical scan with time filters over feature_history.
offline_write_batch | Yes | Writes Arrow tables into MongoDB via the configured MongoDBSource.
persist | Yes | Exports historical query results to a separate collection using SavedDatasetStorage.
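
The $match + $sort + $group $first pattern noted for pull_latest_from_table_or_query can be written as an aggregation pipeline. The sketch below only builds the pipeline as Python data; build_pull_latest_pipeline is a hypothetical helper, and the exact stages the store emits may differ.

```python
from datetime import datetime, timezone


def build_pull_latest_pipeline(feature_view, start, end):
    """Build a pipeline returning the latest row per entity in a time window."""
    return [
        # Restrict to one FeatureView and the requested time window.
        {"$match": {
            "feature_view": feature_view,
            "event_timestamp": {"$gte": start, "$lte": end},
        }},
        # Same order as the compound index, so the newest row comes first.
        {"$sort": {
            "entity_id": 1,
            "feature_view": 1,
            "event_timestamp": -1,
            "created_at": -1,
        }},
        # Keep the first (most recent) row per entity.
        {"$group": {
            "_id": "$entity_id",
            "event_timestamp": {"$first": "$event_timestamp"},
            "created_at": {"$first": "$created_at"},
            "features": {"$first": "$features"},
        }},
    ]


pipeline = build_pull_latest_pipeline(
    "driver_stats",
    datetime(2024, 1, 1, tzinfo=timezone.utc),
    datetime(2024, 1, 31, tzinfo=timezone.utc),
)
stage_names = [next(iter(stage)) for stage in pipeline]
```

With PyMongo, a pipeline like this would be passed to collection.aggregate(pipeline) on the feature_history collection.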

Additional conveniences like exporting directly to data lakes or warehouses depend on the specific RetrievalJob implementation and are expected to follow Feast's standard patterns for offline stores.

The MongoDB online store uses a single collection for all FeatureViews, keyed by the serialized entity key.

  • _id: serialized_entity_key(entity_key), produced by Feast's stable encoding function that sorts entity names and values and encodes them into bytes.

  • features: nested subdocument where each FeatureView maintains its own feature namespace.

  • event_timestamps: per-FeatureView timestamps indicating when the latest value for that FeatureView was written.

  • created_timestamp or updated_at: bookkeeping fields useful for TTL indexing and diagnostics.

Example (simplified):

{
  "_id": "b\"<serialized_entity_key>\"",
  "features": {
    "driver_stats": {
      "rating": 4.91,
      "trips_last_7d": 132
    },
    "pricing": {
      "surge_multiplier": 1.2
    }
  },
  "event_timestamps": {
    "driver_stats": "ISODate(2026-01-01T12:00:00Z)",
    "pricing": "ISODate(2026-01-21T12:00:00Z)"
  },
  "created_timestamp": "ISODate(2026-01-21T12:00:00Z)"
}

Design rationale:

  • A single collection keeps each entity's state in one document, which matches Feast's expectation of key-based lookups and avoids fragmenting state across per-FeatureView collections.

  • Using the serialized entity key as _id reuses Feast's deterministic encoding, avoids duplicate primary keys across collections, and keeps retrieval to a single key lookup per entity.
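
A single key lookup can return features from several FeatureViews because they live in one document. The sketch below shows how requested features might map to dotted projection paths and how values could be read back from the returned document. build_projection and extract_features are hypothetical helpers, and the "feature_view:feature_name" output keys are an illustrative choice.

```python
def build_projection(feature_refs):
    """Map (feature_view, feature_name) pairs to dotted projection paths."""
    projection = {}
    for feature_view, name in feature_refs:
        projection[f"features.{feature_view}.{name}"] = 1
        projection[f"event_timestamps.{feature_view}"] = 1
    return projection


def extract_features(doc, feature_refs):
    """Read requested values from an entity document; missing ones become None."""
    result = {}
    for feature_view, name in feature_refs:
        value = doc.get("features", {}).get(feature_view, {}).get(name)
        result[f"{feature_view}:{name}"] = value
    return result


refs = [("driver_stats", "rating"), ("pricing", "surge_multiplier")]
projection = build_projection(refs)

# Sample document shaped like the simplified example above.
entity_doc = {
    "_id": b"<serialized_entity_key>",
    "features": {
        "driver_stats": {"rating": 4.91, "trips_last_7d": 132},
        "pricing": {"surge_multiplier": 1.2},
    },
}
values = extract_features(entity_doc, refs)
missing = extract_features(entity_doc, [("pricing", "base_fare")])
```

With PyMongo, the filter {"_id": key} and this projection would be passed to collection.find_one, so only the requested paths are returned.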

The MongoDB online store implements Feast's standard online store API:

  • online_write_batch - During materialization, Feast writes the latest feature values for each entity into MongoDB documents. Each batch upsert updates only the relevant nested features.<feature_view> subdocument and its corresponding entry in event_timestamps, keeping entity documents atomic and consistent.

  • online_read and get_online_features - Online serving resolves entity keys into _id values using the same serialization logic as offline, then performs key lookups. Each lookup returns all requested features for the entity in a single round trip, leveraging the nested features structure.

  • TTL and freshness - Feature TTL is configured on the FeatureView and used primarily in offline PIT joins; online TTL can be implemented with an index on updated_at or similar timestamp, consistent with Feast's notion that offline stores are append-only while online stores hold the latest state.
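
The partial-update behavior described for online_write_batch can be sketched as a $set on dotted paths, which leaves other FeatureViews in the same entity document untouched. build_upsert is a hypothetical helper, and writing created_timestamp on every upsert is an assumption; the store's actual update document may differ.

```python
from datetime import datetime, timezone


def build_upsert(entity_key, feature_view, values, event_ts, now):
    """Return (filter, update) documents for upserting one FeatureView's values."""
    set_fields = {
        f"features.{feature_view}.{name}": value
        for name, value in values.items()
    }
    set_fields[f"event_timestamps.{feature_view}"] = event_ts
    set_fields["created_timestamp"] = now  # bookkeeping field (assumed)
    return {"_id": entity_key}, {"$set": set_fields}


event_ts = datetime(2026, 1, 21, 12, 0, tzinfo=timezone.utc)
filter_doc, update_doc = build_upsert(
    b"<serialized_entity_key>",
    "pricing",
    {"surge_multiplier": 1.2},
    event_ts,
    now=event_ts,
)
```

With PyMongo, each pair could be wrapped in UpdateOne(filter_doc, update_doc, upsert=True) and sent in one collection.bulk_write call per batch.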

The offline store is configured using MongoDBOfflineStoreConfig:

class MongoDBOfflineStoreConfig(FeastConfigBaseModel):
    type: str = "...MongoDBOfflineStore"
    connection_string: str = "mongodb://localhost:27017"
    database: str = "feast"
    collection: str = "feature_history"

Example feature_store.yaml:

offline_store:
  type: feast.infra.offline_stores.contrib.mongodb_offline_store.mongodb.MongoDBOfflineStore
  connection_string: "mongodb+srv://user:pass@cluster.mongodb.net"
  database: feast
  collection: feature_history

MongoDBSource is the corresponding DataSource. Its name field becomes the feature_view discriminator stored in every document. For full configuration options, see the MongoDB Data Source reference in the Feast docs.

source = MongoDBSource(
    name="driver_stats",
    timestamp_field="event_timestamp",
    created_timestamp_column="created_at",
)

  • Follow the Feast Quickstart to set up a local feature store, then swap in MongoDB as an online and offline store using the configuration examples on this page.

  • Review the MongoDB Online Store reference in the Feast docs for configuration options, async support, and the full functionality matrix.

  • Review the MongoDB Offline Store reference for offline store configuration and supported functionality.

  • Review the MongoDB Data Source reference for MongoDBSource options and schema details.

  • Learn core Feast concepts such as entities, feature views, and materialization in the Feast Concepts guide.
