Machine Learning Research Group

Research in machine learning focused on information retrieval, application development and modernization, and database performance.
Who We Are
The Machine Learning team focuses on representation learning, language models, and learning on semi-structured data. It provides MongoDB with deep technical knowledge of the latest developments in machine learning.
Research Areas

Representation learning

MongoDB conducts research on running embedding models on resource-constrained infrastructure and on automated fine-tuning approaches.

Learning on semi-structured data

Models that natively support semi-structured data as input are useful for tasks such as supervised learning, DB index recommendations, and cardinality estimation.

Language models for code generation

MongoDB aims to improve the capabilities of language models in the domain of code generation to support application modernization.
Our Team

Robin Vujanic

Staff Research Engineer
Robin's research focuses on embedding models, particularly techniques for improving computational efficiency.
Research Papers

LEAF: Knowledge Distillation of Text Embedding Models with Teacher-Aligned Representations

Robin Vujanic, Thomas Rückstieß

arXiv, 2025

ORIGAMI: A generative transformer architecture for predictions from semi-structured data

Thomas Rückstieß, Alana Huang, Robin Vujanic

arXiv, 2024
ORIGAMI enables efficient end-to-end learning on semi-structured JSON data.