Machine Learning Research Group

Research in machine learning focused on information retrieval, application development and modernization, and database performance.
Who We Are
The Machine Learning team focuses on representation learning, language models, and learning on semi-structured data. It provides MongoDB with deep technical knowledge of the latest developments in machine learning.
Research Areas

Representation learning

MongoDB conducts research on running embedding models on resource-constrained infrastructure and on automated fine-tuning approaches.

Learning on semi-structured data

Models that natively accept semi-structured data as input are useful for tasks such as supervised learning, database index recommendation, and cardinality estimation.

Language models for code generation

MongoDB aims to improve the capabilities of language models in the domain of code generation to support application modernization.
Our Team
Robin Vujanic

Staff Research Engineer
Robin's research focuses on embedding models, particularly techniques for improving computational efficiency.
Research Papers

LEAF: Knowledge Distillation of Text Embedding Models with Teacher-Aligned Representations

Robin Vujanic, Thomas Rückstieß

arXiv, 2025

ORIGAMI: A generative transformer architecture for predictions from semi-structured data

Thomas Rückstieß, Alana Huang, Robin Vujanic

arXiv, 2024
ORIGAMI enables efficient end-to-end learning on semi-structured JSON data.