MongoDB conducts research on enabling embedding models to run on resource-constrained infrastructure and on automated approaches to fine-tuning.
Models that natively accept semi-structured data as input are useful for tasks such as supervised learning, database index recommendation, and cardinality estimation.
MongoDB aims to improve the capabilities of language models in the domain of code generation to support application modernization.
Robin completed his Ph.D. at ETH Zurich in the Department of Information Technology and Electrical Engineering, working on mathematical optimization problems, including decomposition methods for large-scale non-convex programs and robust optimization. He was then a postdoctoral researcher and later a group technical lead at the University of Sydney, where he worked on applying machine learning and optimization techniques in the industrial sector. After spending several years in the software industry as a machine learning engineer, he joined MongoDB, where he works in the Machine Learning research group.
His research interests lie in representation learning, with a particular focus on models for information retrieval, as well as approaches aimed at reducing the computational burden of transformer-based language models.
We introduce a knowledge distillation technique that produces embedding models retaining up to 97% of their teacher's performance, while being 5x-15x smaller, 7x-24x faster, and fully compatible with their teacher for flexible deployment.
Read the paper | Models | Blog
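The compatibility claim means a distilled student can be dropped in wherever the teacher's embeddings are already used. Below is a minimal sketch of embedding-space distillation, not the exact training recipe from the paper: the `teacher` and `student` arguments are hypothetical stand-ins for any pair of encoder callables with the same output dimension.

```python
# Illustrative sketch only; assumes `teacher` and `student` are encoders
# (e.g., torch.nn.Module instances) that map a batch to embeddings of the
# same dimension. Not the released training code.
import torch
import torch.nn.functional as F

def distillation_step(teacher, student, batch, optimizer):
    with torch.no_grad():
        t_emb = teacher(batch)   # frozen teacher embeddings
    s_emb = student(batch)       # student embeddings, same dimension

    # Match the teacher's embedding space directly: the MSE term pins down
    # scale and position, the cosine term preserves directional similarity.
    mse = F.mse_loss(s_emb, t_emb)
    cos = 1.0 - F.cosine_similarity(s_emb, t_emb, dim=-1).mean()
    loss = mse + cos

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Training the student against the teacher's vectors themselves, rather than only against its downstream similarity scores, is what keeps the two models interchangeable at inference time.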
ORIGAMI enables efficient end-to-end learning on semi-structured JSON data.
Read the paper | GitHub Repository
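To make "end-to-end learning on semi-structured JSON" concrete, here is a hedged sketch of one common first step: flattening nested documents into key/value token pairs that a sequence model can consume without manual feature engineering. The `json_to_tokens` helper is illustrative only and is not ORIGAMI's actual tokenizer; see the GitHub repository for the real implementation.

```python
# Simplified illustration of turning nested JSON into a token sequence.
# Not ORIGAMI's tokenizer; a generic sketch of the underlying idea.
def json_to_tokens(doc, prefix=""):
    """Flatten a (possibly nested) JSON document into (key path, value) tokens."""
    tokens = []
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            tokens.extend(json_to_tokens(value, path))  # recurse into sub-objects
        elif isinstance(value, list):
            for item in value:                           # one token per list element
                tokens.append((path, str(item)))
        else:
            tokens.append((path, str(value)))
    return tokens

doc = {"user": {"age": 34, "plan": "pro"}, "tags": ["db", "ml"]}
print(json_to_tokens(doc))
# [('user.age', '34'), ('user.plan', 'pro'), ('tags', 'db'), ('tags', 'ml')]
```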