MongoSF 2012

May 4th

MongoSF is an annual one-day conference in San Francisco dedicated to the open source, non-relational database MongoDB. The conference will feature over 40 sessions from MongoDB developers at 10gen, MongoDB users from the community, and technology partners, with presentations for both the novice and expert. Past conferences have sold out so reserve your ticket today.

Slides & Video

Slides and video from MongoSF are being posted as they become available.



Follow the #mongoSF hashtag to stay up-to-date on all things MongoSF.



For a full schedule with track descriptions, please click here.

Presentation Abstracts:


9:50am - 4:55pm

Eliot Horowitz, Jared Rosoff, and Edouard Servan-Schreiber, 10gen | Building an Application with MongoDB: A Tutorial

In this interactive, full-day tutorial, 10gen founder and CTO Eliot Horowitz will walk you through building a MongoDB-powered chat server. Whether you are an experienced MongoDB user or you are building your first app, this track is a great way to learn about important concepts in MongoDB, such as schema design, indexing, scaling out, administration, and more. This track will cover:

  • Schema design for various components of the application, including users, rooms, and chat logs
  • Deploying using replication and setting up backups
  • Moving from a single replica set to a sharded cluster -- without any downtime
  • Building reports using map/reduce and the new aggregation framework

You can attend the entire track, or drop in for individual sessions over the course of the conference. The source code and other content will be available on GitHub so you can easily follow along, and attendees will be able to communicate with one another during the conference using the chat app.
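
As a rough illustration of the schema design topic listed above, the three components it names (users, rooms, and chat logs) could be modeled as separate collections with documents along these lines. Every field name here is a hypothetical choice made for this sketch, not the schema used in the tutorial, and the documents are shown as plain Python dictionaries rather than through a driver.

```python
from datetime import datetime, timezone

# Hypothetical document shapes for a chat application; all field names
# are assumptions made for illustration only.
user = {"_id": "alice", "display_name": "Alice", "joined_rooms": ["lobby"]}

room = {"_id": "lobby", "topic": "General discussion", "members": ["alice"]}


def make_message(room_id, user_id, text, when=None):
    """Build one chat-log document that references its room and author by _id."""
    return {
        "room_id": room_id,
        "user_id": user_id,
        "text": text,
        "ts": when or datetime.now(timezone.utc),
    }


def room_history(messages, room_id):
    """Return one room's messages, oldest first.

    In a real deployment this is the kind of query a compound index on
    (room_id, ts) would be expected to serve.
    """
    return sorted(
        (m for m in messages if m["room_id"] == room_id),
        key=lambda m: m["ts"],
    )
```

Keeping each message as its own small document, rather than embedding an ever-growing array in the room document, is one common design choice for logs; whether it suits a given application depends on its read and write patterns.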


9:50am - 10:35am

Kevin Hanson, Solutions Architect, 10gen | Schema Design by Example

MongoDB has been designed for versatility, but the techniques you might use to build, say, an analytics engine or a hierarchical data store might not be obvious. In this talk, we'll learn about MongoDB in practice by looking at hypothetical application designs (based on real-world designs, of course). Topics to be covered include schema design, indexing, transactions (gasp!), trees, what's fast, and what's not. Sprinkled with tips, tricks, shoots, ladders, and trap doors, you're guaranteed to learn something new in this interdisciplinary talk.


11:10am - 11:55am

Max Schireson, President, 10gen | Indexing and Query Optimization

MongoDB supports a wide range of indexing options to enable fast querying of your data. In this talk we’ll cover how indexing works, the various indexing options, and use cases where each might be useful.


Dwight Merriman, Co-Founder and CEO, 10gen | MongoDB Internals: A Tour of the Source Code


Ben Sabrin & Edouard Servan-Schreiber, 10gen | Introducing MongoDB into Your Organization

Adoption of MongoDB has accelerated tremendously among developers in the past 18 months, and many large enterprises have now deployed MongoDB in reliable, large-scale production environments. However, for many developers, it remains a challenge to convince production teams and business stakeholders to adopt an open source technology that has not yet been certified by their IT teams. This session will provide you with compelling arguments to reassure business and production teams, such as:

  • Public customer references and real-world case studies (migration and adoption stories)
  • Deployment support and practices for robustness
  • How MongoDB contributes to your company’s business value


11:35am - 12:20pm

Scott Hernandez, Software Engineer, 10gen | Operational Best Practices

In this session we’ll review how to administer and deploy MongoDB starting from the basics and covering the best practices and procedures. This session will cover backups, network availability, performance pitfalls, log management, monitoring and alerting.


1:15pm - 1:45pm

Dan Pasette, Engineering Manager, 10gen | Deployment Preparedness

The last bugs are fixed, testing is complete, and the business is ready. What do you do next? In this talk we will cover the topics that ensure you are prepared for a successful launch of your MongoDB-based product, including:

  • Machine Sizing: How much CPU, memory, and disk should I use for my MongoDB?

  • Backup / Restore Procedures: What are my options and what do I need to do?

  • Load Testing and Capacity Planning: How much resource is my MongoDB going to use? When do I need to add replicas and shards?

  • Monitoring: What should I be watching and how do I know if things are running correctly?


1:50pm - 2:35pm

Ben Becker, Software Engineer, 10gen | Journaling and the Storage Engine

MongoDB supports write-ahead journaling (by default) to facilitate fast crash recovery and consistency in database files after a crash. In this session, we'll give an overview of on-disk persistence and journaling in MongoDB, and discuss the internals of journaling and the storage engine.


Chris Westin, Software Engineer, 10gen | The New Aggregation Framework

We're working on a new aggregation framework for MongoDB that will make it a lot easier to do simple tasks like counting, averaging, and finding minima or maxima while grouping by keys in a collection. The new aggregation features are not a replacement for map-reduce, but will make it possible to do a number of things much more easily, without having to resort to the big hammer that is map-reduce. After introducing the syntax and usage patterns for the new aggregation system, we will give some demonstrations of aggregation using the new system.
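
To make the kind of task described above concrete, here is a minimal sketch of a pipeline with a single `$group` stage that counts, averages, and finds the minimum and maximum of a field per key. The `room` and `length` field names are hypothetical, and the `group_stats` helper is only a pure-Python stand-in that computes the same statistics locally; it is not how the server implements aggregation.

```python
from collections import defaultdict

# A pipeline in the shape the aggregation framework accepts: one $group
# stage keyed on a field, with count, average, minimum and maximum
# accumulators. The "room" and "length" field names are hypothetical.
pipeline = [
    {
        "$group": {
            "_id": "$room",
            "count": {"$sum": 1},
            "avg_length": {"$avg": "$length"},
            "min_length": {"$min": "$length"},
            "max_length": {"$max": "$length"},
        }
    }
]


def group_stats(docs, key, field):
    """Pure-Python illustration of what the $group stage above computes."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc[key]].append(doc[field])
    return {
        k: {
            "count": len(values),
            "avg": sum(values) / len(values),
            "min": min(values),
            "max": max(values),
        }
        for k, values in buckets.items()
    }
```

With a driver such as PyMongo, a pipeline like this would typically be passed to `collection.aggregate(pipeline)`; that call needs a running server, so it is not executed in this sketch.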


2:45pm - 3:30pm

Dwight Merriman, Co-Founder and CEO, 10gen | Concurrency Internals in MongoDB v2.2

10gen CEO & Co-Founder Dwight Merriman will look "under the hood" at concurrency internals in the upcoming version of MongoDB.


Steve Francia, 10gen | MongoDB and Hadoop

Learn how to integrate MongoDB with Hadoop for large-scale distributed data processing. Using Hadoop's MapReduce and Streaming, you will learn how to do analytics and ETL on large datasets, with the ability to load and save data against MongoDB. With Hadoop Streaming, support goes beyond native Java, enabling map-reduce jobs to be run in languages like Python and Ruby.


Ben Sabrin & Edouard Servan-Schreiber, 10gen | Introducing MongoDB into Your Organization

Adoption of MongoDB has accelerated tremendously among developers in the past 18 months, and many large enterprises have now deployed MongoDB in reliable, large-scale production environments. However, for many developers, it remains a challenge to convince production teams and business stakeholders to adopt an open source technology that has not yet been certified by their IT teams. This session will provide you with compelling arguments to reassure business and production teams, such as:

  • Public customer references and real-world case studies (migration and adoption stories)
  • Deployment support and practices for robustness
  • How MongoDB contributes to your company’s business value

3:35pm - 4:05pm

Sandeep Parikh, 10gen | MongoDB on Amazon EC2

Gain valuable insights on running MongoDB on Amazon EC2. EC2 provides a simple and flexible deployment model for your application and databases. However, you still need to manage, maintain, and monitor your system. So how do you do this with EC2? We'll explore these questions and more in this session.


4:10pm - 4:55pm

Scott Hernandez, Software Engineer, 10gen | Data Center Awareness

This talk will introduce deployment of MongoDB across multiple data centers. We'll discuss the advantages of a multi data center deployment for read/write locality, the various deployment strategies, and disaster preparedness and recovery. In addition, we'll look at the MongoDB roadmap and planned enhancements around data center awareness.



9:50am - 10:20am

Ev Kontsevoy, Co-Founder & CEO, Mailgun | Scaling Write-Heavy Applications with MongoDB

For most web applications, optimizing the database layer is mainly about fast reads. That is why the majority of benchmarks and technical blogs focus on read-intensive workloads. At Mailgun, up to 75% of queries are ordered updates and deletes. This presents a number of challenges. This talk is about lessons we learned at Mailgun while building a highly available, fast back-end for our ever-changing message queue on top of MongoDB.


Alexei Krainiouk, Adobe | Using MongoDB as a Persistence Layer for High Performance Lookups in Adobe Scene7

We developed a redundant, scalable database that can replicate naturally to external data centers (for geographic redundancy). As Scene7 SaaS clients have high cache reuse, we implemented a front-side cache to the database so that the server footprint is minimized, yet each individual data center can scale and handle all requests for a geography should the other data center go offline.


10:25am - 10:55am

Chris Merz, Manager of Operations, MapMyFitness | MongoDB Versatility: Scaling the MapMyFitness Platform

The MMF user base more than doubled in 2011, beginning an era of rapid data growth. With Big Data come Big Data Headaches. The traditional MySQL solution for our suite of web applications had hit its ceiling. MongoDB was chosen as the candidate for exploration into NoSQL implementations, and now serves as our go-to data store for rapid application deployment. This talk will detail several of the MongoDB use cases at MMF, from serving 2TB+ of geolocation data, to time-series data for live tracking, to user sessions, app logging, and beyond. Topics will include migration patterns, indexing practices, backend storage choices, application access patterns, monitoring, and more.


10:40am - 11:25am

John Nunemaker, GitHub | Schema Design For Analytics

The flexibility of MongoDB makes it perfect for storing analytics. I'll discuss a few patterns for storing data that we have learned while growing from zero to millions of page views a day. You'll leave with a desire to measure everything and the ability to do it.


12:00pm - 12:30pm

Juan L. Negron, Integration Engineer, & Mark Baker, Server Product Manager, Canonical | Cloud Bursting MongoDB with Ubuntu Server and Juju

This talk explores how to expand and contract a network deployment using multiple MongoDB environments with Ubuntu Server and Juju. In this talk we will show one way that related services can be deployed on multiple environments via Juju and still maintain connectivity between them.


Jason Hoffman, Founder & CTO, Joyent | N2M: Node.js and MongoDB as the Modern Stack for the Real-Time Web

The combination of Node.js and MongoDB has emerged as the framework of choice for people building real-time applications, and is on the path to becoming the replacement for the common LAMP stack. This talk will suggest some reasons why and discuss some of the observed patterns in these types of applications.


1:15pm - 1:45pm

Yuri Finkelstein, Architect, eBay | MongoDB @ eBay

I will cover several eBay use cases where MongoDB is used and will touch on the most interesting technical problems that we faced and the decisions that we made.


Steve Citron-Pousty, Red Hat | Get your Spatial On with MongoDB in the Cloud

You have seen the stuff that FourSquare has done with spatial and you want some of that hotness for your app. But, where to start? Have no fear - by the end of this session you will have all the pieces necessary to write your own location-based app. The OpenShift platform already has MongoDB plus the spatial bits installed, so there is no need to find a VPS or convince your IT dude to install stuff. What's OpenShift? It's Red Hat's free auto-scaling Platform as a Service. This session will start with a quick intro on firing up an OpenShift instance with MongoDB. Then we will load some data into MongoDB, show you how to handle spatial data, do some command line spatial operations, and finally plug in some code to build a simple "Find the National Park" application. When you go home you will be able to amaze your friends and supervisors with some spatial magic goodness you can control.


1:50pm - 2:35pm

Jeremy Zawodny, Software Engineer, Craigslist | MongoDB at Craigslist: 1 year later

Last year craigslist deployed MongoDB for its multi-billion document posting archive, largely due to its schema-free nature and built-in sharding and replica sets. Since then we've looked at it for other projects--specifically high-volume and multi-datacenter. In the process we've learned more about where other features do and don't work so well, including replication, capped collections, and compound indexes. This presentation will recap what's worked well for us and discuss the other issues we ran into for new projects, as well as possible improvements in the design or enhancements for MongoDB.


2:45pm - 3:30pm

Wes Widner, Software Architect, McAfee | MongoDB Security Considerations

MongoDB is growing and while that is welcome news to MongoDB fans, it also makes MongoDB an attractive target. In this talk we'll be exploring MongoDB security and how we can make our clusters less attractive targets.


3:35pm - 4:05pm

Tony Tam, VPE, Wordnik | Backup Strategies: Keeping Your MongoDB Data Safe

With over 2 years of production experience with MongoDB, including data center migrations, hardware failures and an occasional developer fat finger error, Wordnik has learned a thing or two about keeping MongoDB data safe and available. During this talk, Wordnik's CTO Tony Tam will share some of the tips & tricks that they have developed.


Monica Wilkinson, Developer Relations, Cloud Foundry | Activity Streams on Cloud Foundry with MongoDB and NodeJS

Learn how you can add Activity Streams to your web applications with a few simple steps using Cloud Foundry, MongoDB and NodeJS. This hands-on session will walk you through the schema design, engine creation on Cloud Foundry and usage from any web client.


4:10pm - 4:55pm

Montse Medina, COO, Jetlore | MongoDB Schema Design: Insights and Tradeoffs

I will describe the challenges we faced when designing a MongoDB database for processing large data streams and the solutions we applied. Some of the difficulties included write-intensive loads, uneven access patterns (posts with many followers get many more hits than posts with few followers), and non-trivial support of privacy. I will describe the choices we made for schema design to optimize writes and enable efficient querying and retrieval. I will also talk about indexing strategies, tradeoffs we made to work around MongoDB design, and the reasoning we applied to find the optimal denormalization of collections.


Greg Brockman, Stripe | High Availability with MongoDB for Fun and Profit

MongoDB's replica sets provide a powerful primitive for high availability. However, like any tool, replica sets require proper wielding. At Stripe, we've evolved a set of development, deployment, and administration techniques to achieve true zero downtime during both routine maintenance and catastrophic failure. In this talk, we'll share a selection of these techniques and tricks, as well as the mistakes we made along the way.


Ecosystem & Related Technologies

9:50am - 10:20am

Sebastian Stadil, Scalr | Putting MongoDB on Auto-Pilot

Sebastian Stadil, Scalr Founder & CEO, will introduce the open-source Scalr project's compatibility with MongoDB. On Scalr, MongoDB users can manage, automate and auto-scale their DB. Backups, replication, and configuration are all managed for them. Sebastian will offer insight into the technological challenges behind this compatibility and how MongoDB users can get the most out of it.


10:25am - 10:50am

Roger Bodamer, Analytica | Making it Easy to Report and Analyze your Data Using Analytica and Excel

Getting insights out of MongoDB is easy if you know how to program. But what if you don't? In this talk, we introduce Analytica, a software product with an Excel front-end that makes it easy to report, analyze, and graph your data directly from MongoDB. We will show you how to build reports, create graphs, and deploy Analytica without impacting your production deployment.

11:10am - 11:55am

Ryan Gyer, RightScale | MongoDB in Minutes in the Cloud

You have a need for MongoDB, and you’d like to run it in the cloud…but can it be done quickly? It’s much easier than you might think. In this presentation, Claudio Gentile from RightScale (Leaders in Cloud Management) will discuss real-world use cases for social gaming applications using MongoDB, and will showcase how easy it is to deploy Mongo as part of a 3-tier cloud architecture in just minutes. The audience will learn:

  • How to deploy MongoDB alone, or as part of a multitier HA architecture in minutes
  • How to find your way around the dashboard, available reporting, and full server access
  • What RightScale gives MongoDB users vs. AWS or other clouds alone
  • Why MongoDB is a perfect fit for the cloud

1:15pm - 1:45pm

John A. De Goes, ReportGrid | Advanced Analytics and Statistics with MongoDB

Big data guru John A. De Goes, CTO of Precog, presents an overview of Quirrel, a high-level, statistically-oriented, open source query language designed for advanced analytics and statistics on large-scale JSON data sets. John discusses how the language can be used to solve a variety of common problems encountered by modern application developers, and then overviews ongoing efforts to port the language to MongoDB as part of a pure open source distribution.

1:50pm - 2:35pm

Omer Gertel, OffScale Co-Founder and CTO | Automated Testing with MongoDB and OffScale

In this talk we will explore several techniques to integrate MongoDB into the automated build and test cycle. Dynamic schema databases like MongoDB allow for more flexibility during development, but require more rigorous tests for different data types. As code evolves, it needs to support objects from older versions or migrate objects as the expected object structure changes. Among the different strategies for automated tests that include the database, we will show how OffScale, git for databases, can help you quickly set up data sets for tests.


2:45pm - 3:30pm

Eystein Stenberg, Software Architect, CFEngine | Using MongoDB to Understand the State of Agile and Large-Scale IT Systems

CFEngine is the world's most widely used software solution for ensuring uptime of large-scale distributed IT systems. As part of its operation, CFEngine collects a wide variety of reports about the state of all systems, including compliance, application patch status, monitoring, time-series data and more. Every five minutes, reports are collected from thousands of nodes, resulting in a constant stream of gigabytes of data per hour written to the central database. This database use case differs from the standard web-application one, where the data does not change significantly. This talk will explain why MongoDB was selected as the back-end database for the operations CFEngine requires, and cover experiences and performance characteristics when using MongoDB in a write-intensive environment.

3:35pm - 4:05pm

Todd Dampier, CTO, MongoLab | Rock Solid MongoDB Ops: Running MongoDB Like a Pro

Ok, so you’ve launched your development sandbox, love MongoDB, and are now thinking about how you want to handle your production environment. Learn all sorts of tips and tricks in this practical session on MongoDB operations by leading cloud database hosting provider MongoLab. We at MongoLab provide database hosting on EC2, Rackspace and Joyent for thousands of applications powered by MongoDB. In this session we will share with you some of the best practices we have developed, and help you avoid some of the pitfalls common with running production MongoDB deployments. This talk will cover the basics, such as VM selection, OS and disk configuration as well as more advanced topics such as clustering, VM migrations/upgrades, backup strategies and monitoring, with special emphasis on running MongoDB in the cloud. Don’t miss this informative session that will help you operate MongoDB like a pro!


Speaker Bios

Shafaq Abdullah | Zenprise

Shafaq Abdullah is a Principal Engineer/Architect for Android at Zenprise Inc., where he engineers solutions that bring security, monitoring, and mobile device management (MDM) to enterprises using cloud-based web services on the Android platform. Scalability and redundancy are the hallmarks of the security and MDM as-a-service model. He has worked as a Software Consultant at Sony Ericsson on the Android platform, where he participated in the architecture and development of the Android framework for the Xperia Play. At Nokia in Mountain View, California, Shafaq worked as an R&D Engineer building highly scalable back-end web services. As a Lead Dev in Open Source Software Operations (OSSO) at Nokia Finland, he was responsible for multimedia framework design and development and played a role in the architecture and future road mapping of cutting-edge technologies, not only within OSSO but also Nokia-wide. Shafaq holds an MS.Eng in Information Technology from Tampere University of Technology, Finland, and a B.Sc. in Electrical (Computer) Engineering from the University of Engineering & Technology, Lahore, Pakistan.

John Nunemaker | GitHub

John Nunemaker has been a MongoDB enthusiast from nearly the beginning. He created MongoMapper, the popular Ruby ORM for Mongo, and works at GitHub, primarily on Gauges.

Jason Hoffman | Joyent

Jason is the founder and CTO at Joyent, where he is responsible for overseeing the engineering, operations and product groups’ development and implementation of Joyent’s Cloud Computing technology (node.js, smartOS and Smart Datacenter). He is also responsible for research and advanced development, technical outreach, evangelism, consultative efforts for partners and business units, and manages Joyent’s intellectual property portfolio including involvement in open source projects, licensing, technology transfer, assessments of potential partnerships, mergers and acquisitions. His specialties include bioinformatics, grid computing, cloud computing, distributed systems, collaborative applications and deploying and scaling web applications. Jason earned a BS and MS in Chemistry and Biochemistry at UCLA and a PhD in Molecular Pathology at The Burnham Institute and UCSD School of Medicine. Jason taught at the university level for more than a decade, is a prolific speaker and author and a highly-regarded expert on scalable systems. He serves as the Outside Director of the WordPress Foundation, frequently blogs at Joyeur, and most often can be found on Twitter when not flying across an ocean in an aisle seat. He lives in San Francisco with his wife and daughters.

Mark Baker | Canonical

Since starting his career at Oracle in the early 90s, Mark has worked at some of the most significant software companies in the last 20 years. In the Oracle Advanced Technologies labs in 1999 he started working with Linux and from that moment on there was no looking back. Mark spent the next 10 years working for leading Open Source companies Red Hat and then MySQL. With the market for IaaS and Big Data developing, Mark joined Canonical in 2010 as Server Product Manager to help deliver the next iteration of open source innovation. Mark is based in Canonical HQ in London.

Juan L. Negron | Canonical

Juan works closely with new and emerging technologies to facilitate their integration into Ubuntu server and provide detailed documentation that supports Canonical’s Professional Services Architects and Engineers. He also supports business development and pre-sales as part of the team.

Yuri Finkelstein | eBay

Yuri is a senior architect with the eBay Platform group. He is involved in the design and development of various infrastructure services in the eBay platform, including NoSQL databases.

Steve Citron-Pousty | Red Hat

Steve is a PaaS Dust Spreader (aka Developer Evangelist) with OpenShift. He goes around and shows off all the great work the OpenShift engineers do. He can teach you about PaaS with MongoDB, and also Java, PostgreSQL, mobile JavaScript, some Android, a little bit of iPhone, and even some Python. He has 11 years of Java programming expertise ranging from data processing and statistical analysis to ORM and web applications. He began doing geospatial work 19 years ago and has done geospatial programming work on multiple platforms using JavaScript, .NET, and Java. Before OpenShift, Steve was a developer evangelist for LinkedIn and deCarta. Steve holds a B.A. from Vassar College, an M.S. from University of Georgia and a Ph. D. in Ecology from University of Connecticut. He likes building interesting applications and helping developers create great solutions.

Jeremy Zawodny | Craigslist

Jeremy Zawodny and Chris Mooney are software engineers at craigslist, working on various back-end infrastructure including databases, email systems, abuse prevention, and search.

Wes Widner | McAfee

Wes Widner is a software architect at McAfee in their Global Threat Intelligence research department.

Monica Wilkinson | Cloud Foundry

Monica Wilkinson has been writing software for over a decade and for a variety of large-scale employers such as IBM, MySpace, Facebook and VMware. Monica is an Open Web Standards and Data Portability Advocate and has contributed to many open specifications around Activity Streams and real-time notifications. Today she works as a Developer Advocate at Cloud Foundry where she is directly responsible for the website, documentation, open source gallery and working with developers and partners to enhance the platform. On Twitter: @ciberch

Montse Medina | Jetlore

Montse is on leave from the Ph.D. program in Computational and Mathematical Engineering at Stanford University where she has been doing research in parallel computing and data mining. She has also previously worked at Oracle in text search. Montse is a co-founder of Jetlore. Jetlore developed technology to categorize short colloquial posts typical of social networks and identify mentioned topics (including products, movies, and general interests). Their algorithms utilize social graph signals and are optimized for colloquial language and minimal textual context.

Greg Brockman | Stripe

Greg Brockman was the first engineering hire at Stripe. He loves building large systems. In his spare time, he is often found building large systems. Prior to Stripe, he was an undergraduate at both Harvard and MIT and interned at Ksplice as well as the National Security Agency.

Ryan Gyer | RightScale

Ryan is a Sales Engineer at RightScale, and a champion for cloud computing technology for small to large scale computing environments. He has a proven track record of quickly adapting to new technologies and tools while applying fundamental best practices, with strong applied knowledge of the server software and hardware used to host web applications (MySQL, Linux, Apache, etc).

Omer Gertel | OffScale

Co-Founder & CTO. During his 6 years in an elite intelligence unit in the IDF, Omer developed data mining solutions over very large databases and was head of an R&D team specializing in data mining and text classification. Omer has a B.Sc. in Computer Science (Talpiot program) from the Hebrew University of Jerusalem and an MBA from Tel-Aviv University.

Todd Dampier | MongoLab

Todd Dampier, CTO. As a gearhead and aesthete, Todd took an early interest in massively parallel processing. At MIT, he helped create the runtime system for the 1024-node J-Machine and a compiler to optimize multidimensional data transforms for a parallel target. After various valley startups, Todd most recently served as Chief Architect at Merced Systems, helping to create and shape its enterprise platform for nearly a decade. His work at Merced yielded four patents in multi-dimensional analytics and reporting. Now thoroughly enchanted by the document databases he wishes he had a decade ago, Todd believes MongoDB unlocks a new class of software solutions and that the mission of MongoLab is to bring those possibilities to developers the world over. He holds B.S. and M.Eng. degrees in Computer Science from MIT, with a minor in Comparative Literature.

Sebastian Stadil | Scalr

Sebastian Stadil launched Scalr to offer open source-based cloud computing management software. He also founded the Silicon Valley Cloud Computing Group, a user group of over 6000 members that meets monthly to present the latest developments in the industry.

Roger Bodamer | Analytica

Roger co-founded Analytica to build native analytics tools that take advantage of document databases. Prior to founding Analytica, Roger incubated the West Coast Office for 10gen. He has deep expertise and knowledge of database architectures and internals. His experience leading product development and engineering teams includes 12 years with Oracle's database and Application Server development organization, where he pioneered products that delivered distributed and heterogeneous interoperability, as well as several years as COO/SVP of Apple's PowerSchool division. In his spare time... well, he doesn't have any spare time :)