Last year, Craigslist moved their archive from MySQL to MongoDB. After the initial setup, we spoke with Jeremy Zawodny, software engineer at Craigslist and author of High Performance MySQL (O'Reilly), and asked him some questions about their cluster. In advance of his talk at MongoSF tomorrow, we caught up with Jeremy to get the scoop on what’s happening at Craigslist one year later.
Last time we spoke you were building a MongoDB store for 5 Billion Documents. What do your numbers look like now?
We’re currently approaching the 3 billion mark. The 5 billion number was our target capacity when building the system. Back then we had about 2.5 billion documents, which we migrated into MongoDB, and we’ve continued to add documents ever since.
Can you share an anecdote on the benefits of replica sets/sharding and something you’d like to change/improve in that feature set?
Sharding has made it easy to handle growth. We know that when the day comes, we can add another replica set to our cluster to ease any space crunch. Replica sets have been great for handling machine failures. We’ve had several machines lock up on us and require unplanned reboots or service. Through all of that, the worst thing we’ve seen is some read-only time for the cluster metadata (when a config server dropped), but we’ve been able to serve requests without stopping.
Can you share some anecdotes about how your team adjusted to working with MongoDB?
Our systems administration team made some adjustments to the original deployment and configuration so it would mesh better with our home-grown management and deployment tools. Other than that, MongoDB has been pretty hands-off for most of the team. As long as it behaves well (which it does), we don’t need to touch it that often.
Any exciting plans for your MongoDB clusters?
We’ve been testing MongoDB in a few new roles at Craigslist and plan to present some of those challenges at MongoSF on May 4th.
Thanks to Jeremy for giving us some insight into how MongoDB powers Craigslist!
MongoDB: Powering the Magic and the Monsters at Stripe
Update: Watch the video of Greg Brockman’s talk on MongoDB for High Availability at MongoSF ’12

Stripe offers a simple platform for developers to accept online payments. They are a long-time user of MongoDB and have built a powerful and flexible system for enabling transactions on the web. In advance of their talk at MongoSF on MongoDB for high availability, Stripe engineer Greg Brockman spoke with us about what’s going on with MongoDB at Stripe.

Stripe has a heavy write load with large query volumes. Can you give us some insight into your tips and tricks for wrangling MongoDB’s replica sets on your system?

Getting replica sets up and running is actually incredibly easy. I used to run MySQL clusters where configuring and maintaining replication was a pain, and it was a joy to just be able to run “rs.add(node)” and watch it join the cluster.

To avoid losing operations even if we lose our database primary, we structure our application so that all writes are idempotent. We then wrap our calls to the MongoDB driver in a retry block: if a call fails because our MongoDB cluster is currently reconfiguring, we try the operation again (with the usual backoff and timeout you’d expect from a scheme like this).

We’re also very careful to avoid operations that could evict hot data from the cache. Running unindexed queries is an obvious example, but we’ve found that a large multi-update can have production impact as well. So when we need to change the schema for an entire collection of documents, we’ll usually run a slower (but non-impactful) document-by-document migration at the application level.

Let’s take a step back to your past talk at MongoSV ’11 – what are you doing with Monster (Stripe’s native events processing system for payments)?

Monster is our framework for event production and event consumption, which uses MongoDB as a highly available, persistent queue.
With Monster, our engineers can start logging a new type of event with only a few lines of code, and at any time in the future can add a consumer that will automatically be passed relevant events (possibly even historical ones). We use it for a variety of purposes: structured logging, incremental updating of state (such as people’s graphs of payment volume), and background jobs.

Lots of people are innovating in the financial space – in particular, building APIs for mobile payments. For those just starting up, why should they use MongoDB?

As a payments processor, our uptime is incredibly important. We were initially drawn to MongoDB because replica sets make it incredibly easy to run your database in a highly available fashion. I came from a world where my database master could never be rebooted, since there was no zero-downtime failover strategy even for routine maintenance – MongoDB gives you this almost out of the box. MongoDB also makes it easy to do zero-downtime migrations, with features such as background index builds and support for multiple schemas in a single collection. Anyone who cares about availability should look very hard at MongoDB.

How are you using the Ruby driver in your system? Anything interesting?

We’ve banged on the Ruby driver in a variety of configurations, ensuring that it behaves properly when exposed to all the possible failures we can imagine (or have noticed) our database servers experiencing. These days, we’re very happy with how robust the Ruby driver is against the wide variety of failure modes of the distributed MongoDB nodes.

What’s on your wish list for the Ruby driver?

I wish there were a configuration option for forcing reads from a secondary. (Right now, you can request that reads go to a secondary if one is available, but they’ll fall back to the primary if no secondary is available.)

What’s on Stripe’s engineering roadmap?
While making Stripe available outside the US is our top priority, our biggest engineering challenge at the moment is scaling our systems to keep up with the phenomenal growth we’ve been experiencing.

Many thanks to Greg for taking the time to tell us a bit about the magic at Stripe!
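The retry pattern Greg describes, idempotent writes wrapped in a backoff loop, can be sketched roughly as follows. This is an illustration, not Stripe's actual code; `TransientError` stands in for whatever the driver raises during a failover (for example, a connection error while a new primary is being elected):

```python
import random
import time

class TransientError(Exception):
    """Stands in for a driver error raised while the replica set is
    reconfiguring (e.g. a connection failure during a primary election)."""

def with_retries(op, max_attempts=5, base_delay=0.01):
    """Run an idempotent operation, retrying with exponential backoff.

    Because the operation is idempotent, re-running it after an ambiguous
    failure is safe: applying it twice leaves the data in the same state.
    """
    for attempt in range(max_attempts):
        try:
            return op()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error to the caller
            # Exponential backoff with a little jitter before the next try.
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))

# Example: a flaky write that fails twice before succeeding.
calls = {"n": 0}

def flaky_write():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("primary stepped down")
    return "ok"

result = with_retries(flaky_write)
```

In a real deployment, `op` would be a closure over a MongoDB driver call, such as an upsert keyed by a unique id, so that replays are harmless.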
How DataSwitch And MongoDB Atlas Can Help Modernize Your Legacy Workloads
Data modernization is here to stay, and DataSwitch and MongoDB are leading the way forward. Research strongly indicates that the future of the Database Management System (DBMS) market is in the cloud, and the ideal way to shift from an outdated, legacy DBMS to a modern, cloud-friendly data warehouse is through data modernization.

There are a few key factors driving this shift. Increasingly, companies need to store and manage unstructured data in a cloud-enabled system, as opposed to a legacy DBMS, which is designed only for structured data. Moreover, the amount of data generated by a business is increasing at a rate of 55% to 65% every year, and the majority of it is unstructured. A modernized database that can improve data quality and availability provides tremendous benefits in performance, scalability, and cost optimization. It also provides a foundation for improving business value through informed decision-making. Additionally, cloud-enabled databases support greater agility, so you can upgrade current applications and build new ones faster to meet customer demand.

Gartner predicts that by 2022, 75% of all databases will be in the cloud – either by direct deployment or through data migration and modernization. But research shows that over 40% of migration projects fail, due to challenges such as:

- Inadequate knowledge of legacy applications and their data design
- Complexity of code and design from different legacy applications
- Lack of automation tools for transforming legacy data processing into cloud-friendly data and processes

It is essential to take a strategic approach and choose the right partner for your data modernization journey. We’re here to help you do just that.

Why MongoDB?

MongoDB is built for modern application developers and for the cloud era. As a general-purpose, document-based, distributed database, it facilitates high productivity and can handle huge volumes of data.
The document database stores data in JSON-like documents and is built on a scale-out architecture that suits developers building scalable applications through agile methodologies. Ultimately, MongoDB fosters business agility, scalability, and innovation.

Key MongoDB advantages include:

- Rich JSON documents
- Powerful query language
- Multi-cloud data distribution
- Security for sensitive data
- Quick storage and retrieval of data
- Capacity for huge volumes of data and traffic
- Design that supports greater developer productivity
- Extreme reliability for mission-critical workloads
- Architecture optimized for performance and efficiency

Key advantages of MongoDB Atlas, MongoDB’s hosted database as a service, include:

- Multi-cloud data distribution
- Security for sensitive data
- Design for developer productivity
- Reliability for mission-critical workloads
- Built-in optimal performance
- Managed operational efficiency

JSON documents are the most productive way to work with data, as they support nested objects and arrays as values. They also support schemas that are flexible and dynamic. MongoDB’s powerful query language enables sorting and filtering on any field, no matter how deeply nested it is in a document. It also supports aggregations, as well as modern use cases such as graph search, geo-based search, and text search. Queries are themselves JSON and are easy to compose. MongoDB supports joins in queries, and it supports two kinds of relationships: references and embedded documents. It has all the power of a relational database and much, much more.

Companies of all sizes can use MongoDB, which operates on a large and mature platform ecosystem. Developers enjoy a great user experience, with the ability to provision MongoDB Atlas clusters and start coding instantly. A global community of developers and consultants makes it easy to get the help you need, if and when you need it.
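The query capabilities described above, dot-notation filters on nested fields and JSON-shaped aggregation pipelines, can be illustrated with plain documents. The collection and field names below are invented for the example; these are the structures a driver such as PyMongo would send to the server:

```python
# A filter that reaches into a nested subdocument via dot notation:
# it matches documents shaped like
# {"customer": {"address": {"city": "San Francisco"}}, ...}.
nested_filter = {"customer.address.city": "San Francisco"}

# An aggregation pipeline: keep paid orders, total the amounts per
# customer, then sort customers by total, descending.
pipeline = [
    {"$match": {"status": "paid"}},
    {"$group": {"_id": "$customer.id", "total": {"$sum": "$amount"}}},
    {"$sort": {"total": -1}},
]

# Against a live connection these would run as, e.g.:
#   db.orders.find(nested_filter)
#   db.orders.aggregate(pipeline)
```

Because queries and pipelines are ordinary JSON-like structures, they are easy to build, inspect, and compose programmatically.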
In addition, MongoDB supports all major languages and provides enterprise-grade support.

Why DataSwitch as a partner for MongoDB?

Automated schema redesign, data migration & code conversion

DataSwitch is a trusted partner for cost-effective, accelerated solutions for digital data transformation, migration, and modernization through a modern database platform. Our no-code and low-code solutions, combined with cloud data expertise and unique, automated schema generation, accelerate time to market. We provide end-to-end data, schema, and process migration with automated replatforming and refactoring, delivering:

- 50% faster time to market
- 60% reduction in total cost of delivery
- Assured quality with built-in best practices, guidelines, and accuracy

Data modernization: How “DataSwitch Migrate” helps you migrate from an RDBMS to MongoDB

DataSwitch Migrate (“DS Migrate”) is a no-code and low-code toolkit that leverages advanced automation to provide intuitive, predictive, and self-serviceable schema redesign from a traditional RDBMS model to MongoDB’s document model, with built-in best practices. Based on data volume, performance, and criticality, DS Migrate automatically recommends the appropriate ETTL (Extract, Transfer, Transform & Load) data migration process. DataSwitch delivers data engineering solutions and transformations in half the time of typical existing data modernization solutions.

Consider these key areas:

Schema redesign – construct a new framework for data management. DS Migrate provides automated data migration and transformation based on your redesigned schema, as well as no-touch code conversion from legacy data scripts to MongoDB Atlas APIs. Users simply drag and drop the schema for redesign, and the platform converts it to a document-based JSON structure by applying MongoDB modeling best practices. The platform then automatically migrates data to the new, redesigned JSON structure. It also converts the legacy database scripts for MongoDB.
This automated, user-friendly data migration is faster than anything you’ve ever seen. Here’s a look at how the schema designer works.

Refactoring – change the data structure to match the new schema. DS Migrate handles this through automatic code generation for migrating the data. This goes far beyond a mere lift and shift: DataSwitch takes care of refactoring and replatforming (moving from the legacy platform to MongoDB) automatically. Performing all of these tasks within a single platform is a game-changing capability.

Security – mask and tokenize data while moving it from on-premises systems to the cloud. Because the data may be moving to a public cloud, you must keep it secure. DataSwitch can configure and apply security measures automatically while migrating the data.

Data quality – ensure that data is clean, complete, trustworthy, and consistent. DataSwitch allows you to configure your own quality rules and automatically applies them during data migration.

In summary: first, the DataSwitch tool automatically extracts the data from an existing database, such as Oracle. It then exports the data and stores it locally before zipping and transferring it to the cloud. Next, DataSwitch transforms the data, altering the data structure to match the redesigned schema and applying data security measures during the transform step. Lastly, DS Migrate loads the data and processes it into MongoDB in its entirety.

Process conversion

Process conversion, in which scripts and process logic are migrated from a legacy DBMS to a modern DBMS, is made easier by a high degree of automation. Minimal coding and manual intervention are required, and the journey is accelerated.
It involves:

- DML – Data Manipulation Language
- CRUD – typical application functionality (Create, Read, Update & Delete)
- Converting to the equivalent MongoDB Atlas API calls

Degree of automation DataSwitch provides during migration:

Schema Migration Activities (DS Automation Capabilities)
- Application Data Usage Analysis: 70%
- 3NF to NoSQL Schema Recommendation: 60%
- Schema Re-Design Self Services: 50%
- Predictive Data Mapping: 60%

Process Migration Activities (DS Automation Capabilities)
- CRUD-based SQL conversion (Oracle, MySQL, SQL Server, Teradata, DB2) to MongoDB API: 70%

Data Migration Activities (DS Automation Capabilities)
- Migration Script Creation: 90%
- Historical Data Migration: 90%
- Catch-up Load: 90%

DataSwitch Legacy Modernization as a Service (LMaaS): Our consulting expertise, combined with the DS Migrate tool, allows us to harness the power of the cloud to transform RDBMS legacy data systems to MongoDB. Our solution delivers legacy transformation in half the time frame through pay-per-usage. Key strengths include:

- Data Architecture Consulting
- Data Modernization Assessment and Migration Strategy
- Specialized Modernization Services

DS Migrate Architecture Diagram

Contact us to learn more.
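As a rough illustration of the schema redesign described above (a generic sketch, not DataSwitch's actual code), here is how a normalized 1:N parent/child pair of relational rows might be folded into a single embedded MongoDB document during the transform step. All table, field, and function names are invented for the example:

```python
def rows_to_document(customer_row, order_rows):
    """Fold a customer row and its child order rows (a 1:N relational
    relationship) into one document with an embedded 'orders' array --
    the document-model shape a redesigned MongoDB schema calls for."""
    return {
        "_id": customer_row["id"],        # reuse the relational key as _id
        "name": customer_row["name"],
        "orders": [
            {"order_id": o["id"], "total": o["total"]}
            for o in order_rows           # embed children instead of joining
        ],
    }

# Example input: one parent row and two child rows, as an extract step
# might read them from a legacy RDBMS.
customer = {"id": 42, "name": "Acme Corp"}
orders = [
    {"id": 1, "customer_id": 42, "total": 99.5},
    {"id": 2, "customer_id": 42, "total": 12.0},
]

doc = rows_to_document(customer, orders)
```

The resulting document could then be loaded with an `insert_one` call; embedding the orders removes the need for a join at read time, at the cost of duplicating the parent/child boundary decision into the schema design.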