Last month, 10gen announced its sponsorship of hackNY, the non-profit aiming to federate the next generation of hackers for New York City. We’ve been longtime supporters of hackNY and were excited to present the founders, Evan Korth and Chris Wiggins, with a donation of $75,000 at MongoNYC.
At the conference, Chris and Evan sat for a brief interview, where they talked about how technology is transforming New York City’s key industries, including media, advertising, publishing, and finance. They explained how hackNY’s model, which organizes student hackathons and summer fellowship programs, gives students practical, hands-on experience with programming that they don’t receive at university. Open source technologies like MongoDB are a great fit for hackathons since they enable developers to rapidly prototype, with the knowledge that they can scale their applications.
Libbson is a new shared library written in C for developers wanting to work with the BSON serialization format. Its API will feel natural to C programmers but can also be used as the base of a C extension in higher-level MongoDB drivers. The library contains everything you would expect from a BSON implementation: working with documents in their serialized form, iterating elements within a document, overwriting fields in place, ObjectId generation, JSON conversion, data validation, and more. Some lessons were learned along the way that should benefit anyone choosing to implement BSON themselves.

Improving small document performance

A common use case of BSON is relatively small documents. Allocating and freeing many short-lived heap buffers has a profound impact on the userspace memory allocator, causing what is commonly known as “memory fragmentation”. Memory fragmentation makes it more difficult for the allocator to locate a contiguous region of memory; in addition to increasing allocation latency, it increases the memory requirements of your application. To help with this issue, the bson_t structure contains 120 bytes of inline space that allows small BSON documents to be built directly on the stack rather than on the heap. When a document grows past 120 bytes, it automatically migrates to a heap allocation. Additionally, bson_t grows its buffers in powers of two. This is standard practice when working with buffers and arrays, as it amortizes the overhead of growing the buffer versus calling realloc() every time data is appended. 120 bytes was chosen to align bson_t to the size of two sequential cachelines on x86_64 (64 bytes each). This may change based on future research, but not before a stable ABI has been reached.

Single allocation for nested documents

One strength of BSON is its ability to nest objects and arrays.
Often, when serializing these nested documents, each sub-document is serialized independently and then appended to its parent’s buffer. As you might imagine, this takes quite a toll on the allocator: it can generate many small allocations that exist only to be discarded immediately after being appended to the parent’s buffer. Libbson allows sub-documents to be built directly into the parent document’s buffer, avoiding this costly fragmentation. The topmost document grows its underlying buffer in powers of two each time an append would overflow it.

Parsing BSON documents from network buffers

Another common source of allocator fragmentation is BSON document parsing. Libbson allows parsing and iterating BSON documents directly from your incoming network buffer. This means the only allocations created are those needed by your higher-level language, such as a PyDict when writing a Python extension. Developers writing C extensions for their driver may choose to implement a “generator”-style parser to help keep memory fragmentation low. A technique we have yet to explore is a hashtable-like structure backed by BSON that deserializes the entire buffer only after a threshold of keys has been accessed.

Generating BSON documents into network buffers

Much like parsing BSON documents, generating documents and placing them into your network buffers can be hard on your memory allocator. To keep this fragmentation down, Libbson can serialize your document to BSON directly into a buffer of your choosing. This is ideal for situations such as writing a sequence of BSON documents into a MongoDB message.

Generating ObjectIds without synchronization

Applications often generate ObjectIds, especially in high-insert environments. The uniqueness of generated ObjectIds is critical to avoiding duplicate key errors across multiple nodes.
Highly threaded environments create a local contention point that slows the rate of generation, because threads must synchronize on the increment counter embedded in each sequential ObjectId. Failure to do so could cause collisions that would not be detected until after a network round-trip. Most drivers implement this synchronization with an atomic increment, or with a mutex where atomics are not available. Libbson uses atomic increments and, in some cases, avoids synchronization altogether. One such case is a non-threaded environment. Another is when running on Linux, where threads and processes share the same identifier namespace, allowing the thread identifier to be used as the pid within the ObjectId.

You can find Libbson at https://github.com/mongodb/libbson and discuss design choices with its author, Christian Hergert, who can be found on Twitter as @hergertme.
4 Ways MongoDB Enhances Your Google BigQuery Experience
MongoDB and Google Cloud continue to build on their partnership, with MongoDB enhancing Google Cloud with pay-as-you-go pricing, unified billing, and integrations with multiple Google Cloud features, including BigQuery. And when it comes to data architecture, BigQuery and MongoDB are two products that are better together.

Google BigQuery and MongoDB are better together

Google’s serverless data warehouse, BigQuery, was launched in 2011 as Google’s cloud-native data warehouse, with the aim of enhancing business agility. BigQuery allows fast queries that can uncover insights using familiar SQL. When MongoDB is added to the database technology stack as a complementary technology, it broadens the developer’s capabilities across a variety of use cases, including the following four examples.

Combined impact of the Enterprise Data Warehouse and the Operational Data Store

BigQuery is best suited as an Enterprise Data Warehouse (EDW), meaning it is designed to optimize long-running analytics. MongoDB Atlas, on the other hand, is best suited as an Operational Data Store (ODS), designed to support high-throughput, highly concurrent, real-time operational applications that demand random access to an entity’s data in native JSON. BigQuery and MongoDB are thus complementary technologies that can jointly deliver more value, each contributing its strongest qualities: BigQuery excels at long-running queries, while Atlas handles real-time operational application needs with thousands of concurrent sessions and millisecond response times.

Enriched end-customer experiences

BigQuery equips data scientists and analysts with machine learning (ML) models and BI tools for structured and semi-structured data at scale. For roles that can accept a turnaround time of a day or more, BigQuery is a strong tool for big data queries.
With MongoDB Atlas, engineers and development teams can build applications faster and handle highly diverse schema, query, and update patterns, adapting to demanding user needs and competition. Atlas can also deliver the real-time (or sub-24-hour) queries necessary to keep your business operational. Additionally, data can easily move back and forth between the two platforms, making the combination a prime fit for running analytics on operational data. Unlocking the full potential of your data across your organization means that everyone has insight into the business metrics they need, when they need it, allowing quicker decision making as well as stronger and more accurate reporting.

Extensibility to MongoDB Atlas features

On top of the value and synergy that a BigQuery + Atlas combination can realize, other Atlas features can enhance the usefulness and sophistication of a data architecture:

- Atlas Charts can be leveraged to create rich visualizations of any data stored within Atlas.
- Atlas Triggers and Alerts can apply database logic in response to events or on a predefined schedule.
- Atlas Search brings full-text search at scale to all data across MongoDB and BigQuery alike.
- Atlas Data Federation enables aggregating data across multiple data sources, such as Atlas clusters and HTTPS endpoints, and transforming it into analytical formats (e.g., Parquet).

This means you can not only access data in real time but also analyze it in a visual, user-friendly way. This functionality makes your data more actionable, allowing you not only to answer questions about your business data but also to make better predictions and adjustments based on it. Furthermore, being alerted to certain data-based events and triggering new actions based on that information means your data works more efficiently for you, freeing up time to innovate and focus on core business competencies.
Lastly, this approach simplifies your data lifecycle, so JSON data from various applications and endpoints can easily be transformed and consumed for rich analytics.

Deeper understanding of your customer

Businesses can use fully managed MongoDB Atlas to store customer 360 profiles. A 360-degree view of a customer allows businesses to track an individual customer’s journey across multiple channels, devices, purchases, and interactions, improving customer satisfaction. With the combination of Atlas and BigQuery, businesses can also use compiled data (such as transactional data, behavioral data, user profiles and segmentations, and business analytics) to match user profiles with products and services using artificial intelligence (AI). Vertex AI, a managed machine learning platform, provides all the Google Cloud services in one place to deploy and maintain AI models. Easy access to a 360-degree view of each customer, with automation around the customer journey, improves engagement and loyalty through personalization and targeted marketing communications. It also enables retailers to aggregate customer interactions across all channels and identify valuable new customers.

Google BigQuery and MongoDB Atlas in the real world

Current, a leading U.S. challenger bank, uses innovative approaches, services, and technologies to serve people overlooked by traditional banks, regardless of age or income level, and to help improve their financial outcomes. To create customer experiences that cannot exist in traditional systems, Current chose to leverage Google Cloud, including BigQuery, layered with MongoDB, to achieve its goals.

Read Full Current Story

Are you a Google BigQuery customer curious about how MongoDB Atlas can amplify your existing data warehouse or data lake architecture? Try MongoDB Atlas for free today and spin up your first workload in minutes.
Try pay-as-you-go Atlas on Google Cloud Marketplace