Last month, 10gen announced its sponsorship of hackNY, the non-profit that aims to federate the next generation of hackers for New York City. We’ve been longtime supporters of hackNY and were excited to present the founders, Evan Korth and Chris Wiggins, with a donation of $75,000 at MongoNYC.
At the conference, Chris and Evan sat for a brief interview in which they talked about how technology is transforming key industries in New York City, from media and advertising to publishing, finance, and beyond. They explained how hackNY’s model, which organizes student hackathons and summer fellowship programs, gives students practical, hands-on programming experience that they don’t receive at university. Open source technologies like MongoDB are a great fit for hackathons because they let developers prototype rapidly, knowing that their applications can scale.
Libbson is a new shared library written in C for developers who want to work with the BSON serialization format. Its API will feel natural to C programmers, and it can also serve as the base of a C extension in higher-level MongoDB drivers. The library contains everything you would expect from a BSON implementation: it can work with documents in their serialized form, iterate the elements within a document, overwrite fields in place, generate ObjectIds, convert documents to JSON, validate data, and more. We learned some lessons along the way that should benefit anyone choosing to implement BSON themselves.
Improving small document performance
BSON is commonly used for relatively small documents. Allocating many small buffers has a profound impact on the userspace memory allocator, causing what is commonly known as “memory fragmentation.” Fragmentation makes it more difficult for your allocator to locate a contiguous region of memory. In addition to increasing allocation latency, it increases the amount of memory your application needs to work around that fragmentation.
To help with this issue, the bson_t structure contains 120 bytes of inline space, which allows BSON documents to be built directly on the stack rather than on the heap. When a document grows past 120 bytes, it automatically migrates to a heap allocation. Additionally, bson_t grows its buffers in powers of two. This is standard practice for buffers and arrays, since it amortizes the cost of growth compared with calling realloc() every time data is appended.
120 bytes was chosen so that bson_t as a whole, including its header fields, spans two sequential 64-byte cachelines (128 bytes) on x86_64. This may change based on future research, but not before a stable ABI has been reached.
Single allocation for nested documents
One strength of BSON is its ability to nest objects and arrays.
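Because an embedded BSON document is itself a length-prefixed document stored inline in its parent's bytes, it can be written in place: reserve the 4-byte length, append the children, then backfill the length when the sub-document closes. The C sketch below illustrates that idea together with the inline-storage scheme described above (a 120-byte inline array that moves to a heap buffer sized to the next power of two once it overflows). The doc_buf type and all of its helper functions are hypothetical and are not Libbson's API; only int32 and embedded-document elements are handled.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch, not Libbson's API. INLINE_CAP mirrors the 120 bytes
 * of inline space described in the text. */
#define INLINE_CAP 120

typedef struct {
    uint8_t *data;                  /* inline_buf until the first overflow, then heap */
    size_t len;                     /* bytes written so far */
    size_t cap;                     /* capacity of data */
    uint8_t inline_buf[INLINE_CAP]; /* do not copy a doc_buf by value: data may point here */
} doc_buf;

void buf_init(doc_buf *b) {
    b->data = b->inline_buf;
    b->len = 0;
    b->cap = INLINE_CAP;
}

void buf_destroy(doc_buf *b) {
    if (b->data != b->inline_buf)
        free(b->data);
    buf_init(b);
}

/* Ensure room for `extra` more bytes. On overflow the new capacity is the
 * next power of two >= the required size, which amortizes growth. */
static void buf_reserve(doc_buf *b, size_t extra) {
    size_t need = b->len + extra;
    if (need <= b->cap)
        return;
    size_t cap = 1;
    while (cap < need)
        cap <<= 1;
    uint8_t *p = (b->data == b->inline_buf) ? malloc(cap) : realloc(b->data, cap);
    if (!p)
        abort(); /* the sketch treats allocation failure as fatal */
    if (b->data == b->inline_buf)
        memcpy(p, b->inline_buf, b->len); /* first overflow: migrate inline bytes to the heap */
    b->data = p;
    b->cap = cap;
}

void buf_append(doc_buf *b, const void *src, size_t n) {
    buf_reserve(b, n);
    memcpy(b->data + b->len, src, n);
    b->len += n;
}

static void put_i32le(uint8_t *p, int32_t v) {
    uint32_t u = (uint32_t)v;
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)(u >> (8 * i)); /* BSON integers are little-endian */
}

/* Reserve a 4-byte length prefix and return its offset. Offsets stay valid
 * when the buffer moves; raw pointers into it would not. */
size_t doc_begin(doc_buf *b) {
    static const uint8_t zero[4] = {0};
    size_t off = b->len;
    buf_append(b, zero, sizeof zero);
    return off;
}

/* Append the document's trailing 0x00, then backfill the reserved length. */
void doc_end(doc_buf *b, size_t off) {
    static const uint8_t nul = 0;
    buf_append(b, &nul, 1);
    assert(off + 4 <= b->len);
    put_i32le(b->data + off, (int32_t)(b->len - off));
}

/* Element header: one type byte followed by the NUL-terminated key. */
static void append_header(doc_buf *b, uint8_t type, const char *key) {
    buf_append(b, &type, 1);
    buf_append(b, key, strlen(key) + 1);
}

void append_int32(doc_buf *b, const char *key, int32_t v) {
    uint8_t val[4];
    put_i32le(val, v);
    append_header(b, 0x10, key); /* 0x10 = 32-bit integer */
    buf_append(b, val, sizeof val);
}

/* Open an embedded document (type 0x03) inside the parent's buffer; close it
 * with doc_end using the returned offset. */
size_t append_document_begin(doc_buf *b, const char *key) {
    append_header(b, 0x03, key);
    return doc_begin(b);
}
```

For example, building {"a": {"b": 1}} takes doc_begin for the root, append_document_begin(&b, "a"), append_int32(&b, "b", 1), and then doc_end on the sub-document's offset followed by the root's. The result is one 20-byte buffer that never leaves the inline array and needs no intermediate allocation for the sub-document.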
Often, when nested documents are serialized, each sub-document is serialized independently and then appended to the parent's buffer. As you might imagine, this takes quite a toll on the allocator: it generates many small allocations that exist only to be discarded immediately after being appended to the parent's buffer. Libbson builds sub-documents directly in the parent document's buffer, which avoids this costly fragmentation. The topmost document grows its underlying buffer by powers of two each time an append would overflow the current allocation.
Parsing BSON documents from network buffers
Another common source of allocator fragmentation is BSON document parsing. Libbson can parse and iterate BSON documents directly from your incoming network buffer, so the only allocations created are those your higher-level language needs, such as a PyDict when writing a Python extension. Developers writing C extensions for their driver may choose to implement “generator”-style parsing of documents to help keep memory fragmentation low. A technique we have yet to explore is a hashtable-like structure backed by BSON that deserializes the entire buffer only after a threshold number of keys has been accessed.
Generating BSON documents into network buffers
Much like parsing, generating documents and placing them into your network buffers can be hard on your memory allocator. To keep this fragmentation down, Libbson supports serializing your document to BSON directly within a buffer of your choosing. This is ideal for situations such as writing a sequence of BSON documents into a MongoDB message.
Generating ObjectIds without synchronization
Applications frequently generate ObjectIds, especially in insert-heavy environments. The uniqueness of generated ObjectIds is critical to avoiding duplicate key errors across multiple nodes.
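For reference, the original (legacy) ObjectId layout packs 12 bytes: a 4-byte timestamp in seconds, a 3-byte machine identifier, a 2-byte process id, and a 3-byte counter. The C11 sketch below, whose names are hypothetical and not Libbson's API, fills that layout in two ways: from a single counter shared by all threads and advanced with atomic_fetch_add, and from a per-thread context that needs no synchronization, assuming each thread supplies a distinct pid value. Real drivers seed the counter randomly and derive the machine identifier from the host; here the counter starts at zero and the caller passes the machine identifier.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Hypothetical sketch, not Libbson's API. Legacy 12-byte ObjectId layout:
 * [0..3] timestamp (seconds) | [4..6] machine id | [7..8] pid | [9..11] counter.
 * This sketch writes every field big-endian. */
typedef struct {
    uint8_t bytes[12];
} object_id;

void oid_fill(object_id *oid, uint32_t seconds, uint32_t machine, uint16_t pid, uint32_t count) {
    assert(machine <= 0xFFFFFF && count <= 0xFFFFFF); /* both fields are 3 bytes wide */
    uint8_t *b = oid->bytes;
    b[0] = (uint8_t)(seconds >> 24);
    b[1] = (uint8_t)(seconds >> 16);
    b[2] = (uint8_t)(seconds >> 8);
    b[3] = (uint8_t)seconds;
    b[4] = (uint8_t)(machine >> 16);
    b[5] = (uint8_t)(machine >> 8);
    b[6] = (uint8_t)machine;
    b[7] = (uint8_t)(pid >> 8);
    b[8] = (uint8_t)pid;
    b[9] = (uint8_t)(count >> 16);
    b[10] = (uint8_t)(count >> 8);
    b[11] = (uint8_t)count;
}

/* Variant 1: one process-wide counter shared by all threads. atomic_fetch_add
 * hands every caller a distinct value without a mutex, but all threads still
 * contend on this single variable. */
static atomic_uint_fast32_t shared_counter;

void oid_generate_shared(object_id *oid, uint32_t machine, uint16_t pid) {
    uint32_t count = (uint32_t)atomic_fetch_add(&shared_counter, 1) & 0xFFFFFF;
    oid_fill(oid, (uint32_t)time(NULL), machine, pid, count);
}

/* Variant 2: each thread owns a context whose pid field is assumed to differ
 * from every other thread's, so a plain, unsynchronized counter suffices. */
typedef struct {
    uint32_t machine;
    uint16_t pid;
    uint32_t counter;
} oid_context;

void oid_generate_ctx(oid_context *ctx, object_id *oid) {
    uint32_t count = ctx->counter++ & 0xFFFFFF;
    oid_fill(oid, (uint32_t)time(NULL), ctx->machine, ctx->pid, count);
}
```

The per-thread variant is one way to picture the Linux case: putting a thread identifier in the pid field keeps ObjectIds from different threads distinct without a shared counter, although the legacy layout keeps only 16 bits of that identifier.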
In highly threaded environments, ObjectId generation becomes a local contention point that slows the rate of generation, because threads must synchronize on the counter that is incremented for each sequential ObjectId. Failure to do so could cause collisions that would not be detected until after a network round-trip. Most drivers implement this synchronization with an atomic increment, or with a mutex if atomics are not available. Libbson uses atomic increments and, where possible, avoids synchronization altogether. One such case is a non-threaded environment. Another is Linux, where threads and processes share the same identifier namespace. This allows the thread identifier to be used as the pid within the ObjectId, so each thread's ObjectIds carry a distinct identifier and threads need not share a single counter.
You can find Libbson at https://github.com/mongodb/libbson and discuss design choices with its author, Christian Hergert, who can be found on Twitter as @hergertme.
Hear From the MongoDB World 2022 Diversity Scholars
The MongoDB Diversity Scholarship program is an initiative to elevate and support members of underrepresented groups in technology across the globe. Scholars receive complimentary access to the MongoDB World developer conference in New York, on-demand access to MongoDB University to prepare for free MongoDB certification, and mentorship via an exclusive discussion group. This year at MongoDB World, our newest cohort of scholars had the opportunity to interact with company leadership at a luncheon and to share their experiences in a public panel discussion at the Community Café. Hear from some of the 2022 scholars, in their own words.
Rebecca Hayes, System Analyst at Alliance for Safety and Justice
I made an internal transition from managing grants and contracts to IT and just finished a data science certificate (Python, Unix/Linux, SQL) through my community college. My inspiration for pursuing STEM was wanting to understand how reality is represented in systems and how data science can be used to change the world.
What was your most impactful experience as part of the Diversity Scholarship?
Most impactful were the conversations I had with other attendees at the conference. I talked to people from all sectors who were extremely knowledgeable and passionate about shaping the future of databases. The opportunity to hear from MongoDB leaders and then understand how the vision behind the product was being implemented made me feel inspired about my future in STEM.
How has the MongoDB World conference inspired you in your learning or your career path?
MongoDB World inspired me to understand the real-world applications of databases. I left knowing what's possible with a product like MongoDB and the limits of SQL and traditional databases. After the conference, I wrote an article on Medium reflecting on what I learned.
What is your advice to colleagues pursuing STEM and/or on a similar path as you?
Embrace what makes you unique.
Just because things take time doesn't mean they won't happen. When learning programming and data science, think about how your work relates to the real world and share those thoughts with others. Seek out new perspectives, stay true to yourself, and keep an open mind.
Delphine Nyaboke, Junior Software Engineer at Sendy
I am passionate about energy in general. My final year project was on solar mini-grid design and interconnection, and my mission is to be at the intersection of energy and AI. What inspired me to get into tech is the ability to solve societal problems without waiting for someone else to do it for you, whether in energy or through code.
What was your most impactful experience as part of the Diversity Scholarship?
Apart from attending and listening in on the keynotes, my most impactful experience was attending the breakout sessions. They covered lovely topics full of learning and inspiration, including Engineering Culture at MongoDB; Be a Community Leader; Principles of Data Modeling for MongoDB; and Be Nice, But Not Too Nice, to mention just a few.
How has the MongoDB World conference inspired you in your learning or your career path?
MongoDB World has inspired me to keep upskilling and stay competitive in handling databases, which is a key skill for a backend engineer like myself. I will continue taking advantage of the MongoDB University courses and on-demand courses available thanks to the scholarship.
What is your advice to colleagues pursuing STEM and/or on a similar path as you?
STEM is a challenging yet fun field. If you’re tenacious enough, the rewards will trickle in soon enough. Find a community to be around, discuss what you’re going through together, be a mentor, get a mentor, and keep pushing forward. We need like-minded individuals in our society, even in this fourth industrial revolution, and we are not leaving anyone behind.
Raja Adil, Student at Cal Poly SLO
Currently, I am a software engineer intern at Salesforce. I started teaching myself software development when I was a junior in high school during the COVID-19 pandemic, and from there I started doing projects and gaining as much technical experience as I could through internships. Before the pandemic, I took my first computer science class, which was taught in C#. At first I hated it because it looked complex, but slowly I started to enjoy it more and more, and during the pandemic I started learning Python on my own. I feel blessed to have found my path early in my career.
What was your most impactful experience as part of the Diversity Scholarship?
My most impactful experience was the network I built and the friends I made during the four days I was in New York for MongoDB World. I also learned a lot about the power of MongoDB compared with relational databases, which I often use in my projects.
How has the MongoDB World conference inspired you in your learning or your career path?
The MongoDB World conference was amazing and has inspired me a ton in my learning path. I definitely want to learn even more about MongoDB as a database, and in terms of a career path, I would love to intern at MongoDB as a software engineer down the line.
What is your advice to colleagues pursuing STEM and/or on a similar path as you?
My advice would be to network as much as you can and simply make cool projects that others can use.
Evans Asuboah, Stetson University
I am an international student from Ghana. I was raised by my dad, who is a cocoa farmer, and my mum, who is a teacher. I got into tech miraculously, because my country's educational system matches majors to students according to their final high school grades. Initially, I wanted to study medicine, but I was offered computer science.
I realized that computer science could be the tool to help my community, and that I could use the knowledge to help my dad on the farm.
What was your most impactful experience as part of the Diversity Scholarship?
The breakout room sessions. As scholars, we had the chance to talk to MongoDB employees, and their knowledge and experiences changed my thinking and increased my desire to persevere. I have learned never to stop learning and not to give up.
How has the MongoDB World conference inspired you in your learning or your career path?
Meeting these amazing people, connecting with the scholars, being at the workshops, and talking to the startups at the booths has made me realize the sky is the limit. I dare to dream and believe until I see the results.
What is your advice to colleagues pursuing STEM and/or on a similar path as you?
1. Explore MongoDB.
2. You are the only one between you and your dream.
3. Take the initiative and meet people.
4. Never stop learning.
Daniel Erbynn, Drexel University
I love traveling and exploring new places. I am originally from Ghana, and after high school I got the opportunity to participate in a summer program called Project ISWEST, which introduced me to coding and computer science through building a Pong game and an Arduino circuit to program traffic lights. This made me excited about programming and the possibilities of solving problems in the tech space.
What was your most impactful experience as part of the Diversity Scholarship?
My most impactful experience was meeting other students and professionals in the industry, learning from them, making lifelong connections, and getting the opportunity to learn about MongoDB through the MongoDB University courses.
How has the MongoDB World conference inspired you in your learning or your career path?
This conference has inspired me to learn more about MongoDB and seek more knowledge about cloud technology.
What is your advice to colleagues pursuing STEM and/or on a similar path as you?
Don’t be afraid to reach out to people you want to learn from, and create projects you are passionate about.
Build your skills with MongoDB University's free courses and certifications. Join our developer community to stay up-to-date with the latest information and announcements.