MongoDB is coming to London and we want to see you there. Tickets are going fast, but we’re putting two free day-passes up for grabs - all you need to do is share your very best mug shot online!
By now, most of you should have a MongoDB mug. If not, where have you been the past 6 years? We've shipped mugs around the world but we rarely get to see the Mug owners. To get in the draw to win two free tickets to MongoDB London, simply snap a photo (alone or with others) with your MongoDB mug(s) and tag it on Twitter with the hashtag #mongodbmug.
Good luck, and look forward to seeing you in November at MongoDB London.
What to Expect at MongoDB London
MongoDB London will not only reveal the why behind using MongoDB in a range of use cases, but crucially a varied roster of presenters will explain the how. MongoDB Co-Founder and CTO, Eliot Horowitz, will open the day and give a roadmap of what's to come in MongoDB 2.8.
At MongoDB London, you will see how you can build applications never before possible in your organization. Gary Collier, CTO at AHL Man Group, will discuss how his team created a single platform for all of its financial data with MongoDB. That project resulted in a 25x improvement in tick throughput and slashed the required disk storage by 40 percent. Additionally, James Stewart, will describe the process of digital transformation in the UK Government and Robert Hill, at Capgemini, will be showcase building a real-time view of your business to integrate all of your siloed data--also called the Single View of the Customer.
Alongside a number of user presentations, there will be several MongoDB experts giving talks on everything from the Internet of Things to examples of MongoDB best practices in Financial Services. Finally, you will get the chance to sit down with MongoDB experts at our "Ask The Experts" hall to answer your toughest questions about MongoDB.
Don't delay. Snap your MUG and send it over.
Sharding Pitfalls Part III: Chunk Balancing and Collection Limits
In Parts 1 and 2 we have covered a number of common issues people run into when managing a sharded MongoDB cluster. In this final post of the series we will cover a subtle, but important distinction in terms of balancing a sharded cluster as well as an interesting limitation that can be worked around relatively easily, but is nonetheless surprising when it comes up. 6. Chunk balancing != data balancing != traffic balancing The balancer in a sharded cluster cares about just one thing: Are chunks for a given collection evenly balanced across all shards? If they are not, then it will take steps to rectify that imbalance. This all sounds perfectly logical, and even with extra complexity like tagging involved the logic is pretty straight forward. If we assume that all chunks are equal, then we can rest assured that our data is being evenly balanced across all the shards in our cluster and rest easy at night. Although that is sometimes, perhaps even frequently, the case it is not always true - chunks are not always equal. There can be massive “jumbo” chunks that exceed the maximum chunk size (64MiB), completely empty chunks and everything in between. Let’s use an example from our first pitfall , the monotonically increasing shard key. For our example, we have picked just such a key to shard on (date), and up until this point we have had just one shard and had not sharded the collection. We are about to add a second shard to our cluster and so we enable sharding on the collection and do the necessary admin work to add the new shard into the cluster. Once the collection is enabled for sharding, the first shard contains all the newly minted chunks. Let’s represent them in a simplified table of 10 chunks. This is not representative of a real data set, but it will do for illustrative purposes: Table 1 - Initial Chunk Layout Now we add our second shard. The balancer will kick in and attempt to distribute the chunks evenly. It will do this by moving the lowest range chunks to the new shard until the counts are identical. Once it is finished balancing, our table now looks like this: Table 2 - Balanced Chunk Layout That looks pretty good at the moment, but lets imagine that more recent chunks are more likely to have more activity (updates say) than older chunks. Adding the traffic share estimates for each chunk shows that shard1 is taking far more traffic (72%) than shard2 (28%) despite the chunks seeming balanced overall based on the approximate size. Hence, chunk balancing is not equal to traffic balancing. Using that same example, let’s add another wrinkle - periodic deletion of old data. Every 3 months we run a job to delete any data older than 12 months. Let’s look at the impact of that on our table after we run it for the first time (assuming the first run happens on July 1st 2015). Table 3 - Post-Delete Chunk Layout The distribution of data is now completely skewed toward shard1 - shard2 is in fact empty! However, the balancer is completely unaware of this imbalance - the chunk count has remained the same the entire time, and as far as it is concerned the system is in a steady state. With no data on shard2, our traffic imbalance as seen above will be even worse, and we have essentially negated the benefit of having a second shard for this collection. Possible Mitigation Strategies If data and traffic balance are important, select an appropriate shard key Move chunks manually to address the imbalances - swap “hot” chunks for “cool” chunks, empty chunks for larger chunks 7. Waiting too long to shard a collection (collection too large) This is not very common, but when it falls on your shoulders, it can be quite challenging to solve. There is a maximum data size for a collection when when it is initially split which is a function of the chunk size and data size as noted on the limits page . If your collection contains less than 256GiB of data, then there will be no issue. If the collection size exceeds 256GiB but is less than 400GiB, then MongoDB may be able to do an initial split without any special measures being taken. Otherwise, with larger initial data sizes and the default settings, the initial split will fail. It is worth noting that once split the collection may grow as needed and without any real limitations as long as you can continue to add shards as data size grows. Possible Mitigation Strategies Since the limit is dictated by the chunk size and the data size, and assuming there is not much to be done about the data size, then the remaining variable is the chunk size. This is adjustable (default is 64MiB) and can be raised in order to let a large collection split initially and then reduced once that has been completed. The required chunk size increase will depend on the actual data size. However, this is relatively easy to work out - simply divide your data size by 256GB and then multiply that figure by 64MiB (and round up if it is not a nice even number). As an example, let’s consider a 4TiB collection: 4TiB divided by 256GiB = 16 64MiB x 16 = 1024MiB Hence, set the max chunk size to 1024MiB , then perform the initial sharding of the collection, and then finally reduce the chunk size back to 64MiB using the same procedure. . Thanks for reading through the Sharding Pitfall series! If you want to learn more about managing MongoDB deployments at scale, sign up for my online education course, MongoDB Advanced Deployment and Operations . Planning for scale? No problem: MongoDB is here to help. Get a preview of what it’s like to work with MongoDB’s Technical Services Team. Give us some details on your deployment and we can set you up with an expert who can provide detailed guidance on all aspects of scaling with MongoDB, based on our experience with hundreds of deployments.
Hear From the MongoDB World 2022 Diversity Scholars
The MongoDB Diversity Scholarship program is an initiative to elevate and support members of underrepresented groups in technology across the globe. Scholars receive complimentary access to the MongoDB World developer conference in New York, on-demand access to MongoDB University to prepare for free MongoDB certification, and mentorship via an exclusive discussion group. This year at MongoDB World, our newest cohort of scholars got the opportunity to interact with company leadership at a luncheon and also got a chance to share their experience in a public panel discussion at the Community Café. Hear from some of the 2022 scholars, in their own words. Rebecca Hayes, System Analyst at Alliance for Safety and Justice I did an internal transition from managing Grants/Contracts to IT and just finished a data science certificate (Python, Unix/Linux, SQL) through my community college. My inspiration for pursuing STEM was wanting to understand how reality is represented in systems and how data science can be used to change the world. What was your most impactful experience as part of the Diversity Scholarship? Most impactful were the conversations I had with other attendees at the conference. I talked to people from all sectors who were extremely knowledgeable and passionate about shaping the future of databases. The opportunity to hear from MongoDB leaders and then understand how the vision behind the product was being implemented made me feel inspired for my future in STEM. How has the MongoDB World conference inspired you in your learning or your career path? MongoDB World inspired me to understand the real world applications of databases. I left knowing what's possible with a product like MongoDB and the limits of SQL and traditional databases. After the conference, I wrote this article on Medium reflecting on what I learned at the conference. What is your advice to colleagues pursuing STEM and/or on a similar path as you? Embrace what makes you unique. Just because things take time doesn't mean they won't happen. When learning programming and data science, think about how your work relates to the real world and share those thoughts with others. Seek out new perspectives, stay true to yourself, and keep an open mind. Delphine Nyaboke, Junior Software Engineer at Sendy I am passionate about energy in general. My final year project was on solar mini-grid design and interconnection. I have a mission of being at the intersection of energy and AI What inspired me to get into tech is the ability to solve societal problems without necessarily waiting for someone else to do it for you. This can be either in energy or by code. What was your most impactful experience as part of the Diversity Scholarship? My most impactful experience, apart from attending and listening in on the keynotes, was to attend the breakout sessions. They had lovely topics full of learnings and inspiration, including Engineering Culture at MongoDB; Be a Community Leader; Principles of Data Modeling for MongoDB; and Be Nice, But Not Too Nice just to mention but a few. How has the MongoDB World conference inspired you in your learning or your career path? MongoDB World has inspired me to keep on upskilling and being competitive in handling databases, which is a key skill in a backend engineer like myself. I will continue taking advantage of the MongoDB University courses and on-demand courses available thanks to the scholarship. What is your advice to colleagues pursuing STEM and/or on a similar path as you? STEM is a challenging yet fun field. If you’re tenacious enough, the rewards will trickle in soon enough. Get a community to be around, discuss what you’re going through together, be a mentor, get a mentor, and keep pushing forward. We need like-minded individuals in our society even in this fourth industrial revolution, and we are not leaving anyone behind. Video: Watch the panel in its entirety Raja Adil, Student at Cal Poly SLO Currently, I am a software engineer intern at Salesforce. I started self-teaching myself software development when I was a junior in high school during the COVID-19 pandemic, and from there I started doing projects and gaining as much technical experience as I could through internships. Before the pandemic I took my first computer science class, which was taught in C#. At first, I hated it as it looked complex. Slowly, I started to enjoy it more and more, and during the pandemic I started learning Python on my own. I feel blessed to have found my path early in my career. What was your most impactful experience as part of the Diversity Scholarship? My most impactful experience was the network and friends I made throughout the four days I was in New York for MongoDB World. I also learned a lot about the power of MongoDB, as opposed to relational databases, which I often use in my projects. How has the MongoDB World conference inspired you in your learning or your career path? The MongoDB World conference was amazing and has inspired me a ton in my learning path. I definitely want to learn even more about MongoDB as a database, and in terms of a career path, I would love to intern at MongoDB as a software engineer down the line. What is your advice to colleagues pursuing STEM and/or on a similar path as you? My advice would be to network as much as you can and simply make cool projects that others can use. Evans Asuboah, Stetson University I am an international student from Ghana. I was born and raised by my dad, who is a cocoa farmer, and my mum, who is a teacher. I got into tech miraculously, because my country's educational system matches majors to students according to their final high school grades. Initially, I wanted to do medicine, but I was offered computer science. I realized that computer science could actually be the tool to help my community and also use the knowledge to help my dad on the farm. What was your most impactful experience as part of the Diversity Scholarship? The breakout room sessions. As scholars, we had the chance to talk to MongoDB employees, and the knowledge and experiences changed my thoughts and increased my desire to persevere. I have learned never to stop learning and not to give up. How has the MongoDB World conference inspired you in your learning or your career path? Meeting these amazing people, connecting with the scholars, being at the workshops, and talking to the startups at the booths has made me realize the sky is the limit. I dare to dream and believe until I see the results. What is your advice to colleagues pursuing STEM and/or on a similar path as you? 1. Explore MongoDB; 2. You are the only one between you and your dream; 3. Take the initiative and meet people; 4. Never stop learning. Daniel Erbynn, Drexel University I love traveling and exploring new places. I am originally from Ghana, and I got the opportunity to participate in a summer program after high school called Project ISWEST, which introduced me to coding and computer science through building a pong game and building an Arduino circuit to program traffic lights. This made me excited about programming and the possibilities of solving problems in the tech space. What was your most impactful experience as part of the Diversity Scholarship? My most impactful experience was meeting with other students and professionals in the industry, learning from them, making lifelong connections, and getting the opportunity to learn about MongoDB through the MongoDB University courses. How has the MongoDB World conference inspired you in your learning or your career path? This conference has inspired me to learn more about MongoDB and seek more knowledge about cloud technology. What is your advice to colleagues pursuing STEM and/or on a similar path as you? Don’t be afraid to reach out to people you want to learn from, and create projects you are passionate about. Build your skills with MongoDB University's free courses and certifications . Join our developer community to stay up-to-date with the latest information and announcements.