Get to MongoDB London on Us: Show Us Your MUG
October 22, 2014 | Updated: May 22, 2015
MongoDB is coming to London and we want to see you there. Tickets are going fast, but we’re putting two free day-passes up for grabs - all you need to do is share your very best mug shot online!
By now, most of you should have a MongoDB mug. If not, where have you been the past six years? We've shipped mugs around the world, but we rarely get to see the mug owners. To enter the draw to win two free tickets to MongoDB London, simply snap a photo (alone or with others) with your MongoDB mug(s) and tag it on Twitter with the hashtag #mongodbmug.
Good luck, and we look forward to seeing you in November at MongoDB London.
What to Expect at MongoDB London
MongoDB London will not only reveal the why behind using MongoDB in a range of use cases; crucially, a varied roster of presenters will also explain the how. MongoDB co-founder and CTO Eliot Horowitz will open the day with a roadmap of what's to come in MongoDB 2.8.
At MongoDB London, you will see how you can build applications never before possible in your organization. Gary Collier, CTO at AHL Man Group, will discuss how his team created a single platform for all of its financial data with MongoDB. That project resulted in a 25x improvement in tick throughput and slashed the required disk storage by 40 percent. Additionally, James Stewart will describe the process of digital transformation in the UK Government, and Robert Hill of Capgemini will showcase how to build a real-time view of your business that integrates all of your siloed data, also called the Single View of the Customer.
Alongside a number of user presentations, several MongoDB experts will give talks on everything from the Internet of Things to MongoDB best practices in Financial Services. Finally, you will get the chance to sit down with MongoDB experts in our "Ask The Experts" hall and get answers to your toughest questions about MongoDB.
Don't delay. Snap your MUG and send it over.
Sharding Pitfalls Part III: Chunk Balancing and Collection Limits
In Parts 1 and 2 we covered a number of common issues people run into when managing a sharded MongoDB cluster. In this final post of the series we will cover a subtle but important distinction in terms of balancing a sharded cluster, as well as an interesting limitation that can be worked around relatively easily but is nonetheless surprising when it comes up.

6. Chunk balancing != data balancing != traffic balancing

The balancer in a sharded cluster cares about just one thing: are the chunks for a given collection evenly balanced across all shards? If they are not, it will take steps to rectify that imbalance. This all sounds perfectly logical, and even with extra complexity like tagging involved, the logic is pretty straightforward.

If we assume that all chunks are equal, then we can rest assured that our data is evenly balanced across all the shards in our cluster, and rest easy at night. Although that is sometimes, perhaps even frequently, the case, it is not always true: chunks are not always equal. There can be massive "jumbo" chunks that exceed the maximum chunk size (64MiB), completely empty chunks, and everything in between.

Let's use an example from our first pitfall: the monotonically increasing shard key. For our example, we have picked just such a key to shard on (date), and up until this point we have had just one shard and had not sharded the collection. We are about to add a second shard to our cluster, so we enable sharding on the collection and do the necessary admin work to add the new shard to the cluster. Once the collection is enabled for sharding, the first shard contains all the newly minted chunks. Let's represent them in a simplified table of 10 chunks. This is not representative of a real data set, but it will do for illustrative purposes:

Table 1 - Initial Chunk Layout

Now we add our second shard. The balancer will kick in and attempt to distribute the chunks evenly.
It will do this by moving the lowest-range chunks to the new shard until the chunk counts are identical. Once it has finished balancing, our table looks like this:

Table 2 - Balanced Chunk Layout

That looks pretty good at the moment, but let's imagine that more recent chunks are likely to see more activity (updates, say) than older chunks. Adding traffic share estimates for each chunk shows that shard1 is taking far more traffic (72%) than shard2 (28%), despite the chunks seeming balanced overall based on their approximate size. Hence, chunk balancing is not equal to traffic balancing.

Using that same example, let's add another wrinkle: periodic deletion of old data. Every 3 months we run a job to delete any data older than 12 months. Let's look at the impact of that on our table after we run the job for the first time (assuming the first run happens on July 1st, 2015):

Table 3 - Post-Delete Chunk Layout

The distribution of data is now completely skewed toward shard1; shard2 is in fact empty! However, the balancer is completely unaware of this imbalance: the chunk count has remained the same the entire time, and as far as it is concerned the system is in a steady state. With no data on shard2, the traffic imbalance we saw above becomes even worse, and we have essentially negated the benefit of having a second shard for this collection.

Possible Mitigation Strategies

- If data and traffic balance are important, select an appropriate shard key
- Move chunks manually to address the imbalances: swap "hot" chunks for "cool" chunks, empty chunks for larger chunks

7. Waiting too long to shard a collection (collection too large)

This is not very common, but when it falls on your shoulders, it can be quite challenging to solve. There is a maximum data size for a collection when it is initially split, which is a function of the chunk size and the data size, as noted on the limits page. If your collection contains less than 256GiB of data, then there will be no issue.
If the collection size exceeds 256GiB but is less than 400GiB, then MongoDB may be able to do an initial split without any special measures being taken. Otherwise, with larger initial data sizes and the default settings, the initial split will fail. It is worth noting that once split, the collection may grow as needed and without any real limitation, as long as you can continue to add shards as the data size grows.

Possible Mitigation Strategies

Since the limit is dictated by the chunk size and the data size, and assuming there is not much to be done about the data size, the remaining variable is the chunk size. This is adjustable (the default is 64MiB) and can be raised in order to let a large collection split initially, then reduced once that has been completed. The required chunk size increase will depend on the actual data size. However, this is relatively easy to work out: simply divide your data size by 256GiB and then multiply that figure by 64MiB (and round up if it is not a nice even number). As an example, let's consider a 4TiB collection:

4TiB divided by 256GiB = 16
64MiB x 16 = 1024MiB

Hence, set the max chunk size to 1024MiB, then perform the initial sharding of the collection, and finally reduce the chunk size back to 64MiB using the same procedure.

Thanks for reading through the Sharding Pitfalls series! If you want to learn more about managing MongoDB deployments at scale, sign up for my online education course, MongoDB Advanced Deployment and Operations.

Planning for scale? No problem: MongoDB is here to help. Get a preview of what it's like to work with MongoDB's Technical Services Team. Give us some details on your deployment and we can set you up with an expert who can provide detailed guidance on all aspects of scaling with MongoDB, based on our experience with hundreds of deployments.
Security in Government Solutions: Why Secure By Default is Essential
Data security in government agencies is table stakes at this point. Everyone knows it's essential, both for compliance and for data protection. However, most government agencies are working with solutions that require frequent security patches or bolt-on tools to protect their data.

Today, the federal government is pushing its agencies to modernize their solutions and improve their security posture. For example, the Department of Homeland Security (DHS) and the Cybersecurity and Infrastructure Security Agency (CISA) recently issued a technical rule to modernize the Protected Critical Infrastructure Information (PCII) Program – a program that provides legal protections for cyber and physical infrastructure information submitted to DHS.

"The PCII Program is essential to CISA's ability to gather information about risks facing critical infrastructure," said Dr. David Mussington, Executive Assistant Director for Infrastructure Security. "This technical rule modernizes and clarifies important aspects of the Program, making it easier for our partners to share information with DHS. These revisions further demonstrate our commitment to ensuring that sensitive, proprietary information shared with CISA remains secure and protected."

So how can government agencies modernize their data infrastructure and find solutions that not only protect data but also power innovation? Let's look at a few different strategies.

1. Why secure by default is key

Secure by default means that a piece of software ships with its default security settings configured for the highest possible security out of the box. CISA Director Jen Easterly has addressed how using solutions that are secure by default is critical for any organization. "We have to have [multi-factor authentication] by default. We can't charge extra for security logging and [single sign-on]," Easterly said.
"We need to ensure that we're coming together to really protect the technology ecosystem instead of putting the burden on those least able to defend themselves."

"The American people have accepted the fact that they're constantly going to have to update their software," she said. "The burden is placed on you as the user and that's what we have to collectively stop."

Easterly is right. Secure-by-design solutions are vital to the success of data protection. The expectation should always be that solutions have built-in, not bolt-on, security features.

One approach that's gaining traction in both the public and private sectors is the zero trust environment. In a zero trust environment, the perimeter is assumed to have been breached. There are no trusted users, and no user or device gains trust simply because of its physical or network location. Every user, device, and connection must be continually verified and audited. As the creator of zero trust, security expert John Kindervag, summed it up: "Never trust, always verify." For government agencies, that means the underlying database must be secure by default, and it must limit users' opportunities to make it less secure.

2. Security isn't just on-prem anymore; cloud is secure, too

Cloud can be a scary word for public sector organizations. Trusting sensitive data to the cloud might feel risky for those who handle some of the country's most sensitive information. But cloud providers are stepping up to meet the security needs of government agencies, and there is no need to fear the cloud anymore. Government agencies and other public sector organizations nationwide are navigating cloud modernization through the lens of the increased cybersecurity requirements outlined in the 2021 Executive Order on Improving the Nation's Cybersecurity:
"The Federal Government must adopt security best practices; advance toward Zero Trust Architecture; accelerate movement to secure cloud services, including Software as a Service (SaaS), Infrastructure as a Service (IaaS), and Platform as a Service (PaaS); centralize and streamline access to cybersecurity data to drive analytics for identifying and managing cybersecurity risks; and invest in both technology and personnel to match these modernization goals."

In addition, the major cloud providers offer well-established, purpose-built options for government users. AWS GovCloud, for example, is more than a decade old and was "the first cloud provider to build cloud infrastructure specifically designed to meet U.S. government security and compliance needs." This push by the federal government toward cloud modernization and increased cybersecurity will be a catalyst in the coming years for rapid cloud adoption and greater dependence on cloud solutions designed specifically for government users.

3. Security features purpose-built for government needs are essential

Government agencies are held to a higher standard than organizations in the private sector. From data used in sometimes life-or-death missions to data for students building their futures in educational institutions (and everything in between), security has real-world consequences. Today, security is non-negotiable, and as we explored above, it's especially crucial that public sector entities have built-in security measures to keep data protected. So, what built-in features should you look for?

Network isolation and access

It's critical that your data and underlying systems are fully isolated from other organizations using the same cloud provider. Database resources should be associated with a user group contained in its own Virtual Private Cloud (VPC), and access should be granted via IP access lists, VPC peering, or private endpoints.

Encryption in flight, at rest, and in use

Encryption should be the standard.
For example, when using MongoDB Atlas, all network traffic is encrypted using Transport Layer Security (TLS). Encryption for data at rest is automated using encrypted storage volumes. Customers can use field-level encryption for sensitive workloads, which lets you encrypt data in your application before sending it over the network to MongoDB clusters. Users can also bring their own encryption keys for an additional level of control.

Granular database auditing

Granular database auditing allows administrators to answer detailed questions about system activity by tracking all commands issued against the database. This ensures you always know who has access to what data and how they're using it.

Multi-factor authentication

User credentials should always be stored using industry-standard, audited one-way hashing mechanisms, with multi-factor authentication options including SMS, voice call, a multi-factor app, or a multi-factor device, ensuring only approved users have access to your data.

MongoDB Atlas for Government: Purpose-built for public sector

As we've discussed, solutions that are purpose-built with security baked in are ideal for government agencies, and choosing the right one is the best way to keep sensitive data protected. MongoDB Atlas for Government on AWS GovCloud recently secured its FedRAMP Moderate authorization thanks to these security measures built into the solution. FedRAMP is a government-wide program that provides a standardized approach to security assessment, authorization, and continuous monitoring for cloud products and services. To ensure the utmost levels of security, Atlas for Government is an independent, dedicated environment for the U.S. public sector, as well as for ISVs looking to build U.S. public sector offerings.

Public sector organizations carry a heavy burden when it comes to keeping data protected.
However, with the right data platform underpinning modern applications – a platform with built-in security features – progress doesn't have to mean compromising on security.

Want to learn more about data protection best practices for public sector organizations? Attend our upcoming webinar on April 12 for deeper insight.