Startups are not for the faint of heart. A friend of mine once told me about the challenges he faced in starting his company. To come up with the capital to manufacture his first batch of units, he mortgaged everything he had, slept under his office desk for four months, and tirelessly worked 16-hour days. Luckily, he was able to get his first batch of units to customers on time, and the market reception was overwhelmingly positive.
Fast forward nine years and my friend’s company is now publicly traded, with over 200 employees. The company recently shipped its 15 millionth device.
Hard work and success stories aside, the first step for entrepreneurs to lift their startup off the ground is getting visibility, generating buzz, and showcasing their product in front of the right audience.
At the MongoDB World Startup Showcase, we’re excited to give entrepreneurs a chance to demo their product in front of venture capitalists, the media, and potential employees. Participants will be invited to give their elevator pitch and will be able to showcase their products.
- Elevator pitch in the Giant Ideas Theatre - attract investors, employees, and media
- Two complimentary tickets to MongoDB World 2016*
- Inclusion in MongoDB World marketing and press campaigns
- Promotion in MongoDB social media campaigns
- One year complimentary subscription to Cloud Manager, the easiest way to run MongoDB**
*Hotel and airfare not included
**Does not include backup
Participating companies must be building a product or service, employ fewer than 100 people, and actively use MongoDB.
- Submit an online application.
- In your responses, make sure to address the following questions:
- Company Description: Give us some information about your company, including company name, product, website URL, videos, etc.
- Target Market: Describe the target market and audience for your product or service. Preferred markets are Data Analytics, Mobile, Machine Learning, and IoT.
- MongoDB Usage: Describe how your company uses MongoDB.
- Team Background: Include some background information about your team.
Applications must be submitted by 5:00 PM (PST) on March 25, 2016.
We are looking forward to hearing from you!
MongoDB Debuts in Gartner’s Magic Quadrant for Data Warehouse & Data Management Solutions for Analytics
Why, you may ask, is MongoDB profiled in a research report dedicated to evaluating key trends and vendors in the data warehousing market? After all, MongoDB is designed to serve operational use cases, including Internet of Things applications, customer data management, catalog and content management, mobile services, and more. In fact, Gartner placed MongoDB as a Leader in its most recent Magic Quadrant for Operational Database Management Systems in recognition of its completeness of vision and ability to execute against requirements in the operational database market.

While MongoDB is not a data warehouse, we believe its inclusion within Gartner’s latest DW/DMSA Magic Quadrant [available at no cost to eligible Gartner clients] reflects the growing demand from business users to accelerate speed-to-insight and turn analytics into real-time action. Whether that is to detect fraud during transaction processing, present relevant recommendations to shoppers as they browse an eCommerce store, or alert operators to the impending failure of a critical piece of manufacturing equipment, creating fast, actionable insight is accomplished by embedding real-time analytics into operational processes. Gartner calls this trend Hybrid Transactional/Analytical Processing (HTAP), and it is this specific capability, highlighted by users surveyed in Gartner’s research, that has driven MongoDB’s inclusion in the Magic Quadrant.

Not only is this placement a first for MongoDB, it is also a first for Gartner: no other open source, non-relational database has ever been included in the DW/DMSA Magic Quadrant.

Augmenting the Data Warehouse: Unlocking Real-Time Analytics

Using traditional data warehousing platforms, the flow of data – starting with its acquisition from source systems through to transformation, consolidation, analysis, and reporting – follows a well-defined sequential process, as illustrated in Figure 1.
Figure 1: Data Flow in Traditional Analytics Processes

Operational data from multiple source systems is integrated into a centralized Enterprise Data Warehouse (EDW) and local data marts using Extract Transform Load (ETL) processes. Reports and visualizations of the data are then generated by BI tools. This workflow is predicated on a number of assumptions:

- Predictable frequency. Data is extracted from source systems at regular intervals – typically measured in days, months, and quarters.
- Static sources. Data is sourced from controlled, internal systems supporting established and well-defined back-office processes.
- Fixed models. Data structures are known and modeled in advance of analysis. This enables the development of a single schema to accommodate data from all of the source systems, but adds significant time to the upfront design.
- Defined queries. Questions to be asked of the data (i.e., the analytical queries) are pre-defined. If not all of the query requirements are known upfront, or requirements change, then the schema is modified to accommodate changes.
- Slow-changing requirements. Rigorous change control is enforced before the introduction of new data sources or reporting requirements.
- Limited users. The consumers of BI reports and analytics are typically business managers and senior executives.

Technology Foundations for Real-Time Analytics

This workflow remains incredibly valuable, enabling businesses to run deep, historical analysis to monitor performance and inform business strategy. But it presents a significant “impedance mismatch” with the requirements of real-time analytics:

- Eliminate latency. The frequency of data acquisition, processing, and analysis must increase from days to seconds or less. Source data needs to be analyzed as it is generated by operational applications in order to provide the speed-to-insight demanded by the business. Moving data through an ETL pipeline to the data warehouse will not work for real-time use cases.
- Uncontrolled sources. Organizations need to harness data that is generated outside of their own firewalls – from location data, to web clicks, to sensors, to social media. The analytics team has no control over these data sources.
- Dynamic structures. Much of this data is rapidly changing, with polymorphic, semi-structured, or unstructured formats that do not map neatly to the fixed schema of the traditional relational databases powering most data warehouses.
- Changing query patterns. It is impossible to predict the types of questions that will be asked of the data. Search, aggregations, geospatial analytics, and machine learning are just some of the tools now available to analysts as they explore new data sets and discover previously undetected trends.
- “Big” volume. Data arrives faster, and in quantities that overwhelm traditional data management technologies. Handling it means scaling out databases and analytics across commodity hardware, rather than the scale-up approach typical of most data warehouses.
- Wide consumption. Analytics now extends well beyond the management suite. Permeating every part of the organization, analytics must be accessible to staff on the shop floor, and consumed by operational applications to control real-time behavior.

MongoDB augments the data warehouse by addressing the challenges above, enabling users to run analytics in real time directly against their data. Rich data structures with complex attributes – text, geospatial data, media, arrays, embedded elements, and other complex types – can be easily mapped to MongoDB’s JSON-based document data model. A dynamic schema means that each document (record) does not need to have the same set of fields; users can adapt the structure of documents just by adding new fields or deleting existing ones, making it very simple to extend and evolve applications by adding new attributes for analysis and reporting.
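To make the dynamic schema idea concrete, here is a minimal Python sketch. The in-memory `collection` list and the `find` helper are hypothetical stand-ins for illustration only – they are not the MongoDB driver API.

```python
# Toy illustration of a dynamic schema: documents in the same collection
# can carry different fields, and a newly added attribute is queryable
# without any schema migration. (Plain Python stand-in, not pymongo.)

collection = [
    {"_id": 1, "sku": "A100", "price": 9.99},
    {"_id": 2, "sku": "B200", "price": 24.50,
     "geo": {"lat": 40.7, "lon": -74.0}},     # extra geospatial field
    {"_id": 3, "sku": "C300", "price": 5.00,
     "tags": ["sale", "clearance"]},          # extra array field
]

def find(coll, **criteria):
    """Return documents whose fields equal all the given criteria."""
    return [doc for doc in coll
            if all(doc.get(k) == v for k, v in criteria.items())]

# "tags" exists on only one document, yet it can be queried immediately --
# no upfront schema change was required before analyzing the new attribute.
on_sale = [d for d in collection if "sale" in d.get("tags", [])]
```

In a real deployment the same flexibility applies: a document with a new field can be inserted and indexed alongside older documents that lack it.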
An expressive query language and secondary indexes allow fast and rich access to data, enabling complex analytics and search to be performed in place, without having to move the data to dedicated analytics infrastructure. Auto-sharding allows MongoDB to partition and distribute large data sets across clusters of commodity servers in the data center or in the cloud.

The latest MongoDB 3.2 release builds on these capabilities with advanced feature sets that enhance analytics:

- The MongoDB Connector for BI allows analysts, data scientists, and business users to seamlessly explore and visualize multi-structured data stored in MongoDB with industry-standard SQL-based BI and analytics platforms such as Tableau, Business Objects, and more.
- MongoDB Compass presents a simple-to-use, sophisticated GUI that allows any user to visualize and explore data with ad-hoc queries in just a few clicks – all with zero knowledge of the MongoDB query language.
- For data governance, document validation allows you to enforce checks on document structure, data types, data ranges, and the presence of mandatory fields.
- Dynamic lookup, new math operators, and enhanced search allow richer analytics to be run against live, operational data.

Putting Real-Time Analytics to Work

Some of the world’s largest and most innovative organizations are putting real-time analytics to work, creating operational efficiencies and building competitive advantage:

- Bosch uses MongoDB at the heart of its IoT Suite. Ingesting real-time telemetry data from millions of vehicles enables auto manufacturers to deliver predictive maintenance schedules to their customers and improve product design.
- The City of Chicago uses MongoDB to pull together millions of data points across its most crucial departments, providing real-time data analysis to city managers so they can better predict and allocate resources, respond quickly to emergencies, regulate traffic flow, and uncover trends that would have otherwise been invisible.
- Media company BuzzFeed uses MongoDB to pinpoint when content is viewed, where it’s shared, and how it’s being consumed by its 400 million monthly website visitors. The system enables BuzzFeed’s employees to analyze, track, and display these metrics to writers and editors.
- The website of OTTO, Germany’s largest online retailer, generates some 10,000 events per second. Every click and hover of every mouse is stored in MongoDB, and real-time data analytics is used to provide unique, personalized web experiences to individual visitors.

Hadoop and Spark: Building the Complete Data Analytics Platform

Of course, it’s not just real-time analytics that is driving innovation in the data warehouse world – Apache Hadoop has emerged as a key part of the data management landscape. Some assumed Hadoop would replace the enterprise data warehouse, but that prediction was wrong. In fact, Hadoop is augmenting the data warehouse, in many cases off-loading data and specific data transformation workloads from existing data warehouses to less expensive commodity hardware in scale-out environments.

Many organizations are harnessing Hadoop and MongoDB together using the MongoDB Connector for Hadoop, which provides the ability to use MongoDB as an input source and an output destination for MapReduce, Spark, Hive, and Pig jobs. With this combination, users can create complete analytics and data management platforms:

- MongoDB powers the online, real-time operational application, serving business processes and end users.
- Hadoop consumes data from MongoDB, blending it with data from other operational systems to fuel sophisticated analytics and machine learning. Results are loaded back to MongoDB to serve smarter operational processes.

For example, eBay handles user data and metadata management for its product catalog in MongoDB, and uses Hadoop for user analysis to provide personalized search and recommendations.
Orbitz uses MongoDB for the management of hotel data and pricing, with Hadoop powering hotel segmentation to support building search facets. Pearson manages student identity and access control, along with content management of course materials, in MongoDB, and uses Hadoop for student analytics to create adaptive learning programs.

The Rise of Spark

No analytics discussion is complete without reference to Apache Spark – it has become one of the fastest-growing Apache Software Foundation projects. With its memory-oriented architecture, flexible processing systems, and easy-to-use APIs, Apache Spark has emerged as a leading framework for real-time analytics, supporting streaming, machine learning, SQL processing, and more. Unlike Hadoop, which has to move all data into HDFS, Spark can work directly against data stored in any database, file system, or message queue. The MongoDB Connector for Hadoop provides a Spark plug-in, allowing Spark jobs to use MongoDB as both a source and a sink. A range of community-developed connectors are also available for MongoDB and Spark integration.

Figure 2: Modernized data architecture: MongoDB, Spark, and Hadoop

Many organizations are already combining MongoDB and Spark to build new analytics-rich applications:

- A global manufacturing company has built a pilot project to estimate warranty returns by analyzing material samples from production lines. The collected data enables it to build predictive failure models using Spark Machine Learning and MongoDB.
- A video sharing website is using Spark with MongoDB to place relevant advertisements in front of users as they browse, view, and share videos.
- A multinational banking group operating in 31 countries with 51 million clients implemented a unified real-time monitoring application running Apache Spark and MongoDB.
The bank wanted to ensure a high quality of service across its online channels, and needed to continuously monitor client activity to check service response times and identify potential issues. All log data is collected in Apache Flume before being persisted to MongoDB, where Spark jobs then analyze that data to power real-time visualizations and alerts of system health. MongoDB was selected for its high scalability, a dynamic schema that can ingest and manage quickly changing log data, and a rich array of secondary indexes that allow Spark jobs to efficiently filter and access only the slices of data needed to drive the analytics. This approach results in lower latency and higher analytical throughput.

Putting it all Together

If anyone ever tells you the data warehouse market is slow and boring, dominated by just a few mega-vendors, tell them they are wrong. With the adoption of modern technologies such as MongoDB, Hadoop, and Spark, organizations are creating new classes of applications and analytics that offer the promise of unlocking new efficiencies, creating new business models, and out-pacing competitors. And with MongoDB serving both operational and analytical use cases, you can build those applications faster, with lower cost, complexity, and risk.

To learn more about real-time analytics with MongoDB, Spark, and Hadoop, read our white paper: Turning Analytics into Real-Time Action.

References:

Gartner, Magic Quadrant for Operational Database Management Systems, Donald Feinberg, Merv Adrian, Nick Heudecker, Adam M. Ronthal, and Terilyn Palanca, October 12, 2015.

Gartner, Magic Quadrant for Data Warehouse and Data Management Solutions for Analytics, Roxane Edjlali and Mark A. Beyer, February 25, 2016.

Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation.
Gartner research publications consist of the opinions of Gartner's research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.
At-Rest Encryption in MongoDB 3.2: Features and Performance
Introduction

MongoDB 3.2 introduces a new option for at-rest data encryption. In this post we take a closer look at the forces driving the need for increased encryption, the MongoDB features for encrypting your data, and the performance characteristics of the new Encrypted Storage Engine.

Data security is top of mind for many executives due to increased attacks, as well as a series of data breaches in recent years that have negatively impacted several high-profile brands. For example, in 2015 a major health insurer was the victim of a massive data breach in which criminals gained access to the Social Security numbers of more than 80 million people, resulting in an estimated cost of $100M. In the end, one of the critical vulnerabilities was that the health insurer did not encrypt sensitive patient data stored at rest.

Data encryption is a key part of a comprehensive strategy to protect sensitive data. However, encrypting and decrypting data is potentially very resource intensive, so it is important to understand the performance characteristics of your encryption technology in order to conduct capacity planning accurately.

MongoDB 3.2: Delivering Native Encryption At-Rest

MongoDB 3.2 provides a comprehensive encryption solution that protects your data, both in-flight and at-rest. For encryption in-flight, MongoDB uses SSL/TLS, which ensures secure communication between your database and client, as well as intra-cluster traffic between nodes. Learn more about MongoDB and SSL/TLS.

With version 3.2, MongoDB also includes a fully integrated encryption-at-rest solution that reduces cost and performance overhead. Encryption-at-rest is part of MongoDB Enterprise Advanced only, but is freely available for development and evaluation. We will take a closer look at this new option later in the post. Before 3.2, the primary methods of providing encryption-at-rest were 3rd-party applications that encrypt files at the application, file system, or disk level.
These methods work well with MongoDB, but can add extra cost, complexity, and overhead. Additionally, disk and file system encryption might not protect against all situations. While disk-level encryption protects against someone taking the physical drive from the machine, it does not protect against someone who has physical access to the machine and can read the file system. Similarly, file system encryption prevents someone from reading files directly from the file system, but does not preclude someone from gaining unauthorized access through the application or database layer. Database encryption mitigates these problems by adding an extra layer of security: even an administrator with access to the file system must first authenticate to the database before decrypting the data files.

MongoDB’s Encrypted Storage Engine supports a variety of encryption algorithms from the OpenSSL library. AES-256 in CBC mode is the default, while other options include GCM mode, as well as FIPS mode for FIPS 140-2 compliance. Encryption is performed at the page level to provide optimal performance: instead of encrypting/decrypting the entire file or database for each change, only the modified pages need to be encrypted or decrypted.

Additionally, the Encrypted Storage Engine provides safe and secure management of the encryption keys. Each encrypted node contains an internal database key that is used to encrypt/decrypt the data files. The database key is wrapped with an external master key, which must be given to the node for it to initialize. MongoDB uses operating system protection mechanisms, such as VirtualLock and mlock, that lock the process’s virtual memory space into memory, ensuring that keys are never written or paged to disk in unencrypted form.

Evaluating Performance

Encrypting and decrypting data requires additional resources, and administrators will want to understand the performance impact to adjust capacity planning accordingly.
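Before looking at the numbers, here is a rough sketch of the page-level approach described above: each fixed-size page is encrypted independently, so updating one page means re-encrypting only that page. The hash-derived keystream below is a hypothetical toy stand-in for AES-256-CBC (Python's standard library has no AES), not the actual storage engine implementation.

```python
import hashlib

PAGE_SIZE = 16  # toy page size; real storage engines use KB-sized pages

def _keystream(key, page_no, length):
    # Derive a per-page keystream from the master key and page number.
    # Toy construction for illustration only -- NOT a real cipher.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(
            key + page_no.to_bytes(8, "big") + counter.to_bytes(8, "big")
        ).digest()
        counter += 1
    return out[:length]

def encrypt_page(key, page_no, data):
    ks = _keystream(key, page_no, len(data))
    return bytes(a ^ b for a, b in zip(data, ks))

decrypt_page = encrypt_page  # XOR with the keystream is its own inverse

key = b"external-master-key"
pages = [b"0123456789abcdef", b"fedcba9876543210"]
encrypted = [encrypt_page(key, i, p) for i, p in enumerate(pages)]

# A change to page 1 touches only that page's ciphertext; page 0's
# ciphertext is untouched -- the whole file is never re-encrypted.
pages[1] = b"UPDATED_PAGE_0001"[:PAGE_SIZE]
encrypted[1] = encrypt_page(key, 1, pages[1])
```

The per-page scope is what keeps the overhead proportional to the amount of data actually modified, which is reflected in the benchmark results below.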
In our Encrypted Storage Engine benchmarking tests, we saw an average throughput overhead between 10% and 20%. Let’s take a closer look at some benchmark data to show the results for Insert-Only, Read-Only, and 50%-Read/50%-Insert workloads.

For our benchmark, we used Intel Xeon X5675 CPUs, which support the AES-NI instruction set, and ran the CPUs at high load (100%). We evaluated four configurations: “Working Set Fits In Memory”, “Working Set Exceeds Memory”, “Encrypted”, and “Unencrypted”. The “working set” refers to the amount of data and indexes actively used by your system.

Let’s first look at an Insert-Only workload. With a high CPU load, we see an encryption overhead of around 16%. Next, the Read-Only workload: we ran the benchmark under two scenarios, “Working Set Fits In Memory” and “Working Set Exceeds Memory”, and the decryption overhead ranges between 5–20%. Lastly, for the 50%-Read/50%-Insert workload, the encryption overhead ranges between 12–20%.

In addition to throughput, latency is also a critical component of encryption overhead. In our benchmark, average latency overheads ranged between 6% and 30%. Though the average latency overhead was slightly higher than the throughput overhead, latencies were still very low – all under 1 ms.
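The overhead percentages quoted here are simple relative increases over the unencrypted baseline. As a sanity check, they can be recomputed from the raw average latencies reported in the benchmark (values in microseconds):

```python
def overhead_pct(unencrypted, encrypted):
    """Latency increase from encryption, as a percentage of the baseline."""
    return round((encrypted - unencrypted) / unencrypted * 100, 1)

# Read-only workload, working set fits in memory: 230.5us -> 245.0us
ro_fits = overhead_pct(230.5, 245.0)        # 6.3% overhead
# 50% insert / 50% read, working set exceeds memory: 722.3us -> 936.5us
mixed_exceeds = overhead_pct(722.3, 936.5)  # 29.7% overhead
```

The same formula reproduces the other rows of the latency table, matching the stated 6–30% range.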
| Workload | Avg Latency (us), Unencrypted | Avg Latency (us), Encrypted | % Overhead |
|---|---|---|---|
| Insert Only | 32.4 | 40.9 | 26.5% |
| Read Only, Working Set Fits In Memory | 230.5 | 245.0 | 6.3% |
| Read Only, Working Set Exceeds Memory | 447.0 | 565.8 | 26.6% |
| 50% Insert/50% Read, Working Set Fits In Memory | 276.1 | 317.4 | 15.0% |
| 50% Insert/50% Read, Working Set Exceeds Memory | 722.3 | 936.5 | 29.7% |

MongoDB Atlas Encryption At Rest

MongoDB Atlas is a database as a service that provides all the features of the database without the heavy lifting of operational tasks. Developers no longer need to worry about provisioning, configuration, patching, upgrades, backups, and failure recovery. Atlas offers elastic scalability, either by scaling up on a range of instance sizes or scaling out with automatic sharding, all with no application downtime.

MongoDB Atlas provides encryption of data in-flight over the network and at rest on disk. Data-at-rest can optionally be protected using encrypted data volumes, which secure your data without the need for you to build, maintain, and secure your own key management infrastructure.

Summary

In this post, we looked at a few workloads to determine the impact of encryption with MongoDB's new Encrypted Storage Engine. The results demonstrate that the Encrypted Storage Engine provides a secure way to encrypt your data-at-rest while maintaining exceptional performance. With the Encrypted Storage Engine and diligent capacity planning, you shouldn't have to trade off high performance against strong security when encrypting data-at-rest. For users interested in a database as a service, MongoDB Atlas provides encrypted data volumes to ensure your data at rest is secure.

Environment

These tests were conducted on bare metal servers.
Each server had the following specification:

- CPU: 3.06 GHz Intel Xeon Westmere (X5675, hex-core)
- RAM: 6x 16GB Kingston DDR3 2Rx4
- OS: Ubuntu 14.04 64-bit
- Network card: SuperMicro AOC-STGN-i2S
- Motherboard: SuperMicro X8DTN+_R2
- Document size: 1KB
- Workload: YCSB
- Version: MongoDB 3.2

Learn More

Learn about encryption and all of the security features available for MongoDB by reading our guide: MongoDB Security Architecture Guide

Additional Resources

Try MongoDB’s new Encrypted Storage Engine: users can try the Encrypted Storage Engine free for unlimited development and evaluation. Read our installing MongoDB Enterprise 3.2 documentation.

About the Author - Jason Ma

Jason is a Principal Product Marketing Manager based in Palo Alto, and has extensive experience in technology hardware and software. He previously worked for SanDisk in Corporate Strategy doing M&A and investments, and as a Product Manager on the InfiniFlash All-Flash JBOF. Before SanDisk, he worked as a hardware engineer at Intel and Boeing. Jason has a BSEE from UC San Diego, an MSEE from the University of Southern California, and an MBA from UC Berkeley.