Analyzing Your MongoDB Data with Analytica

MongoDB
January 29, 2013

This is a guest post by Nosh Petigara, president of Analytica

Analytica is an analytics platform that makes it easy to analyze and report on data like user profiles, event logs, product catalogs, user-generated content, financial assets, or anything else you may have stored in you MongoDB database.

Analytica is built from the ground up for rich document type data and uses a JSON-like representation throughout its architecture. You use Analytica Script a declarative expression language tailored for JSON data, to tell Analytica how perform calculations, filter, group, and transform your documents into the results you want. You can interact with Analytica using a plug-in to Microsoft Excel or a command line shell. Analytica can also be used through its REST API. Browser-based and mobile interfaces are coming soon.

To show some of Analytica’s capabilities, we downloaded all of the tweets sent by the @mongodb twitter account over the last 4 years into a MongoDB database using the Twitter API. Using Analytica, we then developed a dashboard which shows @mongodb’s entire twitter history:

Assuming you had a database called ‘twitter’ and a collection called 'tweets’, which contained the JSON documents for @mongodb’s tweets from the Twitter API- here is how you’d use Analytica to calculate the most commonly used hashtags with 3 commands:

SET twitter.byHashtag = group(tweets.by(entities.hashtags.text)) //group our tweets by hashtag and store them in a calculated (virtual) collection called 'byHashtag'
SET twitter.byHashtag.count = count(tweets) // counts up the number of tweets for each hashtags in our virtual collection
SET twitter.tophashtags = orderdesc(byHashtag.by(count)) //sort the results in descending order

Analytica uses dot notion to specify what collections, documents, or properties to operate on. Each SET command in Analytica results in a computation or the transformation of a set of documents, the results of which are stored in what we call calculated properties or calculated collections. These are intermediate results, stored in Analytica (at the database, collection, or document level - depending on how you specify them), which can be used in subsequent computations. Finally the command 'twitter.tophashtags.(text, count)’ retrieves the text of the hashtags along with the count of how many tweets use that hashtag.

Since we wanted to graph out our results, we used Analytica’s plug in for Excel to enter a series of Analytica script expressions. In addition to calculating the most tweeted hashtags, we also looked at the frequency of tweets per month from the @mongodb account, analyzed the content of @mongodb’s tweets to see how hashtags and URLs were being used, and computed a few other metrics. With this quick analysis, we saw that @mongodb’s tweeting patterns have changed over time (a lot more tweets recently!), figured out that over 80% of @mongodb’s tweets are retweeted at least once, and learnt (perhaps not surprisingly!) that the most popular tweets are about new releases. We graphed out the results and generated the HTML page to share with the MongoDB community.

We’re holding a webinar with 10gen on February 12 so that you can learn more about Analytica and ask questions. In the webinar, we’ll go through how you can use Analytica on your own data to produce in-depth analyses, dashboards and reports and become a data whiz! In the meantime you can learn more and download the beta version of Analytica. You’ll be able to run Analytica against your own datasets or in an example we’ve put together on data from StackOverflow.

If you are looking for other datasets to try, I’d recommend checking out Twitter’s API, Foursquare’s API, the NYTimes API, or Sunlight Labs API. Each of these has JSON, CSV or XML data that you can easily import into MongoDB to start analyzing with Analytica or MongoDB’s query language and aggregation framework. We’ll also post a step-by-step guide soon, which will describe how you can run an analysis on your own twitter history. We’d love to hear from you - you can email with questions or feedback.

← Previous

Announcing New MMS Alerts

From the beginning, 10gen has focused MongoDB on four key areas: flexibility, power, speed, and ease of use. In terms of ease of use, we are very much interested not only in improving usability for developers, but also for IT operations. Early in MongoDB’s development, we released MMS ( MongoDB Monitoring Service ) to offer users visibility into the right metrics to manage and optimize applications during development and in production. To continue to improve our users’ ability to manage MongoDB in production, there is a powerful new type of alert type available in MMS - Metric Min/Max Value. The Metric Min/Max Value alerts provides alerting on a variety of host types and corresponding MongoDB performance metrics. For example, you can say ...Alert me if any of my secondaries experience a repl lag of greater than <x> mins.â€œ This alert type is now flexible enough to provide alerts for the most important performance boundaries that's specific to your application's performance profile. What does each new alert mean? You can alert against a number of different host types listed below. With a replication-enabled host type selected, you'll also have the option to select a specific replica set for this alert, or to have the alert apply to all replica sets. There's a wide variety of performance metrics to choose from for your alerts. To enumerate all the metric types would be intense. The options are essentially straight from existing MMS chart types, and hopefully they're pretty self explanatory. Example of available metrics when ...Secondariesâ€œ host type is selected: Limitations: no hardware stats For now, even if you have hardware stats configured and enabled, they cannot be used for Metric Min/Max Value alerts. Not on MMS? Sign up here MMS Docs Tagged with: MongoDB Monitoring Service, MMS, server, hosts, monitoring

January 25, 2013

Next →

MongoDB: Powering Digital Natives

Today's rapidly evolving digital landscape is dominated by digital native companies, driving innovation . These are companies born in the digital age and who operate through digital channels with a business model enabled by technology and data. They are not only adept at using technology but are also reshaping the way software is developed and deployed. This article delves into the challenges and opportunities facing digital natives in modern application development, with a particular focus on the complexities of managing data. We’ll explore how the right data platform can empower your digital native organization to build high-quality software faster, adapt to changing market demands, and unlock the full potential of your business. Strong foundations: The four pillars of tech-fueled growth for digital natives Achieving explosive growth requires a strong foundation built on specific principles, which empower rapid scaling and success. Here, we explore the four key pillars that fuel tech-driven growth for digital natives: Product-market fit, fast: As a digital native, you must continuously ship and iterate products to achieve a quick product-market fit. This builds customer trust and captures opportunities before competitors can in an evolving market. Data and AI-driven decisions: You must leverage data to personalize experiences, automate processes, and guide product decisions. A robust data architecture feeds real-time data into AI models, enabling data-driven decisions organization-wide. Balance of freedom and control: Your developers must have the freedom to choose technologies, even as your organization maintains control over the infrastructure to manage risks and costs at scale. Selected technologies must integrate within your overall technology estate. Extensible and open technologies: You must explore disruptive technologies while maintaining existing systems. Freedom from platform and vendor lock-in enables quick adoption of innovations, from current generative AI capabilities to future technological advances. Data: The unsolved challenge in modern application development From cloud platforms and managed services to gen AI code assistants, advancements have transformed how engineering teams build, ship, and run applications: Agile methods and programmatic APIs streamline development, while CI/CD and infrastructure as code automate processes. Containerization, microservices, and serverless architectures enable modularity, while new languages and frameworks boost capabilities. Enhanced logging and monitoring tools provide deep application health insights. Figure 1: Tools and processes to maximize velocity. But none of these advancements address where developers spend most of their time— data . In fact, 73% of developers share time and again that working with data is the hardest part of building an application or feature. So why is data the problem? Traditionally, selecting a database, often an open-source relational one, is the first step in development. However, these databases can struggle with the characteristics of modern data: it’s high volume, unstructured, and constantly evolving. As applications mature and their data demands grow, development teams may encounter challenges with achieving scalability and maintaining service resilience. Some teams turn to NoSQL databases, but even then they find there are limited capabilities, pushing them back to relational databases. As the application gains traction, the business’s appetite for innovation grows, compelling development teams to incorporate an expanding array of database technologies. This results in an architectural sprawl, imposing on teams the challenges of mastering, sustaining, and harmonizing new technologies. Concurrently, the dynamic technology landscape undergoes constant evolution, demanding teams to swiftly adjust. As a result, self-contained, autonomous teams encounter these hurdles recurrently, highlighting the pressing need for streamlined solutions to mitigate complexity and enhance agility. Figure 2: The evolving tech landscape. Data sprawl: A major threat to developer productivity and business agility Data sprawl is slowing everyone down. The more systems we add, the harder it is for developers to keep up. Each new database brings its own unique language, format, and way of working. This creates a huge headache for managing everything—from buying new systems to making sure they all work together securely. It’s a constant battle to keep data accessible, consistent, and backed up across all these different platforms. Figure 3: Teams building on separate stacks leads to data sprawl and manageability issues across the organization It compromises every single one of the four outcomes your technology foundation should be providing, yielding the opposite results: Missed opportunities, lost customers: Fragmented development experiences consume time as engineers struggle with multiple technologies, frameworks, and extract, transform, and load mechanisms for duplicating data between systems. This slows down releases, degrades digital product quality, and impedes engineers from achieving product-market fit and effective competition. Flying blind: With your operational data siloed across multiple systems, you lack the data foundations necessary to use live data in shaping customer experiences or reacting to market changes. This is because you are unable to feed reliable, consistent, real-time data into your AI models to take action within the flow of the application or to provide the business with up-to-the-second visibility into operations. High attrition, high costs: Complex data architecture impacts development team culture, leading to siloed knowledge, inefficient collaboration, and decreased developer satisfaction. This complexity also consumes substantial resources in maintaining existing systems by diverting resources from new projects that are vital for business competition in new markets. Disruption from new technologies: Dependence on any one cloud provider can stifle innovation for development teams by restricting access to the latest technologies. Developers are confined to the tools and services offered by a single provider, hindering their ability to explore and integrate new, potentially more efficient, or advanced technologies. Speed: A unified developer experience for building high-quality software faster In today’s digital world, speed is king. Your customers expect seamless experiences, but clunky applications leave them frustrated. But traditional databases can be a bottleneck, struggling to keep pace with your ever-evolving data and slowing down development. The future of data is here, and it’s flexible: a data platform built for digital natives . It leverages a flexible document model, letting you store and work with your data exactly how you need it. This eliminates rigid structures and complex migrations, freeing your developers to focus on what matters—building amazing applications faster. Flexible document data models empower developers to handle today’s rapidly evolving application data ( 80%+ unstructured) that relational databases struggle with. MongoDB documents are richly typed, boosting developer productivity by eliminating the need for lengthy schema migrations when implementing new features. Developers get to use their preferred tools and languages. Through its drivers and integrations, MongoDB supports all of the most popular programming languages, frameworks, integrated development environments, and AI-code assistance tools. MongoDB scales! It starts small and scales globally. Built for elasticity and horizontal scaling, it handles massive workloads without app changes. Figure 4: A unified developer experience, integrating all necessary data services for building sophisticated modern applications Introducing MongoDB Atlas : a fully-managed cloud database built for the modern developer. It enables the integration of real-time data from devices with AI capabilities (through vector embeddings and large language models ) to personalize user experiences. Stream processing empowers constant data analysis, while in-app analytics provides real-time insights without needing separate data warehouses, all while automatically managing data movement and storage for cost-effectiveness. MongoDB Atlas simplifies database management with the following: Easy deployment via UI, API, CLI, Kubernetes, and infrastructure as code tools. Automated operations for cost-effective performance and real-time monitoring. MongoDB Atlas customer success stories: Development with speed, scale, and efficiency Delivery Hero Delivery Hero, a global leader in online food delivery, leverages MongoDB Atlas to power its rapid service. Founded in 2011, Delivery Hero now serves millions of customers in over 70 countries through brands like PedidosYa, foodpanda, and Glovo. Having replaced its legacy SQL database, Delivery Hero optimized operations and bolstered performance by using MongoDB Atlas. By leveraging MongoDB Atlas Search, Delivery Hero revolutionized its search functionality, ensuring a seamless user experience for its extensive customer base through simplified indexing and real-time data accuracy. MongoDB’s scalability has empowered Delivery Hero to manage over 100 million products in its catalog without encountering latency issues, enabling the company to expand its services while maintaining peak performance. This agility, coupled with MongoDB’s cost-effectiveness, has enabled Delivery Hero to swiftly adapt to evolving customer demands, solidifying its position in the fiercely competitive delivery market. MongoDB Atlas Search was a game changer. We ran a proof of concept and discovered how easy it is to use. We can index in one click, and because it’s a feature of MongoDB, we know data is always up-to-date and accurate. Andrii Hrachov, Principal Software Engineer, Delivery Hero Read the full customer story to learn more. Coinbase Coinbase, a prominent cryptocurrency exchange boasting 245,000 ecosystem partners and managing assets worth $273 billion , trusts MongoDB to handle its extensive data workload. As the company grew, MongoDB scaled seamlessly to accommodate the increased demand. To further improve performance in the fast-paced crypto world, Coinbase partnered with MongoDB to develop a system that significantly accelerated data transfer to reporting tools, reducing processing time from days to a mere 5-6 hours. This near real-time data access enables Coinbase to rapidly analyze trends and make informed decisions, maintaining a competitive edge in the ever-evolving crypto landscape. Watch Coinbase's full session at MongoDB.local Austin, 2024 to learn more. MongoDB: Your flexible platform for digital growth With MongoDB, you can freely explore, experiment, develop, and deploy according to your digital-native business needs. If you would like to learn more about how MongoDB can empower your digital-native business to conquer market trends, visit: Innovate With AI: The Future Enterprise Application-Driven Intelligence: Defining the Next Wave of Modern Apps AI-Driven Real-Time Pricing with MongoDB and Vertex AI

November 7, 2024