Atlas Stream Processing

Atlas Stream Processing is Now in Public Preview

Update May 2, 2024: Atlas Stream Processing is now generally available. Read our blog to learn more.

Today, we’re excited to announce that Atlas Stream Processing is now in public preview. Any developer on Atlas interested in giving it a try has full access. Learn more in our docs or get started today.

Listen to the MongoDB Podcast to learn about the Atlas Stream Processing public preview from Head of Streaming Products, Kenny Gorman.

Developers love the flexibility and ease of use of the document model, alongside the Query API, which allows them to work with data as code in MongoDB Atlas. With Atlas Stream Processing, we are bringing these same foundational principles to stream processing. A report on the topic published by S&P Global Market Intelligence 451 Research had this to say: “A unified approach to leveraging data for application development, the direction of travel for MongoDB, is particularly valuable in the context of stream processing where operational and development complexity has proven a significant barrier to adoption.”

First announced at .local NYC 2023, Atlas Stream Processing is redefining the experience of aggregating and enriching streams of high-velocity, rapidly changing event data, and unifying how to work with data in motion and at rest.

How are developers using the product so far? And what have we learned?

During the private preview, thousands of development teams requested access, and we gathered useful feedback from hundreds of engaged teams. One of those engaged teams is the marketing technology leader, Acoustic:

“At Acoustic, our key focus is to empower brands with behavioral insights that enable them to create engaging, personalized customer experiences. To do so, our Acoustic Connect platform must be able to efficiently process and manage millions of marketing, behavioral, and customer signals as they occur. With Atlas Stream Processing, our engineers can leverage the skills they already have from working with data in Atlas to process new data continuously, ensuring our customers have access to real-time customer insights.”
John Riewerts, EVP, Engineering at Acoustic

Other interesting use cases include a leading global airline using complex aggregations to rapidly process maintenance and operations data, ensuring on-time flights for their thousands of daily customers; a large manufacturer of energy equipment using Atlas Stream Processing to enable continuous monitoring of high-volume pump data to avoid outages and optimize their yields; and an innovative enterprise SaaS provider leveraging the rich processing capabilities in Atlas Stream Processing to deliver timely and contextual in-product alerts to drive improved product engagement.

These are just a few of the many use cases that we’re seeing across industries. Beyond the use cases we’ve already seen, developers are giving us tons of insight into what they’d like to see us add in the future.

In addition to enabling continuous processing of data in Atlas databases through change streams, it’s exciting to see developers using Atlas Stream Processing with their Kafka data hosted by valued partners like Confluent, Amazon MSK, Azure Event Hubs, and Redpanda. Our aim with developer data platform capabilities in Atlas has always been to make for a better experience across the key technologies relied on by developers.

What’s new in the public preview?

That brings us to what’s new. As we scale to more teams, we’re expanding functionality to include the most requested features from our private preview.
From the many pieces of feedback received, three common themes emerged: refining the developer experience, expanding advanced features and functionality, and improving operations and security.

Refining the developer experience

In private preview, we established the core of the developer experience that is essential to making Atlas Stream Processing a natural solution for development teams. In public preview, we’re doubling down on this with two additional enhancements:

VS Code integration

The MongoDB VS Code plugin has added support for connecting to Stream Processing instances. Developers already using the plugin can create and manage processors in a familiar development environment. This means less time switching between tools and more time building your applications!

Improved dead letter queue (DLQ) capabilities

DLQ support is a key element of powerful stream processing, and in public preview we’re expanding DLQ capabilities. DLQ messages are now displayed when executing pipelines with sp.process() and when running .sample() on running processors, allowing for a more streamlined development experience that does not require setting up a target collection to act as a DLQ.

Expanding advanced features and functionality

Atlas Stream Processing already supported many of the key aggregation operators developers are familiar with in the Query API used with data at rest. We’ve now added powerful windowing capabilities and the ability to easily merge and emit data to an Atlas database or to a Kafka topic. Public preview adds even more functionality demanded by the most advanced teams relying on stream processing to deliver customer experiences:

$lookup

Developers can now enrich documents being processed in a stream processor with data from remote Atlas clusters, performing joins against fields from the document and the target collection.
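
As a rough illustration of the $lookup enrichment described above, here is a minimal sketch of a pipeline definition written as plain JavaScript objects, the way it might be typed in mongosh. All names are assumptions made for the example: the connections "kafkaProd" and "atlasProd", the topic, the databases and collections, and the deviceId field do not come from this post. The small emulateLookup helper is not part of the product; it only mimics the shape of the joined output locally so the sketch can be checked without a cluster.

```javascript
// Hypothetical names throughout: connections, topic, db/coll, and fields
// are illustrative only and would need to match your own connection registry.
const source = { $source: { connectionName: "kafkaProd", topic: "pump-readings" } };

// Join each streamed reading to the stored device document whose _id
// equals the reading's deviceId, attaching the matches as an array field.
const enrich = {
  $lookup: {
    from: { connectionName: "atlasProd", db: "reference", coll: "devices" },
    localField: "deviceId",
    foreignField: "_id",
    as: "device",
  },
};

// Write the enriched documents to an Atlas collection.
const sink = {
  $merge: { into: { connectionName: "atlasProd", db: "telemetry", coll: "enriched" } },
};

const lookupPipeline = [source, enrich, sink];

// Local emulation of the join's output shape (simple equality match only).
function emulateLookup(event, devices) {
  const { localField, foreignField, as } = enrich.$lookup;
  const matches = devices.filter((d) => d[foreignField] === event[localField]);
  return { ...event, [as]: matches };
}
```

In a mongosh session connected to a stream processing instance, a pipeline like lookupPipeline could be tried interactively with sp.process() before being turned into a named processor.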

Change streams pre- and post-imaging

Many developers are using Atlas Stream Processing to continuously process data in Atlas databases as a source through change streams. We have enhanced the change stream $source in public preview with support for pre- and post-images. This enables common use cases where developers need to calculate deltas between fields in documents, as well as use cases requiring access to the full contents of a deleted document.

Conditional routing with dynamic expressions in merge and emit stages

Conditional routing lets developers use the value of fields in documents being processed in Atlas Stream Processing to dynamically send specific messages to different Atlas collections or Kafka topics. The $merge and $emit stages also now support the use of dynamic expressions. This makes it possible to use the Query API for use cases requiring the ability to fork messages to different collections or topics as needed.

Idle stream timeouts

Streams whose watermarks are not advancing due to a lack of inbound data can now be configured to close after a period of time, emitting the results of the windows. This can be critical for streaming sources that have inconsistent flows of data.

Improving operations and security

Finally, we have invested heavily over the past few months in improving other operational and security aspects of Atlas Stream Processing. A few of the highlights include:

Checkpointing

Atlas Stream Processing now performs checkpoints to save state while processing. Stream processors are continuously running processes, so whether due to a data issue or infrastructure failure, they require an intelligent recovery mechanism. Checkpoints make it easy to resume your stream processors from wherever data stopped being collected and processed.

Terraform provider support

Support for the creation of connections and stream processing instances (SPIs) is now available with Terraform. This allows infrastructure to be authored as code for repeatable deployments.

Security roles

Atlas Stream Processing has added a project-level role, giving users just enough permission to perform their stream processing tasks. Stream processors can run under the context of a specific role, supporting a least-privilege configuration.

Auditing

Atlas Stream Processing can now audit authentication attempts and actions within your Stream Processing Instance, giving you insight into security-related events.

Kafka consumer group support

Stream processors now use Kafka consumer groups for offset tracking. This allows users to easily change the position of the processor in the stream for operations and easily monitor for potential processor lag.

A final note on what’s new: in public preview, we will begin charging for Atlas Stream Processing, using preview pricing (subject to change). You can learn more about pricing in our documentation.

Build your first stream processor today

Public preview is a huge step forward for us as we expand the developer data platform and enable more teams with a stream processing solution that simplifies the operational complexity of building reactive, responsive, event-driven applications, while also offering an improved developer experience.

We can’t wait to see what you build! Log in today or get started with the tutorial, view our resources, or follow the Learning Byte on MongoDB University.
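
To make the conditional routing feature described earlier in this post more concrete, the following is a minimal sketch of $emit and $merge stages whose targets are computed per document. It assumes, as the post describes, that the topic and collection fields accept a dynamic expression; the connection names, topic names, and the severity and region fields are hypothetical. The emulateTopic helper is not a product API; it only mirrors the $cond expression in plain JavaScript so the routing decision can be checked locally.

```javascript
// Hypothetical names: "kafkaProd", "atlasProd", the topics, and the
// `severity` and `region` fields are illustrative, not from the post.

// Send each message to one of two Kafka topics based on a field value.
const emitStage = {
  $emit: {
    connectionName: "kafkaProd",
    topic: { $cond: [{ $gte: ["$severity", 3] }, "alerts-urgent", "alerts-routine"] },
  },
};

// Alternatively, fork documents into per-region Atlas collections.
const mergeStage = {
  $merge: {
    into: {
      connectionName: "atlasProd",
      db: "alerts",
      coll: { $concat: ["alerts_", "$region"] },
    },
  },
};

// Plain-JavaScript mirror of the $cond expression above, for local checks.
function emulateTopic(doc) {
  const [, urgent, routine] = emitStage.$emit.topic.$cond;
  return doc.severity >= 3 ? urgent : routine;
}
```

A pipeline would end with either emitStage or mergeStage, depending on whether the routed messages should land in Kafka topics or Atlas collections.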

February 13, 2024

The Challenges and Opportunities of Processing Streaming Data

Let’s consider a fictitious bank that has a credit card offering for its customers. Transactional data might land in their database from various sources, such as a REST API call from a web application or a serverless function call made by a cash machine. Regardless of how the data was written to the database, the database performed its job and made the data available for querying by the end user or application. The mechanics are database-specific, but the end goal of all databases is the same: once data is in a database, the bank can query it and obtain business value from it.

In the beginning, their architecture worked well, but over time customer usage grew and the bank found it difficult to manage the volume of transactions. The bank decided to do what many customers in this scenario do and adopted an event-streaming platform like Apache Kafka to queue this event data. Kafka provides a highly scalable event streaming platform capable of managing large data volumes without putting debilitating pressure on traditional databases. With this new design, the bank could now scale to support more customers and product offerings.

Life was great until some customers started complaining about unrecognized transactions occurring on their cards. Customers were refusing to pay for these, and the bank was starting to spend lots of resources figuring out how to manage these fraudulent charges. After all, by the time the data was written into the database and then batch loaded into the systems that could process it, the user’s credit card had already been charged, perhaps a few times over.

However, hope is not lost. The bank realized that if they could query the transactional event data as it flows into the database, they might be able to compare it with historical spending data from the user, as well as geolocation information, to determine in real time whether the transaction was suspicious and warranted further confirmation by the customer.
This ability to continuously query the stream of data is what stream processing is all about.

From a developer’s perspective, building applications that work with streaming data is challenging. They need to consider the following:

Different serialization formats: The data that arrives in the stream may use different serialization formats such as JSON, Avro, Protobuf, or even binary.
Different schemas: Data originating from a variety of sources may contain slightly different schemas. A field like CustomerID could be customerId from one source and CustID from another, and a third source might not include the field at all.
Late-arriving data: The data itself could arrive late due to network latency issues, or arrive completely out of order.
Operational complexity: Developers need to be concerned with reacting to application state changes, like failed connections to data sources, and with how to efficiently scale the application to meet the demands of the business.
Security: In larger enterprises, the developer usually doesn’t have access to production data. This makes troubleshooting and building queries from this data difficult.

Stream processing can help address these challenges, which are otherwise difficult or extremely costly to overcome, and enable real-time use cases such as fraud detection, hyper-personalization, and predictive maintenance. While many stream processing solutions exist, the flexibility of the document model and the power of the aggregation framework are naturally well suited to help developers with the challenges found with complex event data.

Discover MongoDB Atlas Stream Processing

Read the MongoDB Atlas Stream Processing announcement and check out Atlas Stream Processing tutorials on the MongoDB Developer Center.

Request private preview access to Atlas Stream Processing

Request access today to participate in the private preview.

New to MongoDB? Get started for free today by signing up for MongoDB Atlas.

August 30, 2023

Introducing Atlas Stream Processing - Simplifying the Path to Reactive, Responsive, Event-Driven Apps

Update May 2, 2024: Atlas Stream Processing is now generally available. Read our blog to learn more.

Atlas Stream Processing is now in public preview. Learn more about what’s new!

Today, we’re excited to announce the private preview of Atlas Stream Processing!

The world is increasingly fast-paced and your applications need to keep up. Responsive, event-driven applications bring digital experiences to life for your customers and accelerate time to insight and action for the business. Think:

Notifying your users as soon as their delivery status changes
Blocking fraudulent transactions during payment processing
Analyzing sensor telemetry as it is generated to detect and remediate potential equipment failures before costly outages

In each of these examples, data loses its value as the seconds tick by. It needs to be queried and actioned continuously and with low latency. To do this, developers are increasingly turning to event-driven applications fueled by streaming data so that they can instantly react and respond to the constantly changing world around them.

Atlas Stream Processing will help developers make the shift to event-driven apps faster.

Over the years, developers have adopted the MongoDB database because they love the flexibility and ease of use of the document model, along with the MongoDB Query API, which allows them to work with data as code. These foundational principles dramatically reduce friction in developing software and applications. Now, we are bringing those same principles to streaming data.

Atlas Stream Processing is redefining the developer experience for working with complex streams of high-velocity, rapidly changing data, and unifying how developers work with data in motion and at rest.
While existing products and technologies have offered many innovations to streaming and stream processing, we think MongoDB is naturally well suited to help developers with some key remaining challenges. These challenges include the difficulty of working with variable, high-volume, and high-velocity data; the contextual overhead of learning new tools, languages, and APIs; and the additional operational maintenance and fragmentation that can be introduced through point technologies into complex application stacks.

Introducing Atlas Stream Processing

Atlas Stream Processing enables processing high-velocity streams of complex data with a few unique advantages for the developer experience:

It’s built on the document model, allowing for flexibility when dealing with the nested and complex data structures common in event streams. This alleviates the need for pre-processing steps while allowing developers to work naturally and easily with data that has complex structures, just as the database allows.
It unifies the experience of working across all data, offering a single platform (across API, query language, and tools) to process rich, complex streaming data alongside the critical application data in your database.
It’s fully managed in MongoDB Atlas, building on an already robust set of integrated services. With just a few API calls and lines of code, you can stand up a stream processor, database, and API serving layer across any of the major cloud providers.

Watch the MongoDB .local NYC Keynote to see Atlas Stream Processing announced by our Chief Product Officer, Sahir Azam. He covers the emergence of streaming data and how it powers a variety of use cases, key streaming challenges, and how Atlas Stream Processing can help you build modern, event-driven applications. Head of Streaming Products, Kenny Gorman, then goes through a live demo of Atlas Stream Processing in action.

How does Atlas Stream Processing work?

Atlas Stream Processing connects to your critical data, whether that lives in MongoDB (through change streams) or in an event streaming platform like Apache Kafka. Developers can easily and seamlessly connect to Confluent Cloud, Amazon MSK, Redpanda, Azure Event Hubs, or self-managed Kafka using the Kafka wire protocol. And by integrating with the native Kafka driver, Atlas Stream Processing offers low-latency native performance at its foundation. In addition to our long-standing strategic partnership with Confluent, we are also excited to announce partnerships with AWS, Microsoft, Redpanda, and Google at launch.

Atlas Stream Processing then provides three key capabilities required to turn your firehose of streaming data into differentiated customer experiences. Let’s go through these one by one.

Continuous processing

First, developers can now use MongoDB’s aggregation framework to continuously process rich and complex streams of data from event streaming platforms such as Apache Kafka. This unlocks powerful new ways to continuously query, analyze, and react to streaming data without any of the delays inherent in batch processing. With the aggregation framework, you can filter and group data, aggregating high-velocity event streams into actionable insights over stateful time windows, powering richer, real-time application experiences.

Continuous validation

Next, Atlas Stream Processing offers developers robust and native mechanisms to handle incorrect data issues that can otherwise cause havoc in applications. Potential issues include passing inaccurate results to the app, data loss, and application downtime. Atlas Stream Processing solves these problems to ensure streaming data can be reliably processed and shared between event-driven applications.
Atlas Stream Processing:

Provides Continuous Schema Validation to check that events are properly formed before processing, for example rejecting events with missing fields or containing invalid value ranges,
Detects message corruption, and
Detects late-arriving data that has missed a processing window.

Atlas Stream Processing pipelines can be configured with an integrated Dead Letter Queue (DLQ) into which events failing validation are routed. This saves developers from having to build and maintain their own custom implementations. Issues can be quickly debugged while the risk of missing or corrupt data bringing down the entire application is minimized.

Continuous merge

Your processed data can then be continuously materialized into views maintained in Atlas database collections. We can think of this as a push query. Applications can retrieve results (via pull queries) from the view using either the MongoDB Query API or Atlas SQL interface. Continuously merging updates to collections is a really efficient way of maintaining fresh analytical views of data supporting automated and human decision-making and action. In addition to materialized views, developers also have the flexibility to publish processed events back into streaming systems like Apache Kafka.

Creating a Stream Processor

Let’s show you how easy it is to build a stream processor in MongoDB Atlas. With Atlas Stream Processing, you can use the same aggregation pipeline syntax for a stream processor that you’re familiar with from the database. The following walks through a simple stream processor from start to finish. It takes just a few lines of code.

First, we’ll write an aggregation pipeline that defines a source for your data, performs validation ensuring data is not coming from the localhost/127.0.0.1 IP address, creates a tumbling window to collect grouped message data every minute, and then merges that newly processed data into a MongoDB collection in Atlas.
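
The walkthrough's code did not survive in this copy of the post, so what follows is a reconstruction sketch rather than the original listing. It builds the four stages described above and, for completeness, includes the create and start steps that the next paragraph describes. The processor name "netattacks" and the variable names p and dlq come from the text; everything else is an assumption: the connection names "kafkaProd" and "atlasProd", the topic, the database and collection names, and the ip field.

```javascript
// Hypothetical names: connections, topic, db/coll, and the `ip` field
// are illustrative; only "netattacks", `p`, and `dlq` come from the post.

// 1. Source: read events from a Kafka topic.
const s = { $source: { connectionName: "kafkaProd", topic: "netattacks" } };

// 2. Validation: events reporting the localhost address fail validation
//    and are routed to the dead letter queue.
const v = {
  $validate: {
    validator: { ip: { $ne: "127.0.0.1" } },
    validationAction: "dlq",
  },
};

// 3. Tumbling window: group and count messages per IP every minute.
const w = {
  $tumblingWindow: {
    interval: { size: 1, unit: "minute" },
    pipeline: [{ $group: { _id: "$ip", count: { $sum: 1 } } }],
  },
};

// 4. Merge: write each window's results into an Atlas collection.
const m = {
  $merge: { into: { connectionName: "atlasProd", db: "security", coll: "netattacks" } },
};

const p = [s, v, w, m];

// Dead letter queue target, passed as the processor's options argument.
const dlq = { dlq: { connectionName: "atlasProd", db: "security", coll: "netattacks_dlq" } };

// Runs only in mongosh connected to a stream processing instance,
// where the `sp` helper exists; skipped in any other JavaScript runtime.
if (typeof sp !== "undefined") {
  sp.createStreamProcessor("netattacks", p, dlq);
  sp.netattacks.start();
}
```

Keeping each stage in its own variable makes it easy to test stages one at a time, for example by passing a shorter array to sp.process() before creating the named processor.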

Then, we’ll create our Stream Processor called “netattacks”, specifying our newly defined pipeline p as well as dlq as arguments. This will perform our desired processing and, by using a Dead Letter Queue (DLQ), will store any invalid data safely for inspection, debugging, or re-processing later.

Lastly, we can start it. That’s all it takes to build a stream processor in MongoDB Atlas.

Request private preview

We’re excited to get this product into your hands and see what you build with it. Learn more about Atlas Stream Processing and request access to participate in the private preview once it opens to developers.

New to MongoDB? Get started for free today by signing up for MongoDB Atlas.

Head to the MongoDB.local hub to see where we’ll be showing up next.

Safe Harbor

The development, release, and timing of any features or functionality described for our products remains at our sole discretion. This information is merely intended to outline our general product direction and it should not be relied on in making a purchasing decision nor is this a commitment, promise or legal obligation to deliver any material, code, or functionality.

June 22, 2023