PyMongoArrow Now Generally Available

Shelby Carpenter and Shubham Ranjan
July 5, 2023 | Updated: July 19, 2023

We are pleased to announce that PyMongoArrow, a Python library for data analysis with MongoDB, is now generally available.

PyMongoArrow lets you move data between MongoDB and other popular analytics tools easily and efficiently. The library is built on top of PyMongo, MongoDB’s popular Python driver for synchronous programming.

Why we built PyMongoArrow

Today, PyMongoArrow is the recommended way to materialize MongoDB query result sets as contiguous-in-memory, typed arrays suited for in-memory analytical processing applications. It currently supports exporting MongoDB data into Pandas DataFrames, NumPy arrays, and Apache Arrow tables.

Before MongoDB created PyMongoArrow, it was possible to move data out of MongoDB into other analytics tools and systems, but there wasn’t a unified tool for working with the variety of data formats commonly used for analysis. Because different data analysts and developers may have different approaches and use different formats, this could sometimes interrupt collaboration and create a bottleneck in teams’ analytics pipelines.

PyMongoArrow solves these challenges for our users. While PyMongoArrow has been available in Public Preview since 2021, we have now made it generally available after adding features to ensure the best user experience.

Why use PyMongoArrow

The PyMongoArrow library integrates easily into your existing analytics pipeline. Because it's built on top of the PyMongo library, it extends PyMongo's functionality to let you work with MongoDB data easily and efficiently at scale.

What can PyMongoArrow do?

Read data into Pandas DataFrame, NumPy Array, and Arrow Table Format

You can connect to your MongoDB instance through the PyMongoArrow library and use the following functions to output the query result sets into the desired data format:

find_pandas_all(): lets you output MongoDB query result sets as a Pandas DataFrame
find_arrow_all(): lets you output MongoDB query result sets as an Arrow Table
find_numpy_all(): lets you output MongoDB query result sets as a NumPy array
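
As a minimal sketch of the read path, the snippet below loads documents into a Pandas DataFrame with find_pandas_all(). The database name, collection name, field names, and connection string are hypothetical, and the example assumes a reachable MongoDB deployment with the pymongo and pymongoarrow packages installed.

```python
from datetime import datetime

# Hypothetical field layout for an illustrative "orders" collection.
ORDER_FIELDS = {"_id": int, "amount": float, "last_updated": datetime}

# Standard MongoDB query filter: keep orders with a positive amount.
ORDER_FILTER = {"amount": {"$gt": 0}}


def load_orders(uri="mongodb://localhost:27017"):
    """Return matching orders as a pandas DataFrame."""
    # Imported here so the module loads even where the drivers are absent.
    from pymongo import MongoClient
    from pymongoarrow.api import Schema, find_pandas_all

    client = MongoClient(uri)
    orders = client["shop"]["orders"]  # assumed database and collection names
    # The schema tells PyMongoArrow which fields to load and their types.
    return find_pandas_all(orders, ORDER_FILTER, schema=Schema(ORDER_FIELDS))
```

Calling find_arrow_all() or find_numpy_all() with the same arguments should return the data as an Arrow table or as NumPy arrays instead.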

Write to other data formats

In addition to outputting MongoDB query result sets as Pandas DataFrames, NumPy arrays, and Arrow tables, PyMongoArrow lets you write data to many other formats. Once the query result sets have been loaded as an Arrow table, they can easily be written to any of the other formats supported by PyArrow, such as Parquet, CSV, and JSON.
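
One possible shape for the export step is sketched below. It assumes you already hold an Arrow table (for example, one returned by find_arrow_all()) and uses PyArrow's own Parquet and CSV writers. The helper names, the default file stem, and the output directory are hypothetical.

```python
from pathlib import Path


def export_paths(directory, stem):
    """Build the Parquet and CSV output paths for one exported table."""
    base = Path(directory)
    return {"parquet": base / f"{stem}.parquet", "csv": base / f"{stem}.csv"}


def export_table(table, directory, stem="orders"):
    """Write a pyarrow.Table to Parquet and CSV files and return their paths."""
    # Imported here so the module loads even where pyarrow is absent.
    import pyarrow.csv as pa_csv
    import pyarrow.parquet as pa_parquet

    paths = export_paths(directory, stem)
    Path(directory).mkdir(parents=True, exist_ok=True)
    # Both writers take the table first and the destination second.
    pa_parquet.write_table(table, str(paths["parquet"]))
    pa_csv.write_csv(table, str(paths["csv"]))
    return paths
```

Keeping the path logic separate from the writers makes it easy to add another PyArrow-supported format later without touching the query code.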

Write data back to MongoDB

PyMongoArrow not only enables you to perform analytics tasks efficiently but also lets you write the analyzed data back into MongoDB, so your valuable insights are persisted.

Result sets that have been loaded as Arrow’s table type, Pandas’ DataFrame type, or NumPy’s array type can be easily written to your MongoDB database using the write() function.
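
The write path can be sketched in a few lines. The wrapper name save_results is hypothetical. The collection is assumed to be a PyMongo collection supplied by the caller, and the results are assumed to be one of the tabular types named above.

```python
def save_results(collection, results):
    """Insert tabular results into a MongoDB collection and return the outcome."""
    # Imported here so the module loads even where pymongoarrow is absent.
    from pymongoarrow.api import write

    # `results` is the analyzed data (for example a pandas DataFrame or an
    # Arrow table); `collection` is the PyMongo collection to write into.
    return write(collection, results)
```

The returned object describes the outcome of the insert, so callers can inspect it or log it as needed.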

Use MongoDB's powerful aggregation pipeline with PyMongoArrow

In addition to basic find operations, you can also take advantage of MongoDB's powerful aggregation pipeline for even more complex analytical use cases.

Simply use the aggregate_pandas_all() function to query your MongoDB data with an aggregation pipeline and return the result sets as Pandas DataFrames. You can also use the aggregate_numpy_all() and aggregate_arrow_all() functions to return the result sets as NumPy arrays and Arrow tables, respectively.
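
A short illustration of the aggregation path follows. The pipeline, the field names (amount, region, total), and the helper name are hypothetical, and the sketch assumes amount is stored as a double and region as a string.

```python
# Aggregation stages: keep positive amounts, then total them per region.
REGION_TOTALS_PIPELINE = [
    {"$match": {"amount": {"$gt": 0}}},
    {"$group": {"_id": "$region", "total": {"$sum": "$amount"}}},
]

# Types of the fields produced by the $group stage above.
REGION_TOTALS_FIELDS = {"_id": str, "total": float}


def totals_by_region(collection):
    """Run the pipeline on a PyMongo collection and return a pandas DataFrame."""
    # Imported here so the module loads even where pymongoarrow is absent.
    from pymongoarrow.api import Schema, aggregate_pandas_all

    return aggregate_pandas_all(
        collection,
        REGION_TOTALS_PIPELINE,
        schema=Schema(REGION_TOTALS_FIELDS),
    )
```

Because the grouping runs inside MongoDB, only the per-region totals are transferred into the DataFrame.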

Get started today

We have plenty of resources available to help you get started quickly with the PyMongoArrow library.

Once you’ve given it a try, please share your feedback with us through the MongoDB Feedback Engine. Your feedback helps us understand what features will make the most impact for our users.

Head to the MongoDB.local hub to see where we'll be showing up next.
