RegData & MongoDB: Streamline Data Control and Compliance
Organizations that must keep data secure in highly regulated markets can find themselves entangled in a web of costly and complex IT systems. Whether it's the GDPR safeguarding European personal data or the Monetary Authority of Singapore's guidelines on outsourcing and cloud computing, the more regulations an organization is subject to, particularly across multiple geographical locations, the more intricate its IT infrastructure becomes. Organizations today must adapt quickly or face the consequences.

In addition to regulations, customer expectations have become a major driver for innovation and modernization. In the financial sector, for example, customers demand a fast and convenient user experience with real-time access to transaction information, a fully digitized, mobile-first banking experience, and personalization and accessibility for their specific needs. While these expectations have become the norm, they conflict with the complex infrastructures of modern financial institutions. Many financial institutions are saddled with legacy infrastructure that holds them back from adapting quickly to changing market conditions. Established financial institutions must find a way to modernize, or they risk losing market share to nimble challenger banks with cost-effective solutions.

The banking market today is increasingly populated with nimble fintech companies powered by smaller and more straightforward IT systems, which makes it easier for them to pivot quickly. In contrast, established institutions often operate across borders, meaning they must adhere to a greater number of regulations. Modernizing these complex systems requires introducing new, disruptive technology without violating any regulatory constraints, akin to driving a car while changing a tire.
The primary focus for established banks is safeguarding existing systems to ensure compliance with regulatory constraints, while prioritizing customer satisfaction and keeping operations running smoothly.

RegData: Compliance without risk

RegData, a multi-cloud application security platform, embraces this challenge head-on. RegData has expertise across a number of highly regulated markets, from healthcare to public services, human resources, banking, and finance. The company’s mission is clear: delivering a robust, auditable, and confidential data protection platform within its comprehensive RegData Protection Suite (RPS), built on MongoDB.

RegData provides its customers with more than 120 protection techniques, including 60 anonymization techniques, as well as custom techniques (protection of IBANs, SSNs, emails, etc.), giving them total control over how sensitive data is managed within each organization. For example, by working with RegData, financial institutions can configure their infrastructure to specific regulations by masking, encrypting, tokenizing, anonymizing, or pseudonymizing data to bring it into compliance. With RPS, company-wide reports can be automatically generated for the regulating authorities (e.g., ACPR, ECB, EU-GDPR, FINMA).

To illustrate the impact of RPS, and to debunk some common misconceptions, let’s explore before and after scenarios.

Figure 1 shows the decentralized management of access control. Some data sources employ features such as Field Level Encryption (FLE) to shield data, restricting access to individuals with the appropriate key. Additionally, certain applications implement Role-Based Access Control (RBAC) to regulate data access within the application. Some even come with an Active Directory (AD) interface to try to centralize the configuration.
Figure 1: Simplified architecture with no centralized access control

However, each of these addresses only part of the challenge: encrypting the actual data and managing single-system access. Neither FLE nor RBAC can protect data that isn’t on its own data source or application. Even centralizing efforts like the AD interface exclude older legacy systems that might not have interfacing functionalities. The result in all of these cases is a mosaic of different configurations in which silos stay silos, and modernization is risky and slow because the data may or may not be protected.

RegData, with its RPS solution, can integrate with a wide range of data sources and provide control regardless of how data is accessed, be it via the web, APIs, files, emails, or other channels. This allows organizations to configure RPS at a company level. All applications, including silos, can and should interface with RPS to protect all of the data with a single global configuration.

Another important aspect of RPS is tokenization, which allows organizations to decide which columns or fields from a given data source should be encrypted according to specific standards, and to govern access to the corresponding tokens. Thanks to tokenization, RPS can track who accesses what data and when, at a company level, regardless of the data source or the application. This is easy enough to articulate but quite difficult to execute at a data level. To efficiently manage diverse data sources and fine-grained authorization, and to implement different protection techniques, RegData builds RPS on top of MongoDB's flexible, document-oriented database.

The road to modernization

As noted, to fully leverage RegData’s RPS, all data sources should go through RPS. RPS works like a data filter: all of the information goes in, and protected data comes out the other side, ready to be used to modernize and innovate.
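The data-filter idea can be illustrated with a minimal sketch. This is not RegData's implementation or API: the `TokenVault` class, its methods, and the field names are hypothetical, and a keyed hash stands in for whatever tokenization standard an organization would actually choose. It only shows the general pattern of swapping selected fields for tokens and logging every attempt to read the original value.

```python
import hashlib
import hmac
import secrets
from datetime import datetime, timezone


class TokenVault:
    """Hypothetical in-memory vault: swaps sensitive values for tokens and logs access."""

    def __init__(self, secret_key: bytes, protected_fields: set):
        self._key = secret_key
        self._protected = protected_fields
        self._store = {}       # token -> original value
        self.audit_log = []    # who tried to detokenize what, and when

    def _token_for(self, field: str, value: str) -> str:
        # Deterministic keyed hash: the same input always yields the same token,
        # so protected data can still be matched or searched by token.
        digest = hmac.new(self._key, f"{field}:{value}".encode(), hashlib.sha256)
        return "tok_" + digest.hexdigest()[:24]

    def protect(self, record: dict) -> dict:
        """Return a copy of the record with the protected fields tokenized."""
        out = {}
        for field, value in record.items():
            if field in self._protected:
                token = self._token_for(field, str(value))
                self._store[token] = str(value)
                out[field] = token
            else:
                out[field] = value
        return out

    def reveal(self, token: str, user: str, authorized: bool) -> str:
        """Detokenize for an authorized user; every attempt is logged."""
        self.audit_log.append({
            "user": user,
            "token": token,
            "granted": authorized,
            "at": datetime.now(timezone.utc).isoformat(),
        })
        if not authorized:
            raise PermissionError(f"{user} may not read this value")
        return self._store[token]


# Sample data is invented for illustration.
vault = TokenVault(secrets.token_bytes(32), {"iban", "email"})
customer = {"name": "A. Example", "iban": "CH9300762011623852957", "email": "a@example.com"}
protected = vault.protect(customer)
```

Downstream systems would receive only `protected`, while the audit log records each attempt to recover a plain-text value, whether or not it was granted.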
Simply integrating RegData makes previously siloed data available by masking, encrypting, or anonymizing it before sending it out to other applications and systems. Together, RegData and MongoDB form a robust and proven solution for protecting data and modernizing operations within highly regulated industries.

The illustration below shows the architecture of a private bank utilizing RPS. Database admins can see data in plain text only when the request comes from the company’s headquarters. This ensures compliance with regulations, while still allowing data to be queried and searched from outside the headquarters. This bank goes a step further by migrating its Customer Relationship Management (CRM), core banking, Portfolio Management System (PMS), customer reporting, advisory, tax reporting, and other digital apps into the public cloud. This is achieved while remaining compliant and able to automatically generate submittable audit reports for regulating authorities.

Figure 2: Private bank business case

Another possible modernization scheme, given RegData’s functionalities, is a hybrid cloud Operational Data Layer (ODL) using MongoDB Atlas. This architectural pattern acts as a bridge between consuming applications and legacy solutions. It centrally integrates and organizes siloed enterprise data, making it easily available. Its purpose is to offload legacy systems by providing alternative access to information for consuming applications, thereby breaking down data silos, decreasing latency, enabling scalability, flexibility, and availability, and ultimately optimizing operational efficiency and facilitating modernization. RegData integrates, protects, and makes data available, while MongoDB Atlas provides its inherent scalability, flexibility, and availability to empower developers to offload legacy systems.
Figure 3: Example of ODL with both RegData and MongoDB

In conclusion, RegData provides a strategic solution for financial institutions to modernize securely. By combining RegData's regulatory protection with modern cloud platforms such as MongoDB Atlas, the collaboration takes on the modernization challenge of highly regulated sectors.

Are you prepared to harness these capabilities for your projects? Do you have any questions? Then please reach out to us at firstname.lastname@example.org or email@example.com.

You can also take a look at the following resources:
Hybrid Cloud: Flexible Architecture for the Future of Financial Services
Implementing an Operational Data Layer
Atlas Data Federation and Online Archive Can Now Be Deployed in Azure
Microsoft Azure users get two significant additions to their data management capabilities. First, Atlas Data Federation is now Generally Available on Azure. This means you can now deploy it directly within Azure and even query data from Microsoft Azure Blob Storage. Second, we've launched the General Availability of Atlas Online Archive on Azure, which brings efficient archiving to Azure-based deployments. Both updates are big steps forward in making data management on Azure more powerful and flexible. Let's dive into what this means for you!

Azure support in Atlas Data Federation (General Availability)

With Atlas Data Federation, users can seamlessly query, transform, and create views across multiple Atlas databases and cloud object storage solutions, such as Amazon S3 and now Microsoft Azure Blob Storage. This feature, previously exclusive to AWS, now allows direct deployment within Azure and the ability to tap into Microsoft Azure Blob Storage for data insights.

Figure 1: Tap into Azure Blob Storage easily from the Atlas UI

Key features of Atlas Data Federation

Cloud flexibility: Choose between AWS and Azure for hosting federated database instances.
Diverse data sources: Incorporate MongoDB Atlas clusters or Azure storage solutions (Azure Blob Storage and Azure Data Lake Storage Gen2) as data sources for comprehensive queries, including cross-region queries.
Advanced aggregation: Comprehensive aggregation capabilities with operators including $match, $lookup, $queryHistory, $merge, and $out, plus direct $out support for Azure Blob Storage and Azure Data Lake Storage Gen2.
Atlas SQL queries on Azure: Execute SQL queries on Azure, integrating MongoDB data for a unified analysis experience.
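As an illustration of the aggregation support listed above, here is a minimal sketch of a pipeline that uses $match and $lookup. The collection names (`orders`, `customers`) and field names are hypothetical, and the pipeline is only constructed here, not run against a federated database instance.

```python
from datetime import datetime


def build_order_report_pipeline(since: datetime) -> list:
    """Build an aggregation pipeline joining recent orders to customer records."""
    return [
        # Keep only orders placed on or after the cutoff date.
        {"$match": {"orderDate": {"$gte": since}}},
        # Join each order to its customer document in another collection.
        {"$lookup": {
            "from": "customers",
            "localField": "customerId",
            "foreignField": "_id",
            "as": "customer",
        }},
        # $lookup yields an array; flatten it to one customer per order.
        {"$unwind": "$customer"},
        # Return only the fields needed for the report.
        {"$project": {"_id": 0, "orderId": 1, "total": 1, "customerName": "$customer.name"}},
    ]


pipeline = build_order_report_pipeline(datetime(2024, 1, 1))
```

With a PyMongo client connected to a federated database instance, a pipeline like this would typically be passed to `collection.aggregate(pipeline)`; the collections it refers to could be backed by Atlas clusters or by files in object storage, depending on how the instance is configured.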
Atlas Data Federation simplifies accessing and analyzing complex data sets by combining data across multiple sources into a single, federated view, providing valuable insights for more informed business decisions. Explore Atlas Data Federation on Azure today.

Azure support in Atlas Online Archive (General Availability)

Atlas Online Archive's expansion to Azure ensures that data tiering is not only efficient but also integrated, keeping archival data within the Azure ecosystem. This integration addresses the previous limitation of defaulting to AWS for storage, even for Azure-hosted clusters.

Figure 2: Seamlessly select Azure when choosing a region

Key features of Atlas Online Archive

Provider choice: Opt for AWS or Azure to align with your cloud strategy.
Automatic archiving: Set rules to move older data to cost-effective cloud storage automatically, eliminating manual offloading.
Unified querying endpoint: Access all data through a single endpoint, ensuring quick insights without compromising data availability.
Integrated MongoDB Atlas UI management: Manage your data tiering and archiving within the familiar Atlas interface, streamlining operations and maintenance.

Seamlessly manage your MongoDB Atlas data tiering at scale with Atlas Online Archive. Atlas Online Archive empowers you to manage your data lifecycle efficiently, balancing cost and accessibility with ease.

Finally, here are a few points to consider:
Any newly created archive on an Azure cluster on or after 02/28 will default to Azure regions.
Storage regions for Online Archive will default to Azure only if there are no pre-existing AWS archives on that specific Azure cluster.
If there are any pre-existing AWS Online Archives on an Azure cluster, all newly created archives on that specific cluster will remain on AWS.
Cloud providers and storage regions cannot be edited or modified once configured.

Embrace the full potential of Atlas Online Archive on Azure today.
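The defaulting rules listed above can be restated as a small decision function. This is only a paraphrase of the rules in this post, not Atlas code: the function name and the provider strings are hypothetical, and the behavior for clusters that are not on Azure is an assumption based on the earlier remark that archives previously defaulted to AWS.

```python
def default_archive_provider(cluster_provider: str, existing_archive_providers: list) -> str:
    """Return the cloud provider a newly created Online Archive would default to."""
    if cluster_provider == "AZURE":
        # A pre-existing AWS archive on this Azure cluster keeps new archives on AWS.
        if "AWS" in existing_archive_providers:
            return "AWS"
        return "AZURE"
    # Assumption: the post only covers Azure clusters; everything else stays on AWS.
    return "AWS"


fresh_azure_cluster = default_archive_provider("AZURE", [])
azure_with_aws_archive = default_archive_provider("AZURE", ["AWS"])
```

Because providers and storage regions cannot be changed once configured, a check like this is the kind of thing to run before creating the first archive on a cluster.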
We're thrilled to support your data management journey, offering enhanced control and flexibility over your data through these new Azure capabilities. As MongoDB Atlas continues to expand as a multi-cloud solution, we're here to ensure your data strategy is as dynamic and versatile as your business needs. For guidance on getting started, check out our documentation on Atlas Data Federation or Atlas Online Archive. Thank you for trusting MongoDB Atlas as your Developer Data Platform. Welcome to the future of multi-cloud data management!
They Asked, We Answered: A Q&A on Joining MongoDB’s Remote Solutions Center
Our Remote Solutions Center (RSC) team offers those with technical backgrounds who are interested in working with customers an opportunity to jumpstart a career in pre-sales. We asked Soheyl Rafi, Solutions Architect and former Remote Solutions Center team member, some common questions candidates have about joining the team.

What is the day-to-day like on the Remote Solutions Center team?

Working in the Remote Solutions Center is a dynamic and ever-changing experience, with each day bringing unique challenges and opportunities. You can expect a blend of calls, hands-on activities, customer interactions, and problem-solving. The variety keeps the role super interesting.

Part of the diverse work environment comes from the collaboration that this role inherently entails. Throughout the day, you’ll engage in various customer interactions, from discussing project requirements and challenges to proposing tailored solutions.

Another aspect could be your involvement in Technical Feasibility Workshops with customers. This is where your deep technical knowledge comes into play. You’ll be addressing intricate technical questions, providing insights, and ensuring that our solutions align with the customer’s needs.

Enablement also plays a pivotal part in the day-to-day. You will spend a lot of time learning new technologies and features released by our Product team, understanding the competition, and staying abreast of the market as a whole.

All in all, the day-to-day work is extremely diverse, and you’ll need to both enjoy and be comfortable wearing multiple hats.

What can I expect during the onboarding phase?

When you join the RSC, you can expect a comprehensive onboarding experience that covers both technical and sales aspects. Our onboarding plan is designed to provide hands-on training, ensuring that you become familiar with MongoDB technology and business processes. From a technical standpoint, you’ll have access to detailed training sessions and resources.
You’ll be guided through the intricacies of our products and services, allowing you to build a strong foundation in your technical knowledge.

On the sales front, we have a tailored onboarding plan that focuses on honing the skills required for successful client engagement. This includes understanding our market positioning, customer needs, and effective sales strategies. You will be exposed to real-world scenarios and practical exercises to enhance your sales acumen.

To complement your onboarding experience, each new team member is paired with a more senior peer within the RSC. This mentorship helps you begin to build relationships within the team and provides valuable insights and guidance on navigating the role effectively.

In addition, RSC leadership recognizes the importance of learning from industry veterans. That’s why each new hire is also paired with a seasoned professional from outside the RSC who acts as a Solution Architecture buddy. This experienced mentor with tenure in the industry will offer a unique perspective and share valuable insights to accelerate your learning.

To gain practical exposure, you will have the opportunity to shadow calls and workshops conducted by experienced team members. This hands-on approach allows you to observe firsthand how we conduct business, manage client interactions, and collaborate within the team.

In summary, our onboarding program within the RSC is a holistic approach that combines technical training, sales development, mentorship, and practical experience.

How often can I expect to collaborate with Solutions Architects (SAs) in the field?

As an SA within the RSC, you will collaborate with SAs in the field consistently. We are strategically positioned around account activities and proactively engage with our counterparts in the field teams.
Depending on the opportunity during an engagement, you might assist in the early stages of the sales cycle, such as discovery and demos, before passing the deal on to the field SA. Collaboration in these engagements is pivotal, requiring alignment to ensure a seamless handover to the field teams.

What internal career development opportunities are there for me?

Upon joining the RSC team, you’ll immerse yourself in a dynamic learning environment. Unlike traditional career structures, the RSC team fosters an environment where you are in the driver’s seat, dictating the pace of your internal career development. No artificial roadblocks hinder you from taking on more challenging and senior-level tasks. Your dedication, skills, and initiative are the primary determinants of how quickly you progress.

While there are specific tasks you’ll work on in your day-to-day, you’ll also have a wide range of internal projects available to enhance your skill sets and advance your career within MongoDB. In a team with diverse interests and skills, you can choose projects that align with your passion and interests. You’ll find structured learning paths, mentorship programs, project leadership opportunities, and a career trajectory limited only by your aspirations.

In my career journey, I’ve achieved my goal of growing from an Associate SA to being promoted to a Solutions Architect for our enterprise customers. I am now a dedicated Solutions Architect to one of our biggest financial customers, helping them in their digital transformation and expanding their MongoDB footprint.

What new things will I learn if I join the team?

The question should be, “What will you not learn?” Databases are at the center of every tech stack. You will be exposed to and gain an understanding of the entire tech stack, including the underlying infrastructure and the application that will be built on top of MongoDB.
In your role, you’ll find yourself engaging with customers to discuss the various technologies used in application development, the infrastructure decisions made, and other database solutions that are either part of their tech stack or under consideration. Each customer employs different methodologies for developing software and utilizes various programming languages and solutions. It is pivotal as an SA in the RSC to comprehend these diverse solutions and effectively communicate them to our customers.

Beyond the technical aspects, you will start to see things from a macro perspective. Understanding your customer’s business is crucial. You will need to learn how to align technical solutions with business objectives, considering factors such as budget, timelines, and return on investment.

Learn more about applying your technical skills and engaging with customers as part of our Remote Solutions Center.
Building AI With MongoDB: Story Tools Studio Brings Gen AI To Gaming With Myth Maker AI
Should I Begin a Pre-Sales Career at MongoDB? Insights from Our Remote Solutions Center
Do you have a technical background and enjoy working with customers? Have you considered beginning a career in pre-sales? Aicha Sarr, Solutions Architect at MongoDB, shares insights into how our Remote Solutions Center offers a path to building a career in pre-sales. Read on to learn more about how our Remote Solutions Center team builds off of each other’s strengths, applies their technical expertise, and focuses on customer success.

Diverse backgrounds, shared attributes

Our Remote Solutions Center team values both diverse backgrounds and shared attributes. Team members possess a blend of technical expertise and a customer-focused mindset, with a strong affinity for technology and an inherent curiosity about MongoDB, databases, and complex technical concepts. We come from a variety of backgrounds, such as data engineering, software development, cloud architecture, and sales engineering. While direct experience with MongoDB is beneficial, it's not mandatory. Similarly, a background as an engineer can be advantageous, but it's not strictly necessary. To succeed on the team, one needs a keen interest in the technology and architecture of systems, an appetite for learning, and a commitment to ensuring customer success. We are trusted advisors, collaborating closely with customers to design and implement reliable systems and creating the strong technical alignment between the platform and customer needs that is pivotal for success.

Hands-on application of technical skills

I’m frequently engaged in projects that require me to apply my technical skills. Many of these projects have been focused on building reusable demos that are distributed to our global team, providing an opportunity for my work to be showcased during technical calls or workshops. For instance, a teammate and I developed a mobile application centered around MongoDB Atlas for the Edge, our edge computing solution.
The project included demonstrating the synchronization of user data between intermittently internet-connected devices and a central database, MongoDB Atlas. This demo showcased the practical application of Atlas Edge computing, maintaining data consistency across various devices and reflecting the real-world challenges of mobile application development and data synchronization.

Additionally, I’ve developed an application illustrating Client-Side Field Level Encryption (CSFLE), using Amazon Web Services Key Management Service and built in Java with Spring Boot. This hands-on experience allowed me to showcase the robust security of CSFLE, a feature that enables the encryption of data in an application before it is sent over the network to MongoDB, and a tangible example of how to enhance application security within a cloud environment.

In my customer calls, I’ve used both my demos and demos built by my colleagues. These hands-on experiences play a crucial role in enhancing our team's collective knowledge, capabilities, and resources. This collaboration between team members fosters a shared understanding and serves as a valuable learning opportunity, allowing us to disseminate the acquired knowledge internally and promote a culture of continuous learning and skill development.

The customer engagement journey

Our customer engagement journey begins with establishing strong relationships during initial contact and holding in-depth discussions to grasp the customer's business requirements. This phase serves as the foundation for crafting tailored proposals that outline effective solutions. Technical presentations explain the intricacies of the proposed solutions and involve iterative, collaborative discussions and adjustments based on continuous feedback from customer stakeholders, including IT teams and decision-makers.
As the engagement transitions to the implementation phase, Proof of Concept projects may be initiated to apply the solution in a real-world setting. A seamless handover to implementation teams ensures ongoing support, addressing post-implementation issues for continuous customer satisfaction. This holistic approach emphasizes effective communication, active listening, and building lasting partnerships grounded in trust.

Engaging with diverse customer personas, each playing a distinct role in decision-making, involves tailored interactions. Technical decision-makers engage in in-depth technical discussions, addressing integration concerns and compatibility. Business decision-makers require presentations emphasizing business value, ROI, and alignment with overarching goals. Interactions with developers or data engineers involve technical implementation discussions and collaborative problem-solving in specific programming languages. These interactions with end users, project managers, and technical teams are essential for gathering insights into practical requirements, project timelines, and seamless implementation. The shared goal is to create solutions that meet business and technical requirements, ensuring feasibility, efficiency, and alignment with customer capabilities.

Conclusion: A tapestry of skills and collaboration

The success of our Remote Solutions Center lies in the tapestry of diverse backgrounds, shared attributes, and a commitment to continuous learning. The opportunities for hands-on application of technical skills not only strengthen our individual capabilities but also contribute to a collective knowledge pool within our team. The customer engagement process is a collaborative journey, with an emphasis on understanding and adapting to evolving customer needs. This holistic approach to team dynamics, skill development, and customer engagement positions our Remote Solutions Center for success in a rapidly changing technological landscape.
Learn more about careers in solutions consulting at MongoDB.
Building AI with MongoDB: Accelerating App Development With the Codeium AI Toolkit
Of the many use cases set to be transformed by generative AI (gen AI), software development is at the bleeding edge. Developers are using gen AI to improve productivity by writing higher-quality code faster. Tasks include autocompleting code, writing docs, generating tests, and answering natural language queries across a code base. How does this translate to adoption? A recent survey showed 44% of new code being committed was written by an AI code assistant.

Codeium is one of the leaders in the fast-growing AI code assistant space. Its AI toolkit is used by hundreds of thousands of developers for more than 70 languages across more than 40 IDEs including Visual Studio Code, the JetBrains suite, Eclipse, and Jupyter Notebooks. The company describes its toolkit as “the modern coding superpower,” reflected by its recent $65 million Series B funding round and five-star reviews across extension marketplaces.

Check out our AI resource page to learn more about building AI-powered apps with MongoDB.

As Anshul Ramachandran, Head of Enterprise & Partnerships at Codeium, explains, “Codeium has been developed by a team of researchers and engineers to build on the industry-wide momentum around large language models, specifically for code. We realized that our specialized generative models, when deployed on our world-class optimized deep learning serving software, could provide users with top-quality AI-based products at the lowest possible costs (or ideally, free). The result of that realization is Codeium.”

Codeium has recently trained its models on MongoDB code, libraries, and documentation. Now developers building apps with MongoDB can install the Codeium extension on the IDE of their choice and enjoy rapid code completion and codebase-aware chat and search. Developers can stay in the flow while they build, coding at the speed of thought, knowing that Codeium has ingested MongoDB best practices and documentation.
“MongoDB is wildly popular across the developer community. This is because Atlas integrates the fully managed database services that provide a unified developer experience across transactional, analytical, and generative AI apps.”
Anshul Ramachandran, Head of Enterprise & Partnerships, Codeium

Ramachandran goes on to say, “MongoDB APIs are incredibly powerful, but due to the breadth and richness of the APIs, it is possible for developers to be spending more time than necessary looking through API documentation or using the APIs inefficiently for the task at hand. An AI assistant, if trained properly, can effectively assist the developer in retrieval and usage quality of these APIs. Unlike other AI code assistants, we at Codeium build our LLMs from scratch and own the underlying data layer. This means we accelerate and optimize the developer experience in unique and novel ways unmatched by others.”

Figure 1: By simply typing statement names, the Codeium assistant will automatically provide code completion suggestions directly within your IDE.

In its announcement blog post and YouTube video, the Codeium team shows how to build an app in VSCode with MongoDB serving as the data layer. Developers can ask questions on how to read and write to the database, get code completion suggestions, explore specific functions and syntax, handle errors, and more. This was all done at no cost using the MongoDB Atlas free tier and Codeium's 100% free, forever individual plan.

You can get started today by registering for MongoDB Atlas and then downloading the Codeium extension. If you are building your own AI app, sign up for the MongoDB AI Innovators Program. Successful applicants get access to free Atlas credits and technical enablement, as well as connections into the broader AI ecosystem.
Reducing Bias in Credit Scoring with Generative AI
Credit scoring plays a pivotal role in determining who gets access to credit and on what terms. Despite its importance, however, traditional credit scoring systems have long been plagued by a series of critical issues, ranging from bias and discrimination to limited data consideration and scalability challenges. For example, a study of US loans showed that minority borrowers were charged higher interest rates (+8%) and were rejected for loans more often (+14%) than borrowers from more privileged groups.

The rigid nature of credit systems means that they can be slow to adapt to changing economic landscapes and evolving consumer behaviors, leaving some individuals underserved and overlooked. To overcome this, banks and other lenders are looking to adopt artificial intelligence to develop increasingly sophisticated models for scoring credit risk.

In this article, we'll explore the fundamentals of credit scoring and the challenges current systems present, and delve into how artificial intelligence (AI), in particular generative AI (genAI), can be leveraged to mitigate bias and improve accuracy. From the incorporation of alternative data sources to the development of machine learning (ML) models, we'll uncover the transformative potential of AI in reshaping the future of credit scoring.

Check out our AI resource page to learn more about building AI-powered apps with MongoDB.

What is credit scoring?

Credit scoring is an integral aspect of the financial landscape, serving as a numerical gauge of an individual's creditworthiness. This vital metric is employed by lenders to evaluate the potential risk associated with extending credit or lending money to individuals or businesses. Traditionally, banks rely on predefined rules and statistical models, often built using linear regression or logistic regression. The models are based on historical credit data, focusing on factors such as payment history, credit utilization, and length of credit history.
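To make the traditional approach concrete, here is a minimal sketch of how a logistic regression model turns those three factors into a probability of default. The coefficients, the intercept, and the two sample applicants are invented for illustration; a real model would be fitted to historical credit data.

```python
import math

# Illustrative coefficients only; a real model would be fitted to historical data.
COEFFICIENTS = {
    "on_time_payment_ratio": -4.0,   # better payment history lowers risk
    "credit_utilization": 2.5,       # heavier use of available credit raises risk
    "credit_history_years": -0.15,   # longer history lowers risk
}
INTERCEPT = 1.0


def default_probability(applicant: dict) -> float:
    """Logistic regression: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    z = INTERCEPT + sum(COEFFICIENTS[name] * applicant[name] for name in COEFFICIENTS)
    return 1.0 / (1.0 + math.exp(-z))


established = {"on_time_payment_ratio": 0.98, "credit_utilization": 0.20, "credit_history_years": 12}
# An applicant with no credit file at all: every traditional feature is zero.
thin_file = {"on_time_payment_ratio": 0.0, "credit_utilization": 0.0, "credit_history_years": 0}

p_established = default_probability(established)
p_thin_file = default_probability(thin_file)
```

With these made-up numbers, the established borrower scores a default probability of about 0.014, while the applicant with an empty file scores about 0.73. Nothing is known about the second applicant's actual behavior; the model simply has no evidence to lower the estimate.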
However, assessing new credit applicants poses a challenge, leading to the need for more accurate profiling. To cater to the underserved or unserved segments traditionally discriminated against, fintechs and digital banks are increasingly supplementing traditional credit history with alternative data to create a more comprehensive view of an individual's financial behavior.

Challenges with traditional credit scoring

Credit scores are integral to modern life because they serve as a crucial determinant in various financial transactions, including securing loans, renting an apartment, obtaining insurance, and sometimes even employment screenings. Because the pursuit of credit can be a labyrinthine journey, here are some of the challenges or limitations of traditional credit scoring models that often cloud the path to credit application approval.

Limited credit history: Many individuals, especially those new to the credit game, encounter a significant hurdle: limited or non-existent credit history. Traditional credit scoring models heavily rely on past credit behavior, making it difficult for individuals without a robust credit history to prove their creditworthiness. Roughly 45 million Americans lack credit scores simply because those data points do not exist for them.

Inconsistent income: Irregular income, typical in part-time work or freelancing, poses a challenge for traditional credit scoring models, potentially labeling individuals as higher risk and leading to application denials or restrictive credit limits. Data sources differ on how many people in the United States were self-employed in 2023. One source shows more than 27 million Americans filed Schedule C tax documents, which cover net income or loss from a business, highlighting the need for different methods of credit scoring for the self-employed.
High utilization of existing credit: Heavy reliance on existing credit is often perceived as a signal of potential financial strain, influencing credit decisions. Credit applications may be rejected or approved on less favorable terms, reflecting concerns about the applicant's ability to judiciously manage additional credit. Lack of clarity in rejection reasons: Not understanding the reasons behind a rejection hinders applicants from addressing the root causes. In the UK, a study covering April 2022 to April 2023 showed that the main reasons for rejection included “poor credit history” (38%), “couldn’t afford the repayments” (28%), and “having too much other credit” (19%), while 10% of applicants said they weren’t told why. Even when reasons are given, they are often too vague, which leaves applicants in the dark and makes it difficult for them to address the root cause and improve their creditworthiness for future applications. The lack of transparency is not only a problem for customers; it can also lead to penalties for banks. For example, a Berlin bank was fined €300k in 2023 for lacking transparency in declining a credit card application. Lack of flexibility: Shifts in consumer behavior, especially among younger generations who prefer digital transactions, challenge traditional models. Factors like the rise of the gig economy, non-traditional employment, student loan debt, and high living costs complicate the assessment of income stability and financial health. Traditional credit risk predictions are also of limited use during unprecedented disruptions like COVID-19, because scoring models do not take such events into account. Recognizing these challenges highlights the need for alternative credit scoring models that can adapt to evolving financial behaviors, handle non-traditional data sources, and provide a more inclusive and accurate assessment of creditworthiness in today's dynamic financial landscape. Credit scoring with alternative data Alternative credit scoring refers to the use of non-traditional data sources (aka.
alternative data) and methods to assess an individual's creditworthiness. While traditional credit scoring relies heavily on credit history from major credit bureaus, alternative credit scoring incorporates a broader range of factors to create a more comprehensive picture of a person's financial behavior. Below are some of the popular alternative data sources: Utility payments: Beyond credit history, consistent payments for utilities like electricity and water offer a powerful indicator of financial responsibility and reveal a commitment to meeting financial obligations, providing crucial insights beyond traditional metrics. Rental history: For those without a mortgage, rental payment history emerges as a key alternative data source. Demonstrating consistent and timely rent payments paints a comprehensive picture of financial discipline and reliability. Mobile phone usage patterns: The ubiquity of mobile phones unlocks a wealth of alternative data. Analyzing call and text patterns provides insights into an individual's network, stability, and social connections, contributing valuable information for credit assessments. Online shopping behavior: Examining the frequency, type, and amount spent on online purchases offers valuable insights into spending behaviors, contributing to a more nuanced understanding of financial habits. Educational and employment background: Alternative credit scoring considers an individual's educational and employment history. Positive indicators, such as educational achievements and stable employment, play a crucial role in assessing financial stability. These alternative data sources represent a shift towards a more inclusive, nuanced, and holistic approach to credit assessments. As financial technology continues to advance, leveraging these alternative data sets ensures a more comprehensive evaluation of creditworthiness, marking a transformative step in the evolution of credit scoring models. 
Alternative credit scoring with artificial intelligence Besides the use of alternative data, the use of AI as an alternative method has emerged as a transformative force to address the challenges of traditional credit scoring, for a number of reasons: Ability to mitigate bias: Like traditional statistical models, AI models, including LLMs, trained on historical data that are biased will inherit biases present in that data, leading to discriminatory outcomes. LLMs might focus on certain features more than others or may lack the ability to understand the broader context of an individual's financial situation leading to biased decision-making. However, there are various techniques to mitigate the bias of AI models: Mitigation strategies: Initiatives begin with the use of diverse and representative training data to avoid reinforcing existing biases. Inadequate or ineffective mitigation strategies can result in biased outcomes persisting in AI credit scoring models. Careful attention to the data collected and model development is crucial in mitigating this bias. Incorporating alternative data for credit scoring plays a critical role in reducing biases. Rigorous bias detection tools, fairness constraints, and regularization techniques during training enhance model accountability: Balancing feature representation and employing post-processing techniques and specialized algorithms contribute to bias mitigation. Inclusive model evaluation, continuous monitoring, and iterative improvement, coupled with adherence to ethical guidelines and governance practices, complete a multifaceted approach to reducing bias in AI models. This is particularly significant in addressing concerns related to demographic or socioeconomic biases that may be present in historical credit data. Regular bias audits: Conduct regular audits to identify and mitigate biases in LLMs. This may involve analyzing model outputs for disparities across demographic groups and adjusting the algorithms accordingly. 
Transparency and explainability: Increase transparency and explainability in LLMs to understand how decisions are made. This can help identify and address biased decision-making processes. Trade Ledger , a lending software as a service (SaaS) tool, uses a data-driven approach to make informed decisions with greater transparency and traceability by bringing data from multiple sources with different schemas into a single data source. Ability to analyze vast and diverse datasets: Unlike traditional models that rely on predefined rules and historical credit data, AI models can process a myriad of information, including non-traditional data sources, to create a more comprehensive assessment of an individual's creditworthiness, ensuring that a broader range of financial behaviors is considered. AI brings unparalleled adaptability to the table: As economic conditions change and consumer behaviors evolve, AI-powered models can quickly adjust and learn from new data. The continuous learning aspect ensures that credit scoring remains relevant and effective in the face of ever-changing financial landscapes. The most common objections from banks to not using AI in credit scoring are transparency and explainability in credit decisions. The inherent complexity of some AI models, especially deep learning algorithms, may lead to challenges in providing clear explanations for credit decisions. Fortunately, the transparency and interpretability of AI models have seen significant advancements. Techniques like SHapley Additive exPlanations (SHAP) values and Local Interpretable Model-Agnostic Explanations (LIME) plots and several other advancements in the domain of Explainable AI (XAI) now allow us to understand how the model arrives at specific credit decisions. This not only enhances trust in the credit scoring process but also addresses the common critique that AI models are "black boxes." 
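The idea behind SHAP can be sketched in a few lines. The example below computes exact Shapley values for a small, hypothetical scoring function by averaging each feature's marginal contribution over all coalitions of the other features. It does not use the SHAP library, which relies on faster approximations for real models; the scoring formula, feature names, applicant, and baseline are all assumptions made for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, applicant, baseline):
    """Exact Shapley values: each feature's average marginal contribution
    over all coalitions, with absent features set to the baseline value."""
    names = list(applicant)
    n = len(names)

    def value(coalition):
        mixed = {k: (applicant[k] if k in coalition else baseline[k]) for k in names}
        return predict(mixed)

    contributions = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_feature = value(set(subset) | {name})
                without_feature = value(set(subset))
                total += weight * (with_feature - without_feature)
        contributions[name] = total
    return contributions


# Hypothetical scoring function and inputs, for illustration only.
def score(x):
    return (
        600
        + 120 * x["on_time_payment_rate"]
        - 80 * x["credit_utilization"]
        + 4 * x["credit_history_years"]
    )


applicant = {"on_time_payment_rate": 0.6, "credit_utilization": 0.9, "credit_history_years": 2}
baseline = {"on_time_payment_rate": 0.9, "credit_utilization": 0.3, "credit_history_years": 8}

explanation = shapley_values(score, applicant, baseline)
for feature, contribution in sorted(explanation.items(), key=lambda kv: kv[1]):
    print(f"{feature}: {contribution:+.1f}")
```

The contributions add up to the gap between the applicant's score and the baseline score, which is what lets a lender turn a single number into a ranked list of reasons for a decision.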
Understanding the criticality of leveraging alternative data that often comes in a semi or unstructured format, financial institutions work with MongoDB to enhance their credit application processes with a faster, simpler, and more flexible way to make payments and offer credit: Amar Bank, Indonesia's leading digital bank , is combatting bias by providing microloans to people who wouldn’t be able to get financial services from traditional banks (unbanked and underserved). Traditional underwriting processes were inadequate for customers lacking credit history or collateral so they have streamlined lending decisions by harnessing unstructured data. Leveraging MongoDB Atlas, they developed a predictive analytics model integrating structured and unstructured data to assess borrower creditworthiness. MongoDB's scalability and capability to manage diverse data types were instrumental in expanding and optimizing their lending operations. For the vast majority of Indians, getting credit is typically challenging due to stringent regulations and a lack of credit data. Through the use of modern underwriting systems, slice, a leading innovator in India’s fintech ecosystem , is helping broaden the accessibility to credit in India by streamlining their KYC process for a smoother credit experience. By utilizing MongoDB Atlas across different use cases, including as a real-time ML feature store, slice transformed their onboarding process, slashing processing times to under a minute. slice uses the real-time feature store with MongoDB and ML models to compute over 100 variables instantly, enabling credit eligibility determination in less than 30 seconds. Transforming credit scoring with generative AI Besides the use of alternative data and AI in credit scoring, GenAI has the potential to revolutionize credit scoring and assessment with its ability to create synthetic data and understand intricate patterns, offering a more nuanced, adaptive, and predictive approach. 
GenAI’s capability to synthesize diverse data sets addresses one of the key limitations of traditional credit scoring: the reliance on historical credit data. By creating synthetic data that mirrors real-world financial behaviors, GenAI models enable a more inclusive assessment of creditworthiness. This transformative shift promotes financial inclusivity, opening doors for a broader demographic to access credit opportunities. Adaptability plays a crucial role in navigating the dynamic nature of economic conditions and changing consumer behaviors. Unlike traditional models, which struggle to adjust to unforeseen disruptions, GenAI’s ability to continuously learn and adapt helps credit scoring remain effective in real time, offering a more resilient and responsive tool for assessing credit risk. In addition to its predictive prowess, GenAI can contribute to transparency and interpretability in credit scoring. Models can generate explanations for their decisions, providing clearer insights into credit assessments and enhancing trust among consumers, regulators, and financial institutions. One key concern in making use of GenAI, however, is the problem of hallucination, where the model may present information that is either nonsensical or outright false. There are several techniques to mitigate this risk, and one of them is retrieval-augmented generation (RAG). RAG minimizes hallucinations by grounding the model’s responses in factual information from up-to-date sources, so that the responses reflect the most current and accurate information available. Patronus AI, for example, leverages RAG with MongoDB Atlas to enable engineers to score and benchmark the performance of large language models (LLMs) on real-world scenarios, generate adversarial test cases at scale, and monitor hallucinations and other unexpected and unsafe behavior. This can help to detect LLM mistakes at scale and deploy AI products safely and confidently.
Another technology partner of MongoDB is Robust Intelligence . The firm’s AI Firewall protects LLMs in production by validating inputs and outputs in real-time. It assesses and mitigates operational risks such as hallucinations, ethical risks including model bias and toxic outputs, and security risks such as prompt injections and personally identifiable information (PII) extractions. As generative AI continues to mature, its integration into credit scoring and the broader credit application systems promises not just a technological advancement, but a fundamental transformation in how we evaluate and extend credit. A pivotal moment in the history of credit The convergence of alternative data, artificial intelligence, and generative AI is reshaping the foundations of credit scoring, marking a pivotal moment in the financial industry. The challenges of traditional models are being overcome through the adoption of alternative credit scoring methods, offering a more inclusive and nuanced assessment. Generative AI, while introducing the potential challenge of hallucination, represents the forefront of innovation, not only revolutionizing technological capabilities but fundamentally redefining how credit is evaluated, fostering a new era of financial inclusivity, efficiency, and fairness. If you would like to discover more about building AI-enriched applications with MongoDB, take a look at the following resources: Digitizing the lending and leasing experience with MongoDB Deliver AI-enriched apps with the right security controls in place, and at the scale and performance users expect Discover how slice enables credit approval in less than a minute for millions
Together AI: Advancing the Frontier of AI With Open Source Embeddings, Inference, and MongoDB Atlas
Founded in San Francisco in 2022, Together AI is on a mission to create the fastest cloud platform for building and running generative AI (gen AI). The company has so far raised over $120 million, counting Nvidia, Kleiner Perkins, Lux, and NEA as investors. Ce Zhang, Founder & CTO at Together AI says, “Together AI is a research-driven artificial intelligence company. We contribute leading open-source research, models, and datasets to advance the frontier of AI. Our cloud services empower developers and researchers at organizations of all sizes to train, fine-tune, and deploy generative AI models. We believe open and transparent AI systems will drive innovation and create the best outcomes for society." Check out our AI resource page to learn more about building AI-powered apps with MongoDB. The company has recently introduced its Together Embeddings endpoint — a new service for developers building a variety of applications, including one that is top of mind for nearly all gen AI-powered apps: retrieval-augmented generation (RAG) . With the RAG pattern, developers can feed gen AI models with their own up-to-date, domain-specific data. The results are more reliable gen AI outputs that are customized for the business along with reduced risks of hallucinations. The Together Embeddings endpoint offers access to eight leading open-source embedding models at up to 12x cheaper price than proprietary alternatives. The list of the models includes top models from the MTEB leaderboard (Massive Text Embedding Benchmark), such as UAE-Large-v1 and BGE models, and state-of-the-art long context retrieval models. Together Embeddings also offers integrations to MongoDB Atlas , LangChain, and LlamaIndex for RAG. To demonstrate this integration, the engineering team at Together AI created a tutorial for developers exploring how to build a RAG application with MongoDB Atlas. 
This tutorial shows how to use Together Embeddings and Together Inference to generate embeddings and language responses. Atlas Vector Search is used to store and index embeddings and then perform semantic search to retrieve relevant data examples for natural language queries against a sample Airbnb listing dataset. With this RAG pattern, the gen AI model can recommend properties that meet the user’s criteria while adhering to factual information. “We prioritized integrating with MongoDB because of its relevance and importance in the AI stack,” says Vipul Ved Prakash, Founder & CEO at Together AI. “Bringing together live application data synchronized right alongside vector embeddings in a single platform, MongoDB Atlas helps developers reduce complexity and cost, and bring cutting-edge apps to market faster,” Prakash adds. “This is one example, and we are looking forward to seeing many amazing applications that will be built using Together AI and MongoDB’s Atlas Vector Search.” To learn more about its RAG integrations, take a look at Together AI’s documentation. To get started with MongoDB and Together AI, register for MongoDB Atlas and read the tutorial. If your team is building AI apps, sign up for the AI Innovators Program. Successful companies get access to free Atlas credits and technical enablement, as well as connections into the broader AI ecosystem.
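For readers who want a feel for the retrieval step of the RAG pattern described above, the sketch below builds an Atlas Vector Search aggregation pipeline as plain Python data. It is not taken from the tutorial: the index name, the field names, and the toy three-dimensional query vector are assumptions. In practice the query vector would come from the same embedding model that was used to index the listings.

```python
def build_vector_search_pipeline(query_vector, limit=5):
    """Return an aggregation pipeline that finds the listings whose stored
    embeddings are closest to the query embedding."""
    return [
        {
            "$vectorSearch": {
                "index": "listing_embedding_index",  # hypothetical index name
                "path": "embedding",                 # hypothetical field holding the vectors
                "queryVector": list(query_vector),
                # Nearest neighbors examined before the top `limit` are returned.
                "numCandidates": limit * 20,
                "limit": limit,
            }
        },
        {
            "$project": {
                "_id": 0,
                "name": 1,       # hypothetical listing fields
                "summary": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


# Toy query vector for illustration; real embeddings have hundreds of dimensions.
pipeline = build_vector_search_pipeline([0.12, -0.03, 0.57], limit=3)
print(pipeline[0]["$vectorSearch"]["numCandidates"])
```

With PyMongo, a pipeline like this would be passed to `collection.aggregate(pipeline)`, and the returned documents would then be placed in the prompt so that the model answers from retrieved listings rather than from memory.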
Enhanced Atlas Functionality: Introducing Resource Tagging for Projects
We are thrilled to announce that Atlas has now extended its tagging functionality to include projects in addition to deployments . This enhancement enables users to apply resource tags to projects, further enriching the way you can associate metadata with your cloud resources. With this new capability, categorizing, organizing, and tracking your projects within Atlas becomes more intuitive and effective, offering a streamlined approach to managing your resources. Enhancing project management with resource tagging Incorporating resource tagging into projects significantly enhances visibility and streamlines project management. By applying tags, teams can categorize resources, making it easier to understand the purpose or specific metadata associated with a project. This practice is especially beneficial in large-scale projects, where organizing resources systematically can vastly improve productivity. Tags serve as versatile markers, representing various attributes of a project such as environment, criticality, cost center, or application, thereby simplifying project organization. Furthermore, tags lay the groundwork for supporting automation and policy enforcement within organizations. By utilizing tags, tasks related to access controls, compliance, and other policies can be automated, enhancing operational efficiency. Auditing processes also benefit from tagging, facilitating tracking, and ensuring resources meet specific business requirements. In environments where teamwork is essential, adding tags to projects aids in streamlined collaboration. Tags allow team members to quickly grasp the purpose or function of different resources, surfacing critical information about the project that can help reduce miscommunication and conflicts. Overall, adopting resource tagging in cloud resource management unlocks significant improvements in performance and efficiency, making it an invaluable tool for modern organizational needs. 
How to add tags to projects You can view and manage tagging on projects in multiple areas: Atlas UI: When creating a new project, on the Organization Project List, or within Project Settings. Admin API: Various operations on projects were enhanced to allow you to view, create, and manage tags applied to projects, such as CreateOneProject and ReturnAllProjects. Atlas CLI: Various commands on projects were enhanced to allow you to view, create, and manage tags applied to projects. Resource tagging best practices We recognize that the complexity of tagging use cases varies with an organization's unique structure and specific business requirements. With this in mind, we’ve designed resource tagging in Atlas to support a variety of use cases. To get started, we suggest defining tags that should be applied across all projects. This will ensure your tagging approach is reliable and consistent across all resources. If you have multiple deployments within a project, apply more granular metadata on each deployment. In the simplified example below, an organization has three projects containing one or more deployments. Each project contains a deployment for each development environment. We’ve added common tags to the projects and more granular tags to identify the environment at the deployment level. Given the uniqueness of each organization, we've designed a flexible system with simplicity at its heart, using key-value pairs. If you have a flatter organization structure in Atlas (e.g., with one deployment per project), consider adding all tags at the level that makes the most sense for your organization. This may vary depending on how you manage your deployments, existing tag workflows, or where you want to view tags in the Atlas UI. Finally, here are a few points to consider when tagging: Do not include any sensitive information such as Personally Identifiable Information (PII) or Protected Health Information (PHI) in your resource tag keys or values.
Use a standard naming convention for all tags, including spelling, case, and punctuation. Define and communicate a strategy for enforcing mandatory tags. We recommend starting by identifying the environment and the application, service, or workload. Use namespaces or prefixes to easily identify tags owned by different business units. Use programmatic tools like Terraform or the Admin API to manage the database of your tags. In summary The introduction of resource tagging for projects marks an improvement in how users can intuitively categorize, organize, and track projects within Atlas, streamlining cloud resource management. We're eager to hear your thoughts and ideas on further applications of resource tagging in Atlas. Please share your feedback and suggestions at feedback.mongodb.com , as your input is invaluable in shaping the future of our platform.
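As one way to put the tagging guidance above into practice, the sketch below checks a list of key-value tags against a naming convention before they are applied. The convention itself (lowercase keys, an optional business-unit prefix separated by a period, two mandatory tags) and the simple email check standing in for PII detection are assumptions made for illustration, not rules enforced by Atlas.

```python
import re

# Hypothetical team conventions, not Atlas rules.
REQUIRED_KEYS = {"environment", "application"}
KEY_PATTERN = re.compile(r"^(?:[a-z0-9]+\.)?[a-z0-9]+(?:-[a-z0-9]+)*$")
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def validate_tags(tags):
    """Return a list of problems found in a list of {"key", "value"} tags."""
    problems = []
    keys = {tag["key"] for tag in tags}
    for missing in sorted(REQUIRED_KEYS - keys):
        problems.append(f"missing mandatory tag: {missing}")
    for tag in tags:
        if not KEY_PATTERN.match(tag["key"]):
            problems.append(f"key breaks naming convention: {tag['key']}")
        if EMAIL_PATTERN.search(tag["value"]):
            problems.append(f"value looks like PII: {tag['key']}")
    return problems


project_tags = [
    {"key": "environment", "value": "production"},
    {"key": "finance.cost-center", "value": "cc-1042"},
    {"key": "Owner", "value": "jane@example.com"},
]

for problem in validate_tags(project_tags):
    print(problem)
```

A check like this could run in a CI job alongside Terraform or Admin API automation, so that tags which break the convention are caught before they reach a project.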
Building AI with MongoDB: Putting Jina AI’s Breakthrough Open Source Embedding Model To Work
Founded in 2020 and based in Berlin, Germany, Jina AI has swiftly risen as a leader in multimodal AI, focusing on prompt engineering and embedding models. With its commitment to open-source and open research, Jina AI is bridging the gap between advanced AI theory and the real world AI-powered applications being built by developers and data scientists. Over 400,000 users are registered to use the Jina AI platform. Dr. Han Xiao, Founder and CEO at Jina AI, describes the company’s mission: “We envision paving the way towards the future of AI as a multimodal reality. We recognize that the existing machine learning and software ecosystems face challenges in handling multimodal AI. As a response, we're committed to developing pioneering tools and platforms that assist businesses and developers in navigating these complexities. Our vision is to play a crucial role in helping the world harness the vast potential of multimodal AI and truly revolutionize the way we interpret and interact with information." Jina AI’s work in embedding models has caught significant industry interest. As many developers now know, embeddings are essential to generative AI (gen AI). Embedding models are sophisticated algorithms that transform and embed data of any structure into multi-dimensional numerical encodings called vectors. These vectors give data semantic meaning by capturing its patterns and relationships. This means we can analyze and search for unstructured data in the same way we’ve always been able to with structured business data. Considering that over 80% of the data we create every day is unstructured, we start to appreciate how transformational embeddings — when combined with a powerful solution such as MongoDB Atlas Vector Search — are for gen AI. Check out our AI resource page to learn more about building AI-powered apps with MongoDB. Jina AI's jina-embeddings-v2 is the first open-source 8K text embedding model. 
Its 8K token length provides deeper context comprehension, significantly enhancing accuracy and relevance for tasks like retrieval-augmented generation (RAG) and semantic search . Jina AI’s embeddings offer enhanced data indexing and search capabilities, along with bilingual support. The embedding models are focused on singular languages and language pairs, ensuring state-of-the-art performance on language-specific benchmarks. Currently, Jina Embeddings v2 includes bilingual German-English and Chinese-English models, with other bilingual models in the works. Jina AI’s embedding models excel in classification, reranking, retrieval, and summarization, making them suitable for diverse applications, especially those that are cross-lingual. Recent examples from multinational enterprise customers include the automation of sales sequences, skills matching in HR applications, and payment reconciliation with fraud detection. Figure 1: Jina AI’s world-class embedding models improve search and RAG systems. In our recently published Jina Embeddings v2 and MongoDB Atlas article we show developers how to get started in bringing vector embeddings into their apps. The article covers: Creating a MongoDB Atlas instance and loading it with your data. (The article uses a sample Airbnb reviews data set.) Creating embeddings for the data set using the Jina Embeddings API. Storing and indexing the embeddings with Atlas Vector Search. Implementing semantic search using the embeddings. Dr. Xiao says, “Our Embedding API is natively integrated with key technologies within the gen AI developer stack including MongoDB Atlas, LangChain, LlamaIndex, Dify, and Haystack. MongoDB Atlas unifies application data and vector embeddings in a single platform, keeping both fully synced. Atlas Triggers keeps embeddings fresh by calling our Embeddings API whenever data is inserted or updated in the database. 
This integrated approach makes developers more productive as they build new, cutting-edge AI-powered apps for the business.” To get started with MongoDB and Jina AI, register for MongoDB Atlas and read the tutorial . If your team is building its AI apps, sign up for the AI Innovators Program . Successful companies get access to free Atlas credits and technical enablement, as well as connections into the broader AI ecosystem.
Building AI with MongoDB: Navigating the Path From Predictive to Generative AI
It should come as no surprise that the organizations unlocking the largest benefits from generative AI (gen AI) today have already been using predictive AI (a.k.a. classic, traditional, or analytical AI). McKinsey made this same observation back in June 2023 with its “Economic Potential of Generative AI” research. There would seem to be several reasons for this: an internal culture that is willing to experiment and explore what AI can do; access to skills (though we must emphasize that gen AI is far more reliant on developers than on the data scientists driving predictive AI); and the availability of clean and curated data from across the organization that is ready to be fed into gen AI models. This is not to say that only those teams with prior experience in predictive AI stand to benefit from gen AI. If you take a look at examples from our Building AI case study series, you’ll see many organizations with different AI maturity levels tapping MongoDB for gen AI innovation today. In this latest edition of the Building AI series, we feature two companies that, having built predictive AI apps, are now navigating the path to generative AI: MyGamePlan helps professional football players and coaches improve team performance. Ferret.ai helps businesses and consumers build trust by running background checks using public domain data. In both cases, predictive AI is central to data-driven decision-making. And now both are exploring gen AI to extend their services with new products that further deepen user engagement. The common factor for both? Their use of MongoDB Atlas and its flexibility for any AI use case. Let's dig in. MyGamePlan: Elevating the performance of professional football players with AI-driven insights The use of data and analytics to improve the performance of professional athletes isn’t new. Typically, solutions are highly complex, relying on the integration of multiple data providers, resulting in high costs and slow time-to-insight.
MyGamePlan is working to change that for professional football clubs and their players. (For the benefit of my U.S. colleagues, where you see “football” read “soccer.”) MyGamePlan is used by staff and players at successful teams across Europe, including Bayer Leverkusen (current number one in the German Bundesliga), AFC Sunderland in the English Championship, CD Castellón (current number one in the third division of Spain), and Slask Wroclaw (the current number one in the Polish Ekstraklasa). I met with Dries Deprest, CTO and co-founder at MyGamePlan, who explains, “We redefine football analysis with cutting-edge analytics, AI, and a user-friendly platform that seamlessly integrates data from match events, player tracking, and video sources. Our platform automates workflows, allowing coaches and players to formulate tactics for each game, empower player development, and drive strategic excellence for the team's success.” At the core of the MyGamePlan platform are custom, Python-based predictive AI models hosted in Amazon SageMaker. The models analyze passages of gameplay to score the performance of individual players and their impact on the game. Performance and contribution can be tracked over time and compared with players on opposing teams to help formulate matchday tactics. Data is key to making the models and predictions accurate. The company uses MongoDB Atlas as its database, storing: Metadata for each game, including matches, teams, and players. Event data from each game such as passes, tackles, fouls, and shots. Tracking telemetry that captures the position of each player on the field every 100ms. This data is pulled from MongoDB into Python DataFrames where it is used alongside third-party data streams to train the company’s ML models. Inferences generated from specific sequences of gameplay are stored back in MongoDB Atlas for downstream analysis by coaches and players.
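To give a sense of the three kinds of data listed above, here is a small sketch of what a game document with nested events might look like, together with the telemetry volume implied by a 100 ms sampling interval. The document shape and field names are hypothetical and are not MyGamePlan's actual schema; the 90-minute match length is also an assumption.

```python
from collections import Counter

# Hypothetical document shape; field names are illustrative only.
game = {
    "game_id": "example-001",
    "teams": [{"name": "Home"}, {"name": "Away"}],
    "events": [
        {"minute": 3, "type": "pass", "player": "Player A"},
        {"minute": 3, "type": "pass", "player": "Player B"},
        {"minute": 7, "type": "shot", "player": "Player A"},
        {"minute": 12, "type": "tackle", "player": "Player C"},
    ],
}

# Because events are nested inside the game, a per-player summary needs no joins.
events_per_player = Counter(event["player"] for event in game["events"])

SAMPLE_INTERVAL_MS = 100  # from the text: one position every 100 ms
MATCH_MINUTES = 90        # assumption: regulation time only
samples_per_player = MATCH_MINUTES * 60 * 1000 // SAMPLE_INTERVAL_MS

print(events_per_player.most_common(1))
print(samples_per_player)
```

Under these assumptions each player generates 54,000 position samples per match, which is why tracking telemetry quickly becomes the largest part of the data set.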
Figure 1: With MyGamePlan's web and mobile apps, coaching staff and players can instantly assess gameplay and shape tactics. On selecting MongoDB, Deprest says, “We are continuously enriching data with AI models and using it for insights and analytics. MongoDB is a great fit for this use case. We chose MongoDB when we started our development two years ago. Our data has complex multi-way relationships, mapping games to players to events and tracking. The best way to represent this data is with nested elements in rich document data structures. It's way more efficient for my developers to work with and for the app to process. Trying to model these relationships with foreign keys and then joining normalized tables in relational databases would be slow and inefficient.” In terms of development, Deprest says, “We use the PyMongo driver to integrate MongoDB with our Python ML data pipelines in SageMaker and the MongoDB Node.js driver for our React-based, client-facing web and mobile apps.” Deprest goes on to say, "There are two key factors that differentiate MongoDB from NoSQL databases we also considered: the incredible level of developer adoption it has, meaning my team was immediately familiar and productive with it. And we can build in-app analytics directly on top of our live data, without the time and expense of having to move it out into some data warehouse or data lake. With MongoDB’s aggregation pipelines, we can process and analyze data with powerful roll-ups, transformations, and window functions to slice and dice data any way our users need it." Moving beyond predictive AI, the MyGamePlan team is now evaluating how gen AI can further improve user experience. Deprest says, "We have so much rich data and analytics in our platform, and we want to make it even easier for players and coaches to extract insights from it. We are experimenting with natural language processing via chat and question-answering interfaces on top of the data.
Gen AI makes it easy for users to visualize and summarize the data. We are currently evaluating OpenAI’s ChatGPT LLM coupled with sophisticated approaches to prompt engineering, orchestration via LangChain, and retrieval augmented generation (RAG) using LlamaIndex and MongoDB Atlas Vector Search."

“As our source data is in the MongoDB Atlas database already, unifying it with vector storage and search is a very productive and elegant solution for my developers.”
Dries Deprest, CTO and Co-founder, MyGamePlan

By building on MongoDB Atlas, MyGamePlan’s team can use the breadth of functionality provided by a developer data platform to support almost any application and AI need in the future. Check out our AI resource page to learn more about building AI-powered apps with MongoDB.

Ferret.ai: Building trust with relationship intelligence powered by AI and MongoDB Atlas while cutting costs by 30%

Across the physical and digital world, we are all constantly building relationships with others. Those relationships can be established through peer-to-peer transactions across online marketplaces, between tradespeople and professionals with their prospective clients, between investors and founders, or in creating new personal connections. All of those relationships rely on trust to work, but building it is hard. Ferret.ai was founded to remove the guesswork from building that trust.

Ferret is an AI platform architected from the ground up to empower companies and individuals with real-time, unbiased intelligence to identify risks and embrace opportunities. Leveraging cutting-edge predictive and generative AI, hundreds of thousands of global data sources, and billions of public documents, Ferret.ai provides curated relationship intelligence and monitoring (once only available to the financial industry), making transparency the new norm.

Al Basseri, CTO at Ferret, tells us how it works: "We ingest information about individuals from public sources.
This includes social networks, trading records, court documents, news archives, corporate ownership, and registered business interests. This data is streamed through Kafka pipelines into our Anyscale/Ray MLOps platform where we apply natural language processing through our spaCy extraction and machine learning models. All metadata from our data sources (that's close to three billion documents), along with inferences from our models, are stored in MongoDB Atlas. The data in Atlas is consumed by our web and mobile customer apps and by our corporate customers through our upcoming APIs."

Figure 2: Artificial intelligence + real-time data = Relationship Intelligence from Ferret.ai.

Moving beyond predictive AI, the company’s developers are now exploring opportunities to use gen AI in the Ferret platform. "We have a close relationship with the data science team at Nvidia,” says Basseri. “We see the opportunity to summarize the data sources and analysis we provide to help our clients better understand and engage with their contacts. Through our experimentation, the Mistral model with its mixture-of-experts ensemble seems to give us better results with less resource overhead than some of the larger and more generic large language models."

As well as managing the data from Ferret’s predictive and gen AI models, MongoDB Atlas also stores customer data and contact lists. Through Ferret’s continuous monitoring and scoring of public record sources, any change in an individual's status is immediately detected. As Basseri explains, "MongoDB Atlas Triggers watch for updates to a score and instantly send an alert to consuming apps so our customers get real-time visibility into their relationship networks. It's all fully event-driven and reactive, so my developers just set it and forget it."

Basseri also described the other advantages MongoDB provides his developers:
Through Atlas, it’s available as a fully managed service with best practices baked in.
That frees his developers and data scientists from the responsibilities of running a database so they can focus their efforts on app and AI innovation.
MongoDB Atlas is mature, and he has seen it scale in many other high-growth companies.
The availability of engineers who know MongoDB is important as the team rapidly expands.

Beyond the database, Ferret is extending its use of the MongoDB Atlas platform into text search. As the company moves into Google Cloud, it is migrating from its existing Amazon OpenSearch service to Atlas Search. Discussing the drivers for the migration, Basseri says, "Unifying both databases and search behind a single API reduces cognitive load for my developers, so they are more productive and build features faster. We eliminate all of the hassle of syncing data between database and search. Again, this frees up engineering cycles. It also means our users get a better experience because previous latency bottlenecks are gone, so as they search across contacts and content on our platform, they get the freshest results, not stale and outdated data."

“By migrating from OpenSearch to Atlas Search, we also save money and get more freedom. We will reduce our total cloud costs by 30% per month just by eliminating unnecessary data duplication between the database and the search engine. And with Atlas being multi-cloud, we get the optionality to move across cloud providers as and when we need to.”
Al Basseri, CTO at Ferret.ai

Once the migration is complete, Basseri and the team will begin development with Atlas Vector Search as they continue to build out the gen AI side of the Ferret platform.

What's next?

No matter where you are in your AI journey, MongoDB can help. You can get started with your AI-powered apps by registering for MongoDB Atlas and exploring the tutorials available in our AI resources center. Our teams are always ready to come and explore the art of the possible with you.
1 https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/the-economic-potential-of-generative-ai-the-next-productivity-frontier
Atlas Stream Processing is Now in Public Preview
Today, we’re excited to announce that Atlas Stream Processing is now in public preview. Any developer on Atlas interested in giving it a try has full access. Learn more in our docs or get started today.

Listen to the MongoDB Podcast to learn about the Atlas Stream Processing public preview from Head of Streaming Products, Kenny Gorman.

Developers love the flexibility and ease of use of the document model, alongside the Query API, which allows them to work with data as code in MongoDB Atlas. With Atlas Stream Processing, we are bringing these same foundational principles to stream processing. A report covering the topic published by S&P Global Market Intelligence 451 Research had this to say: “A unified approach to leveraging data for application development, the direction of travel for MongoDB, is particularly valuable in the context of stream processing where operational and development complexity has proven a significant barrier to adoption.”

First announced at .local NYC 2023, Atlas Stream Processing is redefining the experience of aggregating and enriching streams of high velocity, rapidly changing event data, and unifying how to work with data in motion and at rest.

How are developers using the product so far? And what have we learned?

During the private preview, we saw thousands of development teams request access and we have gathered useful feedback from hundreds of engaged teams. One of those engaged teams is marketing technology leader Acoustic: "At Acoustic, our key focus is to empower brands with behavioral insights that enable them to create engaging, personalized customer experiences. To do so, our Acoustic Connect platform must be able to efficiently process and manage millions of marketing, behavioral, and customer signals as they occur.
With Atlas Stream Processing, our engineers can leverage the skills they already have from working with data in Atlas to process new data continuously, ensuring our customers have access to real-time customer insights."
John Riewerts, EVP, Engineering at Acoustic

Other interesting use cases include: a leading global airline using complex aggregations to rapidly process maintenance and operations data, ensuring on-time flights for their thousands of daily customers; a large manufacturer of energy equipment using Atlas Stream Processing to enable continuous monitoring of high volume pump data to avoid outages and optimize their yields; and an innovative enterprise SaaS provider leveraging the rich processing capabilities in Atlas Stream Processing to deliver timely and contextual in-product alerts to drive improved product engagement.

These are just a few of the many use case examples that we’re seeing across industries. Beyond the use cases we’ve already seen, developers are giving us tons of insight into what they’d like to see us add in the future. In addition to enabling continuous processing of data in Atlas databases through change streams, it’s exciting to see developers using Atlas Stream Processing with their Kafka data hosted by valued partners like Confluent, Amazon MSK, Azure Event Hubs, and Redpanda. Our aim with developer data platform capabilities in Atlas has always been to make for a better experience across the key technologies relied on by developers.

What’s new in the public preview?

That brings us to what’s new. As we scale to more teams, we’re expanding functionality to include the most requested feedback gathered in our private preview.
From the many pieces of feedback received, three common themes emerged: refining the developer experience, expanding advanced features and functionality, and improving operations and security.

Refining the developer experience

In private preview, we established the core of the developer experience that is essential to making Atlas Stream Processing a natural solution for development teams. And in public preview, we’re doubling down on this by making two additional enhancements:

VS Code integration
The MongoDB VS Code plugin has added support for connecting to Stream Processing instances. Developers already using the plugin can create and manage processors in a familiar development environment. This means less time switching between tools and more time building your applications!

Improved dead letter queue (DLQ) capabilities
DLQ support is a key element for powerful stream processing and in public preview, we’re expanding DLQ capabilities. DLQ messages are now displayed when executing pipelines with sp.process() and when running .sample() on running processors, allowing for a more streamlined development experience that does not require setting up a target collection to act as a DLQ.

Expanding advanced features and functionality

Atlas Stream Processing already supported many of the key aggregation operators developers are familiar with in the Query API used with data at rest. We've now added powerful windowing capabilities and the ability to easily merge and emit data to an Atlas database or to a Kafka topic. Public preview will add even more functionality demanded by the most advanced teams relying on stream processing to deliver customer experiences:

$lookup
Developers can now enrich documents being processed in a stream processor with data from remote Atlas clusters, performing joins against fields from the document and the target collection.
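As a rough illustration of the `$lookup` enrichment described above, the sketch below builds a stream processor pipeline as plain Python data and prints it as JSON. The connection names, database, collections, topic, and field names (`kafkaProd`, `atlasMain`, `device_id`, and so on) are hypothetical, and the stage shapes reflect our reading of the Atlas Stream Processing pipeline syntax, so check them against the current documentation before relying on them.

```python
import json


def build_enrichment_pipeline(kafka_connection, topic, atlas_connection, db, lookup_coll):
    """Return a pipeline that reads a Kafka topic, joins each message against
    an Atlas collection, and writes the enriched documents back to Atlas."""
    return [
        # Read messages from a Kafka topic via a registered connection.
        {"$source": {"connectionName": kafka_connection, "topic": topic}},
        # Join each message with reference data held in an Atlas collection.
        {"$lookup": {
            "from": {
                "connectionName": atlas_connection,
                "db": db,
                "coll": lookup_coll,
            },
            "localField": "device_id",   # field on the streamed message (assumed)
            "foreignField": "_id",       # field on the reference collection (assumed)
            "as": "device",
        }},
        # Persist the enriched documents to a target collection.
        {"$merge": {"into": {
            "connectionName": atlas_connection,
            "db": db,
            "coll": "enriched_readings",
        }}},
    ]


pipeline = build_enrichment_pipeline(
    "kafkaProd", "pump_readings", "atlasMain", "telemetry", "devices"
)
print(json.dumps(pipeline, indent=2))
```

The printed JSON is a plain array of stages, so it can be pasted into the shell and tried interactively with `sp.process()`, which also surfaces any DLQ messages as noted earlier.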
Change streams pre- and post-imaging
Many developers are using Atlas Stream Processing to continuously process data in Atlas databases as a source through change streams. We have enhanced the change stream $source in public preview with support for pre- and post-images. This enables common use cases where developers need to calculate deltas between fields in documents as well as use cases requiring access to the full contents of a deleted document.

Conditional routing with dynamic expressions in merge and emit stages
Conditional routing lets developers use the value of fields in documents being processed in Atlas Stream Processing to dynamically send specific messages to different Atlas collections or Kafka topics. The $merge and $emit stages also now support the use of dynamic expressions. This makes it possible to use the Query API for use cases requiring the ability to fork messages to different collections or topics as needed.

Idle stream timeouts
Streams whose watermarks are not advancing due to a lack of inbound data can now be configured to close after a period of time, emitting the results of the open windows. This can be critical for streaming sources that have inconsistent flows of data.

Improving operations and security

Finally, we have invested heavily over the past few months in improving other operational and security aspects of Atlas Stream Processing. A few of the highlights include:

Checkpointing
Atlas Stream Processing now performs checkpoints to save state while processing. Stream processors are continuously running processes, so whether due to a data issue or infrastructure failure, they require an intelligent recovery mechanism. Checkpoints make it easy to resume your stream processors from wherever data stopped being collected and processed.

Terraform provider support
Support for the creation of connections and stream processing instances (SPIs) is now available with Terraform.
This allows for infrastructure to be authored as code for repeatable deployments.

Security roles
Atlas Stream Processing has added a project-level role, giving users just enough permission to perform their stream processing tasks. Stream processors can run under the context of a specific role, supporting a least privilege configuration.

Auditing
Atlas Stream Processing can now audit authentication attempts and actions within your Stream Processing Instance, giving you insight into security-related events.

Kafka consumer group support
Stream processors now use Kafka consumer groups for offset tracking. This allows users to easily change the position of the processor in the stream for operations and easily monitor for potential processor lag.

A final note on what’s new is that in public preview, we will begin charging for Atlas Stream Processing, using preview pricing (subject to change). You can learn more about pricing in our documentation.

Build your first stream processor today

Public preview is a huge step forward for us as we expand the developer data platform and enable more teams with a stream processing solution that simplifies the operational complexity of building reactive, responsive, event-driven applications, while also offering an improved developer experience. We can’t wait to see what you build!

Log in today or get started with the tutorial, view our resources, or follow the Learning Byte on MongoDB University.
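For readers who want a starting point, here is a hedged sketch that combines two of the public preview features described earlier: change stream pre- and post-images, and conditional routing with a dynamic expression in `$merge`. It only builds and prints the pipeline definition. The connection, database, collection, and field names (`atlasMain`, `crm`, `contacts`, `score`) are hypothetical, the stage shapes follow our understanding of the Atlas Stream Processing syntax and should be verified against the documentation, and pre-images are only produced when the source collection has them enabled.

```python
import json


def build_score_delta_pipeline(connection, db, coll):
    """Return a pipeline that computes the change in a numeric field between
    the pre- and post-image of each update and routes the result by sign."""
    return [
        # Change stream source, requesting the document before and after each change.
        {"$source": {
            "connectionName": connection,
            "db": db,
            "coll": coll,
            "config": {
                "fullDocument": "whenAvailable",
                "fullDocumentBeforeChange": "whenAvailable",
            },
        }},
        # Keep update events only, so there are two images to compare.
        {"$match": {"operationType": "update"}},
        # Delta between the new and old value of the (assumed) score field.
        {"$addFields": {"scoreDelta": {"$subtract": [
            "$fullDocument.score",
            "$fullDocumentBeforeChange.score",
        ]}}},
        # Dynamic expression: choose the target collection per document.
        {"$merge": {"into": {
            "connectionName": connection,
            "db": db,
            "coll": {"$cond": [
                {"$gte": ["$scoreDelta", 0]},
                "score_increases",
                "score_decreases",
            ]},
        }}},
    ]


pipeline = build_score_delta_pipeline("atlasMain", "crm", "contacts")
print(json.dumps(pipeline, indent=2))
```

Treat this as a template to adapt rather than a reference implementation. If a pre-image is unavailable for an event, the subtraction yields null and the document falls into the second collection, so a production pipeline would likely filter or validate those cases first.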