
Speaker Lineup for MongoDB San Francisco

MongoDB San Francisco, 10gen’s most popular event of the year, is coming up on May 10. San Francisco has become a stronghold of tech innovation, and our lineup of exceptional speakers is a testament to the thriving MongoDB-powered ecosystem in the Bay Area. Here are just a few of the awesome talks on the agenda for MongoDB San Francisco:

Using MongoDB for Groupon's Place Data, by Peter Bakkum, Member of Technical Staff, Groupon
The Merchant Data team at Groupon uses MongoDB to create “the most comprehensive database of places and merchants in the world.” This is a mission-critical part of the Groupon platform, providing real-time data for the business. In this session, get an inside view of Groupon’s MongoDB cluster: Peter will introduce attendees to the data model, the data processing pipeline and the dynamics of parallel querying in their Storm cluster.

Managing a Maturing MongoDB Ecosystem, by Charity Majors, Systems Engineer, Parse
Parse, which was recently acquired by Facebook, provides scalable, cross-platform services and tools for developers. Parse engineer Charity Majors has a tremendous amount of experience managing Parse’s MongoDB clusters from their infancy into their golden years, and will show best practices for keeping MongoDB clusters healthy. Charity’s scaling and performance tuning tips will help you become a MongoDB ops specialist.

How ServiceSource Revolutionized Its Business and Moved to the Cloud with MongoDB, by Greg Olsen, CTO, ServiceSource
In late 2012, ServiceSource released Renew OnDemand, designed to increase recurring revenue for the world's largest technology companies. Built on MongoDB, Renew is representative of a new generation of cloud-native enterprise applications that exploit innovative datastore and compute approaches to achieve fundamental improvements in capability and scale. Greg Olsen, CTO of ServiceSource, will discuss how his team has implemented MongoDB in a sharded environment, describe some of the unique characteristics of the platform and provide insight into how other service providers can be equally adaptive using MongoDB.

Storing eBay's Media Metadata on MongoDB, by Yuri Finkelstein, Architect, eBay
eBay is the largest secondary marketplace on the web. The eBay development team has been using MongoDB for Project Zoom, where they store all of the website’s media metadata, including references to every item’s photos on eBay. This cluster is eBay's first of many MongoDB installations on the platform, and MongoDB was chosen for its flexible data model and improved performance. Yuri Finkelstein, an Enterprise Architect on the team, will provide a technical overview of this mission-critical project and its underlying architecture, and discuss why the team chose MongoDB for Project Zoom.

MongoDB and Meteor: an Architecture for Real-time Web Apps, by Matt DeBergalis, Architect, Meteor
Meteor is a new JavaScript application platform -- specifically designed to work with MongoDB -- for building modern real-time web applications. These applications, like live analytics dashboards or live data feeds, all need a way to send real-time updates to connected users when documents in their database change. Meteor and MongoDB offer an elegant architecture for managing the flow of data in a realtime app using the familiar MongoDB APIs. This talk will dig into the architecture of a realtime app built on MongoDB. Matt will cover tips and tricks for using MongoDB in a realtime app and demonstrate some of the design patterns they’ve developed.
A Year of Monitoring Production Deployments with MongoDB, by Simon Maynard, Co-Founder, Bugsnag
Bugsnag is a fast-growing error monitoring service for web and mobile applications that processes millions of errors every day, and was designed from the ground up to utilize MongoDB and its strengths. In this talk, CTO Simon Maynard will discuss hints and tips from two years of running production MongoDB deployments. The talk will cover all aspects of developing for and maintaining a MongoDB deployment, including using the profiler to tune performance, schema and index design considerations, and what to monitor and how to monitor it.

Want to see these talks and more? Join the community at MongoDB San Francisco: use the discount code mongodb_blog for 25% off tickets.
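For readers who want a head start on the profiler-based tuning that Simon's talk description mentions, here is a minimal sketch in Python with pymongo. The database name and the 100 ms slow-operation threshold are illustrative assumptions, not details from the talk.

from pymongo import MongoClient, DESCENDING

client = MongoClient("localhost", 27017)   # assumed local deployment
db = client.myapp                          # hypothetical database name

# Enable profiling for operations slower than 100 ms (level 1 = slow ops only).
db.command("profile", 1, slowms=100)

# Inspect the slowest recorded operations in the system.profile collection.
for op in db["system.profile"].find().sort("millis", DESCENDING).limit(5):
    print(op.get("op"), op.get("ns"), op.get("millis"), "ms")

Reviewing the namespaces and query shapes that show up here is usually the first step before the schema and index work the talk goes on to cover.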

April 30, 2013

Forward Intel uses Mongo for “causal” analytics

This was originally posted to the Forward Intel blog. Forward Intelligence Systems has chosen MongoDB to serve as the backend datastore for DataPoint, a new analytics system designed to reveal deeper meaning behind typical business intelligence data. Endless tables, graphs and charts are replaced with simple, decision-aiding “plain English analytics” to help make the job of the business analyst and company owner quicker and easier. DataPoint is designed to run on top of existing analytics systems like Google Analytics, Piwik, Open Web Analytics and HubSpot. Using the hosted solution, users simply connect their analytics account to their DataPoint profile and the application does the rest. DataPoint imports the data from multiple sources and identifies trends and patterns, taking the guesswork out of why a web site may have seen a decrease in traffic in the month of July. Using Bayesian math, DataPoint determines the causal relationship between an event and its most likely cause.

MongoDB is a powerful semi-structured database engine built to withstand the increased traffic that today's web applications endure. It is fast, lightweight and extremely scalable, making it a clear and convincing choice for large-scale business intelligence and analytics systems. Mongo stores data using a key/value paradigm within entities known as “documents”, which are queried using simple, straightforward syntax reminiscent of Structured Query Language (SQL). Mongo is schema-less, which means database developers are not confined to the typical column-and-row structure of relational databases. Dynamic data structures are essential for managing big data applications. Further - and critical to its power and flexibility - Mongo contains support for MapReduce, an engine that allows for rapid processing of large amounts of data. Implementing algorithms designed to chug through incredibly large volumes of data simply would not be feasible without Mongo's batch processing support.

“At Forward Intel, we're incredibly excited to start using MongoDB,” said the company's CEO, Steve Adcock. “Mongo's recent award of over $40 million to further its development ensures that it is here to stay, and we are confident that Mongo will serve us quite well.”

Tagged with: analytics, production uses, mongodb good, MongoDB, Mongo, NoSQL, Polyglot persistence, 10gen
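To make the document model described above concrete, here is a minimal pymongo sketch of storing and querying an analytics event. The database, collection and field names are illustrative assumptions, not DataPoint's actual schema.

from datetime import datetime
from pymongo import MongoClient

client = MongoClient("localhost", 27017)
events = client.datapoint.events           # hypothetical database/collection

# Documents are schema-less, JSON-like maps; no table definition is required.
events.insert_one({
    "source": "google_analytics",
    "metric": "visits",
    "value": 1342,
    "ts": datetime(2013, 7, 1),
})

# Queries use a simple filter document rather than SQL.
for doc in events.find({"metric": "visits", "ts": {"$gte": datetime(2013, 7, 1)}}):
    print(doc["source"], doc["value"])

Because nothing in the schema is fixed up front, new sources and metrics can be added without a migration, which is the flexibility the post credits for its choice of Mongo.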

August 28, 2012

Building a Big Data Portal through Liferay and MongoDB Integration

This was originally posted to the CIGNEX Blog. The CIGNEX Datamatics Big Data Portal is a web-based solution which combines the powerful presentation capabilities of a portal - rich user interfaces, collaboration, and secure access - with centralized and massively scalable data storage as the back end, holding a variety of content (audio, video, images, documents, metadata) in large volume. We have been providing content management and portal solutions for the past 12 years. While serving our customers we have observed the following:

- Content is growing big, really big (volume)
- Processing unstructured (variety of) content is now becoming a business need for customers
- Customers want to capture smart information (metadata) about the content
- There is a need to provide high-performance, secure access to content for a variety of applications and devices

We have integrated two leading open source technologies, Liferay and MongoDB, to build a powerful, cost-effective “Big Data Portal” that addresses the ever-growing need to manage the vast amount of information available. Big Data is at the foundation of many technologies that are trending today. With years of proven global experience in open source, we are excited to be pioneering solutions that solve many of today's growing challenges. The CIGNEX Datamatics Big Data Portal is represented in a diagram in the original post.

Liferay is the leading open source portal, with a strong community, 4+ million downloads and 500,000 deployments worldwide. Liferay is featured as a Leader in Gartner's Magic Quadrant for Horizontal Portals. MongoDB is an immensely scalable, NoSQL, agile document-oriented database based on JSON-like document storage with dynamic schemas. MongoDB's flexible data structure, ability to index and query, and auto-sharding make it a strong tool that adapts to change and reduces complexity. MongoDB's GridFS enables storage of large binary objects such as images, video and audio. A Big Data Portal built with MongoDB and Liferay provides lower total cost of ownership and higher ROI to businesses. CIGNEX Datamatics has developed a connector which enables Liferay to manage content in a clustered MongoDB environment; the architecture diagram is included in the original post.

Key benefits of our solution include:

- Elimination of high-end storage systems such as SANs and Oracle clusters, which is a huge cost saving
- Secure access to data
- Flexibility leading to high performance
- Simplified data management through a single system managing structured and unstructured data
- A consistent look and feel accessible across myriad gadgets and devices

For more details, download the presentation from the CIGNEX Datamatics website at http://www.cignex.com/resources/presentations/psc. Munwar Shariff, CTO, CIGNEX Datamatics. For more details, contact: munwar at cignex dot com.

Tagged with: big data, portal, cloud, open source, MongoDB, Mongo, NoSQL, Polyglot persistence, 10gen
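GridFS, mentioned above as the mechanism for large binary content, splits a file into chunks that are stored as ordinary documents. A minimal sketch with pymongo and the gridfs module follows; the database name and file are illustrative assumptions, not CIGNEX's connector code.

import gridfs
from pymongo import MongoClient

client = MongoClient("localhost", 27017)
db = client.portal_content                 # hypothetical database name
fs = gridfs.GridFS(db)

# Store a large binary asset; GridFS chunks it across fs.chunks documents.
with open("intro-video.mp4", "rb") as f:
    file_id = fs.put(f, filename="intro-video.mp4",
                     metadata={"contentType": "video/mp4"})

# Stream it back later by id (or look it up by filename).
data = fs.get(file_id).read()
print(len(data), "bytes retrieved")

Because the chunks live in normal collections, the same replication and sharding that protect the portal's metadata also cover its audio, video and image assets.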

July 27, 2012

Fluentd + MongoDB: The Easiest Way to Log Your Data Effectively.

Log everything! But how? All of you must know by now how valuable your data is to your product and business: KPI calculation, funnel analysis, A/B testing, cohort analysis, cluster analysis, logistic regression - none of this is possible without a lot of data, and the most obvious way to get more data is logging. But how?

As we started talking to our customers at Treasure Data, we realized that there was no effective tool to log data in a flexible yet disciplined way. So, we rolled up our sleeves, authored our own log collector and open-sourced it as Fluentd under the Apache 2.0 license. Fluentd is a lightweight, extensible logging daemon that processes logs as a JSON stream. It's designed so that the user can write custom plugins to configure their own sources and sinks (input and output plugins in Fluentd parlance). In just six months, Fluentd users have contributed almost 50 plugins. These plugins, combined with the loggers written in several programming languages (Ruby, Python, PHP, Perl, Java and more), allow Fluentd to be a great polyglot service. Apache, TSV or CSV. TCP or UDP. MongoDB or MySQL. S3, HDFS or flat files. Chances are good Fluentd can talk to your existing system fluently (okay, this pun was intended).

fluent-plugin-mongo, the most popular Fluentd plugin

Yes, that's right. fluent-plugin-mongo, the output plugin that lets Fluentd write data to MongoDB directly, is by far the most downloaded plugin! fluent-plugin-mongo's popularity should come as little surprise: MongoDB is based on schema-free, JSON-based documents, and that's exactly how Fluentd handles events. In other words, there is a one-to-one correspondence between Fluentd events and Mongo documents. Also, MongoDB and Fluentd both aim to be easy to install and get up and running. If you love the agility and flexibility of MongoDB, chances are good you will also like Fluentd.

How to send data into MongoDB from Fluentd

I assume the reader already has MongoDB up and running [1]. There are a couple of ways to install Fluentd:

Ruby gem: Fluentd and its plugins are available as Ruby gems. It's as easy as

$ gem install fluentd
$ gem install fluent-plugin-mongo

Debian/RPM packages: We have also packaged Fluentd and some of its plugins as td-agent (“td” stands for Treasure Data). Of course, fluent-plugin-mongo is pre-packaged with td-agent for you :-p Here are the links to the packages: Debian package, RPM package.

Now that we have everything, let's configure Fluentd to send data into MongoDB! In this example, we will import Apache logs into MongoDB. The location of your configuration file depends on how you installed Fluentd. If you went the Ruby gem route, it should be /etc/fluentd/fluentd.conf, and if you downloaded td-agent, it should be /etc/td-agent/td-agent.conf. Open your config file and add

<source>
  type tail
  format apache
  path /var/log/apache2/access_log
  tag mongo.apache
</source>

These lines tell Fluentd to tail the Apache log at /var/log/apache2/access_log. The tailed lines are parsed into JSON and given the tag “mongo.apache”. The tag decides how these events will be routed later. In the same config file, add

<match mongo.apache>
  # plugin type
  type mongo

  # mongodb db + collection
  database apache
  collection access

  # mongodb host + port
  host localhost
  port 27017

  # interval
  flush_interval 10s
</match>

If your MongoDB instance is not running locally with the default port of 27017, you should change the host and port parameters. Otherwise, this is it. All of your Apache logs will be imported to MongoDB immediately.
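Once events start flowing, the imported entries are just documents in the apache.access namespace defined by the match block above, so they can be read back like any other collection. A quick pymongo sketch; field names such as "host", "path" and "code" reflect the typical output of Fluentd's apache parser and are an assumption here.

from pymongo import MongoClient

client = MongoClient("localhost", 27017)
access = client.apache.access   # database "apache", collection "access" from the config

# Show the five most recently inserted log entries.
for entry in access.find().sort("_id", -1).limit(5):
    print(entry.get("host"), entry.get("path"), entry.get("code"))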
Fluentd + MongoDB = Awesome Sauce

The popularity of MongoDB suggests a paradigm shift in data storage. Traditional RDBMSs have their time and place, but sometimes you want more relaxed semantics and adaptability. MongoDB's schema-less document is a good example: it's flexible enough to store ever-changing log data but structured enough to query the data later. In contrast, logging is moving in the opposite direction. Logging used to be structure-free and ad hoc, with bash-based poor man's data analysis tools running everywhere. However, such quick and dirty solutions are fragile and unmaintainable, and Fluentd tries to fix these problems. It's exciting to see this synergy between Fluentd and MongoDB. We are confident that more and more people will see the value of combining a flexible database (like MongoDB) with a semi-structured log collection mechanism (like Fluentd) to address today's complex data needs.

Acknowledgement: Many thanks to 10gen for inviting us to give a talk on Fluentd and letting us write this guest post. Also, we thank Masahiro Nakagawa for authoring and maintaining fluent-plugin-mongo.

Tagged with: fluentd, logs, log, logging, apache, open source, treasure data, MongoDB, Mongo, NoSQL, Polyglot persistence, 10gen

July 20, 2012

Xtify: Powered by MongoDB

Engaging audiences on mobile is a key strategy for advertisers in the age of apps. Xtify helps brands easily engage with customers through hyper-local targeting on smartphones, and they do it all with MongoDB.

What is Xtify?
Xtify is a mobile engagement platform focused on helping our customers reach their users with highly contextualized messages when and where they will be most effective. Customers create rich, individualized promotions through our web console and REST APIs and specify their delivery profiles using custom geofences, application usage parameters, and user tags. Our cutting-edge complex event engine makes sure the right users get the right messages at the right time. You can also review promotion progress in real time with our analytics tool to increase message effectiveness and reach.

Can you tell me a little bit about your technology stack?
Here at Xtify we’ve implemented a highly distributed and scalable service-oriented architecture on top of ActiveMQ, MongoDB, MySQL, and Enterprise Java.

How do you use MongoDB for your location data store?
One of the reasons we love MongoDB is its geo-indexes. As location-enabled devices send updates back to our system, we dump their latitude and longitude to our location datastore, which allows us to dynamically track user movement through time and space. MongoDB’s speed and scalability allow us to store and index well over 25,000 of these updates a minute.

What were the driving factors in deciding to use MongoDB?
Several persistence use cases in our system aligned well with MongoDB’s schemaless, document-based structure, including our location and user entities. We also liked the potential for scale we could get with mongos, which we have found to be intuitive and easy to manage.

What recommendations would you make to companies deciding to use MongoDB for mobile solutions?
As with any solution, the ability to quickly ramp up the performance of a database at a moment’s notice is critical to keeping your customers happy. By pre-sharding key collections, even if you only need one shard initially, you get horizontal scale and stable read/write latencies when you need them most.

What are your future plans? Anything exciting happening on the horizon?
We have several exciting things in the pipeline, including updates to our configuration API and the addition of an arbitrary event-based API. Of course the coolest new products are still top secret, so you’ll have to find out with everyone else!

Tagged with: mobile, geolocation, applications, apis, api, solutions, software, MongoDB, Mongo, NoSQL, Polyglot persistence, 10gen
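For readers curious what the geo-index setup described in the interview looks like in practice, here is a minimal pymongo sketch using the "2d" index type available at the time. The collection name, field layout and coordinates are illustrative assumptions, not Xtify's actual schema.

import pymongo
from pymongo import MongoClient

client = MongoClient("localhost", 27017)
locations = client.engagement.locations        # hypothetical database/collection

# A "2d" index enables proximity queries on [longitude, latitude] pairs.
locations.create_index([("loc", pymongo.GEO2D)])

# Record a device location update as it arrives.
locations.insert_one({"device_id": "abc123", "loc": [-73.99, 40.73]})

# Find the devices closest to a geofence center point.
for doc in locations.find({"loc": {"$near": [-73.98, 40.75]}}).limit(10):
    print(doc["device_id"], doc["loc"])

Keeping each update as its own small document is what makes indexing tens of thousands of writes per minute practical, and pre-sharding the collection (as recommended above) keeps those write latencies stable as volume grows.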

July 11, 2012

Mobilize Your MongoDB: Building MongoDB Mobile Apps with OpenShift PaaS Part II

Summary: This is the second part of a blog series that details how to develop a mobile application that is backed by MongoDB and a PaaS. MongoDB makes a great companion to this mobile application given its ability to shard and to store JSON documents with little data manipulation required. In this blog post, part two of the series, we will go over the components and software required to develop cross-platform mobile applications for the iPhone and Android operating systems. We will also install and configure the backend systems, including MongoDB, which makes a perfect data store for the BeerShift mobile application. We will be using the following applications and software stack components:

Titanium Studio by Appcelerator
Titanium Studio is an all-inclusive, powerful Eclipse-based IDE that simplifies the mobile development process. Use Titanium Studio to rapidly build, test, package and publish mobile, desktop and web applications. Take advantage of new functionality like advanced code assisting, ACS integration, module management, Git integration, an enhanced publishing workflow and a full-featured editor. Manage Titanium projects, test your mobile apps in the simulator or on device, automate app packaging, deploy to a public or private App Store and much more.*

Xcode by Apple
Even though we will be using Titanium Studio for our development, we will still need to have Xcode installed and configured so that we have access to several important tools. Not only will we be using the simulator to test out our iPhone application, we will also need the Xcode IDE in order to bundle and submit our application to the Apple App Store.

Android SDK
Since we are targeting both iOS and Android based devices, we will also need to install and configure the Android SDK for emulating the Android hardware for testing.

OpenShift Client Tools
OpenShift is Red Hat’s free, auto-scaling Platform as a Service (PaaS) for applications. As an application platform in the cloud, OpenShift manages the stack so you can focus on your code. We will be using this for our backend services and our cloud-hosted MongoDB.

While not required for this blog post series, I would suggest that you also install a quality image editing application for sizing the icons and splash screens for your application. I prefer to use an open source application called Gimp that provides most of the image editing capabilities you will need.

Step 1: Installing Xcode

Note: If you are planning on targeting iOS platforms, you will typically need an iOS developer account. This will allow you to publish your application to the Apple App Store and receive product updates and announcements about new iOS platforms. This program typically costs 99.00 USD per year.

There are generally two ways to install Xcode on Mac OS. You can either install it via the App Store or download it directly from the Apple Developer Center. For this blog post, I will assume that you have access to the Apple App Store and will be detailing that route in order to install the IDE. Once you start the App Store application, search for Xcode and you should be directed to the Xcode listing page. Once on this page, click the Free button under the short description in order to install the IDE on your local operating system. Once the installation starts, be patient! Xcode is about 1.5 GB and can take a significant amount of time to install, even on the fastest of connections.
To check the status of the installation, you can go back to the App Store application and click on the Purchases tab at the top of the screen. This will display your current download position and how much time is remaining.

Step 2: Installing OpenShift Client Tools

Note: If you would rather watch a screencast of this step, check out this video where I demo how to install the client tools on OS X.

The OpenShift client tools are written in a very popular programming language called Ruby. With OS X 10.6 and later, Ruby is installed by default, so installing the client tools is a snap. Simply issue the following command in your terminal application:

$ sudo gem install rhc

If you don't already have an OpenShift account, head on over to http://openshift.redhat.com and sign up. It is completely free, and Red Hat gives every user three free applications running in the cloud. At the time of this writing, the combined resources allocated for each user are 1.5 GB of memory and 3 GB of disk space.

Now that we have the client tools installed, we also need to install the Git source code repository tools. In order to do this, download the package from the Git website by clicking on the Download for Mac button on the right-hand side of the screen. Once the download of the .dmg file is complete, mount the image by clicking on it and open up Finder. Once Finder is open, click on the .pkg file to install Git on your local system. Follow the installation instructions and close the dialog box once the installation has finished. Open up a new terminal window to ensure that your environment variables, including your path, have been updated to reflect the new Git installation.

At this point, we can create the backend server for our BeerShift application, including the Mongo database. For this blog post, we will be using a PHP backend, but I have also written a backend for Ruby, Python and Java.

$ rhc app create -a beershift -t php-5.3

The above command will provision some space for us on the Red Hat cloud. It will also create a templated website for us to verify that the application creation was successful. Once the command has finished, verify that the application and server space were created by pointing your browser to the URL provided by the rhc tools.

Now that we have an application created, let's create a MongoDB data store to house our application data. This can be done with the following command:

$ rhc-ctl-app -a beershift -e add-mongodb-2.0

This will return the database hostname, port, root user and root password for you to access the database. Don't worry, we will go into more detail on how all of this works in the blog post that covers the backend system for this application; a minimal connection sketch using these values also appears at the end of this post.

Step 3: Install the Android SDK

Appcelerator provides excellent instructions on how to install and configure the Android SDK for use with Titanium Studio. Instead of re-inventing the wheel, I suggest that you follow the instructions already provided for this step.

Step 4: Install Titanium Studio

In order to install and use Titanium Studio, you will need to register for a developer account with Appcelerator. Head on over and click the Download Titanium button on the right-hand side of the screen. This will redirect you to a sign-up screen. Fill in the required details, submit the form and check your inbox for a validation email. Once you have validated your email, you will be redirected back to the Appcelerator site where you can download Titanium Studio.
Once the .dmg file has downloaded, mount the image and follow the instructions to drag Titanium Studio to your Applications folder. Note: When you start the application for the first time, you may be prompted to install a Java runtime. If so, follow the instructions presented and OS X will automatically find and install the Java runtime for you. Once Titanium Studio starts, you will be prompted for a location to store your workspace. The workspace is a location on your local machine where all of your source files and project settings will be stored. After you select your workspace location, you will be asked for your username and password. This is the username and password that you used to sign up for an Appcelerator account. Once you are logged in, the IDE may perform an update to ensure that you are running the latest available code. Now that you have the IDE and your SDKs set up, get familiar with the IDE and play around with a few of the sample projects. In the next blog post we will begin development of the backend application and create the REST API that handles communication between the mobile application and the cloud-hosted server.

* http://www.appcelerator.com/platform/titanium-studio

Tagged with: openshift, sdk, iphone, iphone development, objective c, red hat, open source, MongoDB, Mongo, NoSQL, Polyglot persistence, 10gen
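As promised in Step 2, here is a minimal sketch of connecting to the MongoDB instance created with rhc-ctl-app, using the hostname, port and credentials the command prints. The values shown are placeholders, not real credentials, and the BeerShift backend itself is written in PHP; this Python sketch only illustrates the idea.

from pymongo import MongoClient

# Placeholder values: substitute the hostname, port, user and password
# printed by rhc-ctl-app when the MongoDB cartridge was added.
client = MongoClient(
    host="127.0.0.1",
    port=27017,
    username="admin",
    password="changeme",
    authSource="admin",   # assumed; adjust to wherever the user was created
)

db = client.beershift                      # hypothetical database name

# Verify the connection by writing and reading back a test document.
db.drinks.insert_one({"beer": "test", "user": "setup-check"})
print(db.drinks.count_documents({"user": "setup-check"}))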

May 31, 2012

Mobilize Your MongoDB: Building MongoDB Mobile Apps with OpenShift PaaS

This is the first in a 4-part series by Grant Shipley, Cloud Evangelist for Red Hat’s OpenShift Platform-as-a-Service. Grant’s series will cover the development of “BeerShift”, a mobile app for iPhone and Android built using Titanium, OpenShift and MongoDB. MongoDB makes a great companion to this mobile application given its ability to shard and to store JSON documents with little data manipulation required. In this blog post, we will go over the background of the application and discuss the features we plan to build.

Background:
I started developing iOS-based applications shortly after the arrival of the iPhone on the market. Having been a Java and PHP developer for my entire career, switching to Objective-C was a tough challenge for me. I had to remember basic programming methodologies and patterns that I hadn't used since college. It took me nearly two months of work at a cadence of 30-40 hours per week to build my first iOS application. To my delight, after releasing the application, the market for it was larger than I had anticipated. Users were writing great reviews and requesting more features. Shortly after I released my first iOS application, Google decided to enter the smartphone market with their Android-based SDK and devices. This should have been great news for most software developers, but for me, a part-time mobile developer, it wasn't. I now had users requesting my application for Android devices as well as for the new iPad and other tablets that were hitting the market. I didn't have the free time to port my application to the Android SDK, as it would have required another two months of software development as well as maintaining two separate code streams for patches and updates. About 8 months ago, I heard about a company called Appcelerator and their Titanium SDK. This SDK would allow me to code using JavaScript but target native UI controls for an array of devices. This sounded like heaven, as most of the applications that I write are productivity or novelty applications that don't rely heavily on 3D graphics. I set out to learn the Titanium SDK and was able to develop the BeerShift sample application over a period of two days.

About BeerShift:
At OpenShift, we enjoy local craft beers and the social aspects of having a pint while discussing the latest trends in software development and deployment. One night, over a pint, we thought it would be cool if we could quickly read a description of the beer and brewery before ordering. We kept discussing the app, and of course feature creep started setting in. By the end of the night, we decided to develop a mobile application that would allow a user to search for beers and then log when and where they drank them. Because the team was split between iOS and Android based phones, we needed it to work on both devices and sync the information via a backend service. Of course, all of this had to be available via the web as well. This was a great opportunity for me to learn Titanium, so I set out to develop the application. The biggest unknown was where to get a freely available database of beers that I could search. I researched this question and did some Google searching but didn't really come up with any providers that met my needs. Luckily, while speaking at a PHP users group in Raleigh, NC, I met a couple of guys who owned a startup called brewerydb.com. With their growing repository of beers and breweries, it had all of the information that I needed in order to develop the sample app.
I invited them out for a pint after the user group and we discussed the details. A few days later I had an API key and was ready to get my Titanium JavaScript on.

Want a quick preview of what we will be building? Check out the video showing the application. BeerShift has a tab-based UI that consists of 4 main screens: Drink, Drank, Kegstand, and Settings. The Settings tab presents the user with username and password input fields. If the username does not exist in the MongoDB database, the user will be asked whether they want to create a new user. The Drink tab is the heart of the application. This tab allows the user to enter a beer name and returns a list of all beers and breweries that match the search string. The results are retrieved via a REST API call to the OpenShift server and presented to the user in a table view. The user can select a beer from the list and then choose to “Drink It”. Once the user has decided to log drinking a beer, the drinking event will be recorded on both the Drank tab and the Kegstand tab. The Kegstand tab allows the user of the application to view the 50 most recent beers drunk by any user of the application.

In the next blog post of this series, I will detail the installation of the applications and tools needed to begin development of the BeerShift application.

Source Code: All of the source code for this application, including the backend REST API and MongoDB integration, is available on github.com/gshipley

Tagged with: red hat, open shift, openshift, mobile, apps, application, titanium, sdk, java, objective c, open source, breweries, beer, brewerydb, MongoDB, Mongo, NoSQL, Polyglot persistence, 10gen

May 15, 2012

MongoDB Hadoop Connector Announced

10gen is pleased to announce the availability of our first GA release of the MongoDB Hadoop Connector, version 1.0. This release was a long-term goal, and represents the culmination of over a year of work to bring our users a solid integration layer between their MongoDB deployments and Hadoop clusters for data processing. Available immediately, this connector supports many of the major Hadoop versions and distributions from 0.20.x onwards.

The core feature of the connector is the ability to read MongoDB data into Hadoop MapReduce jobs, as well as to write the results of MapReduce jobs out to MongoDB. Users may choose to use MongoDB reads and writes together or separately, as best fits each use case. Our goal is to continue to build support for the components in the Hadoop ecosystem which our users find useful, based on feedback and requests. For this initial release, we have also provided support for:

- writing to MongoDB from Pig (thanks to Russell Jurney for all of his patches and improvements to this feature)
- writing to MongoDB from the Flume distributed logging system
- using Python to run MapReduce jobs to and from MongoDB via Hadoop Streaming

Hadoop Streaming was one of the toughest features for the 10gen team to build. To that end, look for a more technical post on the MongoDB blog in the next week or two detailing the issues we encountered and how to utilize this feature effectively.

This release involved hard work from both the 10gen team and our community. Testing, pull requests, email ideas and support tickets have all contributed to moving this product forward. One of the most important contributions was from a team of students participating in a New York University class, Information Technology Projects, which is designed to have students apply their skills to real-world projects. Under the guidance of Professor Evan Korth, four students worked closely with 10gen to test and improve the functionality of the Hadoop Connector. Joseph Shraibman, Sumin Xia, Priya Manda, and Rushin Shah all worked to enhance and improve support for splitting up MongoDB input data, as well as adding a number of testing improvements and consistency checks. Thanks to the work done by the NYU team, as well as improvements to the MongoDB server, the MongoDB Hadoop Connector is capable of efficiently splitting input data in a variety of situations - in both sharded and unsharded setups - to parallelize the Hadoop input as efficiently as possible for maximum performance.

In the next few months we will be working to add additional features and improvements to the Hadoop Connector, including Ruby support for Streaming, Pig input support, and support for reading and writing MongoDB backup files for offline batch processing. As with all of our MongoDB projects, you can always monitor the roadmap, request features, and report bugs via the MongoDB Jira, and let us know on the MongoDB User Forum if you have any questions.

Tagged with: mongodb hadoop, nyu, MongoDB, Mongo, NoSQL, Polyglot persistence, 10gen
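The Hadoop Streaming support mentioned above lets the map and reduce steps be written as small Python scripts that the connector feeds with BSON documents and whose output it writes back to MongoDB. A hedged sketch of such a pair of scripts follows; the pymongo_hadoop module and its BSONMapper/BSONReducer helpers reflect the streaming examples shipped with the connector at the time, and the "status" field is a hypothetical document field, so treat the exact names as assumptions.

# mapper.py - emit one count per input document, keyed by a hypothetical "status" field.
from pymongo_hadoop import BSONMapper

def mapper(documents):
    for doc in documents:
        yield {"_id": doc.get("status", "unknown"), "count": 1}

BSONMapper(mapper)

# reducer.py - sum the counts for each key; the connector writes the results to MongoDB.
from pymongo_hadoop import BSONReducer

def reducer(key, values):
    total = sum(v["count"] for v in values)
    return {"_id": key, "count": total}

BSONReducer(reducer)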

April 10, 2012