Polyglot persistence

10gen at DrupalCon Munich

DrupalCon Munich came together last week with over 1,800 Drupal fans from around the world. The 10gen booth had stickers, t-shirts, and — of course — MongoDB mugs aplenty to share with attendees. From 10gen's perspective, we are interested in how Drupal and MongoDB work together, and in what we can do to make their integration better. On Wednesday, Derick Rethans gave a Birds of a Feather session on "Practical MongoDB with Drupal", co-hosted by Drupal expert Károly Négyesi (known to the community as chx). Together, chx and Derick offered advice on the MongoDB module for Drupal, as well as the EntityFieldQuery module. On Wednesday evening, Derick worked more directly with the MongoDB community, giving a talk on "Indexing and Query Optimisation" in MongoDB to approximately 30 people at the München MongoDB User Group, covering what indexing is, the different types of indexes, and how to work with them. It was well received, and feedback on the talk was complimentary. You can see the slides from the talk here: http://derickrethans.nl/talks/mongo-drupalcon12.pdf Thursday afternoon was the highlight of DrupalCon for us. Anyone who had visited the 10gen stand over the previous two days, or who had come along to the previous day's BoF, had heard about Derick's "Introduction to MongoDB" talk, which gave attendees a firm grounding in getting started with MongoDB. The slides from this talk can be seen here: http://derickrethans.nl/talks/mongo-drupalcon12.pdf DrupalCon 2013 will be held in Prague, and we plan to have a MongoDB presence at the conference again. With the improvements heralded by Drupal 8, along with increased awareness in the community of how MongoDB can be used with Drupal, we expect next year to be an even bigger success. 10gen returns to Munich this October for a full-day conference dedicated to MongoDB. MongoDB Munich comes to the city on October 16 — tickets are available here.
Tagged with: Drupal, Munich, DrupalCon, DrupalCon Munich, MongoDB Munich, Károly Négyesi, Derick Rethans, München, MongoDB, Mongo, NoSQL, Polyglot persistence, 10gen

August 30, 2012

Forward Intel uses Mongo for "causal" analytics

This was originally posted to the Forward Intel blog. Forward Intelligence Systems has chosen MongoDB to serve as the backend datastore for DataPoint, a new analytics system designed to reveal deeper meaning behind typical business intelligence data. Endless tables, graphs and charts are replaced with simple, decision-aiding "plain English analytics" to help make the job of the business analyst and company owner quicker and easier. DataPoint is designed to run on top of existing analytics systems like Google Analytics, Piwik, Open Web Analytics and HubSpot. Using the hosted solution, users simply connect their analytics account to their DataPoint profile and the application does the rest. DataPoint imports the data from multiple sources and identifies trends and patterns, taking the guesswork out of why a web site may have seen a decrease in traffic in the month of July. Using Bayesian math, DataPoint determines the causal relationship between an event and its most likely cause. MongoDB is a powerful semi-structured database engine built to withstand the traffic that today's web applications endure. It is fast, lightweight and extremely scalable, making it a clear and convincing choice for large-scale business intelligence and analytics systems. Mongo stores data using a key/value paradigm within entities known as "documents", which are queried using a simple, straightforward syntax reminiscent of Structured Query Language (SQL). Mongo is schema-less, which means database developers are not confined to the typical column-and-row structure of relational databases. Dynamic data structures are essential for managing big data applications. Further - and critical to its power and flexibility - Mongo supports MapReduce, an engine that allows for rapid processing of large amounts of data.
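As a sketch of the MapReduce support mentioned above (the collection and field names here are hypothetical, purely for illustration, not taken from DataPoint): a map function emits a key/value pair per document, and a reduce function folds the emitted values for each key. The same pair of functions you would hand to MongoDB's mapReduce command can be exercised locally with a stubbed emit, so the grouping logic can be checked without a running mongod:

```javascript
// Map/reduce functions in the shape MongoDB's mapReduce command expects.
var emit; // stand-in for the emit() MongoDB provides to map functions

function map() {
  // Inside MongoDB, `this` is the current document.
  emit(this.source, this.views);
}

function reduce(key, values) {
  // Must be associative and commutative: MongoDB may call it repeatedly.
  return values.reduce(function (sum, v) { return sum + v; }, 0);
}

// Run the same functions locally over an array of plain objects.
function runLocally(docs) {
  var grouped = {};
  emit = function (key, value) {
    (grouped[key] = grouped[key] || []).push(value);
  };
  docs.forEach(function (doc) { map.call(doc); });
  var results = {};
  Object.keys(grouped).forEach(function (key) {
    results[key] = reduce(key, grouped[key]);
  });
  return results;
}

var totals = runLocally([
  { source: "google", views: 120 },
  { source: "direct", views: 45 },
  { source: "google", views: 80 }
]);
console.log(totals); // { google: 200, direct: 45 }
```

Running the same map/reduce pair against a real collection is then a matter of handing the two functions to db.collection.mapReduce in the mongo shell.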
Implementing algorithms designed to chug through incredibly large volumes of data simply would not be feasible without Mongo's batch processing support. "At Forward Intel, we're incredibly excited to start using MongoDB," said the company's CEO, Steve Adcock. "Mongo's recent award of over $40 million to further its development ensures that it is here to stay, and we are confident that Mongo will serve us quite well." Tagged with: analytics, production uses, mongodb good, MongoDB, Mongo, NoSQL, Polyglot persistence, 10gen

August 28, 2012

MongoDB Masters in the Spotlight: Flavio Percoco Premoli

10gen has a number of core contributors: MongoDB User Group organizers, evangelists, and contributors to the core server, connecting libraries and support forums. Last year, 10gen launched the MongoDB Masters program to encourage the exchange of knowledge and expertise amongst MongoDB community evangelists and open source contributors. To introduce you to these core contributors, we're launching the MongoDB Masters in the Spotlight series on our blog. Flavio Percoco Premoli works in the Research and Development department at The Net Planet Europe and has been an avid MongoDB community contributor for over three years. His contributions include PyMongo, the Django MongoDB Engine (co-author and maintainer), the MongoDB plugin for Eclipse, Half-Static (a distributed, GridFS-based blog engine) and the Python virtual machine for MongoDB. He lives in Milan, Italy and is a frequent speaker at MongoDB and European technology conferences. What was it like getting started with MongoDB? It was a great experience. It was about three years ago when I first looked at MongoDB, and I was also just starting to dig into NoSQL technologies. It was easy to set up, fast and impressive, even though the project was still very young. What advice do you have for other MongoDB users? Try to change the way you think about data and the well-known data model paradigms. Models were created to "model" data of a given structure, but models can be re-modeled too. Do not try to change the way MongoDB data management works, and forget about db-managed joins :) Oh, by the way, give GridFS a try. You've no idea how useful and powerful it is; I just love it! What has been your greatest accomplishment? I think that one of my biggest accomplishments so far has been making my way in this world, and mostly in my professional life. I love what I do, and every little goal I've reached is as important as the others. That's why making my way and keeping to the right path is the most important and difficult one.
What is your daily inspiration? "Make sure you do what you're passionate about and smile while you're doing it; you were born to be happy." What do you do in your spare time? I code most of the time. I'm always reading, studying and coding on new projects, trying to find new things to do and to contribute to. If I'm not coding, I'm sure you'll find me hanging around with my family and friends. What has been your greatest accomplishment with MongoDB? Every time I get started with MongoDB on a project is an accomplishment because, even for a young project, it has everything I need for that particular project. I've done many things with MongoDB (private and public), and each one of them has been an amazing experience. I can't say much about the private projects, but I can say that I managed to handle terabytes of data; more important than that, all of this required hundreds of operations per second. I stared at the process monitor, amazed at what MongoDB is capable of. I must say that it was running on really powerful hardware, but that makes things even better ;) How has MongoDB helped you the most? In my case, it was helpful when choosing the right "schema" to use in our system. Its schema-less capabilities allowed me to build more flexible, reusable and richer data structures. GridFS has been really helpful too; it allowed me to share big content between nodes with a single operation, without replicating the information or sacrificing its consistency. Tagged with: mongodb masters, community, contributors, MongoDB, Mongo, NoSQL, Polyglot persistence, 10gen

August 21, 2012

MongoDB at DrupalCon Munich 2012

Fans of MongoDB and Drupal have the chance to flock together at DrupalCon in Munich this week. Running from August 20-24, the official conference of the Drupal community features "Birds of a Feather" sessions alongside formally scheduled presentations. Included among these is a session organised by 10gen on "Practical MongoDB and Drupal". Birds of a Feather workshops are informal and openly scheduled workshops, giving like-minded individuals a chance to talk about a common problem and discuss topical issues. "Practical MongoDB and Drupal" will look at Drupal 7's pluggable class architecture, and how we can easily swap Drupal's underlying data storage for MongoDB's faster read and write performance. Furthermore, Microsoft will be joining the session with 50 pre-allocated Azure passes to share with attendees. DrupalCon will also feature 10gen's own Derick Rethans presenting an "Introduction to MongoDB". The session will introduce how to get the most out of MongoDB and explain how MongoDB offers viable alternatives to the standard, normalized, relational model. Derick's expertise and experience are well known in both the NoSQL and PHP communities. He will explain how to get the most out of MongoDB and why the technology fits well with Drupal: not only how to set it up, but also how to get going with it. Places are still available for DrupalCon here. "Practical MongoDB and Drupal" will be held in the Chamonix room at DrupalCon at 11.45 on Wednesday, Aug 22. Space is very limited, so arrive promptly! Derick Rethans gives his Introduction to MongoDB on Thursday, Aug 23 at 13.00. Tagged with: MongoDB, Mongo, NoSQL, Polyglot persistence, 10gen

August 20, 2012

Pig as Hadoop Connector, Part One: Pig, MongoDB and Node.js

This post was originally published on the Hortonworks blog. Series Introduction Apache Pig is a dataflow-oriented scripting interface to Hadoop. Pig enables you to manipulate data as tuples in simple pipelines without thinking about the complexities of MapReduce. But Pig is more than that. Pig has emerged as the 'duct tape' of Big Data, enabling you to send data between distributed systems in a few lines of code. In this series, we're going to show you how to use Hadoop and Pig to connect different distributed systems, enabling you to process data from wherever and to wherever you like. Working code for this post, as well as setup instructions for the tools we use, is available at https://github.com/rjurney/enron-node-mongo and you can download the Enron emails we use in the example in Avro format here. You can run our example Pig scripts in local mode (without Hadoop) with the -x local flag: pig -x local. This enables new Hadoop users to try out Pig without a Hadoop cluster. Introduction In this post we'll be using Hadoop, Pig, mongo-hadoop, MongoDB and Node.js to turn Avro records into a web service. We do so to illustrate Pig's ability to act as glue between distributed systems, and to show how easy it is to publish data from Hadoop to the web. Pig and Avro Pig's Avro support is solid in Pig 0.10.0. To use AvroStorage, we need only load piggybank.jar and the jars for avro and json-simple. A shortcut to AvroStorage is convenient as well. Note that all paths are relative to your Pig install path.
We load Avro support into Pig like so:

    /* Load Avro jars and define shortcut */
    register /me/pig/build/ivy/lib/Pig/avro-1.5.3.jar
    register /me/pig/build/ivy/lib/Pig/json-simple-1.1.jar
    register /me/pig/contrib/piggybank/java/piggybank.jar
    define AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage(); /* Shortcut */

MongoDB's Java Driver To connect to MongoDB, we'll need the MongoDB Java Driver. You can download it here: https://github.com/mongodb/mongo-java-driver/downloads . We'll load this jar in our Pig script. Mongo-Hadoop The mongo-hadoop project provides integration between MongoDB and Hadoop. You can download the latest version at https://github.com/mongodb/mongo-hadoop/downloads . Once you download and unzip the project, you'll need to build it with sbt:

    ./sbt package

This will produce the following jars:

    $ find . | grep jar
    ./core/target/mongo-hadoop-core-1.1.0-SNAPSHOT.jar
    ./pig/target/mongo-hadoop-pig-1.1.0-SNAPSHOT.jar
    ./target/mongo-hadoop-1.1.0-SNAPSHOT.jar

We load these MongoDB libraries in Pig like so:

    /* MongoDB libraries and configuration */
    register /me/mongo-hadoop/mongo-2.7.3.jar /* MongoDB Java Driver */
    register /me/mongo-hadoop/core/target/mongo-hadoop-core-1.1.0-SNAPSHOT.jar
    register /me/mongo-hadoop/pig/target/mongo-hadoop-pig-1.1.0-SNAPSHOT.jar

    /* Set speculative execution off so we don't have the chance of duplicate records in Mongo */
    set mapred.map.tasks.speculative.execution false
    set mapred.reduce.tasks.speculative.execution false

    define MongoStorage com.mongodb.hadoop.pig.MongoStorage(); /* Shortcut */
    set default_parallel 5 /* By default, let's have 5 reducers */

Writing to MongoDB Loading Avro data and storing records to MongoDB are one-liners in Pig.
    avros = load 'enron.avro' using AvroStorage();
    store avros into 'mongodb://localhost/enron.emails' using MongoStorage();

From Avro to Mongo in One Line I've automated loading Avros and storing them to MongoDB in the script at https://github.com/rjurney/enron-node-mongo/blob/master/avro_to_mongo.pig , using Pig's parameter substitution:

    avros = load '$avros' using AvroStorage();
    store avros into '$mongourl' using MongoStorage();

We can then call our script like this, and it will load our Avros to Mongo:

    pig -l /tmp -x local -v -w -param avros=enron.avro \
        -param mongourl='mongodb://localhost/enron.emails' avro_to_mongo.pig

We can verify our data is in MongoDB like so:

    $ mongo enron
    MongoDB shell version: 2.0.2
    connecting to: enron
    > show collections
    emails
    system.indexes
    > db.emails.findOne({message_id: "%3C3607504.1075843446517.JavaMail.evans@thyme%3E"})
    {
      "_id" : ObjectId("502b4ae703643a6a49c8d180"),
      "message_id" : "",
      "date" : "2001-04-25T12:35:00.000Z",
      "from" : { "address" : "jeff.dasovich@enron.com", "name" : "Jeff Dasovich" },
      "subject" : null,
      "body" : "Breathitt's hanging tough, siding w/Hebert, standing for markets. Jeff",
      "tos" : [ { "address" : "7409949@skytel.com", "name" : null } ],
      "ccs" : [ ],
      "bccs" : [ ]
    }

To the Web with Node.js We've come this far, so we may as well publish our data on the web via a simple web service. Let's use Node.js to fetch a record from MongoDB by message ID, and then return it as JSON. To do this, we'll use Node's mongodb package. Installation instructions are available in our github project. Our node application is simple enough. We listen for an http request on port 1337, and use the messageId parameter to query an email by message id.
    // Dependencies
    var mongodb = require("mongodb"),
        http = require('http'),
        url = require('url');

    // Set up Mongo
    var Db = mongodb.Db,
        Server = mongodb.Server;

    // Connect to the MongoDB 'enron' database and its 'emails' collection
    var db = new Db("enron", new Server("localhost", 27017, {}));
    db.open(function(err, n_db) { db = n_db });
    var collection = db.collection("emails");

    // Setup a simple API server returning JSON
    http.createServer(function (req, res) {
      var inUrl = url.parse(req.url, true);
      var messageId = inUrl.query.messageId;

      // Given a message ID, find one record that matches in MongoDB
      collection.findOne({message_id: messageId}, function(err, item) {
        // Return 404 on error
        if(err) {
          console.log("Error: " + err);
          res.writeHead(404);
          res.end();
        }
        // Return 200/json on success
        if(item) {
          res.writeHead(200, {'Content-Type': 'application/json'});
          res.end(JSON.stringify(item));
        }
      });
    }).listen(1337, 'localhost');

    console.log('Server running at http://localhost:1337/');

Navigating to http://localhost:1337/?messageId=%3C3607504.1075843446517.JavaMail.evans@thyme%3E returns an Enron email as JSON. We'll leave the CSS as an exercise for your web developer, or you might try Bootstrap if you don't have one. Conclusion The Hadoop Filesystem serves as a dumping ground for aggregating events. Apache Pig is a scripting interface to Hadoop MapReduce. We can manipulate and mine data on Hadoop, and when we're ready to publish it to an application we use mongo-hadoop to store our records in MongoDB. From there, creating a web service is a few lines of JavaScript with Node.js - or your favorite web framework. MongoDB is a popular NoSQL database for web applications. Using Hadoop and Pig we can aggregate and process logs at scale and publish new data-driven features back to MongoDB - or whatever our favorite database is. Note: we should ensure that there is sufficient I/O between our Hadoop cluster and our MongoDB cluster, lest we overload Mongo with writes from Hadoop. Be careful out there!
I have, however, verified that writing from an Elastic MapReduce Hadoop cluster to a replicated MongoHQ cluster (on Amazon EC2) works well. About the Author Russell Jurney is a data scientist and the author of the book Agile Data (O'Reilly, Dec 2012), which teaches a flexible toolset and methodology for building effective analytics applications using Apache Hadoop and cloud computing. About Hortonworks Hortonworks is a leading commercial vendor of Apache Hadoop, the preeminent open source platform for storing, managing and analyzing big data. Our distribution, Hortonworks Data Platform powered by Apache Hadoop, provides an open and stable foundation for enterprises and a growing ecosystem to build and deploy big data solutions. Hortonworks is the trusted source for information on Hadoop, and together with the Apache community, Hortonworks is making Hadoop more robust and easier to install, manage and use. Hortonworks provides unmatched technical support, training and certification programs for enterprises, systems integrators, and technology vendors. For more information, visit www.hortonworks.com. Tagged with: MongoDB, Mongo, NoSQL, Polyglot persistence, 10gen

August 17, 2012

10gen To Attend, Sponsor 'NoSQL Now!' for Second Year

10gen will be sponsoring the second annual NoSQL Now!, from August 21st through the 23rd in San Jose, California. The goal of the educational conference is to "describe the diversity of NoSQL technologies available to all organizations to address their business needs, and to offer objective evaluation processes to match the right NoSQL solutions with the right business challenge." 10gen and members of the MongoDB community plan to discuss many of the advantages (and challenges) of MongoDB, and to demonstrate how it is being used today in both enterprise IT and start-up companies. Over the past three years, the NoSQL movement has been growing at 82% compounded annually. Since the first annual NoSQL Now! last year, 10gen has had over 1,000,000 downloads, tripled the size of its employee base and gained additional VC funding, bringing its total raised to around $75 million. 10gen CEO and founder Dwight Merriman will deliver a talk entitled "Common MongoDB Use Cases" and will also serve on "The NoSQL 'C Panel'" alongside representatives from other NoSQL databases. 10gen Director of Product Marketing and Technical Alliances Jared Rosoff will present "PM1: Benefits and Challenges of Using MongoDB in the Enterprise", which will introduce MongoDB and the benefits and challenges of adopting it within the enterprise. Other talks about MongoDB will be delivered by friends from Red Hat, Rocket Fuel, StudyBlue, Inc., Exadel, Inc. and Analytica, Inc. Christoph Bussler and Roger Bodamer from Analytica will give a talk called "Analytica: Analytics for MongoDB and NoSQL Databases." Grant Shipley from Red Hat will lead "Mobilize Your MongoDB! Developing iPhone and Android Apps in the Cloud." Josh Powell from Rocket Fuel will discuss the advantages of using MongoDB in a single page web application. Sean Laurent of StudyBlue, Inc. will provide a MongoDB case study of databases at scale. Finally, Max Katz of Exadel, Inc. will teach attendees to build mobile apps with HTML5 and MongoDB.
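As a quick sanity check on the growth figure above: 82% compounded annually over three years multiplies the starting size by 1.82 cubed, roughly a sixfold increase. A few lines make the arithmetic concrete:

```javascript
// Compound annual growth: size after n years = start * (1 + rate)^n
function compoundGrowth(start, rate, years) {
  return start * Math.pow(1 + rate, years);
}

// 82% annual growth over three years is roughly a 6x increase
var factor = compoundGrowth(1, 0.82, 3);
console.log(factor.toFixed(2)); // "6.03"
```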
Also, please be sure to sign up for a free exhibits pass and visit MongoDB in booth 120 for some swag. Tagged with: MongoDB, Mongo, NoSQL, Polyglot persistence, 10gen

August 15, 2012

10gen and Microsoft Partner to Deliver NoSQL in the Cloud

We at 10gen are excited about our ongoing collaboration with Microsoft. We are actively leveraging new features in Windows Azure to ensure our common customers using MongoDB on Windows Azure have access to the latest features and the best possible experience. In early June, Microsoft announced the preview version of Windows Azure VM, which enables customers to deploy and run Windows and Linux instances on Windows Azure. This provides more control over actual instances as opposed to using Worker Roles. Additionally, this is the paradigm that is most familiar to users who run instances in their own private clouds or on other public clouds. In conjunction with Azure's release, 10gen and Microsoft are now delivering the MongoDB Installer for Windows Azure. The MongoDB Installer for Windows Azure automates the process of creating instances on Azure, deploying MongoDB, opening up relevant ports, and configuring a replica set. The installer currently works when used on a Windows machine, and can be used to deploy MongoDB replica sets to Windows VMs on Azure. Additionally, the installer uses the instance OS drive to store MongoDB data, which limits storage and performance. As such, we recommend that customers only use the installer for experimental purposes at this stage. There are also tutorials that walk users through how to deploy a single standalone MongoDB server to Windows Azure VM for Windows 2008 R2 Server and CentOS. In both cases, by using Azure data disks, this implementation provides data safety given the persistent nature of disks, which allows the data to survive instance crashes or reboots. Furthermore, Azure's triple-replication of the data guards against storage corruption. Neither of these solutions, however, takes advantage of MongoDB's high-availability features. To deploy MongoDB to be highly available, one can leverage MongoDB replica sets; more information on this high-availability feature can be found here.
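To make the replica set step concrete, here is a minimal sketch of the configuration document one would pass to rs.initiate() in the mongo shell; the set name and hostnames are hypothetical placeholders, not actual Azure VM names:

```javascript
// Sketch of a three-member replica set configuration, in the shape
// rs.initiate() expects. Replace the hostnames with your VM addresses.
var config = {
  _id: "rs0",                        // replica set name (must match --replSet)
  members: [
    { _id: 0, host: "mongo-vm-0:27017" },
    { _id: 1, host: "mongo-vm-1:27017" },
    { _id: 2, host: "mongo-vm-2:27017" }
  ]
};

// In the mongo shell, connected to one of the members, you would run:
//   rs.initiate(config);
//   rs.status();   // verify all members come online
console.log(config.members.length); // 3
```

With an odd number of members, the set can elect a primary whenever a majority of members are reachable, which is what gives the deployment its high availability.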
Finally, customers who would like to deploy MongoDB replica sets to CentOS VMs can follow these basic steps:

1. Sign up for the Windows Azure VM preview feature
2. Create the required number of VM instances
3. Attach disks and format them
4. Configure the ports to allow remote shell and MongoDB access
5. Install MongoDB and launch it
6. Configure the replica set

Detailed steps for this procedure are outlined in the tutorial, "Deploying MongoDB Replica Sets to Linux on Azure." Tagged with: Windows Azure, Azure VM, Azure Worker Roles, CentOS, MongoDB, Mongo, NoSQL, Polyglot persistence, 10gen

August 14, 2012