Log everything! But how?
All of you must know by now how valuable your data is to your product and business: KPI calculation, funnel analysis, A/B testing, cohort analysis, cluster analysis, logistic regression. None of this is possible without a lot of data, and the most obvious way to get more data is logging.
But how? As we started talking to our customers at Treasure Data, we realized that there was no effective tool to log data in a flexible yet disciplined way. So, we rolled up our sleeves and authored our own log collector and open-sourced it as Fluentd under the Apache 2.0 license.
Fluentd is a lightweight, extensible logging daemon that processes logs as a JSON stream. It's designed so that users can write custom plugins to configure their own sources and sinks (input and output plugins in Fluentd parlance). In just six months, Fluentd users have contributed almost 50 plugins. These plugins, combined with the loggers written in several programming languages (Ruby, Python, PHP, Perl, Java and more), make Fluentd a great polyglot service. Apache, TSV or CSV. TCP or UDP. MongoDB or MySQL. S3, HDFS or flat files. Chances are good Fluentd can talk to your existing system fluently (okay, this pun was intended).
fluent-plugin-mongo, the most popular Fluentd plugin
Yes, that's right. fluent-plugin-mongo, the output plugin that lets Fluentd write data to MongoDB directly, is by far the most downloaded plugin!
fluent-plugin-mongo's popularity should come as little surprise: MongoDB is based on schema-free, JSON-based documents, and that's exactly how Fluentd handles events. In other words, there is a one-to-one correspondence between Fluentd events and MongoDB documents.
Also, MongoDB and Fluentd both aim to be easy to install and get up and running. If you love the agility and flexibility of MongoDB, chances are good you will also like Fluentd.
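To make the one-to-one correspondence concrete, here is a minimal Python sketch of how a single Fluentd event (a tag, a timestamp and a JSON record) could map onto a MongoDB-style document. The helper name event_to_document and the choice to store the tag and time as extra fields are assumptions made for illustration; the plugin's actual document layout may differ.

```python
from datetime import datetime, timezone


def event_to_document(tag, time, record):
    """Turn one (tag, time, record) event into a dict shaped like a MongoDB document.

    Hypothetical helper for illustration only; it is not part of Fluentd or
    fluent-plugin-mongo.
    """
    document = dict(record)  # the JSON record becomes the document body
    # Assumption: keep the event time and tag alongside the record fields.
    document["time"] = datetime.fromtimestamp(time, tz=timezone.utc)
    document["tag"] = tag
    return document


event = ("mongo.apache", 1325376000, {"method": "GET", "path": "/", "code": 200})
doc = event_to_document(*event)
print(doc)
```

Because the record is already a flat JSON object, no schema has to be declared up front: a new key in the log simply becomes a new field in the document.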
How to send data into MongoDB from Fluentd
I assume the reader already has MongoDB up and running. There are a couple of ways to install Fluentd:
Ruby gem: Fluentd and its plugins are available as Ruby gems. It's as easy as
$ gem install fluentd
$ gem install fluent-plugin-mongo
Debian/RPM packages: We have also packaged Fluentd and some of its plugins as td-agent ("td" stands for Treasure Data). Of course, fluent-plugin-mongo is pre-packaged with td-agent for you :-p Here are the links to the packages.
Now that we have everything, let's configure Fluentd to send data into MongoDB! In this example, we will import Apache logs into MongoDB.
The location of your configuration file depends on how you installed Fluentd. If you went the Ruby gem route, it should be /etc/fluentd/fluentd.conf, and if you downloaded td-agent, it should be /etc/td-agent/td-agent.conf. Open your config file and add
<source>
  type tail
  format apache
  path /var/log/apache2/access_log
  tag mongo.apache
</source>
These lines tell Fluentd to tail the Apache log at /var/log/apache2/access_log. The tailed lines are parsed into JSON and given the tag "mongo.apache". The tag decides how these events will be routed later.
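As a rough illustration of what "parsed into JSON" means here, the following Python sketch turns one Apache access-log line into a JSON-like record. The regular expression is a simplified approximation of the combined log format and the function name parse_apache_line is hypothetical; neither is taken from Fluentd's own parser, and the field names are assumptions chosen for readability.

```python
import re

# Simplified pattern for the Apache combined log format (an approximation for
# illustration, not the exact expression Fluentd uses).
APACHE_LINE = re.compile(
    r'^(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<code>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?$'
)


def parse_apache_line(line):
    """Return a JSON-like dict for one access-log line, or None if it does not match."""
    match = APACHE_LINE.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["code"] = int(record["code"])
    # Apache writes "-" when no bytes were sent.
    record["size"] = None if record["size"] == "-" else int(record["size"])
    return record


line = ('192.168.0.1 - - [28/Feb/2013:12:00:00 +0900] "GET / HTTP/1.1" 200 777 '
        '"-" "Opera/12.0"')
record = parse_apache_line(line)
print(record)
```

Each matched line becomes one structured record, which is what later gets routed by its tag.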
In the same config file, add
<match mongo.**>
  # plugin type
  type mongo

  # mongodb db + collection
  database apache
  collection access

  # mongodb host + port
  host localhost
  port 27017

  # interval
  flush_interval 10s
</match>
If your MongoDB instance is not running locally with the default port of 27017, you should change the host and port parameters. Otherwise, this is it. Your Apache logs will now be flushed to MongoDB every 10 seconds, as set by flush_interval.
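Since routing is driven by tags, it may help to see how a match pattern could be tested against a tag. The Python sketch below assumes the output section uses a pattern such as mongo.** and follows Fluentd's documented wildcard rules: * matches exactly one dot-separated tag part, and ** matches zero or more parts. The function tag_matches is a hypothetical re-implementation for illustration, not Fluentd's own code.

```python
def tag_matches(pattern, tag):
    """Return True if a dot-separated tag satisfies a match pattern.

    Hypothetical illustration of the wildcard rules: "*" stands for exactly
    one tag part, "**" for zero or more tag parts.
    """
    return _match(pattern.split("."), tag.split("."))


def _match(pattern_parts, tag_parts):
    if not pattern_parts:
        # The pattern is used up, so the tag must be used up as well.
        return not tag_parts
    head, rest = pattern_parts[0], pattern_parts[1:]
    if head == "**":
        # Try letting "**" consume 0, 1, 2, ... of the remaining tag parts.
        return any(_match(rest, tag_parts[i:]) for i in range(len(tag_parts) + 1))
    if not tag_parts:
        return False
    if head == "*" or head == tag_parts[0]:
        return _match(rest, tag_parts[1:])
    return False


print(tag_matches("mongo.**", "mongo.apache"))
print(tag_matches("mongo.*", "mongo.apache.access"))
```

With a pattern like mongo.**, any additional source tagged under the mongo prefix would be picked up by the same output section without further configuration.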
Fluentd + MongoDB = Awesome Sauce
The popularity of MongoDB suggests a paradigm shift in data storage. Traditional RDBMSs have their time and place, but sometimes you want more relaxed semantics and adaptability. MongoDB's schema-less document is a good example: it's flexible enough to store ever-changing log data but structured enough to query the data later.
In contrast, logging is moving in the opposite direction. Logging used to be structure-free and ad hoc, with bash-based poor man's data analysis tools running everywhere. However, such quick and dirty solutions are fragile and unmaintainable, and Fluentd tries to fix these problems.
It's exciting to see this synergy between Fluentd and MongoDB. We are confident that more and more people will see the value of combining a flexible database (like MongoDB) with a semi-structured log collection mechanism (like Fluentd) to address today's complex data needs.
Many thanks to 10gen for inviting us to give a talk on Fluentd and letting us write this guest post.
Also, we thank Masahiro Nakagawa for authoring and maintaining fluent-plugin-mongo.