The legal landscape for cannabis in the United States is in constant flux. Each year, new states and other jurisdictions legalize or decriminalize it, and the regulations governing how it can be sold and used change even more frequently. For companies in the industry, this affects not only how they do business, but also how they manage their data. Responding to regulatory changes requires speedy updates to software and a database that makes it easy to change the structure of your data as needed – and that’s not to mention the scaling needs of an industry that’s growing incredibly rapidly.
Flowhub makes software for the cannabis industry, and the company is leaping these hurdles every day. I recently spoke with Brad Beeler, Lead Architect at Flowhub, about the company, the challenges of working in an industry with complex regulations, and why Flowhub chose MongoDB to power its business. We also discussed how consulting from MongoDB not only improved performance but also saved the company money, generating a return on investment in less than a month.
Eric Holzhauer: First, can you tell us a bit about your company?
Brad Beeler: Flowhub provides essential technology for cannabis dispensaries. Founded in 2015, Flowhub pioneered the first Metrc API integration to help dispensaries stay compliant. Today, over 1,000 dispensaries trust Flowhub's point of sale, inventory management, business intelligence, and mobile solutions to process $3B+ cannabis sales annually.
EH: How is Flowhub using MongoDB?
BB: Essentially all of our applications – point of sale, inventory management, and more – are built on MongoDB, and we’ve been using MongoDB Atlas from the beginning.
When I joined two and a half years ago, our main production cluster was on an M40 cluster tier, and we’ve now scaled up to an M80. The business has expanded a lot, both by onboarding new clients with more locations and by increasing sales in our client base. We’re now at $3 billion of customer transactions a year. As we went through that growth, we started by making optimizations at the database level prior to throwing more resources at it, and then went on to scale the cluster. One great thing about Atlas is that it gave us the metrics we needed to understand our growth. After we’d made some optimizations, we could look at CPU and memory utilization, check that there wasn’t a way to further improve query execution with indexes, and then know it was time to scale. It’s really important for usability that we keep latency low and that the application UI is responsive, and scaling in Atlas helps us ensure that performance.
We also deploy an extra analytics node in Atlas, which is where we run queries for reporting. Most of our application data access is relatively straightforward CRUD, but we run aggregation pipelines to create reports: day-over-day sales, running financials, and so forth. Those reports are extra intensive at month-end or year-end, when our customers are looking back at the prior period to understand their business trends. It’s very useful to be able to isolate that workload from our core application queries. I’ll also say that MongoDB Compass has been an amazing tool for creating aggregation pipelines.
EH: Can you tell us some more about what makes your industry unique, and why MongoDB is a good fit?
BB: The regulatory landscape is a major factor. In the U.S., there’s a patchwork of regulation, and it continues to evolve – you may have seen that several new states legalized cannabis in the 2020 election cycle. States are still exploring how they want to regulate this industry, and as they discover what works and what doesn’t, they change the regulations fairly frequently.
We have to adapt to those changing variables, and MongoDB facilitates that. We can change the application layer to account for new regulations, and there’s minimal maintenance to change the database layer to match. That makes our development cycles faster and speeds up our time to market. MongoDB’s flexibility is great for moving quickly to meet new data requirements.
As a few concrete examples:
- The state of Oregon wanted to make sure that consumers knew exactly how much cannabis they were purchasing, regardless of format. Since some dispensaries sell prerolled products, they need to record the weight of the paper associated with those products. So now that’s a new data point we have to collect. We updated the application UI to add a form field where the dispensary can input the paper weight, and that data flows right into the database.
- Dispensaries are also issuing not only purchase receipts, but exit labels like what you’d find on a prescription from a pharmacy. And depending on the state, that exit label might include potency level, percentage of cannabinoids, what batch and package the cannabis came from, and so on. All of that is data we need to be storing, and potentially calculating or reformatting according to specific state requirements.
- Everything in our industry is tracked from seed to sale. Plants get barcodes very early on, and that identity is tracked all the way through different growth cycles and into packaging. So if there’s a recall, for example, it’s possible to identify all of the products from a specific plant, or plants from a certain origin. Tracking that data and integrating with systems up the supply chain is critical for us.
That data is all tracked in a regulatory system. We integrate with Metrc, which is the largest cannabis tracking system in the country. So our systems feed back into Metrc, and we automate the process of reporting all the required information. That’s much easier than a manual alternative – for example, uploading spreadsheets to Metrc, which dispensaries would otherwise need to do. We also pull information down from Metrc. When a store receives a shipment, it will import the package records into our system, and we’ll store them as inventory and get the relevant information from the Metrc API.
EH: What impact has MongoDB had on your business?
EH: What version of MongoDB are you using?
BB: We started out on 3.4, and have since upgraded to MongoDB 4.0. We’re preparing to upgrade to 4.2 to take advantage of some of the additional features in the database and in MongoDB Cloud. One thing we’re excited about is Atlas Search: by running a true search engine close to our data, we think we can get some pretty big performance improvements.
Most of our infrastructure is built on Node.js, and we’re using the Node.js driver. A great thing about MongoDB’s replication and the driver is that if there’s a failover and a new primary is elected, the driver keeps chugging, staying connected to the replica sets and retrying reads and writes if needed. That’s prevented any downtime or connectivity issues for us.
EH: How are you securing MongoDB?
BB: Security is very important to us, and we rely on Atlas’s security controls to protect data. We’ve set up access controls so that our developers can work easily in the development environment, but there are only a few people who can access data in the production environment. IP access lists let us control who and what can access the database, including a few third-party applications that are integrated into Flowhub. We’re looking into implementing VPC Peering for our application connections, which currently go through the IP access list.
We’re also interested in Client-Side Field-Level Encryption. We already limit the amount of personally identifiable information (PII) we collect and store, and we’re very careful about securing the PII we do need to store. Client-Side Field-Level Encryption would let us encrypt that at the client level, before it ever reaches the database.
EH: You're running on Atlas, so what underlying cloud provider do your use?
BB: We’re running everything on Google Cloud. We use Atlas on Google Cloud infrastructure, and our app servers are running in Google Kubernetes Engine.
We also use several other Google services. We rely pretty heavily on Google Cloud Pub/Sub as a messaging backbone for an event-driven architecture. Our core applications initially were built with a fairly monolithic architecture, because it was the easiest approach to get going quickly. As we’ve grown, we’re moving more toward microservices. We’ve connected Pub/Sub to MongoDB Atlas, and we’re turning data operations into published events. Microservices can then subscribe to event topics and use the events to take action and maintain or audit local data stores.
Our data science team uses Google BigQuery as the backend to most of our own analytics tooling. For most uses, we migrate data from MongoDB Atlas to BigQuery via in-house ETL processes, but for more real-time needs we’re using Google Dataflow to connect to MongoDB’s oplog and stream data into BigQuery.
EH: As you grow your business and scale your MongoDB usage, what's been the most important resource for you?
BB: MongoDB’s Flex Consulting has been great for optimizing performance and scaling efficiently. Flowhub has been around for a number of years, and as we’ve grown, our database has grown and evolved. Some of the schema, query, and index decisions that we had made years ago weren’t optimized for what we’re doing now, but we hadn’t revisited them comprehensively. Especially when we were scaling our cluster, we knew that we could make more improvements. Our MongoDB Consulting Engineer investigated our data structure and how we were accessing data, performance, what indexes we had, and so on. We even got into the internals of the WiredTiger storage engine and what optimizations we could make there. We learned a ton about MongoDB, and the Consulting Engineer also introduced us to some tools so we could diagnose performance issues ourselves.
Based on our Consulting Engineer’s recommendations, we changed the structure of how we stored some data and reworked certain queries to improve performance. We also cleaned up a bunch of unnecessary indexes. We had created a number of indexes over the years for different query patterns, and our Consulting Engineer was able to identify which ones could be removed wholesale, and which indexes could be replaced with a single new one to cover different query patterns. We made some optimizations in Atlas as well, moving to a Low CPU instance based on the shape of our workload and changing to a more efficient backup option.
With the optimizations recommended in our consulting engagement, we were able to reduce our spend by more than 35%. MongoDB consulting paid for itself in less than a month, which was incredible. I had to develop a business case internally for investing in consulting, and this level of savings made it an easy sell.
The knowledge we picked up during our consulting engagement was invaluable. That’s something we’ll carry forward and that will continue to provide benefits. We’re much better at our indexing strategy, for example. Say you’re introducing a new type of query and thinking about adding an index: now we know what questions to ask. How often is this going to be run? Could you change the query to use an existing index, or change an existing index to cover this query? If we decide we need a new index, should we deprecate an old one?
With the optimizations recommended in our consulting engagement, we were able to reduce our spend by more than 35%. MongoDB consulting paid for itself in less than a month, which was incredible.Brad Beeler, Lead Architect, Flowhub
EH: What advice would you give to someone who's considering MongoDB for their next project?
BB: Take the time upfront to understand your data and how it’s going to be used. That’ll give you a good head start for structuring the data in MongoDB, designing queries, and implementing indexes. Obviously, Flex Consulting was very helpful for us on this front, so give that a look.
Try MongoDB in the Cloud
Create a free account and launch a cluster in minutes!