Drowning in Data: Why It's Time to End the Healthcare Data Lake

Jeff Needham

From digital check-ins to connected devices and telehealth programs, patients expect the benefits of a more digitized healthcare experience. At the same time, they’re also demanding a more personalized approach from healthcare providers. This duality, the need for an experience that is both more convenient and more tailored to the patient, is fueling a wave of technology modernization efforts and the replacement of monolithic legacy IT systems.

With limited reuse outside of the context they were built for and a reliance on nightly batch processing, legacy IT systems fail to deliver the services healthcare IT teams need or the experiences patients demand. Modernization should come with a move to microservices that can be used by multiple applications, agile teams that embrace domain-driven design principles, and event buses like Kafka that deliver real-time data and functionality to users.
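As a rough illustration of that event-driven pattern, the sketch below publishes a patient check-in event to a Kafka topic using the kafka-python client. The broker address, topic name, and field names are illustrative assumptions, not taken from any particular system.

    import json
    from kafka import KafkaProducer  # pip install kafka-python

    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    # Publish a check-in event that any subscribed service (scheduling,
    # notifications, analytics) can consume without querying the source system.
    producer.send("patient-events", {
        "event_type": "check_in",
        "patient_id": "p-12345",        # illustrative identifier
        "location": "clinic-7",
        "occurred_at": "2024-05-01T09:30:00Z",
    })
    producer.flush()

Because consumers subscribe to the topic rather than to the producing application, new services can react to the same events without any change to the system that emitted them.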

While this transformation is occurring, there’s an 800-pound gorilla not being widely addressed: analytics. What the healthcare industry doesn’t want to talk about is how costly analytics has become: the people, the software, the infrastructure, and above all the difficulty of moving data in and out of data lakes and warehouses. That cost is hindering the industry’s ability to deliver insights to patients and providers in a timely and efficient manner.

And yet, so many organizations are modernizing their analytics data warehouses and data lakes with an approach that simply updates the underlying technology. It’s a lift-and-shift effort of tremendous scale and cost, but one that does not address the underlying issues preventing the speedy delivery of meaningful insights.

Drowning in data: A 1980s model in the 2020s

While the business application landscape has changed, healthcare is still clinging to the same 1980s paradigm when it comes to analytics data. That paradigm started with physically moving all the data from transactional systems into a single data warehouse or data lake (or worse, both), so that analytics queries would not disrupt the performance of business applications by running against the transactional database.

Eventually, as data warehouses accumulated enough relational tables and data, queries began to slow down and even time out before delivering results to end users. This gave rise to data marts: yet another database to copy the warehouse data into, using a star schema model to return query results more efficiently than the relational warehouse could.

In the last and current iteration of analytics data platforms, warehouses and data marts were augmented, and in some cases replaced, by data lakes. Technologies like Hadoop promised a panacea where all sorts of structured and unstructured data could be stored and where queries against massive datasets could be executed. In reality, it turned out to be a costly distraction, one that did not make an organization's data easier to work with or provide real-time insights. Hence the nickname “data jail”: it was hard to load data into, and even harder to get data out of.

New technology, same challenges

While Hadoop and other technologies did not last long, they hung around just long enough to negatively alter the trajectory of many analytics shops, which are now investing heavily in migrating away from Hadoop to cloud-based platforms. But are these cloud alternatives solving the challenges of the Hadoop era? Can your organization rapidly experiment, innovate, and serve up data insights from its data lake? Can you go from an idea to delivery in days? Or does it take weeks, even months?

Despite the significant amounts of time, money, and people required to load data into these behemoth cloud data stores, they still exhibit the same challenges as their Hadoop-era predecessors. They are difficult to load and even more difficult to change. They can never realistically offer real-time or even near-real-time processing, the responsiveness that patients and providers expect.

Worse, they contain so much data that making sense of it is a task often left either to a sophisticated add-on like AWS HealthLake or to specialized data engineering and data science teams. On top of that, these cloud-based analytics systems are typically managed by a single team responsible for collecting, understanding, and storing data from all of the different domains within an organization.

This is what we like to call a modernized monolith: updated technology paired with a failure to address the fundamental limitations and constraints of the underlying system or process. It’s an outdated and inefficient approach that has simply been lifted and shifted from one technology to another. Many data lake implementations take this modernized-monolith approach and, like their predecessors, become bottlenecks: once data goes in, it’s difficult to get information back out.

In a world where data is at the center of every innovative business, and real-time analytics is top of mind for executives, product owners, and architects alike, most data lakes don’t deliver. Transforming your organization into a data-driven enterprise requires a more agile approach to managing and working with ever-growing volumes of data.

The rise of the operational data layer — an ODS renaissance

To provide meaningful insights to patients in a timely and efficient manner, two very important things need to happen: healthcare organizations need to overcome the limitations of legacy systems, and they need to make sense of a lot of very complex data. A lift-and-shift migration of data into a data lake will not solve these problems. Nor is it feasible or advisable to spend tens or even hundreds of millions of dollars replacing legacy systems as a precursor to a digital engagement strategy; the competition will leapfrog you before your efforts are even half complete.

So, what can be done? Can your organization make better sense of its data, and at the same time mitigate the issues legacy systems impose? Can this be done without a herculean effort?

The answer is yes. The solution is an operational data layer (ODL), an evolution of the operational data store. It’s an approach that has been tried and tested by major corporations, and it is the underlying technology that powers many of the apps you interact with on your phone.

An ODL lets you build new features without being constrained by the limitations of existing systems. It lets you summarize, analyze, and respond to data events in real time. It helps you migrate away from legacy systems without incurring the cost and complexity of replacing them outright. And it can give your teams a speed and agility that working against a data lake will simply never provide.
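A minimal sketch of that pattern, assuming the Kafka topic and event shape from the earlier example: a small consumer keeps a query-ready patient view in MongoDB up to date as events arrive, so applications read current state from the ODL instead of waiting on a batch pipeline. The database, collection, and field names here are hypothetical.

    import json
    from kafka import KafkaConsumer   # pip install kafka-python
    from pymongo import MongoClient   # pip install pymongo

    # Hypothetical ODL collection holding the current, query-ready view per patient.
    odl = MongoClient("mongodb://localhost:27017")["healthcare"]["patient_activity"]

    consumer = KafkaConsumer(
        "patient-events",
        bootstrap_servers="localhost:9092",
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    )

    for message in consumer:
        event = message.value
        # Upsert the latest state so applications can serve it with a single
        # indexed lookup instead of waiting for a nightly batch load.
        odl.update_one(
            {"patient_id": event["patient_id"]},
            {
                "$set": {"last_event": event["event_type"],
                         "last_seen": event["occurred_at"]},
                "$push": {"recent_events": event},
            },
            upsert=True,
        )

The point of the sketch is the shape of the architecture, not the specifics: events flow in continuously, and the ODL always holds an up-to-date view that front-end and analytics services can query directly.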

Data lakes and warehouses have their place, and the long-term data insights and data science benefits that can be gleaned from them are significant. The challenge, however, is reacting in real time and serving those insights to patients quickly. An ODL strategy offers the most cost- and time-efficient way to mitigate legacy system issues without the pain of replacing those systems. Investing in an ODL strategy will both solve your legacy modernization dilemma and help you deliver real-time data and analytics at the speed of an agile software delivery team.

MongoDB is an ideal foundation for an ODL. Not only does it offer a flexible, document-based database at its core, it is also a broader data platform, empowering your developers to focus on building features rather than managing databases and data.
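As one illustration of reacting to the ODL in real time, the sketch below uses a MongoDB change stream (available on replica sets and sharded clusters) to respond the moment the hypothetical patient_activity collection from the previous example is updated; the alerting logic is purely illustrative.

    from pymongo import MongoClient   # pip install pymongo

    odl = MongoClient("mongodb://localhost:27017")["healthcare"]["patient_activity"]

    # watch() surfaces each insert and update as it happens, so a downstream
    # service can react immediately rather than polling or waiting on a batch window.
    with odl.watch(full_document="updateLookup") as stream:
        for change in stream:
            doc = change.get("fullDocument") or {}
            if doc.get("last_event") == "check_in":
                print(f"Patient {doc.get('patient_id')} checked in; notify the care team")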

If you’re interested in learning about how MongoDB has enabled organizations large and small to successfully implement ODL strategies and tackle other burning healthcare issues, click here.