The Evolution of Data

Mark Porter

It used to be so simple. Not that long ago, the universe of corporate data was a fraction of its current size. We fit most of it into rigid little rows and columns. And we didn’t ask all that much of it: some transaction processing, a few charts and graphs, a little business intelligence.

I’m exaggerating, of course. We’ve been pushing the boundaries of data processing since at least 1964, when SABRE, the world’s first airline passenger system, launched on two IBM mainframes and 1,500 terminals, processing an average of one transaction per second. But it’s not hyperbole to say that today’s data would barely recognize its early ancestors. First of all, there is more of it – 59 zettabytes and counting. Second, the very definition of data has changed dramatically, expanding well beyond payroll records and stock prices to include diverse data types like weblogs, fraud scores, maps, and fingerprints.

But perhaps the biggest change has been the role data plays in the enterprise. Data has always been used to inform business strategy. But today, data often is the business strategy. Consider this: 20 years ago, there was no such thing as a Chief Data Officer. Today? Almost two-thirds of Fortune 1000 companies have one.

Why? Because we are asking more of our data than ever before. And rightly so. In the digital economy, every company competes on the basis of insight-driven innovation. More and more, those innovations take the form of software built around clever algorithms. And the raw material to both craft and execute those algorithms is data. All of which means the ability to efficiently and effectively manage data, at speed, is a strategic imperative for any company, in any industry.

And yet, even as the volume, variety, and strategic importance of data have rapidly evolved over the last two decades, many enterprises haven’t changed how they manage it. Of course, I’m talking about the continued use of legacy relational databases, which are too rigid and don’t scale well enough to handle the demands of modern application development. Solving this problem was the entire reason for the “NoSQL” movement in the late 2000s, and MongoDB’s invention of the document-oriented database in the first place.

But I’m talking about something bigger: a longer-term trend that demands a fresh look at the way we work with data. Our customers are telling us that the fundamental requirements of their various data sets aren’t just changing; they’re converging. This is a surprise, and a reversal of the 50-year trend toward siloization and specialized tools.

Let’s take a step back: For decades, enterprises have maintained systems of record and systems of engagement. Systems of record are foundational, mission-critical sources of truth that are accessed primarily by internal programs and users. Systems of engagement are the digital interfaces with which customers and employees interact. And recently we have seen the addition of systems of insight, which combine data from various sources to inform decision making across the enterprise. For a long time, each system lived on different computers, had different data management requirements, and was funded by a different department.

But that is changing. With the hard and fast divisions between back office and front office dissolving, we now need all of our data systems to do everything. They need to be both fast and accurate. They need to be both accessible and secure. They need to handle both transactions and analytics.

In particular, with the rise of model training and inference, a different kind of analytics is arriving: one where programs ask the systems of insight questions and react to the answers in real time, rather than humans asking questions and then writing programs to act on the results. This is a fundamental shift; so fundamental that you could liken it to the change from the IBM 7090s that powered SABRE to those that (will?) power SKYNET.

This “convergence” of data requirements is both a challenge and an opportunity. Just like document databases enabled us to rethink how data was accessed and stored, convergence is forcing us to rethink the systems we use to manage data across the enterprise yet again. Companies across the industry, from Snowflake to Databricks to MongoDB, and every cloud provider, are working to provide the systems that let companies get more value from their data, using microservices-based networks or programs that drive informed, real-time decision making.

Interestingly, this comes at a time when most companies are undergoing radical digital transformation projects in order to become innovation-powered, software-driven, and cloud-based. In other words, even though everyone is already quite busy, there has never been a better time to think beyond the database and architect an actual “data platform” that can process, store, secure, and analyze data in real time, across all the relevant data sets, either without copying the data or by making such copying invisible.

Over the coming months, I’ll be sharing more about what these data platforms look like and how they support the creation of modern applications. I’ll also try to peer into the foggy future with you, looking at the constraints and freedoms of the modern enterprise, and predicting what products will be built to address that matrix.

But for now, I’m interested in hearing what you’re seeing under the hood of your data(base) estate. Are you experiencing a similar convergence of data requirements? If so, is it factored into your digital transformation strategy? And if not, what changes are you seeing in the way you use data across the enterprise? Please reach out to me here or on Twitter, at @MarkLovesTech - because I really do.

Thanks for reading - may the data odds be in your favor until we chat again.