The History of Big Data: From BC to AD

Big data may be the big buzzword of our time, but the concept goes back hundreds of years.

By definition, big data refers to data sets that are too large or complex to be dealt with easily. By that measure, John Graunt, the father of modern demography, was already working with overwhelming amounts of data about the population of London in the 1600s.

But it all starts with data collection, and we know this began millennia earlier with the practice of census-taking. 

Hunting and gathering (of data)

A census is basically a way of gathering information on a population, and it’s not restricted to people. The earliest known census was conducted in Babylon in about 3800 BC. Records suggest that the census counted the numbers of people and livestock, along with quantities of butter, milk, honey and vegetables.

These numbers were recorded on clay tablets, although unfortunately none of the raw data has been preserved. The Daily Telegraph muses that it could be because “the Babylonians probably sent the tablets through the equivalent of clay shredders to make sure their privacy was protected!”

The Bible also relates several accounts involving censuses, the best known being the story of the birth of Jesus in Bethlehem, where Mary and Joseph had travelled for a Roman census.

The ancient Romans used censuses chiefly for the purpose of determining taxes. A shame they didn’t do more with the data – because maybe they could have used it to predict the eventual downfall of the Roman Empire!

Data of the dead

In the mid-1600s, a London haberdasher named John Graunt tapped into overlooked data sources to produce remarkable insights about life, health and mortality in his city.

He started studying death records that had been kept by London parishes and compiled fifty years of data into his book, Natural and Political Observations Made Upon the Bills of Mortality. The book contains what is regarded as the first known table of public health data, and its arrival coincided with the waves of bubonic plague that were sweeping the region.

His report painted a vivid picture of how Londoners lived and died, and he was the first person to give an estimate of the city’s population. He even predicted the percentage of people who would live to each successive age and their life expectancy year by year. But the data he collected was not always thorough or accurate – for instance, Graunt observed that syphilis was often covered up as the cause of death. 
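To make the idea of a survival table concrete, here is a minimal Python sketch of the arithmetic involved. It is not a reconstruction of Graunt’s own method: the age brackets and survivor counts are sample input chosen for illustration, the function names are made up for this example, and the life expectancy figure assumes deaths are spread evenly within each age bracket.

```python
# Sample input (illustrative only, not data from this article):
# survivors per 100 births at the start of each age bracket.
AGES = [0, 6, 16, 26, 36, 46, 56, 66, 76, 86]
SURVIVORS = [100, 64, 40, 25, 16, 10, 6, 3, 1, 0]


def survival_percentages(survivors):
    """Percentage of the original cohort still alive at each age."""
    cohort = survivors[0]
    return [100.0 * s / cohort for s in survivors]


def life_expectancy(ages, survivors, start=0):
    """Expected remaining years of life at ages[start].

    Assumes deaths fall evenly within each bracket, so the years lived
    in a bracket are its width times the average number alive in it.
    """
    alive = survivors[start]
    if alive == 0:
        return 0.0
    person_years = 0.0
    for i in range(start, len(ages) - 1):
        width = ages[i + 1] - ages[i]
        person_years += width * (survivors[i] + survivors[i + 1]) / 2
    return person_years / alive


if __name__ == "__main__":
    for age, pct in zip(AGES, survival_percentages(SURVIVORS)):
        print(f"alive at age {age}: {pct:.0f}%")
    print(f"life expectancy at birth: {life_expectancy(AGES, SURVIVORS):.2f} years")
```

With only a handful of counts per age bracket, this kind of table turns a pile of death records into a forecast, which is the step that made Graunt’s work more than a tally.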

All these records were publicly available but before Graunt, no one had thought about aggregating and analyzing the information in this way. His work helped to surface valuable insights that would have been instrumental for the city in mapping disease outbreaks and making better decisions. 

AD: The rise and rise of alternative data

Fast forward to the present day, when the world is practically drowning in data. Yet we are meaningfully using only a fraction of it.

Businesses, investors and research firms are mostly guided by traditional data – that is, the usual government or company-issued data such as earnings and economic reports. But the frequency and depth of such data are often insufficient for identifying opportunities and emerging trends. That’s why more of them are turning to data outside the traditional realm, known as ‘alternative data’.

The field is growing fast, with the number of alternative data providers tripling in the last three years alone. But the concept itself isn’t really new.

There’s an oft-told tale about Walmart founder Sam Walton, who would count cars in parking lots as a barometer of business and was once so absorbed in the task that he crashed his car into the back of a Walmart truck. Today the same counting can be done more easily and at scale with satellite imagery. The moral of the story is that patience isn’t necessarily a virtue when it comes to business or investing – after all, why wait for the quarterly sales report when you can monitor foot traffic or point-of-sale purchases in real time?

At its essence, alternative data is any data that is under the radar and underutilized. This data doesn’t need to be exotic or complicated. You could say that over 300 years ago, Graunt was also tapping into alternative data by examining mortality records.

While Graunt had to crunch through all this data manually, we now have the ability to process vast amounts of complex information pretty quickly. That enables businesses and investors to glean insights faster so that they can act on them before their competitors do. 

But there is one issue Graunt would have run into today…

What syphilis can tell us about privacy

Graunt astutely observed that syphilis deaths were likely under-reported due to social stigma. People suffering from the venereal disease in that era probably wouldn’t have been thrilled about such information being exposed.

We live in a pro-privacy world now, where high-profile scandals have made consumers increasingly distrustful of companies handling personal data. Ensuring data privacy and security should rightly be a top concern for every company. 

It is for this reason that a clear and hard distinction must be made between personal data and non-personal data. While it is legally and morally wrong to expose an individual suffering from syphilis, there’s a huge public benefit in tracking and aggregating anonymized cases. In the US, the Centers for Disease Control and Prevention (CDC) emphasizes the importance of national syphilis surveillance to understand how it spreads so it knows how to focus prevention efforts.

While most companies may not be dealing with matters of public health, it’s imperative that they handle their customers’ data with just as much sensitivity. Suburbia offers point-of-sale transaction data, but we make sure our data sets are stripped of all personal details to begin with. This means we take it one step further than simply anonymizing the data – it’s not just about masking John Doe’s identity, but about leaving out any demographic information completely.
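As a rough illustration of the difference between masking an identity and leaving personal details out altogether, here is a small Python sketch. It is only an assumption about how such a step could look, not a description of any real pipeline: the field names, the allow-list and the `strip_personal_fields` helper are all hypothetical.

```python
# Hypothetical allow-list: only fields known to be non-personal are kept.
# Anything not listed here (names, emails, age, gender, and so on) is dropped.
NON_PERSONAL_FIELDS = ("product", "quantity", "price", "store_city", "timestamp")


def strip_personal_fields(record, allowed=NON_PERSONAL_FIELDS):
    """Return a copy of a transaction record containing only allowed fields."""
    return {key: record[key] for key in allowed if key in record}


if __name__ == "__main__":
    # Made-up example record, for illustration only.
    raw = {
        "customer_name": "John Doe",
        "email": "john@example.com",
        "age": 42,
        "product": "coffee",
        "quantity": 2,
        "price": 3.5,
        "store_city": "Amsterdam",
        "timestamp": "2019-05-01T09:30:00",
    }
    print(strip_personal_fields(raw))
```

An allow-list is used here rather than a block-list so that a new, unexpected personal field is dropped by default instead of slipping through.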

The future of data

More businesses will find ways of harnessing the treasure trove of underutilized insights hidden in plain sight all around us. The Internet of Things means that the variety of data available to us will grow exponentially. 

In turn, this data will become more accessible as our ability to harvest usable information from big data improves by leaps and bounds with advancements in AI and machine learning.

At the same time, the growing privacy movement will shake up the advertising practices and business models of many companies. But with constraints come creativity and new inputs for decision-making.

This will drive more companies to embrace and leverage alternative data. If investors are able to use it to generate higher stock returns, why can’t companies use it to improve their operations and grow their business? 

Ultimately, alternative data won’t be so ‘alternative’ in the future, as data becomes the next frontier for competition. Those who are able to tap into new sources to generate insights will be the victors in this brave new world awash with data – and those who fail will end up victims of their own complacency, much like the ancient Romans.
