The History of Big Data: From BC to AD

Big data may be the big buzzword of our time, but the concept goes back hundreds of years.

By definition, big data refers to any data sets that are too large or complex to be easily dealt with. In the 1600s, John Graunt, the father of modern demography, also worked with huge, overwhelming amounts of data about the population of London. 

But it all starts with data collection, and we know this began millennia earlier with the practice of census-taking. 

Hunting and gathering (of data)

A census is basically a way of gathering information on a population, and it’s not restricted to people. The earliest known census was conducted in Babylon in about 3800 BC. Records suggest that the census counted the numbers of people and livestock, along with quantities of butter, milk, honey and vegetables.

These numbers were recorded on clay tablets, although unfortunately none of the raw data has been preserved. The Daily Telegraph muses that it could be because “the Babylonians probably sent the tablets through the equivalent of clay shredders to make sure their privacy was protected!”

The Bible also relates several accounts involving censuses, the most well known being the birth of Jesus in Bethlehem where Mary and Joseph had gone for a Roman census. 

Censuses were used by the ancient Romans chiefly for the purpose of determining taxes. A shame they didn’t do more with the data – because maybe they could have used it to predict the eventual downfall of the Roman Empire! 

Data of the dead

In the mid-1600s, a London hatmaker named John Graunt tapped into overlooked data sources to produce remarkable insights about life, health and mortality in his city. 

He started studying death records that had been kept by London parishes and compiled fifty years of data into his book, Natural and Political Observations Made Upon the Bills of Mortality. The book also contains the first known table of public health data, and its timely arrival coincided with the waves of bubonic plague that were sweeping the region.

His report painted a vivid picture of how Londoners lived and died, and he was the first person to give an estimate of the city’s population. He even predicted the percentage of people who would live to each successive age and their life expectancy year by year. But the data he collected was not always thorough or accurate – for instance, Graunt observed that syphilis was often covered up as the cause of death. 

All these records were publicly available but before Graunt, no one had thought about aggregating and analyzing the information in this way. His work helped to surface valuable insights that would have been instrumental for the city in mapping disease outbreaks and making better decisions. 

AD: The rise and rise of alternative data

Fast forward to present day, when the world is practically drowning in data. Yet we are meaningfully using only a fraction of it.

Businesses, investors and research firms are mostly guided by traditional data – that is, the usual government or company-issued data such as earnings and economic reports. But the frequency and depth of such data are often insufficient for identifying opportunities and emerging trends. That’s why more are turning to data outside the traditional realm, that is ‘alternative data’. 

It’s growing fast, with the number of alternative data providers tripling in the last three years alone. But the concept itself isn’t really new. 

There’s an oft-told tale about Walmart founder Sam Walton who would count cars in parking lots as a barometer of business, and once was so absorbed in the task that he crashed his car into the back of a Walmart truck. Now this can be done more easily and at scale with satellite imagery. But the moral of the story is that patience isn’t necessarily a virtue when it comes to business or investing – after all, why wait for the quarterly sales report when you can monitor foot traffic or point-of-sale purchases in real time? 

At its essence, alternative data is any data that is under the radar and underutilized. This data doesn’t need to be exotic or complicated. You could say that over 300 years ago, Graunt was also tapping into alternative data by examining mortality records.

While Graunt had to crunch through all this data manually, we now have the ability to process vast amounts of complex information pretty quickly. That enables businesses and investors to glean insights faster so that they can act on them before their competitors do. 

But there is one issue Graunt would have run into today…

What syphilis can tell us about privacy

Graunt astutely observed that syphilis deaths were likely under-reported due to social stigma. People suffering from the venereal disease in that era probably wouldn’t have been thrilled about such information being exposed. 

We live in a pro-privacy world now, where high-profile scandals have made consumers increasingly distrustful of companies handling personal data. Ensuring data privacy and security should rightly be a top concern for every company. 

It is for this reason that a clear and hard distinction must be made between personal data and non-personal data. While it is legally and morally wrong to expose an individual suffering from syphilis, there’s a huge public benefit in tracking and aggregating anonymized cases. In the US, the Centers for Disease Control and Prevention (CDC) emphasizes the importance of national syphilis surveillance to understand how it spreads so it knows how to focus prevention efforts.

While most companies may not be dealing with matters of public health, it’s imperative that they handle their customers’ data with just as much sensitivity. Suburbia offers point-of-sale transaction data but we make sure our data sets are stripped of all personal details to begin with. This means we take it one step further than simply anonymizing the data – it’s not just about masking John Doe’s identity, but leaving any demographic information out completely.

The future of data

More businesses will find ways of harnessing the treasure trove of underutilized insights hidden in plain sight all around us. The Internet of Things means that the variety of data available to us will grow exponentially. 

In turn, this data will become more accessible as our ability to harvest usable information from big data improves by leaps and bounds with advancements in AI and machine learning.

At the same time, the growing privacy movement will shake up the advertising practices and business model of many companies. But with constraints comes creativity and new inputs for decision-making.

This will drive more companies to embrace and leverage alternative data. If investors are able to use it to generate higher stock returns, why can’t companies use it to improve their operations and grow their business? 

Ultimately, alternative data won’t be so ‘alternative’ in the future, as data becomes the next frontier for competition. Those who are able to tap into new sources to generate insights will be the victors in this brave new world awash with data – and those who fail will end up victims of their own complacency, much like the ancient Romans.

Lemon Lime and Data: How Sprite Has the Secret to Data Security

It’s cold, it’s refreshing and it pairs well with spicy food, but what can Sprite teach the world’s biggest tech companies?

In the raging debate about companies’ use of personal data for profit, people often think there are only two choices: Hand over all your personal data, or stop using online services like Facebook or Google Maps completely.

But this puts the burden of responsibility on consumers, who may not have the resources or information available to make the right decision. Instead, companies handling personal data should take proactive steps for better, safer products. And they only have to look to the soda industry for inspiration.

For decades, soda titans like Coca-Cola and Pepsi enjoyed uninterrupted growth, building global beverage empires and becoming household names. While there were always concerns linking soda to health problems, they didn’t start hitting the mainstream consciousness until the end of the 20th century. By then, soft drinks makers were often fingered as the sole culprits for rising obesity rates.

Today, dozens of countries around the world, including the UK, France and Norway, have slapped a tax on sugary drinks. While the tax has not yet been introduced in the Netherlands, Coca-Cola took an unprecedented step there to stay ahead of regulations.  

Coca-Cola’s game-changing decision

In 2017, the company pulled normal Sprite from the market, replacing it with the no-sugar Sprite Zero. This means when you order a Sprite in Holland, you will be served the sugar and calorie-free version by default. It has become the “regular” Sprite.


Coca-Cola said Sprite had been performing well, so it wasn’t just another move to boost sales. Instead, the beverage giant was making an important step to future-proof its business and provide a healthier product, without forcing customers to choose. Although they eliminated the bad choices, they were still able to offer variety to consumers, with new flavors like lemon lime and cucumber. 

So what if we take the same step for data? 


While businesses handling our personal data assure us that our privacy matters to them, the news headlines tell a radically different story. How can consumers trust companies when there are high-profile data breaches and incidents of companies misusing our data on a regular basis?

Most firms handling personal data are unlikely to make a change unless they feel the noose of legislation tightening. But as we’ve seen before, legislation is not a magic bullet. Consider Europe after new data and privacy protections (grouped under GDPR) went into effect in 2018. According to the International Association of Privacy Professionals, almost 100,000 privacy complaints have been filed but only a few have led to meaningful penalties.

In the case of soft drinks, Dutch experts have questioned whether a sugar tax would even make a serious dent in consumption unless the tax was a substantial one. 

Even when there are stricter rules in place, they can still fail to change consumer behavior or address the loopholes that allow companies to conduct business as usual. The ubiquity of those consent forms on websites has only encouraged people to adopt a click-and-ignore mentality, so that they can just make the pesky pop-up disappear as quickly as possible.

When it comes to data privacy, there are those who argue that people can actively choose not to use the services of companies that exploit their data. Well, maybe they shouldn’t have to make that choice themselves. 

Facebook Zero 

Just like how Coca-Cola offers only the zero-sugar Sprite in the Netherlands, zero personal data could also be the norm. Companies may need to collect some user data in the course of doing business but there should be limits as to how much information they can amass on an individual. Why does a social network even need to know your gender, in the first place?

It has become untenable for firms to say they value consumer privacy while collecting and hoarding user data, putting it at greater risk of breach or misuse. The same way it was impossible for soft drinks makers to say they care about their customers’ health while shilling beverages loaded with sugar.

More importantly, instead of trying to defend their key sales driver, the soda companies innovated and looked for new opportunities. They reformulated, they introduced smaller packages and they made it easier for consumers to embrace a healthier lifestyle. As a result, Coca-Cola’s revenues have stayed sweet even if their drinks haven’t.

Finally, what could be the most interesting parallel between sodas and personal data monetization is their innocuous beginnings. 

The first fizzy drinks were marketed as health drinks. If you were ordering a Sprite occasionally to wash down your meal, then soft drinks weren’t going to send you to an early grave. But over the years, with growing prosperity and the convenience of technology like vending machines, people started guzzling unhealthy amounts of soda.

It’s much the same with the harvesting of personal data. Initially, receiving services for free in exchange for your data didn’t seem like a bad trade-off. But increasingly, consumers are beginning to realize they are getting a raw deal. A tectonic shift has occurred and companies, especially Big Tech, need to make major changes to their approach. 

This is already happening in the world of alternative data – for instance, Suburbia tracks sales of consumer products like Sprite, with zero personal information. It shows there can be real value in non-personal data and it is how we harness it that matters.

Can today’s companies follow in the footsteps of the soda giants, and come up with a new formula for monetization? It might seem impossible, but Sprite shows lemon, lime and consumer benefits can win together. 

How can Facebook solve its privacy crisis? Just ask Otis Elevator

You’d be hard-pressed to think of two terms that have captured the tech zeitgeist more than “big data” and “data privacy”. So what do they have to do with a 160-year-old machine?

Firstly, you might ride this humble box several times a day without realizing its significant contribution to urban life. The elevator was a transformative technology that ushered in the era of the modern city and made skyscrapers possible. 

Like any technology, its evolution over time has had ups and downs, but the advancements made in its history can teach us some important things:

1. Focus on building trust through action, not communication.

When the first passenger elevators were introduced in the early-to-mid-nineteenth century, the rate of adoption was slow. After all, there was always the risk a cable would snap, plunging the elevator and all its occupants to their possible deaths. “Thanks, but I’ll take the stairs,” was likely the common rejoinder at the time.

The makers of elevators could have dismissed such accidents as one-off incidents, or shown how statistically rare elevator-related injuries and fatalities were. But it wouldn’t have mattered, as people simply didn’t feel safe getting in there.

What really changed people’s perception was a critical safety feature that was first demonstrated by Elisha Otis at a world’s fair in New York. As detailed in the book Lifted: A Cultural History of the Elevator, the American inventor stood on a platform high above the audience when the only rope holding it up was cut with an ax on his orders. The safety mechanism kicked in immediately, preventing the platform from plummeting to the ground. 

After this, public confidence in elevators soared, particularly in Otis’ safety elevators. He became inundated with orders, which doubled every year. 

It’s a crucial lesson for social media and tech companies that the elevator pitch for their technology matters less than their ‘elevator moment’. Most will pay lip service to the notion of privacy, without demonstrating the tangible and practical steps they’re taking to ensure the safety of users’ data. Any organization dealing with personal data needs to plan for worst-case scenarios and prepare for them appropriately by having safeguards in place. Only then can they truly protect individual privacy and earn consumer trust. 

2. What seems like an obstacle now will be a pivotal opportunity in hindsight.

When GDPR (General Data Protection Regulation) was first introduced, many companies viewed it as a hurdle to overcome. How could they now monetize their data or personalize their marketing? 

It helps to take a step back into a time when elevators were still manually controlled by an operator. Sitting in an elevator to press buttons all day was an actual paying job. Then, in the 1950s came automatic elevators that didn’t need human operators, though there was just one little problem: People hated them. 

As a professor of architectural history tells The Globe and Mail, there are “stories of people walking into elevators and walking back out”. In fact, it took a good part of a decade for the technology to become commonplace and for people to get used to it.

It seems laughable now, the idea that people didn’t see it as their job to push a button and simply felt uncomfortable doing so. But aren’t we going to also look back at this era, when companies regard privacy regulations as a demanding obstacle, with incredulity? 

After all, GDPR and the growing wave of legislation worldwide should be seen as a watershed moment for businesses. This is a turning point for marketers to stop microtargeting with personal data when there is a wealth of other types of data at their disposal that can be used to generate relevant and effective content. 

There are many ways to personalize marketing without the use of personal data. For instance, there is contextual data that can’t be used to identify an individual, like the customer’s local weather. Is it more relevant for a brand to bombard a customer with ads for umbrellas because he viewed them once, or to offer an umbrella to everyone living within a particular area on a rainy day? Does a brand have to know about your allergies, or can it use available pollen count data by geographic region?

Companies simply need to ‘push the button’ and stop seeing compliance as a chore. Instead, they need to embrace data privacy as a valuable opportunity to build trust and use non-personal data more creatively. 

3. Fast and reliable data makes it possible to predict things before they happen.

Elevators have come a pretty long way since Otis brought them into the mainstream. They have not only gotten better, faster, safer – but also a lot smarter. 

On the surface, elevators may not seem to have changed much over the last decades. In reality, the technology that keeps them moving smoothly is cutting-edge. AI and real-time data are being used by major elevator manufacturers for predictive maintenance – so they can spot problems before they arise and better anticipate breakdowns. For instance, ThyssenKrupp’s elevators are connected to the cloud, collecting data from their sensors and transforming that data into actionable analytics. 

KONE has a similar system that incorporates IBM’s Watson IoT. Using data points transmitted by elevators across the world, KONE can glean historic failure rates of different elevator parts and the preceding conditions. For example, a temperature reading that’s slightly above normal could be a sign of engine trouble, but the system can also note if it’s a hot day, which could be a factor too. Its forecasting also improves as more data is fed into the model. 
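As a rough illustration of the temperature example above, the sketch below flags a motor reading only when it is hotter than the ambient temperature would explain. This is not KONE’s or IBM’s method; the function name, the expected temperature rise and the tolerance are hypothetical values chosen purely for illustration.

```python
# Hypothetical rule of thumb: a healthy motor runs about `expected_rise_c`
# degrees above the surrounding air. Both defaults are invented numbers.
def is_suspicious(motor_temp_c, ambient_temp_c, expected_rise_c=25.0, tolerance_c=5.0):
    """Return True when the motor is hotter than the ambient temperature explains."""
    expected = ambient_temp_c + expected_rise_c
    return motor_temp_c - expected > tolerance_c


# Made-up readings: elevator B reports the higher temperature, but on a hot day.
readings = [
    {"elevator": "A", "motor_temp_c": 52.0, "ambient_temp_c": 20.0},
    {"elevator": "B", "motor_temp_c": 62.0, "ambient_temp_c": 35.0},
]

flagged = [
    r["elevator"]
    for r in readings
    if is_suspicious(r["motor_temp_c"], r["ambient_temp_c"])
]
print(flagged)
```

Only elevator A is flagged here, even though elevator B reports the higher absolute temperature, because B’s reading is in line with the hot weather. A production system would learn such thresholds from historical failure data rather than hard-coding them.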

Similarly, faster access to better data is needed to make critical business or investment decisions. Relying on traditional sources of information like earnings, filings and economic reports is akin to elevator manufacturers depending on written maintenance records. 

But why wait 90 days for a quarterly report when one can access a steady stream of intelligent data? New sources of information, or what we call alternative data, are constantly generated around us and investment managers can leverage them to get an unprecedented level of transparency into company performance on a near real-time basis.

From anonymized transaction data to price trackers, these can be used to generate predictive insights so proactive decisions can be made, instead of mere reactions to events as they occur. For investors, that can help them forecast market movements and trends, and manage risk. 

To sum it up, businesses and investors need to use data and privacy as the vehicle of change, much as the elevator was once upon a time.


Data Monetization in a Pro-Privacy World

(First published on Dataconomy)

For over a decade, some of the most successful companies on earth have made their riches by mining user data and selling it to advertisers. The big question is whether this will continue to be a sustainable business model with the ever-mounting scrutiny on data privacy and if not – what’s the alternative?

Many say the Cambridge Analytica scandal sparked a great data awakening by bringing to light the ways in which some companies were amassing and monetizing personal data about their users. As a result, Facebook was recently slapped with a record $5 billion fine and new privacy checks.

This isn’t a problem that is exclusive to the giants of Silicon Valley. In Europe, hefty fines have also recently been meted out to British Airways and Marriott for data breaches. As data protection complaints have doubled year-on-year, regulators will be getting tougher on companies to ensure their compliance with GDPR (General Data Protection Regulation).

Meanwhile, GDPR has driven a global movement as governments outside the EU, from Australia to Brazil, are set to introduce similar data protection regulations.

In addition, GDPR has helped to create greater awareness about data protection among the general public. The European Commission’s March 2019 Eurobarometer survey showed that about 67% of European citizens surveyed know what GDPR is.

The convergence of a compliance culture within organizations, stricter data privacy regulations globally, and consumers becoming more aware of their rights will continue to have a huge impact on businesses that profit from personal data, and even any business which collects it.

The situation demands urgency as the stakes have never been higher. According to a report by Gartner, by 2020, personal data will represent the largest area of privacy risk for 70% of organizations, up from 10% in 2018.

But better privacy for individuals doesn’t mean it’s bad for business. On the contrary, companies can use this opportunity to establish trust with customers while becoming more thoughtful and innovative about their approach to data monetization.

For many firms, data monetization has been inextricably linked with the personal data of their customers. However, they could be collecting, generating or archiving other types of non-personal data that could be valuable to certain end users. This is the alternative data that may even be overlooked by the business generating it.

This data might be structured or unstructured, but new tools and technologies have made it easier to mine and process such data into insights. These insights could serve as timely intelligence to those in other sectors, like economists, analysts or investors looking to identify patterns and trends.

In fact, there are many use cases for such alternative data in the world of investing when every bit of timely information helps to gain an edge. This is where anonymized and aggregated data matters most and personally identifiable information has zero value. What economists and asset managers most want to know is how many soft drinks Coca-Cola is selling across Europe this quarter, not whether John Doe bought a Coke.
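A minimal sketch of that idea, assuming hypothetical transaction records with made-up field names and values: the customer field is never carried forward, and only unit totals per brand and quarter are kept.

```python
from collections import Counter

# Hypothetical raw records; every field name and value is invented for illustration.
transactions = [
    {"customer": "John Doe", "brand": "Coca-Cola", "quarter": "Q3", "units": 2},
    {"customer": "Jane Roe", "brand": "Coca-Cola", "quarter": "Q3", "units": 1},
    {"customer": "John Doe", "brand": "Sprite", "quarter": "Q3", "units": 3},
]


def aggregate_units(records):
    """Sum units per (brand, quarter), ignoring who made each purchase."""
    totals = Counter()
    for record in records:
        totals[(record["brand"], record["quarter"])] += record["units"]
    return dict(totals)


totals = aggregate_units(transactions)
print(totals)
```

The output answers the "how many soft drinks" question without retaining anything about John Doe. Very small groups can still point to individuals, so real pipelines often suppress totals below a minimum count; that step is omitted here.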

The growing focus on privacy doesn’t mean data monetization has been taken off the table. Data will always be an important and valuable asset for any organization, but it needs to be harnessed with the full respect of individual rights to privacy. 

Valuing Your Data: A Checklist For Companies Looking To Monetize

Not all data is created equal.

When companies make their tentative first steps on the road to direct data monetization (that is, selling non-personal data they own), they have to start by understanding the value of their data. They need to assess what types of data they are collecting or generating in the course of business, and whether these could potentially be a new driver of revenue.


It’s often said that data is an important and valuable asset in any organization, but there’s a reason why it never appears on a balance sheet. Data valuation is a complex and challenging exercise. But knowing what their data is worth can help companies explore monetization, allocate resources and properly structure their technology infrastructure. 

In addition, they have to understand the market’s perception of value. Some firms might be overestimating the monetary value of their data, while others are unaware that their data is valuable, often to people in sectors and industries they may not have even thought of. A Forbes contributor compared the latter to the instance when companies realize they’re sitting on patents they don’t really need, but that actually have value to someone else.

So what are the fundamental characteristics of high-quality data that organizations need to consider when trying to measure its value? Or more simply put – what makes data valuable and how much is your data worth?

1. Does it tell a story?

Does the data tell you something about the economy or market trends? Does it track which brands are growing or which products are in high demand? The better your data reflects real world behavior, the higher its value.

2. Is it unique? 

Generally, the more exclusive the dataset, the more lucrative it is. Do you have data that nobody else has, or is it already widely available from other sources? 

3. Is it anonymized and compliant?

If you are planning to share raw data, it needs to be stripped of all personally identifiable information (PII) to protect the privacy of individual customers. This is critical in order to monetize data responsibly, as data privacy is not optional but essential.

4. Is it timely? 

Is your dataset updated on a weekly, monthly or a near real-time basis? The latter is most desired, especially for economists and institutional investors that are looking for faster insights to stay ahead of the market. 

5. Is it specific? 

The more granular and detailed the data, the more valuable it is. (Though to reiterate, personal details should definitely be excluded!) For example, data showing a million smartphones were sold last week is valuable. But its value grows significantly if it also indicates how many of those smartphones were iPhone X or Samsung Galaxy, etc.

6. Is it complete?

Do you have data for every day, without any gaps? Missing data could be as bad as inaccurate data, as it provides only a partial view of the real trends. 

7. Is it reliable and consistent? 

If there are multiple servers where data is collected or stored, do they all add up properly? Or do they contradict one another? Are there potential duplicates or other data errors?

8. Do you have archives of historical data? 

The further back your data goes, the better. Historical information is used in all kinds of analytics. In most use cases, two or more years of data are important to see how trends are changing over time.
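Some of the checks above can be automated. The sketch below is one possible way to approach points 3, 6 and 7, assuming a hypothetical daily sales dataset: it drops PII columns, lists days with no data, and finds exact duplicate rows. The field names, the list of PII fields and the sample rows are all invented for illustration.

```python
from datetime import date, timedelta

# Hypothetical set of columns treated as personally identifiable.
PII_FIELDS = {"name", "email"}


def strip_pii(row):
    """Point 3: return a copy of the row without any PII fields."""
    return {key: value for key, value in row.items() if key not in PII_FIELDS}


def missing_days(rows):
    """Point 6: list calendar days between the first and last record with no data."""
    days = {row["day"] for row in rows}
    if not days:
        return []
    start, end = min(days), max(days)
    span = (end - start).days
    candidates = (start + timedelta(days=n) for n in range(span + 1))
    return [day for day in candidates if day not in days]


def duplicate_rows(rows):
    """Point 7: return rows that exactly repeat an earlier row."""
    seen, dupes = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:
            dupes.append(row)
        seen.add(key)
    return dupes


# Made-up sample: one repeated record and no data for 2 July.
rows = [
    {"day": date(2019, 7, 1), "product": "smartphone", "units": 10,
     "name": "John Doe", "email": "john@example.com"},
    {"day": date(2019, 7, 1), "product": "smartphone", "units": 10,
     "name": "John Doe", "email": "john@example.com"},
    {"day": date(2019, 7, 3), "product": "smartphone", "units": 7,
     "name": "Jane Roe", "email": "jane@example.com"},
]

clean = [strip_pii(row) for row in rows]
gaps = missing_days(clean)
dupes = duplicate_rows(clean)
print(gaps, dupes)
```

Dropping named columns is only a starting point: combinations of seemingly harmless fields can still identify a person, so a real compliance review needs to go further than this sketch.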