The History of Big Data: From BC to AD

Big data may be the big buzzword of our time, but the concept goes back hundreds of years.

By definition, big data refers to any data sets that are too large or complex to be easily dealt with. In the 1600s, John Graunt, the father of modern demography, also worked with huge, overwhelming amounts of data about the population of London. 

But it all starts with data collection, and we know this began millennia earlier with the practice of census-taking. 

Hunting and gathering (of data)

A census is basically a way of gathering information on a population, and it’s not restricted to people. The earliest known census was conducted in Babylon in about 3800 BC. Records suggest that the census counted the numbers of people and livestock, along with quantities of butter, milk, honey and vegetables.

These numbers were recorded on clay tablets, although unfortunately none of the raw data has been preserved. The Daily Telegraph muses that it could be because “the Babylonians probably sent the tablets through the equivalent of clay shredders to make sure their privacy was protected!”

The Bible also relates several accounts involving censuses, the best known being the story of the birth of Jesus in Bethlehem, where Mary and Joseph had gone for a Roman census.

Censuses were used by the ancient Romans chiefly for the purpose of determining taxes. A shame they didn’t do more with the data – because maybe they could have used it to predict the eventual downfall of the Roman Empire!

Data of the dead

In the mid-1600s, a London haberdasher named John Graunt tapped into overlooked data sources to produce remarkable insights about life, health and mortality in his city.

He started studying death records that had been kept by London parishes and compiled fifty years of data into his book, Natural and Political Observations Made Upon the Bills of Mortality. The book contains the first known table of public health data, and its arrival was timely, coinciding with the waves of bubonic plague that were sweeping the region.

His report painted a vivid picture of how Londoners lived and died, and he was the first person to give an estimate of the city’s population. He even predicted the percentage of people who would live to each successive age and their life expectancy year by year. But the data he collected was not always thorough or accurate – for instance, Graunt observed that syphilis was often covered up as the cause of death. 

All these records were publicly available, but before Graunt no one had thought about aggregating and analyzing the information in this way. His work helped to surface valuable insights that would have been instrumental for the city in mapping disease outbreaks and making better decisions.

AD: The rise and rise of alternative data

Fast forward to the present day, when the world is practically drowning in data. Yet we are meaningfully using only a fraction of it.

Businesses, investors and research firms are mostly guided by traditional data – that is, the usual government or company-issued data such as earnings and economic reports. But the frequency and depth of such data are often insufficient for identifying opportunities and emerging trends. That’s why more of them are turning to data outside the traditional realm, known as ‘alternative data’.

The field is growing fast, with the number of alternative data providers tripling in the last three years alone. But the concept itself isn’t really new.

There’s an oft-told tale about Walmart founder Sam Walton, who would count cars in parking lots as a barometer of business, and was once so absorbed in the task that he crashed his car into the back of a Walmart truck. Now this can be done more easily and at scale with satellite imagery. But the moral of the story is that patience isn’t necessarily a virtue when it comes to business or investing – after all, why wait for the quarterly sales report when you can monitor foot traffic or point-of-sale purchases in real time?

At its essence, alternative data is any data that is under the radar and underutilized. This data doesn’t need to be exotic or complicated. You could say that over 300 years ago, Graunt was also tapping into alternative data by examining mortality records.

While Graunt had to crunch through all this data manually, we now have the ability to process vast amounts of complex information pretty quickly. That enables businesses and investors to glean insights faster so that they can act on them before their competitors do. 

But there is one issue Graunt would have run into today…

What syphilis can tell us about privacy

Graunt astutely observed that syphilis deaths were likely under-reported due to social stigma. People suffering from the venereal disease in that era probably wouldn’t have been thrilled about such information being exposed.

We live in a pro-privacy world now, where high-profile scandals have made consumers increasingly distrustful of companies handling personal data. Ensuring data privacy and security should rightly be a top concern for every company. 

It is for this reason that a clear and hard distinction must be made between personal data and non-personal data. While it is legally and morally wrong to expose an individual suffering from syphilis, there’s a huge public benefit in tracking and aggregating anonymized cases. In the US, the Centers for Disease Control and Prevention (CDC) emphasizes the importance of national syphilis surveillance to understand how it spreads so it knows how to focus prevention efforts.

While most companies may not be dealing with matters of public health, it’s imperative that they handle their customers’ data with just as much sensitivity. Suburbia offers point-of-sale transaction data but we make sure our data sets are stripped of all personal details to begin with. This means we take it one step further than simply anonymizing the data – it’s not just about masking John Doe’s identity, but leaving any demographic information out completely.
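As a rough illustration of the difference between masking an identity and leaving personal details out altogether, here is a minimal Python sketch. It is not a description of any real pipeline: the field names, the sample record and the whitelist approach are all assumptions made for the example.

```python
# Hypothetical field names, assumed for illustration only.
# Only fields on this whitelist survive; everything else is dropped.
ALLOWED_FIELDS = ("product", "category", "price", "quantity", "timestamp")


def strip_personal_details(record):
    """Return a copy of the record containing only whitelisted, non-personal fields."""
    return {field: record[field] for field in ALLOWED_FIELDS if field in record}


raw_record = {
    "product": "Lemonade 330ml",
    "category": "soft drinks",
    "price": 1.20,
    "quantity": 2,
    "timestamp": "2019-10-29T12:30:00",
    "customer_name": "John Doe",  # personal detail: dropped, not masked
    "age_group": "25-34",         # demographic detail: dropped
    "gender": "M",                # demographic detail: dropped
}

clean_record = strip_personal_details(raw_record)
print(clean_record)
```

The design choice in this sketch is to list what may be kept rather than what must be removed, so a new personal field added upstream is excluded by default.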

The future of data

More businesses will find ways of harnessing the treasure trove of underutilized insights hidden in plain sight all around us. The Internet of Things means that the variety of data available to us will grow exponentially. 

In turn, this data will become more accessible as our ability to harvest usable information from big data improves by leaps and bounds with advancements in AI and machine learning.

At the same time, the growing privacy movement will shake up the advertising practices and business models of many companies. But with constraints come creativity and new inputs for decision-making.

This will drive more companies to embrace and leverage alternative data. If investors are able to use it to generate higher stock returns, why can’t companies use it to improve their operations and grow their business? 

Ultimately, alternative data won’t be so ‘alternative’ in the future, as data becomes the next frontier for competition. Those who are able to tap into new sources to generate insights will be the victors in this brave new world awash with data – and those who fail will end up victims of their own complacency, much like the ancient Romans.

Lemon Lime and Data: How Sprite Has the Secret to Data Security

It’s cold, it’s refreshing and it pairs well with spicy food, but what can Sprite teach the world’s biggest tech companies?

In the raging debate about companies’ use of personal data for profit, people often think there are only two choices: Hand over all your personal data, or stop using online services like Facebook or Google Maps completely.

But this puts the burden of responsibility on consumers, who may not have the resources or information available to make the right decision. Instead, companies handling personal data should take proactive steps for better, safer products. And they only have to look to the soda industry for inspiration.

For decades, soda titans like Coca-Cola and Pepsi enjoyed uninterrupted growth, building global beverage empires and becoming household names. While there were always concerns linking soda to health problems, they didn’t start hitting the mainstream consciousness until the end of the 20th century. By then, soft drink makers were often singled out as the sole culprits for rising obesity rates.

Today, dozens of countries around the world, including the UK, France and Norway, have slapped a tax on sugary drinks. While the tax has not yet been introduced in the Netherlands, Coca-Cola took an unprecedented step there to stay ahead of regulations.  

Coca-Cola’s game-changing decision

In 2017, the company pulled normal Sprite from the market, replacing it with the no-sugar Sprite Zero. This means that when you order a Sprite in the Netherlands, you will be served the sugar- and calorie-free version by default. It has become the “regular” Sprite.


Coca-Cola said Sprite had been performing well, so it wasn’t just another move to boost sales. Instead, the beverage giant was taking an important step to future-proof its business and provide a healthier product, without forcing customers to choose. Although it eliminated the bad choices, it was still able to offer variety to consumers, with new flavors like lemon lime and cucumber.

So what if we take the same step for data? 


While businesses handling our personal data assure us that our privacy matters to them, the news headlines tell a radically different story. How can consumers trust companies when there are high-profile data breaches and incidents of companies misusing our data on a regular basis?

Most firms handling personal data are unlikely to make a change unless they feel the noose of legislation tightening. But as we’ve seen before, legislation is not a magic bullet. Consider Europe after new data and privacy protections (grouped under GDPR) went into effect in 2018. According to the International Association of Privacy Professionals, almost 100,000 privacy complaints have been filed but only a few have led to meaningful penalties.

In the case of soft drinks, Dutch experts have questioned whether a sugar tax would even make a serious dent in consumption unless the tax was a substantial one. 

Even when there are stricter rules in place, they can still fail to change consumer behavior or address the loopholes that allow companies to conduct business as usual. The ubiquity of those consent forms on websites has only encouraged people to adopt a click-and-ignore mentality, so that they can just make the pesky pop-up disappear as quickly as possible.

When it comes to data privacy, there are those who argue that people can actively choose not to use the services of companies that exploit their data. Well, maybe they shouldn’t have to make that choice themselves. 

Facebook Zero 

Just as Coca-Cola offers only the zero-sugar Sprite in the Netherlands, zero personal data could also be the norm. Companies may need to collect some user data in the course of doing business, but there should be limits as to how much information they can amass on an individual. Why does a social network even need to know your gender in the first place?

It has become untenable for firms to say they value consumer privacy while collecting and hoarding user data, putting it at greater risk of breach or misuse. In the same way, it was impossible for soft drink makers to say they cared about their customers’ health while shilling beverages loaded with sugar.

More importantly, instead of trying to defend their key sales driver, the soda companies innovated and looked for new opportunities. They reformulated, they introduced smaller packages and they made it easier for consumers to embrace a healthier lifestyle. As a result, Coca-Cola’s revenues have stayed sweet even if its drinks haven’t.

Finally, what could be the most interesting parallel between sodas and personal data monetization is their innocuous beginnings. 

The first fizzy drinks were marketed as health drinks. If you were ordering a Sprite occasionally to wash down your meal, then soft drinks weren’t going to send you to an early grave. But over the years, with growing prosperity and the convenience of technology like vending machines, people started guzzling unhealthy amounts of soda.

It’s much the same with the harvesting of personal data. Initially, receiving services for free in exchange for your data didn’t seem like a bad trade-off. But increasingly, consumers are beginning to realize they are getting a raw deal. A tectonic shift has occurred and companies, especially Big Tech, need to make major changes to their approach.

This is already happening in the world of alternative data – for instance, Suburbia tracks sales of consumer products like Sprite, with zero personal information. It shows there can be real value in non-personal data and it is how we harness it that matters.

Can today’s companies follow in the footsteps of the soda giants, and come up with a new formula for monetization? It might seem impossible, but Sprite shows lemon, lime and consumer benefits can win together. 

Suburbia Goes to Japan: A Note from the CEO

The first Dutch ship arrived in Japan in 1600. It was called De Liefde, meaning love. Its arrival led to such strong links that, between 1639 and 1853, the Netherlands was the only European country allowed to trade with Japan.

This trade was not only in physical goods, but in art, culture and knowledge. This knowledge sharing continues to the present day – in the shape of data.


As the CEO of a data company, I was reminded of this special historical relationship between the two nations in September, when I was informed that Suburbia had become the first ever Dutch startup to be selected for Fintech Business Camp Tokyo – an accelerator program run by the office of the mayor of Tokyo along with Accenture Japan.


Over the last few months, I have spent a lot of time understanding Tokyo and eating my weight in kushiage and yakiniku. Apart from gaining weight, I have also gained new perspectives on the Japanese market and made many valuable connections within the industry.


Many say it’s not easy for foreign firms to crack the Japanese market because of complex bureaucracy and cultural factors. This is precisely why the support of the Tokyo Metropolitan Government (TMG) and Accenture has been so valuable, in providing us with access to top domestic companies and counseling us on things big and small, including the intricacies of Japanese business etiquette.


We recently concluded the program with a pitch in front of members of TMG, media and some of Japan’s leading companies. We showed how our Amsterdam-based startup is building innovative technology to solve some of the biggest problems facing both data providers and data users. This technology transcends borders – we can process data from anywhere in the world and transform it into a rich source of insights. 


Japan is interesting for us for several reasons. There is a growing shift from a cash-focused economy to contactless payments and payment apps, which will generate a flood of raw data. If collected and structured properly and safely, this data has tremendous value. The Japanese government has already proposed policies to encourage the sharing of this ‘industrial data’ and companies are beginning to take notice. As the use of alternative data in investment decisions rises rapidly, Japan is uniquely positioned to leverage new data and use it to make better decisions for its large pension funds and asset management industry.


While we have been working with mostly early adopters based in Europe and the US, we are witnessing the global rise of alternative data, especially from the frontlines of great initiatives like the Fintech Business Camp. With four hundred years of history between the Netherlands and Japan, we hope to contribute to four hundred more.

-Hamza Khan, CEO, Suburbia

Suburbia launches luxury cosmetics dataset for investment insights

29 October 2019, Amsterdam – Suburbia, a technology company specializing in alternative data solutions, has introduced its second offering that leverages millions of anonymized transactions to predict the performance of luxury brands in the beauty and personal care space.

Suburbia has partnered with companies in the payments ecosystem to collect receipt line-level data from multiple sources. This unique dataset tracks sales of luxury cosmetics and fragrances in France, the largest market for this segment in Europe and the fourth largest in the world, with a total value of three billion euros. 

“France is not just a large market for the world’s biggest luxury companies and beauty brands, it is also a trendsetter and tastemaker,” said Hamza Khan, CEO of Suburbia. “By collecting accurate sales across the country, our product is a powerhouse for what is popular in France, and a strong indicator of what will be popular globally.”

The new dataset delivers daily signals on publicly listed and private companies, including key players in the industry such as L’Oréal Luxe, Coty Inc., Estée Lauder Companies and LVMH. It tracks over 100 brands including Dior, Chanel, Hermès, Hugo Boss, Kenzo, Lancôme and YSL. The data product has been built specifically for investors who want granular insights into how these companies’ main revenue drivers are performing, whether by brand, category or product.

According to a recent report, among all luxury goods sectors, cosmetics and fragrances have witnessed the highest sales growth.* The market is expected to grow annually by 3.3% (CAGR 2019-2023).**

Suburbia’s proprietary technology is capable of processing millions of consumer purchases. No personal information is ever used or shared in the process. This data is updated on a daily basis with a one-day lag, so investors can get up-to-date insights for making decisions faster.

Other highlights of this product include:

  • Ticker mapping to easily see performance of publicly traded companies over time
  • Granularity such as EAN, brand name, item pricing, product category, basket composition, geography and time of transaction. Anonymized merchant ID is provided in order to compare same-store or like-for-like sales.
  • Historical coverage, with over three years of data available for backtesting
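To illustrate why an anonymized merchant ID matters for same-store or like-for-like comparisons, here is a minimal Python sketch. It does not reflect the actual dataset schema or methodology: the tuple layout, the merchant IDs, the period labels and the amounts are all hypothetical. The idea is simply to measure growth only across merchants that appear in both periods, so that newly added stores do not inflate the figure.

```python
from collections import defaultdict


def like_for_like_growth(transactions, period_a, period_b):
    """Sales growth from period_a to period_b, restricted to merchants active in both.

    `transactions` is an iterable of (merchant_id, period, amount) tuples,
    a hypothetical layout assumed for this sketch.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for merchant_id, period, amount in transactions:
        totals[period][merchant_id] += amount

    # Keep only merchants that report sales in both periods.
    common = set(totals[period_a]) & set(totals[period_b])
    if not common:
        raise ValueError("no merchants appear in both periods")

    sales_a = sum(totals[period_a][m] for m in common)
    sales_b = sum(totals[period_b][m] for m in common)
    if sales_a == 0:
        raise ValueError("baseline sales are zero, growth is undefined")
    return (sales_b - sales_a) / sales_a


# Made-up example data: merchant "m3" only exists in the later period.
transactions = [
    ("m1", "2018-10", 100.0),
    ("m2", "2018-10", 200.0),
    ("m1", "2019-10", 110.0),
    ("m2", "2019-10", 220.0),
    ("m3", "2019-10", 500.0),  # new store, excluded from the comparison
]

growth = like_for_like_growth(transactions, "2018-10", "2019-10")
print(f"Like-for-like growth: {growth:.1%}")
```

In this made-up example, total sales rise from 300 to 830, but on a like-for-like basis (merchants m1 and m2 only) they rise from 300 to 330, a growth of 10%.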

*  Deloitte, Global Powers of Luxury Goods, April 2019
** Statista, 2019

Suburbia launches European consumer transaction data solution for investment community

19 September 2019, Amsterdam – Suburbia, a technology company specializing in alternative data solutions, today launched its first-ever offering that leverages millions of anonymized transactions across Europe to provide predictive insights into consumer goods companies.

A multi-source platform with granular insights into brand performance, it delivers daily signals on over a hundred publicly listed and large private companies. This data product has been built specifically for hedge funds, asset managers and other institutional investors to generate alpha and manage risk.

“Investors have long been tapping into transactional data to anticipate trends and consumer behavior,” said Hamza Khan, CEO, Suburbia. “But we realized most of the existing data out there is generated by a panel of users which could lead to an opt-in bias and less accuracy. In addition, this data is much harder to come by in Europe because it’s such a diverse and fragmented landscape – every country has its preferred payment methods. We believe our unique approach has resulted in the industry’s most actionable dataset.”

Suburbia’s proprietary technology is capable of processing millions of consumer purchases from thousands of hospitality and retail channels across Europe, with a strong focus on Germany and the Benelux. No personal information is ever used or shared in the process. This data is updated on a daily basis so investors can get up-to-date insights for making decisions faster.

Other highlights of this product include:

  • Ticker mapping to easily see performance of publicly traded consumer packaged goods (CPG) companies over time
  • Granularity such as product details, item pricing, basket composition, geography and time of transaction
  • Historical coverage, with over two years of data available for backtesting