Tracing the "90% of the World's Data Was Created in the Last Two Years" and "5 Exabytes" Big Data Myths
In pitch decks for cloud storage, data analytics, artificial intelligence, and enterprise software, presenters frequently use mind-boggling statistics about the "data deluge" to establish immediate urgency. Two of the most famous and persistent data-growth statistics are:
- "90% of the world's data has been created in the last two years alone."
- "Every two days we create as much information as we did from the dawn of civilization up until 2003 (5 exabytes)."
While these numbers sound impressive, tracing them back to their origins reveals they are classic "zombie statistics"—frozen-in-time marketing claims built on apples-to-oranges comparisons, outdated snapshots, and highly questionable definitions of "data."
1. The "90% of Data in the Last Two Years" Time Freeze
For nearly 15 years, marketing blogs and startup decks have claimed that "90% of the world's data was created in the last two years." Because the "two years" timeframe never shifts, this statistic has become a permanent, frozen-in-time zombie.
The Origin
The claim gained viral traction between 2011 and 2013 through two parallel pushes, an IBM marketing campaign and a press release from the research institute SINTEF:
- IBM's Marketing Campaign (circa 2011-2012): IBM frequently repeated this statistic to promote its "Big Data" and "Watson" initiatives. As recorded in public discussions:
"Every day, we create 2.5 quintillion bytes of data — so much that 90% of the data in the world today has been created in the last two years alone." — IBM claim cited on Skeptics Stack Exchange (2012)
- The SINTEF Press Release (2013): On May 22, 2013, the Norwegian research institute SINTEF published a widely syndicated press release that cemented the statistic in the tech press:
"A full 90% of all the data in the world has been generated over the last two years. The internet companies are awash with data that can be grouped and utilized." — SINTEF, "Big Data, for better or worse..." on ScienceDaily (2013)
The Zombie Mechanism
At the time of its calculation (the early 2010s), the statistic reflected a specific transition period: the rapid global proliferation of smartphones, HD video recording, and social media uploads. Under steady exponential growth, 90% of all accumulated data is less than two years old only if the total grows tenfold in two years, which means doubling roughly every seven months (about 0.6 years). A slower doubling time of 1.2 years would put only about 69% of accumulated data inside the two-year window, so the claim could hold only as a snapshot of a very steep growth phase.
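The link between doubling time and the share of "recent" data can be checked directly. The sketch below assumes steady exponential growth of the total data stock, which is a simplification, and the function names are illustrative rather than taken from any cited source.

```python
import math


def recent_share(doubling_time_years: float, window_years: float = 2.0) -> float:
    """Fraction of all accumulated data created within the last `window_years`.

    Assumption: the total stock grows exponentially and doubles every
    `doubling_time_years`.
    """
    # The stock that already existed `window_years` ago is
    # 2 ** (-window / T) of today's total; the rest is "recent".
    return 1.0 - 2.0 ** (-window_years / doubling_time_years)


def doubling_time_for_share(share: float, window_years: float = 2.0) -> float:
    """Doubling time (in years) needed for `share` of all data to fall in the window."""
    return window_years / math.log2(1.0 / (1.0 - share))


if __name__ == "__main__":
    print(f"{recent_share(1.2):.1%}")              # 68.5%
    print(f"{doubling_time_for_share(0.90):.2f}")  # 0.60
```

Under this model the share depends only on the doubling time and the window length, so a "90% in two years" figure is equivalent to saying the total grew tenfold over those two years.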
However, once published, the statistic was stripped of its context and date. Instead of updating the math or the timeframe, authors simply copied the "last two years" text verbatim year after year. As one observer on Quora noted:
"the article that had the '90%...' stuff was written in 2013. But many articles written in 2016, 2017, 2018, 2019 have the same exact line '... in the last 2 years'. So yeah… Internet." — Nguyen Hoang on Quora
2. Eric Schmidt's "5 Exabytes" Apples-to-Oranges Claim
Another foundation of the big data narrative is the claim that modern society now produces, in a matter of days, as much data as it did in all of prior history.
The Origin
At the Techonomy conference in August 2010, then-Google CEO Eric Schmidt famously declared:
"There was 5 exabytes of information created between the dawn of civilization through 2003, but that much information is now created every 2 days." — Eric Schmidt quote cited on Quora
The Debunking
In February 2011, Robert J. Moore (co-founder of RJMetrics) published an empirical analysis titled "Eric Schmidt's '5 Exabytes' Quote is a Load of Crap". Moore traced the "5 exabytes" figure back to its primary source: a famous 2003 study by Peter Lyman and Hal Varian at UC Berkeley titled "How Much Information?".
Moore identified a massive methodological flaw in Schmidt's comparison:
- UC Berkeley's definition of historical data: Lyman and Varian were strictly measuring stored, unique, non-redundant information (such as print, film, magnetic, and optical media). They specifically excluded transient, unrecorded data.
- Google's definition of modern data: The "every two days" figure Google cited represented all broadcast and transit data (including duplicate backups, automated machine-to-machine logs, unrecorded television broadcasts, and transient internet packets).
- The Apples-to-Oranges Error: By comparing UC Berkeley's highly restricted metric of unique historical stored knowledge with Google's all-encompassing metric of transient digital noise, Schmidt manufactured a sensationalized ratio that made modern data production look artificially massive.
What It Means for Presenters and Researchers
These statistics persist because they serve a powerful rhetorical purpose: they create a sense of overwhelming urgency ("the data deluge") that makes the presenter's solution (cloud storage, AI analytics, database management) seem indispensable.
When evaluating data-growth claims:
- Watch for the "Two-Year" Freeze: If a statistic claims something happened "in the last two years" but does not explicitly cite a recent, dated study (e.g., from 2024 or 2025), it is likely a recycled copy of the 2011-2012 IBM marketing claim or the 2013 SINTEF press release.
- Distinguish "Transit" from "Stored" Data: Much of the reported "data growth" consists of automated system logs, transient network traffic, and redundant backups, not unique information that people deliberately create or consume.
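The "Two-Year" Freeze check can be expressed as a simple rule of thumb. The sketch below is one possible heuristic, not an established method: the `Claim` record, its field names, and the `is_possibly_frozen` function are all hypothetical, and the example years follow the pattern described in the Quora observation quoted above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Claim:
    text: str
    published_year: int          # year the article or deck appeared
    source_year: Optional[int]   # year of the study it cites, if any


def is_possibly_frozen(claim: Claim, window_years: int = 2) -> bool:
    """Flag a relative-time claim whose cited source is missing or older than the window."""
    if claim.source_year is None:
        # No dated source at all: treat the claim as unverified.
        return True
    return claim.published_year - claim.source_year > window_years


if __name__ == "__main__":
    recycled = Claim("90% of data was created in the last two years", 2018, 2013)
    original = Claim("90% of data was created in the last two years", 2013, 2013)
    print(is_possibly_frozen(recycled))  # True
    print(is_possibly_frozen(original))  # False
```

A flag here only means the claim deserves a closer look at its source and date; it says nothing about whether the underlying number was ever accurate.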