
Mikel Niño's blog
Industry 4.0, Big Data Analytics, digital entrepreneurship and new business models

Historical evolution of key terms related to data analysis


Take the key terms around data analysis that are associated today with the concept of Big Data. How can we get a clearer idea of their origin, how far back the relevance of each concept dates, and how that relevance has evolved over the years? Modern web search tools let us visualize historical information extracted automatically from different sources, and they make possible the interesting analysis below of the terms related to data analysis itself and to Big Data.

If we want to examine the historical trend in the use of these terms, and we also want to go back to a period before the emergence of the Web and its massive use in our society, we can use Google Books and its feature for searching the frequency of occurrence of terms and expressions across all the books that this service has processed digitally. This is the chart comparing the frequency of occurrence of "big data", "business intelligence", "data mining", "data science" and "machine learning" between 1970 and 2008 (the last available year):

Google Books chart comparing the frequency of occurrence of “big data”, “business intelligence”, “data mining”, “data science” and “machine learning” in the historical records of this service.

This graph reveals some interesting trends. For instance, Machine Learning is historically the first of these concepts to gain a certain relevance, during the 1980s, when it begins to consolidate as a field of study in its own right, separate from other lines of research in artificial intelligence. In addition, although the concepts of Business Intelligence and Data Mining both begin to "take off" in the early 1990s (right before the definitions of both terms were consolidated), Data Mining rises far more steeply during the second half of the 1990s, when its adoption in business contexts starts to spread among some pioneering sectors in this area (financial companies, banks and insurance firms). We can also observe that the idea of Big Data barely appears in books published during this period, when the first tools that would later popularize the concept had only just begun to be developed. Likewise, the concept of Data Science had not yet been widely recognized as an evolution of the technical areas of Statistics.
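To make the Google Books measure more concrete: the frequency of occurrence it plots is a relative frequency. For each year, the number of times a phrase appears is divided by the total number of phrases of the same length (n-grams) in that year's books, which is why the chart shows percentages rather than raw counts. The following is a minimal sketch of that calculation, assuming a tiny made-up corpus (`corpus`) and plain whitespace tokenization; the names `ngrams` and `relative_frequency` are hypothetical, and the real service works on a vastly larger corpus with its own tokenization rules.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams (as tuples) in a list of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def relative_frequency(corpus_by_year, phrase):
    """For each year, return the share of all n-grams (n = number of words
    in `phrase`) that match the phrase. Hypothetical helper: assumes
    lowercase whitespace tokenization and counts n-grams within each book."""
    target = tuple(phrase.lower().split())
    n = len(target)
    result = {}
    for year, books in corpus_by_year.items():
        counts = Counter()
        for text in books:
            counts.update(ngrams(text.lower().split(), n))
        total = sum(counts.values())
        # Guard against years with no n-grams of this length.
        result[year] = counts[target] / total if total else 0.0
    return result


# Made-up toy corpus: each year maps to a list of "books" (here, short strings).
corpus = {
    1990: ["data mining and business intelligence", "machine learning research"],
    2008: ["big data and data mining", "machine learning with big data"],
}

print(relative_frequency(corpus, "big data"))
```

With this toy corpus the call prints `{1990: 0.0, 2008: 0.25}`, since 2 of the 8 bigrams in the 2008 books are "big data"; multiplying by 100 gives the percentage scale used in the chart.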

If we focus on web searches and compare how often these terms have been searched on Google from 2004 to the present, we can identify a "second part" of this evolution, with trends very different from those seen before:

Google Trends chart comparing the volume of Google searches for “big data”, “business intelligence”, “data mining”, “data science” and “machine learning”.

In the first years of this period, it is clear that the concept of Big Data had not yet gained any popularity, something that changes drastically in 2011. This milestone can easily be linked to the publication, that same year, of the McKinsey Global Institute report "Big Data: The next frontier for innovation, competition and productivity", which put the concept on the radar of every analysis of global technology trends. We can also see growing interest in the concept of Data Science from 2012 onwards, just when an article in Harvard Business Review, echoing Hal Varian's 2009 remarks about the future of statisticians, introduced the profile of the "data scientist" as "the sexiest job of the 21st century".

By contrast, the former preponderance of Data Mining drops significantly during this period, and it currently has a level of relevance very similar to that of Business Intelligence, although both remain in the shadow of Big Data. This can also be related to the common confusion of calling any data analysis "big data", regardless of the analytical approach or the volume or complexity of the data involved. The concept whose popularity clearly grows during these last years is Machine Learning, which fits perfectly with the "second youth" that machine learning techniques (which, as we have seen, date back decades) have been experiencing since the rise of Big Data.
