What is Big Data? It is symptomatic of the question that the first place where I looked for an answer was Wikipedia, which defines it as “data sets that are so large or complex that traditional data processing software is inadequate to deal with them.” Handling massive amounts of information poses a number of technological challenges, including the search, analysis, transfer, and visualization of data sets whose size is simply hard to comprehend. For example, the Chemical Abstracts Service (CAS) SciFinder database, which contains detailed records for more than 127 million organic and inorganic substances, is only a few terabytes in size (tera = 10¹²). Compare that with the millions of transactions processed each day by retailers such as Amazon or Walmart, which are imported into databases containing several petabytes of data (peta = 10¹⁵). It is perhaps not surprising that the appraisal of such data sets is increasingly being used as a predictive tool to monitor consumer trends, optimize production and distribution costs, protect computer systems (“cybersecurity”), and improve many other facets of business and manufacturing.
Fields of science as diverse as particle physics, meteorology, and genomics rely on even bigger data sets, and applications in the chemical industry are becoming more prevalent as the cost of computing power decreases. The stamps that illustrate this note represent some of the areas where Big Data analytics already play an important role in research and development. In chemistry, the question is not if but when its practitioners will fully engage with and benefit from the tools of Big Data. Until that happens, we all have the responsibility to introduce some basic concepts of Big Data in chemical education and, why not, have Big Dreams.
Written by Daniel Rabinovich <firstname.lastname@example.org>.
2. Two recent papers describing the mounting role of big data in chemistry are: (1) Pence, H. E.; Williams, A. J. J. Chem. Educ. 93:504-508, 2016; and (2) Tetko, I. V.; Engkvist, O.; Koch, U.; Reymond, J.-L.; Chen, H. Mol. Inf. 35:615-621, 2016.
©2017 by Walter de Gruyter Berlin/Boston