Published by De Gruyter, May 24, 2017

Research Data, Big Data, and Chemistry

Richard Hartshorn
From the journal Chemistry International


The IUPAC centenary in 2019 is fast approaching, and this will naturally lead people to look back at the significant achievements of the organisation and its dedicated volunteers over the past one hundred years. Equally important, however, will be the need to look forward to the roles for IUPAC in its second century. This special issue of Chemistry International (CI) could well feature in that assessment, as technology in the digital age, and particularly the data that technology produces, will clearly be an essential tool for the future of chemistry as a discipline.

The IUPAC vision, as espoused in our new strategic plan, is to be an indispensable resource for chemistry through the development of tools for the application and communication of chemical knowledge. In this issue of CI, you will find examples of the ways that data analysis can assist chemists and lead to the evolution of new chemical knowledge, and also of the ways that the effective utilization of data can assist in the communication of that knowledge.

Throughout its nearly completed first century, IUPAC has been recognised particularly for its contributions to nomenclature, terminology, and the symbols of chemistry; for its standardisation of chemical methods; and for its critical evaluation of data and the development of standards for data exchange. The colour books and the curation of the periodic table, along with the atomic weight data within it, are particularly well known, widely used, and appreciated by students and researchers alike, even if they may sometimes appear to be a necessary evil. The periodic table and atomic weight data will always be essential to the discipline, and some of the uses to which they have been put are both fascinating and educational. [1] By contrast, many people have commented on the reduced importance of conventional nomenclature (and, by implication, of the colour books) as the quality of structure drawings, and the ease with which such drawings can be incorporated into documents, websites, and other media, has improved. Indeed, the rise of the graphical representation of molecules in documents has created challenges for database manipulation and searchability, and it is within this context that the IUPAC International Chemical Identifier (InChI) was invented, implemented, and developed. [2] The InChI identifier is now globally embraced and is used in a wide variety of applications; in this issue of CI you will find it mentioned numerous times under a variety of topics. The issue also addresses a wide range of questions in data management and data usage across the entire discipline.

From a personal perspective, my involvement with IUPAC mirrors, at least in a small way, the evolution of IUPAC activity over recent times. I began as part of the team producing a new version of the “Red Book”, Nomenclature of Inorganic Chemistry–IUPAC Recommendations 2005, [3] and then took on leadership roles in the Division of Chemical Nomenclature and Structure Representation (Division VIII). In those roles I was involved in the development of standards for graphical representation, [4, 5] which together serve as guides to drawing chemical structure diagrams that are as unambiguous and informative as possible. I also began to learn more about InChI, particularly its use in managing and merging databases. It gradually became clear to me that InChI had significant potential for applications beyond databases, and I have since become involved in InChI development through projects on InChI QR codes [6] and InChI for mixtures. [7]

Now, in my role as IUPAC Secretary General, one of my major responsibilities is to help identify and encourage the development of new IUPAC activities and projects, particularly those of strategic importance: those that will shape future IUPAC activities and enhance IUPAC’s relevance in its second century. One of the key steps in doing this is to collaborate with other organisations and groups that have similar interests. I have been very pleased to see collaborations develop between the IUPAC Committee on Publications and Cheminformatics Data Standards (CPCDS) and the Chemistry Interest Group of the Research Data Alliance (RDA), along with the individuals and organisations involved with that group.

This special issue of CI describes many of the recent activities that I believe will have future significance, given the likely importance of “Big Data,” the potential of data mining, and the benefits that will derive from being able to properly search, access, and mine all of the research data that scientists around the globe are busily accumulating.


1. www.isotopesmatter.com

2. www.inchi-trust.org

3. N. G. Connelly, T. Damhus, R. M. Hartshorn, A. T. Hutton, Nomenclature of Inorganic Chemistry, Royal Society of Chemistry, ISBN 0-85404-438-8, 2005.

4. J. Brecher, K. N. Degtyarenko, H. Gottlieb, R. M. Hartshorn, G. P. Moss, P. Murray-Rust, J. Nyitrai, W. Powell, A. Smith, S. Stein, K. Taylor, W. Town, A. Williams, A. Yerin, “Graphical Representation of Stereochemical Configuration”, Pure Appl. Chem. 78(10):1897-1970, 2006.

5. J. Brecher, K. N. Degtyarenko, H. Gottlieb, R. M. Hartshorn, K.-H. Hellwich, J. Kahovec, G. P. Moss, A. McNaught, J. Nyitrai, W. Powell, A. Smith, K. Taylor, W. Town, A. Williams, A. Yerin, “Graphical Representation Standards for Chemical Structure Diagrams”, Pure Appl. Chem. 80(2):277-410, 2008.

6.

7.

Published online: 2017-05-24
Published in print: 2017-07-26

©2017 by Walter de Gruyter Berlin/Boston