XML in Chemical Education
The IUPAC Committee on Printed and Electronic Publication is concerned with good practice in information and data exchanges at all levels and is supporting the eXtensible Markup Language (XML)-based standard in some areas. The committee believes that the following article is a helpful contribution to the subject. The views are the author’s own and are not a part of IUPAC’s strategy.
by Daniel Tofan
In the July 2002 issue of Chemistry International, Jonathan Goodman raised the question of “How well are we using XML in chemistry?” He also stated that “from an academic and educational viewpoint, one could say, unfortunately, not too well right now.” This is absolutely true, and indeed unfortunate, but may change in the very near future.
“How Well Are We Using XML in Chemistry?” by J. Goodman, reprinted from Jul-Aug 2002 CI, p. 7.
In a conversation I had with Peter Murray-Rust, the developer of the Chemical Markup Language (CML), at the recent American Chemical Society (ACS) meeting, he noted that most chemistry journals do not require or even encourage authors to submit chemical data using CML. This is unfortunate, since CML was designed specifically to facilitate the exchange of chemistry data in a format that is easy to use and understand, yet powerful enough to allow the interchange of such data among applications, web browsers, and text processors. For example, rather than describing the preparation of various compounds and solutions as a narrative in the “experimental” section of a chemistry paper, it would be more useful to submit such details in CML, so that they can be imported into other programs automatically. Such XML-aware programs would know how to convert the information from CML to a variety of other formats for display or use elsewhere. As another example, crystallographic data that are currently submitted as crystallographic information files (CIFs) should probably be encoded in CML instead, considering the flexibility that the latter offers and its validation capabilities.
The field of chemical education, on the other hand, seems likely to benefit the most from the existence of a markup language designed specifically to exchange assessments, question banks, data on student performance, and so on. With the widespread use of course management systems such as Blackboard or WebCT, there is a need to exchange items and assessments among users at various institutions. The Instructional Management System (IMS), the global learning consortium dedicated to developing specifications for distributed learning, has developed XML specifications for the interoperability of such items and assessments, among others. While these specifications provide for basic response types such as strings, numbers, multiple choice, simple drag and drop, and more, they offer no support for chemistry types. Anybody who wants to write chemistry questions that can be used in a course management system must limit themselves to those basic types of responses. But what if we want to ask and grade a question such as “Predict the products and balance the net ionic chemical reaction between copper sulfate and ammonium hydroxide”? How do we encode the correct answer, which consists of a balanced chemical reaction that contains states and charges, in the XML format proposed by IMS?
Currently, I can see two ways of doing that. One is to use a multiple choice format and to offer several options that include the correct answer, so the students can choose among them. The other way is to use a string response, and somehow decide on a format that the students can use to enter chemical formulas with states and charges (for example HTML fragments with <sub> and <sup> tags). The challenge would be to enforce that format among users and to provide a parsing algorithm in the implementing software that can figure out whether a submitted response is the correct one. I find neither of these approaches even close to satisfactory. Not only would I want students to enter a response on their own rather than pick one from a list of presented options, but I would also want them to actually enter a chemical reaction with formulas that contain indexes, states, and charges as subscripts and superscripts. I would also like to have a software tool that knows what to look for and how to analyze and grade such a response. So what is the solution, then?
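To make the string-response option concrete, the following Python sketch shows what such a parsing step might look like. It assumes a deliberately narrow input convention (flat formulas, counts in <sub> tags, charges in <sup> tags, no parentheses and no state symbols). The function names parse_species and same_species are hypothetical and belong to no course management system or IMS specification.

```python
import re
from collections import Counter

# One token is either an element symbol with an optional <sub>count</sub>,
# or a <sup>charge</sup> such as <sup>2-</sup> or <sup>+</sup>.
TOKEN = re.compile(r"([A-Z][a-z]?)(?:<sub>(\d+)</sub>)?|<sup>(\d*)([+-])</sup>")


def parse_species(fragment):
    """Return (atom counts, net charge) for a flat HTML-style formula."""
    atoms, charge = Counter(), 0
    text = "".join(fragment.split())  # ignore any whitespace the student typed
    pos = 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"cannot parse {fragment!r} at position {pos}")
        symbol, count, magnitude, sign = match.groups()
        if symbol:
            atoms[symbol] += int(count or 1)
        else:
            value = int(magnitude or 1)
            charge += value if sign == "+" else -value
        pos = match.end()
    return dict(atoms), charge


def same_species(response, expected):
    """Compare two fragments by composition and charge, ignoring atom order."""
    return parse_species(response) == parse_species(expected)
```

Even this toy version has to make choices: it treats SO<sub>4</sub><sup>2-</sup> and O<sub>4</sub>S<sup>2-</sup> as the same species, which a real grader might not want. That hints at how much convention would have to be agreed on before free-text chemistry responses could be graded reliably.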
The answer, in my view, is that there should be some XML format that can be used to encode a response that consists, for example, of a complete chemical reaction (or any other kind of chemistry-specific information) and to place that XML into the response analysis section of an IMS-compliant item. Thus, with a defined ontology (a set of XML tags) that encodes basic chemistry entities and concepts such as reactions, electron configurations, Lewis structures, and so on, chemistry teachers will be able to write questions that contain chemical information and that require responses with chemical content from the students. Authoring tools will then be able to generate this XML when the questions and test items are written to files for export, and other systems will be able to import such items and use them in chemistry activities and tests. While it is not too difficult to generate XML, as Peter Murray-Rust also noted, it is more challenging to read in XML and to make use of it in a real-world course management system, or any other application for that matter.
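As a rough illustration of the reading side, the Python sketch below parses a structured encoding of a reaction and checks that atoms and charge balance. The element and attribute names (reaction, species, atom, coefficient, charge, state) are invented for this example and are not taken from ChEdML, CML, or the IMS specifications. The reaction shown is an illustrative net ionic equation for the precipitation of copper(II) hydroxide.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical markup for: Cu2+(aq) + 2 OH-(aq) -> Cu(OH)2(s)
EXPECTED = """<reaction>
  <reactants>
    <species coefficient="1" charge="2" state="aq">
      <atom symbol="Cu" count="1"/>
    </species>
    <species coefficient="2" charge="-1" state="aq">
      <atom symbol="O" count="1"/>
      <atom symbol="H" count="1"/>
    </species>
  </reactants>
  <products>
    <species coefficient="1" charge="0" state="s">
      <atom symbol="Cu" count="1"/>
      <atom symbol="O" count="2"/>
      <atom symbol="H" count="2"/>
    </species>
  </products>
</reaction>"""


def side_totals(side):
    """Sum atoms and charge over one side, weighted by coefficients."""
    atoms, charge = Counter(), 0
    for species in side.findall("species"):
        coefficient = int(species.get("coefficient", "1"))
        charge += coefficient * int(species.get("charge", "0"))
        for atom in species.findall("atom"):
            atoms[atom.get("symbol")] += coefficient * int(atom.get("count", "1"))
    return atoms, charge


def is_balanced(reaction_xml):
    """True when both sides carry the same atoms and the same net charge."""
    root = ET.fromstring(reaction_xml)
    return side_totals(root.find("reactants")) == side_totals(root.find("products"))
```

Because the coefficients, charges, and states are explicit attributes rather than typographic conventions, the grading logic reduces to simple arithmetic. A balance check alone does not confirm that the student chose the right products, so a real tool would also compare the submitted species with the expected ones.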
Another reason why XML is necessary in chemical education becomes apparent when considering the ongoing development of digital libraries. One example is the National Science Digital Library initiative, funded by the National Science Foundation. Such collections have a need to store, among other things, question banks and data that pertain to chemical education. It seems that a text-based format that is easy to understand and use, yet powerful and flexible, is needed for this purpose. XML provides all these characteristics, and in the absence of an existing standard, a tag set needs to be invented to provide the vocabulary that allows chemical education data to be included in digital libraries.
With the reasons why XML is needed in chemical education stated, the remaining problem is to actually define a set of XML tags that potential users can agree upon, adopt, and implement. This task seems to be a rather laborious one, and the result is likely to be debated by various groups that can argue over what elements and attributes should or should not be included, and even what their names should be. Community consensus on this subject is absolutely critical. I thought, however, that a step forward could be taken by actually proposing a specification and making it known to the academic community, thereby inviting constructive criticism and support so that useful chemistry markup can eventually be developed.
One other issue to address is interoperability with other chemistry markup languages that are currently available or under development. Examples include ThermoML for thermophysical and thermochemical property data, AnIML, GAML and SpectroML for analytical chemistry data, NDML and UnitML for encoding units of measurement, and expML for experimental data interchange in science and engineering. These are generally industrial-strength formats, many of which are being developed by, or in collaboration with, the National Institute of Standards and Technology. There are also XML initiatives from the Journal of Chemical Education, which is developing a digital library for chemical education and needs an XML vocabulary for storing question and test items, as well as other digital libraries such as ASDL for analytical chemistry. Some of the XML elements and attributes present in the specifications for these markups may be similar. While from an educational point of view it may be more difficult to see why simple concepts such as balanced chemical reactions would need to be encoded in other markups as well, it is nonetheless a possibility that should be taken into account. Having many ways of representing things like chemical formulas unavoidably leads to the “multiplicity of formats” problem that has been identified by the chemistry XML community. It is therefore important that some consensus exist when developing these various chemistry XML formats, so that standardization is possible. After all, this is what IUPAC is aiming for.
The specification that I propose is what I call the Chemical Education Markup Language, or ChEdML. Whether this will be a full-blown markup language remains to be seen, but it is a start and I think the name is representative of its purpose. ChEdML is currently intended to be an XML namespace for educational chemistry. Because it is defined as a namespace rather than a standalone markup language, ChEdML fragments can be included in larger XML files, such as items and assessments that comply with the IMS specifications. IMS itself encourages the development of extensions for its current response types. It is quite evident that chemistry provides a lot of room for extending the IMS markup to cover many new types of responses that are specific to chemistry. It just needs a mechanism to do this, and ChEdML will provide that mechanism.
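The namespace idea can be sketched in a few lines of Python. Everything named here is a placeholder: the two namespace URIs, the chem prefix, and the item, question, expectedResponse, and formula elements are invented for the example and do not reproduce the IMS schema or any published ChEdML tag set.

```python
import xml.etree.ElementTree as ET

ITEM_NS = "urn:example:assessment-item"  # placeholder, not the real IMS namespace
CHEM_NS = "urn:example:chedml"           # placeholder, not an official ChEdML URI

# Ask ElementTree to use a readable prefix for the chemistry namespace.
ET.register_namespace("chem", CHEM_NS)


def build_item(question_text, formula_text):
    """Build a host item that carries a chemistry fragment in its own namespace."""
    item = ET.Element(f"{{{ITEM_NS}}}item")
    ET.SubElement(item, f"{{{ITEM_NS}}}question").text = question_text
    response = ET.SubElement(item, f"{{{ITEM_NS}}}expectedResponse")
    ET.SubElement(response, f"{{{CHEM_NS}}}formula").text = formula_text
    return item


def chemistry_fragments(item):
    """Return every element that belongs to the chemistry namespace."""
    prefix = f"{{{CHEM_NS}}}"
    return [element for element in item.iter() if element.tag.startswith(prefix)]


item = build_item("Give the formula of the sulfate ion.", "SO4 2-")
xml_text = ET.tostring(item, encoding="unicode")
round_trip = ET.fromstring(xml_text)
found = chemistry_fragments(round_trip)
```

The host vocabulary and the chemistry vocabulary stay separate: a system that knows nothing about chemistry can still process the item, while a chemistry-aware tool can pick out only the elements in the namespace it understands.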
As far as its integration with other XML types is concerned, ChEdML looks to both the past and the future. It seems natural that molecular structure or spectral information should be encoded using CML elements, as they already exist and have been developed for just this purpose. Mathematical equations, which are always present in educational chemistry material, can be coded using MathML. Units can be coded in UnitML, laboratory experimental data can be submitted in expML, and so on. By combining fragments from various XML namespaces and specifications, truly powerful, structured XML documents can be produced. The development of software tools that can handle this XML should be just a matter of time.
At present, ChEdML is a project under development, and only a few tags have been invented to code concepts such as isotopic symbols, molecular formulas, chemical reactions, electronic configurations, Lewis structures, electrochemical cell notations, and other subjects taught and assessed in general chemistry. ChEdML also has provisions for the formatting of chemical symbolism for display in Web browsers and other specialized tools, and also for the inclusion of numeric parameters and units into quantitative problems. It should also extend to cover other areas besides introductory and general chemistry, but right now it is mostly GenChemML (which can be very useful, although limited in scope).
A Web site is being maintained at <www.chedml.org> to bring awareness to the project and inform the community about the progress being made. Currently, the information available online is rather limited—a few examples are up to show what can be done, and future directions are proposed. It seemed prudent not to develop a full specification before gaining the support of the academic community, especially in the absence of external funding. However, feedback received at the recent ACS meeting has been very positive, which indicates that perhaps ChEdML will be useful. New ideas and real contributions to the markup from interested parties are kindly invited and criticism is also welcome. The Web site provides a way to submit comments and feedback.
It should be possible to make ChEdML a truly useful tool for chemical education. With time, perhaps other sciences will develop their own markup languages. Mathematics already has MathML. Physics could be next, and a general purpose Science Education Markup Language (ScEdML) might be able to incorporate these formats into a larger one. One thing seems certain—the academic community does need XML.
Daniel Tofan <email@example.com> is a postdoctoral associate at the Chemistry Department at SUNY Stony Brook, New York.
Page last modified 18 July 2004.
Copyright © 2003-2004 International Union of Pure and Applied Chemistry.