Managing and Documenting Legacy Scientific Workflows

Ruben Acuña 1, Jacques Chomilier 2,3, and Zoé Lacroix 4,5
  • 1 Scientific Data Management Laboratory, Arizona State University, Tempe, United States of America
  • 2 Institut de Minéralogie, de Physique des Milieux Condensés et de Cosmochimie (IMPMC), Centre National de la Recherche Scientifique (CNRS), Institut de Recherche pour le Développement (IRD), Muséum National d’Histoire Naturelle (MNHN), Université Pierre et Marie Curie, Sorbonne Universités, 4 Place Jussieu, Paris, France
  • 3 Ressource Parisienne de Bioinformatique Structurale (RPBS), Université Paris Diderot, 35 Rue Hélène Brion, Paris, France
  • 4 Scientific Data Management Laboratory, Arizona State University, Tempe, United States of America
  • 5 Institut de Minéralogie, de Physique des Milieux Condensés et de Cosmochimie (IMPMC), Centre National de la Recherche Scientifique (CNRS), Institut de Recherche pour le Développement (IRD), Muséum National d’Histoire Naturelle (MNHN), Université Pierre et Marie Curie, Sorbonne Universités, 4 Place Jussieu, Paris, France

Summary

Legacy scientific workflows are often developed over many years, poorly documented, and implemented in scripting languages. In our cross-disciplinary projects, we face the problem of maintaining such workflows. This paper presents the Workflow Instrumentation for Structure Extraction (WISE) method, which we used to process several ad hoc legacy workflows written in Python and automatically produce their structural skeletons. Unlike many existing methods, WISE does not assume that input workflows have been preprocessed into a known workflow formalism. It can also identify and analyze calls to external tools. We present the method and report its results on several scientific workflows.


OPEN ACCESS


The Journal of Integrative Bioinformatics is an international journal dedicated to methods and tools of computer science and electronic infrastructure applied to biotechnology. The journal covers mainly but not exclusively data/method integration, modeling, simulation and visualization in combination with applications of theoretical/computational tools and any other approach supporting an integrative view of complex biological systems.
