22 Feb 2012

The case for publishing computer source code for scientific research papers

Collecting and analysing data for most scientific communication requires computer software. Rather than releasing their code, researchers often describe it in natural language as part of a research paper. A new paper published in Nature argues that only the full release of source code allows a paper's central findings to be reproduced.

Co-author Darrel Ince, Professor of Computing at The Open University, said researchers should declare how much of the source code associated with a paper is accessible. He also called on funders to provide metadata repositories describing both the programs and the data that researchers produce.

He said: “The data accompanying published scientific papers is normally available on request, but journal policies differ in terms of their requirements for the release of source code, ranging from publishing full source to a description of the computational algorithms used. Releasing the full code does not guarantee reproducibility of results, but the ambiguities of natural language descriptions and the vagaries of computer hardware and software will increase the chance of failure in reproducing results.”

Barriers to releasing code include a shortage of tools for packaging code and data together with research articles, the lack of central repositories for program code, and low awareness of the computational problems that affect scientific code.

“A good way forward would be for journals to adopt a standard for declaring the degree of source code accessibility associated with a scientific paper, whether it is full, partial, marginal or no code provided,” Professor Ince continued.

Some of these stumbling blocks are easily resolved. Where there is evidence of potential commercial use, journals could record a "no code" designation until the code enters the public domain or is released under a free licence. Where researchers do not have access to the source of some software packages they rely on, a "partial" source code designation would be appropriate. Open source communities have already solved the problem of releasing and storing code, and funding agencies should be encouraged to adopt similar approaches.

The team argued that the policies of journals and funding bodies on computer code need to be updated. Adopting a standard for disclosing source code could improve the reproducibility of findings. It would also help to minimise errors arising from ambiguity, numerical implementation or machine architecture.

Editor’s Notes
The case for open computer programs is published in Nature, volume 482, 23 February 2012.

The research team consists of Professor Darrel C. Ince, The Open University; Leslie Hatton, Kingston University; and John Graham-Cumming, independent computer scientist.
