Word-processing software has saved many a term paper, resume and work presentation from spelling and grammatical errors. What if a similar tool existed to catch the many different kinds of errors that, surprisingly, plague top-level scientific research papers?
That’s the dream of OMRF scientist Jonathan Wren, Ph.D. And with new software he’s developed, that hope could soon become a reality.
At OMRF, Wren specializes in bioinformatics, a scientific field dedicated to developing computer methods and software tools that help researchers understand biological data. Specifically, he uses sophisticated algorithms to find patterns within massive databases.
In new research published in the journal Bioinformatics, Wren has shown how a computer program he’s developed can use algorithms to sniff out a wide range of mistakes that commonly appear in scientific studies: statistical and mathematical errors, misspelled chemical names, broken web links and more.
“Mistakes happen. For example, in terms of web links, somewhere around 11 percent are invalid from the day they are published because of a spelling error or minor typo in the web address,” said Wren.
Math errors also pop up with regularity. A basic example, Wren said, would be “if someone writes ‘3-of-10 patients’ followed by ‘40 percent’ in parentheses; we know one of those two numbers is wrong.”
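To make that kind of check concrete, here is a minimal Python sketch of how a count-versus-percentage mismatch could be flagged automatically. It is an illustration only, not Wren's program: the pattern, the function name `find_ratio_mismatches` and the half-point rounding tolerance are all assumptions made for this example.

```python
import re

# Hypothetical pattern (an assumption for this sketch): a count written as
# "<k> of <n>" or "<k>-of-<n>", followed later in the sentence by a
# parenthesized percentage such as "(40 percent)" or "(40%)".
PATTERN = re.compile(
    r"(\d+)[\s-]*of[\s-]*(\d+)\b[^()]*\((\d+(?:\.\d+)?)\s*(?:%|percent)\)",
    re.IGNORECASE,
)


def find_ratio_mismatches(text, tolerance=0.5):
    """Return (k, n, reported_percent, expected_percent) for each mismatch.

    A mismatch is any case where the reported percentage differs from
    100 * k / n by more than `tolerance` percentage points. The tolerance
    allows for ordinary rounding to a whole number.
    """
    mismatches = []
    for match in PATTERN.finditer(text):
        k = int(match.group(1))
        n = int(match.group(2))
        reported = float(match.group(3))
        if n == 0:
            continue  # a percentage of zero patients is undefined
        expected = 100.0 * k / n
        if abs(expected - reported) > tolerance:
            mismatches.append((k, n, reported, expected))
    return mismatches


if __name__ == "__main__":
    sentence = "The drug helped 3-of-10 patients (40 percent)."
    for k, n, reported, expected in find_ratio_mismatches(sentence):
        print(f"{k} of {n} is {expected:.1f} percent, not {reported:g} percent")
```

Run on Wren's example, the sketch reports that 3 of 10 is 30 percent rather than 40. It can say only that the two numbers disagree, not which of them is the mistake.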
More complicated mistakes can skew the results of things like clinical trials, which are used to test the efficacy and safety of new drugs. For example, the statistical significance of a drug’s effectiveness against a disease is often distilled into a single number called a p-value, and Wren’s study found thousands of discrepancies between the reported p-values and their underlying numbers. Errors in this realm could spell the difference between a drug reaching the market and being rejected by the Food and Drug Administration.
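As a rough illustration of what rechecking a p-value can involve, the Python sketch below recomputes a two-sided p-value from a reported z statistic and compares it with the p-value printed alongside it. The article does not say which statistics Wren's program recalculates, so the use of a z statistic, the function names and the 10 percent relative tolerance are all assumptions made for this example.

```python
import math


def two_sided_p_from_z(z):
    """Two-sided p-value for a standard normal test statistic z."""
    # For a standard normal Z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


def p_value_is_consistent(z, reported_p, rel_tol=0.1):
    """Check whether a reported p-value roughly matches the one implied by z.

    rel_tol is a hypothetical tolerance meant to absorb rounding in the
    published numbers. It is not a value taken from the study.
    """
    recomputed = two_sided_p_from_z(z)
    return math.isclose(recomputed, reported_p, rel_tol=rel_tol)


if __name__ == "__main__":
    # A paper reporting z = 1.96 with p = 0.05 is internally consistent.
    print(p_value_is_consistent(1.96, 0.05))
    # The same z reported with p = 0.01 would be flagged for a second look.
    print(p_value_is_consistent(1.96, 0.01))
```

A flag from a check like this signals only that the reported numbers disagree with one another. Deciding which one is wrong still takes a person looking at the underlying data.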
Wren’s program is designed to catch several types of technical mistakes, and he is adding more. In an initial test, Wren found more than 27,000 statistical errors in the pool of published scientific papers he scanned.
Disturbingly, he discovered errors in about five percent of all published statistics and computations in scientific literature. “That number does not have to be that high and we are creating a way to cut it down significantly,” said Wren.
Wren hopes to develop the program into a web service that analyzes papers before publication, helping scientists quickly find and fix their errors.
“There are 1.3 million scientific papers published every year and this program can do things reviewers normally don’t do, like double-check calculations,” he said. “If we can create a service that can scan for these easy-to-miss errors, then we can stop them from ever appearing in published literature.”
Wren’s research was supported by grant No. ACI1345426 from the National Science Foundation. OMRF scientist Constantin Georgescu, Ph.D., also contributed to the research.