CoronaCentral enables researchers to analyse vast coronavirus literature

Washington DC [US], May 21 (ANI): Researchers have created a resource named CoronaCentral that uses machine learning to process and categorise COVID-19 research for the benefit of the scientific community.

In an article titled "Analyzing the vast coronavirus literature with CoronaCentral," Jake Lever and Russ B. Altman argue that the SARS-CoV-2 pandemic has caused a surge in research exploring all aspects of the virus and its effects on human health.

The study, published in the Proceedings of the National Academy of Sciences of the United States of America (PNAS), notes that the overwhelming publication rate means researchers are unable to keep abreast of the literature.

To address this, the authors present the CoronaCentral resource, which uses machine learning to process the research literature on SARS-CoV-2 together with that on SARS-CoV and MERS-CoV.

"We categorize the literature into useful topics and article types and enable analysis of the contents, pace, and emphasis of research during the crisis with integration of Altmetric data. These topics include therapeutics, disease forecasting, as well as growing areas such as 'long COVID' and studies of inequality. This resource, available at https://coronacentral.ai, is updated daily," the researchers said.

The COVID-19 pandemic has led to the greatest surge in biomedical research on a single topic in documented history. This research is valuable both to current and future researchers as they examine the long-term effects of the virus on different aspects of society.

Unfortunately, the vast scale of the literature makes it challenging to navigate. Machine-learning systems that can automatically identify topics and article types of papers would greatly benefit researchers who are searching for relevant coronavirus research.

"Our approach improves on the existing methods, including LitCovid, by covering a larger set of papers with the inclusion of PubMed and CORD-19 along with SARS/MERS papers, a larger and more specific set of topics, identification of article types (e.g., Reviews), integration of Altmetric esteem data, and indexing by a wide set of biomedical terms (e.g., drugs, viral lineages, and so forth). All data are available for download and the full codebase is available on GitHub," the authors of the article said.

To provide more detailed and higher-quality topics, the researchers pursued a supervised learning approach, annotating over 3,200 articles with a set of 32 topics and 8 article types. An individual paper may be tagged with multiple topics but typically has a single article type.
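The labelling scheme described above, several topics per paper but usually one article type, is commonly encoded for supervised training as a multi-hot topic vector plus a single class index. The following is a minimal sketch of that encoding, not the authors' code. The topic and article-type lists are illustrative subsets (the report does not list all 32 topics and 8 types, and "Research" is an assumed type name), and `AnnotatedPaper`, `topic_vector` and `type_index` are hypothetical names.

```python
from dataclasses import dataclass

# Illustrative subsets only: the full sets of 32 topics and 8 article
# types are not listed in the report. "Research" is an assumed type name.
TOPICS = ["Therapeutics", "Forecasting", "Long COVID", "Inequality"]
ARTICLE_TYPES = ["Research", "Review"]


@dataclass
class AnnotatedPaper:
    """One manually annotated paper (hypothetical structure)."""
    title: str
    topics: list        # zero or more labels drawn from TOPICS
    article_type: str   # typically exactly one label from ARTICLE_TYPES


def topic_vector(paper):
    """Encode the paper's topics as a multi-hot vector aligned with TOPICS."""
    unknown = set(paper.topics) - set(TOPICS)
    if unknown:
        raise ValueError(f"unknown topics: {sorted(unknown)}")
    return [1 if topic in paper.topics else 0 for topic in TOPICS]


def type_index(paper):
    """Encode the single article type as its position in ARTICLE_TYPES."""
    return ARTICLE_TYPES.index(paper.article_type)


# A made-up example paper carrying two topics and one article type.
paper = AnnotatedPaper(
    title="Hypothetical review of antiviral access in underserved communities",
    topics=["Therapeutics", "Inequality"],
    article_type="Review",
)
```

Here `topic_vector(paper)` marks both "Therapeutics" and "Inequality", while `type_index(paper)` returns one index, reflecting the difference between the multi-label topic task and the single-label article-type task.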

Several other topics and article types are identified using simple rule-based methods, including clinical trials and retractions.
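As an illustration of what a simple rule-based method can look like, the sketch below tags a paper from its title and abstract. The rules are assumptions chosen for the example and are not taken from CoronaCentral: a title beginning with "Retraction" or "Retracted" is tagged as a retraction, and any text containing a ClinicalTrials.gov registry identifier ("NCT" followed by eight digits) is tagged as a clinical trial. The function name `rule_based_tags` is hypothetical.

```python
import re

# ClinicalTrials.gov identifiers have the form "NCT" plus eight digits.
NCT_ID = re.compile(r"\bNCT\d{8}\b")

# Assumed convention: retraction notices start their title with
# "Retraction" or "Retracted" (case-insensitive).
RETRACTION_PREFIX = re.compile(r"^\s*(retraction|retracted)\b", re.IGNORECASE)


def rule_based_tags(title, abstract=""):
    """Return the set of tags assigned by the illustrative rules above."""
    tags = set()
    if RETRACTION_PREFIX.search(title):
        tags.add("Retraction")
    if NCT_ID.search(f"{title} {abstract}"):
        tags.add("Clinical Trial")
    return tags
```

Rules of this kind need no training data, which suits categories that are signalled by a fixed marker rather than by the overall content of the text.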

As of March 3, 2021, CoronaCentral covers 128,921 papers. (ANI)