The evolution and impact of scientific preprints in academic communication

By Amanda N. Weiss

This is the fourth post in our series about how science is communicated and the consequences thereof.

Peer-reviewed manuscripts often serve as the primary way to disseminate information within academia. People trust that the information they’re reading is reliable and rigorous, as other experts in the field have already vetted the paper to make sure that it’s of high quality. However, peer review can be a slow process, preventing valuable information from reaching audiences in a timely manner. This can be a barrier in fields that are rapidly advancing, and is especially problematic when information is needed as quickly as possible, as is the case during public health emergencies. 

In order to drive science forward faster, some STEM fields have popularized the use of preprints, wherein researchers post their manuscripts to publicly accessible repositories before or at the same time as submitting them to journals. Although some fields have long had preprint distribution as a core method of communication, others have historically opposed preprints. While acceptance has broadened over time, there is still debate within the scientific community about whether the benefits of preprints outweigh the drawbacks.

A brief history of STEM preprints

The idea of sharing scientific information prior to publication is not a new one. As early as 1921, the American Chemical Society’s Division of Petroleum Chemistry preprinted conference papers so that meetings could be spent discussing the papers rather than reading them for the first time. Later on, the need for secrecy surrounding scientific advancements during World War II prevented the free flow of information. However, this provided an impetus for more accessible information sharing after the war had concluded. In 1949, the Medical Sciences Information Exchange was established to provide administrators with information about ongoing research projects, so that multiple groups would not pursue the same research questions. Though this was not a public repository of preprints akin to those around today, it faced similar resistance on the grounds that it gave recognition to unreviewed work.

The 1950s saw a boom of preprint sharing in the physical sciences. Physicists often printed out and mailed letters, reports, and preprint manuscripts to each other to share their findings ahead of the many-months-long process of formal review and publication. The European Organization for Nuclear Research (CERN) opened the first preprint repository in the late 1950s, and this persisted via physical filing cabinets until the emergence of web-based repositories decades later. Likewise, in 1962, the library of the Stanford Linear Accelerator Center (SLAC) started cataloging preprints, including boxes of CERN reports. In 1969, the SLAC Library began distributing lists of its preprints throughout the physics community, and by 1975, it was receiving, on average, 70 preprints each week.

Meanwhile, the life sciences saw their own preliminary attempt at a preprint system arise in the 1960s. The National Institutes of Health (NIH) established Information Exchange Groups (IEGs), which started as informal networks for spreading preprints among biologists. Each group consisted of researchers in a narrow research area who would share informal communications, including unpublished data, to be printed and mailed out to all other members. The IEG program, which started as a single group and then expanded to several, lasted for six years before being canceled in 1967. The stated reasoning was that it had completed its goal of determining whether the IEG concept was feasible, and that the NIH no longer had the resources to support it, given the expansion of the project over time. A study following completion of the IEGs found that about 90% of the communications exchanged in the first IEG were preprinted papers that had later been formally published. Interestingly, the study also mentions that while the concern that IEGs posed a financial threat to academic journals was raised, many IEG members who were journal editors did not agree with this sentiment. Debate about the merits and detriments of the IEGs ensued across the pages of top science journals, and the chairman of the first IEG, David E. Green, stated his belief that the fear of journals losing status was the real culprit behind the cancellation of the IEGs [1]. Likewise, in 1969, Franz Ingelfinger, then editor-in-chief of the New England Journal of Medicine, established a policy that the journal would not accept manuscripts that had previously been published elsewhere, to ensure that its articles would be original and newsworthy. This policy was adopted by other journals, and despite official policies changing over time, many scientists perceived the rule to still be in effect decades later.

When the internet came along, so did the potential for preprints to spread more rapidly and to a wider audience, including non-scientists. In 1991, the first online preprint repository, arXiv, was created. Though initially developed to serve the physics community, arXiv has expanded into hosting papers in other quantitative and technical fields, including mathematics, computer science, engineering, quantitative biology, and economics. Today, arXiv hosts over 2 million submissions in total. Although many people were initially skeptical about a rapid paper-to-web transition, arXiv paved the way for other online centralized article resources, including PubMed Central. Several attempts at a biology preprint server to complement arXiv were made in the late 1990s and early 2000s, but none became as widely successful as bioRxiv, which launched in 2013. A study examining all of the preprints available on bioRxiv as of the end of November 2018 (37,648 preprints in total) found that the submission rate and download rate had both continued to increase over time. The study also found that 67% of preprints uploaded between 2013 and 2016 had gone on to be published in academic journals by the end of November 2018. In addition to arXiv and bioRxiv, several other preprint servers exist today, hosting submissions from a wide array of STEM fields.

Preprints in public health emergencies

As acceptance of preprints within the biological sciences grows, the potential utility for preprints as rapid communication devices in public health emergencies has come into focus. When new diseases develop and spread, sharing critical findings as quickly as possible becomes a priority to aid in combating the culprit pathogens. 

A study comparing the Zika and Ebola virus outbreaks found that more preprints were released for Zika (counted from November 2015 to August 2017) than for Ebola (counted from May 2014 to January 2016), potentially reflecting the increasing use of preprints in the biological and health sciences. Furthermore, the same study found that the number of preprints containing original data increased from Ebola to Zika, and that many of the preprints did progress to formal publication. However, the total number of preprints was still far lower than the total number of Ebola- or Zika-related publications.

More recently, the use of preprints for sharing findings about COVID-19 has gained attention, both positive and negative. As COVID-19 research funding and publications were fast-tracked during the spread of the pandemic, many preprints were uploaded to repositories for rapid information dissemination and potential feedback from readers. However, not all of the information shared through preprints was reliable. Notably, a preprint study posted in April 2020 that promoted the use of ivermectin to treat and prevent COVID-19 infection had serious data problems. Its data included 3 ivermectin-treated critically ill patients from Africa, but at that time only 2 cases of the disease had been confirmed on the continent. A later-updated version of the preprint included a graph that did not match the text and reported a higher-than-typical percentage of untreated patients requiring mechanical ventilation. Despite these flaws, the study contributed to the adoption of ivermectin as a recommended COVID-19 treatment in Peru and Bolivia.

The database used in the aforementioned ivermectin study (the Surgisphere Corporation database) was also used in two peer-reviewed COVID-19 studies published in prominent biomedical journals, both of which were criticized and later retracted. One of these studies, which claimed that COVID-19 patients treated with hydroxychloroquine had a higher incidence of cardiac complications and death than untreated patients, had several issues. For example, the report did not list the countries or hospitals contributing data to the database, and the data indicated that an unexpectedly large proportion of reported COVID-19 cases and deaths in Africa occurred within the hospitals that did contribute to the database. Thus, peer review is not infallible, and in some cases peer-reviewed publications are susceptible to the same faults as preprints.

What do early career scientists think about preprints?

In order to get some additional input on the benefits and drawbacks of STEM preprints, I sent out a survey to PSPDG’s email listserv, social media accounts, and Penn’s Biomedical Graduate Studies Slack channel. The survey was left open for 3 weeks and collected a total of 24 responses. The majority (87.5%) of responses were from graduate students, along with 1 response from a postbaccalaureate scholar, 1 from a postdoctoral researcher, and 1 from a climate fellow at a scientific publisher. While 54.2% of respondents have not posted manuscripts to preprint repositories, 91.7% do read preprints. The most commonly read repository among respondents was bioRxiv, reflecting the fact that 83.3% of respondents work in the life sciences.

Interestingly, although 66.7% of respondents believe that preprints are less reliable than published manuscripts, the average overall attitude towards preprints was 4.17 on a scale of 1-5 (1 being strongly negative and 5 being strongly positive). Some of the benefits of preprints that respondents most commonly cited include faster dissemination of research, greater accessibility, and being able to keep up with emerging progress in science. However, many respondents were concerned with the lack of peer review, especially regarding how it lowers reliability. Respondents also expressed concern that flawed preprints could end up misleading non-scientists.

The evolving relationship between preprints and formal publications

With the continued growth in popularity of preprints in various fields of science, we may see dramatic impacts on the norms of publishing even in formal academic journals. For instance, in 2021, the biology journal eLife adopted a policy that only papers that had been posted as preprints would be reviewed for publication. In the period between announcing and implementing this change, they collected feedback from authors who chose not to post their manuscripts on a preprint server. Some of the most common concerns of these authors included fear of competitors scooping their work (i.e., another group publishing a paper on the same research question before they can publish), fear of reduced opportunities to publish in top journals, lack of official recognition of preprints in assessment processes (e.g., for funding), concerns about a lack of peer-review quality control, fear of negative impact on public perception of science, issues regarding publishing permissions, and intellectual property concerns. Yet, despite such concerns, the authors who chose not to post preprints were in the minority, as 86% of submissions from March 17 to May 31, 2021 were posted as preprints either by the authors directly or by eLife on the authors’ behalf.

Other journals are encouraging authors to post their papers as preprints as well. For example, PLOS journals offer to post submitted manuscripts to bioRxiv, medRxiv, or EarthArXiv on the authors’ behalf or allow authors to directly forward papers already on bioRxiv or medRxiv to PLOS submission. The direct preprint repository-to-journal option is available for many other journals aside from PLOS, and from various repositories including bioRxiv, medRxiv, and ChemRxiv. Furthermore, if scientists want to check out preprint and peer review policies of various journals when deciding where to submit, databases such as Sherpa Romeo and Transpose provide access to relevant information.

As the demand for rapid and freely accessible scientific information continues to grow, preprints may play a critical role in shaping the landscape of science publication, and communication at large. Recent decades have seen the popularity of preprints spread to STEM fields where they were previously quashed, and emergencies including viral pandemics have brought both their benefits and drawbacks into the public eye. It remains to be seen whether preprints will have a net positive or negative impact, especially if the general public is taught to understand science as an ongoing process rather than strictly a set of facts. However, one thing is clear already: preprints do critically shape the process by which science is disseminated and consumed by scientists and laypeople alike.

 

Additional references

[1] Heenan and Weeks, 1971, “Informal Communication Among Scientists: A Study of the Information Exchange Group Program. Final Report. Part 1”, U.S. Air Force Office of Scientific Research Contract #F44620-69-C-0087