Bias in peer review: Who is given a voice in science?

By Clara Raithel

This is the third post in our series about how science is communicated and the consequences thereof.

Whether it is an application for research funding or a manuscript sent to a scholarly journal for publication, writing is an essential aspect of scientific work. The majority of the resulting output is evaluated by other scientists, in a process referred to as peer review. The ultimate purpose of this evaluation is to ensure the originality, importance, and quality of academic work before it is executed or made publicly available. In other words, grants are awarded and manuscripts are published only when scientific standards are met. As such, peer review represents a critical gatekeeping moment that can define the outcome of entire scientific careers – simply because so much in science depends on the amount of funding available and the number of papers published. However, its consequences reach far beyond an individual’s career: peer review shapes the knowledge produced and shared with the rest of the world – thereby potentially affecting the lives of millions of people.

Having been in place since 1731, the peer review process is considered a staple of the scientific world. Yet, despite its continued use throughout the past three centuries, many researchers have expressed concerns regarding the validity, reliability, and fairness of the process. For instance, applicants often perceive grant review as unfair and biased in favor of ‘old-boy networks’ [1,2]. Likewise, authors frequently describe manuscript reviews as subjective, vastly inconsistent between reviewers, and susceptible to nepotism [3,4]. These concerns raise questions about the objectivity of peer review and challenge our understanding of it as a functional, democratic system for ensuring scientific excellence. How does subjectivity or, worse, systematic bias come into play?

The peer review process commonly starts with the submission of a manuscript to an academic journal. Once submitted, the editor of the respective journal assesses the paper, often within only a few minutes, and makes the first decision: whether the paper is sent out for review or is instead desk-rejected (the latter fate befalling between 40% and 75% of submitted manuscripts, depending on the journal). Assuming the manuscript avoids a desk rejection, the editor then decides who will serve as reviewers, or peers. The selection of reviewers is anything but trivial, as it is far from obvious who makes a suitable critic: someone doing very similar research, who has the expertise to thoroughly evaluate the quality of the work? Someone in the same area of research, to assess scientific innovation? Someone in a different area of research, to assess accessibility to non-experts? This choice – often made by a single individual – may vastly impact the outcome of the review. In line with this argument, empirical studies find that the inter-rater reliability between reviewers is low and “barely beyond chance”, supporting the often-made claim that peer review resembles a lottery.
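
To get a feel for what “barely beyond chance” means, here is a minimal sketch in Python – using invented reviewer recommendations, not data from the studies cited – of how inter-rater reliability is commonly quantified with Cohen’s kappa, which corrects raw agreement for the agreement two reviewers would reach by chance alone.

```python
# A toy illustration of inter-rater reliability between two reviewers.
# The recommendations below are invented for illustration only.
from sklearn.metrics import cohen_kappa_score

# Hypothetical accept (1) / reject (0) recommendations on 12 manuscripts.
reviewer_a = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0]
reviewer_b = [1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0]

# Raw agreement looks respectable (7 of 12 papers), but much of it is
# what two coin-flipping reviewers would produce anyway.
raw = sum(a == b for a, b in zip(reviewer_a, reviewer_b)) / len(reviewer_a)

# Cohen's kappa rescales agreement: 0 = chance level, 1 = perfect.
kappa = cohen_kappa_score(reviewer_a, reviewer_b)

print(f"Raw agreement: {raw:.0%}")    # 58%
print(f"Cohen's kappa: {kappa:.2f}")  # ~0.17 - barely beyond chance
```

With only 12 manuscripts the estimate is of course noisy, but the gap between raw agreement and kappa is precisely why empirical studies of peer review report chance-corrected statistics rather than simple agreement rates.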

One can argue that inconsistency in the evaluations of different reviewers is not necessarily a bad thing – after all, different scientists have different academic backgrounds and may prioritize different aspects of a paper. As long as the reviewers evaluate these aspects objectively, peer review should still be fair. However, things become problematic when bias is at play. Referring back to my example above, imagine selecting a reviewer whose research is very similar to the work being evaluated – the reviewer may secretly identify as a competitor and, whether implicitly or explicitly, give the manuscript a worse evaluation than it objectively deserves in order to fend off competition. Of note, such bias can be observed at a much larger scale, affecting not only individual scientists but entire groups, such as female scientists and/or scientists of color.

How do we assess systematic bias in the peer review system? To learn more, I spoke with Dr. Cassidy Sugimoto, a professor at the Georgia Institute of Technology who studies biases and inequities in knowledge production and dissemination. Dr. Sugimoto became interested in this research somewhat by accident, after stumbling upon the vast disparities in research output between women and men and recognizing the urgency of the underlying problem. For example, analyzing all articles indexed in the Web of Science databases and published between 2008 and 2012, she and her colleagues reported that about 70% of authorships belonged to men, while women accounted for fewer than 30%. As striking as these disparities are, it is important to understand that observing a disparity does not by itself prove that bias is the underlying cause, though bias is often a plausible explanation. A critical consideration here is the base rate: we would not expect the male-to-female authorship ratio to be 50:50 unless the population of scientists consisted of 50% women and 50% men (which we know is not the case in many fields, such as physics, computer science, or math). However, according to Dr. Sugimoto, gender disparities in authorship persist even when differences in base rates are accounted for – suggesting that there is bias, and that it needs addressing.
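
To make the base-rate logic concrete, here is a minimal sketch in Python – again with made-up numbers, not Dr. Sugimoto’s data – that asks whether an observed share of female authorships falls below the field’s base rate, rather than comparing it against a naive 50:50 split.

```python
# A toy base-rate comparison; all numbers below are invented for illustration.
from scipy.stats import binomtest

base_rate = 0.40          # hypothetical share of women among the field's researchers
female_authorships = 300  # hypothetical authorships credited to women...
total_authorships = 1000  # ...out of this many sampled authorships

# The naive question ("is the share below 50%?") conflates workforce
# composition with bias; the informative question is whether the observed
# share falls below the field's own base rate.
result = binomtest(female_authorships, total_authorships,
                   p=base_rate, alternative="less")

print(f"Observed share:  {female_authorships / total_authorships:.0%}")  # 30%
print(f"Field base rate: {base_rate:.0%}")                               # 40%
print(f"p-value (share < base rate): {result.pvalue:.1e}")
```

A real analysis would, of course, need field-specific base rates and would control for career stage and other confounders; the point here is only that the comparison baseline matters.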

Now, are the observed gender disparities caused by a bias against women exclusively at the stage of scientific peer review? Almost certainly not. Dr. Sugimoto explained that various factors come into play. For instance, women are less likely to be placed on scientific teams early in their careers and are less often given leadership positions than men. Likewise, women have less time available for research due to higher teaching loads and a greater share of domestic responsibilities – an effect that has been exacerbated by the COVID-19 pandemic. At the same time, women are less likely to receive credit for their work and are more hesitant to submit to high-impact journals (as suggested by unpublished data from Dr. Sugimoto’s group). So there are many differences between male and female scientists even before a submission to a journal takes place. However, even when men and women submit at the same rate, peer review produces differential outcomes for women. This illustrates that while uncovering and reducing bias in peer review may not be the magic solution to gender disparities in science, it is one step toward greater equity for women in science. After all, gatekeeping exerts a large influence over the dissemination of science. Women can do all the science they want – if the resulting knowledge is not communicated, it will neither advance their careers nor shape our understanding of the world.

Of course, gender is just one variable that may bias reviewers in their evaluation. According to Dr. Sugimoto, disparities in peer review outcomes can also be observed for other factors, including (but not limited to) race, ethnicity, country of origin, and prestige. By and large, prestige seems to create the biggest bias in peer review. Scientists with a high reputation and/or working at prestigious universities or institutions (think Ivy League schools or Max Planck Institutes) are evaluated more positively based on past success, irrespective of their present performance, resulting in a greater number of publications and more funding – an effect also known as the Matthew effect in science [5,6]. In a hallmark study, 12 articles written by prestigious individuals from highly reputable institutions, and previously accepted for publication, were re-submitted to the journals that had originally published them – this time with the authors’ names and affiliations changed so as to reduce prestige. Three articles were identified as re-submissions; of the nine remaining articles, eight (89%) were rejected due to methodological concerns raised by the reviewers. This finding powerfully illustrates the impact of prestige in peer review and demonstrates (yet again) that peer review outcomes do not necessarily vary as a function of quality or scientific rigor.

Over the years, the scientific community has made several attempts to improve the peer review process. For instance, in contrast to the single-blind review, in which the reviewers know the authors’ identities but not vice versa, the double-blind review promises anonymity to both reviewers and authors, thereby potentially offering a way to avoid bias based on author characteristics such as affiliation, gender, or race. However, this process has had only mixed success [7,8] and has proven difficult to implement in practice: despite various attempts to grant authors anonymity, their identity is often correctly guessed by reviewers, either because reviewers are familiar with the authors’ research or because the manuscript itself is insufficiently anonymized. Another alternative model is open peer review [9,10,11]. In this format, the authors know who they are receiving feedback from, and the reviewers know whose work they are evaluating. After the publication decision, the reviewers’ names and comments are published alongside the manuscript on the journal’s website, increasing transparency and offering readers an interesting window into the complexity of science. However, this model may leave junior scientists extremely vulnerable: a postdoctoral researcher on the job market, for example, may not want to challenge a leading scholar whose powerful network could derail their career. In this scenario, transparency may in fact threaten the robustness of peer review.

Although the scientific community has not yet reached a consensus on which peer review model is superior, some improvements can be made with relatively little effort. These include diversifying editorial boards and actively including the junior (and more diverse) workforce in the peer review process. Such changes could meaningfully increase the probability that authors belonging to marginalized groups get published. Of course, this is only a first step toward establishing a fair and transparent peer review process. Establishing a fair procedure is imperative not only from a social justice perspective, but because of the consequences it has for the production and dissemination of knowledge. Creating a system in which academic work is evaluated fairly and robustly changes who gets to produce and communicate science – and that can change the world.