Chapter 7: Considering bias and conflicts of interest among the included studies

Isabelle Boutron, Matthew J Page, Julian PT Higgins, Douglas G Altman, Andreas Lundh, Asbjørn Hróbjartsson; on behalf of the Cochrane Bias Methods Group

Key Points:

  • Review authors should seek to minimize bias. We draw a distinction between two places in which bias should be considered. The first is in the results of the individual studies included in a systematic review. The second is in the result of the meta-analysis (or other synthesis) of findings from the included studies.
  • Problems with the design and execution of individual studies of healthcare interventions raise questions about the internal validity of their findings; empirical evidence provides support for this concern.
  • An assessment of the internal validity of studies included in a Cochrane Review should emphasize the risk of bias in their results, that is, the risk that they will over-estimate or under-estimate the true intervention effect.
  • Results of meta-analyses (or other syntheses) across studies may additionally be affected by bias due to the absence of results from studies that should have been included in the synthesis.
  • Review authors should consider source of funding and conflicts of interest of authors of the study, which may inform the exploration of directness and heterogeneity of study results, assessment of risk of bias within studies, and assessment of risk of bias in syntheses owing to missing results.

Cite this chapter as: Boutron I, Page MJ, Higgins JPT, Altman DG, Lundh A, Hróbjartsson A. Chapter 7: Considering bias and conflicts of interest among the included studies. In: Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, Welch VA (editors). Cochrane Handbook for Systematic Reviews of Interventions version 6.4 (updated August 2023). Cochrane, 2023. Available from www.training.cochrane.org/handbook.

7.1 Introduction

Cochrane Reviews seek to minimize bias. We define bias as a systematic error, or deviation from the truth, in results. Biases can lead to under-estimation or over-estimation of the true intervention effect and can vary in magnitude: some are small (and trivial compared with the observed effect) and some are substantial (so that an apparent finding may be due entirely to bias). A source of bias may even vary in direction across studies. For example, bias due to a particular design flaw such as lack of allocation sequence concealment may lead to under-estimation of an effect in one study but over-estimation in another (Jüni et al 2001).

Bias can arise because of the actions of primary study investigators or because of the actions of review authors, or may be unavoidable due to constraints on how research can be undertaken in practice. Actions of authors can, in turn, be influenced by conflicts of interest. In this chapter we introduce issues of bias in the context of a Cochrane Review, covering both biases in the results of included studies and biases in the results of a synthesis. We introduce the general principles of assessing the risk that bias may be present, as well as the presentation of such assessments and their incorporation into analyses. Finally, we address how source of funding and conflicts of interest of study authors may impact on study design, conduct and reporting. Conflicts of interest held by review authors are also of concern; these should be addressed using editorial procedures and are not covered by this chapter (see Chapter 1, Section 1.3).

We draw a distinction between two places in which bias should be considered. The first is in the results of the individual studies included in a systematic review. Since the conclusions drawn in a review depend on the results of the included studies, if these results are biased, then a meta-analysis of the studies will produce a misleading conclusion. Therefore, review authors should systematically take into account risk of bias in results of included studies when interpreting the results of their review.

The second place in which bias should be considered is the result of the meta-analysis (or other synthesis) of findings from the included studies. This result will be affected by biases in the included studies, and may additionally be affected by bias due to the absence of results from studies that should have been included in the synthesis. Specifically, the conclusions of the review may be compromised when decisions about how, when and where to report results of eligible studies are influenced by the nature and direction of the results. This is the problem of ‘non-reporting bias’ (also described as ‘publication bias’ and ‘selective reporting bias’). There is convincing evidence that results that are statistically non-significant and unfavourable to the experimental intervention are less likely to be published than statistically significant results, and hence are less easily identified by systematic reviews (see Section 7.2.3). This leads to results being missing systematically from syntheses, which can lead to syntheses over-estimating or under-estimating the effects of an intervention. For this reason, the assessment of risk of bias due to missing results is another essential component of a Cochrane Review.

Both the risk of bias in included studies and risk of bias due to missing results may be influenced by conflicts of interest of study investigators or funders. For example, investigators with a financial interest in showing that a particular drug works may exclude participants who did not respond favourably to the drug from the analysis, or fail to report unfavourable results of the drug in a manuscript.

Further discussion of assessing risk of bias in the results of an individual randomized trial is available in Chapter 8, and of a non-randomized study in Chapter 25. Further discussion of assessing risk of bias due to missing results is available in Chapter 13.

7.1.1 Why consider risk of bias?

There is good empirical evidence that particular features of the design, conduct and analysis of randomized trials lead to bias on average, and that some results of randomized trials are suppressed from dissemination because of their nature. However, it is usually impossible to know to what extent biases have affected the results of a particular study or analysis (Savović et al 2012). For these reasons, it is more appropriate to consider whether a result is at risk of bias rather than claiming with certainty that it is biased. Most recent tools for assessing the internal validity of findings from quantitative studies in health now focus on risk of bias, whereas previous tools targeted the broader notion of ‘methodological quality’ (see also Section 7.1.2).

Bias should not be confused with imprecision. Bias refers to systematic error, meaning that multiple replications of the same study would reach the wrong answer on average. Imprecision refers to random error, meaning that multiple replications of the same study will produce different effect estimates because of sampling variation, but would give the right answer on average. Precision depends on the number of participants and (for dichotomous outcomes) the number of events in a study, and is reflected in the confidence interval around the intervention effect estimate from each study. The results of smaller studies are subject to greater sampling variation and hence are less precise. A small trial may be at low risk of bias yet its result may be estimated very imprecisely, with a wide confidence interval. Conversely, the results of a large trial may be precise (narrow confidence interval) but also at a high risk of bias.
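The distinction between systematic and random error can be illustrated with a small simulation. The following is a minimal sketch, not an analysis from the literature cited in this chapter: the function name `simulate_estimates`, the true effect of 0.5, the systematic error of 0.3, the outcome standard deviation of 1.0 and the sample sizes are all hypothetical values chosen for illustration.

```python
import random
import statistics


def simulate_estimates(true_effect, systematic_error, n_participants,
                       n_replications, seed):
    """Return effect estimates from repeated hypothetical studies.

    Each estimate is the mean of n_participants noisy observations centred on
    true_effect + systematic_error (outcome SD fixed at 1.0, an assumption).
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_replications):
        observations = [rng.gauss(true_effect + systematic_error, 1.0)
                        for _ in range(n_participants)]
        estimates.append(statistics.fmean(observations))
    return estimates


TRUE_EFFECT = 0.5  # hypothetical true intervention effect

# Small studies with no systematic error: imprecise, but right on average.
small_unbiased = simulate_estimates(TRUE_EFFECT, 0.0, n_participants=20,
                                    n_replications=500, seed=1)
# Large studies with a systematic error of 0.3: precise, but wrong on average.
large_biased = simulate_estimates(TRUE_EFFECT, 0.3, n_participants=1000,
                                  n_replications=500, seed=2)

for label, estimates in (("small, unbiased", small_unbiased),
                         ("large, biased", large_biased)):
    print(label,
          "mean estimate:", round(statistics.fmean(estimates), 3),
          "spread (SD):", round(statistics.stdev(estimates), 3))
```

Under these assumptions, the small unbiased design yields widely scattered estimates centred on the true effect, whereas the large biased design yields tightly clustered estimates centred on the wrong value.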

Bias should also not be confused with the external validity of a study, that is, the extent to which the results of a study can be generalized to other populations and settings. For example, a study may enrol participants who are not representative of the population who most commonly experience a particular clinical condition. The results of this study may have limited generalizability to the wider population, but will not necessarily give a biased estimate of the effect in the highly specific population on which it is based. Factors influencing the applicability of an included study to the review question are covered in Chapter 14 and Chapter 15.

7.1.2 From quality scales to domain-based tools

Critical assessment of included studies has long been an important component of a systematic review or meta-analysis, and methods have evolved greatly over time. Early appraisal tools were structured as quality ‘scales’, which combined information on several features into a single score. However, this approach was questioned after it was revealed that the type of quality scale used could significantly influence the interpretation of the meta-analysis results (Jüni et al 1999). That is, risk ratios of trials deemed ‘high quality’ by some scales suggested that the experimental intervention was superior, whereas when trials were deemed ‘high quality’ by other scales, the opposite was the case. The lack of a theoretical framework underlying the concept of ‘quality’ assessed by these scales resulted in tools mixing different concepts such as risk of bias, imprecision, relevance, applicability, ethics, and completeness of reporting. Furthermore, the summary score combining these components is difficult to interpret (Jüni et al 2001).

In 2008, Cochrane released the Cochrane risk-of-bias (RoB) tool, which was slightly revised in 2011 (Higgins et al 2011). The tool was built on the following key principles:

  • The tool focused on a single concept: risk of bias. It did not consider other concepts such as the quality of reporting, precision (the extent to which results are free of random errors), or external validity (directness, applicability or generalizability).
  • The tool was based on a domain-based (or component) approach, in which different types of bias are considered in turn. Users were asked to assess seven domains: random sequence generation, allocation sequence concealment, blinding of participants and personnel, blinding of outcome assessment, incomplete outcome data, selective outcome reporting, and other sources of bias. There was no scoring system in the tool.
  • The domains were selected to characterize mechanisms through which bias may be introduced into a trial, based on a combination of theoretical considerations and empirical evidence.
  • The assessment of risk of bias required judgement and should thus be completely transparent. Review authors provided a judgement for each domain, rated as ‘low’, ‘high’ or ‘unclear’ risk of bias, and provided reasons to support their judgement.
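The principles above can be sketched as a simple record structure. This is an illustrative sketch only, not part of the Cochrane tool or any Cochrane software: the names `DomainJudgement`, `DOMAINS_2011`, `JUDGEMENTS` and `missing_domains` are hypothetical. It reflects the points in the list: seven domains, three possible judgements, a required reason for each judgement, and deliberately no numeric score.

```python
from dataclasses import dataclass

# The seven domains of the 2011 tool, as listed above.
DOMAINS_2011 = (
    "random sequence generation",
    "allocation sequence concealment",
    "blinding of participants and personnel",
    "blinding of outcome assessment",
    "incomplete outcome data",
    "selective outcome reporting",
    "other sources of bias",
)
JUDGEMENTS = ("low", "high", "unclear")


@dataclass(frozen=True)
class DomainJudgement:
    """One judgement for one domain, with the reason that supports it."""
    domain: str
    judgement: str
    support: str

    def __post_init__(self):
        if self.domain not in DOMAINS_2011:
            raise ValueError(f"unknown domain: {self.domain!r}")
        if self.judgement not in JUDGEMENTS:
            raise ValueError(f"judgement must be one of {JUDGEMENTS}")
        if not self.support.strip():
            raise ValueError("a reason supporting the judgement is required")


def missing_domains(judgements):
    """Return the domains (in tool order) that have not yet been assessed."""
    assessed = {j.domain for j in judgements}
    return [d for d in DOMAINS_2011 if d not in assessed]


example = DomainJudgement(
    domain="allocation sequence concealment",
    judgement="unclear",
    support="Report states only that participants were 'randomly allocated'.",
)
print(missing_domains([example]))
```

Keeping the judgements as separate per-domain records, rather than summing them, mirrors the tool's avoidance of a summary score.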

This tool has been implemented widely both in Cochrane Reviews and non-Cochrane reviews (Jørgensen et al 2016). However, user testing has raised some concerns related to the modest inter-rater reliability of some domains (Hartling et al 2013), the need to rethink the theoretical background of the ‘selective outcome reporting’ domain (Page and Higgins 2016), the misuse of the ‘other sources of bias’ domain (Jørgensen et al 2016), and the lack of appropriate consideration of the risk-of-bias assessment in the analyses and interpretation of results (Hopewell et al 2013).

To address these concerns, a new version of the Cochrane risk-of-bias tool, RoB 2, has been developed, and this should be used for all randomized trials in Cochrane Reviews (MECIR Box 7.1.a). The tool, described in Chapter 8, includes important innovations in the assessment of risk of bias in randomized trials. The structure of the tool is similar to that of the ROBINS-I tool for non-randomized studies of interventions (described in Chapter 25). Both tools include a fixed set of bias domains, which are intended to cover all issues that might lead to a risk of bias. To help reach risk-of-bias judgements, a series of ‘signalling questions’ are included within each domain. Also, the assessment is typically specific to a particular result. This is because the risk of bias may differ depending on how an outcome is measured and how the data for the outcome are analysed. For example, if two analyses for a single outcome are presented, one adjusted for baseline prognostic factors and the other not, then the risk of bias in the two results may be different.

MECIR Box 7.1.a Relevant expectations for conduct of intervention reviews

Assessing risk of bias

Risk of bias in individual study results for the included studies should be explicitly considered to determine the extent to which findings of the studies can be believed. Risks of bias might vary by result. It may not be feasible to assess the risk of bias in every single result available across the included studies, particularly if a large number of studies and results are available. Review authors should therefore assess risk of bias in the results of outcomes included in their ‘summary of findings’ tables, which present the findings of seven or fewer outcomes that are most important to patients. The RoB 2 tool (described in Chapter 8) is the preferred tool for all randomized trials in new reviews. The Cochrane Evidence Production and Methods Directorate is, however, aware that there remain challenges in learning and implementation of the tool, and use of the original Cochrane risk of bias tool is acceptable for the time being.

7.2 Empirical evidence of bias

Where possible, assessments of risk of bias in a systematic review should be informed by evidence. The following sections summarize some of the key evidence about bias that informs our guidance on risk-of-bias assessments in Cochrane Reviews.

7.2.1 Empirical evidence of bias in randomized trials: meta-epidemiologic studies

Many empirical studies have shown that methodological features of the design, conduct and reporting of studies are associated with biased intervention effect estimates. This evidence is mainly based on meta-epidemiologic studies using a large collection of meta-analyses to investigate the association between a reported methodological characteristic and intervention effect estimates in randomized trials. The first meta-epidemiologic study was published in 1995. It showed exaggerated intervention effect estimates when intervention allocation methods were inadequate or unclear and when trials were not described as double-blinded (Schulz et al 1995). These results were subsequently confirmed in several meta-epidemiologic studies, showing that lack of reporting of adequate random sequence generation, allocation sequence concealment, double blinding and more specifically blinding of outcome assessors tend to yield higher intervention effect estimates on average (Dechartres et al 2016a, Page et al 2016).

Evidence from meta-epidemiologic studies suggests that the influence of methodological characteristics such as lack of blinding and inadequate allocation sequence concealment varies by the type of outcome. For example, the extent of over-estimation is larger when the outcome is subjectively measured (e.g. pain) and therefore likely to be influenced by knowledge of the intervention received, and lower when the outcome is objectively measured (e.g. death) and therefore unlikely to be influenced by knowledge of the intervention received (Wood et al 2008, Savović et al 2012).

7.2.2 Trial characteristics explored in meta-epidemiologic studies that are not considered sources of bias

Researchers have also explored the influence of other trial characteristics that are not typically considered a threat to a direct causal inference for intervention effect estimates. Recent meta-epidemiologic studies have shown that effect estimates were lower in prospectively registered trials compared with trials not registered or registered retrospectively (Dechartres et al 2016b, Odutayo et al 2017). Others have shown an association between sample size and effect estimates, with larger effects observed in smaller trials (Dechartres et al 2013). Studies have also shown a consistent association between intervention effect and single or multiple centre status, with single-centre trials showing larger effect estimates, even after controlling for sample size (Dechartres et al 2011).

In some of these cases, plausible bias mechanisms can be hypothesized. For example, both the number of centres and sample size may be associated with intervention effect estimates because of non-reporting bias (e.g. single-centre studies and small studies may be more likely to be published when they have larger, statistically significant effects than when they have smaller, non-significant effects); or single-centre and small studies may be subject to less stringent controls and checks. However, alternative explanations are possible, such as differences in factors relating to external validity (e.g. participants in small, single-centre trials may be more homogenous than participants in other trials). Because of this, these factors are not directly captured by the risk-of-bias tools recommended by Cochrane. Review authors should record these characteristics systematically for each study included in the systematic review (e.g. in the ‘Characteristics of included studies’ table) where appropriate. For example, trial registration status should be recorded for all randomized trials identified.
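Recording registration status consistently is easier with an explicit rule. The sketch below is one possible way to classify it and is not a Cochrane-specified procedure: the function name `registration_status` and the category labels are hypothetical, and the rule assumes that ‘prospective’ means registered on or before the date the first participant was enrolled.

```python
from datetime import date
from typing import Optional


def registration_status(registration_date: Optional[date],
                        enrolment_start: Optional[date]) -> str:
    """Classify a trial's registration status for the characteristics table.

    Assumption: a trial is 'prospectively registered' if it was registered on
    or before the date the first participant was enrolled.
    """
    if registration_date is None:
        return "not registered"
    if enrolment_start is None:
        # Registered, but the report does not allow timing to be judged.
        return "registered, timing unclear"
    if registration_date <= enrolment_start:
        return "prospectively registered"
    return "retrospectively registered"


# Hypothetical trial: registered two months after enrolment began.
print(registration_status(date(2015, 3, 1), date(2015, 1, 5)))
```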

7.2.3 Empirical evidence of non-reporting biases

A list of the key types of non-reporting biases is provided in Table 7.2.a. In the sections that follow, we provide some of the evidence that underlies this list.

Table 7.2.a Definitions of some types of non-reporting biases

Publication bias: The publication or non-publication of research findings, depending on the nature and direction of the results.

Time-lag bias: The rapid or delayed publication of research findings, depending on the nature and direction of the results.

Language bias: The publication of research findings in a particular language, depending on the nature and direction of the results.

Citation bias: The citation or non-citation of research findings, depending on the nature and direction of the results.

Multiple (duplicate) publication bias: The multiple or singular publication of research findings, depending on the nature and direction of the results.

Location bias: The publication of research findings in journals with different ease of access or levels of indexing in standard databases, depending on the nature and direction of results.

Selective (non-) reporting bias: The selective reporting of some outcomes or analyses, but not others, depending on the nature and direction of the results.

7.2.3.1 Selective publication of study reports

There is convincing evidence that the publication of a study report is influenced by the nature and direction of its results (Chan et al 2014). Direct empirical evidence of such selective publication (or ‘publication bias’) is obtained from analysing a cohort of studies in which there is a full accounting of what is published and unpublished (Franco et al 2014). Schmucker and colleagues analysed the proportion of published studies in 39 cohorts (including 5112 studies identified from research ethics committees and 12,660 studies identified from trials registers) (Schmucker et al 2014). Only half of the studies were published, and studies with statistically significant results were more likely to be published than those with non-significant results (odds ratio (OR) 2.8; 95% confidence interval (CI) 2.2 to 3.5) (Schmucker et al 2014). Similar findings were observed by Scherer and colleagues, who conducted a systematic review of 425 studies that explored subsequent full publication of research initially presented at biomedical conferences (Scherer et al 2018). Only 37% of the 307,028 abstracts presented at conferences were published later in full (60% for randomized trials), and abstracts with statistically significant results in favour of the experimental intervention (versus results in favour of the comparator intervention) were more likely to be published in full (OR 1.17; 95% CI 1.07 to 1.28) (Scherer et al 2018). By examining a cohort of 164 trials submitted to the FDA for regulatory approval, Rising and colleagues found that trials with favourable results were more likely than those with unfavourable results to be published (OR 4.7; 95% CI 1.33 to 17.1) (Rising et al 2008).
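For readers less familiar with the statistics quoted above, the following sketch shows how an odds ratio and its 95% confidence interval can be computed from a 2×2 table of publication status by statistical significance. It uses the standard log (Woolf) method, which is a common approach but not necessarily the one used in the cited studies. The function name `odds_ratio_with_ci` and the counts are hypothetical and are not data from any cited study.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and confidence interval from a 2x2 table (Woolf method).

    a = significant results, published      b = significant, unpublished
    c = non-significant results, published  d = non-significant, unpublished
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this method")
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical cohort of 200 studies (illustrative counts only).
or_, lower, upper = odds_ratio_with_ci(a=70, b=30, c=45, d=55)
print(f"OR {or_:.2f}; 95% CI {lower:.2f} to {upper:.2f}")
```

An odds ratio above 1 with a confidence interval excluding 1 would indicate, under this coding, that studies with statistically significant results had higher odds of publication.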

In addition to being more likely than unpublished randomized trials to have statistically significant results, published trials also tend to report larger effect estimates in favour of the experimental intervention than trials disseminated elsewhere (e.g. in conference abstracts, theses, books or government reports) (ratio of odds ratios 0.90; 95% CI 0.82 to 0.98) (Dechartres et al 2018). This bias has been observed in studies in many scientific disciplines, including the medical, biological, physical and social sciences (Polanin et al 2016, Fanelli et al 2017).
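A ratio of odds ratios compares the intervention effect estimated in one group of trials with that estimated in another. The sketch below is a simplified two-group calculation and not the method of the cited meta-epidemiologic studies, which typically estimate such ratios within meta-analyses and then combine them. The function name `ratio_of_odds_ratios` and the input values are hypothetical. It assumes the two summary odds ratios are independent, and that outcomes are coded so that an odds ratio below 1 favours the experimental intervention.

```python
import math


def ratio_of_odds_ratios(or_a, se_log_or_a, or_b, se_log_or_b, z=1.96):
    """Ratio of two independent odds ratios (group A versus group B) with CI.

    The calculation is done on the log scale, where the variances of two
    independent estimates add.
    """
    log_ror = math.log(or_a) - math.log(or_b)
    se_log_ror = math.sqrt(se_log_or_a ** 2 + se_log_or_b ** 2)
    return (math.exp(log_ror),
            math.exp(log_ror - z * se_log_ror),
            math.exp(log_ror + z * se_log_ror))


# Hypothetical summary odds ratios: group A = 0.72, group B = 0.80.
ror, ror_lower, ror_upper = ratio_of_odds_ratios(0.72, 0.03, 0.80, 0.03)
print(f"ROR {ror:.2f}; 95% CI {ror_lower:.2f} to {ror_upper:.2f}")
```

Under the stated coding, a ratio below 1 means that group A shows the larger apparent benefit of the experimental intervention.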

7.2.3.2 Other types of selective dissemination of study reports

The length of time between completion of a study and publication of its results can be influenced by the nature and direction of the study results (‘time-lag bias’). Several studies suggest that randomized trials with results that favour the experimental intervention are published in journals about one year earlier on average than trials with unfavourable results (Hopewell et al 2007, Urrutia et al 2016).

Investigators working in a non-English speaking country may publish some of their work in local, non-English language journals, which may not be indexed in the major biomedical databases (‘language bias’). It has long been assumed that investigators are more likely to publish positive studies in English-language journals than in local, non-English language journals (Morrison et al 2012). Contrary to this belief, Dechartres and colleagues identified larger intervention effects in randomized trials published in a language other than English than in English (ratio of odds ratios 0.86; 95% CI 0.78 to 0.95), which the authors hypothesized may be related to the higher risk of bias observed in the non-English language trials (Dechartres et al 2018). Several studies have found that in most cases there were no major differences between summary estimates of meta-analyses restricted to English-language studies compared with meta-analyses including studies in languages other than English (Morrison et al 2012, Dechartres et al 2018).

The number of times a study report is cited appears to be influenced by the nature and direction of its results (‘citation bias’). In a meta-analysis of 21 methodological studies, Duyx and colleagues observed that articles with statistically significant results were cited 1.57 times the rate of articles with non-significant results (rate ratio 1.57; 95% CI 1.34 to 1.83) (Duyx et al 2017). They also found that articles with results in a positive direction (regardless of their statistical significance) were cited at 2.14 times the rate of articles with results in a negative direction (rate ratio 2.14; 95% CI 1.29 to 3.56) (Duyx et al 2017). In an analysis of 33,355 studies across all areas of science, Fanelli and colleagues found that the number of citations received by a study was positively correlated with the magnitude of effects reported (Fanelli et al 2017). If positive studies are more likely to be cited, they may be more likely to be located, and thus more likely to be included in a systematic review.

Investigators may report the results of their study across multiple publications; for example, Blümle and colleagues found that of 807 studies approved by a research ethics committee in Germany from 2000 to 2002, 135 (17%) had more than one corresponding publication (Blümle et al 2014). Evidence suggests that studies with statistically significant results or larger treatment effects are more likely to lead to multiple publications (‘multiple (duplicate) publication bias’) (Easterbrook et al 1991, Tramèr et al 1997), which makes it more likely that they will be located and included in a meta-analysis.

Research suggests that the accessibility or level of indexing of journals is associated with effect estimates in trials (‘location bias’). For example, a study of 61 meta-analyses found that trials published in journals indexed in Embase but not MEDLINE yielded smaller effect estimates than trials indexed in MEDLINE (ratio of odds ratios 0.71; 95% CI 0.56 to 0.90); however, the risk of bias due to not searching Embase may be minor, given the lower prevalence of Embase-unique trials (Sampson et al 2003). Also, Moher and colleagues estimate that 18,000 biomedical research studies are tucked away in ‘predatory’ journals, which actively solicit manuscripts and charge publication fees without providing robust editorial services (such as peer review and archiving or indexing of articles) (Moher et al 2017). The direction of bias associated with non-inclusion of studies published in predatory journals depends on whether they are publishing valid studies with null results or studies whose results are biased towards finding an effect.

7.2.3.3 Selective dissemination of study results

The need to compress a substantial amount of information into a few journal pages, along with a desire for the most noteworthy findings to be published, can lead to omission from publication of results for some outcomes because of the nature and direction of the findings. Particular results may not be reported at all (‘selective non-reporting of results’) or be reported incompletely (‘selective under-reporting of results’, e.g. stating only that “P>0.05” rather than providing summary statistics or an effect estimate and measure of precision) (Kirkham et al 2010). In such instances, the data necessary to include the results in a meta-analysis are unavailable. Excluding such studies from the synthesis ignores the information that no significant difference was found, and biases the synthesis towards finding a difference (Schmid 2016).
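The mechanism by which excluding under-reported results distorts a synthesis can be illustrated with a simulation. This is a minimal sketch under stated assumptions, not a reproduction of any cited analysis: the true effect of 0.2, the range of standard errors, the number of studies, and the rule that only results with |z| > 1.96 are reported in usable form are all hypothetical. Pooling uses a simple inverse-variance fixed-effect average.

```python
import random


def pooled_fixed_effect(results):
    """Inverse-variance weighted average of (estimate, standard_error) pairs."""
    weights = [1.0 / se ** 2 for _, se in results]
    weighted_sum = sum(w * est for w, (est, _) in zip(weights, results))
    return weighted_sum / sum(weights)


def simulate_studies(true_effect, n_studies, seed):
    """Simulate (estimate, standard_error) pairs around a common true effect."""
    rng = random.Random(seed)
    studies = []
    for _ in range(n_studies):
        se = rng.uniform(0.1, 0.4)  # hypothetical range of study precision
        studies.append((rng.gauss(true_effect, se), se))
    return studies


TRUE_EFFECT = 0.2  # hypothetical true intervention effect
all_results = simulate_studies(TRUE_EFFECT, n_studies=400, seed=7)

# Assumption: only statistically significant results are reported with
# enough detail (estimate and precision) to enter the meta-analysis.
usable_results = [(est, se) for est, se in all_results
                  if abs(est / se) > 1.96]

pooled_all = pooled_fixed_effect(all_results)
pooled_usable = pooled_fixed_effect(usable_results)
print("pooled, all results:        ", round(pooled_all, 3))
print("pooled, usable results only:", round(pooled_usable, 3))
```

Under these assumptions, the synthesis restricted to usable results overstates the effect, because the results that drop out are systematically those closest to no difference.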

Evidence of selective non-reporting and under-reporting of results in randomized trials has been obtained by comparing what was pre-specified in a trial protocol with what is available in the final trial report. In two landmark studies, Chan and colleagues found that results were not reported for at least one benefit outcome in 71% of randomized trials in one cohort (Chan et al 2004a) and 88% in another (Chan et al 2004b). Results were under-reported (e.g. stating only that “P>0.05”) for at least one benefit outcome in 92% of randomized trials in one cohort and 96% in another. Statistically significant results for benefit outcomes were twice as likely as non-significant results to be completely reported (range of odds ratios 2.4 to 2.7) (Chan et al 2004a, Chan et al 2004b). Reviews of studies investigating selective non-reporting and under-reporting of results suggest that it is more common for outcomes defined by trialists as secondary rather than primary (Jones et al 2015, Li et al 2018).

Selective non-reporting and under-reporting of results occurs for both benefit and harm outcomes. Examining the studies included in a sample of 283 Cochrane Reviews, Kirkham and colleagues suspected that 50% of 712 studies with results missing for the primary benefit outcome of the review were missing because of the nature of the results (Kirkham et al 2010). This estimate was slightly higher (63%) in 393 studies with results missing for the primary harm outcome of 322 systematic reviews (Saini et al 2014).

7.3 General procedures for risk-of-bias assessment

7.3.1 Collecting information for assessment of risk of bias

Information for assessing the risk of bias can be found in several sources, including published articles, trials registers, protocols, clinical study reports (i.e. documents prepared by pharmaceutical companies, which provide extensive detail on trial methods and results), and regulatory reviews (see also Chapter 5, Section 5.2).

Published articles are the most frequently used source of information for assessing risk of bias. This source is theoretically very valuable because it has been reviewed by editors and peer reviewers, who ideally will have prompted authors to report their methods transparently. However, the completeness of reporting of published articles is, in general, quite poor, and essential information for assessing risk of bias is frequently missing. For example, across 20,920 randomized trials included in 2001 Cochrane Reviews, the percentage of trials at unclear risk of bias was 49% for random sequence generation, 57% for allocation sequence concealment, 31% for blinding and 25% for incomplete outcome data (Dechartres et al 2017). Nevertheless, more recent trials were less likely to be judged at unclear risk of bias, suggesting that reporting is improving over time (Dechartres et al 2017).

Trials registers can be a useful source of information to obtain results of studies that have not yet been published (Riveros et al 2013). However, registers typically report only limited information about methods used in the trial to inform an assessment of risk of bias (Wieseler et al 2012). Protocols, which outline the objectives, design, methodology, statistical considerations and procedural aspects of a clinical study, may provide more detailed information on the methods used than that provided in the results report of a study. They are increasingly being published or made available by journals that publish the final report of a study. Protocols are also available in some trials registers, particularly ClinicalTrials.gov (Zarin et al 2016), on websites dedicated to data sharing such as ClinicalStudyDataRequest.com, or from drug regulatory authorities such as the European Medicines Agency. Clinical study reports are another highly useful source of information (Wieseler et al 2012, Jefferson et al 2014).

It may be necessary to contact study investigators to request access to the trial protocol, to clarify incompletely reported information or understand discrepant information available in different sources. To reduce the risk that study authors provide overly positive answers to questions about study design and conduct, we suggest review authors use open-ended questions. For example, to obtain information about the randomization process, review authors might consider asking: ‘What process did you use to assign each participant to an intervention?’ To obtain information about blinding of participants, it might be useful to request something like, ‘Please describe any measures used to ensure that trial participants were unaware of the intervention to which they were assigned’. More focused questions can then be asked to clarify remaining uncertainties.

7.3.2 Performing assessments of risk of bias

Risk-of-bias assessments in Cochrane Reviews should be performed independently by at least two people (MECIR Box 7.3.a). Doing so can minimize errors in assessments and ensure that the judgement is not influenced by a single person’s preconceptions. Review authors should also define in advance the process for resolving disagreements. For example, both assessors may attempt to resolve disagreements via discussion, and if that fails, call on another author to adjudicate the final judgement. Review authors assessing risk of bias should have either content or methodological expertise (or both), and an adequate understanding of the relevant methodological issues addressed by the risk-of-bias tool. There is some evidence that intensive, standardized training may significantly improve the reliability of risk-of-bias assessments (da Costa et al 2017). To improve reliability of assessments, a review team could consider piloting the risk-of-bias tool on a sample of articles. This may help ensure that criteria are applied consistently and that consensus can be reached. Three to six papers should provide a suitable sample for this. We do not recommend the use of statistical measures of agreement (such as kappa statistics) to describe the extent to which assessments by multiple authors were the same. It is more important that reasons for any disagreement are explored and resolved.

MECIR Box 7.3.a Relevant expectations for conduct of intervention reviews

Assessing risk of bias in duplicate

Duplicating the risk-of-bias assessment reduces both the risk of making mistakes and the possibility that assessments are influenced by a single person’s biases.

The process for reaching risk-of-bias judgements should be transparent. In other words, readers should be able to discern why a particular result was rated at low risk of bias and why another was rated at high risk of bias. This can be achieved by review authors providing information in risk-of-bias tables to justify the judgement made. Such information may include direct quotes from study reports that articulate which methods were used, and an explanation for why such a method is flawed. Cochrane Review authors are expected to record the source of information (including the precise location within a document) that informed each risk-of-bias judgement ( MECIR Box 7.3.b ).

MECIR Box 7.3.b Relevant expectations for conduct of intervention reviews

Supporting judgements of risk of bias

Providing support for the judgement makes the process transparent.

Providing sources of information for risk-of-bias assessments

Readers, editors and referees should have the opportunity to see for themselves from where supports for judgements have been obtained.

Many results are often available in trial reports, so review authors should think carefully about which results to assess for risk of bias. Review authors should assess risk of bias in results for outcomes that are included in the ‘Summary of findings’ table ( MECIR Box 7.1.a ). Such tables typically include seven or fewer patient-important outcomes (for more details on constructing a ‘Summary of findings’ table, see Chapter 14 ).

Novel methods for assessing risk of bias are emerging, including machine learning systems designed to semi-automate risk-of-bias assessment (Marshall et al 2016, Millard et al 2016). These methods involve using a sample of previous risk-of-bias assessments to train machine learning models to predict risk of bias from PDFs of study reports, and to extract supporting text for the judgements. Some of these approaches performed well at identifying sentences pertinent to risk of bias in the full text of research articles describing clinical trials. One study showed that about one-third of articles could be assessed by just one reviewer, instead of the two required reviewers, if such a tool is used (Millard et al 2016). However, reliability in reaching judgements about risk of bias compared with human reviewers was slight to moderate depending on the domain assessed (Gates et al 2018).

7.4 Presentation of assessment of risk of bias

Risk-of-bias assessments may be presented in a Cochrane Review in various ways. A full risk-of-bias table includes responses to each signalling question within each domain (see Chapter 8, Section 8.2 ) and risk-of-bias judgements, along with text to support each judgement. Such full tables are lengthy and are unlikely to be of great interest to readers, so should generally not be included in the main body of the review. It is nevertheless good practice to make these full tables available for reference.

We recommend the use of forest plots that present risk-of-bias judgements alongside the results of each study included in a meta-analysis (see Figure 7.4.a ). This will give a visual impression of the relative contributions of the studies at different levels of risk of bias, especially when considered in combination with the weight given to each study. This may assist authors in reaching overall conclusions about the risk of bias of the synthesized result, as discussed in Section 7.6 . Optionally, forest plots or other tables or graphs can be ordered (stratified) by judgements on each risk-of-bias domain or by the overall risk-of-bias judgement for each result.

Review authors may wish to generate bar graphs illustrating the relative contributions of studies with each risk-of-bias judgement (low risk of bias, some concerns, and high risk of bias). When dividing up a bar into three regions for this purpose, it is preferable to determine the regions according to statistical information (e.g. precision, or weight in a meta-analysis) arising from studies in each category, rather than according to the number of studies in each category.
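As a rough illustration of dividing a bar by weight rather than by study count, the following Python sketch computes each category's share of the total weight. The function name and the study data are hypothetical, and the weights stand in for whatever statistical information (e.g. inverse-variance weights) the meta-analysis supplies.

```python
from collections import defaultdict


def weight_share_by_judgement(studies):
    """Return each risk-of-bias category's share of the total weight.

    `studies` is a list of (judgement, weight) pairs. The weights are
    assumed to come from the meta-analysis (e.g. inverse-variance weights).
    """
    totals = defaultdict(float)
    for judgement, weight in studies:
        totals[judgement] += weight
    grand_total = sum(totals.values())
    return {judgement: w / grand_total for judgement, w in totals.items()}


# Hypothetical data: (overall judgement, meta-analysis weight).
example = [
    ("low", 50.0),
    ("some concerns", 10.0),
    ("high", 20.0),
    ("high", 20.0),
]
shares = weight_share_by_judgement(example)
```

In this made-up example, half of the studies are at high risk of bias but they contribute 40% of the weight, whereas the single study at low risk of bias contributes 50%. A bar divided by study count would therefore give a different impression from one divided by weight.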

Figure 7.4.a Forest plot displaying RoB 2 risk-of-bias judgements for each randomized trial included in a meta-analysis of mental health first aid (MHFA) knowledge scores. Adapted from Morgan et al (2018).


7.5 Summary assessments of risk of bias

Review authors should make explicit summary judgements about the risk of bias for important results both within studies and across studies (see MECIR Box 7.5.a ). The tools currently recommended by Cochrane for assessing risk of bias within included studies (RoB 2 and ROBINS-I) produce an overall judgement of risk of bias for the result being assessed. These overall judgements are derived from assessments of individual bias domains as described, for example, in Chapter 8, Section 8.2 .

To summarize risk of bias across study results in a synthesis, review authors should follow guidance for assessing certainty in the body of evidence (e.g. using GRADE), as described in Chapter 14, Section 14.2.2 . When a meta-analysis is dominated by study results at high risk of bias, the certainty of the body of evidence may be rated as being lower than if such studies were excluded from the meta-analysis. Section 7.6 discusses some possible courses of action that may be preferable to retaining such studies in the synthesis.

MECIR Box 7.5.a Relevant expectations for conduct of intervention reviews

Summarizing risk-of-bias assessments


7.6 Incorporating assessment of risk of bias into analyses

7.6.1 Introduction

When performing and presenting meta-analyses, review authors should address risk of bias in the results of included studies ( MECIR Box 7.6.a ). It is not appropriate to present analyses and interpretations while ignoring flaws identified during the assessment of risk of bias. In this section we present suitable strategies for addressing risk of bias in results from studies included in a meta-analysis, either in order to understand the impact of bias or to determine a suitable estimate of intervention effect (Section 7.6.2 ). For the latter, decisions often involve a trade-off between bias and precision. A meta-analysis that includes all eligible studies may produce a result with high precision (narrow confidence interval) but be seriously biased because of flaws in the conduct of some of the studies. However, including only the studies at low risk of bias in all domains assessed may produce a result that is unbiased but imprecise (if there are only a few studies at low risk of bias).

MECIR Box 7.6.a Relevant expectations for conduct of intervention reviews

Addressing risk of bias in the synthesis


Incorporating assessments of risk of bias

If randomized trials have been assessed using one or more tools in addition to the RoB 2 tool

.

7.6.2 Including risk-of-bias assessments in analyses

Broadly speaking, studies at high risk of bias should be given reduced weight in meta-analyses compared with studies at low risk of bias. However, methodological approaches for weighting studies according to their risk of bias are not yet sufficiently well developed to be recommended for use in Cochrane Reviews.

When risks of bias vary across studies in a meta-analysis, four broad strategies are available to incorporate assessments into the analysis. The choice of strategy will influence which result to present as the main finding for a particular outcome (e.g. in the Abstract). The intended strategy should be described in the protocol for the review.

(1) Primary analysis restricted to studies at low risk of bias

The first approach involves restricting the primary analysis to studies judged to be at low risk of bias overall. Review authors who restrict their primary analysis in this way are encouraged to perform sensitivity analyses to show how conclusions might be affected if studies at a high risk of bias were included.

(2) Present multiple (stratified) analyses

Stratifying according to the overall risk of bias will produce multiple estimates of the intervention effect: for example, one based on all studies, one based on studies at low risk of bias, and one based on studies at high risk of bias. Two or more such estimates might be considered with equal prominence (e.g. the first and second of these). However, presenting the results in this way may be confusing for readers. In particular, people who need to make a decision usually require a single estimate of effect. Furthermore, ‘Summary of findings’ tables typically present only a single result for each outcome. On the other hand, a stratified forest plot presents all the information transparently. Though we would generally recommend stratification is done on the basis of overall risk of bias, review authors may choose to conduct subgroup analyses based on specific bias domains (e.g. risk of bias arising from the randomization process).
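To make the idea of multiple stratified estimates concrete, here is a minimal Python sketch that pools hypothetical study results for all studies and separately for each overall risk-of-bias judgement. It assumes a simple inverse-variance fixed-effect model purely for illustration (a review may well use a random-effects model instead), and all function names and data are invented.

```python
import math


def pool_fixed_effect(studies):
    """Inverse-variance fixed-effect pooled estimate and its standard error.

    `studies` is a list of (estimate, variance, judgement) tuples.
    """
    weights = [1.0 / variance for _, variance, _ in studies]
    total = sum(weights)
    pooled = sum(w * est for w, (est, _, _) in zip(weights, studies)) / total
    return pooled, math.sqrt(1.0 / total)


def stratified_estimates(studies):
    """Pooled results for all studies and for each risk-of-bias stratum."""
    results = {"all": pool_fixed_effect(studies)}
    for judgement in sorted({j for _, _, j in studies}):
        subset = [s for s in studies if s[2] == judgement]
        results[judgement] = pool_fixed_effect(subset)
    return results


# Hypothetical log risk ratios, variances and overall judgements.
studies = [
    (-0.10, 0.04, "low"),
    (-0.20, 0.04, "low"),
    (-0.60, 0.02, "high"),
    (-0.50, 0.02, "high"),
]

for label, (estimate, se) in stratified_estimates(studies).items():
    lower, upper = estimate - 1.96 * se, estimate + 1.96 * se
    print(f"{label:>4}: {estimate:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

With these invented numbers, the estimate restricted to studies at low risk of bias is closer to no effect and has a larger standard error than the estimate based on all studies, which is the trade-off between bias and precision that a stratified presentation makes visible.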

Formal comparisons of intervention effects according to risk of bias can be done with a test for differences across subgroups (e.g. comparing studies at high risk of bias with studies at low risk of bias), or by using meta-regression (for more details see Chapter 10, Section 10.11.4 ). However, review authors should be cautious in planning and carrying out such analyses, because an individual review may not have enough studies in each category of risk of bias to identify meaningful differences. Lack of a statistically significant difference between studies at high and low risk of bias should not be interpreted as absence of bias, because these analyses typically have low power.
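For the special case of two subgroups, a comparison of this kind can be sketched as a z-test on the difference between the two pooled estimates. The Python below is an illustrative sketch under stated assumptions: the two subgroups contain different studies (so their estimates are independent), each pooled estimate is approximately normally distributed, and the function name and input values are hypothetical.

```python
import math


def compare_two_subgroups(est_a, se_a, est_b, se_b):
    """Z-test for the difference between two pooled subgroup estimates.

    Returns the difference, its standard error, the z statistic and a
    two-sided p-value from the standard normal distribution.
    """
    diff = est_a - est_b
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    z = diff / se_diff
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return diff, se_diff, z, p_value


# Hypothetical pooled log risk ratios (with standard errors) for studies
# at low risk of bias and at high risk of bias.
diff, se_diff, z, p_value = compare_two_subgroups(-0.15, 0.25, -0.55, 0.20)
```

In this invented example the two pooled estimates differ by 0.40 on the log scale, yet the p-value is well above 0.05 because both subgroup estimates are imprecise. This is the low-power situation in which a non-significant test should not be read as absence of bias.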

The choice between strategies (1) and (2) should be based to a large extent on the balance between the potential for bias and the loss of precision when studies at high or unclear risk of bias are excluded.

(3) Present all studies and provide a narrative discussion of risk of bias

The simplest approach to incorporating risk-of-bias assessments in results is to present an estimated intervention effect based on all available studies, together with a description of the risk of bias in individual domains, or a description of the summary risk of bias, across studies. This is the only feasible option when all studies are at the same risk of bias. However, when studies have different risks of bias, we discourage such an approach for two reasons. First, detailed descriptions of risk of bias in the Results section, together with a cautious interpretation in the Discussion section, will often be lost in the Authors’ conclusions, Abstract and ‘Summary of findings’ table, so that the final interpretation ignores the risk of bias and decisions continue to be based, at least in part, on compromised evidence. Second, such an analysis fails to down-weight studies at high risk of bias and so will lead to an overall intervention effect that is too precise, as well as being potentially biased.

When the primary analysis is based on all studies, summary assessments of risk of bias should be incorporated into explicit measures of the certainty of evidence for each important outcome, for example, by using the GRADE system (Guyatt et al 2008). This incorporation can help to ensure that judgements about the risk of bias, as well as other factors affecting the quality of evidence, such as imprecision, heterogeneity and publication bias, are considered appropriately when interpreting the results of the review (see Chapter 14 and Chapter 15 ).

(4) Adjust effect estimates for bias

A final, more sophisticated, option is to adjust the result from each study in an attempt to remove the bias. Adjustments are usually undertaken within a Bayesian framework, with assumptions about the size of the bias and its uncertainty being expressed through prior distributions (see Chapter 10, Section 10.13 ). Prior distributions may be based on expert opinion or on meta-epidemiological findings (Turner et al 2009, Welton et al 2009). The approach is increasingly used in decision making, where adjustments can additionally be made for applicability of the evidence to the decision at hand. However, we do not encourage use of bias adjustments in the context of a Cochrane Review because the assumptions required are strong, limited methodological expertise is available, and it is not possible to account for issues of applicability due to the diverse intended audiences for Cochrane Reviews. The approach might be entertained as a sensitivity analysis in some situations.
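The general idea of such an adjustment can be illustrated with a deliberately simplified Python sketch. It assumes an additive bias on the effect scale with an independent normal prior, in which case the adjusted estimate is shifted by the prior mean of the bias and its variance is inflated by the prior variance. This is not the full Bayesian modelling described in the cited papers, and the function name and all numbers are hypothetical.

```python
import math


def adjust_for_additive_bias(estimate, se, bias_mean, bias_sd):
    """Shift an estimate by an assumed additive bias and widen its error.

    Assumes estimate ~ Normal(true effect + bias, se^2) and an independent
    prior bias ~ Normal(bias_mean, bias_sd^2). Under these assumptions,
    (estimate - bias_mean) has mean equal to the true effect and
    variance se^2 + bias_sd^2.
    """
    adjusted = estimate - bias_mean
    adjusted_se = math.sqrt(se ** 2 + bias_sd ** 2)
    return adjusted, adjusted_se


# Hypothetical log odds ratio of -0.50 (SE 0.15), with an assumed prior
# that the study exaggerates benefit by 0.10 on average (SD 0.08).
adjusted, adjusted_se = adjust_for_additive_bias(-0.50, 0.15, -0.10, 0.08)
```

The adjusted estimate moves towards no effect and becomes less precise, which shows why the result depends heavily on the assumed prior for the bias.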

7.7 Considering risk of bias due to missing results

The 2011 Cochrane risk-of-bias tool for randomized trials encouraged a study-level judgement about whether there has been selective reporting, in general, of the trial results. As noted in Section 7.2.3.3 , selective reporting can arise in several ways: (1) selective non-reporting of results, where results for some of the analysed outcomes are selectively omitted from a published report; (2) selective under-reporting of data, where results for some outcomes are selectively reported with inadequate detail for the data to be included in a meta-analysis; and (3) bias in selection of the reported result, where a result has been selected for reporting by the study authors, on the basis of the results, from multiple measurements or analyses that have been generated for the outcome domain (Page and Higgins 2016).

The RoB 2 and ROBINS-I tools focus solely on risk of bias as it pertains to a specific trial result. With respect to selective reporting, RoB 2 and ROBINS-I examine whether a specific result from the trial is likely to have been selected from multiple possible results on the basis of the findings (scenario 3 above). Guidance on assessing the risk of bias in selection of the reported result is available in Chapter 8 (for randomized trials) and Chapter 25 (for non-randomized studies of interventions).

If there is no result (i.e. it has been omitted selectively from the report or under-reported), then a risk-of-bias assessment at the level of the study result is not applicable. Selective non-reporting of results and selective under-reporting of data are therefore not covered by the RoB 2 and ROBINS-I tools. Instead, selective non-reporting of results and under-reporting of data should be assessed at the level of the synthesis across studies. Both practices lead to a situation similar to that when an entire study report is unavailable because of the nature of the results (also known as publication bias). Regardless of whether an entire study report or only a particular result of a study is unavailable, the same consequence can arise: bias in a synthesis because available results differ systematically from missing results (Page et al 2018). Chapter 13 provides detailed guidance on assessing risk of bias due to missing results in a systematic review.

7.8 Considering source of funding and conflict of interest of authors of included studies

Readers of a trial report often need to reflect on whether conflicts of interest have influenced the design, conduct, analysis and reporting of a trial. It is therefore now common for scientific journals to require authors of trial reports to provide a declaration of conflicts of interest (sometimes called ‘competing’ or ‘declarations of’ interest), to report funding sources and to describe any funder’s role in the trial.

In this section, we characterize conflicts of interest in randomized trials and discuss how conflicts of interest may impact on trial design and effect estimates. We also suggest how review authors can collect, process and use information on conflicts of interest in the assessment of:

  • directness of studies to the review’s research question;
  • heterogeneity in results due to differences in the designs of eligible studies;
  • risk of bias in results of included studies;
  • risk of bias in a synthesis due to missing results.

At the time of writing, a formal Tool for Addressing Conflicts of Interest in Trials (TACIT) is being developed under the auspices of the Cochrane Bias Methods Group. The TACIT development process has informed the content of this section, and we encourage readers to check http://tacit.one for more detailed guidance that will become available.

7.8.1 Characteristics of conflicts of interest

The Institute of Medicine defined conflicts of interest as “a set of circumstances that creates a risk that professional judgment or actions regarding a primary interest will be unduly influenced by a secondary interest” (Lo et al 2009). In a clinical trial, the primary interest is to provide patients, clinicians and health policy makers with an unbiased and clinically relevant estimate of an intervention effect. Secondary interests may be financial or non-financial.

Financial conflicts of interest involve both financial interests related to a specific trial (for example, a company funding a trial of a drug produced by the same company) and financial interests related to the authors of a trial report (for example, authors’ ownership of stocks or employment by a drug company).

For drug and device companies and other manufacturers, the financial difference between a negative and positive pivotal trial can be considerable. For example, the mean stock price of the companies funding 23 positive pivotal oncology trials increased by 14% after disclosure of the results (Rothenstein et al 2011). Industry funding is common, especially in drug trials. In a study of 200 trial publications from 2015, 68 (38%) of 178 trials with funding declarations were industry funded (Hakoum et al 2017). Also, in a cohort of oncology drug trials, industry funded 44% of trials and authors declared conflicts of interest in 69% of trials (Riechelmann et al 2007).

The degree of funding, and the type of the involvement of industry funders, may differ across trials. In some situations, involvement includes only the provision of free study medication for a trial that has otherwise been planned and conducted independently and funded largely by public means. In other situations, a company fully funds and controls a trial. In rarer cases, head-to-head trials comparing two drugs may be funded by the two different companies producing the drugs.

A Cochrane Methodology Review analysed 75 studies of the association between industry funding and trial results (Lundh et al 2017). The authors concluded that trials funded by a drug or device company were more likely to have positive conclusions and statistically significant results, and that this association could not be explained by differences in risk of bias between industry and non-industry funded trials. However, industry and non-industry trials may differ in ways that may confound the association; for example due to choice of patient population, comparator interventions or outcomes. Only one of the included studies used a meta-epidemiological design and found no clear association between industry funding and the magnitude of intervention effects (Als-Nielsen et al 2003). Similar to the association with industry funding, other studies have reported that results of trials conducted by authors with a financial conflict of interest were more likely to be positive (Ahn et al 2017).

Conflicts of interest may also be non-financial (Viswanathan et al 2014). Characterizations of non-financial conflicts of interest differ somewhat, but typically distinguish between conflicts related mainly to an individual (e.g. adherence to a theory or ideology), relationships to other individuals (e.g. loyalty to friends, family members or close colleagues), or relationship to groups (e.g. work place or professional groups). In medicine, non-financial conflicts of interest have received less attention than financial conflicts of interest. In addition, financial and non-financial conflicts are often intertwined; for example, non-financial conflicts related to institutional association can be considered as indirect financial conflicts linked to employment. Definitions of what should be characterized as a ‘non-financial’ conflict of interest, and, in particular, whether personal beliefs, experiences or intellectual commitments should be considered conflicts of interest, have been debated (Bero and Grundy 2016).

It is useful to differentiate between non-financial conflicts of interest of a trial researcher and the basic interests and hopes involved in doing good trial research. Most researchers conducting a trial will have an interest in the scientific problem addressed, a well-articulated theoretical position, anticipation for a specific trial result, and hopes for publication in a respectable journal. This is not a conflict of interest but a basic condition for doing health research. However, individual researchers may lose sight of the primacy of the methodological neutrality at the heart of a scientific enquiry, and become unduly occupied with the secondary interest of how trial results may affect academic appearance or chances of future funding. Extreme examples are the publication of fabricated trial data or trials, some of which have had an impact on systematic reviews (Marret et al 2009).

Few empirical studies of non-financial conflicts of interest in randomized trials have been published, and to our knowledge there are none that assess the impact of non-financial conflicts of interest on trial results and conclusions. However, non-financial conflicts of interest have been investigated in other types of clinical research; for example, guideline authors’ specialty appears to have influenced their voting behaviour while developing guidelines for mammography screening (Norris et al 2012).

7.8.2 Conflict of interest and trial design

Core decisions on designing a trial involve defining the type of participants to be included, the type of experimental intervention, the type of comparator, the outcomes (and timing of outcome assessments) and the choice of analysis. Such decisions will often reflect a compromise between what is clinically and scientifically ideal and what is practically possible. However, when investigators have important conflicts of interest, a trial may be designed in a way that increases its chances of detecting a positive trial result, at the expense of clinical applicability. For example, narrow eligibility criteria may exclude older and frail patients, thus reducing the possibility of detecting clinically relevant harms. Alternatively, trial designers may choose placebo as a comparator despite an effective intervention being in regular use, or they may focus on short-term surrogate outcomes rather than clinically relevant long-term outcomes (Estellat and Ravaud 2012, Wieland et al 2017).

Trial design choices may be more subtle. For example, a trial may be designed to favour an experimental drug by using an inferior comparator drug when better alternatives exist (Safer 2002) or by using a low dose of the comparator drug when the focus is efficacy and a high dose of the comparator drug when the focus is harms (Mann and Djulbegovic 2013). In a typical Cochrane Review with fairly broad eligibility criteria aiming to identify and summarize all relevant trials, it is pertinent to consider the degree to which a given trial result directly relates to the question posed by the review. If all or most identified trials have narrow eligibility criteria and short-term outcomes, a review question focusing on broad patient categories and long-term effects can only be answered indirectly by the included studies. This has implications for the assessment of the certainty of the evidence provided by the review, which is addressed through the concept of indirectness in the GRADE framework (see Chapter 14, Section 14.2 ).

If results in a meta-analysis display heterogeneity, then differences in design choices that are driven by conflicts of interest may be one reason for this. Thus, conflicts of interest may also affect reflections on the certainty of the evidence through the GRADE concept of inconsistency.

7.8.3 Conflicts of interest and risk of bias in a trial’s effect estimate

Authors of Cochrane Reviews have sometimes included conflicts of interest as an ‘other source of bias’ while using the previous versions of the risk-of-bias tool (Jørgensen et al 2016). Consistent with previous versions of the Handbook , we discourage the inclusion of conflicts of interest directly in the risk-of-bias assessment. Adding conflicts of interest to the bias tool is inconsistent with the conceptual structure of the tool, which is built on mechanistically defined bias domains. Also, restricting consideration of the potential impact of conflicts of interest to a question of risk of bias in an individual trial result overlooks other important aspects, such as the design of the trial (see Section 7.8.2 ) and potential bias in a meta-analysis due to missing results (see Section 7.8.4 ).

Conflicts of interest may lead to bias in effect estimates from a trial through several mechanisms. For example, if those recruiting participants into a trial have important conflicts of interest and the allocation sequence is not concealed, then they may be more likely to subvert the allocation process to produce intervention groups that are systematically unbalanced in favour of their preferred intervention. Similarly, investigators with important conflicts of interest may decide to exclude from the analysis some patients who did not respond as anticipated to the experimental intervention, resulting in bias due to missing outcome data. Furthermore, selective reporting of a favourable result may be strongly associated with conflicts of interest (McGauran et al 2010), due to either selective reporting of particular outcome measurements or selective reporting of particular analyses (Eyding et al 2010, Vedula et al 2013). One study found that use of modified-intention-to-treat analysis and post-randomization exclusions occurred more often in trials with industry funding or author conflicts of interest (Montedori et al 2011). Accessing the trial protocol and statistical analysis plan to determine which outcomes and analyses were pre-specified is therefore especially important for a trial with relevant conflicts of interest.

Review authors should explain how consideration of conflicts of interest informed their risk-of-bias judgements. For example, when information on the analysis plans is lacking, review authors may judge the risk of bias in selection of the reported result to be high if the study investigators had important financial conflicts of interest. Conversely, if trial investigators have clearly used methods that are likely to minimize bias, review authors should not judge the risk of bias for each domain higher just because the investigators happen to have conflicts of interest. In addition, as an optional component in the revised risk-of-bias tool, review authors may reflect on the direction of bias (e.g. bias in favour of the experimental intervention). Information on conflicts of interest may inform the assessment of direction of bias.

7.8.4 Conflicts of interest and risk of bias in a synthesis of trial results

Conflicts of interest may also affect the decision not to report trial results. Conflicts of interest are probably one of several important reasons for decisions not to publish trials with negative findings, and not to publish unfavourable results (Sterne 2013). When relevant trial results are systematically missing from a meta-analysis because of the nature of the findings, the synthesis is at risk of bias due to missing results. Chapter 13 provides detailed guidance on assessing risk of bias due to missing results in a systematic review.

7.8.5 Practical approach to identifying and extracting information on conflicts of interest

When assessing conflicts of interest in a trial, review authors will, to a large degree, rely on declared conflicts. Source of funding may be reported in a trial publication, and conflicts of interest may be reported in an accompanying declaration, for example the International Committee of Medical Journal Editors ( ICMJE ) declaration. In a random sample of 1002 articles published in 2016, authors of 229 (23%) declared having a conflict of interest (Grundy et al 2018). Unfortunately, undeclared conflicts of interest and sources of funding are fairly common (Rasmussen et al 2015, Patel et al 2018).

It is always prudent to examine closely the conflicts of interest of lead and corresponding authors, based on information reported in the trial publication and the author declaration (for example, the ICMJE declaration form). Review authors should also consider examining conflicts of interest of trial co-authors and any commercial collaborators with conflicts of interest; for example, a commercial contract research organization hired by the funder to collect and analyse trial data or the involvement of a medical writing agency. Due to the high prevalence of undisclosed conflicts of interest, review authors should consider expanding their search for conflicts of interest data to other sources (e.g. disclosures in other publications by the authors, the trial protocol, the clinical study report, and public conflicts of interest registries such as the Open Payments database).

We suggest that review authors balance the workload involved with the expected gain, and search additional sources of information on conflicts of interest when there is reason to suspect important conflicts of interest. As a rule of thumb, in trials with unclear funding source and no declaration of conflicts of interest from lead or corresponding authors, we suggest review authors search the Open Payments database, ClinicalTrials.gov, and conflicts of interest declarations in a few previous publications by the study authors. In trials with no commercial funding (including no company employee co-authors) and no declared conflicts of interest for lead or corresponding authors, we suggest review authors need not consult additional sources. Also, for trials where lead or corresponding authors have clear conflicts of interest, little additional information may be gained from checking conflicts of interest of co-authors.
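The rule of thumb above can be written down as a small decision function, for example to apply it consistently across many trials. The Python sketch below is only one possible encoding: the function name, the category labels and the treatment of cases the text does not cover are assumptions made for illustration.

```python
def additional_sources_to_search(funding, lead_author_declaration):
    """Suggest extra sources to check for conflicts of interest.

    funding: "commercial", "non-commercial" or "unclear"
        ("non-commercial" is assumed to also mean that there are no
        company employee co-authors).
    lead_author_declaration: "conflicts", "none" or "missing", summarising
        what lead or corresponding authors declared.

    Returns a (sources, note) pair.
    """
    if funding == "unclear" and lead_author_declaration == "missing":
        sources = [
            "Open Payments database",
            "ClinicalTrials.gov",
            "declarations in a few previous publications by the study authors",
        ]
        return sources, "unclear funding and no declaration"
    if funding == "non-commercial" and lead_author_declaration == "none":
        return [], "no commercial funding and no declared conflicts"
    if lead_author_declaration == "conflicts":
        return [], "clear conflicts already declared; co-author checks add little"
    # Cases not covered by the rule of thumb are left to reviewer judgement.
    return [], "judgement call: search if important conflicts are suspected"


sources, note = additional_sources_to_search("unclear", "missing")
```

The final branch deliberately returns no automatic recommendation, because the text leaves such cases to a balance between workload and expected gain.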

Gaining access to relevant information on financial conflicts of interest is possible for a considerable number of trials, despite inherent problems of undeclared conflicts. We expect that the proportion of trials with relevant declarations will increase further.

Access to relevant information on non-financial conflicts of interest is more difficult to gain. Declaration of non-financial conflicts of interest is requested by approximately 50% of journals (Shawwa et al 2016). The term was deleted from ICMJE’s declaration in 2010 in exchange for a broad category of “Other relationships or activities” (Drazen et al 2010). Therefore, non-financial conflicts of interest are seldom self-declared, although if available, such information should be considered.

Non-financial conflicts of interest are difficult to address due to lack of relevant empirical studies on their impact on study results, lack of relevant thresholds for importance, and lack of declaration in many previous trials. However, as a rule of thumb, we suggest that review authors assume trial authors have no non-financial conflicts of interest unless there are clear indications to the contrary. Examples of such clues could be considerable spin in trial publications (Boutron et al 2010), an institutional relationship pertinent to the intervention tested, or external evidence of a fixated ideological or theoretical position.

7.8.6 Judgement of notable concern about conflict of interest

Review authors should describe funding information and conflicts of interest of authors for all studies in the ‘Characteristics of included studies’ table (MECIR Box 7.8.a). Also, review authors may want to explore (e.g. in a subgroup analysis) whether trials with conflicts of interest have different intervention effect estimates, or more variable effect estimates, than trials without conflicts of interest. In both cases, review authors need to aim for a relevant threshold for when any conflict of interest is deemed important. If set too low, there is a risk that trivial conflicts of interest will cloud important ones; if set too high, there is the risk that important conflicts of interest are downplayed or ignored.

This judgement should take into account both the degree of conflicts of interest of study authors and also the extent of their involvement in the study. We pragmatically suggest review authors aim for a judgement about whether or not there is reason for ‘notable concern’ about conflicts of interest. This information could be displayed in a table with three columns:

  • trial identifier;
  • judgement (e.g. ‘notable concern about conflict of interest’ versus ‘no notable concern about conflict of interest’); and
  • rationale for judgement, potentially subdivided according to who had conflicts of interest (e.g. lead or corresponding authors, other authors) and stage(s) of the trial to which they contributed (design, conduct, analysis, reporting).
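The three-column table described above could be held in a simple data structure. The following is a minimal sketch: the class `ConflictJudgement`, the function `format_table`, and the two example trials are hypothetical and purely illustrative, not taken from any real review.

```python
from dataclasses import dataclass

NOTABLE = "notable concern about conflict of interest"
NO_NOTABLE = "no notable concern about conflict of interest"


@dataclass
class ConflictJudgement:
    trial_id: str    # trial identifier
    judgement: str   # NOTABLE or NO_NOTABLE
    rationale: str   # who had conflicts and at which stage(s) of the trial


def format_table(rows):
    """Render the judgements as a plain-text table with three columns."""
    lines = ["Trial | Judgement | Rationale"]
    for r in rows:
        lines.append(" | ".join((r.trial_id, r.judgement, r.rationale)))
    return "\n".join(lines)


# Hypothetical example rows (invented for illustration).
rows = [
    ConflictJudgement(
        "Trial A", NOTABLE,
        "lead author is a company employee; contributed to design, analysis and reporting",
    ),
    ConflictJudgement(
        "Trial B", NO_NOTABLE,
        "academic trial; free trial medication acquired at arm's length; no conflicts for lead authors",
    ),
]
print(format_table(rows))
```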

A judgement of ‘notable concern about conflict of interest’ should be based on reflected assessment of identified conflicts of interest. A hypothetical possibility for undeclared conflicts of interest is, as a rule of thumb, not considered sufficient reason for ‘notable concern’. By ‘notable concern’ we imply important conflicts of interest expected to have a potential impact on study design, risk of bias in study results or risk of bias in a synthesis due to missing results. For example, financial conflicts of interest are important in a trial initiated, designed, analysed and reported by drug or device company employees. Conversely, financial conflicts of interest are less important in a trial initiated, designed, analysed and reported by academics adhering to the arm’s length principle when acquiring free trial medication from a drug company, and where lead authors have no conflicts of interest. Similarly, non-financial conflicts of interest may be important in a trial of a highly controversial and ideologically loaded question such as the adverse effect of male circumcision. Non-financial conflicts of interest are less concerning in a trial comparing two treatments in general use with no connotation to highly controversial scientific theories, ideology or professional groups. Mixing trivial conflicts of interest with important ones may mask the latter and will expand review author workload considerably.

MECIR Box 7.8.a Relevant expectations for conduct of intervention reviews

Addressing conflicts of interest in included trials

7.9 Chapter information

Authors: Isabelle Boutron, Matthew J Page, Julian PT Higgins, Douglas G Altman, Andreas Lundh, Asbjørn Hróbjartsson

Acknowledgements: We thank Gerd Antes, Peter Gøtzsche, Peter Jüni, Steff Lewis, David Moher, Andrew Oxman, Ken Schulz, Jonathan Sterne and Simon Thompson for their contributions to previous versions of this chapter.

7.10 References

Ahn R, Woodbridge A, Abraham A, Saba S, Korenstein D, Madden E, Boscardin WJ, Keyhani S. Financial ties of principal investigators and randomized controlled trial outcomes: cross sectional study. BMJ 2017; 356 : i6770.

Als-Nielsen B, Chen W, Gluud C, Kjaergard LL. Association of funding and conclusions in randomized drug trials: a reflection of treatment effect or adverse events? JAMA 2003; 290 : 921-928.

Bero LA, Grundy Q. Why Having a (Nonfinancial) Interest Is Not a Conflict of Interest. PLoS Biology 2016; 14 : e2001221.

Blümle A, Meerpohl JJ, Schumacher M, von Elm E. Fate of clinical research studies after ethical approval--follow-up of study protocols until publication. PloS One 2014; 9 : e87184.

Boutron I, Dutton S, Ravaud P, Altman DG. Reporting and interpretation of randomized controlled trials with statistically nonsignificant results for primary outcomes. JAMA 2010; 303 : 2058-2064.

Chan A-W, Song F, Vickers A, Jefferson T, Dickersin K, Gøtzsche PC, Krumholz HM, Ghersi D, van der Worp HB. Increasing value and reducing waste: addressing inaccessible research. The Lancet 2014; 383 : 257-266.

Chan AW, Hróbjartsson A, Haahr MT, Gøtzsche PC, Altman DG. Empirical evidence for selective reporting of outcomes in randomized trials: comparison of protocols to published articles. JAMA 2004a; 291 : 2457-2465.

Chan AW, Krleža-Jeric K, Schmid I, Altman DG. Outcome reporting bias in randomized trials funded by the Canadian Institutes of Health Research. Canadian Medical Association Journal 2004b; 171 : 735-740.

da Costa BR, Beckett B, Diaz A, Resta NM, Johnston BC, Egger M, Jüni P, Armijo-Olivo S. Effect of standardized training on the reliability of the Cochrane risk of bias assessment tool: a prospective study. Systematic Reviews 2017; 6 : 44.

Dechartres A, Boutron I, Trinquart L, Charles P, Ravaud P. Single-center trials show larger treatment effects than multicenter trials: evidence from a meta-epidemiologic study. Annals of Internal Medicine 2011; 155 : 39-51.

Dechartres A, Trinquart L, Boutron I, Ravaud P. Influence of trial sample size on treatment effect estimates: meta-epidemiological study. BMJ 2013; 346 : f2304.

Dechartres A, Trinquart L, Faber T, Ravaud P. Empirical evaluation of which trial characteristics are associated with treatment effect estimates. Journal of Clinical Epidemiology 2016a; 77 : 24-37.

Dechartres A, Ravaud P, Atal I, Riveros C, Boutron I. Association between trial registration and treatment effect estimates: a meta-epidemiological study. BMC Medicine 2016b; 14 : 100.

Dechartres A, Trinquart L, Atal I, Moher D, Dickersin K, Boutron I, Perrodeau E, Altman DG, Ravaud P. Evolution of poor reporting and inadequate methods over time in 20 920 randomised controlled trials included in Cochrane reviews: research on research study. BMJ 2017; 357 : j2490.

Dechartres A, Atal I, Riveros C, Meerpohl J, Ravaud P. Association between publication characteristics and treatment effect estimates: A meta-epidemiologic study. Annals of Internal Medicine 2018.

Drazen JM, de Leeuw PW, Laine C, Mulrow C, DeAngelis CD, Frizelle FA, Godlee F, Haug C, Hébert PC, Horton R, Kotzin S, Marusic A, Reyes H, Rosenberg J, Sahni P, Van der Weyden MB, Zhaori G. Towards more uniform conflict disclosures: the updated ICMJE conflict of interest reporting form. BMJ 2010; 340 : c3239.

Duyx B, Urlings MJE, Swaen GMH, Bouter LM, Zeegers MP. Scientific citations favor positive results: a systematic review and meta-analysis. Journal of Clinical Epidemiology 2017; 88 : 92-101.

Easterbrook PJ, Berlin JA, Gopalan R, Matthews DR. Publication bias in clinical research. Lancet 1991; 337 : 867-872.

Estellat C, Ravaud P. Lack of head-to-head trials and fair control arms: randomized controlled trials of biologic treatment for rheumatoid arthritis. Archives of Internal Medicine 2012; 172 : 237-244.

Eyding D, Lelgemann M, Grouven U, Harter M, Kromp M, Kaiser T, Kerekes MF, Gerken M, Wieseler B. Reboxetine for acute treatment of major depression: systematic review and meta-analysis of published and unpublished placebo and selective serotonin reuptake inhibitor controlled trials. BMJ 2010; 341 : c4737.

Fanelli D, Costas R, Ioannidis JPA. Meta-assessment of bias in science. Proceedings of the National Academy of Sciences of the United States of America 2017; 114 : 3714-3719.

Franco A, Malhotra N, Simonovits G. Social science. Publication bias in the social sciences: unlocking the file drawer. Science 2014; 345 : 1502-1505.

Gates A, Vandermeer B, Hartling L. Technology-assisted risk of bias assessment in systematic reviews: a prospective cross-sectional evaluation of the RobotReviewer machine learning tool. Journal of Clinical Epidemiology 2018; 96 : 54-62.

Grundy Q, Dunn AG, Bourgeois FT, Coiera E, Bero L. Prevalence of Disclosed Conflicts of Interest in Biomedical Research and Associations With Journal Impact Factors and Altmetric Scores. JAMA 2018; 319 : 408-409.

Guyatt GH, Oxman AD, Vist GE, Kunz R, Falck-Ytter Y, Alonso-Coello P, Schünemann HJ. GRADE: an emerging consensus on rating quality of evidence and strength of recommendations. BMJ 2008; 336 : 924-926.

Hakoum MB, Jouni N, Abou-Jaoude EA, Hasbani DJ, Abou-Jaoude EA, Lopes LC, Khaldieh M, Hammoud MZ, Al-Gibbawi M, Anouti S, Guyatt G, Akl EA. Characteristics of funding of clinical trials: cross-sectional survey and proposed guidance. BMJ Open 2017; 7 : e015997.

Hartling L, Hamm MP, Milne A, Vandermeer B, Santaguida PL, Ansari M, Tsertsvadze A, Hempel S, Shekelle P, Dryden DM. Testing the risk of bias tool showed low reliability between individual reviewers and across consensus assessments of reviewer pairs. Journal of Clinical Epidemiology 2013; 66 : 973-981.

Higgins JPT, Altman DG, Gøtzsche PC, Jüni P, Moher D, Oxman AD, Savovic J, Schulz KF, Weeks L, Sterne JAC. The Cochrane Collaboration's tool for assessing risk of bias in randomised trials. BMJ 2011; 343 : d5928.

Hopewell S, Clarke M, Stewart L, Tierney J. Time to publication for results of clinical trials. Cochrane Database of Systematic Reviews 2007; 2 : MR000011.

Hopewell S, Boutron I, Altman D, Ravaud P. Incorporation of assessments of risk of bias of primary studies in systematic reviews of randomised trials: a cross-sectional study. BMJ Open 2013; 3 : 8.

Jefferson T, Jones MA, Doshi P, Del Mar CB, Hama R, Thompson MJ, Onakpoya I, Heneghan CJ. Risk of bias in industry-funded oseltamivir trials: comparison of core reports versus full clinical study reports. BMJ Open 2014; 4 : e005253.

Jones CW, Keil LG, Holland WC, Caughey MC, Platts-Mills TF. Comparison of registered and published outcomes in randomized controlled trials: a systematic review. BMC Medicine 2015; 13 : 282.

Jørgensen L, Paludan-Muller AS, Laursen DR, Savovic J, Boutron I, Sterne JAC, Higgins JPT, Hróbjartsson A. Evaluation of the Cochrane tool for assessing risk of bias in randomized clinical trials: overview of published comments and analysis of user practice in Cochrane and non-Cochrane reviews. Systematic Reviews 2016; 5 : 80.

Jüni P, Witschi A, Bloch R, Egger M. The hazards of scoring the quality of clinical trials for meta-analysis. JAMA 1999; 282 : 1054-1060.

Jüni P, Altman DG, Egger M. Systematic reviews in health care: Assessing the quality of controlled clinical trials. BMJ 2001; 323 : 42-46.

Kirkham JJ, Dwan KM, Altman DG, Gamble C, Dodd S, Smyth R, Williamson PR. The impact of outcome reporting bias in randomised controlled trials on a cohort of systematic reviews. BMJ 2010; 340 : c365.

Li G, Abbade LPF, Nwosu I, Jin Y, Leenus A, Maaz M, Wang M, Bhatt M, Zielinski L, Sanger N, Bantoto B, Luo C, Shams I, Shahid H, Chang Y, Sun G, Mbuagbaw L, Samaan Z, Levine MAH, Adachi JD, Thabane L. A systematic review of comparisons between protocols or registrations and full reports in primary biomedical research. BMC Medical Research Methodology 2018; 18 : 9.

Lo B, Field MJ, Institute of Medicine (US) Committee on Conflict of Interest in Medical Research Education and Practice. Conflict of Interest in Medical Research, Education, and Practice . Washington, D.C.: National Academies Press (US); 2009.

Lundh A, Lexchin J, Mintzes B, Schroll JB, Bero L. Industry sponsorship and research outcome. Cochrane Database of Systematic Reviews 2017; 2 : MR000033.

Mann H, Djulbegovic B. Comparator bias: why comparisons must address genuine uncertainties. Journal of the Royal Society of Medicine 2013; 106 : 30-33.

Marret E, Elia N, Dahl JB, McQuay HJ, Møiniche S, Moore RA, Straube S, Tramèr MR. Susceptibility to fraud in systematic reviews: lessons from the Reuben case. Anesthesiology 2009; 111 : 1279-1289.

Marshall IJ, Kuiper J, Wallace BC. RobotReviewer: evaluation of a system for automatically assessing bias in clinical trials. Journal of the American Medical Informatics Association 2016; 23 : 193-201.

McGauran N, Wieseler B, Kreis J, Schuler YB, Kolsch H, Kaiser T. Reporting bias in medical research - a narrative review. Trials 2010; 11 : 37.

Millard LA, Flach PA, Higgins JPT. Machine learning to assist risk-of-bias assessments in systematic reviews. International Journal of Epidemiology 2016; 45 : 266-277.

Moher D, Shamseer L, Cobey KD, Lalu MM, Galipeau J, Avey MT, Ahmadzai N, Alabousi M, Barbeau P, Beck A, Daniel R, Frank R, Ghannad M, Hamel C, Hersi M, Hutton B, Isupov I, McGrath TA, McInnes MDF, Page MJ, Pratt M, Pussegoda K, Shea B, Srivastava A, Stevens A, Thavorn K, van Katwyk S, Ward R, Wolfe D, Yazdi F, Yu AM, Ziai H. Stop this waste of people, animals and money. Nature 2017; 549 : 23-25.

Montedori A, Bonacini MI, Casazza G, Luchetta ML, Duca P, Cozzolino F, Abraha I. Modified versus standard intention-to-treat reporting: are there differences in methodological quality, sponsorship, and findings in randomized trials? A cross-sectional study. Trials 2011; 12 : 58.

Morgan AJ, Ross A, Reavley NJ. Systematic review and meta-analysis of Mental Health First Aid training: Effects on knowledge, stigma, and helping behaviour. PloS One 2018; 13 : e0197102.

Morrison A, Polisena J, Husereau D, Moulton K, Clark M, Fiander M, Mierzwinski-Urban M, Clifford T, Hutton B, Rabb D. The effect of English-language restriction on systematic review-based meta-analyses: a systematic review of empirical studies. International Journal of Technology Assessment in Health Care 2012; 28 : 138-144.

Norris SL, Burda BU, Holmer HK, Ogden LA, Fu R, Bero L, Schunemann H, Deyo R. Author's specialty and conflicts of interest contribute to conflicting guidelines for screening mammography. Journal of Clinical Epidemiology 2012; 65 : 725-733.

Odutayo A, Emdin CA, Hsiao AJ, Shakir M, Copsey B, Dutton S, Chiocchia V, Schlussel M, Dutton P, Roberts C, Altman DG, Hopewell S. Association between trial registration and positive study findings: cross sectional study (Epidemiological Study of Randomized Trials-ESORT). BMJ 2017; 356 : j917.

Page MJ, Higgins JPT. Rethinking the assessment of risk of bias due to selective reporting: a cross-sectional study. Systematic Reviews 2016; 5 : 108.

Page MJ, Higgins JPT, Clayton G, Sterne JAC, Hróbjartsson A, Savović J. Empirical evidence of study design biases in randomized trials: systematic review of meta-epidemiological studies. PloS One 2016; 11 : 7.

Page MJ, McKenzie JE, Higgins JPT. Tools for assessing risk of reporting biases in studies and syntheses of studies: a systematic review. BMJ Open 2018; 8 : e019703.

Patel SV, Yu D, Elsolh B, Goldacre BM, Nash GM. Assessment of conflicts of interest in robotic surgical studies: validating author's declarations with the open payments database. Annals of Surgery 2018; 268 : 86-92.

Polanin JR, Tanner-Smith EE, Hennessy EA. Estimating the difference between published and unpublished effect sizes: a meta-review. Review of Educational Research 2016; 86 : 207-236.

Rasmussen K, Schroll J, Gøtzsche PC, Lundh A. Under-reporting of conflicts of interest among trialists: a cross-sectional study. Journal of the Royal Society of Medicine 2015; 108 : 101-107.

Riechelmann RP, Wang L, O'Carroll A, Krzyzanowska MK. Disclosure of conflicts of interest by authors of clinical trials and editorials in oncology. Journal of Clinical Oncology 2007; 25 : 4642-4647.

Rising K, Bacchetti P, Bero L. Reporting bias in drug trials submitted to the Food and Drug Administration: review of publication and presentation. PLoS Medicine 2008; 5 : e217.

Riveros C, Dechartres A, Perrodeau E, Haneef R, Boutron I, Ravaud P. Timing and completeness of trial results posted at ClinicalTrials.gov and published in journals. PLoS Medicine 2013; 10 : e1001566.

Rothenstein JM, Tomlinson G, Tannock IF, Detsky AS. Company stock prices before and after public announcements related to oncology drugs. Journal of the National Cancer Institute 2011; 103 : 1507-1512.

Safer DJ. Design and reporting modifications in industry-sponsored comparative psychopharmacology trials. Journal of Nervous and Mental Disease 2002; 190 : 583-592.

Saini P, Loke YK, Gamble C, Altman DG, Williamson PR, Kirkham JJ. Selective reporting bias of harm outcomes within studies: findings from a cohort of systematic reviews. BMJ 2014; 349 : g6501.

Sampson M, Barrowman NJ, Moher D, Klassen TP, Pham B, Platt R, St John PD, Viola R, Raina P. Should meta-analysts search Embase in addition to Medline? Journal of Clinical Epidemiology 2003; 56 : 943-955.

Savović J, Jones HE, Altman DG, Harris RJ, Jüni P, Pildal J, Als-Nielsen B, Balk EM, Gluud C, Gluud LL, Ioannidis JPA, Schulz KF, Beynon R, Welton NJ, Wood L, Moher D, Deeks JJ, Sterne JAC. Influence of reported study design characteristics on intervention effect estimates from randomized, controlled trials. Annals of Internal Medicine 2012; 157 : 429-438.

Scherer RW, Meerpohl JJ, Pfeifer N, Schmucker C, Schwarzer G, von Elm E. Full publication of results initially presented in abstracts. Cochrane Database of Systematic Reviews 2018; 11 : MR000005.

Schmid CH. Outcome Reporting Bias: A Pervasive Problem in Published Meta-analyses. American Journal of Kidney Diseases 2016; 69 : 172-174.

Schmucker C, Schell LK, Portalupi S, Oeller P, Cabrera L, Bassler D, Schwarzer G, Scherer RW, Antes G, von Elm E, Meerpohl JJ. Extent of non-publication in cohorts of studies approved by research ethics committees or included in trial registries. PloS One 2014; 9 : e114023.

Schulz KF, Chalmers I, Hayes RJ, Altman DG. Empirical evidence of bias. Dimensions of methodological quality associated with estimates of treatment effects in controlled trials. JAMA 1995; 273 : 408-412.

Shawwa K, Kallas R, Koujanian S, Agarwal A, Neumann I, Alexander P, Tikkinen KA, Guyatt G, Akl EA. Requirements of Clinical Journals for Authors’ Disclosure of Financial and Non-Financial Conflicts of Interest: A Cross Sectional Study. PloS One 2016; 11 : e0152301.

Sterne JAC. Why the Cochrane risk of bias tool should not include funding source as a standard item [editorial]. Cochrane Database of Systematic Reviews 2013; 12 : ED000076.

Tramèr MR, Reynolds DJ, Moore RA, McQuay HJ. Impact of covert duplicate publication on meta-analysis: a case study. BMJ 1997; 315 : 635-640.

Turner RM, Spiegelhalter DJ, Smith GC, Thompson SG. Bias modelling in evidence synthesis. Journal of the Royal Statistical Society Series A, (Statistics in Society) 2009; 172 : 21-47.

Urrutia G, Ballesteros M, Djulbegovic B, Gich I, Roque M, Bonfill X. Cancer randomized trials showed that dissemination bias is still a problem to be solved. Journal of Clinical Epidemiology 2016; 77 : 84-90.

Vedula SS, Li T, Dickersin K. Differences in reporting of analyses in internal company documents versus published trial reports: comparisons in industry-sponsored trials in off-label uses of gabapentin. PLoS Medicine 2013; 10 : e1001378.

Viswanathan M, Carey TS, Belinson SE, Berliner E, Chang SM, Graham E, Guise JM, Ip S, Maglione MA, McCrory DC, McPheeters M, Newberry SJ, Sista P, White CM. A proposed approach may help systematic reviews retain needed expertise while minimizing bias from nonfinancial conflicts of interest. Journal of Clinical Epidemiology 2014; 67 : 1229-1238.

Welton NJ, Ades AE, Carlin JB, Altman DG, Sterne JAC. Models for potentially biased evidence in meta-analysis using empirically based priors. Journal of the Royal Statistical Society: Series A (Statistics in Society) 2009; 172 : 119-136.

Wieland LS, Berman BM, Altman DG, Barth J, Bouter LM, D'Adamo CR, Linde K, Moher D, Mullins CD, Treweek S, Tunis S, van der Windt DA, Zwarenstein M, Witt C. Rating of Included Trials on the Efficacy-Effectiveness Spectrum: development of a new tool for systematic reviews. Journal of Clinical Epidemiology 2017; 84 .

Wieseler B, Kerekes MF, Vervoelgyi V, McGauran N, Kaiser T. Impact of document type on reporting quality of clinical drug trials: a comparison of registry reports, clinical study reports, and journal publications. BMJ 2012; 344 : d8141.

Wood L, Egger M, Gluud LL, Schulz K, Jüni P, Altman DG, Gluud C, Martin RM, Wood AJG, Sterne JAC. Empirical evidence of bias in treatment effect estimates in controlled trials with different interventions and outcomes: meta-epidemiological study. BMJ 2008; 336 : 601-605.

Zarin DA, Tse T, Williams RJ, Carr S. Trial Reporting in ClinicalTrials.gov - The Final Rule. New England Journal of Medicine 2016; 375 : 1998-2004.

For permission to re-use material from the Handbook (either academic or commercial), please see here for full details.


Systematic review of publication bias in studies on publication bias

  • Hans-Hermann Dubben , senior scientist 1 ( dubben{at}uke.uni-hamburg.de ) ,
  • Hans-Peter Beck-Bornholdt , professor 1
  • 1 Institut für Allgemeinmedizin, Universitätsklinikum Hamburg-Eppendorf, Martinistrasse 52, 20246 Hamburg, Germany
  • Correspondence to: H-H Dubben
  • Accepted 19 April 2005

Introduction

Publication bias is a well known phenomenon in clinical literature, 1 2 in which positive results have a better chance of being published, are published earlier, and are published in journals with higher impact factors. Conclusions exclusively based on published studies, therefore, can be misleading. 3 Selective underreporting of research might be more widespread and more likely to have adverse consequences for patients than publication of deliberately falsified data. 1 We investigated whether there is preferential publication of positive papers on publication bias.

Methods and results

We identified studies that assessed the impact of publication bias in Medline (January 1993 to October 2003) using the search terms “publication bias”, “citation bias”, “language bias”, “location bias”, “reference bias”, or “multiple publication bias”. We also searched the references of a Cochrane review on publication bias. 4 We restricted the search to publications that primarily investigated publication bias and whose acceptance therefore might have depended on whether they had found publication bias or not. We retrieved 265 references. Of these, we chose 148 for full examination. Their bibliographies yielded 26 additional papers. We excluded 148 studies because they gave no original data. All remaining 26 were included in the analysis (see bmj.com).

We used a funnel plot to evaluate reports for publication bias. In a funnel plot the effect size is plotted versus a measure of its precision, such as sample size. With increasing sample size, random variations of the effect are smaller. Thus, data from several studies are expected to be symmetrically distributed in a funnel shaped area of the plot if no publication bias is present. Conversely, an asymmetrical funnel plot indicates a biased study sample. 5

We plotted effect size versus sample size (figure). The effect is the ratio of the odds of a positive study being published to the odds of a negative study. We transformed reported relative risks into odds ratios. We did not transform hazard ratios. The vertical line indicates no publication bias; 23 of 26 studies report preferential publication of positive results.
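The effect measure described above can be illustrated with a short calculation. This is a sketch under stated assumptions: the function name `publication_odds_ratio` and the counts are hypothetical, chosen only to show the arithmetic, and are not data from the studies analysed.

```python
def publication_odds_ratio(pos_published, pos_unpublished,
                           neg_published, neg_unpublished):
    """Odds of publication for positive studies divided by the odds for negative studies."""
    odds_positive = pos_published / pos_unpublished
    odds_negative = neg_published / neg_unpublished
    return odds_positive / odds_negative


# Hypothetical counts: 80 of 100 positive studies and 50 of 100 negative studies published.
effect = publication_odds_ratio(80, 20, 50, 50)
print(effect)  # a value above 1 indicates preferential publication of positive results
```

A value of 1 corresponds to the vertical line of no publication bias in the plot.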

Funnel plot of 26 reports on publication bias, with reported effect as dependent variable


The median reported odds ratio is 2.3, indicating preferred publication of positive results. The sloping line results from a regression with the reported effect as dependent variable. Its slope does not differ significantly from zero (P = 0.13)—that is, the asymmetry of the data is not statistically significant.
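A regression check of this kind can be sketched as an ordinary least-squares fit of the reported effect on sample size, with a t statistic for the slope. This is an illustration only: the source does not state the exact model or scale used, the function name `slope_test` is hypothetical, and the data below are invented. A P value would be obtained by comparing the t statistic with a t distribution on n - 2 degrees of freedom.

```python
import math


def slope_test(x, y):
    """Ordinary least squares of y on x; returns (slope, standard error, t statistic)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares around the fitted line.
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, se, slope / se


# Invented data: sample sizes and reported effects for five hypothetical reports.
sample_sizes = [50, 120, 200, 400, 800]
effects = [3.1, 2.0, 2.6, 1.9, 2.3]
slope, se, t = slope_test(sample_sizes, effects)
print(f"slope={slope:.5f}, se={se:.5f}, t={t:.2f}")
```

A slope close to zero relative to its standard error is consistent with a symmetrical funnel, although, as the text notes, power is low with few studies.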

We found no evidence of publication bias in reports on publication bias. But, with just 26 studies, the power to detect asymmetry in a funnel plot was low. 5 Furthermore, the definition of the terms “positive” and “significant” is non-uniform and sometimes rather arbitrary in the studies reinvestigated here. For example, Dickersin (see bmj.com) used the definition “studies reported to have statistically significant findings were combined with those reported to have findings of great importance. Together they are referred to as ‘significant’ and are contrasted with the remainder, which are referred to as ‘not significant.’”

Most data on publication bias were recorded retrospectively and lack prospective registration, as does the present analysis. Prospective and registered studies on publication bias are needed.

What is already known on this topic

Studies with positive results seem more likely to be published at all, earlier, and in journals with higher impact factors; as a consequence, effects are often overestimated

What this study adds

These findings do not indicate publication bias in reports on publication bias

This article was posted on bmj.com on 3 June 2005: http://bmj.com/cgi/doi/10.1136/bmj.38478.497164.F7

Contributors Both authors developed the design of this review, did the literature search, and analysed and interpreted the trials. The article was jointly written by both authors. H-HD is guarantor.

Competing interests None declared.

Funding Unna Stiftung, Düsseldorf, Germany.

Ethical approval Not needed.

  • Easterbrook PJ ,
  • Berlin JA ,
  • Gopalan R ,
  • Matthews DR
  • Scherer RW ,
  • Langenberg P
  • Schneider M ,


Neal Haddaway

October 19th, 2020

8 common problems with literature reviews and how to fix them


Literature reviews are an integral part of the process and communication of scientific research. Whilst systematic reviews have become regarded as the highest standard of evidence synthesis, many literature reviews fall short of these standards and may end up presenting biased or incorrect conclusions. In this post, Neal Haddaway highlights 8 common problems with literature review methods, provides examples for each and provides practical solutions for ways to mitigate them.


Researchers regularly review the literature – it’s an integral part of day-to-day research: finding relevant research, reading and digesting the main findings, summarising across papers, and making conclusions about the evidence base as a whole. However, there is a fundamental difference between brief, narrative approaches to summarising a selection of studies and attempting to reliably and comprehensively summarise an evidence base to support decision-making in policy and practice.

So-called ‘evidence-informed decision-making’ (EIDM) relies on rigorous systematic approaches to synthesising the evidence. Systematic review has become the highest standard of evidence synthesis and is well established in the pipeline from research to practice in the field of health. Systematic reviews must include a suite of specifically designed methods for the conduct and reporting of all synthesis activities (planning, searching, screening, appraising, extracting data, qualitative/quantitative/mixed methods synthesis, writing; e.g. see the Cochrane Handbook). The method has been widely adapted into other fields, including environment (the Collaboration for Environmental Evidence) and social policy (the Campbell Collaboration).


Despite the growing interest in systematic reviews, traditional approaches to reviewing the literature continue to persist in contemporary publications across disciplines. These reviews, some of which are incorrectly referred to as ‘systematic’ reviews, may be susceptible to bias and as a result, may end up providing incorrect conclusions. This is of particular concern when reviews address key policy- and practice- relevant questions, such as the ongoing COVID-19 pandemic or climate change.

These limitations with traditional literature review approaches could be improved relatively easily with a few key procedures; some of them not prohibitively costly in terms of skill, time or resources.

In our recent paper in Nature Ecology and Evolution, we highlight 8 common problems with traditional literature review methods, provide examples for each from the field of environmental management and ecology, and provide practical solutions for ways to mitigate them.

Problem: Lack of relevance – limited stakeholder engagement can produce a review that is of limited practical use to decision-makers.
Solution: Stakeholders can be identified, mapped and contacted for feedback and inclusion without the need for extensive budgets – check out best-practice guidance.

Problem: Mission creep – reviews that don’t publish their methods in an a priori protocol can suffer from shifting goals and inclusion criteria.
Solution: Carefully design and publish an a priori protocol that outlines planned methods for searching, screening, data extraction, critical appraisal and synthesis in detail. Make use of existing organisations to support you (e.g. the Collaboration for Environmental Evidence).

Problem: A lack of transparency/replicability in the review methods may mean that the review cannot be replicated – a central tenet of the scientific method!
Solution: Be explicit, and make use of high-quality guidance and standards for review conduct (e.g. CEE Guidance) and reporting (PRISMA or ROSES).

Problem: Selection bias (where included studies are not representative of the evidence base) and a lack of comprehensiveness (an inappropriate search method) can mean that reviews end up with the wrong evidence for the question at hand.
Solution: Carefully design a search strategy with an info specialist; trial the search strategy (against a benchmark list); use multiple bibliographic databases/languages/sources of grey literature; publish search methods in an a priori protocol for peer-review.

Problem: The exclusion of grey literature and failure to test for evidence of publication bias can result in incorrect or misleading conclusions.
Solution: Include attempts to find grey literature, including both ‘file-drawer’ (unpublished academic) research and organisational reports. Test for possible evidence of publication bias.

Problem: Traditional reviews often lack appropriate critical appraisal of included study validity, treating all evidence as equally valid – we know some research is more valid and we need to account for this in the synthesis.
Solution: Carefully plan and trial a critical appraisal tool before starting the process in full, learning from existing robust critical appraisal tools.

Problem: Inappropriate synthesis (e.g. using vote-counting and inappropriate statistics) can negate all of the preceding systematic effort. Vote-counting (tallying studies based on their statistical significance) ignores study validity and magnitude of effect sizes.
Solution: Select the synthesis method carefully based on the data analysed. Vote-counting should never be used instead of meta-analysis. Formal methods for narrative synthesis should be used to summarise and describe the evidence base.
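The difference between vote-counting and meta-analysis can be illustrated with a short sketch. It assumes a fixed-effect inverse-variance model, which is one common pooling method and is not claimed to be the approach recommended in the paper; the function names and the four study results are hypothetical.

```python
import math


def vote_count(effects, std_errors, z_crit=1.96):
    """Count studies whose own effect is statistically significant and positive."""
    return sum(1 for e, s in zip(effects, std_errors) if e / s > z_crit)


def fixed_effect_pool(effects, std_errors):
    """Inverse-variance weighted mean effect and its standard error."""
    weights = [1.0 / s ** 2 for s in std_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se


# Four hypothetical small studies, each with the same modest effect.
effects = [0.3, 0.3, 0.3, 0.3]
std_errors = [0.2, 0.2, 0.2, 0.2]

votes = vote_count(effects, std_errors)
pooled, pooled_se = fixed_effect_pool(effects, std_errors)
print(f"significant studies: {votes} of {len(effects)}")
print(f"pooled effect: {pooled:.2f}, z = {pooled / pooled_se:.2f}")
```

In this invented example no single study reaches significance, so a tally of significant results suggests no effect, whereas pooling the same studies gives a more precise estimate of a consistent effect.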

There is a lack of awareness and appreciation of the methods needed to ensure systematic reviews are as free from bias and as reliable as possible: demonstrated by recent, flawed, high-profile reviews. We call on review authors to conduct more rigorous reviews, on editors and peer-reviewers to gate-keep more strictly, and the community of methodologists to better support the broader research community. Only by working together can we build and maintain a strong system of rigorous, evidence-informed decision-making in conservation and environmental management.

Note: This article gives the views of the authors, and not the position of the LSE Impact Blog, nor of the London School of Economics.

About the author

Neal Haddaway is a Senior Research Fellow at the Stockholm Environment Institute, a Humboldt Research Fellow at the Mercator Research Institute on Global Commons and Climate Change, and a Research Associate at the Africa Centre for Evidence. He researches evidence synthesis methodology and conducts systematic reviews and maps in the field of sustainability and environmental science. His main research interests focus on improving the transparency, efficiency and reliability of evidence synthesis as a methodology and supporting evidence synthesis in resource constrained contexts. He co-founded and coordinates the Evidence Synthesis Hackathon (www.eshackathon.org) and is the leader of the Collaboration for Environmental Evidence centre at SEI. @nealhaddaway

Why is mission creep a problem and not a legitimate response to an unexpected finding in the literature? Surely the crucial points are that the review’s scope is stated clearly and implemented rigorously, not when the scope was finalised.

#9. Most of them are terribly boring. Which is why I teach students how to make them engaging…and useful.

Types of Bias in Systematic Reviews

Bias can be introduced at any stage of a review: formulating the research question, establishing the eligibility criteria for including and excluding primary studies, reviewing collected resources, and choosing which findings to publish. The hallmark of a systematic review is a reduced risk of bias, but systematic reviews are not fully immune to it. The strengths and weaknesses of a systematic review depend largely on how the reviewer addresses these errors. Let us look at the types of bias that can creep into a review at each stage.

Bias In Study Design

This kind of bias arises in the first step, when the review design and protocol are formulated. Insufficient knowledge of the research field can bias the way the author frames the research question. The author could, for example, decide to include only males in the study, assuming that no previous studies have been conducted on females. Other errors may arise from an inefficient search strategy, for example if reviewers impose arbitrary search limiters such as geographical region or year of publication. Such limiters are likely to produce a biased sample because the search fails to collect all the available evidence.

Selection Bias

This kind of bias is introduced while collecting the primary resources for the study. If the collection of resources is not exhaustive, it could lead to over- or underestimation of the results. The collection of resources for a systematic review must include all available resources, including grey literature. Personal bias can also be introduced by the reviewers in charge of selecting the primary studies. To avoid this kind of bias, the eligibility criteria for including and excluding studies must be clearly stated. Most of the known errors in systematic reviews arise in the selection and publication stages.

Publication Bias

An author or publisher may not publish a study whose results are negative or are not statistically significant. This is called publication bias. The outcomes may not be of relevance to the publisher but may have serious clinical implications.
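One commonly used way to probe a set of studies for small-study effects, which may indicate publication bias, is Egger's regression test: each study's standardised effect (effect divided by standard error) is regressed on its precision (one divided by standard error), and an intercept far from zero suggests funnel-plot asymmetry. The sketch below is a minimal standard-library version under that assumption. The data are hypothetical, and the choice of Egger's test is ours, not something the article prescribes.

```python
import math

def egger_intercept(effects, std_errors):
    """Regress effect/SE on 1/SE by ordinary least squares.

    Returns (intercept, standard error of the intercept, degrees of freedom).
    """
    y = [e / s for e, s in zip(effects, std_errors)]
    x = [1.0 / s for s in std_errors]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residual_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = residual_ss / (n - 2)
    se_intercept = math.sqrt(sigma2 * (1.0 / n + mean_x ** 2 / sxx))
    return intercept, se_intercept, n - 2

# Hypothetical data in which smaller (less precise) studies report larger effects.
effects = [0.80, 0.55, 0.40, 0.30, 0.25]
std_errors = [0.40, 0.30, 0.20, 0.12, 0.08]

intercept, se_intercept, df = egger_intercept(effects, std_errors)
print(f"Egger intercept = {intercept:.2f}, t = {intercept / se_intercept:.2f} on {df} df")
```

The t statistic would then be compared against a t distribution with the reported degrees of freedom. With few studies the test has low power, so a non-significant result does not rule out publication bias.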

Selective Outcome Reporting

Lack Of A Risk Of Bias Assessment

When primary studies are selected for inclusion in a review, the risk of bias in each of them has to be assessed. If the reviewer fails to critically appraise each primary study, bias can accumulate in the final outcomes of the systematic review.

Conclusion Bias

Conclusion bias relates to the way the author decides to relay the conclusions derived from the systematic review. Again, this goes back to careful consideration of the research question. The decision to represent the outcomes qualitatively or quantitatively is crucial to how the outcome is used in the future.

Final Takeaway

Evidence Synthesis Guide : Minimize Bias

Minimizing Bias

Multiple types of bias may impact health evidence. The Cochrane Handbook for Systematic Reviews of Interventions (Table 7.2.a) 1 provides definitions of non-reporting biases that can be minimized by identifying all relevant literature on a research topic.

Note :  *Prior to assessing Risk of Bias, consider use of the Research Integrity Assessment (RIA) tool for randomized controlled trials to validate the authenticity of studies. See: Weibel S, Popp M, Reis S, Skoetz N, Garner P, Sydenham E. Identifying and managing problematic trials: A research integrity assessment tool for randomized controlled trials in evidence synthesis. Res Synth Methods. 2023 May;14(3):357-369. doi: 10.1002/jrsm.1599. Epub 2022 Sep 15. PMID: 36054583

Bias in Locating Studies

  • Publication Bias: The publication or non-publication of research findings, depending on the nature and direction of the results, i.e., “the selective publication of manuscripts based on the magnitude or direction of the study results.” 2
  • Time-Lag Bias: The rapid or delayed publication of research findings, depending on the nature and direction of the results, i.e., “[t]ime-lag bias occurs when the results of negative trials take substantially longer to publish than positive trials.” 3
  • Language Bias: The publication of research findings in a particular language, depending on the nature and direction of results, i.e., language bias “introduces the risk of ignoring key data… as well as missing important cultural contexts, which may limit the review’s findings and usefulness.” 4
  • Citation Bias: The citation or non-citation of research findings, depending on the nature and direction of the results. Citation bias occurs during the process of citation searching for related publications to include in the review. Bias may be introduced by “selective inclusion of statistically significant studies with effect sizes similar to other published studies retrieved from database searching.” 5
  • Multiple Publication Bias: The multiple or singular publication of research findings. When “studies are published in more than one journal to maximize readership and impact of study findings,” they may inadvertently be included in the systematic review more than once. 6
  • Location Bias: The publication of research findings in journals with different ease of access or levels of indexing in standard databases, depending on the nature and direction of results.
  • Non-Reporting Bias: The selective reporting of some outcomes or analyses, but not others, depending on the nature and direction of the results, i.e., “Selective reporting bias…, the incomplete publication of outcomes measured or of analyses performed in a study, may lead to the over- or underestimation of treatment effects or harms.” 7
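Multiple publication bias, as defined above, can be guarded against by checking whether several retrieved reports describe the same underlying study before they enter the synthesis. The sketch below groups reports by a trial registration identifier. The record layout, the field names `report_id` and `registration_id`, and the identifiers themselves are hypothetical, and reports without a registration number would still need manual checking.

```python
from collections import defaultdict

def find_multiple_reports(records):
    """Return registration IDs that are shared by more than one report."""
    groups = defaultdict(list)
    for rec in records:
        reg = (rec.get("registration_id") or "").strip().upper()
        if reg:  # unregistered reports cannot be matched this way
            groups[reg].append(rec["report_id"])
    return {reg: ids for reg, ids in groups.items() if len(ids) > 1}

# Hypothetical screening export.
records = [
    {"report_id": "r1", "registration_id": "REG-0001"},
    {"report_id": "r2", "registration_id": "REG-0002"},
    {"report_id": "r3", "registration_id": "reg-0001 "},
    {"report_id": "r4", "registration_id": None},
]

duplicates = find_multiple_reports(records)
print(duplicates)
```

Reports flagged this way can then be linked so that each study contributes to the synthesis only once.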

References & Recommended Reading

1.       Boutron I, Page MJ, Higgins JP, Altman DG, Lundh A, Hróbjartsson A.  Considering bias and conflicts of interest among the included studies.  In: Higgins JPT, Thomas J, Chandler J, et al., eds.  Cochrane Handbook for Systematic Reviews of Interventions . version 6.2: Cochrane; 2021.

2.       Montori VM, Smieja M, Guyatt GH.  Publication bias: a brief review for clinicians.   Mayo Clinic proceedings.  2000;75(12):1284-1288.

3.       Reyes MM, Panza KE, Martin A, Bloch MH.  Time-lag bias in trials of pediatric antidepressants: a systematic review and meta-analysis.   Journal of the American Academy of Child and Adolescent Psychiatry.  2011;50(1):63-72.

4.       Stern C, Kleijnen J.  Language bias in systematic reviews: you only get out what you put in.  JBI Evidence Synthesis.  2020;18(9).

5.       Vassar M, Johnson AL, Sharp A, Wayant C.  Citation bias in otolaryngology systematic reviews.   Journal of the Medical Library Association : JMLA.  2021;109(1):62-67.

6.       Fairfield CJ, Harrison EM, Wigmore SJ.  Duplicate publication bias weakens the validity of meta-analysis of immunosuppression after transplantation.  World journal of gastroenterology.  2017;23(39):7198-7200.

7.       Reid EK, Tejani AM, Huan LN, et al.  Managing the incidence of selective reporting bias: a survey of Cochrane review groups.   Systematic reviews.  2015;4:85.

  • Last Updated: Aug 30, 2024 2:14 PM
  • URL: https://libraryguides.mayo.edu/systematicreviewprocess

Peer Review Bias: A Critical Review

Affiliations.

  • 1 Digestive Center for Diagnosis and Treatment, Damascus, Syrian Arab Republic.
  • 2 Division of Gastroenterology and Hepatology, Mayo Clinic, Rochester, MN.
  • 3 Division of Preventive Medicine, Mayo Clinic, Rochester, MN. Electronic address: [email protected].
  • PMID: 30797567
  • DOI: 10.1016/j.mayocp.2018.09.004

Various types of bias and confounding have been described in the biomedical literature that can affect a study before, during, or after the intervention has been delivered. The peer review process can also introduce bias. A compelling ethical and moral rationale necessitates improving the peer review process. A double-blind peer review system is supported on equipoise and fair-play principles. Triple- and quadruple-blind systems have also been described but are not commonly used. The open peer review system introduces "Skin in the Game" heuristic principles for both authors and reviewers and has a small favorable effect on the quality of published reports. In this exposition, we present, on the basis of a comprehensive literature search of PubMed from its inception until October 20, 2017, various possible mechanisms by which the peer review process can distort research results, and we discuss the evidence supporting different strategies that may mitigate this bias. It is time to improve the quality, transparency, and accountability of the peer review system.

Copyright © 2018 Mayo Foundation for Medical Education and Research. Published by Elsevier Inc. All rights reserved.

Language bias in systematic reviews: you only get out what you put in

Stern, Cindy 1 ; Kleijnen, Jos 2

1 JBI, Faculty of Health and Medical Sciences, The University of Adelaide, Adelaide, Australia

2 Department of Family Medicine, School for Public Health and Primary Care, Maastricht, The Netherlands

Limiting study inclusion on the basis of language of publication is a common practice in systematic reviews. Neimann Rasmussen and Montgomery cite lack of time, insufficient funding, and unavailability of language resources (e.g. professional translators) as the most common reasons for not including languages other than English (LOTE) in a systematic review. 1 Thirty-eight percent (95% confidence interval, 34-42%) of a random sample of 516 reviews (out of a total of 18,140 systematic reviews published in 2016) reported language restrictions (source: www.ksrevidence.com). While often the most feasible option, it introduces the risk of ignoring key data, introducing bias (referred to as language bias), as well as missing important cultural contexts, which may limit the review's findings and usefulness. 2-4 Cultural context may simply be tied to geography, or in some instances, fundamentally entwined with the review question: for example, conducting a review on Chinese herbal remedies that does not include Chinese-language studies, nor searches Chinese databases or resources; or a review that focuses on health promotion strategies for indigenous populations in Canada that does not consider French-language studies. Such examples would seemingly demand the inclusion of LOTE.
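The confidence interval quoted above can be roughly reproduced from the reported proportion and sample size. The source does not say how the interval was calculated. The sketch below assumes a simple normal-approximation (Wald) interval and ignores any finite-population correction, which would be small when 516 reviews are sampled from 18,140.

```python
import math

def wald_interval(proportion, n, z=1.96):
    """Normal-approximation confidence interval for a sample proportion."""
    se = math.sqrt(proportion * (1.0 - proportion) / n)
    return proportion - z * se, proportion + z * se

# Reported figures: 38% of a random sample of 516 reviews.
lower, upper = wald_interval(0.38, 516)
print(f"95% CI: {lower * 100:.0f}% to {upper * 100:.0f}%")
```

Under these assumptions the bounds round to 34% and 42%, which agrees with the interval given in the text.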

Currently, JBI methodology does not require authors to include papers in LOTE but recommends that, where a review team has capacity, the search should ideally attempt to identify studies and papers published in any language, and may expand the search to include databases and resources that index LOTE. 2 Further, authors are advised to outline any language restrictions with appropriate justifications, and consider the potential consequences of language restriction in their discussion, 1 which aligns with the PRISMA Statement (Item 6: Eligibility criteria, and Item 25: Limitations of the review process). 5 The Campbell Collaboration takes a similar stance and warns against the risk of language bias, recommending that “ideally no language restrictions should be included in the search strategy,” 6 (p.28) while Cochrane advocates that searches should not be restricted by language. 7

Despite this overarching recommendation, across the diverse range of synthesis methodology and methods espoused by JBI, there are other important considerations for LOTE. If we consider the type of review question and thus the methodological design required, there may be different implications for qualitative reviews and mixed methods reviews due to the nature of their data and the potential issues in their translation. 8 Scoping reviews may also not fall under this remit due to their very nature; therefore, it is clear that we cannot assume a one-size-fits-all approach for the inclusion of LOTE.

Many protocols and reviews submitted to JBI Evidence Synthesis limit the search parameters to English only, with authors overwhelmingly stating this is due to the limited resources available. The infrequent exception to this arises from author teams in Europe, South America, and Asia who include at least one additional LOTE (largely based on the languages spoken by the author team) and search databases or resources in LOTE. Of the 17 reviews published in JBI Evidence Synthesis in the first half of 2020, seven (41%) did not limit the language to English. Pleasingly, in this issue, half of the protocols published also do not limit the language to English, with the languages chosen to represent those of the author team and/or those relevant to the cultural context (see examples 9,10 ).

A key message that JBI highlights in its global systematic review training program 11 is that an attempt should be made to locate all evidence (published and unpublished) that is relevant to a review question; however, by allowing reviews that limit by language, JBI systematic reviews are essentially overlooking this very feature that they should be promoting. JBI has reconsidered its stance on the inclusion of LOTE in JBI systematic reviews and is currently deliberating on how best to implement this; for example, standards regarding databases and other resources in LOTE (e.g. which to include as well as training and access), the use of Google Translate and other translation tools to screen/assess suitability, recruitment of collaborators to assist with LOTE, and acknowledgment versus authorship of collaborators.

There are also multiple ways to deal with difficulties in reading and managing LOTE studies in a systematic review. Rather than expensive full translations of published articles, which are often not necessary, a more economical solution may be for a reviewer to work closely with a person who can read the language and facilitate identification and extraction of the required information. In addition, studies for which nobody can be found to help with translation could be listed in the review with a remark that the reviewers could not process the study. This would at least enable the readers to make a judgment about the possible bias involved.

While it is clear this will impact authors, we must move forward to ensure we capture a truly global picture of the evidence. Should we expect authors to include every piece of research ever written that fits their review's inclusion criteria? It simply may not be feasible; however, by limiting a review to one language from the outset, we are violating the very essence of what a systematic review is and its purpose in assisting in making informed decisions from the best available evidence.

Writing a Literature Review

This page is brought to you by the OWL at Purdue University. When printing this page, you must include the entire legal notice.

Copyright ©1995-2018 by The Writing Lab & The OWL at Purdue and Purdue University. All rights reserved. This material may not be published, reproduced, broadcast, rewritten, or redistributed without permission. Use of this site constitutes acceptance of our terms and conditions of fair use.

A literature review is a document or section of a document that collects key sources on a topic and discusses those sources in conversation with each other (also called synthesis ). The lit review is an important genre in many disciplines, not just literature (i.e., the study of works of literature such as novels and plays). When we say “literature review” or refer to “the literature,” we are talking about the research ( scholarship ) in a given field. You will often see the terms “the research,” “the scholarship,” and “the literature” used mostly interchangeably.

Where, when, and why would I write a lit review?

There are a number of different situations where you might write a literature review, each with slightly different expectations; different disciplines, too, have field-specific expectations for what a literature review is and does. For instance, in the humanities, authors might include more overt argumentation and interpretation of source material in their literature reviews, whereas in the sciences, authors are more likely to report study designs and results in their literature reviews; these differences reflect these disciplines’ purposes and conventions in scholarship. You should always look at examples from your own discipline and talk to professors or mentors in your field to be sure you understand your discipline’s conventions, for literature reviews as well as for any other genre.

A literature review can be a part of a research paper or scholarly article, usually falling after the introduction and before the research methods sections. In these cases, the lit review just needs to cover scholarship that is important to the issue you are writing about; sometimes it will also cover key sources that informed your research methodology.

Lit reviews can also be standalone pieces, either as assignments in a class or as publications. In a class, a lit review may be assigned to help students familiarize themselves with a topic and with scholarship in their field, get an idea of the other researchers working on the topic they’re interested in, find gaps in existing research in order to propose new projects, and/or develop a theoretical framework and methodology for later research. As a publication, a lit review usually is meant to help make other scholars’ lives easier by collecting and summarizing, synthesizing, and analyzing existing research on a topic. This can be especially helpful for students or scholars getting into a new research area, or for directing an entire community of scholars toward questions that have not yet been answered.

What are the parts of a lit review?

Most lit reviews use a basic introduction-body-conclusion structure; if your lit review is part of a larger paper, the introduction and conclusion pieces may be just a few sentences while you focus most of your attention on the body. If your lit review is a standalone piece, the introduction and conclusion take up more space and give you a place to discuss your goals, research methods, and conclusions separately from where you discuss the literature itself.

Introduction:

  • An introductory paragraph that explains what your working topic and thesis are
  • A forecast of key topics or texts that will appear in the review
  • Potentially, a description of how you found sources and how you analyzed them for inclusion and discussion in the review (more often found in published, standalone literature reviews than in lit review sections in an article or research paper)

Body:

  • Summarize and synthesize: Give an overview of the main points of each source and combine them into a coherent whole
  • Analyze and interpret: Don’t just paraphrase other researchers – add your own interpretations where possible, discussing the significance of findings in relation to the literature as a whole
  • Critically evaluate: Mention the strengths and weaknesses of your sources
  • Write in well-structured paragraphs: Use transition words and topic sentences to draw connections, comparisons, and contrasts.

Conclusion:

  • Summarize the key findings you have taken from the literature and emphasize their significance
  • Connect it back to your primary research question

How should I organize my lit review?

Lit reviews can take many different organizational patterns depending on what you are trying to accomplish with the review. Here are some examples:

  • Chronological : The simplest approach is to trace the development of the topic over time, which helps familiarize the audience with the topic (for instance if you are introducing something that is not commonly known in your field). If you choose this strategy, be careful to avoid simply listing and summarizing sources in order. Try to analyze the patterns, turning points, and key debates that have shaped the direction of the field. Give your interpretation of how and why certain developments occurred (as mentioned previously, this may not be appropriate in your discipline — check with a teacher or mentor if you’re unsure).
  • Thematic : If you have found some recurring central themes that you will continue working with throughout your piece, you can organize your literature review into subsections that address different aspects of the topic. For example, if you are reviewing literature about women and religion, key themes can include the role of women in churches and the religious attitude towards women.
  • Qualitative versus quantitative research
  • Empirical versus theoretical scholarship
  • Divide the research by sociological, historical, or cultural sources
  • Theoretical : In many humanities articles, the literature review is the foundation for the theoretical framework. You can use it to discuss various theories, models, and definitions of key concepts. You can argue for the relevance of a specific theoretical approach or combine various theoretical concepts to create a framework for your research.

What are some strategies or tips I can use while writing my lit review?

Any lit review is only as good as the research it discusses; make sure your sources are well-chosen and your research is thorough. Don’t be afraid to do more research if you discover a new thread as you’re writing. More info on the research process is available in our "Conducting Research" resources .

As you’re doing your research, create an annotated bibliography ( see our page on this type of document ). Much of the information used in an annotated bibliography can also be used in a literature review, so you’ll be not only partially drafting your lit review as you research, but also developing your sense of the larger conversation going on among scholars, professionals, and any other stakeholders in your topic.

Usually you will need to synthesize research rather than just summarizing it. This means drawing connections between sources to create a picture of the scholarly conversation on a topic over time. Many student writers struggle to synthesize because they feel they don’t have anything to add to the scholars they are citing; here are some strategies to help you:

  • It often helps to remember that the point of these kinds of syntheses is to show your readers how you understand your research, to help them read the rest of your paper.
  • Writing teachers often say synthesis is like hosting a dinner party: imagine all your sources are together in a room, discussing your topic. What are they saying to each other?
  • Look at the in-text citations in each paragraph. Are you citing just one source for each paragraph? This usually indicates summary only. When you have multiple sources cited in a paragraph, you are more likely to be synthesizing them (not always, but often).
  • Read more about synthesis here.

The most interesting literature reviews are often written as arguments (again, as mentioned at the beginning of the page, this is discipline-specific and doesn’t work for all situations). Often, the literature review is where you can establish your research as filling a particular gap or as relevant in a particular way. You have some chance to do this in your introduction in an article, but the literature review section gives a more extended opportunity to establish the conversation in the way you would like your readers to see it. You can choose the intellectual lineage you would like to be part of and whose definitions matter most to your thinking (mostly humanities-specific, but this goes for sciences as well). In addressing these points, you argue for your place in the conversation, which tends to make the lit review more compelling than a simple reporting of other sources.

The bias beneath: analyzing drift in YouTube’s algorithmic recommendations

  • Original Article
  • Open access
  • Published: 24 August 2024
  • Volume 14, article number 171 (2024)


  • Mert Can Cakmak 1 ,
  • Nitin Agarwal 1 &
  • Remi Oni 1  


In today’s digital world, understanding how YouTube’s recommendation systems guide what we watch is crucial. This study dives into these systems, revealing how they influence the content we see over time. We found that YouTube’s algorithms tend to push content in certain directions, affecting the variety and type of videos recommended to viewers. To uncover these patterns, we used a mixed methods approach to analyze videos recommended by YouTube. We looked at the emotions conveyed in videos, the moral messages they might carry, and whether they contained harmful content. Our research also involved statistical analysis to detect biases in how these videos are recommended and network analysis to see how certain videos become more influential than others. Our findings show that YouTube’s algorithms can lead to a narrowing of the content landscape, limiting the diversity of what gets recommended. This has important implications for how information is spread and consumed online, suggesting a need for more transparency and fairness in how these algorithms work. In summary, this paper highlights the need for a more inclusive approach to how digital platforms recommend content. By better understanding the impact of YouTube’s algorithms, we can work towards creating a digital space that offers a wider range of perspectives and voices, affording fairness, and enriching everyone’s online experience.


1 Introduction

In an age where information floods our digital landscapes, recommendation systems emerge as essential beacons, leading users simply and effectively through the vast ocean of data. These systems, elegantly designed to analyze and predict user preferences, have revolutionized how content is consumed online. At their core, recommendation systems are sophisticated algorithms that look through vast datasets to present users with choices customized to their historical interactions, preferences, and behaviors. This personalized approach not only enhances user experience but also significantly impacts cultural and political narratives by influencing what is watched, read, and discussed across the globe. A typical example of this influence is YouTube’s recommendation algorithm, which has become a pivotal force in shaping viewing habits among billions of users worldwide. Such algorithms have the power to subtly direct the flow of information, underscoring the importance of understanding their underlying mechanisms and the implications of their widespread adoption. The evolution of recommendation systems, as detailed in Burke et al. ( 2011 ), Lü et al. ( 2012 ), spans a trajectory from simple collaborative filtering techniques (where recommendations are made based on the preferences of similar users) to complex, multi-faceted approaches that incorporate a variety of artificial intelligence methodologies. This progression reflects a deepening sophistication in how digital platforms engage with users, ensuring that the content they encounter resonates with their individual tastes and preferences. However, the immense influence wielded by these seemingly impartial systems also brings to the forefront the need for a critical examination of how digital content is curated and the potential consequences of its reach.

At the core of platforms like YouTube, recommender systems stand as technological wonders, adept at predicting and shaping our preferences. They search through vast content, aligning choices with our past interactions to enhance our experience. These systems, by prioritizing content they predict will be of interest, significantly shape our digital diets, potentially narrowing our exposure to a homogenized set of perspectives. However, these algorithms, for all their sophistication, are not without their biases, which can skew content diversity and fairness in information distribution, raising concerns about echo chambers, filter bubbles, and the significant influence on public discourse (Polatidis and Georgiadis 2013 ). Recent studies have underscored the need to understand and mitigate biases in recommender systems, including but not limited to selection bias, position bias, exposure bias, and popularity bias. Not addressing these inherent biases could lead to serious issues, such as a disparity between offline evaluation and online metrics, negatively affecting user satisfaction and trust in the recommendation service (Chen et al. 2020 ).

Building on the understanding of recommender systems’ impact on user experience and the potential biases inherent in such systems, this study aims to delve deeper into the specific dynamics at play within YouTube’s ecosystem. YouTube, as a leading platform for video content, offers a fertile ground for examining how recommender algorithms shape the content landscape and user interactions over time. Accordingly, this research seeks to address the following pivotal questions:

RQ1: In what ways do YouTube’s recommendation algorithms influence drift within content over time?

RQ2: How do YouTube’s algorithms affect content diversity, narrative visibility, fairness of information distribution, user engagement, and echo chamber formation?

This study embarks on a critical examination of YouTube’s recommendation algorithm, specifically its role in driving narrative drift within the digital content ecosystem over time. Narrative drift, defined as the gradual changes in themes, topics, and perspectives within recommended content, significantly influences the diversity and depth of information accessible to users, thereby shaping their informational landscape.

Our investigation employs a multidimensional analytical framework, blending statistical evaluations with advanced analyses of emotion and moral sentiment, among other methodologies, to dissect the complex dynamics of YouTube’s content recommendation system. This comprehensive approach blends quantitative data analysis with qualitative insights, offering a detailed exploration of how algorithms impact user engagement and content evolution.

The remainder of the paper is structured as follows. Section 2 introduces the narratives under study, and Sect. 3 offers an overview of related domains, setting the stage for the methodologies applied. Section 4 details our data collection processes and analytical methods. Section 5 presents our findings on narrative drift, supplemented by detailed graphical analyses. Finally, Sect. 6 summarizes our study’s key insights.

The goal of this research extends beyond merely mapping narrative drift; it seeks to delve into the implications of algorithmic content curation on content diversity and the fair distribution of information across the digital landscape. By scrutinizing YouTube’s recommendation algorithm and identifying potential biases, this study contributes to the ongoing dialogue on digital media consumption, governance, and its societal impact.

In doing so, we challenge the current state of digital content recommendation, envisioning a path toward more transparent, equitable, and diverse digital ecosystems. Through this exploration of YouTube’s recommendation system, we aim to shed light on the nuances of algorithmic governance, fostering a richer and more inclusive digital commons for all.

2 Background

In this section, we delve into the intricate web of geopolitical issues that shape our world, exploring conflicts and disputes that not only have regional implications but also resonate on the global stage. From the deep-rooted tensions in China’s Xinjiang province to the strategic complexities of the South China Sea dispute, and the nuanced use of historical narratives, such as the story of Cheng Ho, in modern-day diplomacy, these offer insightful analyses into some of the most pressing and contentious geopolitical challenges of our times.

2.1 China-Uyghur conflict

The Xinjiang conflict, deeply rooted in complex historical, cultural, and political factors, has emerged as a significant issue in global discourse. Central to this conflict is the difficult situation faced by the Uyghur Muslim minority in China’s Xinjiang province, an area filled with ethnic tensions and controversial government actions. Research highlights the cultural and linguistic aspects of the conflict, focusing on the Uyghur identity and language policy, underscoring how identity plays a crucial role in the ongoing tensions (Dwyer 2005 ). Another study examines the conflict through the lens of majority-minority dynamics within China, providing insights into the socio-political factors that have contributed to the escalation of tensions (Hasmath 2019 ). Further analysis explores the broader implications of the conflict, particularly China’s national policies and their impact on the Uyghur population, offering a critical view of the government’s approach to handling ethnic diversity and disagreement (Israeli 2010 ). This is complemented by discussions on the involvement of international organizations like Amnesty International in addressing the discrimination and conflict faced by Uyghurs, highlighting the period from 2018 to 2022 and the international community’s response (Al-Asad and Zarkachi 2023 ). Additionally, studies on Uyghur Muslim ethnic separatism clarify the complexities of ethnic identity and the desire for self-governance within Xinjiang, illustrating the intricate relationship between ethnic identity, political aspirations, and the broader conflict narrative (Davis 2008 ). These scholarly perspectives paint a multifaceted picture of the Xinjiang conflict, demonstrating its multi-dimensional nature that includes cultural, political, and international elements.

2.2 South China sea dispute

The South China Sea dispute represents a complex intersection of geopolitical, economic, and legal challenges, crucial for understanding contemporary international relations. This region, pivotal for global maritime trade, sees an estimated one-third of the world’s shipping pass through its waters, highlighting its significance in international commerce. The area is not only a key maritime route but also possesses considerable untapped natural resources, including substantial oil and gas reserves, making it an economically strategic zone. Central to this dispute is the People’s Republic of China (PRC)’s assertive attitude, as explored in Chubb (2020). China’s strategy includes extensive island-building and militarization, particularly within the Paracel and Spratly Islands, reshaping the region’s geopolitical landscape. These efforts involve constructing artificial islands and establishing military bases, actions that have significantly altered regional dynamics and heightened tensions among neighboring states. The complexity of the South China Sea dispute extends beyond its strategic maritime routes, as detailed in Fravel (2011). The promise of rich undersea resources, including significant reserves of oil and natural gas, plays a pivotal role in fueling the conflicting territorial claims. This economic potential, closely intertwined with layers of historical claims and national pride, adds to the complexity of the situation. The legal dimension of the dispute gained importance following the 2016 decision by the Permanent Court of Arbitration, which challenged China’s extensive “nine-dash line” territorial claim, deeming it inconsistent with international law (Macaraig and Fenton 2021). Although China has rejected the ruling, it introduced an important legal aspect to the dispute, emphasizing the role of international law in maritime territorial rights.

China’s activities in the South China Sea, including land reclamation and militarization, have implications far beyond the immediate region, raising significant concerns regarding the principles of freedom of navigation and overflight in crucial international waters. The South China Sea dispute, with its intricate blend of geopolitical significance, economic interests, and legal complexities, stands as a testament to the challenges facing the international order in the 21st century.

2.3 Cheng Ho propaganda

In the context of current international politics and strategies, the story of Cheng Ho, also known as Zheng He, has gained new significance, especially in how it is used in the propaganda of the Chinese Communist Party (CCP). Historically recognized as a celebrated Chinese naval admiral of the early 15th century, Zheng He is renowned for his extensive maritime voyages across Southeast Asia, India, and the Middle East, as detailed in Wade (2005) and Finlay (2008). These expeditions are traditionally viewed as exploratory and diplomatic in nature, emphasizing peaceful engagement and trade. However, in recent times, the CCP has strategically repurposed Zheng He’s legacy, as discussed in Dotson (2011), to serve its contemporary political and strategic agendas. This recontextualization of Zheng He’s historical image is particularly evident in the narrative that portrays him not only as a figure who spread Islam and religious tolerance but also as a symbol of China’s peaceful rise and benevolence. This portrayal aligns with the CCP’s broader objective of countering international criticism regarding its treatment of the Uyghur Muslim population, as well as bolstering its geopolitical influence, especially in relation to the South China Sea conflict. By recasting Zheng He as a compassionate gift-giver and a peaceful diplomat, the narrative serves to project an image of China as a historically tolerant and inclusive nation. Furthermore, the manipulation of Zheng He’s story is intricately linked with China’s “Maritime Silk Road” initiative, which aims to expand its economic and strategic influence across Asia, Africa, and Europe. The narrative serves as a tool to foster regional support for China’s ambitious project, presenting it as a continuation of a peaceful and cooperative maritime tradition. This tactic of referencing historical events is a sophisticated approach to engaging in international relations, strengthening credibility within China, and improving its global position.

The Cheng Ho propaganda is a testament to the power of historical narratives in serving current political objectives, where the past is actively reinterpreted to shape the present and future.

2.4 Significance of selected geopolitical topics

We have chosen these geopolitical topics due to their inherent controversy and their significant impact on international relations and public opinion. The selected topics include two anti-China perspectives: the China-Uyghur conflict and the South China Sea dispute, and one pro-China narrative, the story of Cheng Ho. This balanced selection allows us to examine issues where bias might be particularly evident due to their polarizing nature.

These geopolitical issues are highly relevant in current global discourse and receive significant media coverage, making them ideal for analyzing recommendation bias on platforms like YouTube. The controversies surrounding these topics often result in strong opinions and divided audiences, providing a fertile ground for studying how recommendation systems present content in response to user interactions. By examining these topics, we aim to uncover potential biases in YouTube’s recommendation algorithms that might influence the direction of content suggestions.

Additionally, these topics encompass a range of cultural, political, and historical elements, providing a comprehensive framework for studying the complexity of bias in recommendation systems. This selection allows us to assess whether YouTube’s algorithms exhibit any tendencies in the progression of recommended content starting from these contentious geopolitical issues. Understanding these tendencies is crucial for recognizing the broader implications of algorithmic bias in shaping public opinion and the potential consequences for international relations and social harmony.

3 Literature review

This literature review systematically explores a wide range of scholarly work on the multifaceted nature of biases in digital spaces, algorithmic influences in recommendation systems, and the psychological and ethical dimensions of online content interaction.

3.1 Recommendation bias

This section delves into the intricate web of biases inherent in recommendation algorithms, examining their profound implications on information consumption, user behavior, and societal discourse across various digital platforms.

Stinson (2022) explores biases inherent in collaborative filtering algorithms, which are used extensively in recommendation and search systems. Highlighting the cold-start problem, popularity bias, over-specialization, and homogenization, the author argues that these statistical biases can further marginalize already marginalized groups. This insight is crucial for the broader discourse on algorithmic fairness, stressing the importance of addressing both data and algorithmic mechanisms to mitigate biases and ensure more equitable outcomes in digital recommendation environments. Building on this foundational understanding of recommendation biases, the comprehensive survey by Chen et al. (2023) explores the multifaceted nature of biases in recommendation algorithms. The authors examine each of the seven identified biases (selection, exposure, conformity, position, inductive, popularity, and unfairness) and inspect various strategies for mitigating them. Among the debiasing techniques, the survey discusses the effectiveness of propensity score adjustments, adversarial learning, and other methods in enhancing the fairness and diversity of recommendations. By elaborating on the challenges and solutions associated with each bias type, the survey enriches the discourse on creating equitable and inclusive recommender systems, aligning closely with the thematic concerns of recommendation bias in our study.
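To make the propensity-based debiasing idea concrete, the following is a minimal sketch of self-normalized inverse propensity scoring. It is an illustration only and not the procedure of any study cited here. The function name `ips_mean_rating`, the toy log, and the exposure probabilities are all hypothetical; in practice the propensities have to be estimated, which introduces its own error.

```python
def ips_mean_rating(logs, propensity):
    """Self-normalized inverse-propensity estimate of the mean rating.

    logs: list of (item_id, rating) pairs observed under a biased exposure policy.
    propensity: dict mapping item_id to the (assumed known) probability in (0, 1]
        that the item was exposed to a user.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for item_id, rating in logs:
        # Rarely exposed items receive a larger weight.
        weight = 1.0 / propensity[item_id]
        weighted_sum += weight * rating
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical toy log: popular item "a" is exposed four times as often as "b".
logs = [("a", 5.0)] * 8 + [("b", 1.0)] * 2
propensity = {"a": 0.8, "b": 0.2}

naive = sum(rating for _, rating in logs) / len(logs)
debiased = ips_mean_rating(logs, propensity)
print(naive, debiased)
```

In this toy log the naive average is pulled toward the heavily exposed item, while the reweighted estimate treats both items as if they had been shown equally often.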

Similarly, the researchers in Zhan et al. ( 2022 ) present a novel approach to address the duration bias in video watch-time prediction models. By employing a causal graph and backdoor adjustment, the study innovatively separates the intrinsic effect of video duration on watch-time from its biased impact on video exposure. This methodology allows for more accurate and fair recommendations by mitigating the undue preference for longer videos, which has been shown to skew platform engagement metrics and user experience. Through extensive offline evaluations and live experiments on the Kuaishou platform, the researchers demonstrate the effectiveness of this approach in enhancing watch-time prediction accuracy and, consequently, video consumption, further emphasizing the importance of addressing biases in recommendation systems.
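One common way to realize a backdoor adjustment is to stratify on the confounder. The sketch below applies that idea to video duration: watch times are compared only against other videos of similar length. It is a simplified illustration under our own assumptions, not the authors' implementation; the bucket edges, the function names, and the toy records are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def duration_bucket(duration_s, edges):
    """Index of the duration bucket that a video of duration_s seconds falls into."""
    return bisect_right(edges, duration_s)


def within_bucket_quantiles(records, edges):
    """Map each (duration, watch_time) record to its watch-time quantile
    among the records that share its duration bucket."""
    by_bucket = defaultdict(list)
    for duration_s, watch_s in records:
        by_bucket[duration_bucket(duration_s, edges)].append(watch_s)
    for times in by_bucket.values():
        times.sort()

    labels = []
    for duration_s, watch_s in records:
        times = by_bucket[duration_bucket(duration_s, edges)]
        # Fraction of same-bucket records with watch time <= this one.
        labels.append(bisect_right(times, watch_s) / len(times))
    return labels


# Hypothetical records: (video duration in seconds, observed watch time in seconds).
records = [(30, 10), (30, 25), (600, 60), (600, 300)]
edges = [60, 300]  # assumed bucket boundaries: short, medium, long
labels = within_bucket_quantiles(records, edges)
print(labels)
```

On raw watch time, the 60-second view of a ten-minute video outranks the 25-second view of a 30-second video. After stratification the ordering reverses, because each view is judged relative to videos of comparable length.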

Haroon et al. (2022) conduct a comprehensive audit of YouTube’s recommendation system to assess ideological biases and potential radicalization through recommendations. They employ a novel methodology using “sock puppets” to mimic user behavior across different ideological spectrums. The findings reveal YouTube’s tendency to guide users, particularly those leaning right, towards increasingly radical content. Additionally, the study proposes a bottom-up intervention strategy aimed at mitigating these biases, demonstrating its effectiveness in diversifying recommendations. This research adds critical insights into the ongoing debate on social media’s role in ideological bias and radicalization, highlighting the complex challenges faced by digital platforms in ensuring fair and balanced content distribution.

The study by Nechushtai et al. ( 2023 ) investigates the effect of algorithmic recommendation systems on the diversity of news exposure across major digital platforms, presenting a comprehensive analysis that compares the extent of homogenization in news recommendations. It emphasizes the tendency of these platforms to favor nationally oriented news sources over local or regional ones, highlighting concerns regarding the centralization of information, the potential reduction in exposure diversity, and the implications for public discourse. The study employs a crowd-sourced audit methodology to assess the recommendations made by Google, Google News, Facebook, YouTube, and Twitter to a diverse set of users in the United States, examining the interplay between user characteristics and algorithmic sorting in shaping news consumption patterns. This analysis underscores the subtle impacts of YouTube’s recommendation algorithms on content diversity and user perception, illuminating a complex picture of algorithmic bias.

Recent studies, including Cakmak et al. ( 2024b ), Okeke et al. ( 2023 ), Cakmak et al. ( 2024a ), Gurung et al. ( 2024 ) and Onyepunuka et al. ( 2023 ), further contribute to the understanding of YouTube’s recommendation algorithm’s complexities. These analyses reveal trends towards positive emotions, reduced focus on moral dilemmas, systematic content filtration, and shifts in thematic and emotional engagement, which highlight the algorithms’ influence on shaping viewers’ feelings, beliefs, and public discourse dynamics. Specifically, Onyepunuka et al. ( 2023 ) explores the Cheng Ho narrative to assess topic and emotion drift, finding that YouTube’s recommendations tend to deviate towards content subtly introducing pro-China topics, which target specific demographics. These findings emphasize the algorithm’s capacity to shift discussion points and influence audience perception, adding depth to the landscape of recommendation biases.

Exploring the nuances of bias and misinformation propagation within digital ecosystems, scholarly investigations such as Kirdemir et al. ( 2021a ), Kirdemir and Agarwal ( 2022 ), Kirdemir et al. ( 2021b ), Cakmak et al. ( 2024a ), Cakmak and Agarwal ( 2024b ), Poudel et al. ( 2024 ), and Srba et al. ( 2023 ) unveil the inherent structural preferences and the algorithm’s role in fostering content homogeneity, tightly knit content communities, and polarized content spheres. These studies underscore the critical need for transparent, accountable algorithmic practices and the development of debiasing interventions to cultivate a more diverse and accurate digital information landscape, thereby reinforcing the pivotal themes discussed in the aforementioned research on recommendation biases.

3.2 Behavioral dynamics in social media

The emotional, moral, and toxic behavioral dynamics of social media are foundational to user engagement and the development of effective recommender systems. Recognizing and interpreting these dynamics is critical for creating algorithms that resonate with user preferences and enhance their online experiences.

3.2.1 Emotions in social media

The emotional dynamics of social media significantly influence user engagement and content dissemination. Recognizing and interpreting emotional expressions in user-generated content is crucial for creating algorithms that enhance user experiences. Research highlights the importance of accurately identifying emotions to prevent the spread of misinformation and counteract online radicalization (Kušen et al. 2017 ). Studies delve into the role of emotions in shaping user influence and engagement, advocating for emotionally intelligent recommendation systems that capture the essence of user-generated content and ensure emotionally engaging recommendations (Chung and Zeng 2020 ; Panger 2017 ). Further research explores the impact of visual and auditory cues on user emotional responses, emphasizing the potential for leveraging these elements to enhance recommendation accuracy and appeal (Cakmak et al. 2024c ; Yousefi et al. 2024a ). Multimodal emotion analysis, combining textual and auditory data through deep learning, improves emotion detection accuracy and underscores the development of more emotionally intelligent recommendation systems, aligning recommendations more closely with user emotions (Banjo et al. 2022 ).

3.2.2 Morality in social media

Understanding how moral considerations intersect with social media dynamics is imperative for addressing biases in digital spaces. Research highlights the significant impact of social media on moral reasoning, judgments, and behaviors, emphasizing the need for theoretical contributions to understanding morality on these platforms (Neumann and Rhodes 2024 ). Studies delve into the effects of moral outrage on political polarization, showing how social media magnifies aggression and withdrawal from political conversations (Carpenter et al. 2020 ). The interplay between social media and morality can amplify both negative (e.g., outrage, intergroup conflict) and positive (e.g., social support, prosociality) aspects of morality (Van Bavel et al. 2024 ). The design and algorithmic preferences of social media platforms significantly shape the spread of moral narratives, embedding moral biases within the algorithms responsible for content recommendations. This highlights the importance of incorporating moral values into recommender systems to create a balanced and less biased digital environment (Mbila-Uma et al. 2023 ).

3.2.3 Toxic behavior in social media

Toxic content on social media presents a significant challenge for the development of unbiased recommendation algorithms. Studies utilizing Reddit data reveal how toxic content impacts communities and biases algorithmic decisions by amplifying negative discourse (Yousefi et al. 2024b). Research on the contagious nature of toxic tweets underscores the urgency for algorithms to understand how harmful content multiplies (Yousefi et al. 2023). Comparisons of toxicity levels across different platforms suggest that distinct strategies may be needed to curb bias (DiCicco et al. 2020; Noor et al. 2023). Recent studies extend our understanding of the toxicity landscape by examining its role in amplifying public health debates and affecting polarization in the wake of feminist protests (Pascual-Ferrá et al. 2021; Estrada et al. 2022). These insights emphasize the need for recommendation algorithms to account for toxicity dynamics in order to promote healthier discourse, mitigate biases, and ultimately foster constructive public engagement.

In conclusion, a comprehensive understanding of the emotional, moral, and toxic behavioral dynamics in social media is essential for developing recommendation systems that can effectively manage biases, enhance user engagement, and promote a healthier online environment.

3.3 Topic modeling in social media

Understanding the importance of topic content in social media is pivotal for enhancing recommender systems and ensuring their fairness and relevance. The dynamism of social media platforms, such as Sina Weibo, underscores the necessity to analyze and comprehend the thematic shifts in user-generated content. By examining the distribution of hot topics and their correlations across different platforms, researchers can gain insights into user interests and behaviors, which are crucial for developing more accurate and unbiased recommender systems (Yu et al. 2014 ). This analysis not only helps in identifying trending topics but also in understanding the broader social context in which these discussions occur.

Moreover, the application of advanced topic modeling techniques to social media content enables the discovery of underlying topic facets and their evolution over time. Such methodologies are instrumental in capturing the rich tapestry of online discourse, facilitating a deeper understanding of the thematic structures within vast datasets (Rohani et al. 2016 ). This knowledge is invaluable for recommender systems, as it allows for the refinement of content curation algorithms to better match user preferences and mitigate the risk of reinforcing echo chambers.
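As an illustration of how topic evolution might be quantified once a topic model has produced per-document topic mixtures, the sketch below compares a seed distribution with the distributions observed at successive steps using the Jensen-Shannon divergence. This is one possible measure chosen for illustration, not the metric used by the cited studies; the topic mixtures and variable names are hypothetical.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions.

    p and q are equal-length lists of probabilities that each sum to 1.
    The result lies in [0, 1]; 0 means the distributions are identical.
    """
    mixture = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a, b):
        # Terms with a_i == 0 contribute nothing to the sum.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, mixture) + 0.5 * kl(q, mixture)


# Hypothetical topic mixtures over three topics at successive steps.
seed = [0.7, 0.2, 0.1]
by_step = [
    [0.7, 0.2, 0.1],  # step 0: identical to the seed
    [0.5, 0.3, 0.2],  # step 1: mild shift
    [0.1, 0.3, 0.6],  # step 2: dominant topic has changed
]
drift = [js_divergence(seed, mixture) for mixture in by_step]
print(drift)
```

A rising sequence of divergence values indicates that the thematic mix is moving steadily away from the starting point, whereas values that stay near zero indicate thematic stability.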

The significance of topic analysis extends beyond bare content filtering and recommendation. It plays a critical role in identifying shifts in public sentiment and emerging trends, thereby enabling decision support systems to adapt to changing user needs and preferences (Li et al. 2023 ). Furthermore, the study of changes in social media content over time provides marketers and content creators with insights into the effectiveness of their strategies and the changing interests of their audience, as evidenced by research in marketing science by Zhong and Schweidel ( 2020 ).

In conclusion, topic content on social media emerges as a critical element requiring meticulous analysis for identifying and mitigating bias within recommender systems. Its influence on recommendation algorithms underscores the necessity for careful examination to ensure the integrity and fairness of these systems.

3.4 Social network analysis

Social Network Analysis (SNA) has emerged as a vital tool for analyzing the complicated web of interactions within social media platforms, offering profound insights into the diffusion of information and the identification of influential actors within these networks, as described in Harrigan et al. ( 2021 ). By mapping out the relationships and flows between users, SNA facilitates a deeper understanding of how information spreads through these digital landscapes, highlighting the individuals who wield disproportionate influence over these processes. Influencers, identified through their central positions within the network, play a critical role in shaping audience attitudes and facilitating the spread of information, thus acting as gatekeepers in the dissemination of content (Khanam et al. 2023 ). The study by Shaik et al. ( 2024 ) further elucidates the role of multimedia in amplifying these dynamics, underscoring the potential of multimedia content to engage and mobilize communities through social networks.
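The notion of a central position can be made concrete with a centrality score. The sketch below computes normalized in-degree centrality on a small directed graph, where an edge points from one node to a node it links to. It is a minimal illustration; in-degree is only one of several centrality measures used in SNA, and the edge list and node names are hypothetical.

```python
from collections import Counter


def in_degree_centrality(edges):
    """Normalized in-degree for every node of a directed graph.

    edges: list of (source, target) pairs.
    Returns a dict mapping each node to incoming_edges / (number_of_nodes - 1).
    """
    nodes = {node for edge in edges for node in edge}
    incoming = Counter(target for _, target in edges)
    denominator = max(len(nodes) - 1, 1)  # guard against a single-node graph
    return {node: incoming[node] / denominator for node in nodes}


# Hypothetical graph: three nodes all point to "v3", which points back to "v1".
edges = [("v1", "v3"), ("v2", "v3"), ("v4", "v3"), ("v3", "v1")]
centrality = in_degree_centrality(edges)
most_central = max(centrality, key=centrality.get)
print(most_central, centrality[most_central])
```

Here "v3" receives links from every other node and therefore obtains the highest score, which is the pattern that marks a candidate influencer in this simple measure.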

The significance of influencers extends beyond mere popularity, as their strategic position within the network grants them the ability to affect the flow and reach of information significantly. This influence is not uniform but varies based on the network’s structure and the nature of the connections. Research describes distinct types of influencers, such as disseminators, engagers, and leaders, each playing unique roles in information spread (del Fresno García et al. 2024 ). These distinctions underscore the ways in which influence manifests within social networks, shaping how information is shared and received.

In the context of recommender systems, understanding the dynamics of social networks and the role of influencers is paramount. These systems, designed to select and recommend content to users based on various algorithms, can significantly benefit from including insights derived from SNA. By recognizing and leveraging the influence of key actors, recommender systems can enhance their effectiveness, ensuring that the content reaches a broader audience and resonates more deeply with users (Aïmeur et al. 2023 ). Moreover, the inclusion of social network insights can help mitigate the challenges of filter bubbles and echo chambers, promoting a more diverse and engaging content landscape.

Incorporating SNA into the examination and refinement of recommender systems on social media platforms presents a strategic approach to mitigating bias and enriching the content landscape. By meticulously identifying and deciphering the roles of influencers within these digital networks, platforms have the opportunity to recalibrate their algorithms to leverage these pivotal actors effectively. This strategy not only amplifies the diversity and relevance of recommendations but also addresses underlying biases by ensuring a broader, more inclusive representation of perspectives and content. As highlighted by Alp et al. ( 2022 ), analyzing social media discussions around critical health issues like COVID-19 and vaccines can offer invaluable insights into public sentiment, further enriching the dataset for recommender systems. Consequently, this integration fosters a digital ecosystem that is not only more interconnected and vibrant but also fairer and more transparent. Through such tailored algorithmic adjustments, social media platforms can transcend conventional limitations, offering users a richer, more balanced, and less biased content experience.

Recent studies by Bhattacharya et al. ( 2024a ) highlight the critical role of network analysis in uncovering biases within social networks. The authors used computational methods to analyze identity formation in political protests, showing how social networks can coalesce into cohesive movements and revealing potential biases in these networks. Additionally, Bhattacharya et al. ( 2024b ) examined the socio-technical factors behind modern social movements, emphasizing how network analysis can identify biases and better understand the interplay between solidarity and collective action in digital environments.

3.5 User engagement effect on recommendation systems

In the realm of digital platforms, recommender systems serve as the keystone for navigating the vast array of content available to users, aiming to enhance user engagement by tailoring recommendations to individual preferences. However, the attempt to personalize user experiences and maximize engagement does not come without its challenges, particularly in terms of recommendation bias and its impact on the visibility of diversified content. This understanding of recommender systems functionality underscores the critical balance between user engagement and the equitable representation of content.

The research of Maslowska et al. ( 2022 ) emphasizes the pivotal role of recommender systems in fostering user engagement, suggesting that the design and operational variations of recommender systems significantly influence users’ long-term interactions with platforms. While accuracy in predicting user preferences is traditionally valued, their work suggests a broader scope for evaluating recommender systems effectiveness, highlighting the importance of understanding user engagement dynamics in depth.

Complementing this perspective, Ping et al. ( 2024 ) investigate the effects of diversity, novelty, and serendipity on user engagement and reveal the intricate relationship between recommender systems design choices and the potential for bias. Their findings illuminate how an overemphasis on popular or trending content could inadvertently marginalize less conventional, yet potentially engaging, content. This phenomenon, often referred to as popularity bias, underlines the critical trade-offs recommender systems designers face between optimizing for user engagement and ensuring a diverse content ecosystem.

Moreover, the study by Zhao et al. ( 2018 ) on the differential impacts of explicit versus implicit feedback on user engagement and satisfaction offers insights into the mechanisms through which recommender systems might amplify or mitigate biases. By highlighting the intricate ways in which user feedback is incorporated into recommender systems, their research underscores the potential for recommender systems to either perpetuate or challenge existing biases, depending on the design and implementation choices made.

Additionally, research by Shajari et al. ( 2024a , 2024b ) explores anomalous engagement and commenter behavior on YouTube, providing valuable insights into how engagement metrics can be manipulated and the implications this has for recommender systems. Adeliyi et al. ( 2024 ) further investigate inorganic user engagement, emphasizing the impact of automated and semi-automated activities on the integrity of user engagement metrics. Their findings highlight the importance of implementing robust mechanisms to detect and address such behaviors to maintain the integrity of user engagement metrics.

Addressing these complexities, it becomes evident that the larger goal of recommender systems to enhance user engagement must be critically examined through the lens of recommendation bias. The challenge lies not only in designing recommender systems that are adept at capturing and sustaining user interest but also in ensuring that these systems promote a balanced and inclusive representation of content.

3.6 Statistical evaluation in recommender systems

The advancement of recommender systems has underscored the vital need for robust statistical evaluation frameworks. As these systems increasingly influence user experiences across various digital platforms, the importance of ensuring their effectiveness and fairness cannot be overstated. Statistical evaluation methodologies provide a foundation for understanding, improving, and benchmarking recommender systems. The integration of Information Retrieval (IR) metrics into recommender system evaluation has emerged as a pivotal area of study, aiming to bridge the gap between predicted user preferences and actual user satisfaction (Shani and Gunawardana 2011 ). However, this adaptation is not without challenges, as noted by Bellogín et al. ( 2017 ), who highlighted the inherent statistical biases, such as sparsity and popularity biases, that could distort evaluation outcomes and obstruct the comparability of recommender systems.

In the realm of evaluating recommender systems, traditional error-based metrics have shown limitations, prompting a shift towards more user-centric evaluation criteria (Knijnenburg and Willemsen 2015 ). This shift has been influential in capturing the multifaceted nature of user preferences and the complex dynamics of recommendation processes. The work by Shani and Gunawardana ( 2011 ) further elaborates on the intricacies of applying IR methodologies to recommender systems, emphasizing the need for a systematic approach to address these challenges.

The evaluation of recommender systems has also brought to light the significance of addressing and mitigating biases inherent in recommendation algorithms. These biases, if unchecked, can skew the recommendation process, potentially leading to a reinforcement of existing user preferences and hindering the discovery of diverse content (Herlocker et al. 2004 ).

Furthermore, the exploration of statistical robustness in the evaluation of stream-based recommender systems by Vinagre et al. ( 2019 ) adds an additional layer of complexity, necessitating the development of dynamic evaluation metrics that can adapt to the evolving nature of user interactions with content. This dynamic evaluation underscores the importance of temporal aspects in the assessment of recommender systems, highlighting the need for metrics that can capture the transient preferences of users and the fluidity of content relevance.

In conclusion, the literature underscores the paramount importance of statistical measurements in the evaluation of recommender systems. As these systems continue to evolve and play a crucial role in shaping digital experiences, the development and refinement of statistical evaluation methodologies will remain a critical area of research. This endeavor not only aids in benchmarking the performance of recommender systems but also in ensuring their fairness, transparency, and adaptability to the diverse and changing needs of users.

4 Methodology

Our methodology outlines the structured approach we adopt to investigate the intricate dynamics of recommendation algorithms, combining data collection, analysis, and theoretical examination to unveil the underlying biases within digital platforms.

4.1 Data collection

In this section, we detail the rigorous methodology employed for gathering and processing data, setting the foundation for our comprehensive analysis of the narratives explored in this study.

4.1.1 Narrative keywords

To initiate data collection as outlined in Sect.  2 , we conducted a series of workshops with subject matter experts. These sessions were instrumental in generating a list of relevant keywords associated with the three narratives. Subsequently, these keywords were utilized to facilitate the search for related videos on YouTube.

China-Uyghur Conflict: in our research on the China-Uyghur conflict, the carefully chosen keywords reflect the critical themes identified in our literature review. These keywords are detailed in Table 1 . The keywords include human rights abuses, cultural and religious identities, and the international response to the conflict. Terms like “Oppression”, “Muslim Uyghur”, and “Stop Genocide” are used to capture the mistreatment of the Uyghur population, their cultural and religious significance, and the global reaction to these issues. By incorporating specific organizations and notable figures, our data collection becomes comprehensive, ensuring our study accurately represents the complexities and the documented realities of the China-Uyghur conflict.

South China Sea Dispute: as shown in Table 2 , our selected keywords for the South China Sea dispute study encapsulate the conflict’s key aspects: legal rulings (“Permanent Court”, “Arbitration”, “UNCLOS”), geopolitical tensions (“China + Philippines”, “sovereignty”), and economic interests (“economic cooperation”, “natural resources”). These terms are essential to examine the intricate blend of legal, political, and economic factors in the dispute, particularly focusing on China’s territorial claims and the responses of neighboring states like the Philippines. The keywords enable a comprehensive analysis, aligning with the diverse perspectives and complexities discussed in our literature review.

As outlined in Table 3 , our study utilized specific keywords to investigate Cheng Ho’s (Zheng He) maritime expeditions and their modern reinterpretation by the Chinese Communist Party (CCP). “Cheng Ho”, “Zheng He”, and related terms explore his historical significance and cultural impact. Keywords linking to the Uyghur region and figures like “Gavin Menzies” connect his legacy with contemporary geopolitical narratives and popular theories. This selection facilitates a comprehensive examination of Zheng He’s historical role and his portrayal in current political strategies, aligning with our research focus.

These keywords, as presented in their respective tables, play a pivotal role in uncovering the complex themes and narratives at the heart of our study. The keyword lists structure our in-depth analysis and include a blend of English and non-English terms. This bilingual approach accounts for the original content of videos and their broader dissemination in English, ensuring a comprehensive understanding of the subject matter from multiple linguistic perspectives. It is important to note that the unbalanced count of keywords between topics does not detract from our study’s validity. Each topic is unique, encompassing a varied range of terms. Moreover, we selected an equal number of initial seed videos for each topic, a methodological choice that will be elaborated upon in the upcoming section.

4.1.2 Recommendation depth collection

To accurately measure bias in YouTube’s recommendations, we needed to collect the videos recommended by YouTube. Our methodology mirrors the approach used by the authors in Onyepunuka et al. ( 2023 ). Initially, we selected seed videos using the keywords mentioned in Sect.  4.1.1 . These seed videos were manually chosen based on their relevance to the subject.

For the collection of recommended videos, we employed Selenium, a widely used open-source browser automation library that is often applied to web scraping. Selenium utilizes the WebDriver protocol to control web browsers, such as Chrome in our case. We individually opened each seed video in the browser and scraped the videos recommended by YouTube, typically displayed on the right-hand side of the screen. These recommended videos are related to the current video being viewed. After completing the collection at each recommendation depth, the newly gathered videos served as the starting point for the subsequent depth. This process helped us to construct a recommendation network. Ultimately, we collected four depths of recommendations beyond the seed videos. The number of videos and their corresponding depths are presented in Table 4 .
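The depth-by-depth collection described above can be sketched as a breadth-first traversal. This is only an illustrative outline: `collect_recommendation_network` and `fetch_recommendations` are hypothetical names, the scraping step is abstracted behind a callable (in practice it would wrap the Selenium session), and skipping already-seen videos when building the next frontier is an assumption rather than a documented detail of the procedure.

```python
def collect_recommendation_network(seeds, fetch_recommendations, max_depth=4):
    """Build a parent -> child recommendation network, one depth at a time.

    seeds: list of seed video IDs (depth 0).
    fetch_recommendations: hypothetical callable returning the list of
        recommended video IDs for a given video ID.
    Returns (edges, depth_of), where edges is a list of (parent, child)
    pairs and depth_of maps each video to the depth at which it first appeared.
    """
    depth_of = {video: 0 for video in seeds}
    edges = []
    frontier = list(seeds)
    for depth in range(1, max_depth + 1):
        next_frontier = []
        for parent in frontier:
            for child in fetch_recommendations(parent):
                edges.append((parent, child))
                # Assumption: a video is expanded only the first time it is seen.
                if child not in depth_of:
                    depth_of[child] = depth
                    next_frontier.append(child)
        # Videos gathered at this depth seed the next depth.
        frontier = next_frontier
    return edges, depth_of


# Toy stand-in for the scraper, used only to show the call pattern.
toy_graph = {"seed": ["a", "b"], "a": ["c"], "b": ["a", "c"], "c": []}
edges, depth_of = collect_recommendation_network(
    ["seed"], lambda video: toy_graph.get(video, []), max_depth=2
)
```

Keeping the fetch step as a parameter makes the traversal logic easy to check on a small hand-built graph before running it against a live browser session.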

We initially selected 40 seed videos for each narrative. After each depth, there was a variation in the count of recommended videos. This variation is attributable to YouTube’s algorithm, which factors in aspects such as content metadata, viewer engagement, video length, ongoing algorithmic adjustments, content availability, and current trends. Throughout the video collection process, we ensured an unbiased approach by not logging into any YouTube account. Each browsing session started afresh with cleared cookies to eliminate any influence of user history on the data.

4.1.3 Attribute retrieval

In this research, we analyzed various video attributes including the title, description, transcription, comments, views, and likes. For all attributes except transcription, we utilized the YouTube Data API v3. This API facilitates the retrieval of feeds related to videos, among other functionalities.

For transcriptions, we adopted a different approach, as detailed in Cakmak et al. ( 2023 ) and Cakmak and Agarwal ( 2024 ). In these studies, the authors developed a method to efficiently collect video transcripts from YouTube. This process primarily involved the use of the YouTube Transcript API (Depoix 2023 ), which extracts transcripts from YouTube videos. For videos without available transcripts, we used the OpenAI Whisper model by Radford et al. ( 2023 ), an automatic speech recognition model, to create the necessary transcriptions. This method effectively streamlined the transcription collection process, demonstrating the practical use of advanced computational techniques in extracting data from online multimedia sources.

Additionally, due to geo-specific issues encountered during our data collection, we dealt with non-English data. To enhance the understanding and accuracy of the models we used, we translated the data into English. This was achieved using the Googletrans Library in Python, a free and unlimited library that implements the Google Translate API. The library leverages the Google Translate Ajax API for functions such as language detection and translation.

4.2 Emotion assessment

The bias inherent in recommended video content can be effectively analyzed through the lens of emotional shifts. Emotions significantly influence our interaction with media, shaping our reactions, responses, and engagement levels. In the realm of recommended videos, a complex relationship exists between the emotional tone of the content and the viewer’s current emotional state. This interaction can result in a skewed selection of recommendations, as algorithms might favor content that evokes emotions leading to higher engagement from viewers. This tendency can create a cycle where viewers are continually presented with content that provokes specific emotional responses, potentially resulting in a more engaged but less diverse viewing experience. Understanding this mechanism is key in identifying and addressing bias in video recommendations, highlighting the subtle ways in which emotional targeting can influence viewing habits and content exposure.

To quantify these emotional shifts in video recommendations, our approach involved analyzing the emotions in each video using a transformer model, a tool at the forefront of advancements in natural language processing. Renowned for their ability to contextually interpret language, models like BERT, GPT, and RoBERTa (Devlin et al. 2019 ; Radford and Narasimhan 2018 ; Brown et al. 2020 ; Liu et al. 2019 ) are particularly adept at accurate emotion analysis.

Our research utilized RoBERTa and its more efficient variant, DistilRoBERTa. We selected the model (Hartmann 2022 ) from Hugging Face, a fine-tuned version of DistilRoBERTa, which has been trained on diverse datasets to identify a range of emotions: anger, disgust, fear, joy, neutral, sadness, and surprise.

Employing this model enabled us to conduct a thorough analysis of the emotional content in video titles, descriptions, transcriptions, and user comments. This methodology yielded valuable insights into the nature of emotional content within these videos and its impact on audience engagement. Such findings are integral to deepening our understanding of emotional biases in video recommendations and developing strategies to mitigate their effects.
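To make the notion of an emotional shift concrete, the following minimal sketch summarizes classifier output by recommendation depth. It assumes each text has already been scored and that the scores arrive as a list of `{"label", "score"}` dictionaries, a common output shape for text-classification models; the function names and the toy records are hypothetical, and picking the single highest-scoring label per text is one possible aggregation, not necessarily the one used in this study.

```python
from collections import Counter, defaultdict


def dominant_emotion(scores):
    """Return the label with the highest score for one text."""
    return max(scores, key=lambda item: item["score"])["label"]


def emotion_share_by_depth(records):
    """records: iterable of (depth, scores) pairs, one per video text.

    Returns {depth: {emotion: share of texts at that depth}}.
    """
    counts = defaultdict(Counter)
    for depth, scores in records:
        counts[depth][dominant_emotion(scores)] += 1
    shares = {}
    for depth, counter in counts.items():
        total = sum(counter.values())
        shares[depth] = {label: n / total for label, n in counter.items()}
    return shares


# Hypothetical scores for three texts (only two labels shown for brevity).
records = [
    (0, [{"label": "joy", "score": 0.7}, {"label": "fear", "score": 0.3}]),
    (1, [{"label": "anger", "score": 0.6}, {"label": "joy", "score": 0.4}]),
    (1, [{"label": "fear", "score": 0.8}, {"label": "joy", "score": 0.2}]),
]
shares = emotion_share_by_depth(records)
```

Comparing the resulting shares across depths gives a simple view of whether particular emotions become more or less prevalent as recommendations move away from the seed videos.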

4.3 Moral foundation assessment

In our exploration of the biases present in video recommendations, we acknowledge that alongside emotions, the subtle yet powerful influence of moral values plays a crucial role in steering viewer choices. These values, which form the backbone of personal ethics and decision-making, are instrumental in shaping how audiences perceive and interact with video content. Our study, therefore, delves into the realm of morality in recommended videos, aiming to unravel how these ethical dimensions influence viewer behavior and content preferences.

To investigate moral values in video content, we employed the extended Moral Foundations Dictionary (eMFD), a sophisticated tool designed for extracting moral content from textual data (Hopp et al. 2021 ). The eMFD represents a significant advancement in moral analysis, leveraging the input of a large and diverse group of human annotators to capture a wide range of moral intuitions. This methodology contrasts with previous approaches, which often relied on a small group of experts and resulted in a more constrained interpretation of morality.

For this study, we have used the eMFD as a quantitative tool and did not engage in its construction process. The eMFD’s construction involved a detailed annotation process that profoundly enhanced our analysis of moral content in video recommendations. In this process, each word in the eMFD is assigned continuously weighted vectors, reflecting its likelihood of association with five core moral foundations: Care/Harm, Fairness/Cheating, Loyalty/Betrayal, Authority/Subversion, and Sanctity/Degradation. Additionally, a key aspect of our methodology is the eMFD’s capability to evaluate the moral connotations of each word based on its alignment with vice or virtue characteristics. This dual approach, considering both the moral foundations and the vice-virtue spectrum, allows for a nuanced assessment of the moral undertones in video content. By examining how words align with these ethical dimensions, we gain a detailed view of how moral values are conveyed, offering insights into their potential impact on viewer behavior and preferences.

The methodology employed by the eMFD does not utilize probability distributions to analyze or interpret text. Instead, the approach is fundamentally frequency-based and categorical. When applied to a body of text, the eMFD quantifies the presence of moral language by counting the occurrences of words and phrases associated with each of the five moral foundations. This process results in a set of metrics that reflect the extent to which each moral foundation is represented in the text. The output is thus a direct measure of moral rhetoric, expressed through the frequency of specific lexicon usage across the identified moral dimensions. This direct and categorical assessment of moral content allows for a clear understanding of how moral values are embedded and communicated in video content, enhancing our ability to analyze the ethical implications of video recommendations.
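The counting step described above can be illustrated with a toy example. The lexicon below is a hypothetical handful of words invented for illustration and is not the eMFD itself; `foundation_counts` is likewise a hypothetical helper. The sketch only shows the general idea of tallying lexicon matches per moral foundation.

```python
import re
from collections import Counter

# Hypothetical toy lexicon: word -> moral foundation (NOT the real eMFD).
TOY_LEXICON = {
    "protect": "care",
    "harm": "care",
    "fair": "fairness",
    "cheat": "fairness",
    "loyal": "loyalty",
    "betray": "loyalty",
    "obey": "authority",
    "sacred": "sanctity",
}


def foundation_counts(text, lexicon=TOY_LEXICON):
    """Count how often words tied to each moral foundation occur in text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(lexicon[token] for token in tokens if token in lexicon)


counts = foundation_counts("They harm the loyal and betray the fair.")
```

In this toy sentence, the tally contains one match for care, two for loyalty, and one for fairness; dividing by the number of tokens would turn the counts into rates that are comparable across texts of different lengths.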

Furthermore, the eMFD incorporates sentiment analysis through the Valence Aware Dictionary and sEntiment Reasoner (VADER), providing an additional layer of depth to the moral language assessment. This combination allows for a sophisticated examination of the moral and ethical themes within the video content, considering not just the presence of moral language but also the context and sentiment surrounding it.

By applying the eMFD to our analysis of recommended videos, we aim to explore the moral dimensions in video titles, descriptions, transcriptions, and user comments. This approach helps us assess narratives and character portrayals, uncovering the moral and ethical implications embedded in these aspects. Understanding how moral values are represented across these video attributes will provide insights into their influence on viewer engagement and preferences, offering a more comprehensive view of moral biases in video recommendations.

4.4 Toxicity assessment

In exploring the biases in video recommendations, the measurement of toxicity is as crucial as the assessment of emotion and moral values. Toxicity, which encompasses rude, disrespectful, or unreasonable content, is a key factor that can significantly influence people’s choices and interactions with online content. Its importance lies in its potential impact on the viewer’s experience and decision-making process. Just like emotional and moral content, toxic elements in videos can subtly shape preferences and behaviors, which in turn may affect the functioning of recommendation algorithms. Therefore, incorporating toxicity as a measure in our analysis is essential to gain a comprehensive understanding of the various factors that contribute to recommendation biases and their implications on user engagement and content consumption patterns.

To methodologically assess toxicity in video content, we employed the Detoxify model (Hanu 2020 ). Detoxify is a state-of-the-art machine learning model designed to detect and quantify various forms of toxic behavior in textual content. Among the variants of Detoxify, we specifically chose the “unbiased” model, which is designed to minimize biases that often accompany toxicity detection, such as those related to gender, race, or specific ideologies. This model is built on a transformer architecture, leveraging the power of models like BERT for context-sensitive analysis. It has been trained on a large dataset comprising diverse and challenging text samples, allowing it to accurately identify and score a range of toxic behaviors, including insults, threats, and hate speech.

Our use of the unbiased Detoxify model involved analyzing textual elements of videos such as titles, descriptions, transcriptions, and user comments. By applying this model, we were able to generate toxicity scores for each video element, giving us a quantifiable measure of the toxic content present in the recommended videos. This approach allowed us to systematically evaluate the prevalence and severity of toxic content in video recommendations and to understand how such content could influence viewer behavior and preferences.

4.5 Topic analysis

Understanding the biases in video recommendations requires an analysis of the topics presented in these videos. The topic or theme of a video is a crucial element that can influence the recommendation algorithm. If certain topics are consistently recommended while others are neglected, this can indicate a bias in the algorithm. This bias may arise because viewers tend to watch certain topics more frequently or engage more deeply with them, prompting the algorithm to favor these topics in its recommendations. Such a trend can lead to a homogenization of content, where diverse or less popular topics are underrepresented. This aspect of topic analysis is essential in understanding how content diversity is maintained or limited by recommendation systems and its impact on viewer exposure to a broad range of subjects.

To conduct a comprehensive topic analysis, we employed a BERTopic model (Grootendorst 2022 ), a sophisticated machine learning tool designed for topic classification and extraction. The fine-tuned version of this model, referenced in Grootendorst ( 2023 ), is pre-trained on approximately 1,000,000 Wikipedia pages, covering a broad spectrum of knowledge. It is capable of identifying 2,377 distinct topics, providing us with a robust framework for analyzing the thematic content of videos. The BERTopic model operates using a transformer-based architecture, similar to BERT, which excels in understanding and categorizing complex textual data. This capability allows for precise and nuanced topic detection, essential for accurately assessing the range and diversity of topics in video content.

Our methodology involved applying the BERTopic model to various textual elements of videos, such as titles, descriptions, transcriptions, and comments. By doing so, we could systematically categorize videos into specific topics and analyze the distribution of these topics within the recommended videos. This approach enabled us to observe patterns and trends in topic representation, offering insights into how certain topics are either prioritized or overlooked by the recommendation algorithm.

4.6 Network analysis

Network analysis offers a powerful perspective for unraveling the often hidden biases within video recommendation systems. This approach goes beyond surface-level observations, diving into the complex web of connections that define how videos are interlinked and recommended. By mapping these networks, we can expose subtle patterns and relationships, illuminating how certain videos gain notability or become marginalized within the recommendation ecosystem.

Our focus on network analysis derives from a desire to decode the intricacies of YouTube’s recommendation algorithm, as detailed in Sect.  4.1.2 . Here, we examine the structure of the recommendation network, which is composed of parent videos and their associated recommended (child) videos. This network representation is key to understanding how certain content gains traction and influences viewer choices, potentially leading to biases in what is recommended to users.

Central to our network analysis is the application of Eigenvector Centrality (EC), as shown in Eq.  1 . This metric is insightful because it evaluates both the number and the quality of connections a video has. In the equation, \(EC(v)\) represents the centrality of a video \(v\) . The term \(\lambda\) is the largest eigenvalue of the adjacency matrix \(A\) , which normalizes the centrality values. The adjacency matrix \(A\) itself reflects the connection strengths between videos, where \(A_{uv}\) indicates the strength between video \(v\) and its neighbor \(u\) . The sum \(\sum _{u \in N(v)}\) takes into account all neighboring videos \(u\) of \(v\) . Essentially, a video with high eigenvector centrality, as calculated by this formula, is one that is recommended by other influential videos. This indicates a form of indirect influence that can significantly shape viewer consumption patterns, highlighting videos that are important due to their strong connections to other significant videos.
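As a rough illustration of this computation (not Gephi's implementation), and assuming Eq. 1 takes the standard form \(EC(v) = \frac{1}{\lambda} \sum _{u \in N(v)} A_{uv}\, EC(u)\), the scores can be approximated by power iteration. The function name, edge format, and toy graph below are hypothetical; rescaling the vector at every step plays the role of the \(1/\lambda\) factor.

```python
import math


def eigenvector_centrality(edges, iterations=1000, tol=1e-12):
    """Approximate eigenvector centrality by power iteration.

    edges: list of (u, v, weight) tuples, read as "u recommends v"
        with strength A_uv = weight.
    Returns a dict mapping each node to its (unit-length) centrality score.
    """
    nodes = sorted({node for u, v, _ in edges for node in (u, v)})
    score = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # Each video accumulates the weighted scores of the videos pointing to it.
        new = {node: 0.0 for node in nodes}
        for u, v, weight in edges:
            new[v] += weight * score[u]
        norm = math.sqrt(sum(value * value for value in new.values()))
        if norm == 0.0:
            # Can happen on acyclic graphs, where all mass eventually drains out.
            return new
        new = {node: value / norm for node, value in new.items()}
        if max(abs(new[node] - score[node]) for node in nodes) < tol:
            return new
        score = new
    return score


# Toy undirected graph: a triangle (a, b, c) plus one extra video d linked to a.
pairs = [("a", "b"), ("a", "c"), ("b", "c"), ("a", "d")]
edges = [(u, v, 1.0) for u, v in pairs] + [(v, u, 1.0) for u, v in pairs]
scores = eigenvector_centrality(edges)
```

In the toy graph, `a` receives the highest score because it is linked to every other node, while `d` scores lowest because its only connection is to `a`.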

Our analysis, conducted with the aid of Gephi software (Bastian et al. 2009 ), identified videos that serve as pivotal nodes within the recommendation network. These influential videos can be seen as key influencers or trend starters, guiding the direction of video recommendations based on how viewers interact with them. By examining these key influencers, we aimed to discern whether and how they contribute to perpetuating certain biases within the recommendation algorithm.

We also incorporated modularity values into our network analysis to refine the identification of influential nodes. Modularity values are important because they help to identify those nodes (individuals, entities) that are central or influential within their respective communities or modules. By focusing on nodes with high modularity values, one can target influencers who are not just broadly connected across the entire network but also pivotal within their specific communities. This approach enhances the effectiveness of strategies that rely on these influencer nodes for information dissemination, ensuring that efforts are concentrated on individuals who can significantly mobilize or impact their immediate community.

The modularity \(Q\) of a network, particularly utilizing the Louvain method, is defined by the formula shown in Eq.  2 .

In this formula \(A_{ij}\) represents the adjacency matrix element, \(k_i\) and \(k_j\) are the degrees of nodes \(i\) and \(j\) , \(m\) is the total weight of all edges, and \(\delta (c_i, c_j)\) is the Kronecker delta function; the Kronecker delta is 1 if nodes \(i\) and \(j\) are in the same community and 0 otherwise. This formula aids in identifying nodes that are central within their communities, thereby highlighting the influencers who play a critical role in the dissemination of content within specific segments of the network.
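Assuming Eq. 2 is the standard modularity optimized by the Louvain method, \(Q = \frac{1}{2m} \sum _{ij} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta (c_i, c_j)\), the following sketch evaluates \(Q\) for a given community assignment on an undirected weighted graph. It does not perform the Louvain optimization itself; the function name, edge format, and toy graph are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community):
    """Evaluate Q for an undirected weighted graph and a fixed partition.

    edges: list of (i, j, weight), each undirected edge listed once
        (no self-loops assumed).
    community: dict mapping every node to its community label c_i.
    """
    adjacency = defaultdict(float)  # A_ij, stored symmetrically
    degree = defaultdict(float)     # k_i
    m = 0.0                         # total edge weight
    for i, j, weight in edges:
        adjacency[(i, j)] += weight
        adjacency[(j, i)] += weight
        degree[i] += weight
        degree[j] += weight
        m += weight

    q = 0.0
    nodes = list(degree)
    for i in nodes:
        for j in nodes:
            # Kronecker delta: only pairs in the same community contribute.
            if community[i] == community[j]:
                q += adjacency.get((i, j), 0.0) - degree[i] * degree[j] / (2 * m)
    return q / (2 * m)


# Toy graph: two triangles joined by a single bridge edge (c - d).
edges = [
    ("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 1.0),
    ("d", "e", 1.0), ("e", "f", 1.0), ("d", "f", 1.0),
    ("c", "d", 1.0),
]
community = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
q = modularity(edges, community)
```

For this toy partition the value works out to 5/14, roughly 0.357, whereas placing all six nodes in a single community gives \(Q = 0\), which illustrates why the two-triangle split is the better community structure.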

4.7 Examination of engagement metrics

The analysis of engagement metrics plays a pivotal role in understanding biases in video recommendation systems. Engagement metrics, such as views, likes, and comment counts, serve as indicators of a video’s popularity and audience interaction. These metrics can be instrumental in revealing whether recommendation algorithms disproportionately favor more popular content, potentially leading to a bias in recommendations.

Our examination focused on the hypothesis that recommendation algorithms might be inclined to suggest videos with higher engagement metrics as users explore content more deeply. This potential bias could manifest in a cycle where already popular videos gain further visibility, overshadowing less-viewed content regardless of its relevance or quality. To investigate this, we considered a video popular if it had high view counts, substantial likes, and a significant number of comments.

In our methodology, we tracked the engagement metrics of videos across various recommendation depths. This approach allowed us to analyze patterns in how the recommendation algorithm prioritizes content based on viewer engagement. If we observe a trend where videos with higher engagement consistently appear in recommendations, it could indicate an algorithmic preference for popular content.
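One simple way to track such a trend is to summarize each metric per depth. The sketch below uses the median, which is less sensitive to a few viral videos than the mean; the record layout (`depth`, `views`, `likes`) and the function name are hypothetical, and the choice of summary statistic is an assumption rather than the procedure used here.

```python
from collections import defaultdict
from statistics import median


def median_engagement_by_depth(videos, metric="views"):
    """Return {depth: median value of the chosen metric at that depth}."""
    by_depth = defaultdict(list)
    for video in videos:
        by_depth[video["depth"]].append(video[metric])
    return {depth: median(values) for depth, values in sorted(by_depth.items())}


# Hypothetical records, for illustration only.
videos = [
    {"depth": 0, "views": 100, "likes": 5},
    {"depth": 0, "views": 300, "likes": 9},
    {"depth": 1, "views": 1000, "likes": 40},
    {"depth": 1, "views": 5000, "likes": 90},
    {"depth": 1, "views": 3000, "likes": 70},
]
median_views = median_engagement_by_depth(videos, metric="views")
```

A median that rises steadily from one depth to the next would be consistent with a preference for popular content, although it would not by itself establish the cause.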

This analysis is crucial in understanding how engagement metrics could skew the diversity of recommended content. A bias towards highly engaged videos might limit the exposure of newer or less popular but relevant content, potentially narrowing the spectrum of ideas and perspectives presented to viewers. By examining engagement metrics, we aim to uncover and understand these potential biases, contributing to a more comprehensive understanding of the factors influencing content recommendations on digital platforms.

4.8 Statistical measurement

Our analysis utilizes statistical methods to validate the presence and extent of biases within YouTube’s recommendation system, as outlined in Sect.  4 . Through quantitative evaluation, we assess the significance of deviations in content distribution and engagement metrics from expected norms. This approach ensures a robust understanding of recommendation biases, supporting our findings with concrete evidence of how these biases may affect content visibility and user interaction patterns across the platform.

4.8.1 Drift significance

A key component of our statistical analysis is the evaluation of drift significance, particularly how content distribution changes across recommendation depths. For this purpose, we utilized the Chi-Square test (Pearson 1900 ), a robust statistical method for examining the relationship between categorical variables. This test compares the observed frequency of categories at different recommendation depths against expected frequencies, assuming no underlying bias, as detailed in Eq.  3 .

In this context, \(O_{i}\) represents the observed frequency of each category within the recommendation depths, while \(E_{i}\) denotes the expected frequency, calculated under the assumption that each category’s share is the same at every depth, that is, that category and depth are independent. The expected frequencies are derived using the formula shown in Eq. 4. This calculation establishes a baseline against which to measure the extent of deviation (or drift) from the expected content distribution. In our contingency table, the rows were the categorical values and the columns were the recommendation depths. This setup allows for a detailed analysis of how content categories distribute across different levels of recommendation, providing insights into the recommendation algorithm’s behavior and its potential biases.

We set a significance level of 0.05 as the threshold for rejecting the null hypothesis, which posits that “there is no drift in the distribution of the categorical variables between the different recommendation depths.” The degrees of freedom, which determine the shape of the reference Chi-Square distribution, are calculated as shown in Eq. 5.
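For reference, the three quantities introduced above take the following standard forms for a contingency table. The symbols \(R_a\) (total of category row \(a\)), \(C_b\) (total of depth column \(b\)), \(N\) (grand total), \(r\) (number of categories), and \(c\) (number of depths) are notation assumed here for illustration and may differ from the notation of Eqs. 3 to 5. The index \(i\) runs over all cells of the table.

```latex
\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i},
\qquad
E_i = \frac{R_a \, C_b}{N} \quad \text{for the cell } i \text{ in row } a \text{ and column } b,
\qquad
df = (r - 1)(c - 1)
```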

The p-value is the area under the Chi-Square distribution curve to the right of the observed \(\chi ^2\) statistic for the calculated degrees of freedom. It gives the probability of observing a result as extreme as, or more extreme than, what was actually found, assuming the null hypothesis is true. If the p-value is smaller than the significance level, we reject the null hypothesis and conclude that there is evidence of drift in the distribution of the categories between the depths. Conversely, a larger p-value suggests that the observed differences could have occurred by random chance; we then fail to reject the null hypothesis, meaning that no statistically significant drift between the depths was detected.

As mentioned above, calculating the p-value requires integrating the Chi-Square probability density function (PDF) from the observed \(\chi ^2\) value to infinity, which is not typically done by hand. Statistical tables can be used for this purpose, but because they list only a limited set of values, we instead used the “chi2_contingency” function from the Python “scipy.stats” package. The p-value is evaluated numerically, and for very large Chi-Square statistics it may be reported as 0.0 because of the limits of floating-point precision. Treating such values as effectively zero is reasonable, because such large Chi-Square statistics indicate a very strong deviation from the null hypothesis, making it highly unlikely to observe such extreme results under the assumption of independence.
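The calculation can be illustrated with a dependency-free sketch. It is not the study's code, which relied on `scipy.stats.chi2_contingency`; the helper names `chi_square_contingency` and `chi2_log_sf_even_dof` and the count table are hypothetical. The tail-area helper uses a closed form that holds only when the degrees of freedom are even (as with 24), and it works in log space to show why an ordinary floating-point p-value collapses to 0.0 for very large statistics.

```python
import math


def chi_square_contingency(observed):
    """Pearson chi-square for a categories-by-depths table of counts.

    Assumes every row total and column total is non-zero.
    Returns (statistic, degrees_of_freedom, expected_table).
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Expected count under independence: row total * column total / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof, expected


def chi2_log_sf_even_dof(stat, dof):
    """Natural log of the right-tail area of the chi-square distribution.

    Closed form valid only for even, positive dof:
    sf = exp(-x/2) * sum_{j < dof/2} (x/2)^j / j!
    """
    if dof <= 0 or dof % 2:
        raise ValueError("this closed form needs an even, positive dof")
    half = stat / 2.0
    term, total = 1.0, 1.0
    for j in range(1, dof // 2):
        term *= half / j
        total += term
    return -half + math.log(total)


# Hypothetical counts: 7 emotion categories (rows) x 5 depths (columns).
observed = [
    [120, 90, 70, 60, 55],
    [80, 70, 65, 60, 58],
    [40, 45, 50, 52, 55],
    [30, 32, 35, 33, 31],
    [200, 230, 260, 270, 280],
    [25, 20, 18, 15, 14],
    [15, 18, 20, 22, 21],
]
stat, dof, expected = chi_square_contingency(observed)
log_p = chi2_log_sf_even_dof(stat, dof)
print(f"chi2={stat:.2f}, dof={dof}, p={math.exp(log_p):.3g}")

# A very large statistic: the log of the p-value is still finite,
# but exponentiating it underflows to exactly 0.0.
huge_log_p = chi2_log_sf_even_dof(5000.0, 24)
print(huge_log_p, math.exp(huge_log_p))
```

Reporting the logarithm of the p-value is one way to compare the strength of evidence between cases that would otherwise all print as 0.0.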

Through this statistical framework, we aim to provide a concrete measure of bias, offering a more definitive understanding of how recommendation algorithms might skew content distribution. This methodological rigor enhances the credibility of our findings, facilitating a deeper exploration into the mechanics of bias within YouTube’s recommendation system.

4.8.2 Inequality quantification

In examining biases within YouTube’s content recommendation algorithms, analyzing disparities in engagement metrics reveals critical insights. This method allows us to inspect how viewer interactions are distributed among recommended videos, shedding light on potential inequalities that may affect content visibility and the overall user experience.

The Atkinson Index (Atkinson et al. 1970 ), a measure initially developed to assess income inequality within populations, provides a useful framework for examining disparities in content engagement on YouTube. This index quantifies the extent to which individual data points (in our case, engagement metrics such as likes, comments, and views) diverge from a perfectly equal distribution. The Atkinson Index is defined as shown in Eq.  6 .

\(A(\epsilon )\) represents the Atkinson Index, with \(\epsilon\) being a parameter that determines the sensitivity of the measure to changes in different parts of the distribution. A higher value of \(\epsilon\) indicates a greater sensitivity to inequalities at the lower end of the distribution. \(n\) is the number of videos considered in a particular depth of recommendation. \(p_i\) is the proportion of total engagement (likes, views, or comments) that the \(i\) -th video receives relative to the total engagement of all videos in the dataset.
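With the symbols defined above, the textbook form of the index can be written in terms of engagement shares as follows. This is a sketch under two assumptions that the text does not confirm: that the shares \(p_i\) sum to 1 over the \(n\) videos being compared, and that \(\epsilon \ne 1\). The authors' Eq. 6 may be arranged differently.

```latex
A(\epsilon) = 1 - \left( \frac{1}{n} \sum_{i=1}^{n} \left( n \, p_i \right)^{1-\epsilon} \right)^{\frac{1}{1-\epsilon}},
\qquad \epsilon \ge 0,\ \epsilon \ne 1
```

When every video receives the same share, \(p_i = 1/n\), each term equals 1 and the index is 0.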

In our analysis, we adopted an \(\epsilon\) value of 0.5 to balance the measure’s sensitivity to inequalities at both the lower and upper ends of the engagement spectrum. The choice of \(\epsilon\) is significant because it adjusts the index’s focus, with higher values prioritizing the lower end of the distribution. As \(\epsilon\) approaches infinity, the index is determined almost entirely by the least-engaged videos and approaches 1 when those videos receive close to no engagement.

By applying the Atkinson Index to the engagement metrics of recommended videos, we aim to quantify the level of inequality present within each recommendation depth. This analysis allows us to assess whether the YouTube recommendation algorithm exhibits a bias towards videos with significantly higher engagement metrics, potentially marginalizing content with lower but still substantial engagement levels.
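The per-depth computation can be sketched as follows. The function implements the standard Atkinson Index on raw engagement values (equivalent to working with shares); the function name `atkinson_index`, the sample like counts, and the dictionary layout are hypothetical and serve only to illustrate the calculation with \(\epsilon = 0.5\).

```python
import math


def atkinson_index(values, epsilon=0.5):
    """Standard Atkinson inequality index of non-negative values.

    Returns 0.0 for a perfectly equal distribution and approaches 1.0
    as engagement concentrates on a few items. Zero values are only
    supported for epsilon < 1.
    """
    if not values or any(v < 0 for v in values):
        raise ValueError("values must be a non-empty list of non-negative numbers")
    n = len(values)
    mean = sum(values) / n
    if mean == 0:
        return 0.0  # all values are zero, hence equal
    if epsilon == 1:
        # Limit case: geometric mean over arithmetic mean (needs values > 0).
        equal_equivalent = math.exp(sum(math.log(v) for v in values) / n)
    else:
        power = 1 - epsilon
        equal_equivalent = (sum(v ** power for v in values) / n) ** (1 / power)
    return 1 - equal_equivalent / mean


# Hypothetical like counts per recommendation depth, for illustration only.
likes_by_depth = {
    0: [100, 120, 90, 110],
    1: [10, 20, 15, 5000],
}
inequality = {depth: atkinson_index(v) for depth, v in likes_by_depth.items()}
print(inequality)
```

In this toy input, depth 1 yields a much higher index than depth 0 because a single video holds almost all of the likes.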

Evaluating engagement inequality with the Atkinson Index sheds light on the dynamics of content recommendation and visibility on YouTube. It helps identify if the recommendation system perpetuates a concentration of attention on a small subset of highly popular videos, thereby reinforcing existing visibility and engagement disparities. Such insights are crucial for understanding the broader implications of algorithmic recommendation practices on content diversity and user exposure.

4.8.3 Understanding gaussian distributions and statistical measures

In the study of complex datasets, whether examining patterns in digital narratives or analyzing trends in social data, the application of statistical measures provides a foundational framework for both describing and understanding variability within the data. Central to this framework is the concept of the Gaussian distribution, often referred to as the normal distribution, which is a fundamental statistical distribution pattern observed in many natural phenomena and datasets.

The Gaussian distribution is characterized by its symmetric, bell-shaped curve, where the majority of observations cluster around a central value (the mean), decreasing in frequency as they diverge towards the extremes. This distribution is mathematically defined by its mean \(\mu\) and standard deviation \(\sigma\) , where the following is true:

The mean \(\mu\) represents the average value of the dataset, providing a central point around which the data is distributed.

The standard deviation \(\sigma\) quantifies the dispersion or variability of the dataset, indicating how spread out the data points are from the mean.

Formally, the Gaussian distribution can be expressed through its probability density function (PDF) as shown in Eq.  7 :
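The standard form of this density, written with the mean \(\mu\) and standard deviation \(\sigma\) defined above, is:

```latex
f(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)
```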

Within the context of Gaussian distributions, the concepts of mean + std ( \(\mu\) + \(\sigma\) ) and mean + 2std ( \(\mu\) + \(2\sigma\) ) serve as crucial analytical thresholds. These measures are instrumental in understanding the distribution of data:

Mean + std ( \(\mu\) + \(\sigma\) ): Approximately 68% of the data in a Gaussian distribution falls within one standard deviation of the mean. This range identifies the most common variance from the average, encapsulating the bulk of data points in a typical distribution.

Mean + 2std ( \(\mu\) + \(2\sigma\) ): Expanding the range to two standard deviations from the mean encompasses approximately 95% of the data. This broader threshold is critical for identifying outliers, which are the data points that lie beyond the typical range of variation. These outliers can signify extreme cases or occurrences that deviate significantly from the norm.

The application of these statistical measures and thresholds provides a powerful lens for analyzing and interpreting data. In real-world scenarios, understanding the distribution of data within these thresholds enables researchers to do the following:

Identify patterns and trends that are central to the dataset

Detect outliers or anomalies that may warrant further investigation

Make informed decisions based on the statistical behavior of the data

Employing these statistical concepts allows for a nuanced analysis that goes beyond mere averages, offering insights into the variability and extremities of the data. This approach is invaluable across a spectrum of fields, from social sciences to natural phenomena, enabling a deeper comprehension of the underlying patterns and behaviors within complex datasets.
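The thresholds described in this subsection can be checked numerically with Python's standard library. The sketch below first recovers the familiar coverage figures for a normal distribution and then flags values above mean + 2std in a small sample. The helper names, the use of the population standard deviation (`pstdev`), and the sample scores are assumptions for illustration; the text does not state which standard-deviation estimator was used.

```python
from statistics import NormalDist, mean, pstdev


def upper_threshold(values, k=2.0):
    """Return mean + k * population standard deviation of the values."""
    return mean(values) + k * pstdev(values)


def flag_high(values, k=2.0):
    """Return the values that lie strictly above the mean + k*std cutoff."""
    cutoff = upper_threshold(values, k)
    return [v for v in values if v > cutoff]


# Coverage of a standard normal distribution.
standard_normal = NormalDist()
within_one_sigma = standard_normal.cdf(1) - standard_normal.cdf(-1)
within_two_sigma = standard_normal.cdf(2) - standard_normal.cdf(-2)
above_two_sigma = 1 - standard_normal.cdf(2)
print(f"{within_one_sigma:.4f} {within_two_sigma:.4f} {above_two_sigma:.4f}")

# Hypothetical scores: many low values and one extreme value.
scores = [0.01] * 19 + [0.9]
outliers = flag_high(scores)
print(upper_threshold(scores), outliers)
```

Note that the 95% figure refers to the two-sided interval; used as a one-sided cutoff, mean + 2std leaves roughly 2.3% of a normal distribution above it. Real engagement or toxicity scores are often skewed, so the normal percentages are a guide rather than a guarantee.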

In this section, we delve into our findings, unraveling the dynamics of narrative drift across various dimensions, including influencer nodes, engagement metrics, and other pivotal elements that underscore the algorithmic influence on content dissemination and reception.

5.1 Emotion drift

In our comprehensive analysis, as detailed in Sect.  4.2 , we embarked on an investigation to discern the presence of emotion drift across various narratives, alongside their respective attributes as elaborated in Sect.  4.1.3 . This inquiry was established on the hypothesis that emotional tones could significantly shift across the depth of recommendations, a phenomenon we aimed to quantify and understand within the context of digital discourse.

Our findings, illustrated in Fig.  1 , particularly spotlight the China-Uyghur Conflict as a case study. Initial expectations, based on the narrative’s nature discussed in Sect.  2.1 , suggested predominantly negative emotions. However, the data revealed notable emotion drifts, especially noticeable in the attributes of titles and descriptions (Fig.  1 a and b). These attributes exhibited a remarkable increase in neutrality and emergence of joy, with a simultaneous decrease in negative emotions such as anger and fear.

Conversely, the transcriptions and comments associated with the videos exhibited less variation. This could be attributed to the extended length of transcripts, which tends towards neutrality, and the varied nature of comments. Specifically, Fig.  1 c demonstrated an increase in neutral expressions and a decrease in disgust, whereas Fig.  1 d showcased an uptick in joy and a trace of surprise. Collectively, these findings underscore a general decline in negative sentiment across the recommendation depth, signaling a notable emotion drift within the context of the China-Uyghur Conflict narrative.

For the narrative concerning the South China Sea Dispute, our examination was rooted in expectations of initially negative emotions, as outlined in Sect.  2.2 . This narrative, similar to the China-Uyghur conflict, demonstrated a decline in negative sentiments, with a discernible decrease in anger as shown in Fig.  2 . Intriguingly, the descriptions, as captured in Fig.  2 b, unveiled an initial spike in fear at the outset of recommendations. Yet, this was followed by a notable reduction in fear levels later on. These observations collectively highlight a broader trend towards diminished negative sentiments, underscoring an emotion drift within the discourse of the South China Sea Dispute.

For the Cheng Ho narrative, expectations were set for initially positive emotions, as outlined in Sect.  2.3 . True to prediction, this narrative began with a higher combination of neutrality and joy compared to others. As depicted in Fig.  3 , these levels largely remained consistent across recommendation depths. Notably, Fig.  3 b shows a minor increase in fear, yet the overall emotional distribution maintained its positivity. Thus, the Cheng Ho narrative exhibited minimal emotion drift, with emotional levels showing negligible fluctuation, distinguishing it from the variability observed in other narratives.

In summarizing the emotion analysis across different narratives, our investigation revealed a subtle spectrum of emotion drift. Narratives characterized by negative or contentious themes exhibited notable shifts towards less negative emotional expressions across recommendation depths, indicating a dynamic emotional response to changing content. Conversely, narratives with inherently positive themes demonstrated stability in their emotional tone, with minimal shifts observed.

The analysis revealed significant shifts in the emotional tone of narratives, particularly from initially negative to increasingly neutral and positive emotions. This shift can be attributed to YouTube’s recommendation algorithm, which may prioritize content fostering longer engagement and positive user experiences. As users engage with content, the algorithm adjusts to suggest videos that are less emotionally charged and more balanced in tone. This trend highlights the dynamic nature of YouTube’s recommendation system and its potential impact on user perceptions and behavior. Understanding these shifts is crucial for developing strategies to enhance the quality and impact of online content recommendations, promoting constructive dialogue and reducing polarization.

To assess the presence and statistical significance of emotion drift across narratives, we utilized the Chi-Square method as outlined in Sect. 4.8.1. With seven emotion categories across five depths, our analysis yielded \((7-1) \times (5-1) = 24\) degrees of freedom, derived from the formula in Eq. 5.

The p -values obtained for the emotion drift, as detailed in Table 5 , were below the strict significance threshold of 0.05 for both the China-Uyghur Conflict and the South China Sea Dispute narratives. This indicates a significant deviation from the null hypothesis, affirming the presence of an emotion drift among the depths. Conversely, the Cheng Ho Propaganda narrative exhibited p -values near the threshold but below it, except in one instance where the value was higher, in contrast to the extremely low values (almost zero) observed in other narratives. This suggests a presence of drift, though in a weaker form compared to other narratives. Finally, given that the sample size for comments is approximately 100 times larger than that for the other elements, it inherently possesses a higher sensitivity to detect changes. This increased sensitivity is reflected in the larger Chi-Square statistics observed for comments.

Fig. 1 Emotion distribution across all attributes of the China-Uyghur Conflict

Fig. 2 Emotion distribution across all attributes of the South China Sea Dispute

Fig. 3 Emotion distribution across all attributes of the Cheng Ho Propaganda

5.2 Morality drift

In our comprehensive analysis of morality drift within digital narratives, we delve into the evolution of moral values across various narratives, examining how these values fluctuate through the recommendation depth of video content. This undertaking, detailed in Sect. 4.3, involves assessing the prevalence of moral virtues and vices, captured through mean scores at the sentence level, to measure the moral tone spanning different layers of digital discourse.

Our investigation into the China-Uyghur Conflict anticipated a dominance of negative moral values. The findings, as illustrated in Figs.  4 and 5 , reveal an initial decline in vices such as harm, which was the highest one initially, as well as cheating, betrayal, subversion, and degradation, with a trend towards stabilization in deeper recommendation levels. Conversely, virtue scores, especially loyalty as highlighted in Fig.  5 a–d, generally exhibit an increase or remain stable, underscoring a shift towards more positive moral values in the narrative discourse. In summary, for the China-Uyghur Conflict specifically, our analysis revealed a marked decrease in negative moral values (vices) and an increase or stabilization in positive moral values (virtues), notably loyalty.

In analyzing the South China Sea Dispute, subtle shifts in moral values were observed. Vice values, indicated in Fig.  6 , showed minor variations; increases in degradation and harm were noted in titles in Fig.  6 a, while a slight overall decrease in vices, except for degradation, was seen in descriptions in Fig.  6 b. Transcriptions in Fig.  6 c revealed a slight rise in harm and degradation, with other values remaining steady. Comments in Fig.  6 d initially increased in vices at depth one, but subsequently all vice values declined.

Virtue values presented more fluctuation, particularly in titles in Fig.  7 a, where three virtues increased and two decreased, with sanctity experiencing the most significant change. In descriptions in Fig.  7 b, we observed an initial decrease and then an increase later on. Transcriptions in Fig.  7 c showed virtue levels to be relatively stable. Lastly, in Fig.  7 d we noticed an increase in all virtues for comments.

In summarizing the moral dynamics within the South China Sea Dispute, it becomes evident that the shifts in moral values across various depths of recommendations present a complex pattern, lacking a straightforward trajectory.

In the analysis of the Cheng Ho Propaganda, the moral landscape presented diverse shifts. Vice values, referenced in Fig.  8 , depicted an uptick in harm across titles as shown in Fig.  8 a, alongside minor increases in other vices. Descriptions and transcriptions, in Fig.  8 b and c, demonstrated slight decreases in some values, while others remained unchanged. Comments, as per Fig.  8 d, showed a mix of stability and slight increases in certain vices.

Virtue values, detailed in Fig.  9 , varied across the board, with transcriptions in Fig.  9 c experiencing more uniform changes, contrasting with the significant fluctuations in other attributes, both in terms of increases and decreases.

In the examination of narratives surrounding the China-Uyghur Conflict, South China Sea Dispute, and Cheng Ho Propaganda, our investigation into the phenomenon of morality drift reveals a multifaceted spectrum of moral values that vary significantly across recommendation depths. Through rigorous sentence-level evaluation, this analysis uncovers a dynamic shift accompanied by marked fluctuations in virtues and vices. Some narratives illustrate a discernible movement towards the stabilization of virtues or a reduction in vices, whereas others display complex patterns that defy a linear progression. These observations collectively highlight the relationship between the dissemination of digital content and the changing moral perspectives of audiences.

Building upon this foundation, we assessed the statistical significance of morality drift using the method detailed in Sect. 4.8.1. To obtain the counts required for the Chi-Square evaluation, we assigned each sentence the moral value with the maximum score and counted the sentences per moral value at each depth. These counts differ from the values shown in Figs. 4, 5, 6, 7, 8, and 9, which are not count distributions but mean scores of each word in a sentence across the entire dataset. The empirical evidence, as shown in Table 6, characterized by p-values falling below the threshold of 0.05, confirms the presence of significant shifts in moral content across the majority of cases examined. In some instances, the degree of drift is profoundly marked, with p-values approaching zero or, in certain cases, rounding to zero. This quantitative validation reinforces the notion of an ongoing evolution in moral values, further illustrating the complex interplay between content exposure and the evolution of moral perception within the digital era.

Fig. 4 Moral vices distribution across all attributes of the China-Uyghur Conflict

Fig. 5 Moral virtues distribution across all attributes of the China-Uyghur Conflict

Fig. 6 Moral vices distribution across all attributes of the South China Sea Dispute

Fig. 7 Moral virtues distribution across all attributes of the South China Sea Dispute

Fig. 8 Moral vices distribution across all attributes of the Cheng Ho Propaganda

Fig. 9 Moral virtues distribution across all attributes of the Cheng Ho Propaganda

5.3 Toxicity drift

In our refined analysis of toxicity shifts within digital narratives, we leveraged the toxicity measurement framework as outlined in Sect.  4.4 . This approach emphasizes not only the evaluation of average toxicity levels but also a rigorous examination of extreme toxicity instances, employing the mean + 2std metric. As detailed in Sect.  4.8.3 , this metric, grounded in the principles of Gaussian distributions, effectively highlights data points that significantly deviate from the norm, serving as a crucial threshold for identifying highly toxic content.

For our toxicity analysis, we primarily focused on average toxicity values, which constituted a single variable. Recognizing that the Chi-Square test is unsuitable for such cases, we instead applied Gaussian distribution principles to identify and scrutinize outliers, particularly those representing high toxicity levels, thus providing a clearer understanding of underlying trends.

This methodology facilitates a complex understanding of toxicity across different digital platforms, enabling us to distinguish between prevalent toxicity trends and the emergence of content that escalates from being merely unpleasant to unequivocally harmful. By applying the mean + 2std threshold, we can precisely identify and analyze instances of extreme toxicity, thereby illuminating the dynamics of toxicity within digital narratives at various levels of content recommendation.

In our analysis of the narrative surrounding the China-Uyghur Conflict, initial findings revealed an intriguing pattern: the mean toxicity levels were notably low from the starting point. More interestingly, mean toxicity showed a marked reduction at the initial depth of content recommendation, hinting that the narrative was becoming progressively less inflammatory as recommendations deepened. This trend of low mean toxicity not only persisted but also stabilized, remaining low across all subsequent depths.

Upon delving into the high toxicity levels, distinct fluctuations across different content aspects became apparent. For instance, the toxicity levels within narrative titles, as illustrated in Fig.  10 a, initially decreased, only to rise again, reflecting a fluctuating pattern of toxicity over time. Conversely, the narrative descriptions, shown in Fig.  10 b, experienced an initial uptick in toxicity levels, which subsequently decreased, indicating a shift towards moderation after an initial period of heightened toxicity. The most pronounced decline in high toxicity levels was observed in transcription content, as detailed in Fig.  10 c, suggesting a significant reduction in the toxicity of this content segment. On the other hand, user-generated comments, as seen in Fig.  10 d, experienced a slight increase in toxicity.

Despite these variable trends in instances of high toxicity, the narrative experienced a decrease in aggregate toxicity levels. This observation suggests a gradual improvement in the content recommendation algorithms of digital platforms, steering the narrative towards a more moderated and less extreme discourse on the China-Uyghur conflict. Such a trend is indicative of an evolving digital ecosystem that is becoming increasingly adept at managing and mitigating the spread of toxic content, contributing to a more controlled and constructive online discourse.

In our comprehensive analysis of the South China Sea Dispute narrative, we observed an initial trend where mean toxicity levels were notably low and stable, closely paralleling the findings from the China-Uyghur Conflict. These levels were so minimal that they nearly approached zero, a trend clearly depicted in Fig.  11 . This consistency underscores a baseline of low toxicity across different narratives within our dataset.

Remarkably, at the initial recommendation depth (depth 0), our analysis identified no instances of highly toxic content within titles, descriptions, and transcriptions. These content attributes uniformly failed to cross the high toxicity threshold, as delineated in Fig.  11 a–c.

However, the narrative complexity increased beyond depth 0, where we began to observe the emergence of high toxicity content. Specifically, Fig.  11 a revealed a progressive increase in toxicity levels with each subsequent depth, although this trend plateaued at depth 4, indicating a stabilization in the toxicity of narrative titles. Conversely, Fig.  11 b showcased an initial spike in toxicity at depth 1, followed by a significant reduction at depth 2, with only a slight recovery thereafter. A somewhat parallel trend was observed in Fig.  11 c, mirroring the behavior of descriptions but with a less pronounced decrease at depth 2, followed by a mild increase in toxicity levels.

The most notable finding was within the user comments, as captured in Fig.  11 d, where high toxicity levels peaked, nearing 0.9. This intensity was consistently maintained across different depths, suggesting an area of concentrated toxicity within the narrative. Despite this, the overall mean toxicity distribution remained largely unaffected across all attributes, illustrating a dynamic that mirrors the previously analyzed China-Uyghur Conflict narrative. This observation points to a nuanced understanding of narrative engagement, where, despite the presence of highly toxic comments, the aggregate toxicity level across the narrative’s content did not exhibit a significant shift, maintaining a low and stable mean toxicity level.

In the exploration of the Cheng Ho Propaganda narrative, our analysis found a pattern consistent with other narratives regarding initial mean toxicity levels: predominantly low, with occasional spikes in high toxicity instances, as illustrated in Fig.  12 . The titles showed an immediate increase in high toxicity levels in Fig.  12 a, followed by subsequent fluctuations. This variability suggests a dynamic narrative engagement from the outset. Meanwhile, descriptions depicted a different pattern, with a notable decrease in high toxicity at depth 1, before increasing again as shown in Fig.  12 b. This oscillation in toxicity levels indicates a nuanced content landscape that evolves with user interaction depth.

Contrastingly, transcriptions did not exhibit high toxicity at the surface level (depth 0), as shown in Fig.  12 c; however, an increase was observed as users engaged more deeply, eventually stabilizing. Comments, on the other hand, demonstrated a gradual increase in high toxicity levels with each deeper engagement level in Fig.  12 d, hinting at a compounding effect of user interactions on toxicity.

Our findings suggest that as users navigate through deeper layers of recommendations, the likelihood of encountering highly toxic content subtly increases. Nonetheless, it appears that content recommendation algorithms are designed with a cautious balance in mind, aiming to maintain overall low toxicity levels. This approach suggests a potential systemic bias towards creating a safer, more welcoming digital environment at the cost of possibly filtering out a broader spectrum of voices. Such a strategy underscores the inherent challenge platforms face in curating content: they must navigate the fine line between reducing exposure to potentially harmful content and preserving a space favorable to free expression and diverse viewpoints. This balancing act reflects the complexities of managing digital narratives, aiming to ensure user safety while fostering an inclusive and neutral platform.

Fig. 10 Toxicity distribution across all attributes of the China-Uyghur Conflict

Fig. 11 Toxicity distribution across all attributes of the South China Sea Dispute

Fig. 12 Toxicity distribution across all attributes of the Cheng Ho Propaganda

5.4 Topic drift

To examine the evolution of topics within the narratives, we employed BERTopic, as detailed in Sect.  4.5 . Given the model’s extensive topic generation, we focused on the three most prevalent topics for each depth level, enabling us to track topic transitions effectively. This approach allowed us to identify a range from a minimum of three to a maximum of fifteen topics by the conclusion of depth 4, dependent on the degree of topic overlap and shift.
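The selection of the most prevalent topics per depth can be sketched as follows. The input is assumed to be a list of `(depth, topic_label)` pairs, one per document, such as might be derived from a topic model's assignments; the function name, labels, and counts are hypothetical and do not reproduce the study's BERTopic output.

```python
from collections import Counter


def top_topics_per_depth(assignments, k=3):
    """Return {depth: [k most frequent topic labels]} from (depth, topic) pairs."""
    counts = {}
    for depth, topic in assignments:
        counts.setdefault(depth, Counter())[topic] += 1
    return {
        depth: [topic for topic, _ in counter.most_common(k)]
        for depth, counter in sorted(counts.items())
    }


# Hypothetical topic assignments, for illustration only.
assignments = (
    [(0, "genocide")] * 5 + [(0, "camps")] * 3 + [(0, "sanctions")] * 2 + [(0, "trade")]
    + [(1, "genocide")] * 4 + [(1, "soccer")] * 3 + [(1, "songs")] * 2 + [(1, "camps")]
)
top = top_topics_per_depth(assignments)
tracked = set().union(*top.values())
print(top)
print(len(tracked))
```

With three topics kept per depth and five depths, the number of distinct tracked topics ranges from 3 (complete overlap) to 15 (no overlap), which matches the range stated above.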

For the analysis of topic drift, it is important to note that both the number and the labels of topics change at each depth level; for instance, depth \(a\) might have \(x\) topics, while depth \(b\) might have \(y\) different topics. Because the Chi-Square test requires the same fixed set of categories at every depth, it was not applicable in this case.

In the case of the China-Uyghur Conflict, Fig. 13 reveals a significant initial emphasis on the topic of “genocide” (topic_id 384), characterized by keywords such as “genocide, detainees, persecution, internment, holocaust”. This topic is shown in blue and accounts for a large portion of the discourse at depth 0.

Across various attributes, we observed a rapid decrease in the prevalence of the genocide topic in the initial depths, eventually vanishing in the latter stages. At the same time, new topics emerged, notably those related to soccer, as shown in Fig.  13 a, as well as themes involving actresses and singers, depicted in Fig.  13 b, folklore, highlighted in Fig.  13 c, and songs, detailed in Fig.  13 d by depth 4. This dramatic shift in thematic focus underscores a significant topic drift within the discourse surrounding the China-Uyghur Conflict.

For the South China Sea Dispute narrative, our analysis using BERTopic identified an initial focus on political elements, specifically the topic of “candidate”, illustrated by keywords such as “candidacy, candidate, candidates, presidential, presidency” at depth 0. This focus was evident in titles, descriptions, and transcriptions as portrayed in Fig. 14. Early discussions also highlighted environmental concerns, with “reefs” and “corals” mentioned as at risk from Chinese operations as illustrated in Fig. 14a, alongside “harbour” and “naval” topics, indicating initial sea-related discussions. As the narrative progressed, these topics gave way to more militaristic themes, such as “warships” and “missiles”, which maintained a connection to the sea and potential conflicts. Interestingly, the comments diverged, introducing unrelated topics like films, authors, and singers by depth 4 as shown in Fig. 14d, illustrating a topic shift. However, except for comments, the emerging topics remained aligned with the narrative’s core themes, indicating a focused yet evolving discourse.

In the Cheng Ho narrative, the initial dominant topics were associated with keywords like “yang, yin, rituals, religions, shamanism” as illustrated in Fig. 15. This suggests Zheng He’s voyages were not just exploratory but also aimed at fostering harmony and spiritual unity through engagement with various religious practices and shamanistic rituals. Throughout the narrative, the topics remained relevant to these original themes. By depth 4, discussions evolved to include cultural expressions such as festivals, celebrations, dance, and folklore. However, the comments section, as depicted in Fig. 15d, showcased a mix of related and unrelated discussions. This narrative, akin to the South China Sea Dispute, experienced a thematic evolution where initial and later topics, despite their differences, were interconnected. The comments, however, displayed a broader spectrum of topics by the end, indicating a diversification of discussion themes.

Fig. 13 Topic distribution across all attributes of the China-Uyghur conflict

Fig. 14 Topic distribution across all attributes of the South China Sea dispute

Fig. 15 Topic distribution across all attributes of the Cheng Ho propaganda

5.5 Influencer nodes

As detailed in Sect. 4.6, we employed Gephi software for network visualization and identification of influencer nodes. To isolate influential entities, we computed modularity values and eigenvector centrality. Modularity values helped us dismiss communities that were insignificant because of their minimal size and relevance. Nodes with higher eigenvector centrality were treated as more influential and are drawn with larger node sizes. For each network depth, beginning from depth 1, we pinpointed the top 5 influencer nodes. This initial depth was chosen deliberately to concentrate on nodes that exert a direct impact on the network’s foundational layer, thereby playing a pivotal role in the dissemination of information or behaviors. Unlike the preceding analyses, which used BERTopic for topic identification, we chose manual inspection of video titles. Our decision to focus on titles, rather than other elements, was strategic. While we acknowledge the importance of various factors in content engagement, titles are often the decisive factor for viewers when navigating through recommendation networks. This approach allowed us to focus on influencer videos effectively and to understand their pivotal role in guiding viewer choices across the network depths.
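To make the ranking step concrete, the following is a minimal sketch of eigenvector centrality computed by power iteration, followed by a top-5 selection. It is an illustration only and not the Gephi computation used in the study: the graph is treated as undirected, the node names and edges are hypothetical toy data, the function names `eigenvector_centrality` and `top_influencers` are our own, and modularity-based community filtering is omitted.

```python
from math import sqrt


def eigenvector_centrality(edges, max_iter=1000, tol=1e-10):
    """Approximate eigenvector centrality of an undirected graph.

    `edges` is an iterable of (node_a, node_b) pairs. Treating the
    recommendation graph as undirected is an assumption of this sketch.
    """
    nodes = sorted({node for edge in edges for node in edge})
    neighbors = {node: set() for node in nodes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    score = {node: 1.0 for node in nodes}
    for _ in range(max_iter):
        # Power-iteration step on (A + I): keep the node's own score and
        # add its neighbors' scores, which avoids oscillation on
        # bipartite-like graphs.
        updated = {
            n: score[n] + sum(score[m] for m in neighbors[n]) for n in nodes
        }
        # Rescale to unit Euclidean length so scores stay bounded.
        norm = sqrt(sum(value * value for value in updated.values()))
        updated = {n: value / norm for n, value in updated.items()}
        if sum(abs(updated[n] - score[n]) for n in nodes) < tol:
            return updated
        score = updated
    return score


def top_influencers(edges, k=5):
    """Return the k nodes with the highest centrality as (node, score) pairs."""
    scores = eigenvector_centrality(edges)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


# Hypothetical toy recommendation graph: "hub" is recommended alongside
# most other videos, so it should rank first.
toy_edges = [
    ("seed", "A"), ("seed", "B"), ("A", "hub"), ("B", "hub"),
    ("hub", "C"), ("hub", "D"), ("hub", "E"), ("C", "D"),
]

for node, value in top_influencers(toy_edges, k=5):
    print(f"{node}: {value:.3f}")
```

In this toy graph the best-connected node receives the highest score, which mirrors the idea that a video linked to many other well-connected videos acts as an influencer node.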

In exploring the China-Uyghur conflict narrative through YouTube’s recommendation network, we uncover the subtle role of influencer nodes across varying depths. This is illustrated in both Table 7 and Fig. 16. Initially, viewers are presented with a diverse array of videos, from a Hebrew alphabet tutorial to an analysis of Taiwan-China military drills, illustrating the broad and wide-ranging gateway into the network. This initial diversity sets the stage for a journey through thematic shifts and content divergence.

As the viewer explores further, the network skillfully shifts attention, presenting videos on topics like alternative medicine discussions and financial questions. Although these topics are not directly connected to the central theme of geopolitics, they showcase how the algorithm plays a role in expanding the range of the conversation. These topics, despite diverging from the initial narrative, hold high eigenvector centrality scores, indicating their significant influence within the network’s structure.

By the third depth, a thematic centralization occurs around specific sub-themes such as legal critiques and financial scrutiny, further demonstrating the algorithm’s capacity to narrow or expand the viewer’s focus. This stage reflects a complex balance between engaging with the core narrative and exploring peripheral topics.

At the final analyzed depth, the narrative journey expands dramatically, introducing a wide range of topics from U.S. political scandals to space telescope discoveries. This broadening into unrelated areas showcases the influencer nodes’ pivotal role in shaping content pathways, revealing the algorithm’s potential to simultaneously narrow and broaden viewer exposure to diverse content.

The South China Sea dispute narrative within YouTube’s recommendation network, presented in Table 8 and Fig. 17, offers a concise example of how influencer nodes shape content pathways, starting from a surprising entry point: a blizzard warning in San Diego with the highest eigenvector score at the first depth. This initial weather-related video, seemingly unrelated to geopolitical themes, underscores the algorithm’s capacity to introduce diverse topics, potentially affecting the subsequent recommendation chain.

As the narrative progresses to the second depth, the focus shifts to global issues and environmental concerns, reflecting a broader exploration of themes such as new social orders and nuclear disarmament. The significant presence of videos on environmental protection and global governance illustrates the network’s influence in steering the audience towards a complex understanding of the interplay between geopolitical conflicts and broader global challenges.

By the third depth, the narrative focuses on regional crises, highlighted by detailed coverage of weather-related disasters across California. This focus on immediate, tangible events suggests an algorithmic response to viewer interest, demonstrating the dynamic nature of content recommendations, which can swiftly pivot from global discussions back to localized concerns.

The final depth returns to strategic and speculative themes, with a notable focus on U.S. naval preparation against China, technological advancements, and global demographic issues. The appearance of a strategic military video as the influencer at this depth signifies a full-circle return to geopolitical considerations, although within a much broader context that includes scientific discovery and future technological impacts.

For the Cheng Ho narrative, the YouTube recommendation network depicted in Table 9 and Fig. 18 demonstrates a focused exploration of Cheng Ho’s historical legacy, with less drift in content across the initial depths, highlighting the algorithm’s ability to maintain thematic consistency.

At the first depth, the narrative firmly centers around Cheng Ho’s contributions and legacy, featuring videos on his life, the Cheng Hoo Mosque, and his significance as a Muslim admiral in Semarang. The highest eigenvector score is assigned to a video directly related to Cheng Ho’s legacy, indicating a strong thematic entry point into the narrative. This depth is dedicated to immersing viewers in the historical and cultural impacts of Cheng Ho, emphasizing his historical significance and the spread of Islam in Southeast Asia.

Progressing to the second depth, the focus remains on Cheng Ho, with videos exploring his mosque, the Sam Poo Kong Temple, and his historical footprints in the archipelago. The narrative continues to delve deeper into Cheng Ho’s cultural and religious heritage, maintaining a coherent and focused exploration of his enduring legacy.

By the third depth, although the themes broaden slightly, including a live stream from KOMPASTV, the content largely stays on topic. The sustained interest in Cheng Ho is evident in the reappearance of videos on his expeditions and the introduction of a film about him, suggesting a diversification within the bounds of the Cheng Ho narrative rather than a significant drift.

It is only in the final depth that the narrative begins to shift towards contemporary political and social issues, featuring videos on political commentary, suspicious financial transactions, and controversies surrounding the KPK Chairman. This late-stage drift indicates a departure from the historical and cultural focus of earlier depths, suggesting the algorithm’s inclination to eventually introduce current socio-political discussions, possibly in response to broader viewer engagement trends or the inherent dynamics of the recommendation algorithm.

In conclusion, our findings reveal the sophisticated mechanisms at play within YouTube’s recommendation system, where influencer nodes through their strategic position and thematic influence play a critical role in either maintaining narrative focus or facilitating thematic drift. This insight into the algorithm’s operation not only illuminates the challenges in navigating digital content landscapes but also underscores the importance of understanding influencer nodes’ impact on public discourse and perception.

Fig. 16 Network graphs of the China-Uyghur conflict

Fig. 17 Network graphs of the South China Sea dispute

Fig. 18 Network graphs of the Cheng Ho propaganda

5.6 Engagement bias

In accordance with the methodology outlined in Sect. 4.7, we examined engagement metrics such as views, likes, and comments across various levels of recommendation depth. Box plots enabled us to illustrate not only the central engagement level but also the distribution’s range, median, spread, and outliers.
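As a hedged illustration of what a box plot encodes, the sketch below computes the quartiles, whisker ends, and outliers for one list of engagement counts. It assumes the common Tukey convention (whiskers at 1.5 times the interquartile range) and inclusive quartile interpolation; the plotting settings actually used for the figures are not stated here, and the view counts and the function name `box_plot_summary` are hypothetical.

```python
from statistics import quantiles


def box_plot_summary(values, whisker=1.5):
    """Return the statistics a standard box plot displays.

    Assumes Tukey-style fences at `whisker` * IQR beyond the quartiles.
    """
    data = sorted(values)
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - whisker * iqr
    upper_fence = q3 + whisker * iqr
    # Whiskers extend to the most extreme points still inside the fences.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }


# Hypothetical view counts for the videos at one recommendation depth.
views_at_depth = [120, 450, 980, 1500, 2300, 5200, 1_000_000]
summary = box_plot_summary(views_at_depth)
for name, value in summary.items():
    print(f"{name}: {value}")
```

A single viral video lands far outside the upper fence while leaving the median almost unchanged, which is why outlier boundaries and medians are worth reading separately when comparing depths.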

Our analysis, as detailed in the narratives highlighted in Figs. 19, 20, and 21, revealed a consistent pattern of engagement metrics ranking in the order of views, likes, and comments. Notably, there was a significant increase in engagement at the initial recommendation depth, followed by minimal increases or stable engagement at subsequent depths. Furthermore, we observed an expansion in the outlier boundaries, increasing by orders of magnitude towards the final depths.

These observations support our hypothesis that the recommendation algorithm favors videos with higher engagement metrics, thus enhancing the visibility of content that is already popular. This trend indicates an algorithmic bias towards popular videos, potentially at the cost of less viewed but equally relevant content.

Further statistical analysis, using the method discussed in Sect. 4.8.2, investigated the distribution of engagement across depths for potential inequality. The results, as presented in Table 10, show Atkinson index values nearing 1, indicating a pronounced inequality in the distribution of videos across depths based on likes, views, and comment counts. Additionally, with each successive depth, the Atkinson index increased, nearly reaching 1 by depth 4. This suggests that with each recommendation cycle, the distribution of video engagements becomes increasingly unequal, highlighting a growing disparity in content visibility based on engagement metrics.
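For readers unfamiliar with the measure, the following minimal sketch computes the Atkinson index from its standard definition: one minus the ratio of the equally distributed equivalent value to the arithmetic mean. The inequality-aversion parameter `epsilon` is an assumption of this sketch, since the value used in Sect. 4.8.2 is not restated here, and the engagement counts are hypothetical.

```python
from math import exp, log


def atkinson_index(values, epsilon=0.5):
    """Atkinson inequality index for non-negative values.

    Returns 0 for perfect equality and approaches 1 as the distribution
    becomes more unequal. `epsilon` > 0 is the inequality-aversion parameter.
    """
    if not values:
        raise ValueError("values must not be empty")
    if any(v < 0 for v in values):
        raise ValueError("values must be non-negative")
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")

    n = len(values)
    mean = sum(values) / n
    if mean == 0:
        return 0.0  # all values are zero, so nothing is unequally shared

    if epsilon >= 1 and any(v == 0 for v in values):
        # The equally distributed equivalent tends to 0 in this limit.
        return 1.0

    if epsilon == 1:
        # Special case: the equivalent value is the geometric mean.
        equivalent = exp(sum(log(v) for v in values) / n)
    else:
        power = 1 - epsilon
        equivalent = (sum(v ** power for v in values) / n) ** (1 / power)
    return 1 - equivalent / mean


# Hypothetical view counts: a fairly even depth versus one dominated by a
# single highly viewed video.
even_depth = [10, 20, 30, 40]
skewed_depth = [1, 1, 1, 1, 10_000]
print(f"even:   {atkinson_index(even_depth):.3f}")
print(f"skewed: {atkinson_index(skewed_depth):.3f}")
```

Because the index depends on `epsilon`, values are comparable across depths only when the same parameter is used throughout.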

Fig. 19 China-Uyghur conflict box plot representation for engagement statistics

Fig. 20 South China Sea dispute box plot representation for engagement statistics

Fig. 21 Cheng Ho propaganda box plot representation for engagement statistics

6 Conclusion and discussion

In this study, we embarked on a comprehensive examination of YouTube’s recommendation algorithm to understand its impact on the narrative and diversity of content. Through a methodical approach that included gathering extensive data, analyzing emotions and morals conveyed in videos, assessing content toxicity, exploring topics, conducting network analysis, and scrutinizing engagement metrics, we aimed to uncover the subtle ways in which algorithmic suggestions might shape the evolution of narratives and influence wider discourse on a range of topics. Our investigation sought to peel back the layers of YouTube’s algorithmic ecosystem to reveal how it potentially directs narratives in specific directions, thereby affecting the broader conversation spectrum.

Our exploration of YouTube’s recommendation algorithm revealed a complex and varied terrain of influence. Key findings include:

Emotion Drift: We observed a significant evolution in the emotional tone of narratives, shifting from initially negative to increasingly neutral and positive emotions. This pattern was particularly pronounced in the discussions surrounding the China-Uyghur Conflict and the South China Sea Dispute, where very low p-values support the shift and point to the algorithm’s strong influence on narrative tone. Conversely, the Cheng Ho narrative displayed remarkable stability, showing minimal alterations, suggesting a resilience against the algorithm’s tendency to modify emotional undertones.

Moral Values: Our examination of moral values within the narratives uncovered a complex interplay of ethical considerations. Specifically, we noted a trend towards the promotion of uplifting moral values in the narrative related to the China-Uyghur Conflict, implying an intentional curation by the algorithm. This shift towards more positive moral tones highlights the algorithm’s potential role in presenting sensitive topics more positively.

Content Toxicity: Our analysis revealed YouTube’s proficiency in refining its recommendations to sustain minimal toxicity levels. This effective moderation strategy ensures a healthier content ecosystem. However, this selective filtering may also introduce biases, prioritizing certain narratives or viewpoints and subtly influencing the diversity of discourse.

Topic Analysis: Significant thematic evolution was observed within the China-Uyghur Conflict narrative, aligning with our observations regarding emotional and moral shifts. This suggests a broader algorithmic effort to enrich and diversify the narrative landscape. In contrast, the Cheng Ho narrative exhibited remarkable thematic consistency, indicating the algorithm’s selective curation strategy to maintain the integrity of specific historical or cultural narratives.

Network Analysis: Our network analysis illuminated the pivotal role of specific videos that act as influential nodes within the recommendation ecosystem. These key influencer nodes, identifiable by high levels of engagement or central themes, significantly shape the trajectory of narrative flow, enhancing, altering, or inhibiting the dissemination of narratives across YouTube.

Engagement Metrics: The algorithm’s bias towards promoting content with higher user interaction was evident, highlighting its role in shaping the visibility and distribution of content based on viewer engagement levels. This preference influences the broader narrative discourse on the platform.

Collectively, our findings paint a detailed portrait of YouTube’s recommendation algorithm as a powerful influencer in the crafting and governance of narratives. By intricately weaving together elements of emotional resonance, moral context, content toxicity management, thematic direction, network dynamics, and engagement focus, the algorithm positions itself as a central figure in the construction of the digital content ecosystem. It subtly steers user interactions and shapes public conversation, underscoring its role as a critical determinant of the online narrative fabric.

From our analysis, we draw the following general takeaways:

The recommendation algorithm significantly shifts the emotional tone of content, often from negative to more neutral and positive tones, particularly in sensitive geopolitical topics.

There is a notable trend towards the promotion of positive moral values in certain narratives, suggesting an algorithmic bias towards more uplifting content.

The platform’s moderation strategies effectively minimize content toxicity, though this selective filtering could introduce biases.

Thematic evolution within narratives indicates an effort by the algorithm to diversify content, though certain narratives are maintained with remarkable consistency.

Influential nodes within the recommendation network play a pivotal role in shaping narrative flow and user engagement.

Engagement metrics reveal a preference for content with higher user interaction, impacting content visibility and distribution.

The significance of our research reaches beyond mere academic interest, providing vital perspectives on the content moderation strategies and ethical frameworks within digital platforms. This work enriches the ongoing dialogue concerning the need for algorithmic openness, equity, and the broader societal consequences stemming from digital media practices. By illuminating the intricate ways in which recommendation algorithms affect content discoverability and audience interaction, our study advocates for a more profound exploration of the moral aspects surrounding algorithmic control.

In closing, our investigation highlights the profound influence of recommendation algorithms in crafting online narratives and shaping user journeys. It encourages continued academic exploration into the ethical and social implications of algorithmic choices, calling for a future in which digital platforms not only seek to engage but also to uphold values of diversity, justice, and openness.

7 Ethical considerations and positionality

We recognize that our perspectives and backgrounds influence our approach and interpretation of the data. As researchers based in the United States, our views on geopolitical issues may be shaped by our sociopolitical context. While striving for objectivity, we acknowledge that complete neutrality is unattainable. Awareness of these potential biases helps us critically evaluate our findings and present a balanced analysis.

Ethical considerations were paramount throughout our study. We ensured that all data collected from YouTube were publicly accessible and did not involve any personally identifiable information, thus complying with ethical standards for data privacy. Although our study did not involve direct interaction with human subjects, we adhered to ethical guidelines for using publicly available data, ensuring transparency and respect for user-generated content.

We considered the potential impact of our findings on public discourse and the digital ecosystem. By highlighting biases in YouTube’s recommendation algorithms, our goal is to contribute to more equitable and transparent digital platforms, promoting constructive dialogue and positive change while being mindful of the ethical implications of our work.

Data availability

No datasets were generated or analysed during the current study.

References

Adeliyi O, Solaiman I, Shajari S, et al (2024) Detecting and characterizing inorganic user engagement on Youtube. In: Workshop Proceedings of the 18th international AAAI conference on web and social media: CySoc 2024: 5th international workshop on cyber social threats. AAAI, Palo Alto, California, https://doi.org/10.36190/2024.01

Aïmeur E, Amri S, Brassard G (2023) Fake news, disinformation and misinformation in social media: a review. Soc Netw Anal Min 13(1):30


Al-Asad H, Zarkachi I (2023) The role of international amnesty in China’s discrimination conflict against Uyghur Muslims in Xinjiang 2018–2022. Mediasi: J Int Relat 6(2):23–31


Alp E, Gergin B, Eraslan YA et al (2022) Covid-19 and vaccine tweet analysis. Springer International Publishing, Cham, pp 213–229. https://doi.org/10.1007/978-3-031-08242-9_9


Atkinson AB et al (1970) On the measurement of inequality. J Econ Theory 2(3):244–263


Banjo DS, Trimmingham C, Yousefi N, et al (2022) Multimodal characterization of emotion within multimedia space. In: Proceedings of the international conference on computers and computation (COMPUTE 2022)

Bastian M, Heymann S, Jacomy M (2009) Gephi: an open source software for exploring and manipulating networks. In: Proceedings of the international AAAI conference on web and social media, pp 361–362

Bellogín A, Castells P, Cantador I (2017) Statistical biases in information retrieval metrics for recommender systems. Inf Retr J 20:606–634

Bhattacharya S, Spann B, Agarwal N (2024a) A computational approach to analyze identity formation: A case study of brazil insurrection. In: AMCIS 2024 Proceedings, https://aisel.aisnet.org/amcis2024/social_comp/social_comput/19

Bhattacharya S, Spann B, Agarwal N (2024b) Solidarity to storming: Assessing the socio-technical factors behind modern social movements. In: ECIS 2024 Proceedings, https://aisel.aisnet.org/ecis2024/track24_socialmedia/track24_socialmedia/17

Brown T, Mann B, Ryder N et al (2020) Language models are few-shot learners. Adv Neural Inf Process Syst 33:1877–1901

Burke R, Felfernig A, Göker MH (2011) Recommender systems: an overview. AI Magazine 32(3):13–18

Cakmak MC, Agarwal N (2024) High-speed transcript collection on multimedia platforms: Advancing social media research through parallel processing. In: 2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), p 857–860, https://doi.org/10.1109/IPDPSW63119.2024.00153

Cakmak MC, Okeke O, Onyepunuka U, et al (2024a) Analyzing bias in recommender systems: A comprehensive evaluation of youtube’s recommendation algorithm. In: Proceedings of the 2023 IEEE/ACM international conference on advances in social networks analysis and mining. Association for Computing Machinery, New York, NY, USA, ASONAM ’23, p 753–760, https://doi.org/10.1145/3625007.3627300

Cakmak MC, Okeke O, Onyepunuka U, et al (2024b) Investigating bias in Youtube recommendations: Emotion, morality, and network dynamics in China-Uyghur content. In: Cherifi H, Rocha LM, Cherifi C, et al (eds) Complex Networks & Their Applications XII. Springer Nature Switzerland, Cham, pp 351–362, https://doi.org/10.1007/978-3-031-53468-3_30

Cakmak MC, Okeke O, Spann B, et al (2023) Adopting parallel processing for rapid generation of transcripts in multimedia-rich online information environment. In: 2023 IEEE international parallel and distributed processing symposium workshops (IPDPSW), pp 832–837, https://doi.org/10.1109/IPDPSW59300.2023.00139

Cakmak MC, Shaik M, Agarwal N (2024c) Emotion assessment of youtube videos using color theory. In: Proceedings of the 2024 9th International Conference on Multimedia and Image Processing. Association for Computing Machinery, New York, NY, USA, ICMIP ’24, p 6–14, https://doi.org/10.1145/3665026.3665028

Cakmak MC, Agarwal N, Dagtas S, et al (2024a) Unveiling bias in youtube shorts: Analyzing thumbnail recommendations and topic dynamics. In: Proceedings of the 17th International Conference on Social Computing, Behavioral-Cultural Modeling & Prediction and Behavior Representation in Modeling and Simulation (SBP-BRiMS), accepted for presentation

Cakmak MC, Agarwal N (2024b) Unpacking algorithmic bias in youtube shorts by analyzing thumbnails. In: The 58th Hawaii International Conference on System Sciences (HICSS), accepted for presentation

Carpenter J, Brady W, Crockett M et al (2020) Political polarization and moral outrage on social media. Conn L Rev 52:1107

Chen J, Dong H, Wang X et al (2023) Bias and debias in recommender system: a survey and future directions. ACM Trans Inf Syst 41(3):1–39

Chen J, Dong H, Wang X, et al (2020) Bias and debias in recommender system: a survey and future directions. arXiv preprint arXiv:2010.03240

Chubb A (2020) PRC assertiveness in the south China sea: measuring continuity and change, 1970–2015. Int Secur 45(3):79–121

Chung W, Zeng D (2020) Dissecting emotion and user influence in social media communities: an interaction modeling approach. Inf Manag 57(1):103108

Davis EVW (2008) Uyghur Muslim ethnic separatism in Xinjiang, China. Asian Affairs: Am Rev 35(1):15–30

del Fresno García M, Daly AJ, Segado Sanchez-Cabezudo S (2024) Identifying the new influences in the internet era: social media and social network analysis. Revista Española de Investigaciones Sociológicas 153:23–42. https://doi.org/10.5477/cis/reis.153.23

Depoix J (2023) Youtube transcript API. https://github.com/jdepoix/youtube-transcript-api

Devlin J, Chang M, Lee K, et al (2019) BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: Human language technologies, volume 1 (long and short papers). Association for Computational Linguistics, Minneapolis, MN, pp 4171–4186

DiCicco K, Noor NB, Yousefi N, et al (2020) Toxicity and networks of Covid-19 discourse communities: a tale of two social media platforms. Proceedings https://ceur-ws.org/issn-1613-0073

Dotson J (2011) The Confucian revival in the propaganda narratives of the Chinese government. US-China Economic and Security Review Commission

Dwyer AM (2005) The Xinjiang conflict: Uyghur identity, language policy, and political discourse. East-West Center Washington, Washington, DC

Estrada MS, Juarez Y, Piña-García CA (2022) Toxic social media: affective polarization after feminist protests. Soc Media Soc. https://doi.org/10.1177/20563051221098343

Finlay R (2008) The voyages of Zheng He: ideology, state power, and maritime trade in Ming China. J Hist Soc 8(3):327–347

Fravel MT (2011) China’s strategy in the south china sea. Contemporary Southeast Asia pp 292–319

Grootendorst M (2022) Bertopic: Neural topic modeling with a class-based tf-idf procedure. arXiv preprint arXiv:2203.05794

Grootendorst M (2023) Bertopic wikipedia. https://huggingface.co/MaartenGr/BERTopic_Wikipedia/

Gurung MI, Bhuiyan MMI, Al-Taweel A, et al (2024) Decoding Youtube’s recommendation system: a comparative study of metadata and GPT-4 extracted narratives. In: Companion Proceedings of the ACM on Web Conference 2024. Association for Computing Machinery, New York, NY, USA, WWW ’24, p 1468–1472, https://doi.org/10.1145/3589335.3651913

Hanu L (2020) Detoxify. https://github.com/unitaryai/detoxify

Haroon M, Chhabra A, Liu X, et al (2022) Youtube, the great radicalizer? auditing and mitigating ideological biases in youtube recommendations. arXiv preprint arXiv:2203.10666

Harrigan P, Daly TM, Coussement K et al (2021) Identifying influencers on social media. Int J Inf Manage 56:102246

Hartmann J (2022) Emotion english distilroberta-base. https://huggingface.co/j-hartmann/emotion-english-distilroberta-base/

Hasmath R (2019) What explains the rise of majority-minority tensions and conflict in Xinjiang? Central Asian Surv 38(1):46–60

Herlocker JL, Konstan JA, Terveen LG et al (2004) Evaluating collaborative filtering recommender systems. ACM Trans Inf Syst 22(1):5–53

Hopp FR, Fisher JT, Cornell D et al (2021) The extended moral foundations dictionary (EMFD): development and applications of a crowd-sourced approach to extracting moral intuitions from text. Behav Res Methods 53:232–246

Israeli R (2010) China’s Uyghur problem. Isr J Foreign Affairs 4(1):89–101

Khanam KZ, Srivastava G, Mago V (2023) The homophily principle in social network analysis: a survey. Multimed Tools Appl 82(6):8811–8854

Kirdemir B, Agarwal N (2022) Exploring bias and information bubbles in Youtube’s video recommendation networks. In: Benito RM, Cherifi C, Cherifi H et al (eds) Complex Networks & Their Applications X. Springer International Publishing, Cham, pp 166–177


Kirdemir B, Kready J, Mead E, et al (2021a) Examining video recommendation bias on Youtube. In: International workshop on algorithmic bias in search and recommendation, Springer, pp 106–116

Kirdemir B, Kready J, Mead E, et al (2021b) Assessing bias in youtube’s video recommendation algorithm in a cross-lingual and cross-topical context. In: Social, Cultural, and Behavioral Modeling: 14th International Conference, SBP-BRiMS 2021, Virtual Event, July 6–9, 2021, Proceedings 14, Springer, pp 71–80

Knijnenburg BP, Willemsen MC (2015) Evaluating recommender systems with user experiments. In: Recommender systems handbook. Springer, p 309–352

Kušen E, Cascavilla G, Figl K, et al (2017) Identifying emotions in social media: comparison of word-emotion lexicons. In: 2017 5th International Conference on Future Internet of Things and Cloud Workshops (FiCloudW), IEEE, pp 132–137

Li H, Qian Y, Jiang Y et al (2023) A novel label-based multimodal topic model for social media analysis. Decis Support Syst 164:113863. https://doi.org/10.1016/j.dss.2022.113863

Liu Y, Ott M, Goyal N, et al (2019) Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692

Lü L, Medo M, Yeung CH et al (2012) Recommender systems. Phys Rep 519(1):1–49

Macaraig CE, Fenton AJ (2021) Analyzing the causes and effects of the south china sea dispute. J Territorial Marit Stud 8(2):42–58

Maslowska E, Malthouse EC, Hollebeek LD (2022) The role of recommender systems in fostering consumers’ long-term platform engagement. J Serv Manag 33(4/5):721–732

Mbila-Uma S, Umoga I, Alassad M et al (2023) Conducting morality and emotion analysis on blog discourse. In: Takada H, Marutschke DM, Alvarez C et al (eds) Collaboration Technologies and Social Computing. Springer Nature Switzerland, Cham, pp 185–192

Nechushtai E, Zamith R, Lewis SC (2023) More of the same? Homogenization in news recommendations when users search on Google, Youtube, Facebook, and twitter. Mass Communication and Society pp 1–27

Neumann D, Rhodes N (2024) Morality in social media: a scoping review. New Media Soc 26(2):1096–1126

Noor NB, Yousefi N, Spann B, et al (2023) Comparing toxicity across social media platforms for covid-19 discourse. In: The Proceedings of the Ninth international conference on human and social analytics (HUSO 2023). Copyright (c) IARIA, 2023, Barcelona, Spain, pp 21–26, https://www.thinkmind.org/index.php?view=article&articleid=huso_2023_1_50_80036

Okeke O, Cakmak MC, Spann B, et al (2023) Examining Content and Emotion Bias in YouTube’s Recommendation Algorithm. In: The Proceedings of the Ninth International Conference on Human and Social Analytics (HUSO 2023). Copyright (c) IARIA, 2023, Barcelona, Spain, pp 15–20, https://www.thinkmind.org/index.php?view=article&articleid=huso_2023_1_40_80032

Onyepunuka U, Alassad M, Nwana L et al (2023) Multilingual analysis of youtube’s recommendation system: Examining topic and emotion drift in the ‘cheng ho’ narrative. In: Sixth international workshop on narrative extraction from texts (Text2Story 2023) co-located with the 45th European conference on information retrieval (ECIR 2023), Dublin, Ireland

Panger GT (2017) Emotion in social media. University of California, Berkeley

Pascual-Ferrá P, Alperstein N, Barnett DJ et al (2021) Toxicity and verbal aggression on social media: Polarized discourse on wearing face masks during the Covid-19 pandemic. Big Data Soc. https://doi.org/10.1177/20539517211023533

Pearson K (1900) X. on the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Lond Edinb Dublin Philos Mag J Sci 50(302):157–175

Ping Y, Li Y, Zhu J (2024) Beyond accuracy measures: the effect of diversity, novelty and serendipity in recommender systems on user engagement. Electronic Commerce Research pp 1–28

Polatidis N, Georgiadis CK (2013) Recommender systems: the importance of personalization in e-business environments. Int J E-Entrepreneurship Innov 4(4):32–46

Poudel D, Cakmak MC, Agarwal N (2024) Beyond the click: How youtube thumbnails shape user interaction and algorithmic recommendations. In: The 16th International Conference on Advances in Social Networks Analysis and Mining (ASONAM), accepted for presentation

Radford A, Kim JW, Xu T, et al (2023) Robust speech recognition via large-scale weak supervision. In: International conference on machine learning, PMLR, pp 28492–28518

Radford A, Narasimhan K (2018) Improving language understanding by generative pre-training. https://api.semanticscholar.org/CorpusID:49313245

Rohani VA, Shayaa S, Babanejaddehaki G (2016) Topic modeling for social media content: a practical approach. In: 2016 3rd international conference on computer and information sciences (ICCOINS), IEEE, pp 397–402

Shaik M, Cakmak MC, Spann B, et al (2024) Characterizing multimedia adoption and its role on mobilization in social movements. In: Bui TX (ed) 57th Hawaii International Conference on System Sciences, HICSS 2024, Hilton Hawaiian Village Waikiki Beach Resort, Hawaii, USA, January 3-6, 2024. ScholarSpace, pp 146–155, https://hdl.handle.net/10125/106393

Shajari S, Alassad M, Agarwal N (2024a) Characterizing suspicious commenter behaviors. In: Proceedings of the 2023 IEEE/ACM international conference on advances in social networks analysis and mining. Association for Computing Machinery, New York, NY, USA, ASONAM ’23, p 631–635, https://doi.org/10.1145/3625007.3627309

Shajari S, Amure R, Agarwal N (2024b) Analyzing anomalous engagement and commenter behavior on youtube. In: AMCIS 2024 Proceedings, https://aisel.aisnet.org/amcis2024/social_comp/social_comput/6

Shani G, Gunawardana A (2011) Evaluating recommendation systems. Recommender systems handbook pp 257–297

Srba I, Moro R, Tomlein M et al (2023) Auditing Youtube’s recommendation algorithm for misinformation filter bubbles. ACM Trans Recomm Syst 1(1):1–33

Stinson C (2022) Algorithms are not neutral: bias in collaborative filtering. AI Ethics 2(4):763–770

Van Bavel JJ, Robertson CE, Del Rosario K et al (2024) Social media and morality. Annu Rev Psychol 75:311–340

Vinagre J, Jorge AM, Rocha C et al (2019) Statistically robust evaluation of stream-based recommender systems. IEEE Trans Knowl Data Eng 33(7):2971–2982

Wade G (2005) The Zheng He voyages: a reassessment. J Malaysian Branch Royal Asiatic Soc. pp 37–58

Yousefi N, Noor NB, Spann B et al (2024) Examining toxicity’s impact on reddit conversations. In: Cherifi H, Rocha LM, Cherifi C et al (eds) Complex Networks & Their Applications XII. Springer Nature Switzerland, Cham, pp 401–411. https://doi.org/10.1007/978-3-031-53503-1_33

Yousefi N, Cakmak MC, Agarwal N (2024a) Examining multimodel emotion assessment and resonance with audience on youtube. In: Proceedings of the 2024 9th International Conference on Multimedia and Image Processing. Association for Computing Machinery, New York, NY, USA, ICMIP ’24, p 85–93, https://doi.org/10.1145/3665026.3665039

Yousefi N, Noor NB, Spann B, et al (2023) Towards developing a measure to assess contagiousness of toxic tweets. In: Proceedings of the international workshop on combating health misinformation for social wellbeing (TrueHealth 2023) co-located with the 17th International Conference on Web and Social Media (ICWSM 2023)

Yu Q, Weng W, Zhang K, et al (2014) Hot topic analysis and content mining in social media. In: 2014 IEEE 33rd International performance computing and communications conference (IPCCC), IEEE, p 1–8

Zhan R, Pei C, Su Q, et al (2022) Deconfounding duration bias in watch-time prediction for video recommendation. In: Proceedings of the 28th ACM SIGKDD conference on knowledge discovery and data mining, p 4472–4481

Zhao Q, Harper FM, Adomavicius G, et al (2018) Explicit or implicit feedback? engagement or satisfaction? a field experiment on machine-learning-based recommender systems. In: Proceedings of the 33rd Annual ACM symposium on applied computing, pp 1331–1340

Zhong N, Schweidel DA (2020) Capturing changes in social media content: a multiple latent changepoint topic model. Mark Sci 39(4):827–846


Acknowledgements

This research is funded in part by the U.S. National Science Foundation (OIA-1946391, OIA-1920920, IIS-1636933, ACI-1429160, and IIS-1110868), U.S. Office of the Under Secretary of Defense for Research and Engineering (FA9550-22-1-0332), U.S. Army Research Office (W911NF-20-1-0262, W911NF-16-1-0189, W911NF-23-1-0011, W911NF-24-1-0078), U.S. Office of Naval Research (N00014-10-1-0091, N00014-14-1-0489, N00014-15-P-1187, N00014-16-1-2016, N00014-16-1-2412, N00014-17-1-2675, N00014-17-1-2605, N68335-19-C-0359, N00014-19-1-2336, N68335-20-C-0540, N00014-21-1-2121, N00014-21-1-2765, N00014-22-1-2318), U.S. Air Force Research Laboratory, U.S. Defense Advanced Research Projects Agency (W31P4Q-17-C-0059), Arkansas Research Alliance, the Jerry L. Maulden/Entergy Endowment at the University of Arkansas at Little Rock, and the Australian Department of Defense Strategic Policy Grants Program (SPGP) (award number: 2020-106-094). Any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding organizations. The researchers gratefully acknowledge the support.

Author information

Authors and affiliations.

COSMOS Research Center, University of Arkansas, Little Rock, AR, USA

Mert Can Cakmak, Nitin Agarwal & Remi Oni


Contributions

M.C.C. was responsible for writing the main text of the manuscript, preparing all figures, and implementing the research methods described therein. N.A. critically reviewed the manuscript at each stage and provided substantial guidance on the research direction and methodology. R.O. conducted the analysis of the statistical measurements as outlined in Sect. 4.8.

Corresponding author

Correspondence to Mert Can Cakmak.

Ethics declarations

Conflict of interest.

The authors declare no Conflict of interest.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cakmak, M.C., Agarwal, N. & Oni, R. The bias beneath: analyzing drift in YouTube’s algorithmic recommendations. Soc. Netw. Anal. Min. 14, 171 (2024). https://doi.org/10.1007/s13278-024-01343-5


Received : 25 April 2024

Revised : 24 June 2024

Accepted : 18 August 2024

Published : 24 August 2024

DOI : https://doi.org/10.1007/s13278-024-01343-5


  • Algorithmic bias
  • Recommendation systems

  • Open access
  • Published: 29 August 2024

Single nucleotide polymorphisms (SNPs) that are associated with obesity and type 2 diabetes among Asians: a systematic review and meta-analysis

Kevina Yanasegaran, Jeremy Yung Ern Ng, Eng Wee Chua, Azmawati Mohammed Nawi, Pei Yuen Ng & Mohd Rizal Abdul Manaf

Scientific Reports volume 14, Article number: 20062 (2024)


  • Pathogenesis
  • Risk factors

Single nucleotide polymorphisms (SNPs) could increase the susceptibility of individuals to develop obesity and type 2 diabetes (T2DM). Obesity and T2DM are closely related pathophysiologically, so similar SNPs could mediate both diseases, but this is rarely reported. Furthermore, few studies have summarized SNP data in the Asian population compared to the Western population. In this study, we aimed to summarize SNPs that are associated with the development of obesity and T2DM among Asian populations. We searched six literature databases, and Review Manager (RevMan) was used for meta-analysis. The pooled odds ratios (ORs) and 95% CIs were calculated with a random-effects model to account for heterogeneity among studies. The pooled analysis showed that rs9939609 (FTO gene) and rs17782313 and rs571312 (MC4R gene) are associated with obesity, with ORs of 1.37, 1.36 and 1.29, respectively. For T2DM, five SNPs, rs7903146 and rs12255372 (TCF7L2 gene), rs13266634 and rs11558471 (SLC30A8 gene) and rs2283228 (KCNQ1 gene), also showed strong associations, with ORs of 1.64, 1.61, 1.22, 1.29 and 1.60, respectively. These data could be used to develop a gene screening panel for assessing obesity and T2DM susceptibility.


Introduction.

Over the past years, non-communicable diseases (NCDs) have led to substantial mortality and morbidity globally. According to the World Health Organization, NCDs cause 71% of all deaths worldwide 1. Understanding the pathogenesis of NCDs such as obesity and type 2 diabetes (T2DM) is important for identifying their risk factors and for ensuring that proper preventive measures can be taken to reduce NCD-related mortality 2, 3. Obesity is a condition in which excessive fat accumulates in the body 4, whereas T2DM is a condition in which less insulin is secreted by pancreatic β-cells and insulin efficacy in target tissues is diminished 5. Because the two diseases share strong genetic and environmental aspects in their pathogenesis, obesity increases the impact of genetic susceptibility and environmental factors on T2DM. Once obesogenic and diabetogenic environmental factors amplify genetic susceptibilities, ectopic adipose tissue expansion and excessive accumulation of certain nutrients and metabolites sabotage metabolic balance. Processes including insulin resistance, dysfunctional autophagy, and the microbiome-gut-brain axis are then activated and exacerbate immunometabolic dysregulation through systemic inflammation, leading to accelerated loss of β-cell function and gradual elevation of blood glucose levels 6, 7, 8, 9, 10.

The role of SNPs in obesity and T2DM is not straightforward because multiple genes are involved in their pathogenesis. Some SNPs, when found in isolation, do not confer any added risk of obesity 10, 11, 12. However, when combined, these SNPs increase obesity risk. For example, rs1801282 of the peroxisome proliferator-activated receptor γ2 (PPARγ2) gene had no significant association with obesity (OR 0.837; 95% CI 0.485–1.443) among Taiwanese until it was found in combination with SDC3 rs2282440 (combined OR 6.77; 95% CI 1.87–24.54) 11. That the association became significant only in combination suggests that gene–gene interactions are at play.

To date, many genetic variants associated with the development of obesity and T2DM have been identified through genome-wide association studies (GWAS), mostly conducted in populations of European descent and in some Asian populations 13, 14. However, many common genetic variants that are associated with NCDs in Europeans have not been observed in Asian populations due to differences in biological traits, cultural practices, and lifestyle habits 15, 16. For example, SNPs G2548A, H1328080, and A19G of the leptin gene are associated with obesity among Malays in the Malaysian population, whereas only SNP G2548A is associated with obesity among Tunisians and none of these SNPs was associated with obesity in the Turkish population 15, 16, 17. For T2DM, three SNPs, namely rs2028299 of the adaptor-related protein complex 3 subunit sigma 2 (AP3S2) gene, rs3923113 of the growth factor receptor-bound protein 14 (GRB14) gene and rs4812829 of the hepatocyte nuclear factor 4α (HNF4α) gene, are associated with increased risk of T2DM among the South Asian population, whereas these effects are not observed in white Europeans 14, 18. Thus, genetic variants vary according to nativity: populations within the same continental group show the same allele enrichment or depletion patterns, whereas populations from different continents show distinct patterns 19. As a result, identifying the genes and SNPs involved in the pathogenesis of T2DM and obesity in different populations is important, as it can affect the diagnosis, treatment, and prevention of these diseases in each population 20. Therefore, this systematic review and meta-analysis was conducted to investigate SNPs in candidate genes associated with the development of obesity and T2DM across different ethnic groups in Asian populations.

Systematic review of search results and risk of bias within the studies

The initial database search identified 11,860 articles from Ovid/Embase, Scopus, and the Cochrane, PubMed, Web of Science, and Science Direct databases (Fig. 1). After removal of 630 duplicates and screening of abstracts and titles, 98 articles remained for further screening. During the second screening step for full-text articles, 90 articles related to the study area were selected. After excluding 36 articles with reasons, 54 qualified articles, covering studies conducted in 14 different Asian countries, were included in this systematic review. Of the included articles, 49 (90.74%) were case–control studies and 5 (9.26%) were cross-sectional studies. The total number of cases and controls across the included studies was 58,601. The number of captured SNPs was 76, which mapped onto 41 different genes.

Figure 1. PRISMA flow diagram of study selection process.

The ROBINS-I assessment is shown in Supplementary Table S1 and Supplementary Fig. S1. Based on the ROBINS-I tool, 25 studies were identified as “low risk”, 9 studies were assessed as “moderate risk”, and 3 studies were considered “serious risk”. Due to the distinctiveness of the data extracted from each study, assessments of certainty and sensitivity analyses could not be completed.

SNPs in the Asian obesity population

From the included studies, 38 SNPs were significantly associated with obesity. SNPs of the FTO gene were the most frequently reported, with 10 FTO SNPs associated with obesity (refer to Table 1). FTO rs9939609 was the most reported SNP, supported by 5 studies 25, 26, 27, 28, 29. The next most frequently reported SNPs belonged to the leptin gene, with 7 different SNPs reported in 6 studies 15, 33, 39, 42. For the melanocortin-4-receptor (MC4R) gene, 5 different SNPs were reported in 4 studies 26, 37, 38, 39. For the adiponectin (ADIPOQ) gene, 4 SNPs were reported 33, 35, 36. Lastly, one SNP each was reported for the brain-derived neurotrophic factor gene (BDNF), Syndecan 3 gene (SDC3), beta-2 adrenergic receptor gene (ADRB2), TCF7L2 gene, glucagon-like peptide-1 receptor gene (GLP1R), CDK5 regulatory subunit associated protein 1 like 1 gene (CDKAL1), TMEM18 gene, fas apoptotic inhibitory molecule gene (FAIM2), nuclear receptor coactivator 2 (NCOA2), GA binding protein transcription factor subunit beta 1 gene (GABPB1), Ectonucleotide Pyrophosphatase/Phosphodiesterase 1 gene (ENPP1), Cholesterol ester transfer protein gene (CETP) and combined genotypes of FTO and TCF7L2 11, 30, 31, 33, 34, 40, 41, 42, 43, 44, 45. Detailed information on participants’ recruitment countries is available in Supplementary Table 1.

SNPs in the Asian T2DM population

A total of 55 SNPs across 36 different genes were significantly associated with T2DM (refer to Table 2). As with obesity, SNP rs9939609 of the FTO gene, SNP rs266729 of the ADIPOQ gene, SNP rs12970134 of the MC4R gene, SNP rs6548238 of the TMEM18 gene, SNP rs7754840 of the CDKAL1 gene and SNP rs7138803 of the FAIM2 gene were also reported among the Asian T2DM population 35, 44, 46, 47, 48, 49, 50, 59, 62. The FTO gene and TCF7L2 gene had four reported SNPs each 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, whereas the SLC30A8 gene, insulin-like growth factor 2 mRNA binding protein 2 gene (IGF2BP2), CDKAL1 gene, haematopoietically expressed homeobox gene (HHEX) and KCNQ1 gene had three reported SNPs each 52, 54, 55, 57, 58, 59, 60, 61, 64, 70.

Two SNPs from the ADIPOQ gene and BDNF gene were associated with T2DM 35, 46, 62, 63, 65. One SNP was also reported for each of the following genes for increased risk of T2DM in various Asian ethnicities: sarcoglycan gamma gene (SGCG), PPARγ2 gene, MC4R gene, glucokinase (GCK), adenylate cyclase type 5 gene (ADCY5), cyclin dependent kinase inhibitor 2b gene (CDKN2B), plexin A4 gene (PLXNA4), FAIM2 gene, glucosamine-6-phosphate deaminase 2 gene (GNPDA2), bicoid interacting 3 domain-containing RNA methyltransferase-fas apoptotic inhibitory molecule gene (BCDIN3D-FAIM2), tumour protein p53-inducible nuclear protein 1 gene (TP53INP1), CDKN2A/2B, melatonin receptor 1b (MTNR1B), ENPP1 gene, protein tyrosine phosphatase receptor type D gene (PTPRD), glutathione s-transferase theta 1 gene (GSTT1), glutathione s-transferase mu 1 gene (GSTM1), glutathione s-transferase pi 1 gene (GSTP1), angiotensin I converting enzyme gene (ACE), rho GTPase activating protein 22 gene (ARHGAP22), signal transducer and activator of transcription 4 gene (STAT4), ADP ribosylation factor like GTPase 15 gene (ARL15), dipeptidyl peptidase-4 (DPP-IV), ankyrin repeat and PH domain 1 (ARAP1) and aquaporin-7 gene (AQP7) 46, 49, 50, 54, 55, 57, 66, 67, 68, 69, 71, 72, 73, 74, 75, 76. Detailed information on participants’ recruitment countries is available in Supplementary Table 2.

Meta‑analyses

We first conducted a meta-analysis of the association with obesity of the following SNPs: rs9939609 of the FTO gene; rs17782313, rs571312 and rs12970134 of the MC4R gene; and rs7799039 of the leptin gene (refer to Figs. 2, 3, 4, 5 and 6) 25, 26, 27, 28, 29, 33, 37, 38, 39. The data from the five studies of rs9939609 of the FTO gene under the allelic model (A vs T) yielded a significant association with obesity (OR: 1.37; CI 1.26–1.49; P < 0.00001; I² = 0%) (Fig. 2) 25, 26, 27, 28, 29. Similarly, rs17782313 and rs571312 of the MC4R gene showed a significant association with obesity (OR: 1.36; CI 1.22–1.52; P < 0.00001; I² = 0%, and OR: 1.29; CI 1.11–1.51; P = 0.001; I² = 0%, respectively; Figs. 3 and 4) 26, 37, 38, 39. Although both rs9939609 and rs17782313 were highly significant SNPs with very low P-values, rs9939609 gave a marginally higher OR, indicating a stronger association with obesity.
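As an illustration of how a pooled OR and I² of the kind reported above can be obtained, the following is a minimal sketch of inverse-variance random-effects pooling. It assumes the DerSimonian-Laird estimator of between-study variance; the source states only that a random-effects model in RevMan was used, so the estimator choice, the function name `pool_random_effects` and the per-study inputs below are illustrative assumptions, not the review's data.

```python
import math

def pool_random_effects(log_ors, ses):
    """Pool log odds ratios with a DerSimonian-Laird random-effects model.

    log_ors: per-study log odds ratios; ses: their standard errors.
    """
    w = [1.0 / se ** 2 for se in ses]                       # fixed-effect weights
    fixed = sum(wi * y for wi, y in zip(w, log_ors)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_ors))  # Cochran's Q
    df = len(log_ors) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0         # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0     # heterogeneity, in percent
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]           # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_re, log_ors)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    return {
        "or": math.exp(pooled),
        "ci": (math.exp(pooled - 1.96 * se_pooled),
               math.exp(pooled + 1.96 * se_pooled)),
        "tau2": tau2,
        "i2": i2,
    }

# Hypothetical (OR, SE of log OR) pairs, for illustration only.
studies = [(1.30, 0.10), (1.45, 0.12), (1.35, 0.15)]
result = pool_random_effects([math.log(o) for o, _ in studies],
                             [s for _, s in studies])
print(result)
```

When the estimated between-study variance is zero, as an I² of 0% suggests, the random-effects weights reduce to the fixed-effect weights and both models give the same pooled estimate.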

Figure 2. Forest plot of FTO rs9939609 and obesity using the allelic model (A vs T).

Figure 3. Forest plot of MC4R rs17782313 and obesity using the allelic model (C vs T).

Figure 4. Forest plot of MC4R rs571312 and obesity using the allelic model (A vs C).

Figure 5. Forest plot of MC4R rs12970134 and obesity using the allelic model (A vs G).

Figure 6. Forest plot of leptin rs7799039 and obesity using the allelic model (G vs A).

Next, we conducted a meta-analysis of the association with T2DM of the following SNPs: rs9939609 of the FTO gene, rs7903146 and rs12255372 of the TCF7L2 gene, rs13266634 and rs11558471 of the SLC30A8 gene, rs2237892 and rs2283228 of the KCNQ1 gene, rs266729 of the ADIPOQ gene, rs1801282 of the PPARγ2 gene and rs4402960 of the IGF2BP2 gene (refer to Figs. 7, 8, 9, 10, 11, 12, 13, 14, 15 and 16) 35, 47, 48, 49, 52, 53, 54, 55, 56, 58, 59, 60, 61, 62, 67, 70. From the pooled data, only five SNPs showed a significant association with T2DM under the allelic model. The data from five studies showed that rs7903146 of the TCF7L2 gene was significantly associated with T2DM (OR 1.64; CI 1.38–1.96; P < 0.00001; I² = 40%) (Fig. 8) 52, 53, 54, 55, 56; two studies similarly reported that rs12255372 of the TCF7L2 gene was significantly associated with T2DM under the allelic model G vs T (OR 1.61; CI 1.02–2.54; P = 0.04; I² = 77%) (Fig. 9) 55, 56. In addition, rs13266634 and rs11558471 of the SLC30A8 gene were found to be significantly associated with T2DM (OR 1.22; CI 1.11–1.33; P < 0.0001; I² = 0%, and OR 1.29; CI 1.18–1.41; P < 0.00001; I² = 0%, respectively) (Figs. 10 and 13) 52, 54, 58. Two studies showed that rs2283228 of the KCNQ1 gene under the allelic model C versus A was significantly associated with T2DM (OR 1.60; CI 1.31–1.96; P < 0.00001; I² = 0%; Fig. 12) 60, 61. Comparing the ORs, rs7903146 of the TCF7L2 gene has the strongest association with T2DM.
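The reported ORs, confidence intervals and P-values can be checked against one another. The sketch below recovers the standard error of the log OR from a 95% CI and derives the corresponding z statistic and two-sided P-value under a normal approximation. The helper name `z_and_p_from_ci` is hypothetical, and the normal approximation is an assumption about how such P-values are typically computed rather than a description of the software used in the review.

```python
import math

def z_and_p_from_ci(odds_ratio, ci_low, ci_high):
    """Recover SE(log OR), z and a two-sided p-value from an OR and its 95% CI."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)
    z = math.log(odds_ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p under a standard normal
    return se, z, p

# Values taken from the text: TCF7L2 rs7903146 and rs12255372.
for label, args in [("rs7903146", (1.64, 1.38, 1.96)),
                    ("rs12255372", (1.61, 1.02, 2.54))]:
    se, z, p = z_and_p_from_ci(*args)
    print(f"{label}: SE={se:.3f}, z={z:.2f}, p={p:.2g}")
```

For both SNPs the derived P-values agree with those reported (P < 0.00001 and P = 0.04), which is a quick consistency check on the extracted figures.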

Figure 7. Forest plot of FTO rs9939609 and T2DM using the allelic model (A vs T).

Figure 8. Forest plot of TCF7L2 rs7903146 and T2DM using the allelic model (T vs C).

Figure 9. Forest plot of TCF7L2 rs12255372 and T2DM using the allelic model (G vs T).

Figure 10. Forest plot of SLC30A8 rs13266634 and T2DM using the allelic model (C vs T).

Figure 11. Forest plot of KCNQ1 rs2237892 and T2DM using the allelic model (C vs T).

Figure 12. Forest plot of KCNQ1 rs2283228 and T2DM using the allelic model (C vs A).

Figure 13. Forest plot of SLC30A8 rs11558471 and T2DM using the allelic model (A vs G).

Figure 14. Forest plot of ADIPOQ rs266729 and T2DM using the allelic model (C vs G).

Figure 15. Forest plot of PPARγ2 rs1801282 and T2DM using the allelic model (C vs G).

Figure 16. Forest plot of IGF2BP2 rs4402960 and T2DM using the allelic model (G vs T).

Publication bias

Funnel plots were constructed to assess the publication bias for two SNPs with at least five studies making the analysis feasible, namely FTO rs9939609 associated with obesity and TCF7L2 rs7903146 linked to T2DM. Neither of them showed significant publication bias (Figs. 17 and 18 ).

Figure 17. Funnel plot of FTO rs9939609. Egger’s test does not support the presence of funnel plot asymmetry (intercept: 2.19, 95% CI 0.63–3.75, t: 2.747, P-value: 0.071).

Figure 18. Funnel plot of TCF7L2 rs7903146. Egger’s test does not support the presence of funnel plot asymmetry (intercept: 1.03, 95% CI 1.59–3.65, t: 0.771, P-value: 0.497).
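For readers unfamiliar with Egger's test, the following is a minimal sketch of the regression behind it: the standardized effect (log OR divided by its standard error) is regressed on precision (1/SE), and an intercept far from zero indicates funnel plot asymmetry. The function name `egger_intercept` and the input values are hypothetical and do not reproduce the review's data. The P-value would come from a t distribution with k − 2 degrees of freedom, which is omitted here to keep the sketch free of external libraries.

```python
import math

def egger_intercept(log_ors, ses):
    """Egger's regression: standardized effect on precision, ordinary least squares.

    Returns (intercept, standard error of the intercept, t statistic).
    """
    n = len(log_ors)
    if n < 3:
        raise ValueError("Egger's regression needs at least three studies")
    x = [1.0 / se for se in ses]                    # precision
    y = [b / se for b, se in zip(log_ors, ses)]     # standardized effect
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(r * r for r in resid) / (n - 2)        # residual variance
    se_int = math.sqrt(s2 * (1.0 / n + mx * mx / sxx))
    return intercept, se_int, intercept / se_int

# Hypothetical per-study log ORs and standard errors, for illustration only.
log_ors = [0.26, 0.37, 0.30, 0.45, 0.22]
ses = [0.10, 0.12, 0.15, 0.20, 0.08]
print(egger_intercept(log_ors, ses))
```

With only five studies per SNP, as here, the test has low power, so a non-significant result does not rule out publication bias.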

In this study, we reviewed studies that reported associations between various SNPs of different genes and obesity and T2DM among the Asian population. Our findings indicated that the FTO rs9939609 SNP was associated with both obesity and T2DM. Another SNP, MC4R rs17782313, was strongly associated with obesity, whereas TCF7L2 rs7903146, KCNQ1 rs2237892 and SLC30A8 rs13266634 significantly increased the risk of T2DM development.

The FTO gene is the best-known gene for predisposition to obesity: GWAS identified FTO as an obesity sensitivity gene, and multiple SNPs in the intron 1 region were strongly associated with BMI, body fat rate and waist and hip circumference 77. FTO-induced obesity and increased BMI initiate the progression of T2DM. Fat cells induce insulin resistance and the production of the proinflammatory cytokines leptin, tumor necrosis factor and interleukin 6, which increases fasting blood glucose levels 78. SNP rs9939609 of the FTO gene is significantly associated with both obesity and T2DM in various populations of different Asian countries. For obesity, the association with this SNP was observed among Kuwaiti, Chinese, Pakistani, Indonesian and Japanese populations, with ORs ranging from 1.27 to 3.72 25, 26, 27, 28, 29. Despite different risk alleles, SNP rs9939609 carries a similar obesity risk in the European population, indicating that the FTO gene is associated with increased body weight, elevated BMI and obesity risk across various populations 79, 80. For T2DM, meta-analyses focused solely on FTO gene SNPs, pooling studies conducted in East and South Asian populations, reported positive associations between rs9939609 and T2DM, confirming the involvement of this SNP in susceptibility to T2DM 81. Furthermore, the Norwegian population-based Nord-Trøndelag Health Study (HUNT study) reported a strong association of rs9939609 with both type 2 diabetes (OR 1.13; P = 4.5 × 10⁻⁸) and the risk of developing incident type 2 diabetes (OR 1.16; P = 3.2 × 10⁻⁸) in Scandinavians after adjustment for age, sex and BMI, giving us confidence that this gene predisposes to T2DM 82, 83.

The next common gene associated with obesity risk among Asians is the MC4R gene, which regulates food intake and energy homeostasis via the hormone leptin. Among 5 reported SNPs (rs17782313, rs2331841, rs6567160, rs571312 and rs12970134), rs17782313 was captured in 3 different studies with ORs ranging from 1.3 to 1.87 27 , 37 , 38 . GWAS studies have identified that the polymorphism of rs17782313 of the MC4R gene is also associated with obesity risk among Europeans (OR 1.12; 95% CI 1.08–1.16) and this variant contributes to increased BMIs in Europeans and East Asians 33 . It is also well established that the MC4R variant CC genotype of rs17782313 is associated with a higher intake of energy and a higher percentage of energy from fatty diets 34 , 84 .

Next, 3 SNPs of the ADIPOQ gene have been reported to be linked with obesity. GWAS has identified that SNPs of the ADIPOQ gene can decrease the serum levels of adiponectin and alter metabolic traits, such as waist-hip ratio 85. Two SNPs (rs822396 and rs1501299) of the ADIPOQ gene were reported among North Indian populations, whereas another SNP (rs266729) was reported from Taipei. A case–control study conducted among South Indians replicated similar findings, whereby SNPs rs822396 and rs1501299 are associated with obesity and central obesity 51. One meta-analysis in the Chinese population found that SNPs in the ADIPOQ gene were positively linked to metabolic syndrome (which predisposes to obesity and T2DM) 86. However, there are some controversial results about ADIPOQ gene polymorphisms in the Asian population. The current understanding is that ADIPOQ SNPs alter the concentrations of adiponectin proteins, leading to metabolic changes that result in obesity 87. However, in Malaysian Malays, one study found no effect on adiponectin levels in individuals carrying SNPs of the ADIPOQ gene 87. Another study found that ADIPOQ rs266729, which has previously been associated with obesity in the Indian and Thai populations, is not associated with obesity in the Taiwanese population 52, 68, 88.

Amongst 3 SNPs (rs7903146, rs6585205 and rs12255372) of the TCF7L2 gene, rs7903146 is significantly associated with the development of T2DM among Chinese, Indian, Thai and Palestinian populations, with ORs ranging from 1.11 to 3.34. A case–control study conducted among the Thai population reported that SNP rs7903146 of the TCF7L2 gene is associated with the development of T2DM (OR 1.7; 95% CI 1.06–2.72) 52. Similarly, the risk allele T of rs7903146 was associated with T2DM in three ethnic groups: Caucasians (OR 1.573; 95% CI 1.100–2.250; P = 0.0131), African Americans (OR 2.011; 95% CI 1.265–3.196; P = 0.003), and Hispanics (OR 1.897; 95% CI 1.204–2.989; P = 0.006) 89. This might be due to overexpression of the risk allele of the TCF7L2 gene in β cells, which results in reduced insulin secretion and causes a predisposition to T2DM directly and indirectly 90, 91.

This is the first study to reveal, via systematic review and meta-analysis, an SNP that could increase the risk of both obesity and T2DM in the Asian population, namely FTO rs9939609. Several limitations of this study warrant consideration. Firstly, there is a potential for language bias since we excluded articles not published in English; however, we speculate that the number is probably small and unlikely to affect our findings. Secondly, one notable limitation of this systematic review is the inability to conduct a mediation analysis. This limitation arises from the unavailability of raw data from the included studies. Mediation analysis requires individual-level data to evaluate the impact of the mediating variables between independent and dependent variables. Without access to these data, it was not possible to explore the potential pathways and mechanisms underlying the observed effects. Thirdly, the effect size was not established for the SNPs included in our review. As a result, ORs have been used as an alternative (but valid) criterion to assess the association of the SNPs with obesity and/or T2DM. Another limitation is that we were not able to examine and correct for population stratification. The absence of effect size calculation and the lack of a clear consideration of potential biases or heterogeneity in the meta-analysis might impact the robustness and interpretability of the results. Moreover, information such as confidence intervals (CIs), effect size and risk allele frequency (RAF) was lacking in certain articles. Finally, our meta-analysis results may be affected by the confounding factors present in the original studies from which our data were taken. This is because we specifically used the allele frequencies provided for our meta-analysis, and this could explain any discrepancies found in the reported ORs between the original studies and our meta-analyses.

In summary, we have presented a systematic review of SNPs associated with the development of obesity and T2DM among the Asian population. From the meta-analysis we conducted to compare the individual allele effects of SNPs that were reported more than once, we found that FTO rs9939609 was the SNP most strongly associated with obesity (OR 1.37; 95% CI 1.26–1.49), while TCF7L2 rs7903146 was the SNP most strongly associated with T2DM (OR 1.64; 95% CI 1.38–1.96). As T2DM and obesity are multicausal disorders, these findings can help in developing an Asian-specific gene screening panel for assessing obesity and T2DM susceptibility. However, large-scale genome-wide association studies and larger population cohort studies are required in the future to further validate these candidate SNPs among Asians.

Study design

In conducting this systematic review and meta-analysis, we adhered to established guidelines to ensure methodological rigor and transparency. Specifically, we followed the recommendations of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement for reviewing all reported SNPs that were associated with obesity and T2DM among Asians 21. Additionally, we adhered to the Meta-analysis of Observational Studies in Epidemiology (MOOSE) guidelines 22 in the planning, execution, and reporting of our meta-analysis.

Search strategy

An electronic literature search of peer-reviewed research articles reporting case–control and cross-sectional studies published between January 2005 and April 2024 was conducted to identify SNPs associated with obesity and T2DM among Asians. Two investigators independently identified articles (titles, abstracts, and then full texts) and screened them sequentially against the inclusion criteria. We searched six literature databases: Ovid/Embase, Scopus, and the Cochrane, PubMed, Web of Science, and Science Direct databases. The search terms were “SNPs” AND “adults”. These terms were used in combination with one of the following terms: (obesity OR type 2 diabetes OR T2DM) AND (Country). For example, “SNPs” AND “adults” AND “Type 2 diabetes” AND “Malaysia”.
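The query pattern described above can be generated programmatically so that every condition and country combination is covered consistently. The sketch below is illustrative only: the function name `build_queries` is hypothetical, and the country list is a placeholder, since the source names only Malaysia in its example.

```python
from itertools import product

def build_queries(conditions, countries):
    """Build one boolean search string per (condition, country) pair."""
    return [
        f'"SNPs" AND "adults" AND "{condition}" AND "{country}"'
        for condition, country in product(conditions, countries)
    ]

conditions = ["obesity", "Type 2 diabetes", "T2DM"]
countries = ["Malaysia", "Japan"]  # hypothetical subset; only Malaysia appears in the source

for query in build_queries(conditions, countries):
    print(query)
```

Database-specific syntax (field tags, truncation, quoting rules) differs between the six databases, so strings produced this way would still need adapting to each interface.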

Selection criteria

Prior to the literature search, selection criteria were established to avoid selection bias. Our selection criteria included (i) articles published in English, (ii) original papers containing independent data from studies conducted in humans, (iii) research articles reporting case–control or cross-sectional studies or randomized controlled trials, (iv) articles that reported only on Asian countries, (v) articles containing studies that compare healthy adults with adults with obesity and/or T2DM, and (vi) articles with genetic variants associated with obesity or T2DM that report an odds ratio (OR) and 95% confidence intervals (CIs). All articles that did not meet our inclusion criteria were excluded. Articles eligible for further review were identified by the authors through initial screening of the search results. The second screening was based on a full-text review according to the selection criteria. The process of searching and selection was independently performed by two reviewers (K.Y. and J.N.Y.E) and followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow diagram and the Meta-analysis of Observational Studies in Epidemiology (MOOSE) guidelines 21, 22. Any disagreement between the two reviewers was resolved through discussion and consensus with a third reviewer (N.P.Y).

Data extraction

Information was extracted from all eligible studies independently by two authors. Our search strategy returned 11,860 records. These were exported to Mendeley, where 630 duplicates were detected and removed. According to our selection criteria, 54 studies were selected for full-text screening. The selection was done by three reviewers independently to ensure that the data were captured correctly. The following information was extracted from each study: country, gene, SNPs, study design, sample size, the average age of participants, disease diagnostic standard, odds ratio (OR), 95% confidence intervals (CIs), and author and year of publication. To account for confounding factors in the studies, we used the adjusted ORs whenever provided by the original authors. For SNPs that were reported more than once, the allelic frequencies were also collected, where available, for use in our meta-analysis.

Risk of bias

The searching and selection process was independently performed by two reviewers (K.Y. and J.N.Y.E) and followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow diagram. Any disagreement between the two reviewers was resolved through discussion and consensus with a third reviewer (N.P.Y). In addition, we used the ROBINS-I tool to evaluate the risk of bias for all the included articles across seven domains (Supplementary Figs. 1 and 2) 23. Two authors (K.Y. and J.N.Y.E) independently assessed the risk of bias. Any disagreement on the risk of bias score was resolved by N.P.Y. We assessed bias due to confounding according to whether the case and control groups were matched by age and gender. Bias in selection of study participants, classification of interventions, deviations from intended interventions, and measurement of outcomes was rated "Low" or "Moderate". Bias due to missing data was rated according to whether data were reported completely. Bias in the selection of the reported result was rated according to whether the outcome was reported completely. Based on the evaluation of the seven domains, we computed the overall risk of bias for each study and reported it as low, moderate, or serious 24.
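The text does not state the exact rule used to combine the seven domain ratings. ROBINS-I guidance generally sets the overall judgement to the most severe domain-level judgement, so the following Python sketch assumes that rule. The function name and the example ratings are hypothetical.

```python
# Rating levels in increasing order of severity (ROBINS-I terminology).
SEVERITY = ["low", "moderate", "serious", "critical"]


def overall_risk_of_bias(domain_ratings):
    """Return the most severe rating across the seven ROBINS-I domains.

    Assumes the overall judgement equals the worst domain judgement.
    """
    if len(domain_ratings) != 7:
        raise ValueError("ROBINS-I expects ratings for seven domains")
    ratings = [rating.lower() for rating in domain_ratings.values()]
    return max(ratings, key=SEVERITY.index)


# Hypothetical ratings for a single study, for illustration only.
example_study = {
    "confounding": "moderate",
    "selection of participants": "low",
    "classification of interventions": "low",
    "deviations from intended interventions": "low",
    "missing data": "low",
    "measurement of outcomes": "moderate",
    "selection of the reported result": "low",
}
print(overall_risk_of_bias(example_study))
```

Under this rule a study is rated low overall only when every domain is low, and a single serious domain makes the whole study serious.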

Statistical analysis

Forest plots for meta-analysis were generated using ReviewManager (RevMan) 5.4.1 (The Cochrane Collaboration, Copenhagen) software. In our meta-analyses, the summary ORs and 95% CIs were calculated using the Mantel–Haenszel statistical method under a random-effects model, chosen because the comparison of data from the three papers on FTO rs9939609 and T2D yielded an I² value of 95%. Study estimates (ORs and 95% CIs) were summarized whenever there were two or more studies for a variant.
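To illustrate how reported ORs and 95% CIs can be pooled under a random-effects model, the sketch below uses the inverse-variance DerSimonian–Laird estimator on the log-OR scale. This is a simplification and not the Mantel–Haenszel variant implemented in RevMan, which works from 2×2 counts. The function names and the three input estimates are hypothetical.

```python
import math


def log_or_and_se(odds_ratio, lower, upper):
    """Recover log(OR) and its standard error from an OR and its 95% CI."""
    se = (math.log(upper) - math.log(lower)) / (2 * 1.96)
    return math.log(odds_ratio), se


def dersimonian_laird(estimates):
    """Pool (OR, lower, upper) tuples; return pooled OR, 95% CI and I² (%)."""
    ys, ses = zip(*(log_or_and_se(*e) for e in estimates))
    w = [1 / se ** 2 for se in ses]                      # fixed-effect weights
    fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, ys))  # Cochran's Q
    df = len(ys) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0      # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_re = [1 / (se ** 2 + tau2) for se in ses]          # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se_pooled = math.sqrt(1 / sum(w_re))
    return (math.exp(pooled),
            math.exp(pooled - 1.96 * se_pooled),
            math.exp(pooled + 1.96 * se_pooled),
            i2)


# Hypothetical study estimates (OR, lower CI, upper CI), for illustration only.
studies = [(1.1, 0.9, 1.35), (2.0, 1.6, 2.5), (1.4, 1.1, 1.8)]
pooled_or, ci_low, ci_high, i2 = dersimonian_laird(studies)
print(f"pooled OR {pooled_or:.2f} ({ci_low:.2f}-{ci_high:.2f}), I2 = {i2:.0f}%")
```

When I² is high, as in the FTO rs9939609 comparison mentioned above, the between-study variance widens the pooled confidence interval relative to a fixed-effect analysis.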

Data availability

All data and materials used in this review are included in the main text.

Abbreviations

<: Smaller than

>: Greater than

≥: Greater than or equal to

Bigna, J. J. & Noubiap, J. J. The rising burden of non-communicable diseases in sub-Saharan Africa. Lancet Glob. Health 7 (10), e1295–e1296 (2019).

Zhao, N. N. et al. FTO gene polymorphisms and obesity risk in Chinese population: A meta-analysis. World J. Pediatr. 15 (4), 382–389 (2019).

Dietrich, S. et al. Gene-lifestyle interaction on risk of type 2 diabetes: A systematic review. Obes. Rev. 20 (11), 1557–1571 (2019).

Raja Kumar, S., Muhammad, N., Fernandez, M. N. & Mohd Fahami, N. A. Preventive effects of Moringa oleifera on obesity and hyperlipidaemia: A systematic review. Sains Malays. 51 (7), 2159–2171. https://doi.org/10.17576/jsm-2022-5107-18 (2022).

Abdullah, N., Attia, J., Oldmeadow, C., Scott, R. J. & Holliday, E. G. The architecture of risk for type 2 diabetes: Understanding Asia in the context of global findings. Int. J. Endocrinol. 2014 , 1–21. https://doi.org/10.1155/2014/593982 (2014).

Yusof, H. et al. Technology advancement enabling the link of gut microbiota with obesity and metabolic disorder. J. Sains Kesihat. Malays. 13 (1), 77–91. https://doi.org/10.17576/jskm-2015-1301-10 (2015).

Ruze, R. et al. Obesity and type 2 diabetes mellitus: Connections in epidemiology, pathogenesis, and treatments. Front. Endocrinol. https://doi.org/10.3389/fendo.2023.1161521 (2023).

Nik, N. I. & Shahar, S. Dietary patterns of the metabolic syndrome among older adults in Malaysia. J. Sains Kesihat. Malays. 16 (si), 237. https://doi.org/10.17576/jskm-2018-41 (2018).

Mazlan, A., Adzharuddin, N. A., Omar, S. Z. & Tamam, E. Online health information seeking behavior of non-communicable disease (NCD) among government employees in Putrajaya Malaysia. J. Komun.: Malays. J. Commun. 37 (1), 419–433. https://doi.org/10.17576/jkmjc-2021-3701-24 (2021).

Teh, C. P. et al. Association between polymorphisms of insulin and insulin receptor gene with childhood obesity in Malay population. J. Sains Kesihat. Malays. 14 (1), 5–9. https://doi.org/10.17576/jskm-2016-1401-02 (2016).

Huang, W. H., Hwang, L. C., Chan, H. L., Lin, H. Y. & Lin, Y. H. Study of seven single-nucleotide polymorphisms identified in East Asians for association with obesity in a Taiwanese population. BMJ Open https://doi.org/10.1136/bmjopen-2016-011713 (2016).

Sergey, C., Agarval, R., Singh, R., Wilzynsca, A. & Meester, F. Can environmental factors predispose noncommunicable diseases?. Open Nutraceuticals J. 4 (1), 45–51 (2011).

Abuhendi, N. et al. Genetic polymorphisms associated with type 2 diabetes in the Arab world: A systematic review and meta-analysis. Diabet. Res. Clin. Pract. 151 , 198–208 (2019).

Yako, Y. Y. et al. Genetic risk of type 2 diabetes in populations of the African continent: A systematic review and meta-analyses. Diabet. Res. Clin. Pract. 114 , 136–150 (2016).

Wan Rohani, W. T., Aryati, A. & Amiratul, A. S. Haplotype analysis of leptin gene polymorphisms in obesity among Malays in Terengganu, Malaysia population. Med. J. Malays. 73 (5), 281–285 (2018).

Boumaiza, I. et al. Relationship between leptin G2548A and leptin receptor Q223R gene polymorphisms and obesity and metabolic syndrome risk in Tunisian volunteers. Genet. Test Mol. Biomark. 16 (7), 726–733 (2012).

Şahın, S. et al. Investigation of associations between obesity and LEP G2548A and LEPR 668A/G polymorphisms in a Turkish population. Dis Mark. 35 (6), 673–677 (2013).

Sohani, Z. N. et al. Does genetic heterogeneity account for the divergent risk of type 2 diabetes in South Asian and white European populations?. Diabetologia 57 (11), 2270–2281 (2014).

Mao, L., Fang, Y., Campbell, M. & Southerland, W. M. Population differentiation in allele frequencies of obesity-associated SNPs. BMC Genom. 18 (1), 861. https://doi.org/10.1186/s12864-017-4262-9 (2017).

McCarthy, M. I. Genomics, type 2 diabetes, and obesity. N. Engl. J. Med. 363 (24), 2339–2350 (2010).

Kelishadi, R., Hovsepian, S. & Haghjooy Javanmard, S. A systematic review of single nucleotide polymorphisms associated with metabolic syndrome in children and adolescents. J. Pediatr. Rev. https://doi.org/10.5812/jpr.10536 (2017).

Stroup, D. F. Meta-analysis of observational studies in epidemiology: A proposal for reporting. JAMA 283 (15), 2008 (2000).

Sterne, J. A. et al. ROBINS-I: A tool for assessing risk of bias in non-randomised studies of interventions. BMJ 355 , i4919. https://doi.org/10.1136/bmj.i4919 (2016).

McGuinness, L. A. & Higgins, J. P. T. Risk-of-bias VISualization (robvis): An R package and Shiny web app for visualizing risk-of-bias assessments. Res. Synth. Methods 12 (1), 55–61. https://doi.org/10.1002/jrsm.1411 (2021).

Al-Serri, A. et al. Association of FTO rs9939609 with obesity in the Kuwaiti population: A public health concern?. Med. Princ. Pract. 27 (2), 145–151 (2018).

Huang, W., Sun, Y. & Sun, J. Combined effects of FTO rs9939609 and MC4R rs17782313 on obesity and BMI in Chinese Han populations. Endocrine 39 (1), 69–74 (2011).

Shabana, H. S. Effect of the common fat mass and obesity associated gene variants on obesity in Pakistani population: A case-control study. Biomed. Res. Int. 2015 , 852920 (2015).

Daya, M. et al. Obesity risk and preference for high dietary fat intake are determined by FTO rs9939609 gene polymorphism in selected Indonesian adults. Asia Pac. J. Clin. Nutr. 28 (1), 183–191 (2019).

Hotta, K. et al. Variations in the FTO gene are associated with severe obesity in the Japanese. J. Hum. Genet. 53 (6), 546–553 (2008).

Tan, P. Y. & Mitra, S. R. The combined effect of polygenic risk from FTO and ADRB2 gene variants, odds of obesity, and post-hipcref diet differences. Lifestyle Genom. 13 (2), 84–98 (2020).

Alharbi, K. K. et al. Influence of adiposity-related genetic markers in a population of saudi arabians where other variables influencing obesity may be reduced. Dis. Mark. 2014 , 758232 (2014).

Ramya, K., Radha, V., Ghosh, S., Majumder, P. & Mohan, V. Genetic variations in the FTO gene are associated with type 2 diabetes and obesity in south Indians (CURES-79). Diabet. Technol. Ther. 13 (1), 33–42 (2011).

Saqlain, M. et al. Risk variants of obesity associated genes demonstrate BMI raising effect in a large cohort. PLOS ONE 17 (9), e0274904 (2022).

Srivastava, A. et al. A multianalytical approach to evaluate the association of 55 SNPs in 28 genes with obesity risk in North Indian adults. Amer. J. Human Biol. 29 (2), e22923 (2016).

Hsiao, T. J. & Lin, E. A validation study of adiponectin rs266729 gene variant with type 2 diabetes, obesity, and metabolic phenotypes in a Taiwanese population. Biochem. Genet. 54 (6), 830–841 (2016).

Kaur, H., Badaruddoza, B., Bains, V. & Kaur, A. Genetic association of ADIPOQ gene variants (-3971A>G and +276G>T) with obesity and metabolic syndrome in North Indian Punjabi population. PLoS One 13 (9), e0204502 (2018).

Mutombo, P. B. et al. MC4R rs17782313 gene polymorphism was associated with glycated hemoglobin independently of its effect on BMI in Japanese: the Shimane COHRE study. Endocr. Res. 39 (3), 115–119 (2014).

Gao, L. et al. MC4R single nucleotide polymorphisms were associated with metabolically healthy and unhealthy obesity in Chinese northern Han populations. Int. J. Endocrinol. 6 (2019), 4328909 (2019).

Sharma, T. & Badaruddoza, B. Association of Leptin-Melanocortin gene polymorphisms with the risk of obesity in northwest Indian population. Egypt. J. Med. Human Genet. https://doi.org/10.1186/s43042-024-00529-y (2024).

Lu, Y. et al. Association of NCOA2 gene polymorphisms with obesity and dyslipidemia in the Chinese Han population. Int. J. Clin. Exp. Pathol. 8 (6), 7341–7349 (2015).

Umapathy, D. et al. Association of SNP rs7181866 in the nuclear respiratory factor-2 beta subunit encoding GABPB1 gene with obesity and type-2 diabetes mellitus in South Indian population. Int. J. Biol. Macromol. 1 (132), 606–614 (2019).

Shabana, Shahid, S. U. & Hasnain, S. Use of a gene score of multiple low-modest effect size variants can predict the risk of obesity better than the individual snps. Lipids Health Dis. https://doi.org/10.1186/s12944-018-0806-5 (2018).

Xu, T. et al. Associations of TCF7L2 RS11196218 (A/G) and GLP-1R RS761386 (C/T) gene polymorphisms with obesity in Chinese population. Diab. Metab. Syndr. Obes.: Targ. Ther. 14 , 2465–2472 (2021).

Kang, J., Guan, R., Zhao, Y. & Chen, Y. Obesity-related loci in TMEM18, CDKAL1 and FAIM2 are associated with obesity and type 2 diabetes in Chinese Han patients. BMC Med. Genet. https://doi.org/10.1186/s12881-020-00999-y (2020).

Arianti, R. et al. Influence of single nucleotide polymorphism of ENPP1 and ADIPOQ on insulin resistance and obesity: A case–control study in a Javanese population. Life 11 (6), 552 (2021).

Takeuchi, F. et al. Association of genetic variants for susceptibility to obesity with type 2 diabetes in Japanese individuals. Diabetologia 54 (6), 1350–1359 (2011).

Binh, T. Q. et al. Association of the common FTO-rs9939609 polymorphism with type 2 diabetes, independent of obesity-related traits in a Vietnamese population. Gene 513 (1), 31–35 (2013).

Fawwad, A. et al. Common variant within the FTO gene, rs9939609, obesity and type 2 diabetes in population of Karachi, Pakistan. Diabet. Metab. Syndr. 10 (1), 43–47 (2016).

Phani, N. M. et al. Implications of critical PPARγ2, ADIPOQ and FTO gene polymorphisms in type 2 diabetes and obesity-mediated susceptibility to type 2 diabetes in an Indian population. Mol. Genet. Gen. 291 (1), 193–204 (2016).

Abdullah, N. et al. Characterizing the genetic risk for Type 2 diabetes in a Malaysian multi-ethnic cohort. Diabet. Med. 32 (10), 1377–1380 (2015).

Ramya, K., Ayyappa, K. A., Ghosh, S., Mohan, V. & Radha, V. Genetic association of ADIPOQ gene variants with type 2 diabetes, obesity and serum adiponectin levels in south Indian population. Gene 532 (2), 253–262 (2013).

Lin, Y. et al. Association study of genetic variants in eight genes/loci with type 2 diabetes in a Han Chinese population. BMC Med. Genet. 15 (11), 97 (2010).

Ereqat, S. et al. Association of a common variant in TCF7L2 gene with type 2 diabetes mellitus in the Palestinian population. Acta Diabetol. 47 (Suppl 1), 195–198 (2010).

Plengvidhya, N. et al. Impact of KCNQ1, CDKN2A/2B, CDKAL1, HHEX, MTNR1B, SLC30A8, TCF7L2, and UBE2E2 on risk of developing type 2 diabetes in Thai population. BMC Med. Genet. 19 (1), 93 (2018).

Shitomi-Jones, L. M., Akam, L., Hunter, D., Singh, P. & Mastana, S. Genetic risk scores for the determination of type 2 diabetes mellitus (T2DM) in North India. Int. J. Environ. Res. Pub. Health 20 (4), 3729 (2023).

Anjum, N., Jehangir, A. & Liu, Y. Two TCF7L2 variants associated with type 2 diabetes in the Han nationality residents of China. J. Coll. Phys. Surg.-Pak.: JCPSP 28 (10), 794–797 (2018).

Liu, J. et al. Association of 48 type 2 diabetes susceptibility loci with fasting plasma glucose and lipid levels in Chinese Hans. Diabet. Res. Clin. Pract. 139 , 114–121 (2018).

Seman, N. A., Mohamud, W. N., Östenson, C.-G., Brismar, K. & Gu, H. F. Increased DNA methylation of the SLC30A8 gene promoter is associated with type 2 diabetes in a Malay population. Clin. Epigenet. https://doi.org/10.1186/s13148-015-0049-5 (2015).

Wu, Q. et al. CDKAL1, KCNQ1, and IGF2BP2 are identified as type 2 diabetes susceptibility genes in a regional Chinese population. Brit. J. Med. Med. Res. 9 (6), 1–8 (2015).

Saif-Ali, R. et al. KCNQ1 haplotypes associate with type 2 diabetes in Malaysian Chinese Subjects. Int. J. Mol. Sci. 12 (9), 5705–5718 (2011).

Saif-Ali, R. et al. KCNQ1 variants associate with type 2 diabetes in Malaysian Malay subjects. Ann. Acad. Med. Singap. 40 (11), 488–492 (2011).

Truong, S. et al. Association of ADIPOQ single-nucleotide polymorphisms with the two clinical phenotypes type 2 diabetes mellitus and metabolic syndrome in a Kinh Vietnamese population. Diabet. Metab. Syndr. Obes.: Targ. Ther. 15 , 307–319 (2022).

Karimi, H., Nezhadali, M. & Hedayati, M. Association between adiponectin rs17300539 and rs266729 gene polymorphisms with serum adiponectin level in an Iranian diabetic/pre-diabetic population. Endocr. Regul. 52 (4), 176–184 (2018).

Mansoori, Y., Daraei, A., Naghizadeh, M. M. & Salehi, R. Significance of a common variant in the CDKAL1 gene with susceptibility to type 2 diabetes mellitus in Iranian population. Adv. Biomed. Res. 11 (4), 45 (2015).

Han, X. et al. Rs4074134 near BDNF gene is associated with type 2 diabetes mellitus in Chinese Han population independently of body mass index. PLoS One 8 (2), e56898 (2013).

Hsiao, T. J. & Lin, E. The ENPP1 K121Q polymorphism is associated with type 2 diabetes and related metabolic phenotypes in a Taiwanese population. Mol. Cell. Endocrinol. 15 (433), 20–25 (2016).

Baqer, S. K. & Shwan, N. A. A. PPARG and FTO gene variants and their association with type 2 diabetes in the Kurdish population. Passer J. Basic Appl. Sci. 5 (1), 171–177 (2023).

Wang, Y. et al. Associations between aquaglyceroporin gene polymorphisms and risk of type 2 diabetes mellitus. Biomed. Res. Int. 27 (2018), 8167538 (2018).

Chen, M. et al. Three single nucleotide polymorphisms associated with type 2 diabetes mellitus in a Chinese population. Exp. Ther. Med. 13 (1), 121–126 (2016).

Li, Y. et al. Evidence of association between single-nucleotide polymorphisms in lipid metabolism-related genes and type 2 diabetes mellitus in a Chinese population. Int. J. Med. Sci. 18 (2), 356–363 (2021).

Saif-Ali, R. et al. Association of protein tyrosine phosphatase receptor type D and serine racemase genetic variants with type 2 diabetes in Malaysian Indians. Indian J. Endocrinol. Metab. 28 (1), 55–59 (2024).

Li, J. et al. Impact of diabetes-related gene polymorphisms on the clinical characteristics of type 2 diabetes Chinese Han population. Oncotarget 7 (51), 85464–85471 (2016).

Cui, J. et al. Association between Stat4 gene polymorphism and type 2 diabetes risk in Chinese Han population. BMC Med. Genom. 14 (1), 1–6 (2021).

Cui, Z. et al. Association between single nucleotide polymorphisms (snps) in the gene of ADP-ribosylation factor-like 15 (ARL15) and type 2 diabetes (T2D) in Korean Chinese population in Yanbian, China. Int. J. Diabet. Dev. Ctries. 37 (2), 124–128 (2015).

Bhargave, A., Devi, K., Ahmad, I., Yadav, A. & Gupta, R. Genetic variation in DPP-IV gene linked to predisposition of T2DM: A case control study. J. Diabet. Metab. Disord. 21 (2), 1709–1716 (2022).

Li, Y. et al. Identifying the association between single nucleotide polymorphisms in KCNQ1, ARAP1, and KCNJ11 and type 2 diabetes mellitus in a Chinese population. Int. J. Med. Sci. 17 (15), 2379–2386 (2020).

Lan, N. et al. FTO: A common genetic basis for obesity and cancer. Front. Genet. https://doi.org/10.3389/fgene.2020.559138 (2020).

Wang, Q. et al. Relationship between fat mass and obesity-associated gene expression and type 2 diabetes mellitus severity. Exp. Therap. Med. https://doi.org/10.3892/etm.2018.5752 (2018).

Chang, Y. C. et al. Common variation in the fat mass and obesity-associated (FTO) gene confers risk of obesity and modulates BMI in the chinese population. Diabetes 57 (8), 2245–2252 (2008).

Loos, R. J. & Yeo, G. S. The bigger picture of FTO: The first GWAS-identified obesity gene. Nat. Rev. Endocrinol. 10 (1), 51–61 (2014).

Yang, Y. et al. FTO genotype and type 2 diabetes mellitus: Spatial analysis and meta-analysis of 62 case-control studies from different regions. Genes 8 (2), 70. https://doi.org/10.3390/genes8020070 (2017).

Hertel, J. K. et al. fto, type 2 diabetes, and weight gain throughout adult life. Diabetes 60 (5), 1637–1644. https://doi.org/10.2337/db10-1340 (2011).

Xi, B., Chandak, G. R., Shen, Y., Wang, Q. & Zhou, D. Association between common polymorphism near the MC4R gene and obesity risk: A systematic review and meta-analysis. PLoS ONE 7 (9), e45731 (2012).

Lu, J. F. et al. Association of ADIPOQ polymorphisms with obesity risk: A meta-analysis. Hum. Immunol. 75 (10), 1062–1068 (2014).

Zhou, J. M. et al. Association of the ADIPOQ Rs2241766 and Rs266729 polymorphisms with metabolic syndrome in the Chinese population: A meta-analysis. Biomed. Environ. Sci. 29 (7), 505–515. https://doi.org/10.3967/bes2016.066 (2016).

Jee, S. H. et al. Adiponectin concentrations: A genome-wide association study. Am. J. Hum. Genet. 87 (4), 545–552. https://doi.org/10.1016/j.ajhg.2010.09.004 (2010).

Apalasamy, Y. D. et al. Association of ADIPOQ gene with obesity and adiponectin levels in Malaysian Malays. Mol. Biol. Rep. 41 (5), 2917–2921 (2014).

Suriyaprom, K., Phonrat Msc, B. & Phd, R. T. Association of adiponectin gene-11377C>G polymorphism with adiponectin levels and the metabolic syndrome in Thais. Asia Pac. J. Clin. Nutr. 23 (1), 167–173. https://doi.org/10.6133/apjcn.2014.23.1.01 (2014).

Cropano, C. et al. The RS7903146 variant in the tcf7l2 gene increases the risk of prediabetes/type 2 diabetes in obese adolescents by impairing β-cell function and hepatic insulin sensitivity. Diabet. Care 40 (8), 1082–1089. https://doi.org/10.2337/dc17-0290 (2017).

Hattersley, A. T. Prime suspect: The TCF7L2 gene and type 2 diabetes risk. J. Clin. Invest. 117 (8), 2077–2079. https://doi.org/10.1172/jci33077 (2007).

Ding, W. et al. Meta-analysis of association between TCF7L2 polymorphism rs7903146 and type 2 diabetes mellitus. BMC Med. Genet. 19 (1), 38 (2018).

Acknowledgements

The authors thank Ms Serene Lalitha Kandikatti for proofreading the paper.

This research received funding from the internal university grants Dana Cabaran Perdana (DCP-2018-005/1) and Geran Universiti Penyelidikan (GUP-2021–009) from Universiti Kebangsaan Malaysia. The funder had no role in study methods, data analysis, decision to publish, or preparation of the manuscript. The review is registered in PROSPERO under the name Kevina A/P N Yanasegaran, registration number CRD42020164731.

Author information

These authors contributed equally: Kevina Yanasegaran, Jeremy Yung Ern Ng, Eng Wee Chua, Azmawati Mohammed Nawi, Pei Yuen Ng and Mohd Rizal Abdul Manaf.

Authors and Affiliations

Centre for Drug and Herbal Development, Faculty of Pharmacy, Universiti Kebangsaan Malaysia, 50300, Kuala Lumpur, Malaysia

Kevina Yanasegaran, Jeremy Yung Ern Ng, Eng Wee Chua & Pei Yuen Ng

Department of Public Health Medicine, Faculty of Medicine, Universiti Kebangsaan Malaysia, 56000, Kuala Lumpur, Malaysia

Azmawati Mohammed Nawi & Mohd Rizal Abdul Manaf

Contributions

Conceptualization, K.Y.; methodology, K.Y.; software, K.Y. and J.N.Y.E.; validation, K.Y. and J.N.Y.E.; formal analysis, K.Y. and J.N.Y.E.; data curation, writing—original draft preparation, K.Y.; writing—review and editing, K.Y., J.N.Y.E., N.P.Y., M.R.A.M., E.W.C. and A.M.N.; supervision, N.P.Y., M.R.A.M., E.W.C. and A.M.N.; funding acquisition, N.P.Y. and M.R.A.M. All authors have reviewed and agreed to the published version of the manuscript.

Corresponding authors

Correspondence to Pei Yuen Ng or Mohd Rizal Abdul Manaf .

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/ .

About this article

Cite this article

Yanasegaran, K., Ng, J.Y.E., Chua, E.W. et al. Single nucleotide polymorphisms (SNPs) that are associated with obesity and type 2 diabetes among Asians: a systematic review and meta-analysis. Sci Rep 14 , 20062 (2024). https://doi.org/10.1038/s41598-024-70674-2

Received : 18 April 2024

Accepted : 20 August 2024

Published : 29 August 2024

DOI : https://doi.org/10.1038/s41598-024-70674-2

  • Open access
  • Published: 31 August 2024

Impaired glucose metabolism and the risk of vascular events and mortality after ischemic stroke: A systematic review and meta-analysis

  • Nurcennet Kaynak   ORCID: orcid.org/0000-0002-0637-8421 1 , 2 , 3 , 4 , 5 ,
  • Valentin Kennel   ORCID: orcid.org/0009-0000-0354-4167 1 , 2 ,
  • Torsten Rackoll   ORCID: orcid.org/0000-0003-2170-5803 2 , 6 ,
  • Daniel Schulze   ORCID: orcid.org/0000-0001-9415-2555 7 ,
  • Matthias Endres   ORCID: orcid.org/0000-0001-6520-3720 1 , 2 , 4 , 5 , 8 &
  • Alexander H. Nave   ORCID: orcid.org/0000-0002-0101-4557 1 , 2 , 3 , 5  

Cardiovascular Diabetology volume 23, Article number: 323 (2024)

Diabetes mellitus (DM), prediabetes, and insulin resistance are highly prevalent in patients with ischemic stroke (IS). DM is associated with higher risk for poor outcomes after IS.

Investigate the risk of recurrent vascular events and mortality associated with impaired glucose metabolism compared to normoglycemia in patients with IS and transient ischemic attack (TIA).

A systematic literature search was performed in PubMed, Embase, and the Cochrane Library on 21 March 2024 and via citation searching. Studies that comprised IS or TIA patients and exposures of impaired glucose metabolism were eligible. The Study Quality Assessment Tool was used for risk of bias assessment. Covariate-adjusted outcomes were pooled using random-effects meta-analysis.

Main outcomes

Recurrent stroke, cardiac events, cardiovascular and all-cause mortality and composite of vascular outcomes.

Of 10,974 identified studies, 159 were eligible, and 67% had a low risk of bias. DM was associated with an increased risk for composite events (pooled HR (pHR) including 445,808 patients: 1.58, 95% CI 1.34–1.85, I² = 88%), recurrent stroke (pHR including 1,161,527 patients: 1.42, 1.29–1.56, I² = 92%), cardiac events (pHR including 443,863 patients: 1.55, 1.50–1.61, I² = 0%), and all-cause mortality (pHR including 1,031,472 patients: 1.56, 1.34–1.82, I² = 99%). Prediabetes was associated with an increased risk for composite events (pHR including 8,262 patients: 1.50, 1.15–1.96, I² = 0%) and recurrent stroke (pHR including 10,429 patients: 1.50, 1.18–1.91, I² = 0%), but not with mortality (pHR including 9,378 patients: 1.82, 0.73–4.57, I² = 78%). Insulin resistance was associated with recurrent stroke (pHR including 21,363 patients: 1.56, 1.19–2.05, I² = 55%), but not with mortality (pHR including 21,363 patients: 1.31, 0.66–2.59, I² = 85%).

DM is associated with a 56% increased relative risk of death after IS and TIA. Risk estimates for recurrent events are similarly high in prediabetes and DM, indicating a high cardiovascular risk burden already in precursor stages of DM. Heterogeneity was high across most outcomes.

Introduction

Ischemic stroke (IS) is associated with high mortality and high risk of recurrent vascular events worldwide [ 1 , 2 , 3 ]. Despite adequate secondary prevention, about 11% of patients suffer a recurrent stroke within the first year [ 4 ]. Diabetes mellitus (DM) is a highly prevalent cardiovascular risk factor and is present in about one-third of IS patients [ 5 , 6 ]. Stroke prevention guidelines recommend screening for unrecognized DM after IS [ 7 ]. Besides DM, other forms of impaired glucose metabolism (IGM), such as prediabetes and insulin resistance (IR) have been gaining importance over the last decades in terms of their association with increased cardiovascular risk [ 8 ]. Prediabetes, comprising impaired fasting glucose and impaired glucose tolerance, represents a hyperglycemic condition of patients not yet within the diabetic range [ 9 ]. In comparison, IR constitutes a pathophysiological mechanism, which usually precedes and coexists with both DM and prediabetes [ 10 ]. Observational studies report that 70% of the patients with IS have either DM (46%) or prediabetes (24%), and 50% of those who have no DM at baseline have IR [ 11 , 12 ].

Considering that the majority of patients with stroke have some form of IGM, it represents an important aspect of secondary stroke prevention. Numerous studies, including systematic reviews, have shown the association between DM and prediabetes and stroke recurrence [ 13 , 14 , 15 ]. However, only few studies have looked at composite vascular events as an outcome. Furthermore, mortality risk associated with DM after stroke has not been addressed in previous meta-analyses. A comprehensive systematic approach is needed to identify and compare risks associated with composite vascular events and mortality after IS and TIA between different forms of IGM.

Stroke prevention guidelines recommend the use of new generation antidiabetics based on the finding that these agents demonstrated cardiovascular protective effects in patients with previous cardiovascular disease including stroke [ 7 ]. However, only a minority of patients had a history of stroke, and subgroup analyses of patients with a previous IS or TIA remained mostly inconclusive [ 16 , 17 ]. In contrast, the IRIS Trial included only patients with IR and a recent IS or TIA [ 18 ]. Despite the lower risk of cardiovascular events associated with pioglitazone, the high risk of adverse events limited the clinical use of the drug. Currently, it remains unclear which pharmacological treatments are beneficial in terms of secondary stroke prevention in patients with acute or subacute IS or TIA and different forms of IGM.

Identifying increased cardiovascular risk not only in DM but also other forms of IGM would capture a greater population at risk and eventually prompt implementation of secondary preventive measures. We conducted a systematic literature review and meta-analysis to extend our knowledge on the burden of IGM in patients with IS and TIA in the context of cardiovascular events and mortality.

This manuscript adheres to the PRISMA guideline [ 19 ]. The study protocol was pre-registered on the Open Science Framework in 2021 [ 20 ].

Information sources

We conducted a systematic literature search in MEDLINE via PubMed, Embase via Ovid, and the Cochrane Library, last updated on March 21, 2024. Search terms included “diabetes”, “prediabetes”, “insulin resistance”, “stroke” and “transient ischemic attack”, and the search was restricted to English-language publications. The full search strategy is given in the supplementary material methods. Reference lists of previous systematic reviews and of studies included in our review were searched manually.

Study selection and data extraction

Screening was performed by two reviewers independently (NK and VK) and consensus was reached with two additional reviewers (TR and AHN) in case of disagreement. Eligible studies were observational studies that included patients within 3 months after an IS or TIA and reported at least one of the following outcomes: composite vascular events, recurrent stroke, cardiovascular and all-cause mortality, cardiac events including but not limited to myocardial infarction, all regardless of follow-up duration (see supplementary Table 1 for the eligibility criteria). Composite events comprised at least stroke, cardiac events, and cardiovascular death. Studies were required to report hazard ratios (HR), odds ratios (OR), or risk ratios using a multivariable model. Exposures of interest were DM, prediabetes and IR, which were included independently of the definition used in the respective study. Additionally, we screened for studies that compared the use of an antidiabetic therapy to placebo or another antidiabetic therapy within the same population and outcomes mentioned above, regardless of study design.

Data extraction and assessment of risk of bias were performed by one reviewer (NK) and the internal validity was checked with a second reviewer (VK) for a random sample of 10% of studies. Interrater reliability was calculated. Authors were contacted via email if substantial outcome data were lacking, unclear or discrepant. Risk of bias assessment was made using the Study Quality Assessment Tool of National Heart, Lung, and Blood Institute [ 21 ]. A detailed methodological description can be found in the methods section of the supplementary material.
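The text does not state which inter-rater statistic was used. As a minimal sketch, assuming simple percent agreement (with Cohen's kappa shown as a chance-corrected alternative), the calculation could look like the following. The ratings are hypothetical and are not data from this review.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of items on which both raters gave the same rating."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters (Cohen's kappa)."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Expected agreement if both raters assigned categories independently.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical quality ratings for ten double-checked studies (illustration only).
rater_1 = ["good", "good", "fair", "good", "poor", "fair", "good", "good", "fair", "good"]
rater_2 = ["good", "good", "fair", "good", "fair", "fair", "good", "good", "fair", "good"]

agreement = percent_agreement(rater_1, rater_2)
kappa = cohens_kappa(rater_1, rater_2)
print(f"agreement = {agreement:.0%}, kappa = {kappa:.2f}")
```

Percent agreement is easy to interpret but does not correct for agreement expected by chance, which is why kappa is often reported alongside it.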

Data synthesis

We performed random-effects meta-analyses with the restricted maximum likelihood estimator after grouping studies by effect measure; HR were pooled separately for each study outcome. OR were pooled using meta-regression with follow-up duration as moderator, or with random-effects meta-analysis if the moderator showed no statistically significant effect (significance threshold p < 0.05). Studies used different sets of covariates that included sociodemographic and clinical characteristics. We included the effect size from the model with the most adjustment factors available. We calculated 95% confidence intervals (CI) and prediction intervals. Prediction intervals describe the expected range of future study results, while confidence intervals relate to the precision of the aggregated effect. Multi-level meta-analysis was performed if multiple subgroups from a single study were included in the analysis. Furthermore, we performed meta-analyses of absolute risks derived from event numbers for each outcome and exposure group, whenever such data were reported. Heterogeneity was assessed using Cochran’s Q and I² and was assumed present when p < 0.05 or I² > 50% [ 22 ]. Results of meta-analyses were visualized using forest plots. Subgroup analyses were conducted based on history of previous stroke (first-ever event, yes/no) and type of ischemic event (IS/TIA/both). Subgroup analyses based on sex were not conducted because the studies included both sexes in their analyses, and individual patient data were not available. As a sensitivity analysis, we conducted meta-analyses using unadjusted odds ratios. Publication bias was assessed by funnel plots and Egger’s regression. Statistical calculations were performed using R version 4.0.2 with the package “metafor” [ 23 ]. Studies investigating the association between antidiabetic therapies and recurrent cardiovascular events after IS or TIA were summarized narratively.
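To make the pooling steps concrete, the following is a minimal Python sketch of a random-effects meta-analysis of ratio measures. It is not the authors' R code. It makes two simplifying assumptions that differ from the analysis described above: the between-study variance is estimated with the DerSimonian-Laird method rather than restricted maximum likelihood, and the prediction interval uses a normal quantile rather than a t quantile. The function names and the input data are hypothetical.

```python
import math
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% interval


def log_effect_and_se(ratio, ci_low, ci_high):
    """Recover the log effect and its standard error from a ratio and its 95% CI."""
    return math.log(ratio), (math.log(ci_high) - math.log(ci_low)) / (2 * Z)


def random_effects(effects):
    """Pool (log_effect, se) pairs and return summary statistics."""
    y = [e for e, _ in effects]
    v = [se ** 2 for _, se in effects]
    k = len(y)
    w = [1 / vi for vi in v]
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    # Cochran's Q and I^2 quantify heterogeneity between studies.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    df = k - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    # DerSimonian-Laird between-study variance (assumption: stand-in for REML).
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    w_re = [1 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se_mu = math.sqrt(1 / sum(w_re))
    # Prediction interval (assumption: normal quantile instead of t with k-2 df).
    half_pi = Z * math.sqrt(tau2 + se_mu ** 2)
    return {
        "pooled": math.exp(mu),
        "ci": (math.exp(mu - Z * se_mu), math.exp(mu + Z * se_mu)),
        "pi": (math.exp(mu - half_pi), math.exp(mu + half_pi)),
        "Q": q,
        "I2": i2,
        "tau2": tau2,
    }


# Hypothetical study-level hazard ratios with 95% CIs (not data from this review).
studies = [(1.20, 0.95, 1.52), (1.85, 1.40, 2.44), (1.45, 1.10, 1.91), (2.10, 1.30, 3.39)]
result = random_effects([log_effect_and_se(*s) for s in studies])
print(result)
```

Because the prediction interval adds the between-study variance to the uncertainty of the pooled estimate, it is never narrower than the confidence interval, which matches the distinction drawn in the paragraph above.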

Systematic literature search

The systematic literature search yielded 10,974 records. After screening titles and abstracts, 8,219 records were excluded, and 1,717 records were further screened based on full texts (Fig. 1). Finally, 159 studies met the eligibility criteria (supplementary references). Of those, 26 reported data for composite outcome, 71 for recurrent stroke, 10 for cardiac events, 104 for all-cause mortality, and five for cardiovascular mortality (Table 1). During data extraction, an inter-rater reliability of 90% was reached. Authors of 26 studies were contacted for missing information, and seven of them provided the requested data. Most studies were observational studies (n = 146), and the others were post-hoc analyses of randomized trials (n = 13). Follow-up duration ranged from end of hospital stay to longer than 20 years. The diagnostic criteria used for DM varied widely: medical records or medication history only (n = 61), laboratory biomarkers only (n = 14), or both (n = 50). Twenty-one studies did not report the definition used. Prediabetes was defined according to either American Diabetes Association [ 24 ] or World Health Organization criteria [ 25 ], whereas one study defined prediabetes as a non-fasting glucose level of 140–198 mg/dL. IR was quantified using HOMA-IR, Triglyceride-Glucose Index, Matsuda Insulin Sensitivity Index, Glucose/Insulin Ratio, QUICKI Index, and estimated glucose disposal rate. Overall, 67% (n = 107) of the included studies were rated as having good quality of evidence, 27% (n = 43) as fair and 6% (n = 9) as poor (supplementary Fig. 1). Study characteristics are presented in supplementary Table 2.

Fig. 1 Flowchart of the screening and selection process of the systematic review

Association of IGM with cardiovascular events

Composite vascular events.

Twenty-four studies were eligible for the exposure DM, three studies for prediabetes and two studies for IR. Five studies reporting data from the same cohort were excluded, resulting in 19 eligible studies for the exposure DM (16 reported HR, three reported OR; see supplementary Table 3). Except for one study reporting a 3-month follow-up period, all studies reported at least 1-year follow-up. One study that assessed incident DM during follow-up, as opposed to pre-existing DM, as an exposure was not included in the analysis [ 26 ].

Presence of DM was statistically significantly associated with an increased risk of composite vascular events with a pooled HR (pHR) of 1.58 (95% confidence interval (CI) 1.34 to 1.85, I² = 88%) including 445,808 patients (Fig. 2A) and a pooled OR (pOR) of 1.87 (95% CI 0.76 to 4.60, I² = 64%) including 1,609 patients. No publication bias was observed (supplementary Fig. 2). The meta-analysis of absolute risks reported in seven studies revealed that during a mean follow-up of three years, 43% (95% CI 23% to 64%) of stroke patients with DM reached a composite endpoint of a recurrent cardiovascular event or death. This rate was 17% (95% CI 3% to 31%) in patients without DM (supplementary Table 4).

Fig. 2 (A) Forest plot for the meta-analysis of studies that reported the association of diabetes with composite outcome. (B) Forest plot for the meta-analysis of studies that reported the association of prediabetes with composite outcome

Meta-analysis of two studies showed an increased risk of composite events associated with prediabetes with a pHR of 1.50 (95% CI 1.15 to 1.96, I² = 0%; Fig. 2B) in 8,262 patients. An absolute risk of 31% (95% CI 12% to 50%) and 7% (95% CI 5% to 10%) was observed in the group of patients with and without prediabetes, respectively. IR was reported in two studies, which were derived from the same cohort. One of the studies demonstrated no association between high IR and composite vascular events [ 27 ]. In the other study, which only encompassed patients without DM, increased IR based on HOMA-IR was statistically significantly associated with an increased risk for vascular events [ 28 ].

Recurrent stroke

Sixty-three studies reported recurrent stroke outcome data in patients with DM (see supplementary Table 5). Follow-up duration ranged from discharge from hospital to a mean follow-up time of 12.3 years. Studies encompassing the same population were excluded from the analysis. Finally, 40 studies reporting HR and 12 studies reporting OR were eligible for analysis. The pHR was 1.42 (95% CI 1.29 to 1.56, I² = 92%; Fig. 3A) involving 1,161,527 patients. There was evidence for possible publication bias (supplementary Fig. 3). Studies that reported OR involving 47,629 patients showed a similar increase of risk (pOR 1.33, 95% CI 1.13 to 1.56, I² = 48%; supplementary Fig. 4). Follow-up duration was not a statistically significant moderator for the outcome (p = 0.40). Neither the type of baseline event (IS or TIA) nor previous stroke was a statistically significant moderator (p = 0.08 and p = 0.90, respectively, see supplementary Fig. 5) in subgroup analyses. Baujat plots revealed that the studies contributing most to heterogeneity were post-hoc analyses of randomized trials. Meta-analysis of absolute risks extracted from 23 studies resulted in 13% (95% CI 10% to 16%) for patients with diabetes vs. 9% (95% CI 6% to 11%) without, within a follow-up period of more than a year.
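Publication bias in this review was assessed with funnel plots and Egger's regression. As a rough illustration of the regression step only, the sketch below regresses the standardized effect (effect divided by its standard error) on precision (one divided by the standard error) using ordinary least squares; an intercept far from zero suggests funnel plot asymmetry. The function name and data are hypothetical, no p-value is computed, and this is not the routine used in the original analysis.

```python
import math


def egger_intercept(effects, ses):
    """Egger's regression: return the intercept and its t statistic."""
    k = len(effects)
    if k < 3:
        raise ValueError("need at least three studies")
    x = [1 / s for s in ses]                    # precision
    y = [e / s for e, s in zip(effects, ses)]   # standardized effect
    mean_x = sum(x) / k
    mean_y = sum(y) / k
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance with k - 2 degrees of freedom.
    residual_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = residual_ss / (k - 2)
    se_intercept = math.sqrt(sigma2 * (1 / k + mean_x ** 2 / sxx))
    t_stat = intercept / se_intercept if se_intercept > 0 else float("inf")
    return intercept, t_stat


# Hypothetical log hazard ratios in which smaller studies show larger effects.
log_hrs = [0.30, 0.35, 0.45, 0.60, 0.80]
std_errs = [0.05, 0.10, 0.15, 0.25, 0.35]

intercept, t_stat = egger_intercept(log_hrs, std_errs)
print(f"Egger intercept = {intercept:.2f}, t = {t_stat:.2f}")
```

In practice the t statistic would be compared against a t distribution with k - 2 degrees of freedom, and the result read together with the funnel plot rather than in isolation.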

Fig. 3 (A) Forest plot for the meta-analysis of studies that reported the association of diabetes with recurrent stroke. (B) Forest plot for the meta-analysis of studies that reported the association of prediabetes with recurrent stroke. (C) Forest plot for the meta-analysis of studies that reported the association of insulin resistance with recurrent stroke

Patients with prediabetes had an increased risk for recurrent stroke compared to patients with normoglycemia (pHR in 10,429 patients 1.50, 95% CI 1.18 to 1.91, I² = 0%, see Fig. 3B). This was also the case for absolute risk: 10% (95% CI 8% to 12%) vs. 7% (95% CI 7% to 8%), respectively. Of five studies eligible for IR, only three could be included in the meta-analysis, because multiple studies were conducted in the same cohort. The pHR for recurrent stroke associated with IR in 21,363 patients was 1.56 (95% CI 1.19 to 2.05, I² = 55%; Fig. 3C). The absolute risk associated with IR during 10.4 months of follow-up was 10% (95% CI 5% to 15%) vs. 7% (95% CI 6% to 7%) in patients without increased IR.

Cardiac events

All studies eligible for cardiac events comprised DM as the exposure (see supplementary Table 6). The shortest follow-up time was three months; all other studies followed patients for at least one year. One study that investigated new DM during follow-up was not included in the meta-analysis [ 26 ]. Presence of DM was associated with an increased risk of cardiac events with a pHR of 1.55 (95% CI 1.50 to 1.61, I² = 0%) involving 443,863 patients. The pOR of two studies with 839,029 patients was 1.47 (95% CI 0.48 to 4.44, I² = 89%; supplementary Fig. 6). Meta-analysis of three studies reporting data revealed an absolute risk of 5% (95% CI −1% to 11%) in patients with DM and 3% (95% CI 0% to 6%) without DM. One study that investigated prediabetes reported an HR of 2.0 (95% CI 1.30 to 3.20) for cardiac events. No study reported IR as an exposure.

Association between IGM and mortality

Cardiovascular mortality.

Five studies reported data on cardiovascular mortality in patients with DM (supplementary Table 7). Meta-analysis involving 127,445 patients showed a statistically significant association between DM and cardiovascular mortality (pHR 1.65, 95% CI 1.41 to 1.93, I² = 50%, see supplementary Fig. 7). Pooling the available absolute risk data from three studies resulted in a pooled risk of 18% (95% CI −10% to 47%) in patients with DM vs. 16% (95% CI −9% to 41%) in patients without DM, during 1 year of follow-up.

All-cause mortality

Ninety-four studies investigated associations between all-cause mortality and DM (see supplementary Table 8). Studies that included patients from the same population were excluded from the analysis (n = 10). Presence of DM was associated with an increased risk for all-cause mortality (pHR 1.56, 95% CI 1.34 to 1.82, I² = 99%, see Fig. 4A), summarizing 42 studies including 1,031,472 patients. Subgroup analyses based on follow-up duration resulted in a pHR of 1.10 (95% CI 0.72 to 1.68) during hospitalization (n = 3 studies), a pHR of 1.35 (95% CI 1.18 to 1.56) up to one year (n = 12 studies), and a pHR of 1.74 (95% CI 1.40 to 2.17) longer than one year (n = 27 studies). However, follow-up duration was not a statistically significant moderator (p = 0.15, see supplementary Fig. 8). The Galbraith plot revealed the most influential studies to be the subgroups of the study from Zamir et al. (supplementary Fig. 9). The meta-analysis of 42 studies reporting OR, involving 3,290,353 patients, showed a risk estimate of 1.30 (95% CI 1.21 to 1.41, see supplementary Fig. 10). Subgroup analyses based on first-ever vs. recurrent event at baseline and the type of ischemic event revealed no statistically significant difference between groups. Funnel plots suggested the existence of publication bias (supplementary Fig. 11). During a mean follow-up of 1.8 months, the absolute risk of all-cause mortality was 23% (95% CI 14% to 31%) for patients with DM vs. 17% (95% CI 11% to 23%) without DM.

Fig. 4 (A) Forest plot for the meta-analysis of studies that reported the association of diabetes with all-cause mortality. (B) Forest plot for the meta-analysis of studies that reported the association of prediabetes with all-cause mortality. (C) Forest plot for the meta-analysis of studies that reported the association of insulin resistance with all-cause mortality

Six studies were eligible for prediabetes and all-cause mortality (3 HR, 3 OR). Prediabetes was not statistically significantly associated with an increased risk for mortality after IS (pHR 1.82, 95% CI 0.73 to 4.57, I² = 78% in 9,378 patients, and pOR 1.37, 95% CI 0.54 to 3.43, I² = 71% in 1,969 patients, see Fig. 4B and supplementary Fig. 12). In the meta-analysis of absolute risks, mortality during a mean follow-up of seven months was 8% (95% CI 2% to 15%) for patients with prediabetes vs. 9% (95% CI 0% to 18%) with normoglycemia.

Nine studies reported IR as an exposure. The meta-analyses could not demonstrate an association between increased IR and mortality (pHR 1.31, 95% CI 0.66 to 2.59, I² = 85%, including 21,363 patients across three studies and pOR 1.05, 95% CI 0.76 to 1.45, I² = 16%, including 6,434 patients across two studies). Absolute risks were 6% (95% CI −1% to 12%) for patients with increased IR and 4% (95% CI 2% to 6%) without.

Sensitivity analyses with crude odds ratios

Sensitivity analyses using unadjusted odds ratios, to accommodate the variation in adjustment factors used across studies, revealed similar risk estimates, though often slightly higher than the respective adjusted pooled outcomes (supplementary Fig. 13 and 14).

Antidiabetic therapy and recurrent vascular events

Nine observational studies investigated the association between antidiabetic therapies and cardiovascular events after an IS or TIA in the preceding three months (see Table 2). The drug classes investigated were metformin, sulfonylurea, thiazolidinedione, and incretin-mimetics. We did not identify any studies with SGLT-2 inhibitors or alpha-glucosidase inhibitors. Due to the differences in the exposure and comparator groups, we did not perform a meta-analysis. Studies showed a risk reduction for recurrent stroke, mortality and composite vascular events associated with the use of pioglitazone and lobeglitazone as well as a lower risk of mortality associated with metformin use [ 29 , 30 , 31 , 32 ]. There were no clear benefits in terms of decreased risk of cardiovascular events associated with sulfonylurea or incretin-mimetics [ 33 , 34 , 35 , 36 , 37 ].

In this systematic review and meta-analysis, we provide a comprehensive and up-to-date summary of previous studies investigating the association between IGM and residual cardiovascular risk following IS and TIA. To our knowledge, this is the first meta-analysis to investigate the risk of composite vascular events associated with IGM as well as the risk of mortality associated with DM in this population. The results of the presented meta-analysis indicate that (1) patients with DM have an approximately 1.6-fold (60%) increased risk of both death and recurrent vascular events after IS and TIA, (2) the risk of recurrent vascular events after stroke is already increased in the prediabetic stage and appears just as high as in patients with DM, and (3) presence of IR is associated with recurrent stroke risk. In contrast, this meta-analysis was unable to demonstrate an increased mortality risk after stroke associated with prediabetes or IR. Overall, there were significantly fewer eligible studies on prediabetes and IR compared to DM (Table  1 ).

DM is a well-known risk factor for cardiovascular disease. The results of our study confirm a robust association between DM and risk of composite recurrent vascular events after IS and TIA. We could confirm the risk of recurrent stroke associated with DM that was previously reported in a meta-analysis by Zhang et al. [ 14 ]. The risk of mortality in patients with DM was 56% higher than in patients without DM. Although mortality risk estimates for diabetic patients were greater in studies with longer mean follow-up durations, we could not observe a statistically significant interaction between mortality risk and follow-up duration. This could be because there were only a few studies with short-term follow-up among those that reported HR (supplementary Fig. 8) and only a few studies with long-term follow-up among those that reported OR (supplementary Fig. 10). Still, inferring from this finding, DM likely remains a relevant risk factor over time and an important target for secondary prevention strategies, given the high prevalence of DM in this population [ 6 ].

Our analyses demonstrated a positive relationship between prediabetes and recurrent vascular events as well as between IR and stroke recurrence. However, there was no association detected between the two conditions and mortality. This difference could have several reasons. First, patients with prediabetes or IR are less likely than patients with DM to have been exposed to the deleterious effects of a dysregulated glucose metabolism for a long time. Second, the shorter follow-up duration of studies investigating prediabetes and IR generally limits the probability of detecting a difference in mortality risk. The risk of recurrent stroke associated with prediabetes is in line with a previous meta-analysis conducted by Pan et al. in 2019 [ 15 ]. Despite substantial methodological differences (we avoided pooling ORs and HRs together, excluded studies with hemorrhagic stroke, and identified two more studies), we, like Pan et al., could not demonstrate a relationship between prediabetes and mortality.

In contrast to DM, prediabetes has only rather recently been regarded as a cardiovascular risk factor [ 39 ]. The meta-analysis conducted by Cai et al. showed a risk increase in all-cause mortality and vascular events associated with prediabetes in population-based cohorts as well as in patients with previous atherosclerotic disease [ 40 ]. Further, a recent analysis of the UK Biobank cohort including more than 400,000 individuals confirmed the excess risk for any cardiovascular disease in patients with IGM compared to normoglycemia [ 41 ]. The risk was higher for DM than for prediabetes. Still, after accounting for obesity and use of antihypertensives and statins, both risks were attenuated, lending support to the modifiability of the excess risk. Together with these previous findings, our results strongly support considering prediabetes as a continuous entity with DM on the spectrum of IGM, with a relevant increase in cardiovascular and mortality risk.

There was a statistically significant association between increased IR and stroke recurrence. However, it should be noted that only three studies were eligible for the analysis, and that the parameters used to define increased IR as well as the timing of measurement after stroke (7 days and 14 days) were heterogeneous between studies. IR can be increased during the acute phase of the stroke due to the stress reaction and can change during this time [ 42 ]. The relative risk for recurrent stroke observed in patients with IR compared to patients without IR was higher than the relative risk in diabetics compared to non-diabetics. This might be explained by differences in the patient groups. Patients with DM are more likely to receive antidiabetic treatment and have a higher risk of dying before suffering a recurrent stroke. Another difference could lie in the comparator groups, namely that patients without IR could be generally healthier than patients without DM.

Despite the association between increased IR and stroke recurrence, we could not identify many studies with other cardiovascular outcomes. Furthermore, we encountered different parameters and criteria to define IR across studies. Thus, the prognostic value of increased IR in terms of composite cardiovascular risk, as well as the best biomarker to predict this risk, remains speculative in patients with IS or TIA. Further research is needed to address these questions.

We observed a significant research gap in the number of large studies with congruent definitions of prediabetes and IR. Uncertainty remains about the different diagnostic criteria for both prediabetes and IR [ 24 , 25 , 43 , 44 ], leading to a lack of adequate implementation of preventive strategies [ 45 ]. As the prevalence of prediabetes is expected to rise, the whole spectrum of IGM rather than DM alone is assumed to gain more significance in terms of primary and secondary stroke prevention [ 46 ]. Consistent diagnostic criteria would facilitate reliable data synthesis and the development of prevention strategies.

Until the advent of the GLP1 and SGLT2 therapies, no antidiabetic therapy had improved cardiovascular risk or death despite improvements in glucose control [ 47 ]. Both classes of drugs revolutionized the field after randomized controlled trials showed cardiovascular risk reduction in patients with DM [ 48 , 49 , 50 , 51 ]. However, it is still unclear whether these drugs are equally effective at reducing cardiovascular risk in patients with IS [ 33 , 34 , 52 ]. As our systematic review indicates, to date only a few studies have investigated the effectiveness of antidiabetic therapy in preventing recurrent vascular events after an acute or subacute IS. Even though the promising results for pioglitazone use in patients with IR from the IRIS trial were limited by side effects [ 18 ], recent cohort studies have shown beneficial effects associated with thiazolidinediones [ 29 , 30 ]. Clinical trials investigating secondary stroke prevention in patients with prediabetes have yet to be undertaken.

Strengths and limitations

The most important strength of our study lies in its comprehensiveness: it encompassed over 10,000 records and included more than seven million patients across all exposures and outcomes. This enabled us to investigate all three entities of IGM together. Another strength is the methodology. We included studies reporting either HR or OR, which allowed us to identify more studies. We also used multi-level meta-analysis to account for multiple subgroups of the same cohorts and used meta-regression to account for moderators.

There are limitations to this study. Firstly, as in every meta-analysis, the quality of synthesized evidence depends on the quality of evidence of the individual studies. We assessed the risk of bias of the included studies and could not identify an influence of studies with high risk of bias on the effect estimates. Secondly, we encountered high heterogeneity between studies. As this systematic review included observational studies, the high variability across study populations and diagnostic criteria used was expected. Further, the fact that studies used different adjustment factors in their multivariable analyses most likely contributed substantially to the high heterogeneity. To alleviate the difference in the adjustment factors, we have conducted sensitivity analyses. Both crude odds ratios and absolute risks indicated a similar change of risk estimates to the per protocol analyses, strengthening our primary results. Another factor contributing to heterogeneity could be methodological differences between studies, such as how competing events were treated. This could not be taken into consideration when determining eligibility, since the information was mostly not available. Finally, severity and duration of DM could not be taken into consideration.

Different types of IGM are associated with increased cardiovascular risk and mortality after IS and TIA. The entities of IGM should be considered as a continuous spectrum of increased cardiovascular risk that represents an important target for early cardiovascular prevention programs.

Availability of data and materials

The extracted data from the involved studies in this systematic review have been made available in supplementary material.

Abbreviations

IS: Ischemic stroke

DM: Diabetes mellitus

IGM: Impaired glucose metabolism

IR: Insulin resistance

HR: Hazard ratios

OR: Odds ratios

CI: Confidence interval

GBD 2019 Stroke Collaborators. Global, regional, and national burden of stroke and its risk factors, 1990-2019: a systematic analysis for the Global Burden of Disease Study 2019. Lancet Neurol. 2021;20(10):795–820. https://doi.org/10.1016/S1474-4422(21)00252-0 .

Chen Y, Wright N, Guo Y, et al. Mortality and recurrent vascular events after first incident stroke: a 9-year community-based study of 0·5 million Chinese adults. Lancet Glob Health. 2020;8(4):e580–90. https://doi.org/10.1016/S2214-109X(20)30069-3 .

Carlsson A, Irewall AL, Graipe A, Ulvenstam A, Mooe T, Ögren J. Long-term risk of major adverse cardiovascular events following ischemic stroke or TIA. Sci Rep. 2023;13(1):8333. https://doi.org/10.1038/s41598-023-35601-x .

Mohan KM, Wolfe CDA, Rudd AG, Heuschmann PU, Kolominsky-Rabas PL, Grieve AP. Risk and cumulative risk of stroke recurrence: a systematic review and meta-analysis. Stroke. 2011;42(5):1489–94. https://doi.org/10.1161/STROKEAHA.110.602615 .

Sarwar N, Gao P, Kondapally Seshasai SR, et al. Diabetes mellitus, fasting blood glucose concentration, and risk of vascular disease: a collaborative meta-analysis of 102 prospective studies. The Lancet. 2010;375(9733):2215–22. https://doi.org/10.1016/S0140-6736(10)60484-9 .

Lau LH, Lew J, Borschmann K, Thijs V, Ekinci EI. Prevalence of diabetes and its effects on stroke outcomes: a meta-analysis and literature review. J Diabetes Investig. 2019;10(3):780–92. https://doi.org/10.1111/jdi.12932 .

Kleindorfer DO, Towfighi A, Chaturvedi S, et al. Guideline for the Prevention of Stroke in Patients With Stroke and Transient Ischemic Attack: A Guideline From the American Heart Association/American Stroke Association. Stroke. 2021. https://doi.org/10.1161/str.0000000000000375 .

Schlesinger S, Neuenschwander M, Barbaresko J, et al. Prediabetes and risk of mortality, diabetes-related complications and comorbidities: umbrella review of meta-analyses of prospective studies. Diabetologia. 2022;65(2):275–85. https://doi.org/10.1007/s00125-021-05592-3 .

Nathan DM, Davidson MB, DeFronzo RA, et al. Impaired fasting glucose and impaired glucose tolerance: implications for care. Diabetes Care. 2007;30(3):753–9. https://doi.org/10.2337/dc07-9920 .

Abdul-Ghani MA, Tripathy D, DeFronzo RA. Contributions of β-cell dysfunction and insulin resistance to the pathogenesis of impaired glucose tolerance and impaired fasting glucose. Diabetes Care. 2006;29(5):1130–9. https://doi.org/10.2337/dc05-2179 .

Jia Q, Zheng H, Zhao X, et al. Abnormal glucose regulation in patients with acute stroke across China: prevalence and baseline patient characteristics. Stroke. 2012;43(3):650–7. https://doi.org/10.1161/STROKEAHA.111.633784 .

Kernan WN, Inzucchi SE, Viscoli CM, et al. Impaired insulin sensitivity among nondiabetic patients with a recent TIA or ischemic stroke. Neurology. 2003;60(9):1447–51. https://doi.org/10.1212/01.WNL.0000063318.66140.A3 .

Echouffo-Tcheugui JB, Xu H, Matsouaka RA, et al. Diabetes and long-term outcomes of ischaemic stroke: findings from Get With The Guidelines-Stroke. Eur Heart J. 2018;39(25):2376–86. https://doi.org/10.1093/eurheartj/ehy036 .

Zhang L, Li X, Wolfe CDA, O’Connell MDL, Wang Y. Diabetes as an independent risk factor for stroke recurrence in ischemic stroke patients: an updated meta-analysis. Neuroepidemiology. 2021;55(6):427–35. https://doi.org/10.1159/000519327 .

Pan Y, Chen W, Wang Y. Prediabetes and outcome of ischemic stroke or transient ischemic attack: a systematic review and meta-analysis. J Stroke Cerebrovasc Dis. 2019;28(3):683–92. https://doi.org/10.1016/j.jstrokecerebrovasdis.2018.11.008 .

Strain WD, Frenkel O, James MA, et al. Effects of semaglutide on stroke subtypes in type 2 diabetes: post hoc analysis of the randomized SUSTAIN 6 and PIONEER 6. Stroke. 2022;53(9):2749–57. https://doi.org/10.1161/STROKEAHA.121.037775 .

Zhou Z, Lindley RI, Rådholm K, et al. Canagliflozin and stroke in type 2 diabetes mellitus. Stroke. 2019;50(2):396–404. https://doi.org/10.1161/STROKEAHA.118.023009 .

Kernan WN, Viscoli CM, Furie KL, et al. Pioglitazone after ischemic stroke or transient ischemic attack. N Engl J Med. 2016;374(14):1321–31. https://doi.org/10.1056/NEJMoa1506930 .

Page MJ, McKenzie JE, Bossuyt PM, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021. https://doi.org/10.1136/bmj.n71 .

Kaynak N, Rackoll T, Endres M, Nave AH. The residual risk of impaired glucose metabolism on vascular events and mortality after ischemic stroke and the effect of antidiabetic therapy on reducing this risk: A systematic review and Meta-analysis. 2021. https://doi.org/10.17605/OSF.IO/JVYHW .

National Heart Lung and Blood Institute. Quality Assessment Tool for Observational Cohort and Cross-Sectional Studies. 2013. Accessed December 18, 2020. https://www.nhlbi.nih.gov/health-topics/study-quality-assessment-tools .

Higgins JPT, Thompson SG, Deeks JJ, Altman DG. Measuring inconsistency in meta-analyses. BMJ. 2003;327(7414):557–60. https://doi.org/10.1136/bmj.327.7414.557 .

Viechtbauer W. metafor: Meta-analysis package for R. R package version 2.4-0. 2020;(1):1–275.

American Diabetes Association. 2. Classification and Diagnosis of Diabetes: Standards of Medical Care in Diabetes-2020. Diabetes Care. 2020;43(January):S14-S31. https://doi.org/10.2337/dc20-S002 .

World Health Organization. Classification of Diabetes Mellitus. Geneva: World Health Organization; 2019.

Rutten-Jacobs LCA, Keurlings PAJ, Arntz RM, et al. High incidence of diabetes after stroke in young adults and risk of recurrent vascular events: The FUTURE study. PLoS ONE. 2014. https://doi.org/10.1371/journal.pone.0087171 .

Lu Z, Xiong Y, Feng X, et al. Insulin resistance estimated by estimated glucose disposal rate predicts outcomes in acute ischemic stroke patients. Cardiovasc Diabetol. 2023;22(1):225. https://doi.org/10.1186/s12933-023-01925-1 .

Jin A, Wang S, Li J, et al. Mediation of systemic inflammation on insulin resistance and prognosis of nondiabetic patients with ischemic stroke. Stroke. 2023;54(3):759–69. https://doi.org/10.1161/STROKEAHA.122.039542 .

Yoo J, Jeon J, Baik M, Kim J. Lobeglitazone, a novel thiazolidinedione, for secondary prevention in patients with ischemic stroke: a nationwide nested case-control study. Cardiovasc Diabetol. 2023;22(1):106. https://doi.org/10.1186/s12933-023-01841-4 .

Woo MH, Lee HS, Kim J. Effect of pioglitazone in acute ischemic stroke patients with diabetes mellitus: a nested case-control study. Cardiovasc Diabetol. 2019;18(1):67. https://doi.org/10.1186/s12933-019-0874-5 .

Morgan CL, Inzucchi SE, Puelles J, Jenkins-Jones S, Currie CJ. Impact of treatment with pioglitazone on stroke outcomes: a real-world database analysis. Diabetes Obes Metab. 2018;20(9):2140–7. https://doi.org/10.1111/dom.13344 .

Tu WJ, Liu Z, Chao BH, et al. Metformin use is associated with low risk of case fatality and disability rates in first-ever stroke patients with type 2 diabetes. Ther Adv Chronic Dis. 2022;13:20406223221076896. https://doi.org/10.1177/20406223221076894 .

Chen DY, Wang SH, Mao CT, et al. Sitagliptin after ischemic stroke in type 2 diabetic patients: a nationwide cohort study. Medicine. 2015;94(28): e1128. https://doi.org/10.1097/MD.0000000000001128 .

Chen DY, Li YR, Mao CT, et al. Cardiovascular outcomes of vildagliptin in patients with type 2 diabetes mellitus after acute coronary syndrome or acute ischemic stroke. J Diabetes Investig. 2020;11(1):110–24. https://doi.org/10.1111/jdi.13078 .

Li YR, Tsai SS, Chen DY, et al. Linagliptin and cardiovascular outcomes in type 2 diabetes after acute coronary syndrome or acute ischemic stroke. Cardiovasc Diabetol. 2018;17(1):2. https://doi.org/10.1186/s12933-017-0655-y .

Favilla CG, Mullen MT, Ali M, Higgins P, Kasner SE. Sulfonylurea use before stroke does not influence outcome. Stroke. 2011;42(3):710–5. https://doi.org/10.1161/STROKEAHA.110.599274 .

Tsivgoulis G, Goyal N, Iftikhar S, et al. Sulfonylurea Pretreatment and In-Hospital Use Does Not Impact Acute Ischemic Strokes (AIS) Outcomes Following Intravenous Thrombolysis. J Stroke Cerebrovasc Dis. 2017;26(4):795–800. https://doi.org/10.1016/j.jstrokecerebrovasdis.2016.10.019 .

Horsdal HT, Mehnert F, Rungby J, Johnsen SP. Type of preadmission antidiabetic treatment and outcome among patients with ischemic stroke: a nationwide follow-up study. J Stroke Cerebrovasc Dis. 2012;21(8):717–25. https://doi.org/10.1016/j.jstrokecerebrovasdis.2011.03.007 .

Richter B, Hemmingsen B, Metzendorf MI, Takwoingi Y. Development of type 2 diabetes mellitus in people with intermediate hyperglycaemia. Cochrane Database Syst Rev. 2018;10(10): CD012661. https://doi.org/10.1002/14651858.CD012661.pub2 .

Cai X, Zhang Y, Li M, et al. Association between prediabetes and risk of all cause mortality and cardiovascular disease: updated meta-analysis. BMJ. 2020;370: m2297. https://doi.org/10.1136/bmj.m2297 .

Rentsch CT, Garfield V, Mathur R, et al. Sex-specific risks for cardiovascular disease across the glycaemic spectrum: a population-based cohort study using the UK Biobank. The Lancet Regional Health - Europe. 2023;32:1–14. https://doi.org/10.1016/j.lanepe.2023.100693 .

Huff TA, Lebovitz HE, Heyman A, Davis L. Serial changes in glucose utilization and insulin and growth hormone secretion in acute cerebrovascular disease. Stroke. 1972;3(5):543–52. https://doi.org/10.1161/01.STR.3.5.543 .

Cleeman JI. Executive summary of the third report of the National Cholesterol Education Program (NCEP) expert panel on detection, evaluation, and treatment of high blood cholesterol in adults (adult treatment panel III). J Am Med Assoc. 2001;285(19):2486–97. https://doi.org/10.1001/jama.285.19.2486 .

Kahn R, Buse J, Ferrannini E, Stern M. The metabolic syndrome: time for a critical appraisal: joint statement from the American Diabetes Association and the European Association for the Study of Diabetes. Diabetes Care. 2005;28(9):2289–304. https://doi.org/10.2337/diacare.28.9.2289 .

Echouffo-Tcheugui JB, Selvin E. Prediabetes and what it means: the epidemiological evidence. Annu Rev Public Health. 2020;42:59–77. https://doi.org/10.1146/annurev-publhealth-090419-102644 .

Lee M, Saver JL, Hong KS, Song S, Chang KH, Ovbiagele B. Effect of pre-diabetes on future risk of stroke: meta-analysis. BMJ. 2012;344: e3564. https://doi.org/10.1136/bmj.e3564 .

Gerstein HC, Miller ME, Byington RP, et al. Effects of intensive glucose lowering in type 2 diabetes. N Engl J Med. 2008;358(24):2545–59.

Marso SP, Bain SC, Consoli A, et al. Semaglutide and cardiovascular outcomes in patients with type 2 diabetes. N Engl J Med. 2016;375(19):1834–44. https://doi.org/10.1056/NEJMoa1607141 .

Zinman B, Wanner C, Lachin JM, et al. Empagliflozin, cardiovascular outcomes, and mortality in type 2 diabetes. N Engl J Med. 2015;373(22):2117–28. https://doi.org/10.1056/NEJMoa1504720 .

Neal B, Perkovic V, Mahaffey KW, et al. Canagliflozin and cardiovascular and renal events in type 2 diabetes. N Engl J Med. 2017;377(7):644–57. https://doi.org/10.1056/NEJMoa1611925 .

Dawson J, Béjot Y, Christensen LM, et al. European Stroke Organisation (ESO) guideline on pharmacological interventions for long-term secondary prevention after ischaemic stroke or transient ischaemic attack. Eur Stroke J. 2022;7(3):I–II. https://doi.org/10.1177/23969873221100032 .

Gerstein HC, Hart R, Colhoun HM, et al. The effect of dulaglutide on stroke: an exploratory analysis of the REWIND trial. Lancet Diabetes Endocrinol. 2020;8(2):106–14. https://doi.org/10.1016/S2213-8587(19)30423-1 .

This study was partially funded by the Corona Foundation. Protocol https://osf.io/jvyhw . The funder had no role in the conceptualization, design, data collection, analysis, decision to publish, or preparation of the manuscript.

Open Access funding enabled and organized by Projekt DEAL.

Author information

Authors and affiliations.

Center for Stroke Research Berlin (CSB), Charité– Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt Universität zu Berlin, Berlin, Germany

Nurcennet Kaynak, Valentin Kennel, Matthias Endres & Alexander H. Nave

Department of Neurology with Experimental Neurology, Charité– Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt Universität zu Berlin, Charitéplatz 1, 10117, Berlin, Germany

Nurcennet Kaynak, Valentin Kennel, Torsten Rackoll, Matthias Endres & Alexander H. Nave

Berlin Institute of Health at Charité, Charité– Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt Universität zu Berlin, Berlin, Germany

Nurcennet Kaynak & Alexander H. Nave

German Center for Neurodegenerative Diseases (DZNE), partner site Berlin, Berlin, Germany

Nurcennet Kaynak & Matthias Endres

German Center for Cardiovascular Research (DZHK), partner site Berlin, Berlin, Germany

Nurcennet Kaynak, Matthias Endres & Alexander H. Nave

Berlin Institute of Health (BIH) QUEST Center for Responsible Research, Charité– Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt Universität zu Berlin, Berlin, Germany

Torsten Rackoll

Department of Biometry and Clinical Epidemiology, Charité-Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin und Humboldt-Universität zu Berlin, Berlin, Germany

Daniel Schulze

German Center for Mental Health (DZPG), partner site Berlin, Berlin, Germany

Matthias Endres

Contributions

NK had full access to study data and is the guarantor of the study, taking full responsibility for the conduct of the study. NK, AHN, TR and ME conceived the study design and contributed to study protocol. NK, VK, and TR acquired data and performed the analysis. DS contributed to statistical methods and analyses. NK drafted the manuscript and all authors contributed to interpretation of the data and critical appraisal of the final work. AHN supervised the study. The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted.

Corresponding author

Correspondence to Alexander H. Nave .

Ethics declarations

Ethics approval.

Ethics approval was not required.

Consent for publication

Not applicable.

Competing interests

NK, VK and DS report no conflicts of interest. ME reports grants from Bayer and fees paid to the Charité from Amgen, AstraZeneca, Bayer Healthcare, Boehringer Ingelheim, BMS, Daiichi Sankyo, Sanofi and Pfizer, all outside the submitted work. AHN receives funding from the Corona Foundation and the German Center for Cardiovascular Research (DZHK) and reports no conflict of interest. TR receives funding from the European Commission and reports no conflict of interest.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1 (docx 3212 kb)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article.

Kaynak, N., Kennel, V., Rackoll, T. et al. Impaired glucose metabolism and the risk of vascular events and mortality after ischemic stroke: A systematic review and meta-analysis. Cardiovasc Diabetol 23, 323 (2024). https://doi.org/10.1186/s12933-024-02413-w

Received: 30 June 2024

Accepted: 19 August 2024

Published: 31 August 2024

DOI: https://doi.org/10.1186/s12933-024-02413-w

  • Prediabetes
  • Vascular events

Cardiovascular Diabetology

ISSN: 1475-2840

Evaluating the Effectiveness of Proton Beam Therapy Compared to Conventional Radiotherapy in Non-Metastatic Rectal Cancer: A Systematic Review of Clinical Outcomes

Appendix A. Search Query

  • exp Rectal Neoplasms/
  • ((rectal or rectum or recti or pararect* or anal or anus or perianal) adj5 (adenoma* or cancer* or carcinoma* or tumo* or malign* or neoplas* or mass)).tw,kf.
  • Proton Therapy/
  • exp Radiotherapy, Computer-Assisted/
  • photon*.tw,kf.
  • proton*.tw,kf.
  • 4 or 6 or 9
  • 5 or 7 or 8
  • 3 and 10 and 11

References

  • Sung, H.; Ferlay, J.; Siegel, R.L.; Laversanne, M.; Soerjomataram, I.; Jemal, A.; Bray, F. Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA Cancer J. Clin. 2021, 71, 209–249.
  • Xi, Y.; Xu, P. Global colorectal cancer burden in 2020 and projections to 2040. Transl. Oncol. 2021, 14, 101174.
  • Bhudia, J.; Glynne-Jones, R. The Evolving Neoadjuvant Treatment Paradigm for Patients with Locoregional Mismatch Repair Proficient Rectal Cancer. Curr. Treat. Options Oncol. 2022, 23, 453–473.
  • van Gijn, W.; Marijnen, C.A.; Nagtegaal, I.D.; Kranenbarg, E.M.-K.; Putter, H.; Wiggers, T.; Rutten, H.J.; Påhlman, L.; Glimelius, B.; van de Velde, C.J.; et al. Preoperative radiotherapy combined with total mesorectal excision for resectable rectal cancer: 12-year follow-up of the multicentre, randomised controlled TME trial. Lancet Oncol. 2011, 12, 575–582.
  • Sebag-Montefiore, D.; Stephens, R.J.; Steele, R.; Monson, J.; Grieve, R.; Khanna, S.; Quirke, P.; Couture, J.; de Metz, C.; Myint, A.S.; et al. Preoperative radiotherapy versus selective postoperative chemoradiotherapy in patients with rectal cancer (MRC CR07 and NCIC-CTG C016): A multicentre, randomised trial. Lancet 2009, 373, 811–820.
  • Hong, Y.S.; Nam, B.-H.; Kim, K.-P.; Kim, J.E.; Park, S.J.; Park, Y.S.; Park, J.O.; Kim, S.Y.; Kim, J.H.; Ahn, J.B.; et al. Oxaliplatin, fluorouracil, and leucovorin versus fluorouracil and leucovorin as adjuvant chemotherapy for locally advanced rectal cancer after preoperative chemoradiotherapy (ADORE): An open-label, multicentre, phase 2, randomised controlled trial. Lancet Oncol. 2014, 15, 1245–1253.
  • Martling, A.; Holm, T.; Johansson, H.; Rutqvist, L.; Cedermark, B. The Stockholm II trial on preoperative radiotherapy in rectal carcinoma: Long-term follow-up of a population-based study. Cancer 2001, 92, 896–902.
  • Figueiredo, N.; Panteleimonitis, S.; Popeskou, S.; Cunha, J.F.; Qureshi, T.; Beets, G.L.; Heald, R.J.; Parvaiz, A. Delaying surgery after neoadjuvant chemoradiotherapy in rectal cancer has no influence in surgical approach or short-term clinical outcomes. Eur. J. Surg. Oncol. 2018, 44, 484–489.
  • Appelt, A.L.; Pløen, J.; Harling, H.; Jensen, F.S.; Jensen, L.H.; Jørgensen, J.C.R.; Lindebjerg, J.; Rafaelsen, S.R.; Jakobsen, A. High-dose chemoradiotherapy and watchful waiting for distal rectal cancer: A prospective observational study. Lancet Oncol. 2015, 16, 919–927.
  • Habr-Gama, A.; Perez, R.O.; Nadalin, W.; Sabbaga, J.; Ribeiro Jr, U.; e Sousa Jr, A.H.S.; Campos, F.G.; Kiss, D.R.; Gama-Rodrigues, J. Operative versus nonoperative treatment for stage 0 distal rectal cancer following chemoradiation therapy: Long-term results. Ann. Surg. 2004, 240, 711–718.
  • Sauer, R.; Becker, H.; Hohenberger, W.; Rodel, C.; Wittekind, C.; Fietkau, R.; Martus, P.; Tschmelitsch, J.; Hager, E.; Hess, C.F.; et al. Preoperative versus Postoperative Chemoradiotherapy for Rectal Cancer. N. Engl. J. Med. 2004, 351, 1731–1740.
  • Fok, M.; Toh, S.; Easow, J.; Fowler, H.; Clifford, R.; Parsons, J.; Vimalachandran, D. Proton beam therapy in rectal cancer: A systematic review and meta-analysis. Surg. Oncol. 2021, 38, 101638.
  • Brændengen, M.; Tveit, K.M.; Bruheim, K.; Cvancarova, M.; Berglund, Å.; Glimelius, B. Late Patient-Reported Toxicity after Preoperative Radiotherapy or Chemoradiotherapy in Nonresectable Rectal Cancer: Results From a Randomized Phase III Study. Int. J. Radiat. Oncol. 2011, 81, 1017–1024.
  • Zhao, L.; Liu, R.; Zhang, Z.; Li, T.; Li, F.; Liu, H.; Li, G. Oxaliplatin/fluorouracil-based adjuvant chemotherapy for locally advanced rectal cancer after neoadjuvant chemoradiotherapy and surgery: A systematic review and meta-analysis of randomized controlled trials. Color. Dis. 2016, 18, 763–772.
  • Bosset, J.; Calais, G.; Daban, A.; Berger, C.; Radosevic-Jelic, L.; Maingon, P.; Bardet, E.; Pierart, M.; Briffaux, A.; Group, E.R. Preoperative chemoradiotherapy versus preoperative radiotherapy in rectal cancer patients: Assessment of acute toxicity and treatment compliance: Report of the 22921 randomised trial conducted by the EORTC Radiotherapy Group. Eur. J. Cancer 2004, 40, 219–224.
  • Jin, J.; Tang, Y.; Hu, C.; Jiang, L.-M.; Jiang, J.; Li, N.; Liu, W.-Y.; Chen, S.-L.; Li, S.; Lu, N.-N.; et al. Multicenter, Randomized, Phase III Trial of Short-Term Radiotherapy Plus Chemotherapy Versus Long-Term Chemoradiotherapy in Locally Advanced Rectal Cancer (STELLAR). J. Clin. Oncol. 2022, 40, 1681–1692.
  • Morton, A.J.; Rashid, A.; Shim, J.S.C.; West, J.; Humes, D.J.; Grainge, M.J. Long-term adverse effects and healthcare burden of rectal cancer radiotherapy: Systematic review and meta-analysis. ANZ J. Surg. 2023, 93, 42–53.
  • Mohan, R. A review of proton therapy—Current status and future directions. Precis. Radiat. Oncol. 2022, 6, 164–176.
  • Gaito, S.; Aznar, M.; Burnet, N.; Crellin, A.; France, A.; Indelicato, D.; Kirkby, K.; Pan, S.; Whitfield, G.; Smith, E. Assessing Equity of Access to Proton Beam Therapy: A Literature Review. Clin. Oncol. 2023, 35, e528–e536.
  • Vitti, E.T.; Parsons, J.L. The Radiobiological Effects of Proton Beam Therapy: Impact on DNA Damage and Repair. Cancers 2019, 11, 946.
  • Levin, W.; Kooy, H.; Loeffler, J.S.; Delaney, T.F. Proton beam therapy. Br. J. Cancer 2005, 93, 849–854.
  • Hu, M.; Jiang, L.; Cui, X.; Zhang, J.; Yu, J. Proton beam therapy for cancer in the era of precision medicine. J. Hematol. Oncol. 2018, 11, 136.
  • Barsky, A.R.; Reddy, V.K.; Plastaras, J.P.; Ben-Josef, E.; Metz, J.M.; Wojcieszynski, A.P. Proton beam re-irradiation for gastrointestinal malignancies: A systematic review. J. Gastrointest. Oncol. 2020, 11, 187–202.
  • Leeman, J.E.; Romesser, P.B.; Zhou, Y.; McBride, S.; Riaz, N.; Sherman, E.; Cohen, M.A.; Cahlon, O.; Lee, N. Proton therapy for head and neck cancer: Expanding the therapeutic window. Lancet Oncol. 2017, 18, e254–e265.
  • Vaios, E.J.; Wo, J.Y. Proton beam radiotherapy for anal and rectal cancers. J. Gastrointest. Oncol. 2020, 11, 176–186.
  • Rombi, B.; Ares, C.; Hug, E.B.; Schneider, R.; Goitein, G.; Staab, A.; Albertini, F.; Bolsi, A.; Lomax, A.J.; Timmermann, B. Spot-Scanning Proton Radiation Therapy for Pediatric Chordoma and Chondrosarcoma: Clinical Outcome of 26 Patients Treated at Paul Scherrer Institute. Int. J. Radiat. Oncol. 2013, 86, 578–584.
  • Wang, X.; van Rossum, P.S.; Chu, Y.; Hobbs, B.P.; Grassberger, C.; Hong, T.S.; Liao, Z.; Yang, J.; Zhang, X.; Netherton, T.; et al. Severe Lymphopenia During Chemoradiation Therapy for Esophageal Cancer: Comprehensive Analysis of Randomized Phase 2B Trial of Proton Beam Therapy Versus Intensity Modulated Radiation Therapy. Int. J. Radiat. Oncol. 2024, 118, 368–377.
  • Chang, C.-L.; Lin, K.-C.; Chen, W.-M.; Shia, B.-C.; Wu, S.-Y. Comparing the oncologic outcomes of proton therapy and intensity-modulated radiation therapy for head and neck squamous cell carcinoma. Radiother. Oncol. 2024, 190, 109971.
  • Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ 2021, 372, n71.
  • Glynne-Jones, R.; Wyrwicz, L.; Tiret, E.; Brown, G.; Rödel, C.; Cervantes, A.; Arnold, D. Rectal cancer: ESMO Clinical Practice Guidelines for diagnosis, treatment and follow-up. Ann. Oncol. 2017, 28, iv22–iv40.
  • Sterne, J.A.C.; Hernán, M.A.; Reeves, B.C.; Savović, J.; Berkman, N.D.; Viswanathan, M.; Henry, D.; Altman, D.G.; Ansari, M.T.; Boutron, I.; et al. ROBINS-I: A tool for assessing risk of bias in non-randomised studies of interventions. BMJ 2016, 355, i4919.
  • Wan, X.; Wang, W.; Liu, J.; Tong, T. Estimating the sample mean and standard deviation from the sample size, median, range and/or interquartile range. BMC Med. Res. Methodol. 2014, 14, 135.
  • Higgins, J.P.; Thompson, S.G. Quantifying heterogeneity in a meta-analysis. Stat. Med. 2002, 21, 1539–1558.
  • Colaco, R.J.; Nichols, R.C.; Huh, S.; Getman, N.; Ho, M.W.; Li, Z.; Morris, C.G.; Mendenhall, W.M.; Mendenhall, N.P.; Hoppe, B.S. Protons offer reduced bone marrow, small bowel, and urinary bladder exposure for patients receiving neoadjuvant radiotherapy for resectable rectal cancer. J. Gastrointest. Oncol. 2014, 5, 388.
  • Isacsson, U.; Montelius, A.; Jung, B.; Glimelius, B. Comparative treatment planning between proton and X-ray therapy in locally advanced rectal cancer. Radiother. Oncol. 1996, 41, 263–272.
  • Kronborg, C.J.; Jørgensen, J.B.; Petersen, J.B.; Jensen, L.N.; Iversen, L.H.; Pedersen, B.G.; Spindler, K.-L.G. Pelvic insufficiency fractures, dose volume parameters and plan optimization after radiotherapy for rectal cancer. Clin. Transl. Radiat. Oncol. 2019, 19, 72–76.
  • Moningi, S.; Ludmir, E.B.; Polamraju, P.; Williamson, T.; Melkun, M.M.; Herman, J.D.; Krishnan, S.; Koay, E.J.; Koong, A.C.; Minsky, B.D.; et al. Definitive hyperfractionated, accelerated proton reirradiation for patients with pelvic malignancies. Clin. Transl. Radiat. Oncol. 2019, 19, 59–65.
  • Pedone, C.; Sorcini, B.; Staff, C.; Färlin, J.; Fokstuen, T.; Frödin, J.-E.; Nilsson, P.J.; Martling, A.; Valdman, A. Preoperative short-course radiation therapy with PROtons compared to photons in high-risk RECTal cancer (PRORECT): Initial dosimetric experience. Clin. Transl. Radiat. Oncol. 2023, 39, 100562.
  • Radu, C.; Norrlid, O.; Brændengen, M.; Hansson, K.; Isacsson, U.; Glimelius, B. Integrated peripheral boost in preoperative radiotherapy for the locally most advanced non-resectable rectal cancer patients. Acta Oncol. 2013, 52, 528–537.
  • Rønde, H.S.; Kallehauge, J.F.; Kronborg, C.J.S.; Nyvang, L.; Rekstad, B.L.; Hanekamp, B.A.; Appelt, A.L.; Guren, M.G.; Spindler, K.L.S. Intensity modulated proton therapy planning study for organ at risk sparing in rectal cancer re-irradiation. Acta Oncol. 2021, 60, 1436–1439.
  • Wolff, H.A.; Wagner, D.M.; Conradi, L.-C.; Hennies, S.; Ghadimi, M.; Hess, C.F.; Christiansen, H. Irradiation with protons for the individualized treatment of patients with locally advanced rectal cancer: A planning study with clinical implications. Radiother. Oncol. 2012, 102, 30–37.
  • Benson, A.B.; Venook, A.P.; Al-Hawary, M.M.; Azad, N.; Chen, Y.-J.; Ciombor, K.K.; Cohen, S.; Cooper, H.S.; Deming, D.; Garrido-Laguna, I. Rectal cancer, version 2.2022, NCCN clinical practice guidelines in oncology. J. Natl. Compr. Cancer Netw. 2022, 20, 1139–1167.
  • Berman, A.T.; Both, S.; Sharkoski, T.; Goldrath, K.; Tochner, Z.; Apisarnthanarax, S.; Metz, J.M.; Plastaras, J.P. Proton Reirradiation of Recurrent Rectal Cancer: Dosimetric Comparison, Toxicities, and Preliminary Outcomes. Int. J. Part. Ther. 2014, 1, 2–13.
  • Jeans, E.B.; Jethwa, K.R.; Harmsen, W.S.; Neben-Wittich, M.; Ashman, J.B.; Merrell, K.W.; Giffey, B.; Ito, S.; Kazemba, B.; Beltran, C.; et al. Clinical Implementation of Preoperative Short-Course Pencil Beam Scanning Proton Therapy for Patients with Rectal Cancer. Adv. Radiat. Oncol. 2020, 5, 865–870.
  • Ofuya, M.; McParland, L.; Murray, L.; Brown, S.; Sebag-Montefiore, D.; Hall, E. Systematic review of methodology used in clinical studies evaluating the benefits of proton beam therapy. Clin. Transl. Radiat. Oncol. 2019, 19, 17–26.
  • Jabbour, S.K.; Patel, S.; Herman, J.M.; Wild, A.; Nagda, S.N.; Altoos, T.; Tunceroglu, A.; Azad, N.; Gearheart, S.; Moss, R.A.; et al. Intensity-Modulated Radiation Therapy for Rectal Carcinoma Can Reduce Treatment Breaks and Emergency Department Visits. Int. J. Surg. Oncol. 2012, 2012, 891067.
  • Samuelian, J.M.; Callister, M.D.; Ashman, J.B.; Young-Fadok, T.M.; Borad, M.J.; Gunderson, L.L. Reduced Acute Bowel Toxicity in Patients Treated With Intensity-Modulated Radiotherapy for Rectal Cancer. Int. J. Radiat. Oncol. 2012, 82, 1981–1987.
  • Sipaviciute, A.; Sileika, E.; Burneckis, A.; Dulskas, A. Late gastrointestinal toxicity after radiotherapy for rectal cancer: A systematic review. Int. J. Color. Dis. 2020, 35, 977–983.
  • Koroulakis, A.; Molitoris, J.; Kaiser, A.; Hanna, N.; Bafford, A.; Jiang, Y.; Bentzen, S.; Regine, W.F. Reirradiation for Rectal Cancer Using Pencil Beam Scanning Proton Therapy: A Single Institutional Experience. Adv. Radiat. Oncol. 2021, 6, 100595.
  • Hiroshima, Y.; Ishikawa, H.; Murakami, M.; Nakamura, M.; Shimizu, S.; Enomoto, T.; Oda, T.; Mizumoto, M.; Nakai, K.; Okumura, T.; et al. Proton Beam Therapy for Local Recurrence of Rectal Cancer. Anticancer Res. 2021, 41, 3589–3595.
  • Jankowski, M.; Bała, D.; Las-Jankowska, M.; Wysocki, W.; Nowikiewicz, T.; Zegarski, W. Overall treatment outcome—Analysis of long-term results of rectal cancer treatment on the basis of a new parameter. Arch. Med. Sci. 2020, 16, 825–833.
  • Takagawa, Y.; Suzuki, M.; Seto, I.; Azami, Y.; Machida, M.; Takayama, K.; Sulaiman, N.S.; Nakasato, T.; Kikuchi, Y.; Murakami, M.; et al. Proton beam reirradiation for locally recurrent rectal cancer patients with prior pelvic irradiation. J. Radiat. Res. 2024, 65, 379–386.
  • Tambas, M.; van der Laan, H.P.; Steenbakkers, R.J.; Doyen, J.; Timmermann, B.; Orlandi, E.; Hoyer, M.; Haustermans, K.; Georg, P.; Burnet, N.G.; et al. Current practice in proton therapy delivery in adult cancer patients across Europe. Radiother. Oncol. 2022, 167, 7–13.
  • Amini, A.; Raben, D.; Crawford, E.D.; Flaig, T.W.; Kessler, E.R.; Lam, E.T.; Maroni, P.; Pugh, T.J. Patient characterization and usage trends of proton beam therapy for localized prostate cancer in the United States: A study of the National Cancer Database. Urol. Oncol. Semin. Orig. Investig. 2017, 35, 438–446.
  • Willmann, J.; Leiser, D.; Weber, D.C. Oncological Outcomes, Long-Term Toxicities, Quality of Life and Sexual Health after Pencil-Beam Scanning Proton Therapy in Patients with Low-Grade Glioma. Cancers 2023, 15, 5287.
  • Khong, J.; Tee, H.; Gorayski, P.; Le, H.; Penniment, M.; Jessop, S.; Hansford, J.; Penfold, M.; Green, J.; Skelton, K.; et al. Proton beam therapy in paediatric cancer: Anticipating the opening of the Australian Bragg Centre for Proton Therapy and Research. J. Med. Imaging Radiat. Oncol. 2023.

Author | Year | Country of Publication | Study Design | Total Patients (n) | Age (Median; Range) | Sex (M:F) | Tumour Grade | Tumour Stage (UICC-TNM) | Treatment
Colaco et al. | 2014 | USA | Comparative dosimetric study | 8 | NR | NR | T2-T3 | NR | PBT, 3DCRT, IMRT
Isacsson et al. | 1996 | Sweden | Comparative dosimetric study | 6 | 60 (47–79) | 4:2 | T4 | III | PBT, X-ray, Mixed
Kronborg et al. | 2020 | Denmark | Comparative dosimetric study | 9 | 69 (35–81) | 5:4 | T3-T4 | IIa-III | PBT, 3DCRT, IMRT, VMAT
Moningi et al. | 2019 | USA | Comparative dosimetric study and retrospective single-arm non-randomised trial | 15 | 74 (55–91) | 8:7 | NR | NR | PBT, 3DCRT; PBT reirradiation
Pedone et al. | 2023 | Sweden | Comparative dosimetric study | 20 | 57 (36–73) | 12:8 | T2-T4 | IIb-III | PBT, VMAT
Radu et al. | 2013 | Sweden | Comparative dosimetric study | 7 | 64 (52–75) | 6:1 | T4 | III | PBT, VMAT
Rønde et al. | 2021 | Denmark, Norway | Comparative dosimetric study | 15 | NR | 8:7 | NR | NR | PBT, 3DCRT
Wolff et al. | 2012 | Germany | Retrospective comparative dosimetric study | 25 | 62.5 (44–73) | 15:10 | T2-T4 | IIa-III | PBT, 3DCRT, IMRT, RapidArc
The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

Le, K.; Marchant, J.N.; Le, K.D.R. Evaluating the Effectiveness of Proton Beam Therapy Compared to Conventional Radiotherapy in Non-Metastatic Rectal Cancer: A Systematic Review of Clinical Outcomes. Medicina 2024 , 60 , 1426. https://doi.org/10.3390/medicina60091426

J Psychiatry Neurosci. 2009 Nov; 34(6)


Bias in the research literature and conflict of interest: an issue for publishers, editors, reviewers and authors, and it is not just about the money

Conflicts of interest (COIs) of researchers have been a frequent topic recently in the popular press and scientific journals. Of particular interest to psychiatric researchers are the investigations in the US Senate, led by Senator Charles Grassley. A recent article in Science discusses the history and current state of these investigations. 1 For those who like to keep score, Science has a list of the 9 psychiatric researchers who have been investigated, the amounts of money they received from drug companies and the amounts they mention in COI disclosures. 2 Much of what has been written about COI concerns drug company payments to researchers. However, COIs are an issue for publishers and editors of journals, reviewers of manuscripts and authors. Conflicts of interest exist in every aspect of the production of research journals, and the conflicts derive from more than just money paid to researchers by drug companies. The purpose of this editorial is first to discuss the nature of COIs and to describe some of the human behavioural research relevant to COIs. I will then discuss how COIs pervade every aspect of publishing and how the Journal of Psychiatry and Neuroscience attempts to deal with these issues. Finally, I will argue that there is no entirely satisfactory way of dealing with COIs, but that all researchers should be aware of the issues discussed here to minimize the extent to which COIs can distort the scientific literature.

Creation of bias and the nature of COIs

A COI occurs when individuals’ personal interests are in conflict with their professional obligations. Often this means that someone will profit personally from decisions made in his or her professional role. The personal profit is not necessarily monetary; it could be progress toward the personal goals of the individual or organization, for example the success of a journal for a publisher or editor or the acceptance of ideas for a researcher. The concern is that a COI may bias behaviour, and it is the potential for bias that makes COIs so important. Before getting into the specifics of COIs, I will describe some of the research on the biases we all have, the evidence that we are not always aware of our own biases, how biases can be created by vested interests and how people behave in response to revelations of COIs. The idea that scientists are objective seekers of truth is a pleasing fiction, but counterproductive in so far as it can lessen vigilance against bias.

A recent short review in Science asks how well people know their own minds and concludes the answer is not very well. 3 This is because “In real life, people do not realize that their self-knowledge is a construction, and fail to recognize that they possess a vast adaptive unconscious that operates out of their conscious awareness.” Wilson and Brekke 4 reviewed some of the unwanted influences on judgments and evaluations. They concluded that people find it difficult to avoid unwanted responses because of mental processing that is unconscious or uncontrollable. Moore and Loewenstein 5 argue that “the automatic nature of self-interest gives it a primal power to influence judgment and makes it difficult for people to understand its influence on their judgment, let alone eradicate its influence.” They also point out that in contrast to self-interest, understanding one’s ethical and professional obligations involves a more thoughtful process. The involvement of different cognitive processes may make it difficult to reconcile self-interest and obligations. MacCoun, 6 in an extensive review, examined the experimental evidence about bias in the interpretation and use of research results. He also discussed the evidence and theories concerning the cognitive and motivational mechanisms that produce bias. He concluded that people assume that their own views are objective and “that subjectivity (e.g., due to personal ideology) is the most likely explanation for their opponents’ conflicting perceptions.” This is consistent with the suggestion of Platt, almost 50 years ago, that researchers’ attachment to their own ideas results in competition among researchers rather than ideas. 7

An early experimental study by Mahoney 8 is a particularly striking example of how researchers’ bias can influence their behaviour. Reviewers were asked to referee manuscripts, all of which had identical methodology but reported different results. Reviewers were strongly biased against manuscripts that reported results that contradicted their own theoretical perspectives. This can have a deleterious effect as ideas that have long since been contradicted can persist in the literature. 9 , 10 Researchers’ biases caused by preference for their own ideas can cause a serious COI when they present their own work and when they are involved in any aspect of peer review. Nonetheless, much more attention is paid to COIs owing to external influences such as money than to COIs related to researchers’ inherent biases.

Cain and Detsky 11 reviewed some of the evidence on how biases can be created and how they can bias opinions in everyone. Experimental evidence supports the idea that “individuals use different strategies to evaluate propositions depending on whether the hypothesis is desirable or threatening/disagreeable to them.” For example, a much higher proportion of people agree with the proposition that if someone sues you and you win the case the other person should pay your legal costs than with essentially the same proposition that if you sue someone and lose the case you should pay the costs. Cain and Detsky discuss some of the experimental work that demonstrates how people come to have biased opinions. For example, opinion can be biased by the first information encountered on a topic, a conclusion with obvious implications if the first information a physician or researcher learns about a drug is from the pharmaceutical company developing that drug. Experimental evidence also supports the idea that it is difficult to overcome the biases created by the effect of early information on beliefs. This explains why beliefs derived from experimental or epidemiological studies persist even after clinical trials provide more compelling contradictory evidence. Cain and Detsky suggest that “physicians have many relationships that may result in bias” (not just those involving pharmaceutical companies and not just those involving money) and warn that “such bias may be difficult to undo.” The same conclusions surely apply to researchers. In another review, Dana and Loewenstein 12 describe the evidence indicating that gifts from industry can create bias. They conclude that self-serving bias prevents individuals from being objective even when they have a motivation to be objective; that instructions given to individuals about bias do not prevent them from becoming biased, suggesting a role for the unconscious in this process; and that self-interest alters the way individuals seek out and assess information.

One of the main strategies used to mitigate the effects of bias related to COI is disclosure. Most peer-reviewed journals require authors to make a COI statement that is often published with the article. The idea behind disclosure is that the reader of the article will be more skeptical about any claims made in the article. In an experimental study, different groups read a manuscript in which a COI was mentioned or not mentioned. Those reading the study with the mention of a COI considered the study to be less interesting and important. 13 However, given the evidence that people do not always know their own minds, these results have limitations. On the basis of a review of the evidence on the effectiveness of disclosing COIs and on an experimental study, Cain and colleagues 14 concluded that disclosure may not always be useful for 2 reasons. First, those declaring a COI may feel entitled to deviate from what they consider objectivity because they have declared a COI. They may also exaggerate to overcome any diminished weight that the reader may put on what they have written. Second, those who read articles in which the author declares a COI may not discount biased information as much as they should because of a tendency to be influenced by information they know they should ignore and possibly because the act of disclosure may make them more likely to place greater weight on the author’s statements given the author’s openness in admitting to the COI. Whatever the reason, in some circumstances disclosure may result in the recipient of the biased information placing greater weight on the biased information.

A recent editorial in Nature Medicine discusses the difference between a perceived and an actual COI. 15 The editorial discusses the fact that the casual reader may consider there is a COI in sponsored content, but that because the “sponsors never have a say on the editorial content of anything [they] publish,” and because the editorial content for supplements is already commissioned before potential sponsors are approached, any COI is apparent rather than real. However, as discussed, humans do not always know their own minds and are not always aware of their own biases. Articles may be commissioned to suit a particular sponsor’s biases even without the person commissioning them being aware of that fact. In my opinion, it is not possible to state categorically that a COI is apparent rather than real.

All those involved in the research literature, including publishers, editors of journals, reviewers of manuscripts and authors, can have COIs. In the rest of this article I will discuss some of the factors that lead to COIs for each of these groups, describe how the Journal of Psychiatry and Neuroscience tries to deal with each of these issues and suggest how the current situation can be improved.

The pervasiveness of COIs in publishing

Publishers are acting with a COI whenever they interfere with the day-to-day management of a journal by the editorial staff. Two extreme versions of this have come to light recently. According to a recent report in the BMJ 16 concerning a court case about the Merck anti-arthritis drug rofecoxib (Vioxx), Elsevier has apologized for the improper publication of Merck-sponsored marketing material “that was made to look like journals.” More details are given in a report in Nature Medicine . 17 In a second case reported in Nature , 18 a computer-generated hoax article was submitted to The Open Information Science Journal published by Bentham Science Publishing. The paper was accepted and the authors were asked to pay US$800 for publication. At this point the authors withdrew their manuscript. The editor-in-chief of the journal, when contacted by Nature, reported that he had not seen the article and stated that he would resign.

Several of the top medical journals are owned by medical associations. As these journals often carry news and opinion items in addition to research reports there may sometimes be a conflict between the opinions of the editor of a journal and those of the officers of the association that owns the journal. Such conflicts have resulted in the departure of the editors of the New England Journal of Medicine , 19 the Journal of the American Medical Association 20 and the Canadian Medical Association Journal ( CMAJ ). 21 The CMAJ is published by the Canadian Medical Association, also the publisher of the Journal of Psychiatry and Neuroscience . The firing of the editor of the CMAJ led to the Canadian Medical Association adopting 25 recommendations of a review panel that enshrine editorial independence in the governance structure of all journals published by the association. 22

Given the cost of publishing, money is an important factor that can lead to COIs for publishers. This is true whether a publisher is for-profit or not-for-profit given that even not-for-profit publishers have to remain financially sound. The costs of publishing must be funded somehow, and the most common sources are journal subscriptions, advertising and publication charges. Advertising by drug companies is common in medical journals, and this is sometimes problematic. Othman and colleagues 23 did a systematic review of articles on advertisements in medical journals that included 24 articles assessing advertisements from journals in 26 countries. Although most of the advertisements made claims that were supported by a systematic review, meta-analysis or randomized controlled trial, some advertisements made claims that were not well supported by evidence. In some countries, most claims were not well supported. Another issue is that advertisements sometimes focus on the newest, most expensive drugs that may not be superior to cheaper alternatives. 24 One point of view is that medical journals should not accept advertising from industries relevant to medicine. 24 The alternatives, subscriptions and publication charges, also have their problems. The money spent on journal subscriptions by university and hospital libraries is not available for other purposes, and publication charges, which are usually paid from research grants, take away money that could otherwise be devoted to research. Thus, there is always a conflict between the publisher’s interest in remaining financially sound and its responsibility to the researchers who provide the manuscripts and read the papers. A recent article in Nature (published by the Nature Publishing Group, a for-profit publisher) on one of the most prominent open-access not-for-profit publishers, the Public Library of Science (PLoS), gives an interesting perspective on publication charges. 25 The title of the article is “PLoS stays afloat with bulk publishing.” The article states that the financial situation of PLoS has improved “thanks to a cash cow in the form of PLoS One, ” which “uses a system of ‘light’ peer review” and has generated substantial amounts of money from author fees. PLoS One reviews only for methodology, not for significance of the results, and minimizes costs by publishing only online. My own perspective is that this is an imaginative innovation that, in addition to being financially sound, may become an important model for publishing research. As the editorial board of Nature knows well, the significance of research is sometimes hard to discern. Nature itself turned down the opportunity to publish the paper by Hans Krebs describing what Krebs called the citric acid cycle and everyone else calls the Krebs cycle. 26 , 27 The issue with publication charges, as with advertising, is how the COI is addressed. Policies related to advertising in medical journals are usually available, and a recent review summarizes some of those policies from 9 of the top medical journals. 24

Publishers are capable of finding surprising ways to act inappropriately in the face of COIs. According to a recent report in the BMJ , Elsevier offered $25 gift cards to academics to encourage them to post favourable reviews of the academic textbook Clinical Psychology , although subsequently Elsevier admitted this was a mistake. 28

The Journal of Psychiatry and Neuroscience is an open-access journal that has no publication charge. Its main source of revenue is advertising in the print edition. The policies that govern advertising in the journal are available on the Canadian Medical Association website ( www.cma.ca/index.cfm/ci_id/25274/la_id/1.htm ). For me as an editor, the important issues are that I have no contact with those who obtain advertising for the journal and do not know what advertisements will appear in any issue. The administrative staff ensures that advertisements do not appear in inappropriate places (e.g., an advertisement for an antidepressant next to an article on depression or antidepressants).

Conflicts of interest for editors are usually taken to mean conflicts related to funding from industry, and the Journal of Psychiatry and Neuroscience is among those journals that publish this information on the journal website ( www.cma.ca/jpn ). However, in my opinion non-financial issues are probably more important. Every editor wants his or her journal to be a success. The measure of success of a journal that has become widely used, but is much criticized, is the impact factor. Acting in a way that will increase the impact factor of a journal is not always entirely compatible with the professional responsibilities of an editor.

The impact factor for a journal is based on the rate at which articles in the journal are cited. For example, the impact factor for 2008 is the sum of citations in 2008 to articles published in the journal in 2006 and 2007, divided by the number of articles published in 2006 and 2007. The number of citations a paper receives can certainly be an indication of its importance. However, the relation between citations and importance is not a tight one. Obviously papers in a popular field will tend to receive more citations than those in a less popular field, irrespective of quality. This is an issue of some concern. In an important paper on “Why most published research findings are false,” Ioannidis 29 discusses some of the factors that lead to false findings. He points out that, from a theoretical perspective, “the hotter a scientific field (with more scientific teams involved), the less likely the research findings are to be true.” Pfeiffer and Hoffmann 30 have provided some empirical support for this prediction. In biological psychiatry research, one popular area is psychiatric genetics. Unfortunately, associations that are reported between particular gene polymorphisms and disorders or symptoms are often not replicated or confirmed by meta-analyses. 31 , 32 The false discovery rate may be as high as 95%. 33 Interestingly, genetic association studies published in journals with a high impact factor are more likely to provide an overestimate of the true effect size owing in part to small sample sizes. 34 The International Journal of Neuropsychopharmacology demonstrated an interesting approach to the problem of non-replication in psychiatric genetic studies when it published a paper on the interaction between the 5-HTTLPR serotonin transporter polymorphism and environmental adversity and the risk for depression and anxiety. 35 In the same issue there was a review on the lack of replication in such genetic studies that suggested the former paper might provide “further evidence that the literature to date is compatible with chance findings.” 36 All judgments about the quality of research papers are subjective. Nonetheless, an editor who selects for publication a psychiatric genetic study with a relatively small sample size and a level of significance not much better than 0.05 over an innovative and methodologically sound manuscript dealing with a topic that is not currently popular may be helping to enhance the impact factor of the journal at the expense of its scientific quality.
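
The impact factor arithmetic described at the start of this discussion can be illustrated with a minimal sketch. The function name and all counts below are hypothetical and are not taken from any real journal; the sketch assumes only the two-year definition given in the text (citations in one year to articles from the two preceding years, divided by the number of those articles).

```python
def impact_factor(citations_in_year, articles_prev_two_years):
    """Two-year impact factor as described in the text.

    citations_in_year: citations received in year Y to articles
        published in years Y-1 and Y-2 (hypothetical input).
    articles_prev_two_years: article counts for years Y-2 and Y-1.
    """
    total_articles = sum(articles_prev_two_years)
    if total_articles == 0:
        raise ValueError("no articles published in the two-year window")
    return citations_in_year / total_articles


# Hypothetical journal: 40 articles in 2006, 60 in 2007,
# cited 250 times in 2008.
if_2008 = impact_factor(250, [40, 60])
print(if_2008)  # 2.5
```

Because the denominator counts every article regardless of how often it is cited, a few highly cited papers in a popular field can raise the figure substantially, which is the incentive discussed above.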

One direct way in which editors can manipulate impact factors is by altering the timing of publication of papers. If, for example, a paper that is likely to be highly cited was published in the December 2010 issue of a journal, citations that would contribute to the 2011 impact factor would have to occur between 1 and 13 months after publication, but citations are unlikely to occur within 6 months of publication. If the same paper were published in January 2011, citations that occur between 12 and 24 months after publication would contribute to the 2012 impact factor. Thus, publishing papers that are likely to have a high citation rate early in any year will help to inflate the impact factor of a journal. Obviously this is unfair to authors if the publication of their paper is delayed, and I do not know whether it ever occurs. Nonetheless there is evidence that some editors do take estimated citation rates into account when making decisions. Chew and colleagues 37 analyzed impact factor trends for medical journals and interviewed the editors. They concluded that rising impact factors were due to deliberate editorial practices in spite of the editors’ dissatisfaction with impact factors as the measure of the quality of a journal. One quotation from an editor is particularly salient: “our basis for rejection is often ‘I don’t think this paper is going to be cited.’” It is not clear from this quotation whether the editors would reject a manuscript because they thought the citation rate was more important than the quality of the science or because they equated the citation rate with the quality of the science.
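
The timing argument above can be checked with a small sketch. It assumes the two-year definition given earlier and measures time from the start of the publication month; the function name and this measuring convention are illustrative assumptions, not part of any official method.

```python
def citation_window(pub_year, pub_month, if_year):
    """Months after publication during which citations count toward
    the impact factor for if_year, as a (start, end) pair.

    Returns None when the paper falls outside the two-year window
    for that impact factor. Time is measured from the start of the
    publication month (an assumption made for this sketch).
    """
    if not 1 <= pub_month <= 12:
        raise ValueError("pub_month must be between 1 and 12")
    if if_year - pub_year not in (1, 2):
        return None
    # Months from the start of the publication month to January of if_year.
    start = (if_year - pub_year) * 12 - (pub_month - 1)
    return start, start + 12


print(citation_window(2010, 12, 2011))  # (1, 13)
print(citation_window(2011, 1, 2012))   # (12, 24)
```

Under these assumptions, the first counting window for a December 2010 paper covers months 1 to 13, which includes the early months when citations are scarce, whereas the first window for a January 2011 paper covers months 12 to 24.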

Not all COIs for editors are related to impact factors. The desire of editors to please authors by having a manuscript reviewed as quickly as possible, thereby encouraging authors to submit further manuscripts, can be in conflict with getting excellent reviews. The assertion by Ioannidis 29 that much of what is in research journals is false can only be correct if standards of reviewing are not very good. Unfortunately this idea is supported by research. In a test of what errors peer reviewers detect, reviewers detected an average of 2.6 of 9 major errors in test manuscripts, and this number was not improved after reviewer training. 38 Serious statistical errors are common even in some high-profile journals. 39 The best peer reviewers are usually busy people who will not necessarily be able to produce reviews promptly, and adding an expert statistical review to the content reviews may increase the time needed to review a manuscript. However, it is not possible to say to what extent, if at all, the poor standards of reviewing are due to the desire of some editors to speed up the process of review at the expense of the quality of the reviews.

Conflicts of interest for editors may also arise from the publication of supplements, the publication of papers by an editor, and the non-adherence to important guidelines for reporting. Journal supplements, which are often subsidized by the pharmaceutical industry, can help improve the financial standing of a journal, which is often a concern for editors and publishers. However, a study concluded that manuscripts “published in journal supplements are generally of inferior quality compared with articles published in the parent journal.” 40 Editors can legitimately publish a peer-reviewed article in the journal they edit as long as the manuscript undergoes peer review that is as thorough as all other manuscripts, and the member of the editorial board overseeing the peer review does his or her best to ensure that any bias in the assessment of the manuscript is minimized. This may not always be so. Nature recently reported on the editor of a theoretical physics journal who was facing growing criticism after publishing nearly 60 papers in 1 year in the journal he edited. 41 In terms of guidelines for reporting, many journals adhere to the statement of the International Committee of Medical Journal Editors ( www.icmje.org/publishing_10register.html ). This requires that to be considered for publication clinical trials must be registered in a public trials registry at or before the onset of patient enrolment. However, some well-known journals in biological psychiatry publish the results of clinical trials without giving any information about trial registration, suggesting that the trials may not have been registered. One possible explanation for this is that the editors value the citations received by clinical trials, which are often highly cited, more than adherence to the trial registration policy.

Among the policies that the Journal of Psychiatry and Neuroscience has adopted to minimize any effect of editors’ COIs are reporting of financial COIs of editorial board members on the journal website, publishing peer-reviewed papers in the order in which they were accepted (with the exception of including short commentaries on topical subjects or moving a shorter paper forward when a longer paper will not fit the page allotment of the journal), giving all published papers that contain statistics a full review by a statistician, not publishing supplements, ensuring that all papers from members of the editorial board go through full peer review and adhering to guidelines such as the registration of clinical trials.

Conflicts of interest for reviewers are, in part, similar to those for authors. If a manuscript discusses medications and a reviewer has some connection with a pharmaceutical company that is involved with any medication mentioned in the manuscript or a drug of the same class produced by a competitor, this COI should be mentioned to the editor; the Journal of Psychiatry and Neuroscience asks reviewers to mention any COI to the editor. Other COIs for reviewers are less clear and are, in my experience, seldom mentioned. These include any possible personal relationship (positive or negative) with any of the authors of a manuscript and professional rivalry owing to the reviewer and authors researching similar topics. Reviewers have their own biases based on their own research approaches. In my experience, if a reviewer recommends that the authors cite an additional reference, more often than not it is to one of the reviewer’s own papers, and the recommendation is not always appropriate. An important COI for reviewers is the conflict between the professional obligation to produce a well thought-out review in a timely manner and the desire not to spend too much time on a task that is relatively thankless. Reviewers seldom read the instructions on what is required in a review. The editor of Obstetrics and Gynecology inserted the following sentence in the middle of a paragraph of instructions for reviewers: “If you read this and call or fax our office, we will send you a gift worth 20 dollars.” 42 The response rate was 17%. A minority of reviewers who agree to review a manuscript never submit their reviews or clearly do not devote the time needed to their reviews. The latter is readily apparent when, for example, a reviewer’s assessment includes factual errors about the design of the study. Behaviours like this inconvenience editors and can adversely impact authors by delaying decisions on manuscripts.

Little research has been done on the factors that influence reviewers’ decisions, and more is needed so that editors can take into account possible biases in reviewers’ assessments. As mentioned, reviewers miss many important flaws in manuscripts, and training does not improve this situation. 38 In ecology research, recommendations to reject are not influenced by age, but those who have more papers in high-impact journals recommend rejection of manuscripts at up to twice the rate of reviewers with few or no papers in high-impact journals. 43 Although this is an indication of different biases among different authors, it does not necessarily reflect a COI.

The COIs of authors include those conflicts that have potential to affect how the research was conducted and interpreted as well as those that influence how it is presented, which is why financial COIs for authors are an important issue. A review of studies on the extent, impact and management of financial COIs reported a significant association between industry sponsorship and pro-industry outcomes in published papers and concluded that financial ties between industry and academia influence biomedical research in important ways. 44 This is consistent with the idea discussed earlier in this editorial that admitting to a financial COI does not necessarily deal with the bias that the financial COI creates. The issue of what exactly constitutes a financial COI can be complex. The website of the National Institutes of Health (NIH) in the United States on frequently asked questions about financial COIs is more than 5,000 words long (http://grants.nih.gov/grants/policy/coifaq.htm#c1). However, the bottom line is that NIH requires anything over $10 000 per year to be declared. This may seem high to some, but GlaxoSmithKline recently announced that it would limit the advisory payments and honoraria it gives to US doctors to (only?) $150 000 per year. 45 Some journals require any financial COI to be declared, no matter how small. Although payment of a $500 honorarium may not create as big a bias as a $50 000 consultant payment, it is unrealistic to think that researchers might mention the exact amount of payments when declaring COIs.

Because financial COIs have been the subject of many recent articles, this editorial focuses on other COIs that authors should be attempting to deal with. The first, and by no means trivial, COI that is an issue for the vast majority of authors is the pride and sense of ownership that authors take in the work they submit for publication. This presumably is responsible for the fact that when authors were interviewed about their published papers “important weaknesses were often admitted on direct questioning but were not included in the published article.” 46 Certainly editors are used to asking authors to mention the limitations of their studies and to be more cautious about the implications of the research. Another related factor is the desire for researchers to advance their careers and get recognition from their peers. Research suggests that social and monetary reward may work through both psychological 47 and neuroanatomical processes 48 that overlap to some extent. The big difference in relation to COIs is that social rewards, unlike monetary rewards, cannot be disclosed in any meaningful way.

In some situations COIs can arise because all the authors need to take responsibility for the content of a manuscript. If an author is included who does not fulfill the requirements of the International Committee of Medical Journal Editors for authorship ( www.icmje.org/ethical_1author.html ), then both that person and the other authors are not fulfilling their professional obligations. Another related problem is that of ghost authorship (i.e., when someone who was not involved in the work, often a pharmaceutical company employee, writes a manuscript but does not appear as an author; see Ross and colleagues 49 ). Ghostwriting may be part of a pharmaceutical company effort to promote products through “carefully orchestrated campaigns to pass off sympathetic, if not biased, research and review articles as the work of academic scientists rather than of their own contracted employees.” 50 Finally, there may be conflict among the different authors in how to present and interpret the results of a study. Attempts to resolve these issues are not always successful. Interviews of authors of papers published in The Lancet revealed that individual authors often disagreed with opinions expressed in the papers and that the papers revealed “evidence of (self)-censored criticism, obscured meaning, and confused assessment of implications.” 46 Overall, the evidence suggests that non-monetary COIs can create similar problems to monetary ones.

The Journal of Psychiatry and Neuroscience asks all authors to sign a statement about any financial COIs they may have, state what role they played in the research and writing of the manuscript, state whether they approved the final version of the manuscript, and indicate whether there was anyone involved in writing the manuscript who was not an author.

In spite of all the problems created by bias and COIs, research continues to advance. However, the speed of the advance might be enhanced if these problems could be reduced. Obviously there needs to be better training and mentoring of scientists concerning COIs and bias. Unfortunately, a recent study on the effects of mentoring and training in responsible conduct of research concluded that these interventions have the potential to influence behaviour in ways that can both increase and decrease the likelihood of problematic behaviour. 51 More research on effective training and mentoring techniques is needed urgently. Fortunately, some relevant information is available in the psychology literature. In experimental studies, for example, asking participants to consider the opposite of their own opinion was more effective in reducing their biases than asking them to be as fair and unbiased as possible without giving them a specific strategy to achieve this aim. 52

The investigations of Senator Charles Grassley have intensified the debate about sources of bias in the literature and how they may be reduced. However, the debate has focused rather narrowly on money and the objective of a literature relatively free of bias remains a pious but distant hope.

Competing interests: None declared (if you consider competing interests to be limited to financial ones).

Association between different levels of suppressed viral load and the risk of sexual transmission of HIV among serodiscordant couples on antiretroviral therapy: a protocol for a two-step systematic review and individual participant data meta-analysis

Volume 14, Issue 8

  • Pascal Djiadeu 1 ,
  • Housne Begum 1 ,
  • Chris Archibald 1 ,
  • Taline Ekmekjian 2 ,
  • Giovanna Busa 1 ,
  • Jeffery Dansoh 1 ,
  • Phu Van Nguyen 1 ,
  • Annie Fleurant 1
  • 1 Sexually Transmitted and Blood Borne Infections Surveillance Division , Public Health Agency of Canada , Ottawa , Ontario , Canada
  • 2 PHAC Library, Office of the Chief Science Officer , Public Health Agency of Canada , Ottawa , Ontario , Canada
  • Correspondence to Dr Pascal Djiadeu; sti.secretariat-its{at}phac-aspc.gc.ca

Introduction HIV is a major global public health issue. The risk of sexual transmission of HIV in serodiscordant couples when the partner living with HIV maintains a suppressed viral load of <200 copies of HIV RNA/mL has been found in systematic reviews to be negligible. A recent systematic review reported a similar risk of transmission for viral load <1000 copies/mL, but quantitative transmission risk estimates were not provided. Precise estimates of the risk of sexual transmission at sustained viral load levels between 200 copies/mL and 1000 copies/mL remain a significant gap in the literature.

Methods and analysis A systematic search of electronic databases, including MEDLINE, Embase, the Cochrane Central Register of Controlled Trials via Ovid and Scopus, will be conducted for articles written in English or French and published from January 2000 to October 2023. The first step of a two-step meta-analysis will consist of a systematic review along with a meta-analysis, and the second step will use individual participant data for meta-analysis. Our primary outcome is the risk of sexual HIV transmission in serodiscordant couples where the partner living with HIV is on antiretroviral therapy. Our secondary outcome is the dose-response association between different levels of viral load and the risk of sexual HIV transmission. We will ascertain the risk of bias using the Risk Of Bias in Non-randomised Studies of Interventions (ROBINS-I) and Quality in Prognostic Studies (QUIPS) tools, the risk of publication bias using funnel plots and Egger’s test, and heterogeneity using I². A random effects model will estimate the pooled incidence of sexual HIV transmission, and multivariate logistic regression will be used to assess the viral load dose-response relationships. The Grading of Recommendations, Assessment, Development and Evaluation system will determine the certainty of evidence.

Ethics and dissemination The meta-analysis will be conducted using deidentified data. No human subjects will be involved in the research. Findings will be disseminated through peer-reviewed publications, presentations and conferences.

PROSPERO registration number CRD42023476946.

  • HIV and AIDS
  • Sexually Transmitted Disease
  • Public health
  • Systematic Review
  • Epidemiology

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:  http://creativecommons.org/licenses/by-nc/4.0/ .

https://doi.org/10.1136/bmjopen-2023-082254

STRENGTHS AND LIMITATIONS OF THE STUDY

The proposed individual participant data (IPD) meta-analysis will be conducted using raw data from individual participants with advantages, including greater quantity of data, more flexibility in analytic approaches, the ability to conduct subgroup analyses and improved ability to detect and address biases.

This innovative two-step meta-analysis will still collect and synthesise evidence and answer the research questions even if the IPD part is not feasible.

Studies collected in the review may have differences in the timing and frequency of viral load testing, adherence to antiretroviral therapy and patient follow-up, causing imprecision within the data.

There may be insufficient data across the full range of viral load levels to fully assess the association and/or potential lack of agreement by authors to share data for the IPD meta-analysis.

Introduction

Globally, an estimated 39 million people are currently living with HIV (PLHIV), of whom 29.8 million (76%) are on treatment and 21.2 million (54% of all PLHIV and 71% of PLHIV on treatment) are living with suppressed HIV. 1 Antiretroviral therapy (ART) can improve the lives of PLHIV and help protect their sexual partners from sexual HIV transmission. People who are on HIV treatment can achieve an undetectable viral load with effectively no risk of transmitting HIV to their sexual partners. 2 This concept is referred to as Undetectable equals Untransmittable, or U=U, 3 and it was initiated in 2016 by the Prevention Access Campaign, a health equity initiative with the goal of ending the HIV/AIDS pandemic and associated HIV-related stigma. 4 The U=U concept is based on a substantial body of scientific evidence demonstrating that for PLHIV who have achieved a sustained suppressed and undetectable viral load, there is effectively no risk of sexual HIV transmission. 5 Furthermore, treatment as prevention is one of the effective strategies to prevent HIV transmission, with high uptake of ART suggested as an effective approach to reduce HIV incidence. 6 7

A systematic review and meta-analysis published by the Public Health Agency of Canada (PHAC) in 2018 concluded, using criteria defined by the Canadian AIDS Society framework to characterise HIV transmission risk, 8 that the risk of sexual transmission of HIV is negligible when the PLHIV is on ART with a suppressed viral load of <200 copies of HIV RNA/mL with consecutive testing every 4–6 months. 2 A rapid review published by the Canadian Agency for Drugs and Technologies in Health (CADTH) in 2023 as well as a 2023 PHAC rapid communication confirmed these findings, with the PHAC report providing an estimated risk of HIV sexual transmission of 0.00 transmissions per 100 person-years (95% CI 0.00 to 0.10) in this specific situation. 9 10 In 2023, a systematic review by Broyles et al 11 concluded that the risk of sexual transmission of HIV is almost zero when the PLHIV is on ART and has a suppressed viral load of <1000 copies of HIV RNA/mL, but no quantitative risk estimate was calculated. Furthermore, the WHO concluded in its 2023 policy brief that PLHIV who have a suppressed but detectable viral load have almost zero or negligible risk of sexual transmission of HIV to their partner as long as they continue to take their ART as prescribed. 12 The WHO also revised the operational definition of undetected viral load from ‘≤50 copies/mL’ to ‘not detected by the test or sample type used’ and of suppressed viral load from ‘≤200 copies/mL’ to ‘≤1000 copies/mL’, and recommended a viral suppression threshold of 1000 copies/mL because persistent viral load levels above 1000 copies/mL are associated with treatment failure. 12

Most of the literature demonstrating that a suppressed, undetectable viral load is associated with effectively no risk of sexual HIV transmission uses a viral load threshold of 200 copies/mL. 3 5 Precise estimates of the risk of sexual transmission at sustained viral load levels between 200 and 1000 copies/mL remain a significant gap in the literature. Addressing this gap by quantifying these risks is needed to evaluate the strength of the association between different viral load levels and the risk of HIV transmission, and to better understand considerations of viral load levels with respect to HIV treatment and prevention programmes.

The primary objective of this review is to quantify the risk estimate of HIV transmission and determine the association between different levels of viral load (primarily in the range 200–1000 copies/mL) and the risk of sexual HIV transmission among serodiscordant couples where the PLHIV is on ART.

The specific hypotheses include: (a) there will be a significant difference in the risk of sexual HIV transmission between viral load levels and (b) there will be a dose-response relationship between different viral load levels (200 copies/mL, 400 copies/mL, 1000 copies/mL or >1000 copies/mL) and risk of sexual HIV transmission.

Research questions

Q1: What is the risk of sexual transmission of HIV with suppressed viral load <1000 copies/mL and at different levels of viral load >1000 copies/mL?

Q1.1: What is the risk of sexual HIV transmission in serodiscordant couples when the PLHIV is on ART with different levels of suppressed viral load between 200 and 1000 copies/mL (new potential evidence on risk of HIV transmission with viral load <200 copies/mL will also be assessed and reported if available)?

Q1.2: What is the risk of sexual HIV transmission in serodiscordant couples when the PLHIV is on ART with different levels of viral load >1000 copies/mL?

Q2 : Is there a dose-response association between different levels of viral load and the risk of sexual HIV transmission?

Methods and analysis

Patient and public involvement

In designing this meta-analysis protocol, neither patients nor public were involved.

Protocol guidance and registration

This systematic review will follow a two-step meta-analysis approach. First, a systematic review and meta-analysis will be conducted. Second, an individual participant data (IPD) meta-analysis will be performed if feasible. IPD meta-analysis is considered the gold standard for reviews and has several advantages compared with aggregate data systematic reviews and meta-analyses. These advantages include a greater quantity of data, the ability to standardise outcomes across trials, more flexibility in analytic approaches, the ability to conduct subgroup/moderator analyses and an enhanced ability to detect and address biases. 13 This protocol is based on the Preferred Reporting Items for Systematic Review and Meta-Analysis Protocol (PRISMA-P) statement 14 ( online supplemental appendix table 1 ). This systematic review and meta-analysis will follow the methodology outlined in the Cochrane Handbook for Systematic Reviews of Interventions. 15 16 The reporting of results will follow the PRISMA 2020 and PRISMA-IPD meta-analysis guidelines ( online supplemental appendix tables 1,2 ). 17 18 The IPD meta-analysis will use data from published or ongoing studies on this topic. Although such an approach would produce the best theoretical result, it has some limitations, 19 namely the potential for insufficient data across the full range of viral load levels and/or a potential lack of agreement by authors to share such data. We will select collaborators for IPD requests from among the authors of studies retained after full-text screening. If the proposed IPD meta-analysis is not possible, the systematic review will assess the extent to which these research questions can be answered from existing published literature alone.

Protocol registration

This study was registered with the International Prospective Register of Systematic Reviews (PROSPERO) on 11 November 2023 with the registration number CRD42023476946 . Any future changes or modifications to the review procedures will be documented and updated in the PROSPERO registration.

Eligibility criteria and type of study

Original studies (randomised controlled trials and non-randomised studies), case reports and conference abstracts will be included if they report on longitudinal studies of couples with one partner living with HIV, document the number of HIV infections in previously seronegative sexual partners and provide information about viral load levels in the HIV-seropositive partner and/or use of ART. For studies that report any HIV infections in the seronegative partner, the infections will need to be linked to the partner living with HIV through phylogenetic analysis to rule out infection from outside the couple. Considering the difficulty of individual randomisation in public health interventions, cluster randomised controlled trials (RCTs) and quasi-experimental studies with self-control will also be considered for inclusion. Studies reporting a sex partner living with HIV who takes ART and has a viral load measurement provided will be included. Articles written in English or French, available in full text from electronic databases and published between 1 January 2000 and October 2023 will be included. 20 Studies involving condom use or pre-exposure prophylaxis will be excluded. Studies where HIV is not primarily transmitted through sex will also be excluded. Reviews, editorials, letters and conference proceedings without detailed results will be excluded. Search types and patterns are featured in online supplemental appendix 2 .

Participants, type of interventions and outcomes of interest

Information sources and search strategy

A comprehensive and systematic search of the following databases will be conducted: MEDLINE, Embase, the Cochrane Central Register of Controlled Trials via Ovid and Scopus. The search strategy, developed by a health information professional in collaboration with the other authors, uses text words and relevant indexing to identify studies on viral load, ART and transmission of HIV between serodiscordant couples. The MEDLINE search strategy (see Appendix) will be applied to all databases with appropriate modifications. The search will be limited to publications from January 2000 to October 2023.

In addition, the reference lists of identified relevant studies will be examined thoroughly, and experts in the field of HIV sexual transmission will be contacted to identify any additional studies or results. Clinical trial registries, including the US National Institutes of Health’s ClinicalTrials.gov, the International Clinical Trials Registry Platform and Health Canada’s Clinical Trials Database, will be searched to identify planned, ongoing or unpublished trials. To retrieve any grey literature, Google Scholar and Baidu Scholar will also be searched. Search types and patterns are featured in online supplemental appendix table 3 .

Study selection

Articles will be imported and deduplicated using EndNote20 (Clarivate, Philadelphia, Pennsylvania, USA) and then imported into Covidence systematic review software (Veritas Health Innovation, Melbourne, Australia) for screening. Reviewers (GB, JD and PVN) will do pilot screening with a sample of 100 abstracts to ensure consistency of use and clarity of the inclusion and exclusion criteria. Inter-rater reliability will be measured with Cohen’s kappa statistic. Screening will begin when >70% agreement is achieved. 21 The authors (GB, JD and PVN) will conduct all screening, data extraction and quality assessment procedures in duplicate. Disagreements will be resolved by consensus. Situations where consensus cannot be reached will be arbitrated by a third author (PD or HB). Eligible articles identified by title and abstract screening based on inclusion criteria will be selected for full-text screening. Two independent reviewers will review the full texts. References of the included studies will be hand searched to identify additional relevant studies for inclusion. Conflicts between reviewers will be resolved through discussion, and if no resolution can be achieved, a third reviewer (PD or HB) will be consulted. In case of missing data or information, authors will be contacted.
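As a rough illustration of the inter-rater check in the pilot screening, the sketch below computes Cohen’s kappa for two reviewers. The function name `cohens_kappa` and the ten include/exclude decisions are hypothetical and are not taken from the protocol. Kappa corrects raw percentage agreement for agreement expected by chance, so the two figures can differ noticeably.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / n**2
    if p_expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical pilot decisions for 10 abstracts ("I" = include, "E" = exclude).
reviewer_1 = ["I", "I", "E", "E", "E", "I", "E", "E", "I", "E"]
reviewer_2 = ["I", "E", "E", "E", "E", "I", "E", "I", "I", "E"]
raw_agreement = sum(a == b for a, b in zip(reviewer_1, reviewer_2)) / len(reviewer_1)
kappa = cohens_kappa(reviewer_1, reviewer_2)
print(f"raw agreement = {raw_agreement:.2f}, kappa = {kappa:.2f}")
```

In this made-up sample the reviewers agree on 8 of 10 abstracts, yet kappa is lower because half of that agreement would be expected by chance alone.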

A third reviewer (PD or HB) will confirm the excluded publications and their respective reasons for elimination. A PRISMA flow chart adapted from the PRISMA 2020 and the PRISMA IPD flow diagram ( figure 1 ) 17 18 will be used to show the process of study selection.

PRISMA flow diagram. IPD, individual participant data; PRISMA, Preferred Reporting Items for Systematic Review and Meta-Analysis.

Data extraction and management

After full-text screening and study selection, data will be extracted from each selected study following a thorough reading of the full text. The list of variables to be extracted is presented in table 1 . The data extraction form will be created using Microsoft Excel 2016. Data extraction will be conducted by two independent reviewers using the designed data extraction form. Following this process, the records extracted by the reviewers will be cross-checked, and any disputed points will be resolved through a third reviewer (PD or HB).

List of variables for data extraction.

Risk of bias assessment

For non-RCTs, the ROBINS-I (Risk Of Bias in Non-randomised Studies of Interventions) tool will be used by the reviewers to assess the risk of bias of each study. The ROBINS-I tool is concerned with evaluating the risk of bias in estimates of the effectiveness or safety (benefit or harm) of an intervention from studies that did not use randomisation to allocate interventions. 22 This assessment will influence how the data are interpreted. For prognostic studies, the QUIPS tool will be used. 23 Risk of bias will be rated as ‘critical risk’, ‘serious risk’, ‘moderate risk’, ‘low risk’ or ‘no information’.

The risk of publication bias will be assessed by visual inspection of funnel plots and using Egger’s test (with 10 or more included articles). 24

Data synthesis

Descriptive statistics from included studies will be extracted and summarised in tables. When there is a difference in data units across studies, we will perform data conversion for the meta-analysis. The main statistical analysis of the study will involve two steps:

Meta-analysis using data extracted from included studies

Incidence data will be summarised for meta-analysis. A pooled estimate of the incidence of sexual HIV transmission will be generated and reported with 95% CI. Heterogeneity will be examined using the I² and H² statistics, 25 26 since both describe the variability that is due to true differences between studies (heterogeneity) rather than chance. I² will be classified as low (≤25%), moderate (>25% to 50%) or high (>50%). A fixed-effect model will be used when I² is <50%. Where I² is >50%, we will use a random-effects model to examine the association between varying viral loads and risk of HIV transmission among serodiscordant couples and create summary forest plots.
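The heterogeneity statistics and the two pooling models described above can be sketched as follows. This is a minimal illustration, not the authors’ analysis code. It assumes each study contributes an effect estimate and its variance on a common scale (for example, a log incidence rate), and it uses the DerSimonian-Laird estimator for the between-study variance, which the protocol does not specify. The function name `pool_effects` and the input numbers are hypothetical.

```python
import math


def pool_effects(effects, variances):
    """Inverse-variance pooling with Cochran's Q, H², I² and a
    DerSimonian-Laird random-effects estimate (assumed estimator)."""
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies with matching variances")
    weights = [1.0 / v for v in variances]  # fixed-effect weights
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = k - 1
    h2 = q / df
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0
    # Between-study variance (DerSimonian-Laird), truncated at zero.
    c = sum_w - sum(w**2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c)
    re_weights = [1.0 / (v + tau2) for v in variances]
    pooled_re = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    se_re = math.sqrt(1.0 / sum(re_weights))
    return {"fixed": fixed, "random": pooled_re, "se_random": se_re,
            "Q": q, "H2": h2, "I2": i2, "tau2": tau2}


# Hypothetical effect estimates and variances from three studies.
result = pool_effects([0.1, 0.3, 0.5], [0.01, 0.01, 0.01])
# Model choice following the protocol's stated rule on I².
model = "random" if result["I2"] > 50 else "fixed"
print(model, round(result["I2"], 1), round(result["tau2"], 3))
```

With these made-up inputs Q is 8 on 2 degrees of freedom, so H² is 4 and I² is 75%, which under the rule above would select the random-effects model.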

The variation for moderate or higher heterogeneity will be explored by conducting meta-regression and sensitivity analyses, including sample size, study year and demographic characteristics, or excluding studies to examine heterogeneity. Furthermore, we will also attempt to explain the heterogeneity by conducting subgroup analyses to compare the risk of HIV transmission between groups, including gay, bisexual and other men-who-have-sex-with-men (gbMSM), women who have sex with women and heterosexuals.

The presence of publication bias will be assessed using a funnel plot and Egger’s test, provided we have at least 10 studies included in the meta-analysis. 24
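For readers unfamiliar with Egger’s test, the sketch below shows the regression behind it: each study’s standardised effect (effect divided by its standard error) is regressed on its precision (one over the standard error), and an intercept far from zero suggests funnel plot asymmetry. The function name `egger_regression` and the data are hypothetical. Only four studies are used so the arithmetic is easy to check by hand, whereas the protocol requires at least 10. The p-value would come from a t distribution with k − 2 degrees of freedom and is not computed here.

```python
import math


def egger_regression(effects, std_errors):
    """Ordinary least squares of effect/SE on 1/SE.

    Returns (intercept, slope, se_intercept, t_statistic); the intercept
    is the asymmetry estimate, tested against t with k - 2 df.
    """
    k = len(effects)
    if k < 3 or k != len(std_errors):
        raise ValueError("need at least three studies with matching SEs")
    x = [1.0 / s for s in std_errors]                  # precision
    z = [y / s for y, s in zip(effects, std_errors)]   # standardised effect
    x_bar, z_bar = sum(x) / k, sum(z) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standard errors must not all be equal")
    slope = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z)) / sxx
    intercept = z_bar - slope * x_bar
    # Residual variance with k - 2 degrees of freedom.
    resid_var = sum((zi - intercept - slope * xi) ** 2
                    for xi, zi in zip(x, z)) / (k - 2)
    se_intercept = math.sqrt(resid_var * (1.0 / k + x_bar**2 / sxx))
    t_stat = intercept / se_intercept if se_intercept > 0 else float("nan")
    return intercept, slope, se_intercept, t_stat


# Hypothetical effects and standard errors for four studies.
intercept, slope, se_intercept, t_stat = egger_regression(
    [1.0, 1.5, 0.5, 0.8], [1.0, 0.5, 0.25, 0.2]
)
print(round(intercept, 3), round(slope, 3), round(t_stat, 3))
```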

Statistical analysis using IPD

A data sharing agreement will be established outlining the nature of the project, collaboration and responsibilities of each party. Deidentified and anonymised participant data will be confidentially collected from collaborators. Descriptive analyses will be performed to examine the participants’ demographic characteristics.

We will first analyse each study separately to compare our results with those of the original study. Any discrepancies will be resolved. Analysis will include all study participants following the intention-to-treat approach. Summary statistics will be presented as mean (SD) or median (IQR) for continuous variables and per cent for categorical variables. Effect size will be computed for different thresholds of viral load. A χ² test will be used to evaluate the association of viral load with the risk of sexual HIV transmission by comparing the various viral load levels. We will also compute ORs and corresponding 95% CIs to assess the strength of the association of viral load with the risk of sexual HIV transmission. The level of statistical significance α will be 0.05 for all tests. An individual random-effects meta-analysis will be conducted to determine the overall effect of viral load on sexual HIV transmission. Furthermore, a multivariable logistic regression for binary outcomes will be done to predict the risk of HIV transmission among serodiscordant couples at different levels of viral load at the baseline level from each study. Additional adjustments for sociodemographic characteristics, including age, sex, education and location, will also be included. Effect sizes and standard errors can be obtained from this analysis, including covariate adjustment, which could potentially address bias concerns.
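The odds ratio and its 95% CI for a single two-by-two comparison could be computed along these lines. This is a sketch under stated assumptions: the interval is the Wald (Woolf) interval on the log scale, and a 0.5 continuity correction is added to every cell when any cell is zero, a choice the protocol does not mention but which matters when transmission events are rare. The function name `odds_ratio_ci` and all counts are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Wald CI for a 2x2 table.

    a, b: events and non-events in the higher viral load group
    c, d: events and non-events in the reference group
    """
    if min(a, b, c, d) < 0:
        raise ValueError("cell counts must be non-negative")
    if min(a, b, c, d) == 0:
        # Assumed handling of zero cells: add 0.5 to every cell.
        a, b, c, d = (v + 0.5 for v in (a, b, c, d))
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: 8 transmissions among 100 couples in one viral load
# band versus 2 among 200 couples in the reference band.
or_est, ci_low, ci_high = odds_ratio_ci(8, 92, 2, 198)
print(f"OR = {or_est:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

With very sparse data, exact or penalised methods may behave better than this Wald interval; the sketch is meant only to make the quantity concrete.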

Additionally, viral load levels will be categorised into a contingency table to investigate whether different viral load categories are associated with different levels of HIV transmission risk among sexual partners. A dose-response relationship will also be examined between different viral load levels in PLHIV and the incidence of HIV among their partners using multivariate logistic regression and incidence frequencies of sexual HIV transmission.

All analyses will be done in R V.4.2.3, RevMan and SPSS V.28 as needed.

Missing data

Missing data will be addressed depending on the specific characteristics of the missing data. An effort will be made to discuss with collaborating teams the possibility of collecting missing data from their studies. If the data are missing completely at random for the entire study, list-wise or pair-wise deletion will be performed to obtain valid and complete cases. However, this step may reduce the sample size and power of the study. For the remaining non-random missing data, multiple imputation by chained equations will be used. 27 In this method, each variable with missing values is imputed in turn from a regression model conditional on the other variables in the dataset.

Certainty of evidence

Summary of findings will be presented via tables, including tables for each of the prespecified outcomes (eg, number of cases of HIV transmission). The Grading of Recommendations, Assessment, Development and Evaluation will be used to assess the certainty of evidence considering the bias risk of the trials, consistency of effect, imprecision, indirectness, publication bias, dose response and residual confounding. 28

Ethics and dissemination

The meta-analysis will be conducted using deidentified and anonymised data. No human subjects will be directly involved in this research. Dissemination of results of this review will be done through peer-reviewed publications and presentations, as well as international conferences.

We understand that effort, resources and international cooperation are required to perform meta-analysis based on IPD. We will produce a meta-analysis based on the number of collaborators interested in this review and the quality of data collected. We will attempt to establish quantitative risk estimates of sexual HIV transmission at viral load levels between 200 copies/mL and 1000 copies/mL and potentially also at levels >1000 copies/mL. This two-step systematic review (SR) and individual participant data (IPD) meta-analysis will also evaluate the strength of the association of viral load to the risk of sexual HIV transmission. The findings of this SR and IPD meta-analysis will help patients, researchers and policymakers to better understand the risk of sexual HIV transmission in the context of ART and the associated considerations for HIV treatment and prevention programmes.

Ethics statements

Patient consent for publication

Not applicable.

Ethics approval

  • LeMessurier J ,
  • Traversy G ,
  • Varsaneux O , et al
  • Eisinger RW ,
  • Dieffenbach CW ,
  • Prevention Access Campaign
  • Martin-Blondel G ,
  • Vu Hai V , et al
  • Sansom SL ,
  • Wolitski RJ , et al
  • Lockman S ,
  • Ayles H , et al
  • Canadian Aids Society
  • Khangura S ,
  • Subramonian A ,
  • Djiadeu P ,
  • Sabourin S , et al
  • Broyles LN ,
  • Boeras D , et al
  • Tierney JF ,
  • Riley R , et al
  • Shamseer L ,
  • Clarke M , et al
  • Higgins JPT ,
  • Chandler J , et al
  • Altman DG ,
  • Gøtzsche PC , et al
  • McKenzie JE ,
  • Bossuyt PM , et al
  • Stewart LA ,
  • Rovers M , et al
  • Broeze KA ,
  • Opmeer BC ,
  • van der Veen F , et al
  • Phillips EJ
  • Sterne JAC ,
  • Hernán MA ,
  • McAleenan A , et al
  • Grooten WJA ,
  • Äng BO , et al
  • Davey Smith G ,
  • Schneider M , et al
  • Julian PT ,
  • Thompson SG ,
  • Deeks JJ , et al
  • Schafer JL ,
  • Schünemann HJ ,
  • Vist GE , et al

Contributors PD, HB, CA and AF participated in the conception and design of the study. PD, HB, CA, and TE developed the search strategy and assessed the feasibility of the study. PD, HB, GB, JD and PVN wrote the manuscript. CA improved the manuscript. PD, HB, CA and AF are the guarantors. All the authors critically reviewed this manuscript and approved the final version.

Funding This research was funded, conducted and approved by the Public Health Agency of Canada.

Competing interests None declared.

Patient and public involvement Patients and/or the public were not involved in the design, conduct, reporting or dissemination plans of this research.

Provenance and peer review Not commissioned; externally peer reviewed.

Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.

COMMENTS

  1. Eight problems with literature reviews and how to fix them

    Traditional approaches to reviewing literature may be susceptible to bias and result in incorrect decisions. This is of particular concern when reviews address policy- and practice-relevant questions.

  2. Avoiding Bias in Selecting Studies

    Gray literature can provide evidence on publication bias and outcomes reporting bias; EPCs should use processes similar to those used with published literature in reviewing gray literature to avoid potential bias in selecting unpublished studies or data.

  3. Eight problems with literature reviews and how to fix them

    Traditional approaches to reviewing literature may be susceptible to bias and result in incorrect decisions. This is of particular concern when reviews address policy- and practice-relevant questions. Systematic reviews have been introduced as a more rigorous approach to synthesizing evidence across …

  4. Assessing the Risk of Bias in Systematic Reviews of Health Care

    Risk-of-bias assessment is a central component of systematic reviews but little conclusive empirical evidence exists on the validity of such assessments. In the context of such uncertainty, we present pragmatic recommendations that can be applied consistently across review topics, promote transparency and reproducibility in processes, and address methodological advances in the risk-of-bias ...

  5. Writing a literature review

    Writing a literature review requires a range of skills to gather, sort, evaluate and summarise peer-reviewed published data into a relevant and informative unbiased narrative. Digital access to research papers, academic texts, review articles, reference databases and public data sets are all sources of information that are available to enrich ...

  6. Chapter 7: Considering bias and conflicts of interest among the

    Address conflict of interests in included trials, and reflect on possible impact on: (a) differences in study design; (b) risk of bias in trial result, and (c) risk of bias in synthesis result. Review authors should consider assessing whether they judge a trial to be of 'notable concern about conflicts of interest'.

  7. Identifying and Avoiding Bias in Research
