Artificial Intelligence (AI) and the literature review process: Screening
Screening is the process of selecting studies for inclusion in a review. It usually begins with the removal of duplicate records using reference management software. There are then two stages: (1) checking the title and abstract to remove studies that are clearly irrelevant; and (2) obtaining the full text of each remaining study and applying pre-determined eligibility criteria to determine which studies should be included.
Screening to select studies for inclusion in the review is the most time-consuming aspect of the systematic literature review (SLR) process. Because it is a manual process, it is also subject to error. Researchers have therefore looked to speed up the process and make it more accurate. Van Dinter et al. (2022) found 41 studies on automating the process, all of which used machine learning techniques (most often Naive Bayes and Support Vector Machine (SVM) algorithms). The survey by Bolaños et al. (2024) identified 19 AI tools used for the screening process, four of which were exclusive to the biomedical literature and are set up to identify randomised controlled trials. Fifteen of these tools use artificial intelligence for one task only: to classify whether a study is relevant or irrelevant. One of these tools (Rayyan AI) is detailed below as it is the tool most often mentioned by our researchers. Only four of the tools operate under open licenses (ASReview, Colandr, FAST2, RobotReviewer/Robotsearch).
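As an illustration of the kind of classifier these studies describe, the sketch below trains a Naive Bayes model on titles and abstracts using scikit-learn; the example records, labels and wording are illustrative assumptions rather than details from any of the cited studies.

```python
# Illustrative sketch of a title-and-abstract relevance classifier of the kind
# used in the studies above (Naive Bayes; swap in sklearn.svm.LinearSVC for an
# SVM variant). Records and labels are made up for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Each training text is a concatenated title and abstract; the labels come
# from screening decisions already made by human reviewers.
texts = [
    "Effect of drug X on blood pressure: a randomised controlled trial. We ...",
    "Visitor opinions on hospital parking provision. A questionnaire study ...",
]
labels = ["include", "exclude"]

# TF-IDF features feeding a Naive Bayes classifier.
model = make_pipeline(TfidfVectorizer(stop_words="english"), MultinomialNB())
model.fit(texts, labels)

# Probabilities for an unscreened record, in the order given by model.classes_.
print(model.classes_,
      model.predict_proba(["Randomised trial of drug Y for hypertension ..."]))
```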
AI tools for screening
Generative AI tools have also been used to speed up the screening process. Syriani et al. (2023) showed that ChatGPT can match the accuracy of these machine learning techniques without the need for training, which gives it (p. 2) "a realistic chance to revolutionize SLR automation". Syriani et al. (2024) then showed that ChatGPT could reach 82% accuracy when given five systematic literature review datasets. They concluded that LLMs were not ready to replace article screening by humans but offered promising ways to assist reviewers in the screening process, for example by discarding all the articles that ChatGPT has excluded.
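A minimal sketch of this style of LLM-assisted screening is shown below, using the OpenAI Python client to ask a chat model for an include/exclude decision on a single title and abstract. The prompt wording, model name and one-record-at-a-time design are assumptions for illustration, not the protocol used by Syriani et al.

```python
# Minimal sketch of LLM-assisted screening: ask a chat model for an
# include/exclude decision on one title and abstract. Requires the openai
# package and an OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()

def screen(title: str, abstract: str, criteria: str) -> str:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed model; substitute whichever you use
        temperature=0,
        messages=[
            {"role": "system",
             "content": "You screen studies for a systematic literature review. "
                        "Reply with exactly one word: INCLUDE or EXCLUDE."},
            {"role": "user",
             "content": f"Inclusion criteria: {criteria}\n\n"
                        f"Title: {title}\n\nAbstract: {abstract}"},
        ],
    )
    return response.choices[0].message.content.strip()
```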
You can tell the AI tool which records are definitely to be included in your selected studies. The tool can then be trained on these decisions to judge which of the remaining records should be included, and it re-ranks the records to improve the efficiency of the screening process: relevant studies surface earlier in the screening, rather than appearing at random as they do with manual screening.
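Building on the classifier sketch above, the following hypothetical snippet shows the re-ranking step: records already screened train the model, and the remaining records are sorted so those most likely to be included appear first.

```python
# Hypothetical re-ranking step using the scikit-learn pipeline shown earlier.
def rerank(model, unscreened):
    """unscreened is a list of (record_id, title_plus_abstract) pairs."""
    probabilities = model.predict_proba([text for _, text in unscreened])
    include_column = list(model.classes_).index("include")
    scored = [
        (record_id, row[include_column])
        for (record_id, _), row in zip(unscreened, probabilities)
    ]
    # Highest probability of inclusion first, so relevant studies surface early.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```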
Applying an AI tool for screening has several advantages (Thomas et al., 2024):
- it ensures a consistent interpretation of subject matter, especially where the subject is complex and there is subjectivity in applying inclusion criteria even among experts;
- the tool’s responses enable you to clarify terminology and identify which specific phrases might lead to confusion.
Thomas et al. (2024) applied ChatGPT 3.5 to the screening process, asking it to assess 2917 studies on indicators of ecosystem condition for relevance based on their title and abstract. It completed this process in 10% of the time taken by expert reviewers. Studies were classified as selected, rejected or uncertain, and the choices made by the tool were compared with those made by expert reviewers. One version of the prompt selected every relevant study, but did so at the expense of precision, selecting a higher proportion of irrelevant studies than other versions of the prompt. A prompt that emphasises and repeats key terms improves the performance of the generative AI tool.
Alshami et al. (2023) demonstrated similar results for ChatGPT 3.5 when classifying articles from the title and abstract, in the context of applying the Internet of Things to water infrastructure projects. They considered ChatGPT outstanding at removing irrelevant articles and most effective at classifying the articles into three main categories. ChatGPT provides a justification for each classification, allowing you to evaluate whether its decision is appropriate.
Despite the availability of tools to automate title and abstract screening, take-up of these tools has been low. One reason is the need for any automation methods used in synthesizing evidence to be freely accessible and open to examination. Many of the 18 interviewees in a survey by Arno et al. (2020) emphasized that they were accountable to stakeholders who needed to be sure that information had not been missed, and therefore needed to be able to examine the methods used.
Example prompts for Chatbots
Prompt 1 (adapted from Alshami et al., 2023):
Categorise each paper based on the information provided into one of four categories (list the categories). Use the details from each paper and do not make assumptions. Provide the results in a table format where the columns represent each category and there is a row for each paper. Include an X in the row under one of the four categories. After the table, explain why you decided on the category for each paper.
When inputting article titles, prompts were limited to 10 articles at a time because of token limits on the prompt. When inputting article titles plus abstracts, prompts were limited to 5 articles at a time.
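The sketch below illustrates this batching, splitting a list of titles into prompts of at most 10 articles; the template is an abbreviated form of Prompt 1, and the function name and numbering scheme are illustrative.

```python
# Sketch of batching article titles into prompts of at most 10 papers.
BATCH_SIZE = 10  # use 5 when titles and abstracts are sent together

def build_prompts(titles: list[str], categories: list[str]) -> list[str]:
    prompts = []
    for start in range(0, len(titles), BATCH_SIZE):
        batch = titles[start:start + BATCH_SIZE]
        numbered = "\n".join(f"{i + 1}. {title}" for i, title in enumerate(batch))
        prompts.append(
            f"Categorise each paper into one of these categories: {', '.join(categories)}. "
            "Use the details from each paper and do not make assumptions. "
            "Provide the results in a table and explain each decision.\n\n"
            f"Papers:\n{numbered}"
        )
    return prompts
```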
Rayyan AI
Rayyan AI is a web-based automated screening tool developed by the Qatar Computing Research Institute (QCRI) and launched in 2014.
It uses text mining and statistical pattern learning to recognize patterns in the data and identify relevant records.
Using Rayyan
First steps
- Sign up for an individual account with Rayyan.
- Check your browser compatibility using https://rayyan.ai/check.
- Select My Reviews.
- Then select New review.
- Give the review a title, select your research field from the drop-down menu, select the review type, select the review domain (this is similar to the research field), add an optional description and press Create.
This review will now appear in the My Reviews tab when you enter Rayyan.
Exporting references
Rayyan then presents you with an option to import records directly from Mendeley.
For all other reference management software, such as EndNote or Zotero, select Upload references. These need to be in RIS format, so export your references from your reference management software as RIS.
Including the abstract in the export is helpful.
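If you want to check an exported RIS file before uploading it, a small script along these lines can confirm that each record carries a title and an abstract; it assumes the third-party rispy package and its default RIS tag mapping, and the file name is a placeholder.

```python
# Optional check of an exported RIS file before uploading to Rayyan, using the
# third-party rispy package (pip install rispy). Field keys follow rispy's
# default tag mapping; adjust if your export uses different tags.
import rispy

with open("references.ris", "r", encoding="utf-8") as ris_file:
    records = rispy.load(ris_file)

without_abstract = [r.get("title", "<no title>")
                    for r in records if not r.get("abstract")]
print(f"{len(records)} records exported; {len(without_abstract)} have no abstract")
```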
Importing references
- Remove any duplicates before you import into Rayyan.
- Then choose Select files.
- Find your file in your downloads folder.
- Click Upload, then Continue.
You can add new records to your review by selecting the New Search button top right.
Collaboration
The tutorial video from the Rayyan Help Centre shows how this works in practice, including how the collaboration functions work.
You can invite colleagues to join you in the review:
- Click All reviews, then select the review and click Invite.
- Your collaborators will then receive an email and can see the review in the Collaboration reviews tab.
Including and excluding articles
Your articles, once imported into Rayyan, appear as undecided in the inclusion decisions box (facet):
- Select each article being screened.
- Decide whether to include or exclude each article.
- When you include an article, your name will appear beside it in green.
- When excluding an article, select the Reason option and choose from the list of pre-populated reasons.
- You can also create new exclusion reasons. The article will then be marked as excluded and a label with your exclusion reason is added to it in red.
- You can use the keyboard shortcuts I (include), E (exclude) and M (maybe).
- Use Shift and the up/down arrows to select multiple references.
Rayyan keeps a count of the number of articles tagged with each exclusion reason. Once you have made 50 decisions, you can click Compute ratings. Rayyan's artificial intelligence engine will then compute the probability of each of the remaining records being included or excluded, based on the decisions you have already made. Each undecided record receives a rating of 1 to 5, where 5 means it is the most likely to be included.
You can filter by Undecided to show the references on which you have not yet made a decision. The more decisions you make, the more accurate Rayyan's AI engine becomes, as it learns from each decision.
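Rayyan does not publish the details of how its ratings are calculated, but purely as an illustration of how a probability of inclusion could map onto a 1-to-5 rating, a hypothetical binning might look like this:

```python
# Purely illustrative: one way a probability of inclusion could be binned into
# a 1-to-5 rating, where 5 is the most likely to be included. Not Rayyan's
# actual algorithm.
def to_rating(probability_of_inclusion: float) -> int:
    return min(5, int(probability_of_inclusion * 5) + 1)

# Example: 0.1 -> 1, 0.55 -> 3, 0.9 -> 5
```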
Evidence
Harrison et al. (2020) reviewed 15 software tools that used machine learning techniques to support title and abstract screening. Six tools performed best, all scoring higher than 75% in the feature analysis, and these were included in a user survey of medical researchers who were all involved with systematic reviews. Of these six, Harrison et al. recommended Covidence and Rayyan to systematic reviewers looking for suitable, easy-to-use screening tools; the other four were DRAGON, AbstrackR, Colandr and EPPI-Reviewer.
Rayyan speeds up the process of selecting which studies to include in a systematic review. In experiments on a set of 15 reviews, users reported time savings of around 40% on average compared with the tools they had been using previously. Rayyan's two most important features compared with its competitors are its title and abstract screening assistance and the ability for authors to collaborate on the same review (Ouzzani et al., 2016).
Valizadeh et al. (2023) compared Rayyan with human reviewers in the manual screening of more than 2,000 records from three systematic reviews. This was done in four stages; at the end of each stage, Rayyan was used to predict an eligibility score for the remaining records, assigning a star rating to each one. Rayyan proved a reliable tool for excluding ineligible records when a threshold of fewer than 2.5 stars was used for exclusion. The findings were confirmed by Dos Reis et al. (2023), who concluded that using software to screen titles did not remove any records that should have been included.
Cheng et al. (2018) compared three tools, Rayyan, AbstrackR and Colandr, and found all of them valuable resources for facilitating the screening process. Rayyan achieved the best scores in the objective evaluation and was also rated the most user-friendly software by the raters.