Artificial Intelligence (AI) and the literature review process: Screening
Screening is the process of selecting studies for inclusion in a review. It usually begins with the removal of duplicate records using reference management software. There are then two stages: (1) checking the title and abstract to remove studies that are clearly irrelevant; and (2) obtaining the full text of each remaining study and applying pre-determined eligibility criteria to determine which studies should be included.
Screening to select studies for inclusion in the review is the most time-consuming aspect of the systematic literature review (SLR) process. Because it is a manual process, it is also subject to error. Researchers have therefore looked to speed up the process and make it more accurate. Van Dinter et al. (2022) found 41 studies on automating the process, all of which used machine learning techniques (most often Naive Bayes and Support Vector Machine (SVM) algorithms). The survey by Bolaños et al. (2024) identified 19 AI tools used for the screening process, four of which were exclusive to the biomedical literature and are set up to identify randomised controlled trials. Fifteen of these tools use artificial intelligence for one task only: to classify whether a study is relevant or irrelevant. One of these tools (Rayyan AI) is detailed below as it is the tool most often mentioned by our researchers. Only four of the tools operate under open licenses (ASReview, Colandr, FAST2, RobotReviewer/Robotsearch).
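As an illustration of the kind of classifier these studies describe, the sketch below trains a Naive Bayes model on titles and abstracts using scikit-learn; the example records, labels and wording are illustrative assumptions rather than details from any of the cited studies.

```python
# Illustrative sketch of a title-and-abstract relevance classifier of the kind
# used in the studies above (Naive Bayes; swap in sklearn.svm.LinearSVC for an
# SVM variant). Records and labels are made up for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Each training text is a concatenated title and abstract; the labels come
# from screening decisions already made by human reviewers.
texts = [
    "Effect of drug X on blood pressure: a randomised controlled trial. We ...",
    "Visitor opinions on hospital parking provision. A questionnaire study ...",
]
labels = ["include", "exclude"]

# TF-IDF features feeding a Naive Bayes classifier.
model = make_pipeline(TfidfVectorizer(stop_words="english"), MultinomialNB())
model.fit(texts, labels)

# Probabilities for an unscreened record, in the order given by model.classes_.
print(model.classes_,
      model.predict_proba(["Randomised trial of drug Y for hypertension ..."]))
```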
AI tools for screening
Generative AI tools have also been used to speed up the screening process. Syriani et al. (2023) showed that ChatGPT can match the accuracy of these machine learning techniques without the need for training, which gives it (p. 2) "a realistic chance to revolutionize SLR automation". Syriani et al. (2024) then showed that ChatGPT could reach 82% accuracy when given five systematic literature review datasets. They concluded that LLMs were not ready to replace article screening by humans but offered promising ways to assist reviewers in the screening process, for example by discarding all the articles that ChatGPT has excluded.
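A minimal sketch of this style of LLM-assisted screening is shown below, using the OpenAI Python client to ask a chat model for an include/exclude decision on a single title and abstract. The prompt wording, model name and one-record-at-a-time design are assumptions for illustration, not the protocol used by Syriani et al.

```python
# Minimal sketch of LLM-assisted screening: ask a chat model for an
# include/exclude decision on one title and abstract. Requires the openai
# package and an OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()

def screen(title: str, abstract: str, criteria: str) -> str:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed model; substitute whichever you use
        temperature=0,
        messages=[
            {"role": "system",
             "content": "You screen studies for a systematic literature review. "
                        "Reply with exactly one word: INCLUDE or EXCLUDE."},
            {"role": "user",
             "content": f"Inclusion criteria: {criteria}\n\n"
                        f"Title: {title}\n\nAbstract: {abstract}"},
        ],
    )
    return response.choices[0].message.content.strip()
```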
You can tell the AI tool which records are definitely to be included in your selected studies. The tool can then be trained on these decisions to judge which of the remaining records should be included, and it re-ranks the records to improve the efficiency of the screening process: relevant studies surface earlier in the screening, rather than appearing at random as they do with manual screening.
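Building on the classifier sketch above, the following hypothetical snippet shows the re-ranking step: records already screened train the model, and the remaining records are sorted so those most likely to be included appear first.

```python
# Hypothetical re-ranking step using the scikit-learn pipeline shown earlier.
def rerank(model, unscreened):
    """unscreened is a list of (record_id, title_plus_abstract) pairs."""
    probabilities = model.predict_proba([text for _, text in unscreened])
    include_column = list(model.classes_).index("include")
    scored = [
        (record_id, row[include_column])
        for (record_id, _), row in zip(unscreened, probabilities)
    ]
    # Highest probability of inclusion first, so relevant studies surface early.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```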
Applying an AI tool for screening has several advantages (Thomas et al., 2024):
- it ensures a consistent interpretation of subject matter, especially where the subject is complex and there is subjectivity in applying inclusion criteria even among experts;
- the tool’s responses enable you to clarify terminology and identify which specific phrases might lead to confusion.
Thomas et al. (2024) applied ChatGPT 3.5 to the screening process, asking it to assess 2917 studies on indicators of ecosystem condition for relevance based on their title and abstract. It completed this process in 10% of the time taken by expert reviewers. Studies were classified as selected, rejected or uncertain, and the choices made by the tool were compared with those made by expert reviewers. One version of the prompt selected every relevant study, but did so at the expense of precision, selecting a higher proportion of irrelevant studies than other versions of the prompt. A prompt that emphasises and repeats key terms improves the performance of the generative AI tool.
Alshami et al. (2023) demonstrated similar results for ChatGPT 3.5 when classifying articles from the title and abstract, in the context of applying the Internet of Things to water infrastructure projects. They considered ChatGPT outstanding at removing irrelevant articles and most effective at classifying the articles into three main categories. ChatGPT provides a justification for each classification, allowing you to evaluate whether its decision is appropriate.
Despite the availability of tools to automate title and abstract screening, take-up of these tools has been low. One reason is the need for any automation methods used in synthesizing evidence to be freely accessible and open to examination. Many of the 18 interviewees in a survey by Arno et al. (2020) emphasized that they were accountable to stakeholders who needed to be sure that information had not been missed, and therefore needed to be able to examine the methods used.
Example prompts for Chatbots
Prompt 1 (adapted from Alshami et al., 2023):
Categorise each paper based on the information provided into one of four categories (list the categories). Use the details from each paper and do not make assumptions. Provide the results in a table format where the columns represent each category and there is a row for each paper. Include an X in the row under one of the four categories. After the table, explain why you decided on the category for each paper.
When inputting article titles, prompts were limited to 10 articles at a time because of token limits on the prompt. When inputting article titles plus abstracts, prompts were limited to 5 articles at a time.
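The sketch below illustrates this batching, splitting a list of titles into prompts of at most 10 articles; the template is an abbreviated form of Prompt 1, and the function name and numbering scheme are illustrative.

```python
# Sketch of batching article titles into prompts of at most 10 papers.
BATCH_SIZE = 10  # use 5 when titles and abstracts are sent together

def build_prompts(titles: list[str], categories: list[str]) -> list[str]:
    prompts = []
    for start in range(0, len(titles), BATCH_SIZE):
        batch = titles[start:start + BATCH_SIZE]
        numbered = "\n".join(f"{i + 1}. {title}" for i, title in enumerate(batch))
        prompts.append(
            f"Categorise each paper into one of these categories: {', '.join(categories)}. "
            "Use the details from each paper and do not make assumptions. "
            "Provide the results in a table and explain each decision.\n\n"
            f"Papers:\n{numbered}"
        )
    return prompts
```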
Rayyan AI
Rayyan AI is a web-based automated screening tool developed by the Qatar Computing Research Institute (QCRI) and launched in 2014.
It uses text mining and statistical pattern learning to recognize patterns in the data and identify relevant records.
Using Rayyan
First steps
- Sign up for an individual account with Rayyan.
- Check your browser compatibility using https://rayyan.ai/check.
- Select My Reviews.
- Then select New review.
- Give the review a title, select your research field from the drop-down menu, select the review type, select the review domain (this is similar to the research field), add an optional description and press Create.
This review will now appear in the My Reviews tab when you enter Rayyan.
Exporting references
Rayyan then presents you with an option to import records directly from Mendeley.
For all other reference management software, such as EndNote or Zotero, select Upload references. These need to be in RIS format, so export your references from your reference management software as RIS.
Including the abstract in the export is helpful.
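If you want to check an exported RIS file before uploading it, a small script along these lines can confirm that each record carries a title and an abstract; it assumes the third-party rispy package and its default RIS tag mapping, and the file name is a placeholder.

```python
# Optional check of an exported RIS file before uploading to Rayyan, using the
# third-party rispy package (pip install rispy). Field keys follow rispy's
# default tag mapping; adjust if your export uses different tags.
import rispy

with open("references.ris", "r", encoding="utf-8") as ris_file:
    records = rispy.load(ris_file)

without_abstract = [r.get("title", "<no title>")
                    for r in records if not r.get("abstract")]
print(f"{len(records)} records exported; {len(without_abstract)} have no abstract")
```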
Importing references
- Remove any duplicates before you import into Rayyan.
- Then choose Select files.
- Find your file in your downloads folder.
- Click Upload, then Continue.
You can add new records to your review by selecting the New Search button top right.
Collaboration
The tutorial video from the Rayyan Help Centre shows how this works in practice, including how the collaboration functions work.
You can invite colleagues to join you in the review:
- Click All reviews, then select the review and click Invite.
- Your collaborators will then receive an email and can see the review in the Collaboration reviews tab.
Including and excluding articles
Your articles, once imported into Rayyan, appear as undecided in the inclusion decisions box (facet):
- Select each article being screened.
- Decide whether to include or exclude each article.
- When you include an article, your name will appear beside it in green.
- When excluding an article, select the Reason option and choose from the list of pre-populated reasons.
- You can also create new exclusion reasons. The article will then be marked as excluded and a label with your exclusion reason is added to it in red.
- You can use the keyboard shortcuts I (include), E (exclude) and M (maybe).
- Use Shift and the up/down arrows to select multiple references.
Rayyan keeps a count of the number of articles tagged with each exclusion reason. Once you have made 50 decisions, you can click Compute ratings. Rayyan's artificial intelligence engine will then compute the probability of each of the remaining records being included or excluded, based on the decisions you have already made. Each undecided record receives a rating of 1 to 5, where 5 means it is the most likely to be included.
You can filter by Undecided to show the references on which you have not yet made a decision. The more decisions you make, the more accurate Rayyan's AI engine becomes, as it learns from each decision.
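Rayyan does not publish the details of how its ratings are calculated, but purely as an illustration of how a probability of inclusion could map onto a 1-to-5 rating, a hypothetical binning might look like this:

```python
# Purely illustrative: one way a probability of inclusion could be binned into
# a 1-to-5 rating, where 5 is the most likely to be included. Not Rayyan's
# actual algorithm.
def to_rating(probability_of_inclusion: float) -> int:
    return min(5, int(probability_of_inclusion * 5) + 1)

# Example: 0.1 -> 1, 0.55 -> 3, 0.9 -> 5
```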
Evidence
Harrison et al. (2020) reviewed 15 software tools that used machine learning techniques to support title and abstract screening. Six tools performed best, all scoring higher than 75% in the feature analysis, and these were included in a user survey of medical researchers who were all involved with systematic reviews. Of these six, Harrison et al. recommended Covidence and Rayyan to systematic reviewers looking for suitable, easy-to-use screening tools; the other four were DRAGON, AbstrackR, Colandr and EPPI-Reviewer.
Rayyan speeds up the process of selecting which studies to include in a systematic review. In experiments on a set of 15 reviews, users reported time savings of around 40% on average compared with the tools they had been using previously. Rayyan's two most important features compared with its competitors are its title and abstract screening assistance and the ability for authors to collaborate on the same review (Ouzzani et al., 2016).
Valizadeh et al. (2023) compared Rayyan with human reviewers in the manual screening of more than 2,000 records from three systematic reviews. This was done in four stages; at the end of each stage, Rayyan was used to predict an eligibility score for the remaining records, assigning a star rating to each one. Rayyan proved a reliable tool for excluding ineligible records when a threshold of fewer than 2.5 stars was used for exclusion. The findings were confirmed by Dos Reis et al. (2023), who concluded that using software to screen titles did not remove any records that should have been included.
Cheng et al. (2018) compared three tools, Rayyan, AbstrackR and Colandr, and found all of them valuable resources for facilitating the screening process. Rayyan achieved the best scores in the objective evaluation and was also rated the most user-friendly software by the raters.