Artificial Intelligence (AI) and the literature review process: Abstract and title creation

Application of AI tools such as ChatGPT to searching and all aspects of the literature review process

The current evidence is that "abstract generation by large language models such as ChatGPT is a powerful tool to create readable scientific abstracts" (Gao et al., 2023: 3).

It has been estimated that at least 10% of 2024 PubMed abstracts (at least 150,000) were created using a large language model (LLM). The estimate comes from an analysis of the language used in 14 million abstracts published between 2010 and June 2024, which found a marked increase in the use of 280 style words (e.g. intricate, meticulously, delves, pivotal, underscores, showcasing) compared with the period before November 2022, when ChatGPT became available (Kobak et al., 2024).
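The idea of tracking style words can be illustrated with a short Python sketch. It is a deliberately simplified illustration, not the procedure used by Kobak et al. (2024): it only computes the share of abstracts that contain at least one marker word, so that the share can be compared between two periods. The function name share_with_style_words, the six-word list (taken from the examples above) and the two toy abstract lists are assumptions made for this example.

```python
import re

# Six of the style words quoted above; the full list of 280 is not reproduced here.
STYLE_WORDS = {"intricate", "meticulously", "delves", "pivotal",
               "underscores", "showcasing"}


def share_with_style_words(abstracts, style_words=STYLE_WORDS):
    """Return the fraction of abstracts containing at least one style word."""
    if not abstracts:
        return 0.0
    hits = 0
    for text in abstracts:
        # Lower-case and split into alphabetic tokens so punctuation is ignored.
        tokens = set(re.findall(r"[a-z]+", text.lower()))
        if tokens & style_words:
            hits += 1
    return hits / len(abstracts)


# Hypothetical toy data, for illustration only.
earlier_abstracts = [
    "We measured heel pain in 40 patients.",
    "Outcomes were compared at six months.",
]
later_abstracts = [
    "This study delves into the intricate causes of heel pain.",
    "Outcomes were compared at six months.",
]

print("earlier share:", share_with_style_words(earlier_abstracts))
print("later share:", share_with_style_words(later_abstracts))
```

A rise in this share between periods would only be suggestive, since vocabulary also changes for reasons unrelated to LLM use.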

AI tools for abstract and title creation

Example prompts

Prompt (from Babl and Babl, 2023):

Write a 250 word abstract on plantar fasciitis in children with title, introduction, methods, results and discussion section using the data below. Add 2 references.

ChatGPT:

Title: Efficacy of oral prednisolone for plantar fasciitis in children: A double-blind randomized controlled trial

Introduction: Plantar fasciitis is a common cause of heel pain in adults, but it is rarely reported in children. However, recent studies have shown that plantar fasciitis is a common cause of heel pain in children aged 8–12 years old. Although there is no standard treatment for plantar fasciitis in children, non-steroidal antiinflammatory drugs and rest are commonly used. The aim of this study is to investigate the efficacy of oral prednisolone in treating plantar fasciitis in children.

Methods: A double-blind randomized controlled trial was conducted on 177 children aged 8–12 years old, who were randomly assigned to receive either oral prednisolone or placebo for 10 days. The outcome measure was the return of normal foot function at 1, 3 and 6 months after treatment. The study was conducted with intention to treat, and no adverse events were reported.

Results: At 1, 3 and 6 months after treatment, the percentage of children who had returned to normal foot function was significantly higher in the prednisolone group than in the placebo group (66.67% vs 44.94%, 81.82% vs 52.94% and 98.82% vs 60.98%, respectively). The odds ratios for the return of normal foot function at 1, 3 and 6 months were 2.45 (95% CI 1.34–4.49, P = 0.003), 4.00 (95% CI 2.01–7.97, P < 0.001) and 53.76 (95% CI 7.13–405.63, P < 0.001), respectively.

Discussion: Our study suggests that oral prednisolone may be an effective treatment for plantar fasciitis in children. However, further studies are needed to confirm these findings and to investigate the long-term effects of oral prednisolone on foot function in children with plantar fasciitis.

References:
Yan S, Li B, Hu Y et al. A systematic review and meta-analysis of risk factors for pediatric plantar fasciitis. J. Foot Ankle Res. 2020; 13: 16. doi: 10.1186/s13047-020-0397-1
Radford JA, Landorf KB, Buchbinder R, Cook C. Effectiveness of low-dye taping for the short-term treatment of plantar heel pain: a randomised trial. BMC Musculoskelet. Disord. 2006; 7: 64. doi:10.1186/1471-2474-7-64

Evidence

Babl and Babl (2023) showed that ChatGPT could produce a grammatically and structurally correct conference abstract of reasonable quality, with no obvious errors, that was well written and had the headings and word length requested. The title was appropriate, the introductory sentences were consistent with the state of knowledge on the topic, the results, including the statistical tests, were correctly extracted from the data table, and the conclusions were a reasonable interpretation of the results.
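Readers who want to sanity-check statistics in a generated abstract can recompute them from the figures the abstract itself reports. The Python sketch below recomputes unadjusted odds ratios from the percentages in the ChatGPT results section above. It assumes the reported odds ratios are unadjusted; the function name odds_ratio is illustrative, and the underlying counts are not available here, so only the rounded percentages are used.

```python
def odds_ratio(pct_treatment, pct_control):
    """Unadjusted odds ratio from two percentages (each strictly between 0 and 100)."""
    p_t = pct_treatment / 100
    p_c = pct_control / 100
    # Odds = p / (1 - p); the odds ratio divides treatment odds by control odds.
    return (p_t / (1 - p_t)) / (p_c / (1 - p_c))


# (time point, prednisolone %, placebo %, odds ratio reported in the abstract)
reported = [
    ("1 month", 66.67, 44.94, 2.45),
    ("3 months", 81.82, 52.94, 4.00),
    ("6 months", 98.82, 60.98, 53.76),
]

for label, pct_t, pct_c, reported_or in reported:
    recomputed = odds_ratio(pct_t, pct_c)
    print(f"{label}: recomputed {recomputed:.2f}, reported {reported_or:.2f}")
```

The recomputed values for 1 and 3 months agree with the reported 2.45 and 4.00 to two decimal places. The 6-month value comes out slightly lower than 53.76, which is to be expected when an odds ratio is recomputed from rounded percentages close to 100% rather than from the original counts.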

Gao et al. (2023) took 50 abstracts from five high-impact journals as the control data and asked ChatGPT to generate 50 scientific abstracts in the style of these five journals. Only 8 of the 50 generated abstracts (16%) correctly used the headings specific to the journal. These results were based on ChatGPT's first output, so a better prompt might have improved them. Human reviewers were given a mixture of real and generated abstracts. They correctly identified 68% of the AI-generated abstracts as generated but mistook the remaining 32% for real abstracts. They commented that they found it surprisingly difficult to differentiate between the two.

ChatGPT was used to draft a scientific article for Reproductive BioMedicine Online (Altmae et al., 2023). The generated abstract was regarded as acceptable, although it needed a couple more sentences to introduce the topic. ChatGPT was also helpful in suggesting a highly appropriate title.