Artificial Intelligence (AI) and the literature review process: Generative AI Technology
Natural language processing is a research field that combines linguistics with computer science. It aims to enable computers to understand human language through research into tasks such as text and document classification, summarization, part-of-speech tagging, question answering and sentiment analysis.
Understanding generative AI technology requires a grasp of several key concepts: language models, pre-training, tokens, the transformer, and the generative pre-trained transformer (GPT).
Language models
Language models are "statistical models of word sequences" (Jurafsky and Martin, 2014: 85).
There are two types of language models:
(i) statistical language models, which compute the probability of the next token (character, word or string) based on the previous tokens or sequence of tokens;
and
(ii) neural language models which use the power of neural networks to model the sequences of words.
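As an illustration of the first type, the sketch below estimates next-token probabilities by counting which token follows which in a tiny corpus (a bigram model). The corpus, the function names and the whitespace tokenization are assumptions made for this example; real statistical language models use far larger corpora and smoothing techniques.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Count how often each token follows each preceding token."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()  # naive whitespace tokenization
        for previous, following in zip(tokens, tokens[1:]):
            counts[previous][following] += 1
    return counts


def next_token_probabilities(counts, previous):
    """Convert the follow-counts for `previous` into probabilities."""
    followers = counts.get(previous, Counter())
    total = sum(followers.values())
    if total == 0:
        return {}  # token never seen as a predecessor
    return {token: n / total for token, n in followers.items()}


# Hypothetical toy corpus, for illustration only.
toy_corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "the dog sat on the rug",
]
model = train_bigram_model(toy_corpus)
probabilities = next_token_probabilities(model, "the")
print(probabilities)
```

In this toy corpus "the" is followed by "cat" twice out of six occurrences, so the model assigns "cat" a probability of one third after "the".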
Since the launch of ELMo in 2018, with 94 million parameters and a pretraining dataset of >1 billion tokens, these models have expanded in size because training larger models, especially those based on the transformer architecture (see below), has produced further improvements (Bender et al., 2021; Sanh et al., 2020).
"This trend of increasingly large LMs can be expected to continue as long as they correlate with an increase in performance" (Bender et al., 2021: 611).
Models at this scale are now called large language models (LLMs). Table 3 of Ray (2023) provides a comparison of LLMs. LLMs are typically trained on massive amounts of text data, such as Wikipedia, news articles, books, and social media posts. This allows them to learn the patterns and relationships that exist within language and to use this knowledge to complete natural language processing tasks.
Pre-training
Pre-training is used to enhance language model performance. It consists of training a model on large amounts of data, such as text or images, before the model is adapted to a specific task.
In natural language processing, the survey of pre-trained foundation models by Zhou et al. (2023) outlines five categories of pre-training tasks. Pre-trained language models are also classified into three groups according to their word representation approach:
- Autoregressive
- Contextual
- Permuted (Zhou et al., 2023).
Autoregressive language models predict the next possible word based on the preceding word or predict the last possible word based on the succeeding word.
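The left-to-right case can be sketched as a loop that repeatedly picks the most probable next word and appends it to the sequence. The probability table below is hypothetical and hand-written, and conditioning on only the single preceding word is a simplification for the example; it stands in for a trained model.

```python
# Hypothetical next-word probabilities, keyed by the preceding word.
# "<s>" marks the start of the sequence and "</s>" the end.
NEXT_WORD = {
    "<s>": {"the": 0.9, "a": 0.1},
    "the": {"model": 0.6, "data": 0.4},
    "model": {"predicts": 0.7, "</s>": 0.3},
    "predicts": {"words": 0.8, "</s>": 0.2},
    "words": {"</s>": 1.0},
}


def generate(max_words=10):
    """Greedy autoregressive generation: each word depends on the previous one."""
    words = ["<s>"]
    for _ in range(max_words):
        candidates = NEXT_WORD.get(words[-1], {})
        if not candidates:
            break  # no known continuation
        best = max(candidates, key=candidates.get)  # most probable next word
        if best == "</s>":
            break  # the model chose to end the sequence
        words.append(best)
    return words[1:]  # drop the start marker


print(generate())  # ['the', 'model', 'predicts', 'words']
```

Each generated word becomes part of the input for the next prediction, which is what makes the process autoregressive.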
After pre-training, the model is fine-tuned using a smaller, task-specific dataset, which updates the model's weights and biases to better fit the task, usually to improve efficiency, effectiveness and privacy.
Fine-tuning becomes prohibitively expensive when applied to large language models because of the size of the dataset required: thousands to hundreds of thousands of examples specific to each task. Inspired by human learning, where a clear instruction and only a few examples are needed to solve a particular task, the new paradigm is to present the LLM with a task in natural language together with no demonstrations (zero-shot learning), one demonstration (one-shot learning), or a few demonstrations (few-shot learning), and ask it to provide a solution.
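The idea of updating weights and biases on a small dataset can be sketched with a deliberately tiny model: a single weight and bias, with starting values standing in for the result of pre-training. All values and names here are hypothetical; a real language model has millions or billions of parameters, but the principle of continuing gradient descent from existing parameters is the same.

```python
def predict(weight, bias, x):
    """A one-parameter 'model': a straight line."""
    return weight * x + bias


def mean_squared_error(weight, bias, examples):
    """Average squared difference between predictions and targets."""
    return sum((predict(weight, bias, x) - y) ** 2 for x, y in examples) / len(examples)


def fine_tune(weight, bias, examples, learning_rate=0.05, epochs=500):
    """Continue gradient descent from already-trained parameters."""
    for _ in range(epochs):
        for x, y in examples:
            error = predict(weight, bias, x) - y
            weight -= learning_rate * error * x  # adjust the weight
            bias -= learning_rate * error        # adjust the bias
    return weight, bias


# Hypothetical parameters left over from "pre-training".
pretrained_weight, pretrained_bias = 1.0, 0.0
# Small task-specific dataset following y = 2x + 1.
task_examples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]

loss_before = mean_squared_error(pretrained_weight, pretrained_bias, task_examples)
tuned_weight, tuned_bias = fine_tune(pretrained_weight, pretrained_bias, task_examples)
loss_after = mean_squared_error(tuned_weight, tuned_bias, task_examples)
print(loss_before, loss_after)
```

The loss on the task examples falls as the parameters move from their starting values towards ones that fit the new task.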
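The difference between the three settings lies only in how many demonstrations are placed in the prompt. The sketch below builds such prompts as plain text; the "Input:"/"Output:" layout and the function names are assumptions chosen for the example, not a format required by any particular model.

```python
def build_prompt(instruction, demonstrations, query):
    """Assemble an instruction, zero or more worked examples, and the new query."""
    lines = [instruction, ""]
    for source, target in demonstrations:
        lines.append(f"Input: {source}")
        lines.append(f"Output: {target}")
        lines.append("")
    lines.append(f"Input: {query}")
    lines.append("Output:")  # left open for the model to complete
    return "\n".join(lines)


def shot_label(demonstrations):
    """Name the setting according to the number of demonstrations."""
    return {0: "zero-shot", 1: "one-shot"}.get(len(demonstrations), "few-shot")


instruction = "Translate English to French."
examples = [("sea otter", "loutre de mer"), ("cheese", "fromage")]

for demos in (examples[:0], examples[:1], examples):
    print(f"--- {shot_label(demos)} ---")
    print(build_prompt(instruction, demos, "book"))
```

No model weights change in any of these settings; the demonstrations are supplied only as part of the input text.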
Tokens
When processing prompts and training data, language models break down the information into small text chunks called tokens. For example, in English, a token could be a word (e.g., "cat"), a sub-word (e.g., "un" and "happiness" as separate tokens in "unhappiness") or even individual characters (e.g. @ # $ %). These tokens are the building blocks that LLMs can understand and manipulate to construct meaningful and fluent text.
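A minimal sketch of sub-word tokenization follows, assuming a tiny hand-written vocabulary and a greedy longest-match rule. Both are assumptions for illustration: production tokenizers learn their vocabularies from data and use different algorithms, so their splits will differ from the ones shown here.

```python
# Hypothetical vocabulary of known words and sub-words.
TOY_VOCABULARY = {"cat", "un", "happiness", "happy", "the"}


def tokenize_word(word, vocabulary=TOY_VOCABULARY):
    """Split a word into the longest known pieces, falling back to characters."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate piece until it is known or one character long.
        while end > start + 1 and word[start:end] not in vocabulary:
            end -= 1
        tokens.append(word[start:end])
        start = end
    return tokens


def tokenize(text, vocabulary=TOY_VOCABULARY):
    """Tokenize whitespace-separated text word by word."""
    return [token for word in text.split() for token in tokenize_word(word, vocabulary)]


print(tokenize("cat unhappiness @"))  # ['cat', 'un', 'happiness', '@']
```

Unknown strings such as symbols fall back to single characters, which is how a tokenizer can represent text it has never seen as a whole word.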
Models often have a maximum token limit for each input sequence. Exceeding this limit may require truncation or other strategies to fit the input into the model.
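Two simple strategies are sketched below: truncating the token list to the limit, and splitting it into consecutive chunks that each fit. The function names are hypothetical and whitespace splitting stands in for a real tokenizer, so the counts are only indicative.

```python
def truncate_tokens(tokens, max_tokens, keep="start"):
    """Keep at most `max_tokens` tokens from the start or the end of the input."""
    if max_tokens < 0:
        raise ValueError("max_tokens must not be negative")
    if len(tokens) <= max_tokens:
        return list(tokens)
    if keep == "start":
        return list(tokens[:max_tokens])
    if keep == "end":
        return list(tokens[len(tokens) - max_tokens:])
    raise ValueError("keep must be 'start' or 'end'")


def chunk_tokens(tokens, max_tokens):
    """Split the input into consecutive chunks that each fit the limit."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return [list(tokens[i:i + max_tokens]) for i in range(0, len(tokens), max_tokens)]


tokens = "a literature review summarises prior work on a topic".split()
print(truncate_tokens(tokens, 4))
print(truncate_tokens(tokens, 4, keep="end"))
print(chunk_tokens(tokens, 4))
```

Truncation discards information, whereas chunking keeps everything but requires the pieces to be processed separately.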
There is also a token limit per session. This means AI agents can "forget" about the earlier prompts in a conversation and you may have to re-state important information throughout the exchange. Some agents indicate this limit in their answers.
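The "forgetting" effect can be sketched as a sliding window over the conversation: only the most recent messages that fit a token budget are kept, and older ones are dropped. The budget, the whitespace token count and the `pinned` parameter for re-stating important information are all assumptions for this example; actual agents manage their context in ways that are not described here.

```python
def count_tokens(text):
    """Rough stand-in for a tokenizer: count whitespace-separated words."""
    return len(text.split())


def fit_conversation(messages, token_budget, pinned=None):
    """Keep the newest messages that fit the budget, plus an optional pinned note."""
    kept = []
    used = count_tokens(pinned) if pinned else 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = count_tokens(message)
        if used + cost > token_budget:
            break  # everything older than this is "forgotten"
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return ([pinned] if pinned else []) + kept


conversation = [
    "my name is Ada",
    "tell me about tokens",
    "and about transformers",
]
print(fit_conversation(conversation, token_budget=8))
print(fit_conversation(conversation, token_budget=8, pinned="User name: Ada"))
```

In the first call the earliest message falls outside the window, so the name is lost; in the second call the pinned note keeps it available at the cost of room for older messages.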
Transformer
The transformer architecture was introduced by Vaswani et al. (2017); six of the eight researchers were based at Google.
The architecture is based on the use of an attention mechanism and has applications in natural language processing, computer vision, graph learning and speech recognition. Transformer-based language models such as GPT-3 and GPT-4 are autoregressive models.
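The core operation described by Vaswani et al. (2017) is scaled dot-product attention: each query is compared with every key, the scores are scaled by the square root of the key dimension and normalized with a softmax, and the result weights a sum over the values. The sketch below is a simplified, single-head version in plain Python, without the learned projections, masking or multiple heads of a full transformer; the input numbers are made up for the example.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


# Two identical keys receive equal weight, so the output is the mean of the values.
averaged = attention([[1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]], [[0.0, 2.0], [4.0, 6.0]])
# A query that strongly matches the first key attends almost entirely to its value.
focused = attention([[10.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0], [0.0]])
print(averaged, focused)
```

The attention weights let each position draw on information from every other position in the sequence in a single step.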
Generative pre-trained transformer (GPT)
Radford et al. (2018), working at OpenAI, demonstrated large improvements in natural language processing tasks through generative pre-training of a language model on a dataset of unlabelled text, followed by fine-tuning on each specific task. They used the BooksCorpus dataset of 7,000 unpublished books for pre-training. Their model was based on the transformer architecture.
The GPT (Generative Pre-trained Transformer) family of models created by OpenAI has shown marked improvements in capability, from GPT-1 (117 million parameters) in 2018 to GPT-3 (175 billion parameters) in 2020. OpenAI is a world-leading artificial intelligence company created in 2015 which has since received billions of dollars of investment from Microsoft.
Brown et al. (2020), working at OpenAI, trained GPT-3 on a dataset of nearly 1 trillion words drawn mainly from Common Crawl. The model showed strong performance on many natural language processing tasks and benchmarks in the zero-shot, one-shot, and few-shot settings, including tasks that require on-the-fly reasoning such as unscrambling words, using a novel word in a sentence, or performing three-digit arithmetic.
This family expanded further with the launch of GPT-3.5 in March 2022 and GPT-4 (optimized for chat, with an unconfirmed estimate of 1 trillion parameters) in March 2023. ChatGPT, the chat application built on these models and launched in November 2022, has been fine-tuned using both supervised and reinforcement learning techniques (OpenAI, 2022). It is estimated to have reached 100 million monthly active users in January 2023, just two months after launch, making it the fastest-growing consumer application in history (Hu, 2023).
GPT-4 is a model based on the transformer architecture, pre-trained to predict the next token in a document using both publicly available data (such as internet data) and data licensed from third-party providers. The model was then fine-tuned using Reinforcement Learning from Human Feedback (RLHF). As the technical report states, "given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar" (OpenAI, 2023).
OpenAI's GPT-4o model, launched in May 2024, accepts text, audio, image, and video as input and generates any combination of text, audio, and image outputs. Because of this multimodal nature, OpenAI is "still just scratching the surface of exploring what the model can do and its limitations" (OpenAI, 2024).