Table of contents:
- Artificial intelligence (AI)
- Deep learning
- Embedding
- Encoder and decoder networks
- Fine-tuning
- Generative adversarial networks (GANs)
- Generative AI
- Generative pre-trained transformer (GPT)
- Hallucination
- Large language model (LLM)
- LLM agents
- Machine learning (ML)
- Natural language processing (NLP)
- Neural networks
- Prompt engineering
- Reinforcement learning from human feedback (RLHF)
- Transformer
Are you intrigued by the possibilities of AI but finding it difficult to get to grips with all the technical jargon? Our AI glossary will help you understand the key terms and concepts.
AI is constantly evolving and expanding, with new developments and applications emerging every week – and the amount of jargon to keep up with seems to be growing just as fast.
All in all, it can be a bit overwhelming, so we’ve compiled a list of concepts and terms to help you better understand the brave new world of artificial intelligence.
Artificial intelligence (AI)
AI refers to the creation of intelligent machines that are capable of performing complex tasks that typically require human-level intelligence, such as visual perception, speech recognition, decision-making, and language translation. AI systems can be trained to learn and improve their performance over time, allowing them to complete more complex tasks with greater efficiency and accuracy.
Deep learning
Deep learning refers to methods for training neural networks with more than one layer, with each layer representing different levels of abstraction. Typically these deep networks are trained on large datasets to make predictions or decisions about data.
A neural network with a single layer may be able to make approximate predictions, but additional layers can help to improve accuracy – each building upon the previous layer to optimize and refine the predictions.
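The idea of layers building on layers can be sketched in a few lines of plain Python. The weights below are hand-picked for illustration, not learned; a real deep network has many more neurons per layer and learns its weights from data:

```python
# A toy two-layer network (weights hand-picked, not learned) showing how
# each layer builds on the outputs of the previous one.

def relu(x):
    return max(0.0, x)

def layer(inputs, weights, biases):
    """One fully connected layer: weighted sum of inputs, then ReLU."""
    return [relu(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

def forward(x):
    # Layer 1: two hidden neurons extract intermediate features.
    h = layer(x, weights=[[1.0, -1.0], [0.5, 0.5]], biases=[0.0, 0.0])
    # Layer 2: one output neuron combines those features.
    out = layer(h, weights=[[1.0, 2.0]], biases=[0.0])
    return out[0]

print(forward([3.0, 1.0]))  # hidden layer -> [2.0, 2.0], output -> 6.0
```

Stacking more `layer` calls is all it takes to make this network "deeper" – training those extra layers well is where the real work lies.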
Deep learning algorithms are highly effective at processing complex and unstructured data, such as images, audio, and text, and have enabled significant advances in a wide range of applications such as natural language processing, speech recognition, and image recognition systems that include facial recognition, self-driving cars, etc.
Embedding
An embedding in the context of natural language processing (NLP) is a recipe for turning text of variable length into a set of numbers of fixed length. Usually this set of numbers will preserve semantic meaning in some sense – for instance, the set of numbers for “dog” and “animal” will be close together in a mathematical sense. This enables text to be processed efficiently by NLP algorithms.
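As a toy illustration of "close together in a mathematical sense": the three-dimensional vectors below are made up by hand (real models learn vectors with hundreds of dimensions from data), and cosine similarity measures how aligned two embeddings are:

```python
import math

# Hand-made toy embeddings; real models learn these vectors from data.
embeddings = {
    "dog":    [0.9, 0.8, 0.1],
    "animal": [0.8, 0.9, 0.2],
    "car":    [0.1, 0.2, 0.9],
}

def cosine(a, b):
    """Cosine similarity: near 1.0 means similar, near 0.0 unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine(embeddings["dog"], embeddings["animal"]))  # high (~0.99)
print(cosine(embeddings["dog"], embeddings["car"]))     # low  (~0.30)
```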
Encoder and decoder networks
These are types of deep neural network architectures whose job is to convert a given input, say text, into a numerical representation such as a fixed-length set of numbers (the encoder), and to convert those numbers back into a desired output (the decoder).
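A deliberately simplified sketch of the idea: here the "encoder" is just a letter-count histogram and the "decoder" a nearest-neighbour lookup over a tiny invented vocabulary, standing in for the learned neural networks used in practice:

```python
import string

# Toy encoder/decoder pair. The encoder maps text of any length to a
# fixed-length vector (26 letter counts); the decoder maps a vector back
# to text by finding the closest known word. Real encoder and decoder
# networks learn these mappings instead of using hand-written rules.

def encode(text):
    counts = [0] * 26
    for ch in text.lower():
        if ch in string.ascii_lowercase:
            counts[ord(ch) - ord("a")] += 1
    return counts

VOCAB = ["hello", "world", "neural"]  # made-up vocabulary

def decode(vector):
    def dist(v, w):
        return sum((a - b) ** 2 for a, b in zip(v, w))
    # Pick the vocabulary word whose encoding is closest to the vector.
    return min(VOCAB, key=lambda word: dist(encode(word), vector))

print(decode(encode("hello")))  # "hello"
```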
They are very commonly used in natural language processing tasks such as machine translation.
Fine-tuning
The process of adapting a pre-trained model to a specific task by training it on a new dataset. That model is first trained on a large, general dataset and then on a smaller, more specific dataset related to the task – that way, the model can learn to recognize more nuanced patterns in the data specific to the task, leading to better performance.
Fine-tuning can save time and resources by using general models instead of training new ones from scratch, and it can also reduce the risk of overfitting, where the model has learned the features of a small-ish training set extremely well, but it’s unable to generalize to other data.
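The pre-train-then-fine-tune sequence can be sketched with a one-parameter model; every number below is made up for illustration, and real fine-tuning applies the same idea to a deep network with millions or billions of parameters:

```python
# Toy sketch of fine-tuning: a one-parameter model (predicting y = w * x)
# is "pre-trained" on a large general dataset, then fine-tuned with a few
# gradient steps on a small task-specific dataset.

def train(w, data, lr, steps):
    for _ in range(steps):
        for x, y in data:
            grad = 2 * (w * x - y) * x   # gradient of squared error
            w -= lr * grad
    return w

general = [(x, 2.0 * x) for x in range(1, 11)]   # general trend: y = 2x
task    = [(1.0, 2.5), (2.0, 5.0), (3.0, 7.5)]   # task trend:    y = 2.5x

w = train(0.0, general, lr=0.001, steps=50)   # pre-training: w near 2.0
w = train(w, task, lr=0.01, steps=20)         # fine-tuning: w moves to ~2.5
print(round(w, 2))
```

Starting the second stage from the pre-trained `w` rather than from zero is exactly the time-saving the paragraph above describes.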
Generative adversarial networks (GANs)
A class of AI algorithms used in unsupervised machine learning in which two neural networks compete with each other. GANs have two parts: a generator model that is trained to generate new examples of plausible data, and a discriminator model that tries to classify examples as either real data or fake (generated) data. The two models train against each other: as the generator improves, the discriminator finds it harder and harder to tell real from fake, until the generated data is convincing enough to pass as real.
Generative AI
A type of artificial intelligence that can create a wide variety of content – including text, images, video, and computer code – by identifying patterns in large quantities of training data and generating unique outputs that resemble the original data. Unlike other forms of AI that are based on rules, generative AI algorithms use deep learning models to generate novel outputs that are not explicitly programmed or predefined.
Generative AI is capable of producing highly realistic and complex content that mimics human creativity, making it a valuable tool for a wide range of applications, like image and video generation, natural language processing, and music composition. Examples include recent breakthroughs such as ChatGPT for text and DALL-E and Midjourney for images.
Generative pre-trained transformer (GPT)
Generative pre-trained transformers, or GPTs, are a family of neural network models with up to hundreds of billions of parameters, trained on massive datasets to generate human-like text. They are based on the transformer architecture, introduced by Google researchers in 2017, which allows the models to better understand and apply the context in which words and expressions are used and to selectively attend to different parts of the input – focusing on the words or phrases they perceive as most relevant to the outcome. They are capable of generating long responses, not just the next word in a sequence.
The GPT family of models is among the largest and most complex language models to date. They're typically used to answer questions, summarize text, generate code, hold conversations, write stories, and perform many other natural language processing tasks, making them well-suited for products like chatbots and virtual assistants.
In November 2022, OpenAI released ChatGPT, a chatbot built on top of GPT-3.5, which took the world by storm, with everyone flocking to try it out. And the hype is real: more recent advances in GPT have even made the tech not just feasible for business settings like customer service, but actually transformational.
Hallucination
An unfortunate but well-known phenomenon in large language models, where the AI system provides a plausible-looking answer that is factually incorrect, inaccurate, or nonsensical because of limitations in its training data and architecture.
A common example would be when a model is asked a factual question about something it hasn’t been trained on and instead of saying “I don’t know” it will make something up. Alleviating the problem of hallucinations is an active area of research and something we should always keep in mind when evaluating the response of any large language model (LLM).
Large language model (LLM)
LLMs are a type of neural network capable of generating natural language text that is similar to text written by humans. These models are typically trained on massive datasets of hundreds of billions of words from books, articles, web pages, etc., and use deep learning to understand the complex patterns and relationships between words to generate or predict new content.
While traditional NLP algorithms typically only look at the immediate context of words, LLMs consider large swaths of text to better understand the context. There are different types of LLMs, including models like OpenAI’s GPT.
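A drastically simplified sketch of next-word prediction: count which word follows which in a tiny made-up corpus and predict the most frequent follower. An LLM replaces these simple counts with a deep network that conditions on long stretches of context:

```python
import collections

# Toy next-word predictor built from bigram counts over a made-up corpus.
corpus = "the cat sat on the mat the cat ran".split()

# For each word, count which words follow it.
following = collections.defaultdict(collections.Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the most frequent word seen after `word` in the corpus."""
    return following[word].most_common(1)[0][0]

print(predict_next("the"))  # "cat" ("cat" follows "the" twice, "mat" once)
```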
LLM agents (e.g. AutoGPT, LangChain)
On their own, LLMs take text as input and provide more text as output. Agents are systems built on top of an LLM that give it the ability to make decisions, operate autonomously, and plan and perform tasks without human intervention. Agents work by using the power of LLMs to translate high-level language instructions into the specific actions or code required to perform them.
There is currently an explosion of interest and development in agents. Tools such as AutoGPT are enabling exciting applications such as “task list doers” that will take a task list as input and actually try to complete the tasks for you.
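The loop an agent runs can be sketched as below. Everything here is a stand-in: `fake_llm` and its three-step plan replace the real LLM API call that would decide the next action, and the "execution" step is just a print:

```python
# Sketch of an agent loop with a stubbed-out LLM.

def fake_llm(goal, done):
    """Stand-in for an LLM that plans the next step toward a goal."""
    plan = ["search flights", "compare prices", "book cheapest"]
    remaining = [step for step in plan if step not in done]
    return remaining[0] if remaining else "FINISH"

def run_agent(goal, max_steps=10):
    done = []
    for _ in range(max_steps):
        action = fake_llm(goal, done)   # 1. ask the LLM what to do next
        if action == "FINISH":
            break
        print("executing:", action)     # 2. execute the action (stubbed)
        done.append(action)             # 3. record the result and repeat
    return done

run_agent("book me a cheap flight")
```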
Machine learning (ML)
A subfield of AI that involves the development of algorithms and statistical models that enable machines to progressively improve their performance in a specific task without being explicitly programmed to do so. In other words, the machine “learns” from data, and as it processes more data, it becomes better at making predictions or performing specific tasks.
There are three main types of machine learning: supervised learning, unsupervised learning, and reinforcement learning.
- Supervised learning is a machine learning approach that uses labeled datasets to train algorithms to classify data or predict outcomes accurately. For example, given a set of labeled pictures of cats and dogs, a model can learn to classify new, unlabeled pictures as cats or dogs;
- Unsupervised learning looks for undetected patterns in a dataset with no pre-existing labels or specific programming and with minimal human supervision;
- Reinforcement learning involves training a model to make decisions based on feedback from its environment. It learns to take actions that maximize a reward signal, such as winning a game or completing a task.
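The supervised case above can be sketched with a toy 1-nearest-neighbour classifier; the 2-d feature points and their labels are invented for illustration, standing in for the image models real systems would use:

```python
# Minimal supervised learning: learn from labeled examples, then predict
# labels for new, unseen data points (made-up 2-d features).

labeled = [
    ((1.0, 1.0), "cat"), ((1.2, 0.8), "cat"),
    ((4.0, 4.2), "dog"), ((4.5, 3.8), "dog"),
]

def predict(point):
    """Label a new point with the label of its nearest training example."""
    def dist(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    _, label = min(labeled, key=lambda ex: dist(ex[0], point))
    return label

print(predict((1.1, 0.9)))  # "cat"
print(predict((4.2, 4.0)))  # "dog"
```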
Natural language processing (NLP)
NLP is a branch of AI that focuses on the interaction between human language and computers. It combines rule-based modeling of human language with statistical, machine learning, and deep learning models, typically trained using large amounts of data, that enable computers to process, understand, and generate human language.
Its applications are designed to analyze, understand, and generate human language, including text and speech. Some common NLP tasks include language translation, sentiment analysis, speech recognition, text classification, named entity recognition, and text summarization.
Neural networks
Neural networks are a subfield of machine learning, first proposed in 1943 by Chicago researchers Warren McCulloch and Walter Pitts, modeled loosely on the structure of the human brain. A neural network consists of layers of interconnected nodes, or neurons, that process and analyze data to make predictions or decisions: each layer receives inputs from nodes in the previous layer and produces outputs that are fed to nodes in the next layer. The last layer then outputs the results.
They have been used for a wide range of applications, including image and speech recognition, natural language processing, and predictive modeling.
Prompt engineering
A prompt is a set of instructions written as text or code you provide as an input to an LLM to result in meaningful outputs, and can be as simple as a question. Prompt engineering is the skill (or art, some would argue) of creating effective prompts that will produce the best possible output for any given task. It requires an understanding of how large language models (LLMs) work, the data they’re trained on, and their strengths and limitations.
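As a sketch of the idea, a prompt template might wrap the user's question with a role, relevant context, and output instructions – techniques that tend to produce better responses. The assistant role, context text, and settings path below are all invented for illustration:

```python
# A prompt is just text; prompt engineering is crafting that text carefully.

def build_prompt(question, context):
    return (
        "You are a helpful customer support assistant.\n"
        "Use only the context below to answer. If the answer is not in "
        "the context, say \"I don't know\".\n\n"
        f"Context: {context}\n\n"
        f"Question: {question}\n"
        "Answer in at most two sentences."
    )

prompt = build_prompt(
    question="How do I reset my password?",
    context="Passwords can be reset from Settings > Account > Reset password.",
)
print(prompt)
```

Note how the template also bakes in an "I don't know" escape hatch, one common way to discourage the hallucinations discussed earlier.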
Reinforcement learning from human feedback (RLHF)
RLHF refers to the process of using explicit human feedback to train the reward model of a reinforcement learning system. In the context of an LLM, this might mean humans ranking the outputs of the LLM and picking the responses they prefer – this data is then used to train another neural network, called a reward model, that can predict whether a given response will be desirable to humans. The reward model is then used to fine-tune the LLM to produce output that is better aligned with human preferences.
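The reward-model step can be sketched with a toy linear model trained on made-up pairwise preferences. Real reward models are neural networks over full text rather than two hand-picked feature scores, but the training signal – "push the preferred response's reward above the rejected one's" – is the same idea:

```python
import math

# Toy reward model for RLHF. Each "response" is reduced to two made-up
# feature scores; humans provide (preferred, rejected) pairs; the model
# learns weights so preferred responses get higher reward.

def reward(w, features):
    return sum(wi * fi for wi, fi in zip(w, features))

# (preferred_features, rejected_features), e.g. [helpfulness, verbosity]
preferences = [
    ([0.9, 0.2], [0.3, 0.8]),
    ([0.8, 0.1], [0.2, 0.9]),
    ([0.7, 0.3], [0.4, 0.7]),
]

w = [0.0, 0.0]
for _ in range(200):
    for good, bad in preferences:
        # Probability the model agrees with the human preference.
        p = 1 / (1 + math.exp(reward(w, bad) - reward(w, good)))
        # Gradient step that pushes reward(good) above reward(bad).
        for i in range(len(w)):
            w[i] += 0.1 * (1 - p) * (good[i] - bad[i])

print(reward(w, [0.9, 0.2]) > reward(w, [0.3, 0.8]))  # True
```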
These techniques are thought to be a highly impactful step in the development of LLMs like ChatGPT which have seen breakthrough advancements in their capabilities.
Transformer
A transformer is a type of deep neural network architecture made up of multiple encoder and decoder components, combined in such a way as to enable the processing of sequential data such as natural language and time series.
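The attention mechanism at the heart of the transformer can be sketched in plain Python. The vectors below are made up, and a real transformer adds learned projections, multiple attention heads, and many stacked layers on top of this core operation:

```python
import math

# Minimal scaled dot-product attention: each position's output is a
# weighted mix of all values, with weights set by how well its query
# matches every key.

def softmax(xs):
    exps = [math.exp(x - max(xs)) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]           # how relevant is each position?
        weights = softmax(scores)          # turn scores into a distribution
        outputs.append([sum(wt * v[i] for wt, v in zip(weights, values))
                        for i in range(len(values[0]))])
    return outputs

# Tiny made-up example: 2 positions, 2-d vectors.
q = [[1.0, 0.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
print(attention(q, k, v))  # output leans toward the first value
```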
These are just a few of the most common terms in AI that you are likely to encounter. Undoubtedly, glossaries like these will forever be an ongoing project – as the tech continues to evolve, new terms and ideas will keep emerging. But for now, by understanding these concepts, you can build a solid foundation that will help you keep up with the latest developments.