It is getting harder to keep up with AI and the growing list of jargon and scientific terms surrounding it.
Throughout its history, AI has gone through hot and cold periods of hype. During the hot periods, such as the current one driven by the explosion of large language models, new terms make their way into public parlance.
Yet, there are many people who hear the term AI and it doesn’t register. Or they have heard it in passing but only vaguely know what it is. There are also different levels of awareness.
This glossary aims to serve both as a resource for those just being introduced to AI and for those looking for a reference or to refresh their vocabulary.
Agent. An intelligent agent is an AI system that can independently perceive its environment and act autonomously to reach an objective. The environment may be simulated or physical. An agent doesn’t need repeated prompting to achieve a bigger task.
AI alignment. AI alignment is the process of trying to get an AI system to function as intended. Alignment covers both small, direct goals, such as writing a sentence, and large conceptual ones, such as conforming to certain values and moral standards. As AI systems and their target goals get more complex, aligning them becomes more difficult.
Algorithm. An algorithm is a step-by-step procedure or set of instructions that a computer follows. In the context of AI, an algorithm can be used by machine learning systems to ingest data and make predictions based on it.
Anthropomorphism. Anthropomorphism is the tendency to attribute human qualities to nonhumans. For example, calling a chatbot “he” or “she,” saying a chatbot wants something or saying a chatbot is trying to do something is anthropomorphizing AI. This happens often in conversations about artificial intelligence because it is often designed to sound or appear human.
Artificial general intelligence (AGI). AGI is a machine that can perform any intellectual task that a human can. AGI is a category of AI. There are no current AI systems that count as AGI, though some claim that recent technologies, such as GPT-4, come close.
Artificial intelligence (AI). AI is the simulation of human intelligence processes by computer systems.
Bias. Machine learning bias, or AI bias, occurs when algorithms produce results that are systematically prejudiced. Bias can be present in the code of an AI system or the data it trains on. Machine learning bias can affect decision-making.
Black box AI. In black box AI systems, the internal operations are not visible. The user provides input, and the machine provides an output. But the exact steps the machine took to arrive at the response are opaque. Explainable AI is the antithesis of black box AI: its logic and operations are transparent.
ChatGPT. ChatGPT is a chatbot from OpenAI that inspired a wave of public attention directed toward large language models and generative AI. It uses algorithms to produce text responses to user input in natural language.
Chatbot. A chatbot is an AI-powered tool designed to communicate with people in a conversational way. ChatGPT is an example of a chatbot. Generative AI chatbots are sometimes used as an alternative to search engines for information retrieval.
Constitutional AI. Constitutional AI trains AI systems to align with a set of values or principles as defined in a constitution. This approach was developed by AI startup Anthropic.
Corpus. A corpus is a large data set of written or spoken words that is used to train language models.
Copilot. Copilot is the brand name of Microsoft’s suite of AI-assisted workplace products.
Cutoff date. The cutoff date is the date at which the model’s information ends. AI models can’t recall information past the cutoff date. For example, GPT-3.5’s cutoff date is September 2021.
Data mining. Data mining is the process of searching for data and looking for patterns within a large data set to extract useful information. Names of entities in the data set are one example of data that mining processes can extract.
Data validation. Data validation is the process of checking the quality of data before using it to develop and train AI models.
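As a rough illustration of data validation, the sketch below screens a handful of records before they would be used for training. The function name `validate_records`, the field names and the two checks (missing values and duplicates) are assumptions made for this example; real pipelines usually apply many more checks.

```python
def validate_records(records, required_fields):
    """Split records into (valid, rejected) using two simple quality checks."""
    valid, rejected = [], []
    seen = set()
    for record in records:
        # Reject records with a missing or empty required field.
        if any(record.get(field) in (None, "") for field in required_fields):
            rejected.append(record)
            continue
        # Reject duplicates of a record that was already accepted.
        key = tuple(record[field] for field in required_fields)
        if key in seen:
            rejected.append(record)
            continue
        seen.add(key)
        valid.append(record)
    return valid, rejected


# Hypothetical sample data for the sketch.
rows = [
    {"text": "great product", "label": "positive"},
    {"text": "", "label": "negative"},                # empty text
    {"text": "great product", "label": "positive"},   # duplicate
    {"text": "arrived late"},                         # missing label
]
valid, rejected = validate_records(rows, ["text", "label"])
print(len(valid), "valid,", len(rejected), "rejected")
```

Only the first row passes both checks here; the other three are set aside for review rather than silently dropped.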
Dall-E. Dall-E is OpenAI’s AI-powered image generator. Users submit a text prompt, and the AI tool generates a corresponding image.
Deepfake. A deepfake is a convincing AI-generated image, audio or video hoax. A deepfake can be entirely original content that shows someone doing or saying something they didn’t do or say. They can also depict fake news events. A deepfake of the Pentagon exploding went viral in May 2023 and had a tangible effect on the stock market.
Deep learning. Deep learning is a type of machine learning that uses neural networks with many layers to learn patterns from data, loosely imitating the way humans gain certain types of knowledge.
Embodied agents. Embodied agents, also referred to as embodied AI, are AI agents with a physical body that perform specific tasks in the physical environment.
Emergence. Emergence describes capabilities that arise in AI systems unpredictably as they become more complex. A system’s emergent properties are not observable in its individual parts.
EU AI Act. The EU AI Act is a regulatory framework for responsible AI deployment in a way that doesn’t conflict with data privacy rights.
Expert system. Expert systems are AI systems used to simulate the judgment or behavior of a human expert.
Fréchet inception distance (FID). FID is a metric for evaluating the quality of images created by generative AI.
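For readers who want the formula, FID is commonly written as the Fréchet distance between two Gaussian distributions fitted to image features taken from an Inception image classifier. The symbols below are notation chosen for this sketch: the mean and covariance of the features are written as (mu_r, Sigma_r) for real images and (mu_g, Sigma_g) for generated images. A lower value means the generated images are statistically closer to the real ones.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```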
Garbage in, garbage out (GIGO). GIGO is a concept in computer science that states the quality of a system’s output is based on the quality of the input. If the input is trash, the output will be trash. If AI is trained on low-quality data, then the user can expect its output to be shoddy as well.
Generative adversarial network (GAN). A GAN is a type of machine learning model that consists of two neural networks: a generator that creates data and a discriminator that judges whether data is real or generated. The two neural networks compete, which pushes the generator to produce increasingly realistic output.
Generative AI. Generative AI is an artificial intelligence technology that creates content by learning patterns in training data and synthesizing new material with the same learned characteristics.
Graphics processing unit (GPU). A GPU is a type of processor that is suited to AI workloads because it can perform many more simultaneous computations than a CPU.
Generative pre-trained transformer (GPT). GPTs are the AI algorithms that power some of the most well-known natural language processing and generative algorithms. GPT-3, GPT-3.5 and GPT-4 are examples of the GPT family of algorithms. They were developed by OpenAI.
Hallucination. An AI hallucination is when an AI system presents false information framed as the truth. For example, a chatbot prompted to write a five-page research report with citations and links might generate fake links that appear real but lead nowhere or fabricate quotes from public figures as evidence. A deepfake differs from a hallucination in that it is intentionally created as a hoax to trick the viewer.
Knowledge engineering. Knowledge engineering is the field of AI that aims to emulate a human expert’s knowledge in a certain field.
Large language model (LLM). LLMs are deep learning algorithms that understand, summarize, generate and predict new content. They typically contain many parameters and are trained on a large corpus of unlabeled text. GPT-3 is an LLM.
Large Language Model Meta AI (LLaMA). LLaMA is an open source LLM released by Meta.
Moats. Moats are mechanisms that prevent competitors from copying a proprietary LLM. An LLM’s moats are training data, model weights and the cost of training.
Model. An AI model is a machine learning algorithm that has been trained and deployed.
Multimodal AI. Multimodal AI systems can handle input and produce output in several mediums. Multimodal systems can handle any combination of images, video, text or sound, as opposed to just text.
Model collapse. Model collapse is the decline in model quality that occurs when low-quality, AI-generated content contaminates the training set for future models.
Natural language generation (NLG). NLG is the use of AI to produce written or spoken language from a data set.
Natural language processing (NLP). NLP is the ability of a computer program to understand human language as it is spoken and written. It is a subfield of linguistics as well.
Neural network. Neural networks are the technology underlying deep learning and are made up of interconnected artificial neurons. The neurons are nodes that process and transmit information. Two of the main neural network types are recurrent neural networks (RNNs) and convolutional neural networks (CNNs).
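To make the idea of neurons processing and transmitting information concrete, here is a minimal sketch of a tiny network passing an input forward through two layers. The weights are hand-picked, hypothetical values, and the sigmoid activation is one common choice among several; a real network would learn its weights from training data.

```python
import math


def sigmoid(x):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of its inputs, then an activation."""
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    return sigmoid(total)


def forward(inputs, layers):
    """Pass inputs through each layer; a layer is a list of (weights, bias) pairs."""
    activations = inputs
    for layer in layers:
        activations = [neuron(activations, w, b) for w, b in layer]
    return activations


# Hypothetical weights: 2 inputs -> 2 hidden neurons -> 1 output neuron.
network = [
    [([0.5, -0.5], 0.0), ([1.0, 1.0], -1.0)],
    [([1.0, -1.0], 0.0)],
]
output = forward([1.0, 0.0], network)
print(output)
```

Each neuron's output becomes an input to the next layer, which is what "interconnected" means in practice.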
Neuromorphic computing. Neuromorphic computing is a method in which computer design is modeled after elements of the human brain. It can apply to both hardware and software.
OpenAI. OpenAI is an American artificial intelligence company. It conducts AI research and has developed several AI models and services in the last decade, including GPT-3, ChatGPT and Dall-E.
Overfitting. Overfitting is a concept in data science where a statistical model conforms too tightly to its training data, harming its ability to process unseen data.
Parameter. Parameters in AI refer to the internal settings that are learned by a machine learning model during the training process. Parameters shape how the model recognizes patterns.
Prompt. A prompt is input that a user feeds to an artificial intelligence system with some expected result. Some companies, such as GoDaddy and Uproer, produce prompt libraries to be used with generative AI applications, such as ChatGPT.
Pathways Language Model (PaLM). PaLM is Google’s transformer-based LLM, based on similar technology to GPT-3 and GPT-4. The Google Bard chatbot runs on PaLM.
Prompt engineering. Prompt engineering is the process of developing and refining prompts for LLMs. Prompt engineering is used by AI engineers to refine LLMs and by generative AI users to hone the output they want from the model.
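Q-learning. Q-learning is a type of reinforcement learning in which an AI model learns, through repeated trial and error, an estimate of how valuable each action is in each situation, and improves that estimate iteratively over time.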
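A minimal sketch of tabular Q-learning may help show what "learning iteratively" looks like. The corridor environment, the reward of 1.0 at the far end and the values chosen for `alpha`, `gamma` and `epsilon` are all assumptions made for this illustration, not part of any particular system.

```python
import random


def q_learning(n_states=5, episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning on a hypothetical corridor.

    The agent starts at state 0 and earns a reward of 1.0 for reaching the
    last state. Actions: 0 = move left, 1 = move right.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        state = 0
        while state != goal:
            # Explore at random sometimes (or when the estimates are tied);
            # otherwise take the action with the higher estimated value.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0
            # Q-learning update: nudge the estimate toward the reward plus
            # the discounted value of the best action in the next state.
            best_next = max(q[next_state])
            q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
            state = next_state
    return q


q_table = q_learning()
policy = ["left" if left > right else "right" for left, right in q_table[:-1]]
print(policy)
```

With these settings, the learned table should favor moving right in every state, since that is the shortest route to the reward.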
Recommendation engine. A recommendation engine is an AI algorithm that is used to serve users content based on their preferences. Social sites, such as TikTok, and streaming platforms, such as Spotify and YouTube, use recommendation engines to personalize user feeds.
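One simple way to picture a recommendation engine is as a similarity search between a user's tastes and each item's characteristics. The sketch below uses cosine similarity on hand-written feature vectors; the catalog, the feature order and the function names are hypothetical, and production systems typically learn these representations from behavior data instead.

```python
import math


def cosine_similarity(a, b):
    """Similarity of two vectors based on the angle between them."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(user_profile, items, top_n=2):
    """Rank items by how closely their features match the user's preferences."""
    scored = [(cosine_similarity(user_profile, features), name)
              for name, features in items.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_n]]


# Hypothetical feature order: [comedy, drama, music]
catalog = {
    "stand-up special": [1.0, 0.0, 0.1],
    "courtroom series": [0.0, 1.0, 0.0],
    "concert film": [0.1, 0.0, 1.0],
}
user = [0.9, 0.1, 0.3]
print(recommend(user, catalog))
```

For this user, who leans heavily toward comedy with some interest in music, the stand-up special ranks first and the concert film second.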
Reinforcement learning. Reinforcement learning is a machine learning training method that rewards desired behaviors and punishes undesired ones. Through reinforcement learning, a machine learning agent perceives its environment, takes action and learns through trial and error.
Reinforcement learning from human feedback (RLHF). RLHF trains models directly from human feedback, as opposed to from a coded reward stimulus. Humans may score a chatbot’s output and feed those scores back into the model.
Sentiment analysis. Also known as opinion mining, sentiment analysis is the process of analyzing text for tone and opinion with AI.
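The simplest form of sentiment analysis counts words from positive and negative word lists. The sketch below does exactly that with two tiny, hypothetical lexicons; modern systems generally rely on trained language models, which handle negation, sarcasm and context far better than word counting.

```python
# Hypothetical, deliberately tiny word lists for illustration.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "slow"}


def sentiment(text):
    """Label text as positive, negative or neutral by counting lexicon words."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(sentiment("I love this, it is great!"))
print(sentiment("Terrible and slow."))
print(sentiment("It arrived on Tuesday."))
```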
Supervised learning. Supervised learning trains machine learning algorithms on data with labels, known as labeled data. Each training example is paired with the correct answer the model should learn to predict.
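As a small illustration of learning from labels, the sketch below classifies a new point by copying the label of its closest training example, a method usually called nearest-neighbor classification. The fruit measurements and labels are invented for the example.

```python
def nearest_neighbor(training_data, point):
    """Return the label of the training example closest to the given point.

    training_data is a list of (features, label) pairs.
    """
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    _, label = min(training_data, key=lambda example: squared_distance(example[0], point))
    return label


# Hypothetical labeled examples: ([weight in grams, diameter in cm], label)
training_data = [
    ([150.0, 7.0], "apple"),
    ([170.0, 7.5], "apple"),
    ([10.0, 2.0], "grape"),
    ([12.0, 2.2], "grape"),
]
print(nearest_neighbor(training_data, [160.0, 7.2]))
print(nearest_neighbor(training_data, [11.0, 2.1]))
```

The labels are what make this supervised: without them, the algorithm would have no correct answers to copy.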
Speech recognition. Speech recognition converts spoken language into text using AI.
Synthetic data. Synthetic data is computer-generated information for testing and training AI models. Large AI models need large quantities of data to train on, which is traditionally generated in the real world.
Technological singularity. The singularity describes a point in the future where advanced AI becomes more intelligent than humans and technological growth becomes uncontrollable.
Training data. Training data refers to the information or examples used to train a machine learning model. AI models need a large training data set to learn patterns and guide their behavior.
Transformer. Transformer models are an AI model architecture used for NLP. Transformer-based models are neural networks that use an attention mechanism to process context and long-range dependencies in language.
Turing test. The Turing test is a method of inquiry for determining if a computer is capable of thinking like a human being. The test was created by Alan Turing, an English computer scientist and cryptanalyst. A computer has intelligence if it can mimic human responses under certain conditions, according to the Turing test.
Token. A token is the basic unit of text that an LLM uses to understand and generate language. It may be a word or parts of a word. Paid LLMs, such as GPT-4’s API, charge users by token.
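To show how a word can break into several tokens, here is a toy tokenizer that greedily matches the longest piece it can find in a vocabulary. The six-entry vocabulary is hypothetical; real LLM tokenizers use learned vocabularies with tens of thousands of entries, built with methods such as byte-pair encoding, so actual splits will differ.

```python
def tokenize(text, vocab):
    """Greedy longest-match split of each word into vocabulary pieces.

    Any character not covered by the vocabulary becomes its own token.
    """
    tokens = []
    for word in text.lower().split():
        start = 0
        while start < len(word):
            # Try the longest remaining piece first, then shrink it.
            for end in range(len(word), start, -1):
                piece = word[start:end]
                if piece in vocab or end - start == 1:
                    tokens.append(piece)
                    start = end
                    break
    return tokens


# Hypothetical vocabulary for the sketch.
vocab = {"token", "ization", "un", "believ", "able", "is"}
tokens = tokenize("Tokenization is unbelievable", vocab)
print(tokens)
print(len(tokens), "tokens")
```

Three words become six tokens here, which is why token counts, and therefore token-based charges, are usually higher than word counts.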
Unsupervised learning. Unsupervised learning trains machine learning algorithms on data without labels, known as unlabeled data. The algorithm looks for patterns or groupings in the data on its own.
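Clustering is a common example of learning without labels. The sketch below runs a bare-bones version of the k-means algorithm on one-dimensional numbers; the data values and starting centers are made up for illustration, and practical implementations choose starting centers more carefully.

```python
def kmeans_1d(values, centers, iterations=10):
    """Group numbers into clusters around the given starting centers."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for value in values:
            nearest = min(range(len(centers)), key=lambda i: abs(value - centers[i]))
            clusters[nearest].append(value)
        # Update step: move each center to the mean of its cluster.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters


# Hypothetical unlabeled measurements that happen to form two groups.
data = [1.0, 1.2, 0.8, 8.0, 8.5, 7.5]
centers, clusters = kmeans_1d(data, centers=[0.0, 5.0])
print(centers)
print(clusters)
```

No one tells the algorithm which group each number belongs to; it discovers the two groups from the data alone.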
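Variational autoencoder. Variational autoencoders are a generative AI model architecture commonly used for signal analysis and finding efficient coding of input data. They are comparable to GANs in that they pair two neural networks, an encoder and a decoder. Unlike the networks in a GAN, the two work together rather than compete: the encoder compresses the input into a compact representation, and the decoder reconstructs data from it.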
Zero-shot learning. Zero-shot learning is a machine learning technique where algorithms observe samples from classes that were not present in training and predict what class they belong to. For example, an image classifier trained only to recognize house cats could be asked to classify an image of a lion given some extra information: that lions are big cats.
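One classic way to approach zero-shot classification is to describe every class, including unseen ones, by a list of attributes and pick the class whose description best matches what the model detects. The sketch below assumes a separate attribute detector already exists; the attribute names, class descriptions and function name are hypothetical.

```python
# Hypothetical class descriptions: [has_whiskers, is_large, has_mane]
CLASS_ATTRIBUTES = {
    "house cat": [1, 0, 0],
    "lion": [1, 1, 1],  # never seen in training; known only by its description
}


def predict_class(predicted_attributes, class_attributes):
    """Pick the class whose description differs least from the detected attributes."""
    def mismatches(description):
        return sum(p != d for p, d in zip(predicted_attributes, description))

    return min(class_attributes, key=lambda name: mismatches(class_attributes[name]))


# Suppose an attribute detector, trained only on seen classes, reports
# whiskers, large size and a mane for a new image.
print(predict_class([1, 1, 1], CLASS_ATTRIBUTES))
```

The extra information, here the attribute description of a lion, is what lets the system name a class it has no training images for.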