The Savage Computers - Chris Pang

Wading into the Mire

The chapter “What is AI?” from the website “The Infinitely Long Notebook: Computer Science Edition”, updated in May 2023.

The goal of a program is to find the solution to a problem. Sometimes this problem is closed-ended, like “what is 3+3?”. Sometimes it is open-ended, like “what is the best way to get from Los Angeles to San Francisco given the traffic right now?”. AI models are programs designed to solve problems that are more complex than those standard programs can solve. For example, the question “is this picture of a dog or a cat?” is more complex than “what is 1984920 * 3430098?” because it is relatively easy to write down a set of concrete steps or procedures for solving a maths problem, while there are countless factors that make a given picture a dog picture or a cat picture (think of every dog and cat breed you know, and every pose they can strike). Some of these questions have no set “correct answer” at all, like “what is the next word in the following sentence?” or “can you write me a poem in the style of Shakespeare?”.
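The contrast is easy to see in code. Here is a tiny, illustrative sketch in Python (the unfinished second function is just a placeholder to make the point, not a real recipe): the closed-ended problem takes one line, while for the open-ended one there is no obvious list of steps to write down at all.

```python
def multiply(a, b):
    # A closed-ended problem: the exact steps are easy to specify.
    return a * b

print(multiply(1984920, 3430098))   # prints 6808470122160

def is_dog_or_cat(picture):
    # An open-ended problem: what concrete steps could we possibly list here?
    # Check for whiskers? Ear shape? Every breed, in every pose?
    raise NotImplementedError("no simple recipe exists")
```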

The way AI models solve complex problems like these is by throwing a large amount of computation at them. The input (the picture or text) is fed through many layers of calculations that, at first, have no idea how to process this input and largely just jumble it up or sift through it in some random manner. At the end of this initial computation, the AI chooses some arbitrary way to “sum up” an output from the process, and returns it to the user as its answer to the problem. You can think of this as an octopus being shown an incomplete sentence and then using a tentacle to pick a word from a list to “continue” the sentence. Sure, what it picks is probably influenced in some way by the sentence it was shown, but definitely not in a way we would recognise as “comprehension”. An AI model initialised in this way is usually very bad at the task it is supposed to perform, since it has been told nothing about how to solve the problem at hand.
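To make the idea of an untrained model concrete, here is a minimal sketch in Python using the numpy library. The “photo”, the weights, and the two labels are all made up for illustration; the point is that a randomly initialised model’s answer is essentially a guess.

```python
import numpy as np

rng = np.random.default_rng(0)

# A pretend 8x8 greyscale "photo", flattened into 64 numbers.
image = rng.random(64)

# An untrained single-layer model: its weights are just random numbers,
# so it knows nothing about what makes a picture "cat-like" or "dog-like".
weights = rng.normal(size=(2, 64))

scores = weights @ image            # two arbitrary numbers, one per label
labels = ["cat", "dog"]
print("Untrained guess:", labels[int(np.argmax(scores))])
```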

To improve an AI model’s performance, we train the model by giving it feedback about how good its answer was. This feedback can take the form of a “loss”, the difference between its answer and the correct answer, or a “reward”, a measure of how good its answer is in some more abstract sense. Losses are useful for problems with fixed solutions, e.g. “is this a cat or a dog?” or “what number is shown in this picture?”. Rewards are useful for teaching computers to perform complex tasks with continuous feedback, like playing video games: there is no strictly correct answer for “what to do next” at any given moment, but some button presses produce better results than others, e.g. by allowing the computer to jump over a pit or defeat an enemy and thereby progress further in the game. A simple reward signal for a platformer might be the distance the model has travelled in a level before it stops or dies, or how long it has stayed alive.
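As a rough illustration, a loss and a reward might be computed like the sketch below. The numbers and the scoring rules are invented, not taken from any real system; real losses and rewards are just more elaborate versions of the same arithmetic.

```python
# Loss: a fixed correct answer exists, so we measure how far off the model is.
# Here the model says "80% confident this is a cat" but the picture is a dog.
predicted_prob_cat = 0.8
correct_label_is_cat = 0.0                                 # 0 = dog, 1 = cat
loss = (predicted_prob_cat - correct_label_is_cat) ** 2    # squared error: 0.64

# Reward: no single correct answer, just better and worse outcomes.
# In a platformer we might reward distance travelled plus time survived.
distance_travelled = 120.0   # in-game units before the run ended
seconds_survived = 14.0
reward = distance_travelled + 0.5 * seconds_survived       # 127.0

print(f"loss = {loss}, reward = {reward}")
```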

Based on this loss or reward signal, the computer changes the way it processes the input mathematically, in the hope of decreasing the loss or increasing the reward. Generally, it does this directly with a technique called backpropagation, which uses calculus to identify which parts of the processing were counterproductive and shrinks their influence, while amplifying the parts that were helpful. This may mean focusing on the middle of an image to decide whether a dog or a cat is present, or learning to recognise features like whiskers to tell the two apart. This technique is what powers the training of traditional feed-forward Neural Networks. Sometimes the computer instead tries to guess which answers will produce the highest amount of future reward; to use the video game example, it might learn to jump when it sees a pit coming up on screen, which results in a better reward than falling into the pit and dying. This is called Reinforcement Learning. Sometimes the feedback signal is generated by another network, which learns over time to discriminate between the AI’s output and a “correct” answer, forcing the model to improve. This is useful when AI models are trained to output complicated images like faces, and the approach is called a Generative Adversarial Network.
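Here is a tiny, hand-worked sketch of that feedback step, with just one weight, a squared-error loss, and a made-up learning rate of 0.1. Real networks do exactly the same thing, only across millions of weights at once, with backpropagation working out all the nudges together.

```python
# One weight, one training example: the model multiplies the input by the
# weight, and we want its output to match the target.
weight = 0.5
x, target = 2.0, 3.0

for step in range(5):
    prediction = weight * x
    loss = (prediction - target) ** 2
    # Calculus tells us how the loss changes if we nudge the weight.
    gradient = 2 * (prediction - target) * x
    # Move the weight a small step in the direction that lowers the loss.
    weight -= 0.1 * gradient
    print(f"step {step}: prediction={prediction:.3f}, loss={loss:.3f}")
```

Run it and the loss shrinks on every step as the prediction creeps towards the target; that shrinking number is the whole of “learning” in this picture.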

Training data, therefore, is simply a set of examples that the model uses to learn what is “correct” and what is not. This can be pictures of dogs and cats with the answers labelled, or large bodies of human text taken from the internet. Just as we learn from examples, AI models do too during their training process, and generally the more examples there are, and the more relevant each example is, the better they learn to do a specific task. Some AI models do not need external training data at all and learn purely from reward. The video-game-playing AI we discussed earlier does not learn from an ideal set of player inputs, but develops its own methods of problem solving over time. An AI model can even generate its own training data: AlphaGo, the Go-playing AI by DeepMind which defeated Lee Sedol, was refined by playing a huge number of games against itself, and its successor AlphaGo Zero learned the complex board game knowing nothing more than the basic rules and a way of determining which side won.
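For the labelled-examples case, training data can be as simple as a list of inputs paired with the answers we want the model to learn. The filenames below are invented placeholders; the structure is the point.

```python
# A miniature labelled dataset: each entry pairs an input with the answer the
# model should eventually produce for it. Real datasets contain millions of
# such pairs, and the model is graded against the labels during training.
training_data = [
    ("photo_001.jpg", "cat"),
    ("photo_002.jpg", "dog"),
    ("photo_003.jpg", "dog"),
    ("photo_004.jpg", "cat"),
]

for filename, label in training_data:
    print(f"show the model {filename}, then tell it the right answer was '{label}'")
```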

Models like ChatGPT are examples of a very specific kind of AI model called a transformer. Broadly, transformers are trained by learning to guess missing parts of the input, and are rewarded for guessing the most “realistic”, i.e. plausible, answer. They do this by alternating layers of traditional neural networks with a special technique called attention, which lets the model learn which parts of the input are the most important. For example, when translating text from English to French some words may appear in a different order in the two sentences, and attention allows the model to pair up the corresponding words in both. When ChatGPT is given a prompt (“What is an apple?”), it condenses the input into an internal code or “representation” based on its learned model of human language, which it then uses to generate a continuation (“An apple is a red fruit…”). Since this code is learned from many, many examples of human sentences, the answers it outputs will be very similar to human sentences and will resemble arguments, stories, poems, or other pieces of text written by humans. However, it may not understand what it is writing, and it is therefore very hard to say concretely whether an AI has an “opinion” or a “plan”, since it is only trying to provide the most plausible continuation of the prompt we have given it, not expressing ideas or goals of its own. Remember, AI models want to answer the problem we have defined for them: in a very real sense, the “objective” of ChatGPT is to continue the prompt as best it can, just as the objective of a dog-cat identification AI is to decide whether a dog or a cat is in the picture even if there is nothing in the picture at all.
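The attention step itself is a surprisingly small piece of arithmetic. Here is a stripped-down sketch, with random made-up vectors standing in for three words; a real transformer uses learned weight matrices to build its queries, keys, and values, and runs many attention “heads” in parallel. Each word’s new representation becomes a weighted blend of all the words, with the weights saying which words matter most to it.

```python
import numpy as np

rng = np.random.default_rng(1)

# Three words, each represented by a vector of 4 numbers (made-up values).
words = rng.normal(size=(3, 4))

# Each word asks a "query" and offers a "key" and a "value". In a real
# transformer these come from learned weight matrices; here we reuse the
# word vectors directly to keep the sketch short.
queries, keys, values = words, words, words

# How relevant is each word to each other word?
scores = queries @ keys.T / np.sqrt(4)

# Turn the scores into weights that sum to 1 (a softmax).
weights = np.exp(scores)
weights = weights / weights.sum(axis=1, keepdims=True)

# Each word's new representation leans most heavily on whichever words
# scored as most relevant to it.
new_words = weights @ values
print(np.round(weights, 2))
```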

To expand on this idea, one major point of failure for AI models is the feedback we send them, which is based on mathematical functions that may not capture the actual problems we want them to tackle (the same applies to the simulated situations, tests, or games we use to train AIs). An AI that receives a high reward for staying alive in a video game may simply stay put, or barely move, from the safe starting position, and an AI whose loss goes down for identifying dogs and cats correctly may sort every image into either “dog” or “cat” no matter what the image actually contains. To address problems like this, OpenAI (the makers of ChatGPT) used a technique called Reinforcement Learning from Human Feedback, in which another “judge” AI model (often called a reward model) is trained to decide which of ChatGPT’s continuations for a given prompt are the most “human-favoured” and the least toxic or nonsensical, and ChatGPT is then trained to optimise for the most “human-favoured” outputs. Of course, since there are now two AI models at work (the transformer and its judge), the problem of making sure the system as a whole is properly aligned with the goals of its human makers has doubled, and this technique may not be perfect.
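A very rough sketch of that loop is below. The scoring function is a made-up stand-in for a trained reward model, and the final print statement stands in for the real step, in which the language model’s weights are updated with reinforcement learning so that high-scoring answers become more likely.

```python
# Two candidate continuations the language model produced for one prompt.
prompt = "What is an apple?"
candidates = [
    "An apple is a red or green fruit that grows on trees.",
    "apple apple apple apple apple",
]

def reward_model_score(prompt, continuation):
    # Stand-in for the separate judge network trained on human rankings.
    # Here we fake it with a crude rule: less repetitive answers score higher.
    return len(set(continuation.lower().split()))

scores = [reward_model_score(prompt, c) for c in candidates]
best = candidates[scores.index(max(scores))]

# In real RLHF, the model is nudged so that answers like `best` become more
# likely and low-scoring ones become less likely.
print("Reinforce:", best)
```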