5.5.3 A Pragmatic View of AI Chatbots: Part 3

The text prompt used to create the precursor image was generated by a chatbot.
The multiple ‘attention heads’ of transformer models are crucial to modern chatbot function.

A Few Words About How Chatbots Work

Publicly available commercial chatbots operate on gigantic computer systems that are in some ways similar to the graphics cards found in the high-end domestic machines on which gamers play. These systems are designed to carry out an absolutely enormous number of simple calculations simultaneously (in parallel) at very high speed. They use a form of computing known as an artificial neural network.

Input Tokens
Chatbot applications and the underlying transformer models are merely mathematical stores of word associations operating with a limited vocabulary of numerical input tokens. These tokens (or numbers) represent words or parts of longer words. The token numbers are in themselves entirely arbitrary and so do not directly encode any useful information within the model, apart from enabling the conversion of input text and the generation of output. Current estimates suggest that there are in the region of 100,000 to 200,000 tokens per model. Longer words are not necessarily broken down by the tokenizer into parts that are meaningful to an appropriately educated person. The number tokens that form parts of larger words are merely numerically efficient clusters of letters that occur frequently in the language training set. The word ‘immunohistochemistry’ might be broken down by a human pathologist into immuno, histo(logy) and chemistry; that need not be the case for a chatbot, because the split depends on whether or not these fragments correspond to frequent letter combinations in the training set.

“Text tokenization is deterministic. The sentence ‘The cat sat on the mat’ will always tokenize the same way for a given model. This determinism allows the model to focus entirely on the relationships between tokens.” (Source: Google Gemini)

Since chatbots use tokens (words and statistically relevant parts of words) while we use words or compound words, it would appear that human text use differs very significantly from that of AI applications, even at a conceptually abstract level.
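The idea that a tokenizer splits a long word into frequent letter clusters rather than human-meaningful parts can be sketched in a few lines. This is a toy greedy longest-match tokenizer, not a real byte-pair-encoding implementation, and the tiny vocabulary and its token IDs below are invented purely for illustration:

```python
# Toy sketch of subword tokenization.  The vocabulary, the splits it
# produces and the token IDs are all made up for this example; real
# tokenizers learn their vocabularies from enormous training sets.
VOCAB = {
    "the": 1, "cat": 2, "sat": 3, "on": 4, "mat": 5,
    # Frequent letter clusters, not the morphemes a pathologist would choose:
    "immun": 10, "oh": 11, "isto": 12, "chem": 13, "istry": 14,
}

def tokenize(text, vocab):
    """Split text greedily into the longest pieces found in the vocabulary."""
    tokens = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):  # try the longest piece first
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])  # fall back to a single character
            i += 1
    return tokens

pieces = tokenize("immunohistochemistry", VOCAB)
# pieces → ['immun', 'oh', 'isto', 'chem', 'istry']
ids = [VOCAB[p] for p in pieces]  # the arbitrary numbers the model works with
```

Note that ‘oh’ and ‘isto’ straddle the boundaries a pathologist would draw, and that running the function twice on the same text always yields the same split, which is the determinism the Gemini quotation refers to.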

Transformer Models at the Core of General Purpose Chatbots
Chatbots are based on the transformer model of artificial neural network that was described in a paper published by Google employees in 2017 entitled ‘Attention is all you need’. What these models do exceptionally well is capture the way that we use words by encoding patterns of word associations and word orders found in human-generated text. Transformer models, which are at the heart of Large Language Model based chatbots, use the input tokens to look up extremely long strings of numbers (usually called embedding vectors). The embedding vector of the word cat in a trained model, for example, is an enormous string of numbers which makes no sense to a human. To visualize such a vector, click on the words “Show the raw vector of «cat» in model MOD_enwiki_upos_skipgram_300_2_2021:” at this link. You can now see why I said earlier that there is nothing to read in these models!
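The ‘attention’ in the 2017 paper's title refers to a specific calculation, scaled dot-product attention, which each attention head performs. The following is a minimal pure-Python sketch using invented 2-dimensional vectors; real models use hundreds of dimensions and run many heads in parallel on specialised hardware:

```python
import math

def softmax(xs):
    """Turn raw scores into positive weights that sum to 1."""
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention for a single head.

    Q, K and V are lists of vectors (queries, keys and values).  Each
    output vector is a weighted mix of the value vectors, with weights
    set by how strongly each query matches each key.
    """
    d = len(K[0])
    output = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in K]
        weights = softmax(scores)
        output.append([sum(w * v[j] for w, v in zip(weights, V))
                       for j in range(len(V[0]))])
    return output

# One query attending over two key/value pairs (all numbers invented):
result = attention(Q=[[1.0, 0.0]],
                   K=[[1.0, 0.0], [0.0, 1.0]],
                   V=[[1.0, 0.0], [0.0, 1.0]])
```

The query matches the first key more strongly than the second, so the output leans towards the first value vector; no ‘understanding’ is involved, only weighted averaging.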

The strings of numbers (or embedding vectors) can be thought of as representing word associations in a high-dimensional geometric space. This is hard for humans to envisage, as we normally find it easy to think in only three dimensions. These long number strings are often said to represent ‘meaning’ or some kind of ‘semantic relationship’. As a pragmatist I find this unacceptably anthropomorphic. The transformer model is capturing the way in which words are strung together in natural languages. When an astronomically large number of word association examples are analysed during ‘training’ and used at the later stage of system output, they are, from a human perspective, encoding information that was initially created by humans in text form. We should not attempt to trivialise the tremendous achievement, and the almost indescribable number of calculations, that have gone into creating systems capable of inducing the informational contents of human texts. There is neither magic nor meaning, but the facility we now have to use text to make a machine generate novel text streams is a very considerable computational achievement.
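The geometric picture can be made concrete with cosine similarity, a standard measure of how closely two vectors point in the same direction. The 4-dimensional ‘embeddings’ below are invented for illustration; real embedding vectors run to hundreds or thousands of dimensions:

```python
import math

# Invented 4-dimensional "embeddings" -- these numbers are made up for
# the example, but the principle holds: words used in similar contexts
# end up as vectors pointing in similar directions.
EMBEDDINGS = {
    "cat": [0.9, 0.1, 0.3, 0.0],
    "dog": [0.8, 0.2, 0.35, 0.05],
    "carburettor": [0.0, 0.9, 0.0, 0.8],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1 means strongly associated."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

cats_and_dogs = cosine_similarity(EMBEDDINGS["cat"], EMBEDDINGS["dog"])
cats_and_cars = cosine_similarity(EMBEDDINGS["cat"], EMBEDDINGS["carburettor"])
```

Here ‘cat’ and ‘dog’ come out far more similar than ‘cat’ and ‘carburettor’, which is word association expressed as geometry, with no appeal to ‘meaning’ required.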

These models carry out an enormous number of calculations known as matrix multiplications (if you wish to know about simple matrix multiplication, see a very basic explanatory video).
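For readers who prefer code to video, here is the textbook definition of matrix multiplication in a few lines of Python. This is only a sketch of the arithmetic; real systems perform billions of these operations on massively parallel hardware:

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows (textbook definition)."""
    assert len(A[0]) == len(B), "inner dimensions must agree"
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

# A 2x2 example worked by hand: the top-left entry is 1*5 + 2*7 = 19.
product = matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
# product → [[19, 22], [43, 50]]
```

Each output entry is just a row of the first matrix multiplied element-wise with a column of the second and summed, which is why the work parallelises so well across thousands of simple processing units.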

Easy Further Reading and Resources

1. Large language models, explained with a minimum of math and jargon, by Timothy B. Lee and Sean Trott at https://www.understandingai.org/p/large-language-models-explained-with

2. An intuitive overview of the transformer architecture by Roberto Infante
https://medium.com/@roberto.g.infante/an-intuitive-overview-of-the-transformer-architecture-6a88ccc88171

3. ‘This is not the AI we were promised’ A Royal Society Lecture by Professor Michael John Wooldridge
https://www.youtube.com/live/CyyL0yDhr7I?si=Xgi7upJpt3A0Y39G&t=560

More Advanced Resources and Reading

4. There is a really excellent course of graphical videos about Neural Networks on Grant Sanderson’s 3Blue1Brown YouTube channel

5. Try typing in your own original prompt, try changing the attention heads and varying other model characteristics in this very impressive graphic simulation of an LLM at https://poloclub.github.io/transformer-explainer/

A graphically illustrated video from the Neural Networks course by Grant Sanderson of the 3Blue1Brown YouTube Channel
Another graphically illustrated video from the Neural Networks course by Grant Sanderson of the 3Blue1Brown YouTube Channel
