5.5.3 A Pragmatic View of AI Chatbots: Part 3

The text prompt used to create the precursor image was generated by a chatbot.
A Few Words About How Chatbots Work
Publicly available commercial chatbots operate on gigantic computer systems. The processing chips are in some ways similar to the graphics cards in the high-end domestic machines on which gamers play, minus the hardware needed to drive screen display and texture mapping. These systems are designed to carry out an absolutely enormous number of simple calculations simultaneously (in parallel) at very high speed. They carry out low-precision (8-bit floating-point) arithmetic at an unprecedented scale. Google said in 2025 that their ‘super-pods’ of 9,216 Ironwood chips can achieve 42.5 exaflops (42.5 × 10¹⁸ floating-point operations per second, FLOPS) using 1.77 petabytes of high-bandwidth memory. That memory capacity is equivalent to that of 27,657 high-end home computers with 64 GB of RAM each, although the domestic memory is possibly slower.
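As a quick sanity check of the memory comparison above, a couple of lines of Python reproduce the arithmetic (decimal units are assumed: 1 petabyte = 10¹⁵ bytes, 1 GB = 10⁹ bytes):

```python
# A quick check of the arithmetic in the paragraph above.
pod_memory_bytes = 1.77e15   # 1.77 petabytes of high-bandwidth memory
home_pc_bytes = 64e9         # 64 GB of RAM per high-end home computer

print(pod_memory_bytes / home_pc_bytes)  # 27656.25, i.e. roughly 27,657 machines
```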
They use a form of computing known as an artificial neural network. This is a “type of AI that learns to recognize patterns in data by figuring out relationships on its own, rather than being programmed with strict, explicit rules”. For a beautiful non-technical explanation of different types of neural network, see ‘What Is a Neural Network (For Non-technical People)?’ by Adman Steele.
Input Tokens
Chatbot applications and the underlying transformer models are merely mathematical stores of word associations operating with a limited vocabulary of numerical input tokens. These tokens (numbers) represent short words or parts of longer words. The token numbers are in themselves entirely arbitrary and so do not directly encode any useful information within the model, apart from serving in the conversion of input text and the generation of output. Current models typically have vocabularies in the region of 100,000 to 200,000 tokens. The tokenizer does not necessarily break longer words into parts that would be meaningful to an appropriately educated person: the tokens that form parts of larger words are merely numerically efficient clusters of letters that occur frequently in the language training set. The word ‘immunohistochemistry’ might be broken down by a human pathologist into ‘immuno’, ‘histo’ (as in histology) and ‘chemistry’. A chatbot’s tokenizer need not do the same, because its splits depend on whether these fragments correspond to frequent letter combinations in the training set. “Text tokenization is deterministic. The sentence ‘The cat sat on the mat’ will always tokenize the same way for a given model. This determinism allows the model to focus entirely on the relationships between tokens.” (Source: Google Gemini). Since chatbots use tokens (words and statistically relevant parts of words) while we use words or compound words, human text use appears to differ very significantly from that of AI applications, even at a conceptually abstract level. Put simply, even the fundamental way we handle language differs from that of chatbots.
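To make this concrete, here is a minimal sketch using tiktoken, an open-source tokenizer library (assumed installed via pip install tiktoken); the cl100k_base encoding is one used by several OpenAI models, and other chatbots will split words differently:

```python
# A minimal sketch showing that tokenization is deterministic and that long
# words are split into frequency-based letter clusters, not linguistically
# meaningful parts.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # a tokenizer used by several OpenAI models

for word in ["cat", "immunohistochemistry"]:
    ids = enc.encode(word)
    pieces = [enc.decode([i]) for i in ids]
    print(word, "->", ids, "->", pieces)

# Determinism: the same sentence always yields the same token IDs.
sentence = "The cat sat on the mat"
assert enc.encode(sentence) == enc.encode(sentence)
```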
Transformer Models at the Core of General Purpose Chatbots
Chatbots are based on the transformer model of artificial neural network described in a paper published by Google employees in 2017 entitled ‘Attention Is All You Need’. What these models do exceptionally well is capture the way that we use words, by encoding patterns of word association and word order found in human-generated text. The transformer models at the heart of Large Language Model-based chatbots use the input tokens to look up extremely long strings of numbers (usually called embedding vectors). The embedding vector for the word ‘cat’, for example, in the trained model, is an enormous string of numbers that makes no sense to a human. To visualize such a vector, click on the words “Show the raw vector of «cat» in model MOD_enwiki_upos_skipgram_300_2_2021:” at this link. You can now see why I said earlier that there is nothing to read in these models! This style of vector encoding is the only type of information held within the model. The vast number of vectors within the model represents information storage, but not knowledge as we conventionally think of it.
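The lookup step itself is simple to illustrate. The following minimal PyTorch sketch (the vocabulary size, vector dimension and token ID are all made-up illustrative values) shows how an arbitrary token number indexes a long vector of floats that is unreadable to a human:

```python
# A minimal sketch of the embedding lookup step: an arbitrary token ID
# indexes a long vector of floats. The numbers here are randomly
# initialised, purely to show the data structure.
import torch

vocab_size, dim = 100_000, 300          # illustrative sizes
embedding = torch.nn.Embedding(vocab_size, dim)

cat_token_id = torch.tensor([4_937])    # hypothetical ID for 'cat'
vector = embedding(cat_token_id)
print(vector.shape)                     # torch.Size([1, 300])
print(vector[0, :8])                    # first 8 of 300 numbers; unreadable to a human
```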

A nicely simplified schematic explanation of the transformer model architecture from the Cohere LLM University. See the associated, very clear article ‘What are transformer models?’
(Cohere is a company that provides secure AI models and services to businesses.)
The very long strings of numbers (embedding vectors) can be thought of as representing word associations in a high-dimensional geometric space. This is hard for humans to envisage, as we normally find it easy to think in only three dimensions. These long number strings are often said to represent ‘meaning’ or some kind of ‘semantic relationship’. As a pragmatist, I find this unacceptably anthropomorphic. The transformer model is capturing the way in which words are strung together in natural languages. When an astronomically large number of word-association examples are analysed during ‘training’ and then used at the later computational stage of system output, they are, from a human perspective, encoding information that was initially created by humans in text form.
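A toy example makes the geometric picture concrete. In the sketch below, the three short vectors are invented purely for illustration (real models use hundreds of learned dimensions); words used in similar contexts end up pointing in similar directions, which is measured by cosine similarity:

```python
# A toy sketch of 'word association as geometry': cosine similarity between
# made-up 4-dimensional vectors.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

cat   = np.array([0.8, 0.1, 0.3, 0.0])
dog   = np.array([0.7, 0.2, 0.4, 0.1])
quark = np.array([0.0, 0.9, 0.0, 0.8])

print(cosine(cat, dog))    # high: 'cat' and 'dog' occur in similar contexts
print(cosine(cat, quark))  # low: rarely used in the same way
```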
The multi-head transformer is deliberately designed to pick up the statistical autocorrelations in well-formed human language use. Consider the following sentence: ‘The words in this sentence only make sense because they have been used in a particular way, and in the case of English also because of the word order.’ For the final word ‘order’ to make sense, ‘word’ had to come before it, and so on. Such properties of word distributions are important within sentences and over much longer stretches of text. For example, one would expect the term ‘word order’ to appear in texts that deal with autocorrelation in language, and the frequency and distribution of those usages would be expected to differ from those in a book on the philosophy of science. It seems utterly remarkable that capturing patterns of language use in this way can result in the creation of a machine that predicts text sequences which have meaning for humans. This emergent characteristic is thought to become particularly successful as the model size, the amount of training text, and the amount of computation are increased (see this easy-to-read source or a research paper by Google DeepMind).
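For readers who want to see the core calculation, here is a minimal sketch of scaled dot-product attention, the operation at the heart of the multi-head transformer; the shapes and values are toy examples, not a real model:

```python
# A minimal sketch of scaled dot-product attention: each token's output is a
# weighted mix of all the value vectors, with weights derived from how
# strongly its query matches each key.
import numpy as np

def attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # how strongly each position attends to the others
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over positions
    return weights @ V                                 # weighted mix of value vectors

rng = np.random.default_rng(0)
seq_len, d = 5, 8                                      # 5 tokens, 8-dimensional vectors
Q = rng.standard_normal((seq_len, d))
K = rng.standard_normal((seq_len, d))
V = rng.standard_normal((seq_len, d))
print(attention(Q, K, V).shape)                        # (5, 8): one updated vector per token
```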
The way that ‘learning’ is achieved during the training stage is by the ‘backpropagation’ of error at the point of output. System-wide backpropagation is an arithmetically enormous process that does not correspond to any biological feature of the nervous system that we know about. During training, the system starts with random values (weights) in the vectors and recalculates the contribution of every weight in the network so as to minimise next-word prediction errors in the output. When that process is repeated billions or trillions of times, the network is trained. Each network stores the hundreds of billions of weights needed to produce output on all of the subjects encountered in publicly available text. Grant Sanderson has estimated that if you had a computer that could perform only 1 billion arithmetic operations per second, it would take more than 100 million years to calculate a large language model. It seems extremely unlikely that the biological evolution of multicellular animals over the past 500 million years has created a system that starts with random gibberish as output. It seems equally improbable that we then retune the entire language capability of the brain for every error encountered. A large language model is trained on a corpus of human text containing trillions of tokens; by comparison, children learning a language perhaps encounter a few tens of millions of word instances.
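The flavour of this training loop can be shown with a deliberately trivial sketch: a single weight standing in for hundreds of billions, starting from a random value and being nudged repeatedly against the gradient of its prediction error:

```python
# A toy sketch of the training-loop idea: start from a random weight, compute
# the prediction error, and nudge the weight against the error gradient.
import random

weight = random.uniform(-1.0, 1.0)   # training starts from random values
target = 0.75                        # the 'correct' output for this toy example
learning_rate = 0.1

for step in range(100):
    prediction = weight              # a trivially simple 'model'
    error = prediction - target
    gradient = 2 * error             # derivative of the squared error w.r.t. the weight
    weight -= learning_rate * gradient

print(round(weight, 4))              # close to 0.75 after repeated updates
```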
We should not attempt to trivialise the tremendous achievement, and the almost indescribable number of calculations, that have gone into creating systems capable of mimicking the informational content of human texts. The output is neither magic nor the outpourings of an oracle. Any meaning that the system seems to impart is one that we humans apply to the output. Nevertheless, the facility we now have to use novel text streams generated by a machine is a very considerable computational achievement.
These models carry out an enormous number of calculations known as matrix multiplications. (If you wish to know about simple matrix multiplication, see a very basic explanatory video.)
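For the curious, here is the operation itself in a few lines; each output number is the sum of element-wise products of one row and one column:

```python
# A minimal example of matrix multiplication, the workhorse calculation of
# these models.
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[5.0, 6.0],
              [7.0, 8.0]])

print(A @ B)
# [[1*5 + 2*7, 1*6 + 2*8],    -> [[19. 22.]
#  [3*5 + 4*7, 3*6 + 4*8]]       [43. 50.]]
```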
Optimising the Output from In-Depth Inquiries
For a modest subscription fee, we have now entered the era of publicly available, but very much slower, Retrieval-Augmented Generation (RAG). RAG might involve you adding relevant documents, or an automated and extensive internet search forming part of the preliminary ‘thinking’. At a technical level, a well-designed RAG system changes the probability of which text strings will be generated in the output, but it does not abolish errors.
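A deliberately naive sketch may help show the RAG idea. The keyword-overlap retriever and the call_llm stand-in below are illustrative assumptions on my part; real systems use vector similarity search and a genuine model API:

```python
# A naive RAG sketch: retrieve the passages most relevant to a query and
# prepend them to the prompt, shifting the probabilities of the generated text.
def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real chat-model API call.
    return f"[model output conditioned on a {len(prompt)}-character prompt]"

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    # Rank documents by crude keyword overlap with the query.
    query_words = set(query.lower().split())
    ranked = sorted(documents,
                    key=lambda d: len(query_words & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def rag_answer(query: str, documents: list[str]) -> str:
    context = "\n".join(retrieve(query, documents))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return call_llm(prompt)  # retrieval changes output probabilities; it does not abolish errors

docs = ["Ironwood chips use high-bandwidth memory.", "The cat sat on the mat."]
print(rag_answer("What memory do Ironwood chips use?", docs))
```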
When the Google Gemini bot user options are set to ‘thinking’, the preliminary text output gives the impression that the bot is using a more sophisticated technique called Chain-of-RAG. [See this example created for me, and compare it with the final output.] In this type of bot architecture, the initial query can be split into sub-queries, and the initial response appears to iteratively influence the future steps of the generation process. With present-day general-purpose transformer models, Chain-of-RAG should probably be used for all professional purposes, unless you are operating in a narrow domain that has had all the needed pre- and post-training. RAG is essentially a productive, although not infallible, ‘workaround’ for the tendency of current chatbots to make both blatant and subtle errors.
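The iterative structure can be sketched schematically. The sketch below reuses the retrieve() and call_llm() functions from the previous sketch; the two stub functions stand in for steps that a real Chain-of-RAG system would delegate to the model, so this is an assumption about the general architecture, not a description of Gemini’s internals:

```python
# A schematic Chain-of-RAG sketch: the query is split into sub-queries, and
# each retrieval step is conditioned on what has been found so far.
def plan_next_subquery(query: str, notes: str) -> str:
    return query  # stub: a real system would ask the model for a narrower sub-query

def summarise(sub_query: str, passages: list[str]) -> str:
    return " ".join(passages) + "\n"  # stub: a real system would condense with the model

def chain_of_rag(query: str, documents: list[str], steps: int = 3) -> str:
    notes = ""
    for _ in range(steps):
        sub_query = plan_next_subquery(query, notes)  # each step sees the earlier findings
        passages = retrieve(sub_query, documents)
        notes += summarise(sub_query, passages)
    return call_llm(f"Answer {query!r} using these notes:\n{notes}")
```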
There are also more sophisticated systems that are said to be agentic. Examples of agentic workflow are the ‘Deep Research’ options in Google Gemini and ChatGPT (source: Google Gemini). These systems act as AI agents and are more flexible than Chain-of-RAG, as they can run inquiries autonomously and invoke tools through an Application Programming Interface (API). When an AI agent cannot determine a result, it can pause natural-language processing and instead generate strictly formatted code that communicates through an interface with an external tool, such as a web search or a programming interface. In this way, the system as a whole can generate and execute computer code and retrieve the results produced by the external computational tool. When the results of the function call are received, they are incorporated into the natural-language output of the chatbot. The ability to tackle unstructured problems and potentially to self-correct is a further advantage of these automated systems.
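A minimal sketch of the tool-calling loop may clarify the mechanism. The JSON format, tool registry and web_search function below are all invented for illustration; real agent frameworks use their own schemas:

```python
# A minimal tool-calling sketch: the model pauses text generation and emits a
# strictly formatted request, the system executes the named tool, and the
# result is fed back into the next generation step.
import json

def web_search(query: str) -> str:
    return f"[top results for {query!r}]"   # stand-in for a real search API

TOOLS = {"web_search": web_search}

# Imagine the model emitting this instead of natural-language text:
model_output = '{"tool": "web_search", "arguments": {"query": "Ironwood TPU FLOPS"}}'

request = json.loads(model_output)
result = TOOLS[request["tool"]](**request["arguments"])
print(result)  # this result would be incorporated into the chatbot's next output
```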
A Philosophical Question
At least one important philosophical question arises concerning the functioning of chatbots. The AI companies are quite literally selling these agentic models as ‘reasoning’; however, is that reasoning in a human sense? Google Gemini, basing its response on human text, has produced the following response to that question:
“Without conscious understanding and intentionality, the AI is not reasoning; it is merely simulating reasoning … In computer science, “agentic reasoning” is a legitimate and accepted technical term used to describe a specific autonomous, iterative computational architecture. However, in philosophy, it is generally viewed as simulated reasoning.”
Unless we subscribe to a doctrine of functionalism (in the philosophy of mind), it seems wise to accept that reasoning requires conscious understanding, or its formal logical equivalent in computation, which agentic deep-research chatbots do not possess. Although ‘reasoning’ might be an acceptable term of art for computer scientists, we should dismiss its use in the same way as we reject the words ‘confabulation’ and ‘hallucination’, as I have argued previously.
At some point in the not-so-distant future, that view will likely have to be revised, when chatbots are re-engineered to introduce the formal logic used in ‘good old-fashioned AI’ systems. There are already new technical developments in AI that hold out the prospect of continual learning, which is missing from present-day chatbots, and improved learning might bring improved performance. Our views of chatbots will likely have to change.
Version 1.5
Easy Further Reading and Resources
1. Large language models, explained with a minimum of maths and jargon, by Timothy B. Lee and Sean Trott at https://www.understandingai.org/p/large-language-models-explained-with
2. An intuitive overview of the transformer architecture by Roberto Infante
https://medium.com/@roberto.g.infante/an-intuitive-overview-of-the-transformer-architecture-6a88ccc88171
3. See the 9 modules of Cohere LLM University.
https://cohere.com/llmu
4. ‘This is not the AI we were promised’, a Royal Society Lecture by Professor Michael John Wooldridge
https://www.youtube.com/live/CyyL0yDhr7I?si=Xgi7upJpt3A0Y39G&t=560
More Advanced Resources
5. There is a really excellent course of graphical videos about neural networks on Grant Sanderson’s 3Blue1Brown YouTube channel
6. Try typing in your own original prompt, changing the attention heads, and trying different model characteristics in this very impressive graphic simulation of an LLM at https://poloclub.github.io/transformer-explainer/
7. RAG vs Fine-Tuning vs Prompt Engineering: Optimizing AI Models
8. A helpful video about agentic AI, if you ignore the anthropomorphic language