AI: 3 Questions

  1. How much thinking can you outsource to AI?
  2. How does your mind retain mastery of AI, instead of letting AI get mastery of your mind?
  3. How do you keep up with all the latest tools and developments in AI?

Understanding Gen AI

The algorithm by which Generative AI programs like ChatGPT generate text is difficult for anyone to understand in detail. If you really want to understand it, expect a week or two of dedicated effort. GPT, the heart of ChatGPT, stands for Generative Pre-trained Transformer. Here are the best sources I have found on the Internet for understanding it in detail:

  • Andrej Karpathy’s nanochat, the source code for training and running a small version of ChatGPT.
  • A visual simulation and rendering of the Transformer model, from Georgia Tech.
  • The original paper that started it all, Attention Is All You Need, from Google.

A hand-waving explanation of the GPT algorithm follows. I believe it will be simpler for you than the above sources, if you’re interested and willing to spend a few minutes on it:

The goal of GPT is to predict the next word in a sequence of words, like “AI technology is a subfield of ….”, and then the next word after that, and so on, which in this case might be “computer” and then “science”. This simple rule results in the fantastic text generation capabilities of programs like ChatGPT, Claude, and Google Gemini.

For production Large Language Models (the programs that use GPTs to generate text), this is done by feeding as much text as possible, ideally all the text on the Internet, into a program that acts as a black box: it observes and learns how the partial sentences in that text are completed by the words that follow them.
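As a rough illustration of "learn from text, then predict the next word, then the next", here is a toy sketch in Python. It is not a Transformer: it simply counts which word follows which in a tiny made-up corpus and always picks the most frequent follower. The function names and the corpus are assumptions made for this example.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for every word, which words follow it in the training text."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def generate(follows, prompt, max_new_words=5):
    """Repeatedly append the most frequent follower of the last word."""
    words = prompt.lower().split()
    for _ in range(max_new_words):
        candidates = follows.get(words[-1])
        if not candidates:
            break  # the last word was never seen with a follower
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)

# Made-up training text (an assumption for this example).
corpus = (
    "ai technology is a subfield of computer science . "
    "machine learning is a subfield of computer science ."
)
model = train_bigram(corpus)
completion = generate(model, "AI technology is a subfield of", max_new_words=2)
print(completion)
```

A real GPT looks at the whole preceding passage rather than only the last word, which is what the rest of this explanation is about.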

Now, these programs are not designed to act on text directly. The text is converted into numbers, the numbers are run through the core of the algorithm, and at the end the numbers are converted back into text. A word or fragment of a word is represented by a unique number from a dictionary of all possible words or word fragments. This number is called a “token”. Each token is then mapped to a fixed-length array of numbers. Such arrays are called “vectors” in the language of mathematics. There are two types of these vectors in Gen AI. The first represents the “meaning” of the word or word fragment behind a token (word fragments don’t have meaning to us, but this is how a Large Language Model reads meaning into its inputs). The second encodes the position of a token in a passage. The first type of vector is called a “token embedding”, and the second type is called a “positional embedding”.
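The text-to-token-to-vector pipeline can be sketched in a few lines of Python. Everything here is a toy assumption: the six-entry vocabulary, the vector length of 4, and the random numbers standing in for values that training would normally learn. Adding the token embedding to the positional embedding is how GPT-style models typically combine the two.

```python
import random

# Hypothetical toy vocabulary; real models use tens of thousands of entries.
vocab = {"ai": 0, "technology": 1, "is": 2, "a": 3, "sub": 4, "field": 5}
EMBED_DIM = 4      # length of every vector (assumed; real models use far more)
MAX_POSITIONS = 8  # longest passage this toy can handle

rng = random.Random(0)  # seeded so the run is repeatable

def random_table(rows, cols):
    """A table of small random numbers; training would adjust these values."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

token_embeddings = random_table(len(vocab), EMBED_DIM)
position_embeddings = random_table(MAX_POSITIONS, EMBED_DIM)

def encode(pieces):
    """Words or word fragments -> token numbers."""
    return [vocab[p] for p in pieces]

def embed(token_ids):
    """Add each token's embedding to the embedding of its position."""
    return [
        [t + p for t, p in zip(token_embeddings[tok], position_embeddings[pos])]
        for pos, tok in enumerate(token_ids)
    ]

token_ids = encode(["ai", "is", "a", "sub", "field"])
vectors = embed(token_ids)
print(token_ids)                      # one number per word or fragment
print(len(vectors), len(vectors[0]))  # one vector per token, EMBED_DIM numbers each
```

Because the position is mixed in, the same token appearing at two different places in a passage ends up with two different vectors.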

These embeddings are then run through a sequence of structurally identical “Transformer Blocks”. A Transformer is a sequence of Transformer Blocks. A Transformer Block consists of two components: a complicated component, which I’ll explain first, and a simpler component.

The complicated component is called the “multi-headed self attention” component. The “self attention” aspect of this deals with the fact that for the algorithm to understand a sentence completely, the relationship of every token in the sentence to every other token must be represented and learnt. This is done by creating three vectors for every token: a vector that represents what the given token is looking for from the other tokens in the sentence, a vector that represents what the given token contains, and a vector that represents what the given token can share with other tokens. The first vector is called a “query”, the second a “key”, and the third a “value”. Thus the relationship of every token (word or word fragment) in a sentence to every other token is captured. Because there can be multiple contexts in which tokens relate to other tokens, for example verb-object relations, pronoun-noun relations, and so on, the self attention is computed multiple times. Each computation is called a head, hence multi-headed self attention. The heads can run in parallel because each self attention head is independent of the others, which is why GPUs (Graphics Processing Units, such as those from Nvidia), which are basically parallel processing chips for doing computations on numerical arrays, are so effective with them and are in such high demand. This entire multi-headed self attention algorithm aims to capture all relevant relationships between all the words in a sentence, and judging by the success of ChatGPT, it is pretty good at doing so.
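The query, key, and value idea can be sketched as a small Python example of the standard scaled dot-product attention. The weight matrices below are hand-picked toy values (an assumption; in a real model they are learnt), and two details of GPT-style models are left out for brevity: the masking that stops a token from looking at later tokens, and the final projection applied after the heads are joined.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(scores):
    """Turn a list of scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_head(x, w_query, w_key, w_value):
    """One self attention head over token vectors x (one row per token)."""
    queries = matmul(x, w_query)  # what each token is looking for
    keys = matmul(x, w_key)       # what each token contains
    values = matmul(x, w_value)   # what each token can share
    scale = math.sqrt(len(keys[0]))
    output, all_weights = [], []
    for q in queries:
        # Score this token's query against every token's key.
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        # Blend the values, weighted by how well each key matched.
        mixed = [sum(w * v[i] for w, v in zip(weights, values))
                 for i in range(len(values[0]))]
        output.append(mixed)
        all_weights.append(weights)
    return output, all_weights

def multi_head(x, heads):
    """Run independent heads and join their outputs side by side."""
    outputs = [attention_head(x, *h)[0] for h in heads]
    return [sum((out[i] for out in outputs), []) for i in range(len(x))]

# Three toy token vectors and two toy heads (hand-picked, not learnt).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
swap = [[0.0, 1.0], [1.0, 0.0]]
heads = [(identity, identity, identity), (swap, identity, identity)]

mixed = multi_head(x, heads)
_, weights = attention_head(x, identity, identity, identity)
print(weights[0])  # how strongly the first token attends to each token
```

Since no head reads another head's result, the loop inside `multi_head` is exactly the kind of work that can be spread across parallel hardware.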

The simpler component following the multi-headed self attention component is just an ordinary neural network, a very simple digital version of an animal’s network of neurons. It expands its input vectors internally, transforms them in a “non-linear” way (meaning that if the input is doubled, the output is not simply doubled but can become something quite different, which is how animals recognize patterns), compresses the expanded, transformed vector back to the original size, and passes it on to the next Transformer Block.
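The expand, transform, compress sequence can be sketched as follows. The weights are hand-picked toy values (an assumption; real ones are learnt and come with bias terms, omitted here), and the non-linear step uses the tanh approximation of GELU, a function commonly used in GPT-style models.

```python
import math

def gelu(x):
    """Tanh approximation of the GELU non-linearity."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

def feed_forward(vector, w_expand, w_compress):
    """Expand the vector, apply the non-linearity, compress back to the original size."""
    hidden = [sum(v * w for v, w in zip(vector, col)) for col in zip(*w_expand)]
    hidden = [gelu(h) for h in hidden]
    return [sum(h * w for h, w in zip(hidden, col)) for col in zip(*w_compress)]

# Toy weights: 2 numbers in, 4 in the middle, 2 out.
w_expand = [[1.0, -1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, -1.0]]
w_compress = [[1.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0],
              [0.0, 1.0]]

once = feed_forward([1.0, -2.0], w_expand, w_compress)
twice = feed_forward([2.0, -4.0], w_expand, w_compress)
print(once)
print(twice)  # not simply double the first result
```

Real models typically expand to several times the input size in the middle; the factor of two here is only to keep the example small.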

For generating text, at the end of the whole set of Transformer Blocks there is a layer of units which calculates the probability of each word or word fragment in the dictionary being the next one in the given sequence. Either the one with the highest probability is chosen, or one of the higher-probability ones is picked at random, depending on the setting.
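This last step can be sketched in Python as well. The three candidate words and their scores are made up for the example, and the setting is shown here as a `greedy` flag plus a `temperature` value, which are the names I am assuming for it; higher temperature flattens the probabilities and makes unlikely words more likely to be picked.

```python
import math
import random

def softmax(scores, temperature=1.0):
    """Convert raw scores into probabilities that sum to 1."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def pick_next_word(words, scores, greedy=True, temperature=1.0, rng=None):
    """Pick the most probable word, or sample one according to the probabilities."""
    probs = softmax(scores, temperature)
    if greedy:
        return words[probs.index(max(probs))]
    rng = rng or random.Random()
    return rng.choices(words, weights=probs, k=1)[0]

# Made-up candidates and scores for "AI technology is a subfield of ...".
words = ["computer", "science", "banana"]
scores = [3.0, 1.0, -2.0]

probs = softmax(scores)
greedy_choice = pick_next_word(words, scores)
sampled_choice = pick_next_word(words, scores, greedy=False,
                                temperature=0.8, rng=random.Random(0))
print(probs)
print(greedy_choice, sampled_choice)
```

Sampling instead of always taking the top word is one reason the same prompt can produce different answers on different runs.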

That is how ChatGPT works.

The relevance of this for using, or prompting, ChatGPT is that while ChatGPT will always give an answer to any prompt, the precision and relevance of the answer will vary with the sequence and content of the words that constitute your prompt. To maximize your chances of getting a good answer, you should include all the content which is relevant to your question (prompts are sometimes pages and pages long), while at the same time pursuing the somewhat contradictory goal of keeping your prompt concise, in the sense of leaving out anything which is not relevant. This follows from the core of the Transformer mechanism described above, the multi-headed self attention mechanism: every word you type in your prompt matters, since every word in the prompt attends to every other word in that prompt. All this applies if you’re seeking goal oriented answers. If your prompt is “Write a poem about love”, and you’re just exploring ChatGPT or your favourite Gen AI program for the sake of it, there’s nothing wrong with that either.

There are some accepted best practices for prompting, such as meta-prompting, few-shot, and chain-of-thought, which, frankly, I am not going to talk about just now, since I haven’t done a scientific evaluation of them, in the sense of testing the “quality” of the output with and without each practice across a sufficiently exhaustive set of cases. The easiest best practice I use is asking my LLM to generate a prompt for itself, based on a sufficiently detailed description of what I want, and then modifying it. This, again, is for serious uses of LLMs, such as coding, making business decisions, and getting medical responses.

Finally, hallucinations, where LLMs make up facts and state them with confidence, are a known and documented problem. A less talked-about problem is that LLMs by default respond with the tone and authority of a college textbook or a scientific paper, even when the results are not 100% on target without being outright hallucinations. It is increasingly difficult to contradict a response from an LLM, but maintaining critical human judgement is vital. At a good college, they teach you how to critique the works of the greatest masters, and that skill is very important today.

Air Pollution in New Delhi

New Delhi is frequently ranked among the most polluted cities in the world. It is unclear whether any of the measures being undertaken by the government to tackle the sources of air pollution in the city in winter, when the pollution is at its peak, have much effect. Surprisingly, there are no home grown companies that make innovative, effective, cheap, popular air pollution masks for the general population, despite the city being home to the famous Indian Institute of Technology, Delhi.

This is a problem in the “design” space. We need a better designed anti-pollution mask.

The problem with the classical N95 mask is that it often traps the air you breathe out, which contains carbon dioxide, between the mask and the nose and mouth, and this gives one a headache after some time. It also covers the whole lower face, which makes it unstylish, so people don’t want to use it. And the overall texture of the current generation of masks is not very becoming. As of this writing, there are some transparent face masks on the market, but these are unavailable on Amazon. There is an active air pollution mask (one with a fan that pumps air into the mask) from Prana Air (https://www.pranaair.com/in/air-mask/), but it has poor reviews on Amazon for fit, comfort, and general quality, and is also expensive. Still, it’s a step in the right direction. One problem with these masks is that there is no clear guidance on when to discard them or replace the air filters. Ideally, the color of the mask should change when it’s time to discard it. Also, they don’t come with a “carry case”, which one needs because one generally wears them outdoors, takes them off on coming indoors, and then wants to wear them again on going back out.

We need a new generation of masks, with the corresponding fundamental research and development to support it. Given the scale of the problem, the investment is worth the effort. One solution is a “respirator”: a silicone, ideally transparent mask with one or a pair of small, replaceable, inexpensive filters that one is actually likely to replace every day. Respirators are available, but not readily for the general Indian market. The filters should be readily available on the market. A high quality, well designed product, combined with sophisticated marketing (not “how to sell it”, but “what would make people likely to buy it”), could turn this into a blockbuster product.

Update: A few months after writing the original post, I have in fact discovered a home grown solution to the problem from, appropriately, an IIT Delhi spin off, Nanoclean. The product was developed in collaboration with IIT Delhi. It’s an innovative, patented product: an air pollution filter which one applies discreetly on one’s nostrils as a “sticker”. It is an effective candidate solution, though it can be improved upon in terms of ease of breathing and the company’s overall quality control.

Human Machine Integration and the Meta AI Glasses

Meta’s new Ray-Ban AI glasses are the next step in Human Machine Integration. I’ve read about them, and am planning to try them out once they become cheaper and better. The most telling fact about them is that they have a full color display right in front of your right eye. I am a techno-enthusiast, not a Luddite, but all of us are more than aware of how technologies such as cellphones, which governments and businesses now assume everyone carries, are programmed to be addictive: using them produces pleasure inducing dopamine bursts in the brain, so that it becomes nearly impossible to put them down. Social media companies like Meta have invested heavily in this kind of programming. As these technologies penetrate more deeply into our minds and bodies, the only solution to their dehumanizing side effects is competing and more compelling technologies which become our servants, and not the other way around.

The Age of Spiritual Machines

Ray Kurzweil, in his book The Age of Spiritual Machines, first published in 1999, made the astonishing prediction that by approximately the year 2020, computers would exceed human intelligence. Fast forward to 2025, and we have GPT-5 powering ChatGPT. While OpenAI does not call it Artificial General Intelligence, I find it close. It gives plausible answers to any question you ask it, on any subject under the Sun, and a large majority of its answers are quite, or completely, accurate. Kurzweil is a technologist and futurist, and what is more interesting about his book is the other predictions he makes. He talks about the possible properties of aliens from another planet visiting our Earth; nanobots, which are tiny, self replicating, nanotechnology based robots; human machine integration; the end of mortality; and so on. All this before the end of the 21st century. I haven’t yet gotten around to reading his 2005 book, The Singularity Is Near: When Humans Transcend Biology, but I’m looking forward to it.

My Current Favorite Unsolved Problem in Computer Science

My current favorite unsolved problem in Computer Science is … Search. Google has not solved search. OpenAI is getting better. There are billions of pages on the web, and I need a way for the web to answer the search query: “Show me all the pages that I might be interested in seeing”. Nicholas Negroponte, founder of the Media Lab at MIT, envisioned a digital newspaper called “The Daily Me”, which would be an expert on me, and every day show me all the newspaper articles that I personally might be interested in reading. Negroponte was speaking in the 1990s on the basis of his insight into what computers should, ideally, be able to do, but Machine Learning based Artificial Intelligence technology actually makes this possible, and indeed there are currently electronic newspapers available along the lines of The Daily Me. I need a similar service for the web. Right now, searching for subtle web pages is a painful process, and I need a search engine that can answer my query: “Show me all the pages that I might be interested in seeing, given everything you know about me so far.”