Ingredients of True Artificial General Intelligence
The invention of computers is one of the most significant advances of humanity. These silicon-based systems are usually referred to as “artificial” by the carbon-based biological systems (humans) that invented them. Could it be that the biological systems are just a stepping stone for the evolution of the artificial systems they have generated? This question has a prophetic undertone due to the use of the word “just”. Nevertheless, in our universe, these silicon systems do require sufficient evolution of the biological system to come into existence. In this sense, humans are the first required ingredient in creating a true general intelligence.
They are a necessary ingredient, but not sufficient, as we will see. It is a relief, however, that in our universe a property of the laws of physics permits the construction of such entities1. Simply stated, this property allows the construction of entities that can emulate anything existing in our world (including our brains). Although this property allows artificial intelligence to exist, it does not make the actual engineering task any easier.
Ever since the invention of computers, a vast amount of research has gone into making advances in this area. Despite all the advances, the current state of affairs is limited to dependence on data, i.e., observations. The most common approach is to generate a mapping between the data and some conclusion about the data (i.e., a label like cat/dog). One of the most successful algorithms, backpropagation, uses concepts from linear algebra and optimization theory. Using this algorithm, a mapping is “learned” by propagating the error between the current estimate of the label and the actual label backward through the network. This algorithm is data-hungry: accurate predictions on unseen data require vastly more training data. Another significant invention in this area in the 1980s was the use of hierarchical latent representations between data and labels2. Although it is possible to generate the mapping without hierarchy, depth allows for more computationally efficient, parsimonious representations.
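To make the idea concrete, here is a minimal sketch of backpropagation on a toy problem (learning XOR with one hidden layer). It is illustrative only; the network size, learning rate, and iteration count are arbitrary choices, not a reference implementation.

```python
import numpy as np

# Toy mapping: XOR, a classic problem that needs a hidden (latent) layer.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 4))  # input -> hidden weights
W2 = rng.normal(size=(4, 1))  # hidden -> output weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for step in range(10_000):
    # Forward pass: compute the current estimate of the label.
    h = sigmoid(X @ W1)      # hidden (latent) representation
    y_hat = sigmoid(h @ W2)  # predicted label

    # Backward pass: propagate the error from the output toward the input.
    delta_out = (y_hat - y) * y_hat * (1 - y_hat)  # error at the output layer
    delta_hid = (delta_out @ W2.T) * h * (1 - h)   # error pushed back to the hidden layer

    # Gradient step on each weight matrix.
    W2 -= lr * h.T @ delta_out
    W1 -= lr * X.T @ delta_hid

print(np.round(y_hat, 2))  # predictions move toward [[0], [1], [1], [0]] as training proceeds
```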
In 2012, armed with huge datasets, backpropagation, deep networks, and highly parallelizable computers, the first successful demonstration of the superiority of these networks went on display. Hinton, Krizhevsky, and Sutskever beat traditional computer vision algorithms in a well-known image classification competition3. Up until this time, the classical computer vision algorithms ruled, but the deep network - AlexNet - beat the competition by more than 10%. The field of deep learning was thus born. The field has since exploded in the impact it has had on our everyday lives. Google’s auto-correct models, Amazon Alexa’s conversational agents, and DeepMind’s AlphaFold can all be linked back to the rebirth of this field in 2012.
Deductive Reasoning
All this progress is exciting. There is, however, a significant flaw in this approach. The mapping technique that we saw above - however deep - is inductive. It attempts to generalize concepts from observations. In a very restrictive and noise-free environment, this approach works very well, as in the case of Alexa and other conversational agents. Veer slightly off the script, and these systems begin to show signs of brittleness. The approach can even give the illusion of ‘true intelligence’, as seen in recent news of engineers believing the AI systems they created to be sentient. Philosopher Karl Popper has this to say about the construction of such a machine:
In constructing an inductive machine we, the architects of the machine, must decide what constitutes its ‘world’, what things are to be its individual events; what constitutes a property, or a relation; or in other words, what constitutes a repetition. And it is we who must decide what kind of question we wish the machine to answer.
But this means that all the more important and difficult questions were already solved by us when we constructed the ‘world’, and the machine.
- Karl Popper, Realism and the Aim of Science, chapter on “Inductive Machine”
The real world, however, is not noise-free; it requires thinking entities that can adapt as necessary. Learning from large swaths of data restricts models to the patterns that appear in that data. It does not leave any room for them to explore and course-correct. One approach currently in vogue is “scaling up” the deep architectures, as seen in recent publications on so-called “Large Language Models” or LLMs. However large these models get, they are still inductively transforming the observation space into the conclusion space.
This is not how general human intelligence works. We do not solve problems by just observing things around us and constructing theories from those observations. We first guess a theory that solves a problem, then we observe things around us to validate or reject the theory in our mind. If the observations do not corroborate the theory, we modify it.
For example, a child learning to ride a bicycle has seen a bicycle before and has seen other people riding one. They implicitly have a model of how the bicycle is to be ridden (a very wrong one at first). Once they get on the bicycle, the data they gather (from the sensory experience of falling, balancing, etc.) is used to correct their model, up to the point where they become proficient at the task. This is deductive learning, and this is how we approach our problems. For our artificial counterparts to have a true general ability to navigate the world, the way we approach building these models needs to be flipped, as sketched below.
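One way to caricature this flipped loop in code: start from a guessed model, and use observations only to refute or keep it. Everything here (the hidden rule, the space of linear conjectures) is a made-up toy, not a serious proposal, but it shows the direction of the flow: conjecture first, data second.

```python
import random

def world(x):
    """The hidden truth the agent is trying to capture (unknown to it)."""
    return 3 * x + 1

def propose_conjecture():
    """'Creatively' guess a candidate theory - here just a random linear rule."""
    a, b = random.randint(-5, 5), random.randint(-5, 5)
    return (lambda x: a * x + b), (a, b)

random.seed(42)
theory, params = propose_conjecture()
for trial in range(10_000):
    x = random.randint(-10, 10)  # run an "experiment"
    if theory(x) != world(x):    # the observation contradicts the conjecture
        theory, params = propose_conjecture()  # reject it and guess again

print(f"surviving conjecture: y = {params[0]}x + {params[1]}")
```

The conjecture that survives is simply the one the data failed to refute - the opposite of averaging the data into a model.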
So why is it difficult to build models of this deductive sort and code them up as algorithms? Before attempting to answer that, it is worth noting how entrenched and prevalent inductive thinking is, which contributes to the difficulty one faces in building a deductive model. A great example of this entrenchment comes from the audience’s reaction in one of Richard Feynman’s video lectures on “how we discover new laws of physics”4.
Here is how the dialogue goes:
Richard Feynman: Now I am going to discuss how we look for a new law. In general, we look for a new law by the following process. First we guess it.
Audience: Laughs (as if this is a joke!)
Richard Feynman: Then we… Well, don’t laugh, that’s really true! - Then we compute the consequences of the guess to see if this guess is right […] then we compare the computation with nature - to an experiment or an experience. If the computation disagrees with the experiment, then the guess is wrong. In that simple statement is the key to science.
This is also true for how humans obtain knowledge, but inductive thinking puts data or observations ahead of any hypothesis or guessing.
Creativity
Returning to what is actually done in the AI field today: building models based on data/observations. These models work well in very narrow areas, in the domains in which the data is gathered. As soon as the model is asked about some data that it has not seen before, it predicts only within the range of data it has seen. There is no scope for guessing - i.e., looking outside of what is given to the model. This is represented well graphically in one of Demis Hassabis’ talks5 on this topic.
Most current AI systems perform interpolation (the orange dot in Hassabis’ graphic) by averaging the training data (green dots). Some of them, like prediction models, do extrapolation (blue dots) using the data (green dots). What none of the current models do is “think out of the box”. That will only happen by taking guesses (yellow dots) and correcting the mistakes in those guesses based on actual observations. Unfortunately, Hassabis calls all three of these types of learning “creative”. The first two approaches (interpolation and extrapolation) are far from real creativity, as they output results based on already-seen patterns. The invention approach is what the human mind actually does, and it is the key ingredient in designing true AGIs.
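The interpolation/extrapolation distinction is easy to demonstrate. In this illustrative sketch (the sine “world” and the degree-5 polynomial model are arbitrary choices), a model fit to observations from one region is reasonable inside that region and wildly wrong far outside it - and nothing in the fitting procedure can guess its way out:

```python
import numpy as np

# The "world" the model never truly knows; it only sees samples from [0, 2*pi].
rng = np.random.default_rng(0)
x_train = rng.uniform(0, 2 * np.pi, 50)  # observations (the green dots)
y_train = np.sin(x_train)

# Fit a degree-5 polynomial to the observed region.
coeffs = np.polyfit(x_train, y_train, 5)

x_inside, x_outside = 3.0, 12.0
print(np.polyval(coeffs, x_inside), np.sin(x_inside))    # interpolation: close
print(np.polyval(coeffs, x_outside), np.sin(x_outside))  # extrapolation: wildly off
```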
Designing an algorithm that flips the mindset from induction to deduction at first appears simple. Just guess a theory and use observations to see if the theory holds, right? Well, not quite. The so-called top-down Bayesian learning purports to be a savior here. But if we look closely, this approach fails as well. With Bayesian learning, what we get is a probabilistic estimate of one hypothesis over another, but what about a hypothesis that has not been guessed yet? The initial guess is not random, but one that is creatively arrived at. And there is the rub: we have not yet completely understood how to program creativity, even though we use creativity ourselves to solve our problems every day! This, however, should not stop us from making progress in this domain. As Douglas Hofstadter has said:
“It is a common notion that randomness is an indispensable ingredient of creative acts. This may be true, but it does not have any bearing on the mechanizability—or rather, programmability!—of creativity”
- Douglas Hofstadter, “Creativity and Randomness”
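The Bayesian limitation described above can be shown in a few lines. In this made-up example, the prior is spread over three guessed coin biases while the data comes from a bias that was never guessed; updating can only redistribute belief among the hypotheses already on the table:

```python
import random

# Prior over the biases we happened to guess; the true bias (0.9) is not among them.
hypotheses = {0.3: 1 / 3, 0.5: 1 / 3, 0.7: 1 / 3}

random.seed(1)
for _ in range(1000):
    heads = random.random() < 0.9  # data generated by the un-guessed truth
    # Bayes rule: weight each hypothesis by how well it explains the flip...
    for h in hypotheses:
        hypotheses[h] *= h if heads else 1 - h
    # ...then renormalize so the beliefs sum to one.
    total = sum(hypotheses.values())
    hypotheses = {h: p / total for h, p in hypotheses.items()}

print(hypotheses)  # nearly all mass collapses onto 0.7, the best available guess,
                   # but the posterior can never contain the true 0.9
```

No amount of data adds 0.9 to the hypothesis set; that step - proposing the hypothesis in the first place - is the creative one.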
Creating Good Explanations
The next ingredient to bridge the gap is AI’s ability to explain its decisions to humans.
Consider a screenshot from the popular diner scene in the movie Swingers. Just by looking at the picture, what can we guess is happening in the scene?6
For example, we can say, “The person pointing is telling the waitress who ordered the pancakes, and to put that plate in front of him.”
Now, how are we as humans able to make these conjectures so well?
Well, for one thing, due to our prior knowledge about the context. Things like:
- this is a scene in a diner,
- these are three friends having breakfast together.
All this requires humans to build up conjectures and test them with further data (the next sequence of frames would clarify which of the above is true).
We can analyze this scene further and come up with more conjectures and explanations:
Why is the person on the side smiling? Probably because the pointing person is making a joke as he points - perhaps about how it could be the person he is pointing to who must have ordered the pancakes, since he is holding the bottle of maple syrup.
Why is it reasonable to assume that the waitress might not know who ordered what? It is reasonable because the waitress who takes an order may not be the same person who delivers it, or because the person who took the order might not remember exactly who ordered what.
Humans are also able to come up with explanations like these about why it is reasonable to think this way. This type of explanation is what is lacking in current AI systems. We currently do not have a good understanding of how these explanations are generated. As Scott Aaronson has pointed out7:
“Developing a compelling mathematical model of explanatory learning—a model that ‘is to explanation as PAC model is to prediction’—is an outstanding open problem”
- Scott Aaronson, “PAC Learning and Problems of Induction”
We can hypothesize, however, that a well-developed language system that allows an AI to communicate with us is a crucial ingredient in the generation of these explanations. Human beings use language to communicate their ideas and to correct themselves and each other. For AIs to be able to explain to us why they arrive at certain guesses and certain conclusions, they need to have the ability to use language. In this sense, language is not just a user interface between humans and AIs but something more fundamental to AIs becoming general.
In his essay “The State of Computer Vision and AI: We Are Really, Really Far Away”8, written back in 2012, Andrej Karpathy (former AI chief at Tesla) gives an example similar to the one above about the gap between human reasoning ability and the current state of AI. A decade later, it appears we are not any closer to bridging this gap. We have, however, gathered significant knowledge about harnessing the power of computers to create applications in narrow fields. We have also, in the form of LLMs, created the ability to generate perceptual data (text and images) that allows us to test the current progress of AI more viscerally9.
At the end of his essay, Andrej writes:
In any case, we are very, very far and this depresses me. What is the way forward? :( Maybe I should just do a startup. I have a really cool idea for a mobile local social iPhone app.
I believe we don’t need to be depressed. We are on the right path: with the advent of computers, we have a solid substrate that is universal in what we can potentially do with it. We just have to course-correct our approach.
“Creative Blocks”, David Deutsch
“The Unreasonable Effectiveness of Deep Learning in Artificial Intelligence”, Terrence Sejnowski
https://en.wikipedia.org/wiki/AlexNet
“Seeking New Laws”, Richard Feynman
“Creativity and AI”, Demis Hassabis
“Why Philosophers Should Care About Computational Complexity”, Scott Aaronson
“The State of Computer Vision and AI: We Are Really, Really Far Away”, Andrej Karpathy
“What Large Language Models Buy Us”, Aniket Vartak