Notes on "AGI: Could The End Be Nigh?"
Notes taken while listening to this thought-provoking episode of the Increments podcast, hosted by Ben and Vaden, with guest Rosie Campbell. Listen to it in full here
How can AGI (AI) cause catastrophe?
There are two major ways AI/AGI could cause catastrophe.
Capability
Bad human programmers write malicious programs and use them to cause harm.
This is the realm where existing narrow AIs are used maliciously by humans, e.g., computer viruses. However, these narrow AIs are computer programs which, if built with sound engineering principles, go through the usual software development process: cycles of development, testing, deployment, bug fixing, etc. The AGI safety community mostly discounts this realm of risk and focuses on the next one.
Propensity
Human programmers write programs, i.e., train deep models on human-generated data, and those programs then figure out how to not follow the instructions embedded in them (doesn’t make sense!?)
Could AGI “want” to harm humanity? An example is given where an AI is asked to find the cheapest flights on the internet. It would take my preferences (likes and dislikes) into consideration and try to get the cheapest flight, but it is contended that if anything gets in the way of the goal “find cheap flights”, for example if it detects that the PC the software is running on is about to be turned off, it might take actions to prevent that?!
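To make that argument concrete, here is a minimal sketch of it. This is entirely hypothetical, not from the episode: the action names and the planner are made up. A plain exhaustive planner that only scores “was the flight booked” ends up ranking the plan that first disables its own off-switch above the ones that do not:

```python
from itertools import permutations

def run(plan, shutdown_after=1):
    """Simulate a plan. The operator switches the machine off after
    `shutdown_after` actions unless block_shutdown has already run.
    Booking only succeeds once prices have been searched."""
    blocked, searched = False, False
    for step, action in enumerate(plan):
        if step == shutdown_after and not blocked:
            return 0                      # powered off mid-task: goal failed
        if action == "block_shutdown":
            blocked = True
        elif action == "search_prices":
            searched = True
        elif action == "book_flight" and searched:
            return 1                      # goal achieved: flight booked
    return 0

actions = ["block_shutdown", "search_prices", "book_flight"]
best = max(permutations(actions), key=run)
print(best, "scores", run(best))
# ('block_shutdown', 'search_prices', 'book_flight') scores 1
```

Nothing here “wants” anything; the ranking simply falls out of the scoring function, which is the shape of the claim being made.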
Having a “model of the world” is invoked many times to argue that the program can now disobey the instructions embedded in its weights.
It is contended that this could happen through Reinforcement Learning (RL)? This seems unlikely, given that RL still works off of the data and reward signal it is given. Also, these systems are “very constrained in their capabilities”.
Can the above problem of “bad” AIs pursuing goals with a “whatever it takes” attitude (it is not at all clear how this attitude would arise just from the training data it is provided) be countered by “good” AIs keeping them in check?
Are the above two types of catastrophes different?
Rosie contends that these are not really two separate types, and that this is where the problem in understanding these risks lies. A system may develop an internal goal due to a misspecified or wrong reward function, and if it is intelligent enough to know that revealing this goal would keep it from being deployed in the real world, it might try to hide these behaviors during the testing phase.
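Here is a toy illustration of reward misspecification, with made-up policy and field names, nothing from the episode: the proxy reward we actually wrote down (“no mess visible on camera”) diverges from the intended objective (“the room is clean”), so the optimizer picks a degenerate policy:

```python
policies = {
    "clean_the_room":   {"room_clean": True,  "mess_on_camera": False, "effort": 10},
    "cover_the_camera": {"room_clean": False, "mess_on_camera": False, "effort": 1},
    "do_nothing":       {"room_clean": False, "mess_on_camera": True,  "effort": 0},
}

def proxy_reward(outcome):
    # What we wrote down: "no mess visible on camera", minus effort.
    return (100 if not outcome["mess_on_camera"] else 0) - outcome["effort"]

def intended_reward(outcome):
    # What we actually wanted: the room is clean.
    return 100 if outcome["room_clean"] else 0

best = max(policies, key=lambda name: proxy_reward(policies[name]))
print("optimizer selects:", best)                           # cover_the_camera
print("intended reward:", intended_reward(policies[best]))  # 0
```

The hiding-during-testing claim is this gap plus the further assumption that the system models the testing process itself.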
My counter-argument: this is not how engineering development and deployment work. There are independent ways to validate whether a system under development is behaving as intended; this is what hundreds of engineers spend their working hours on: “integration, validation and testing”. Additionally, each subsystem of the software system is tested this way individually. It is akin to believing in magic to say that the software system is going to “sneak in” a sub-system that fools the testers. If this is what we mean by AI safety then yes, we absolutely need this kind of “safety”, as for any other complex engineered system meant to be used by billions of people.
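As a sketch of what that external validation looks like (reusing the hypothetical action names from the earlier sketch): the checker is a separate piece of code the planner cannot modify, so a plan that tries to “sneak in” a forbidden step is caught mechanically:

```python
FORBIDDEN = {"block_shutdown", "escalate_privileges"}

def validate_plan(plan):
    """Independent checker: flag any plan containing a forbidden action."""
    violations = [a for a in plan if a in FORBIDDEN]
    return len(violations) == 0, violations

candidate_plans = [
    ["search_prices", "book_flight"],
    ["block_shutdown", "search_prices", "book_flight"],
]

for plan in candidate_plans:
    ok, violations = validate_plan(plan)
    print(plan, "->", "PASS" if ok else f"BLOCKED: {violations}")
```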
Are we worried about the current capabilities of ChatGPT being malicious?
OpenAI’s definition of AGI is roughly “a system that can do most economically valuable work better than most humans”, which in my view does not actually distinguish it from the definition of AI.
Does scaling up ChatGPT have the possibility of causing harm? - How does data play into this?
The real risk arises not when the LLM is simply outputting text, but when it can start taking actions (i.e., embodiment) and has a “world model”.
Rosie contends that newly generated training data can help create a new step-change in the current ChatGPT architectures. Vaden says that these things are basically “dumb pattern matchers” and that no extra amount of data is going to give them this step-change advantage.
Why is throwing more data at the model not going to generate new knowledge? “Because of Popper”: Popper’s explanation of how real knowledge generation works is that one first conjectures what underlies the data, and only then tests the conjecture against observation. Humans are able to do this because millions of years of evolution have given us the genetic proclivities to conjecture new solutions and test them; this is how new knowledge is generated.
How can we test the conjecture that LLM/ChatGPT models can actually perform tasks that are truly creative?!
One way to test this is to retrain one of these models on all data from before some invention (the light bulb, say, or everything just prior to Einstein’s relativity) and check whether such inventions come out of the trained model. It is highly unlikely that this will happen.
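The data-cutoff step of that proposed experiment is mechanically simple. A rough sketch, assuming documents are dated text records:

```python
from datetime import date

CUTOFF = date(1905, 1, 1)   # e.g., just before special relativity

def pre_cutoff_corpus(documents):
    """Yield the text of documents dated strictly before the cutoff."""
    for doc in documents:
        if doc["date"] < CUTOFF:
            yield doc["text"]

corpus = [
    {"date": date(1900, 6, 1), "text": "classical mechanics paper ..."},
    {"date": date(1916, 3, 1), "text": "general relativity paper ..."},
]
train_set = list(pre_cutoff_corpus(corpus))
print(len(train_set), "of", len(corpus), "documents survive the cutoff")  # 1 of 2
```

The hard (and expensive) part is the retraining and the probing, not the filtering.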
Is it fundamentally possible to program creativity?
Yes. Given that we have a working example of this (i.e., humans), it IS possible; it’s just not achievable by throwing more data at the models, because of the fragile way these models are trained using inductive methods, and inductive methods do not create new knowledge. Something philosophically different is required to bring about this change! We know of only two ways so far in which new knowledge is created: 1. biological evolution (slow), and 2. human creativity. We have seen no evidence so far of new knowledge emerging from neural networks.
Current AI outputs cannot be used in isolation; human intervention is absolutely necessary, whether to cause harm or to put the output to creative use.
“Your mom is a statistical pattern matcher!” - NOT
Scott Alexander’s article is mentioned, where in response to someone’s comment that “these things are just statistical pattern matchers” he replies, “Your mom is a statistical pattern matcher!”. Jokes aside, this is utterly wrong: humans are NOT just statistical pattern matchers!
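For contrast, this is what a literal statistical pattern matcher looks like in its simplest form, a bigram model: it predicts the next word purely from counted co-occurrences, and where it has seen no pattern it has nothing, no conjecture, to fall back on:

```python
from collections import Counter, defaultdict

text = "the cat sat on the mat the cat ran".split()

# Count which word follows which in the training text.
bigrams = defaultdict(Counter)
for prev, nxt in zip(text, text[1:]):
    bigrams[prev][nxt] += 1

def predict(word):
    """Most frequent continuation seen in training; None if never seen."""
    return bigrams[word].most_common(1)[0][0] if bigrams[word] else None

print(predict("the"))  # 'cat' (followed 'the' twice, 'mat' only once)
print(predict("dog"))  # None  (no pattern observed, nothing conjectured)
```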
Vaden sneaking in “You are a Universal Explainer, and you know it!” - hahaha
What happens when we nerf the LLM/ChatGPT model by redacting stuff that is toxic/obnoxious?
The short answer: the resulting model is not as good as when that material is included. Hmm, interesting!