What Do Large Models Buy Us?
The potential for progress through a visceral experience of the current state of AI
Until the introduction of transformers [1], we had classifiers and detectors with hundreds of layers. These deep networks do a very good job of classifying and detecting cars, people, and other objects.
This progress began in 2012, when a deep neural network (AlexNet) beat all traditional computer vision methods by a substantial margin in the ImageNet competition. Since then, we have improved on that initial network by leaps and bounds.
Later methods and techniques have improved on the performance of the original AlexNet so much that the competition, in its original form, was eventually deemed no longer necessary [2]. That is progress!
With the advent of transformers and large language models (LLMs), we have stepped into yet another territory, one in which models can be interrogated interactively and more freely.
These models give us something we have never had before: an interactive sandbox in which to poke and prod the system, uncover its flaws, and understand and appreciate its abilities. This gives us an important lever for checking our progress. We can now write a sentence in free form and expect the model to spit out a response that appears coherent (OpenAI’s GPT-3), or write a sentence and have the model generate images that seem compelling at first glance (DALL·E, Imagen, etc.). This allows a more thorough investigation of the capabilities and shortcomings of AI in its current state. That is progress as well!
Criticizing the output of these new-age models draws many responses. They range from “the model only learns what it is given as training data” to “human artists make art that is not always physically plausible, so why criticize these outputs?”, and on and on it goes.
Contrary to these objections, I think such interrogation and questioning of claims about what the models can do should be encouraged! There are many posts and opinion pieces about the good these models bring us. On the other hand, there are a number of posts about how these models fail at many tasks and are in no way comparable to general intelligence. Both sides have legitimate arguments. In my view, the biggest contribution these models make is the debate they enable, thanks to the very tangible and visceral responses they generate. I think that is progress!!
[1] Vaswani et al., “Attention Is All You Need” (2017): https://arxiv.org/abs/1706.03762
[2] ImageNet, ILSVRC 2017: https://image-net.org/static_files/files/imagenet_ilsvrc2017_v1.0.pdf