“This is something that, you know, we can’t comment on yet,” OpenAI Chief Scientist Ilya Sutskever said when I spoke to the GPT-4 team via video link an hour after the announcement. “It’s pretty competitive there.”
Access to GPT-4 will be available to users who sign up for a waitlist, as well as to paid ChatGPT Plus subscribers, in a limited, text-only capacity.
GPT-4 is a multimodal large language model, which means it can respond to both text and images. Give it a picture of the contents of your fridge and ask what you can cook, and GPT-4 will try to come up with recipes that use the ingredients pictured.
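For readers curious what such a request might look like in practice, here is a minimal sketch using OpenAI's Python client. It assumes your account has access to an image-capable GPT-4 variant; the model name and image URL are illustrative, not taken from the announcement.

```python
# Minimal sketch: asking an image-capable GPT-4 model for recipe ideas from a fridge photo.
# Assumes the `openai` Python package is installed and OPENAI_API_KEY is set in the environment.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative; any image-capable GPT-4 model your account can use
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What could I cook with the ingredients in this fridge?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/my-fridge.jpg"},  # illustrative URL
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```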
“Constant improvement in many ways is amazing,” says Oren Etzioni of the Allen Institute for Artificial Intelligence. “GPT-4 is now the standard against which all foundation models will be judged.”
“A good multimodal model has been the holy grail of many large tech labs over the past few years,” says Thomas Wolf, co-founder of Hugging Face, the AI startup behind the open-source BLOOM large language model. “But it has remained elusive.”
In theory, combining text and images could allow multimodal models to better understand the world. “Perhaps it can deal with the traditional weaknesses of language models, such as spatial reasoning,” says Wolf.
It is not yet clear whether this is true of GPT-4. The new OpenAI model appears to be better at some basic reasoning than ChatGPT, solving simple puzzles such as summarizing blocks of text using words that start with the same letter. In my demo, I was shown GPT-4 summarizing an announcement from the OpenAI website using words beginning with g: “GPT-4, groundbreaking generational growth, gains greater grades. Guardrails, guidance, and gains garnered. Gigantic, groundbreaking, and globally gifted.” In another demo, GPT-4 took a document about taxes and answered questions about it, giving reasons for its answers.
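The constrained-summarization demo is just a text prompt, so it can be tried with the same client. A sketch, assuming GPT-4 access via the API; the prompt wording here is illustrative rather than the one used in the demo.

```python
# Sketch of the "summarize using only words beginning with g" demo as a plain text prompt.
# Assumes the `openai` package and an API key with access to a GPT-4 model.
from openai import OpenAI

client = OpenAI()

prompt = (
    "Summarize the following announcement in one or two sentences, "
    "using only words that begin with the letter 'g':\n\n"
    "<paste the announcement text here>"
)

response = client.chat.completions.create(
    model="gpt-4",  # illustrative model name
    messages=[{"role": "user", "content": prompt}],
)

print(response.choices[0].message.content)
```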