Andreas Braun, Chief Technology Officer of Microsoft Germany, confirmed that GPT-4 would be released in the week of March 9, 2023. He described it as a multimodal AI model capable of processing several types of input, including video, images, and sound.
GPT-4 Released on March 14, 2023
On March 14, 2023, OpenAI unveiled GPT-4, a multimodal model that can process both image and text prompts. For more information, you can read the official announcement on the OpenAI website.
In machine learning, a "modality" is a type of input a model can handle. Multimodal models can process some combination of text, speech, images, and video. Unlike its predecessors GPT-3 and GPT-3.5, which accept only text, GPT-4 can operate across multiple modalities.
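To make the idea concrete, here is a minimal sketch of what a multimodal prompt can look like in practice, in the style of OpenAI's Chat Completions message format, which allows text and image parts to be mixed in a single user message. The image URL is a placeholder, and nothing here is a confirmed detail of GPT-4's interface at the time of this report.

```python
# Sketch: a single user message combining a text question with an image
# reference, following the content-parts shape used by OpenAI's
# Chat Completions API. The URL below is a placeholder.

def build_multimodal_message(question: str, image_url: str) -> dict:
    """Combine a text prompt and an image reference into one user message."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }

message = build_multimodal_message(
    "What is shown in this picture?",
    "https://example.com/photo.jpg",  # placeholder image
)
print(message["content"][0]["type"], message["content"][1]["type"])
```

A text-only model would accept only the string; a multimodal model receives both parts and can reason over the image and the question together.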
According to a German news report, GPT-4 may be capable of processing input in four modalities: images, sound, text, and video. Dr. Andreas Braun stated, "We will introduce GPT-4 next week, there we will have multimodal models that will offer completely different possibilities – for example videos..."
The report did not provide specific details about GPT-4's multimodality, leaving it ambiguous whether the information applied specifically to GPT-4 or to multimodal AI in general. Microsoft's Director of Business Strategy, Holger Kenn, also discussed multimodal AI, but it is unclear whether his remarks referred to GPT-4 itself.
Another notable development is Microsoft's work on "confidence metrics," which aim to make its AI systems more reliable by grounding their responses in facts.
In early March 2023, Microsoft released a multimodal language model called Kosmos-1, which received little attention in the United States. According to the German news site Heise.de, Kosmos-1 was subjected to various tests and performed well in tasks such as image classification, image-based question answering, image labeling, optical character recognition, and speech generation. Notably, Kosmos-1 excelled at visual reasoning, which involves drawing conclusions from images without relying on language as an intermediate step. Kosmos-1 integrates two modalities: text and images.
GPT-4 goes a step further by incorporating a third modality, video, and possibly sound as well.
Works Across Multiple Languages
GPT-4 appears to work across many languages. For example, it can take a question asked in one language, such as German, and draw on knowledge that exists only in another, such as Italian. The breakthrough lies in this ability to transfer knowledge across languages: rather than being limited to the language its source information is in, GPT-4 can retrieve that knowledge and reply in the language the question was asked in.
This capability aligns with the goal of Google's multimodal AI called MUM, which aims to provide answers in English for data that only exists in other languages, such as Japanese.
While there have been no specific announcements about where GPT-4 will be implemented, it is expected to be integrated into the Azure OpenAI service. This puts Google in a challenging position as it races to deploy competing AI technology in its own search engine, reinforcing the perception that Google is lagging behind and lacks leadership in consumer-facing AI.
Google already incorporates AI into various products, such as Google Lens and Google Maps, to assist users with their tasks. In contrast, Microsoft's approach is more visible, capturing attention and solidifying the perception that Google is struggling to keep up.
Read the original German article reporting the news:
GPT-4 is coming next week – and it will be multimodal, says Microsoft Germany