When Atoms meet Bits (73/115)

Technology Vision 2023 | When Atoms meet Bits #TechVision

Multimodal foundation models' ability to recognize multiple data types and identify the relationships between them is also pushing the envelope of what AI is capable of, enabling powerful new systems. GPT-4, for example, is multimodal and accepts both image and text inputs, meaning that if someone were to show it a picture of the inside of their refrigerator, it could correctly identify the items inside, suggest meals that can be made with those ingredients, and then provide step-by-step cooking instructions. [206]

And Meta has long seen the value of an AI system that can interpret content on its platform, especially when it comes to detecting hate speech. But this is a task that's historically been difficult for machines because people tend to communicate in multimodal ways on these platforms (using text and image together to tell a joke, for instance). So Meta has launched a series of foundation and multimodal AI projects to help them analyze different types of communication (like text, image, and video) in conjunction. The company created the Hateful Memes dataset to address the shortage of publicly available training data for classifying memes; it developed FLAVA, a multimodal foundation model that works across dozens of tasks; and it built Omnivore, a model that can operate across images, video, and 3D data, doing things like detecting content in both videos and images. [207, 208]

Though multimodal foundation models are still relatively few, and most of them are text-to-image generators, it's exciting to imagine the possibilities we'll have in the future. What will we be able to do when multimodal models connect text, sound, image, video, 3D spatial data, sensor data from industrial equipment, environmental data, or many other types of data? Early opportunities may start with generating marketing images and ad copy but could grow into sophisticated autogenerated code and new ways to search and access information. Analysts might use language to ask an AI system to describe patterns across thousands of satellite images. A piece of industrial equipment might use an AI system to translate data from dozens of sensors into a repair procedure for a mechanic. Or multimodal AI might help drastically improve the path planning and performance of robotic arms.
