
Technology Vision 2023 | When Atoms meet Bits | #TechVision

Future advances in foundation models

While foundation models have already pushed technological boundaries, it's important to recognize that the field is changing quickly. Stanford's 2022 Artificial Intelligence Index Report found a significant uptick in the annual number of global AI publications since 2017.[187] According to CB Insights, 2022 saw record investments in generative AI startups, with $2.6 billion in funding across 110 deals.[188] And in early 2023, Microsoft announced the next phase of its partnership with OpenAI through a multibillion-dollar investment.[189] To truly understand the impact foundation models will have on their industries and businesses, companies need to carefully track new developments.

97% of global executives agree AI foundation models will enable connections across data types, revolutionizing where and how AI is used.

One of the most significant ways foundation models are evolving has to do with the data types they're trained on, which right now are limited. Most of today's foundation models are large language models trained on natural language, and even multimodal models are typically language-and-image only. But some are working to expand to more data modalities.

This can mean building standalone foundation models for new kinds of data. Meta, for instance, developed a protein-folding model (a large language model that learned the "language of protein") that accelerated protein structure predictions by up to 60x.[190] And a research team from the University of Texas at Austin, the Indian Institute of Technology Madras, and Google Research proposed Generalizable NeRF Transformer (GNT), a transformer-based architecture for NeRF reconstruction.[191, 192] A NeRF (Neural Radiance Field) is a neural network that can generate 3D scenes from only partial 2D views, and experimenting with transformers to generate 3D data like this could have big metaverse implications.

Other organizations are working to incorporate more data types into a single model. Take Microsoft's Florence, a foundation model built for general-purpose computer vision tasks.[193] While it was trained on a large data set of image-text pairs and has only a two-tower architecture, combining one language encoder and one image encoder, its creators
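A two-tower design like the one described above pairs a language encoder and an image encoder that map into one shared embedding space, so a caption and an image can be compared with a simple similarity score. The following is a minimal sketch of that idea only; the random-weight "encoders," vocabulary size, and embedding dimension are hypothetical stand-ins, not Florence's actual architecture or training.

```python
import numpy as np

EMBED_DIM = 64     # hypothetical shared embedding size
VOCAB_SIZE = 1000  # toy vocabulary for the language tower
PATCH_DIM = 48     # toy flattened-patch size for the image tower

rng = np.random.default_rng(0)
token_table = rng.normal(size=(VOCAB_SIZE, EMBED_DIM))  # toy token embeddings
patch_proj = rng.normal(size=(PATCH_DIM, EMBED_DIM))    # toy patch projection

def l2norm(v):
    """Scale a vector to unit length so dot products become cosine similarity."""
    return v / (np.linalg.norm(v) + 1e-8)

def text_encoder(token_ids):
    """Language tower: average the token embeddings into the shared space."""
    return l2norm(token_table[token_ids].mean(axis=0))

def image_encoder(patches):
    """Image tower: project patches linearly, then average into the same space."""
    return l2norm((patches @ patch_proj).mean(axis=0))

def similarity(token_ids, patches):
    """Cosine similarity between the two towers' outputs: the text-image score."""
    return float(text_encoder(token_ids) @ image_encoder(patches))

# Usage: score a toy "caption" against a toy "image" made of 3 random patches.
caption = [5, 42, 7]
image = rng.normal(size=(3, PATCH_DIM))
score = similarity(caption, image)
assert -1.0 <= score <= 1.0  # unit vectors bound the cosine score
```

Because both towers emit vectors in the same space, either input can be encoded once and reused, which is why this design scales well to retrieval over large image-text collections.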

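The NeRF idea mentioned earlier, a network that maps a 3D position and viewing direction to a color and a density, which are then composited along each camera ray to form a 2D image, can be sketched as follows. The tiny untrained network and all dimensions here are hypothetical placeholders; a real NeRF is optimized per scene from the partial 2D views.

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN = 32
# Random weights standing in for a trained scene representation.
W1 = rng.normal(scale=0.5, size=(6, HIDDEN))  # input: 3D position + 3D view direction
W2 = rng.normal(scale=0.5, size=(HIDDEN, 4))  # output: RGB color + density

def field(position, direction):
    """Radiance field: (x, y, z, view direction) -> (rgb in [0, 1], density >= 0)."""
    h = np.tanh(np.concatenate([position, direction]) @ W1)
    out = h @ W2
    rgb = 1.0 / (1.0 + np.exp(-out[:3]))  # sigmoid keeps color in [0, 1]
    sigma = np.log1p(np.exp(out[3]))      # softplus keeps density non-negative
    return rgb, sigma

def render_ray(origin, direction, near=0.0, far=4.0, n_samples=32):
    """Volume rendering: alpha-composite field samples along one camera ray."""
    ts = np.linspace(near, far, n_samples)
    dt = ts[1] - ts[0]
    color = np.zeros(3)
    transmittance = 1.0  # fraction of light still unblocked along the ray
    for t in ts:
        rgb, sigma = field(origin + t * direction, direction)
        alpha = 1.0 - np.exp(-sigma * dt)   # opacity contributed by this sample
        color += transmittance * alpha * rgb
        transmittance *= 1.0 - alpha
    return color  # final pixel color for this ray

# Usage: render one pixel by shooting a ray from the origin along +z.
pixel = render_ray(np.zeros(3), np.array([0.0, 0.0, 1.0]))
assert pixel.shape == (3,)
```

Rendering a full image repeats this per pixel, one ray each, which is what makes querying the field with transformer-based architectures such as GNT an interesting direction.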