December 5th Big Model Daily Collection

News | 2023 | AIWindVane


[December 5th Big Model Daily Collection] Five times the throughput, outperforming Transformer across the board: the new architecture Mamba sets the AI community abuzz; the large-model version of the “5-Year College Entrance Exam, 3-Year Simulation” workbook is here, with 6141 math questions, and multi-modal too | Jointly produced by Microsoft, UCLA and UW; making 3D editing as easy as Photoshop, the new algorithm GaussianEditor can add, delete and modify elements of 3D scenes in a few minutes; did a “new compound” already exist 90 years ago? A University College London professor questions “A-Lab”, in which DeepMind participated; a startup valued at US$2 billion within 180 days: the “European OpenAI” surges in popularity, Llama’s creators strike out on their own, and NVIDIA becomes a shareholder

Five times the throughput, outperforming Transformer across the board: new architecture Mamba sets the AI community abuzz

Link: https://news.miracleplus.com/share_link/12440

In other fields, if you want to describe something as very important, you might say it “props up half the field.” In large AI models, that description undersells the Transformer architecture, because it props up almost the entire field. Since it was proposed in 2017, Transformer has been the mainstream architecture for large AI models. However, as models scale up and the sequences they must process grow longer, its limitations have gradually become apparent. One obvious flaw is that the computation required by the self-attention mechanism grows quadratically with context length. For example, when the context grows 32-fold, the computation can grow roughly 1000-fold (32² = 1024), which is very inefficient. To overcome this, researchers have developed many efficient variants of the attention mechanism, but often at the expense of effectiveness. So far, none of these variants has proven effective across different domains. A recent study called “Mamba” appears to break this deadlock.
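The quadratic growth mentioned above can be checked with a little arithmetic. The following is only a rough sketch: it counts the pairwise attention scores (one per query-key pair) and ignores the hidden dimension, the number of heads and layers, and every other part of the model. The function name and the base context length of 1024 tokens are illustrative assumptions, not taken from the Mamba paper.

```python
def attention_score_count(seq_len: int) -> int:
    """Number of query-key pairs in full self-attention (hypothetical helper).

    Every token attends to every token, so the score matrix is seq_len x seq_len.
    """
    return seq_len * seq_len


# Assumed base context length, chosen only for illustration.
base_len = 1024
longer_len = base_len * 32  # context grows 32-fold

growth = attention_score_count(longer_len) // attention_score_count(base_len)
print(f"Context x32 -> attention scores x{growth}")
```

Under these assumptions the growth factor is 32 squared, which is where the "roughly 1000 times" figure comes from.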

Animated video generation is taking off: a new framework makes still pictures move

Link: https://news.miracleplus.com/share_link/12441

A few days ago, an Alibaba research team introduced Animate Anyone, a method that needs only a single photo of a person and uses skeletal animation as guidance to generate a natural-looking animated video. However, the source code for this study has not yet been released. In fact, the day before the Animate Anyone paper appeared on arXiv, Show Lab at the National University of Singapore and ByteDance had jointly released a similar study. They proposed MagicAnimate, a diffusion-based framework designed to enhance temporal consistency, faithfully preserve the reference image, and improve animation fidelity. Moreover, the MagicAnimate project is open source, and the inference code and a Gradio online demo have been released.

Making 3D editing as easy as Photoshop: the new algorithm GaussianEditor can add, delete and modify elements of 3D scenes in a few minutes

Link: https://news.miracleplus.com/share_link/12442

3D editing plays a vital role in fields such as games and virtual reality. However, previous 3D editing methods were slow and hard to control, which made them difficult to apply in real-world scenarios. Recently, Nanyang Technological University, Tsinghua University and SenseTime proposed a new 3D editing algorithm, GaussianEditor, which for the first time achieves controllable and diverse editing of 3D scenes in 2-7 minutes, comprehensively surpassing previous 3D editing work.

The large-model version of the “5-Year College Entrance Exam, 3-Year Simulation” workbook is here! 6141 math questions, and multi-modal too | Jointly produced by Microsoft, UCLA and UW

Link: https://news.miracleplus.com/share_link/12443

A “5-Year College Entrance Exam, 3-Year Simulation” set of math questions for large models is here, and it is an enhanced edition! Microsoft, the University of California, Los Angeles (UCLA), and the University of Washington (UW) have jointly created a new multi-modal mathematical reasoning benchmark dataset called “MathVista”. It contains 6141 questions across a variety of question types, drawn from 28 existing multi-modal datasets and 3 newly annotated datasets.

Did a “new compound” already exist 90 years ago? University College London professor questions “A-Lab”, in which DeepMind participated

Link: https://news.miracleplus.com/share_link/12444

Last week, a team of researchers from Google DeepMind and the University of California, Berkeley published a highly anticipated paper in Nature proposing an “autonomous laboratory”, A-Lab, designed to harness AI and robotics technology to accelerate the discovery and synthesis of new materials. Dubbed a “self-driving laboratory,” A-Lab presents an ambitious vision of what AI-driven systems can achieve in scientific research when equipped with the latest technologies in computational modeling, machine learning, automation, and natural language processing. Within days of publication, however, doubts began to arise about some of the key claims and results in the paper. Robert Palgrave is Professor of Inorganic Chemistry and Materials Science at University College London (UCL) and has decades of experience in X-ray crystallography. After noticing inconsistencies in the data and analysis presented as evidence of A-Lab’s success, Palgrave raised a series of technical concerns on X, formerly known as Twitter.