Big Model Daily on December 21st


[Big Model Daily on December 21st] A new general-purpose 3D large model: VAST brings 3D generation into the "seconds" era; Google's Gemini technical report is released, with more than 900 authors; GPT-4 takes on scientific research, successfully reproducing a Nobel laureate's Nature work in four minutes; "Transformer challenger" Mamba can run on a MacBook, gaining 500+ GitHub stars in half a day


A new general-purpose 3D large model: VAST brings 3D generation into the "seconds" era

https://news.miracleplus.com/share_link/13969
Generative AI for 3D has long been waiting for its "ChatGPT moment". Traditional 3D modeling serves industries such as games, film and television, and architecture, and generally relies on manual work by professionals: production cycles range from a few days to several months, and a single 3D model costs at least several thousand yuan to create. The success of generative AI in 2D image generation has shown its potential to revolutionize 3D modeling, and a trillion-yuan market seems poised to take off. But the 3D generation technologies currently on the market still have various shortcomings, and everyone has been waiting for a product that makes them shine. When VAST's self-developed 3D large model Tripo quickly and smoothly generated the classic "avocado armchair" in exquisite form, generative AI reached another milestone moment.


Can the RTX 4090 replace the A100? Token generation only 18% slower than an A100, and the inference engine behind it is gaining traction

https://news.miracleplus.com/share_link/13970
The Shanghai Jiao Tong University team recently released PowerInfer, a high-speed CPU/GPU LLM inference engine. How fast is it? Running an LLM on a single NVIDIA RTX 4090 GPU, PowerInfer achieves an average token generation rate of 13.20 tokens/s, with a peak of 29.08 tokens/s, only 18% lower than a top-end server A100 GPU, and it can be applied to a variety of LLMs. Moreover, compared with llama.cpp, the most advanced local LLM inference framework, PowerInfer achieves more than 11x speedup when running Falcon (ReLU)-40B-FP16 on a single RTX 4090 (24 GB), while maintaining model accuracy. Specifically, PowerInfer is a high-speed inference engine for local LLM deployment. Rather than relying on mixture-of-experts (MoE) architectures, PowerInfer exploits the high degree of locality in LLM inference to design a clever GPU-CPU hybrid inference engine.
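The locality the paragraph describes comes from ReLU-style feed-forward layers: a small set of "hot" neurons activates on most inputs and can live on the fast device (GPU), while the rarely active "cold" neurons stay on the CPU. The sketch below is only a conceptual illustration of that hot/cold split for one feed-forward layer, not PowerInfer's actual code; the 20% hot ratio and all names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff = 8, 32
W_in = rng.normal(size=(d_ff, d_model))   # up-projection
W_out = rng.normal(size=(d_model, d_ff))  # down-projection

# Offline profiling would identify frequently-activating neurons;
# here the first 20% of rows stand in for the "hot" set.
hot = np.zeros(d_ff, dtype=bool)
hot[: d_ff // 5] = True

def ffn_reference(x):
    """Plain dense ReLU feed-forward layer."""
    return W_out @ np.maximum(W_in @ x, 0.0)

def ffn_split(x):
    """Same layer, partitioned into a hot (GPU) and cold (CPU) path."""
    # "GPU" path: dense compute over the small, always-resident hot subset.
    h_hot = np.maximum(W_in[hot] @ x, 0.0)
    out = W_out[:, hot] @ h_hot
    # "CPU" path: cold neurons; ReLU zeroes most of these, so a real
    # engine would skip neurons predicted to be inactive.
    h_cold = np.maximum(W_in[~hot] @ x, 0.0)
    out += W_out[:, ~hot] @ h_cold
    return out

x = rng.normal(size=d_model)
assert np.allclose(ffn_reference(x), ffn_split(x))
```

The split changes nothing numerically, as the final assertion checks; the savings in a real engine come from keeping the small hot set in fast GPU memory and skipping predicted-zero cold neurons on the CPU.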


Google Gemini technical report released, with more than 900 authors

https://news.miracleplus.com/share_link/13971
Two weeks ago, people were excited about Gemini, the "natively multimodal large model" announced by Google. Its claimed performance beyond GPT-4 and its ability to understand images, video, and other modalities gave people a glimpse of the future. However, Gemini quickly became controversial when the demo Google showed was suspected of exaggerating its capabilities. Even so, as a major recent development in generative AI, expectations for Gemini kept rising, and one team quickly tested it and published a paper on the results. The 64-page technical report released today may offer more direct answers to many of these doubts. The authors of the report, "Gemini: A Family of Highly Capable Multimodal Models", include Jeff Dean, Oriol Vinyals, Koray Kavukcuoglu, Demis Hassabis, and other leading Google researchers, as well as Google co-founder Sergey Brin.


With just one image and an action description, Animate124 can easily generate a 3D video

https://news.miracleplus.com/share_link/13972
Over the past year, DreamFusion has led a new trend, the generation of static 3D objects and scenes, which has attracted widespread attention in the generative-technology community. In that time we have witnessed significant advances in the quality and controllability of static 3D generation: development started with text-based generation, gradually incorporated single-view images, and then evolved to integrate multiple control signals. By comparison, dynamic 3D scene generation is still in its infancy. In early 2023, Meta launched MAV3D, the first attempt at generating 3D video from text, but limited by the lack of open-source video generation models, progress in this field has been relatively slow. Now, however, 3D video generation driven by a combination of image and text has arrived. Although text-based 3D video generation can produce diverse content, it remains limited in controlling the details and poses of objects. In static 3D generation, objects can be effectively reconstructed from a single input image. Inspired by this, a research team from the National University of Singapore (NUS) and Huawei proposed the Animate124 model, which combines a single image with a corresponding action description to enable precise control over 3D video generation.


No more worrying about someone missing from a group photo: Anydoor opens an "any door" for photo editing

https://news.miracleplus.com/share_link/13973
Anydoor, a new result from the University of Hong Kong, Alibaba, and Ant Group, opens an "any door" for photo editing: with just one photo, any object can be transported into the world of another picture.


ChatGPT has a new chat archiving function, allowing you to build your own chat database!

https://news.miracleplus.com/share_link/13974
On December 21, OpenAI announced on social platforms that ChatGPT has added an archiving function, letting users save chat records without deleting them. Although it is only a small feature, it is of great help to text-heavy fields such as scientific research, medical care, writing, finance, and law: users can build their own text database and manage it with precision. For example, a legal professional with 100,000 chat records in ChatGPT can organize and save them with this feature, then upload them to ChatGPT as attachments so that ChatGPT answers questions based on their own chat documents, ensuring the accuracy of the data. Or, a year later, they can look back at an accurate history of their conversations with ChatGPT.
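The workflow described above, archiving chats and later feeding a relevant subset back to ChatGPT as an attachment, can be sketched with a plain local JSON Lines archive. This is only a hypothetical illustration: the file name and record schema are assumptions, not an OpenAI export format.

```python
import json

def archive_chats(chats, path):
    """Write chat records to a JSON Lines file, one record per line."""
    with open(path, "w", encoding="utf-8") as f:
        for chat in chats:
            f.write(json.dumps(chat, ensure_ascii=False) + "\n")

def search_archive(path, keyword):
    """Return archived chats whose text contains the keyword (case-insensitive)."""
    hits = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            chat = json.loads(line)
            if keyword.lower() in chat["text"].lower():
                hits.append(chat)
    return hits

# Illustrative records only; the schema is an assumption.
chats = [
    {"id": 1, "date": "2023-12-01", "text": "Draft of the licensing contract"},
    {"id": 2, "date": "2023-12-05", "text": "Notes on GPT-4 evaluation"},
]
archive_chats(chats, "chat_archive.jsonl")
print([c["id"] for c in search_archive("chat_archive.jsonl", "contract")])  # prints [1]
```

Filtering locally first keeps the attachment small, so the model only sees the records relevant to the question being asked.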
