January 18-19 Large Model Daily Collection
Vision Mamba is here: 2.8 times faster, with up to 87% memory savings; one-click conversion of real-world video to animation with the world's first 4D skeletal animation framework from a Tsinghua-affiliated startup, which can also generate personalized characters; Tencent releases the video generation model VideoCrafter2, which greatly improves light and shadow effects
Vision Mamba is here: 2.8 times faster, with up to 87% memory savings
Link: https://news.miracleplus.com/share_link/16215
Mamba, billed as the architecture that "comprehensively surrounds the Transformer", has a high-performance vision version less than two months after its launch. On Thursday, researchers from Huazhong University of Science and Technology, Horizon Robotics, the Beijing Academy of Artificial Intelligence (BAAI) and other institutions proposed Vision Mamba (Vim). How well does it work? On ImageNet classification, COCO object detection, and ADE20k semantic segmentation, Vim outperforms established vision Transformers such as DeiT while also significantly improving computational and memory efficiency. For example, when performing batch inference to extract features from images with a resolution of 1248×1248, Vim is 2.8 times faster than DeiT and saves 86.8% of GPU memory. The results show that Vim can overcome the computational and memory constraints that Transformer-style models face when processing high-resolution images, and that it has great potential to become the next-generation backbone for vision foundation models.
One-click conversion of real-world video to animation: a Tsinghua-affiliated startup releases the world's first 4D skeletal animation framework, which can also generate personalized characters
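To put the two headline figures in proportion, the arithmetic can be worked through in a few lines. This is only an illustrative calculation based on the numbers quoted above (2.8 times faster, 86.8% GPU memory saved at 1248×1248); the function name `relative_cost` is made up for this sketch and is not part of the Vim code base.

```python
def relative_cost(speedup: float, memory_saved_pct: float) -> tuple[float, float]:
    """Convert a speedup factor and a memory saving (in percent) into the
    fraction of the baseline's time and memory that the faster model needs."""
    time_fraction = 1.0 / speedup
    memory_fraction = 1.0 - memory_saved_pct / 100.0
    return time_fraction, memory_fraction


# Figures reported for Vim versus DeiT at 1248x1248 resolution.
time_fraction, memory_fraction = relative_cost(2.8, 86.8)

print(f"Vim needs about {time_fraction:.1%} of DeiT's inference time")
print(f"Vim needs about {memory_fraction:.1%} of DeiT's GPU memory")
print(f"DeiT needs about {1.0 / memory_fraction:.1f}x the memory of Vim")
```

In other words, under these reported figures Vim would need roughly a third of DeiT's time and a little over an eighth of its GPU memory for the same batch.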
Link: https://news.miracleplus.com/share_link/16216
A few days ago, Apple announced that its first head-mounted display, Vision Pro, will be officially released on February 2. XR devices, as the next generation of terminals, are expected to develop rapidly. As such devices become widespread, digital interaction will move from two dimensions to three, 3D models and 3D animation will become mainstream content forms, and multi-dimensional immersive interaction that blends the virtual and the real will become a trend. To address this frontier, Shengshu Technology, a startup founded by a Tsinghua University team, has carried out a series of research and product development efforts. Together with Tsinghua University, Tongji University and other universities, it recently launched "AnimatableDreamer", the world's first 4D animation generation framework based on skeletal animation. The framework converts 2D video footage directly into dynamic 3D models (i.e. 4D animation) with one click, and supports automatic extraction of skeletal motion, one-click conversion of animation effects, and personalized character generation from text input.
How to harness revolutionary protein structure tools for drug discovery? AlphaFold discovers thousands of possible hallucinogens
Link: https://news.miracleplus.com/share_link/16217
AlphaFold2 (AF2) and RosettaFold have greatly expanded the number of structures available for structure-based ligand discovery, although their direct usefulness for this purpose has been questioned. A team of researchers at the University of California, Berkeley, has used the protein structure prediction tool AlphaFold to identify hundreds of thousands of potential new psychedelic molecules that could help develop new antidepressants. The study is the first to show that AlphaFold predictions, which can be produced at the push of a button, are as useful for drug discovery as experimentally derived protein structures, which can take months or even years to determine.
Zuckerberg goes all in on AGI: Llama 3 is in training, and Meta will stockpile 350,000 H100s this year at a cost of nearly 10 billion US dollars
Link: https://news.miracleplus.com/share_link/16218
To pursue the ambitious goal of artificial general intelligence (AGI), Zuckerberg is carrying out a major reorganization of Meta's AI research department. On Thursday, Meta CEO Mark Zuckerberg announced that the company is working to build "general intelligence" and "responsibly open-source" AI assistants. To achieve this goal, Meta is merging its two main research groups, FAIR and GenAI. A third-party investment firm estimates that Nvidia shipped 150,000 H100s to Meta in 2023, the same number as it shipped to Microsoft and at least three times as many as to any other company. Zuckerberg said that if Nvidia A100s and other AI chips are included, Meta's GPU computing power will reach the equivalent of nearly 600,000 H100s by the end of 2024.
Throughput increased by 5 times: an LLM interface that co-designs the back-end system and the front-end language is here
Link: https://news.miracleplus.com/share_link/16219
Large language models (LLMs) are increasingly used for complex tasks that require multiple chained generation calls, advanced prompting techniques, control flow, and interaction with the external environment. However, existing systems for programming and executing these applications efficiently have significant shortcomings. Now, researchers in the open source community have proposed SGLang, a Structured Generation Language for LLMs. By co-designing the back-end runtime system and the front-end language, SGLang improves interaction with LLMs and makes them faster and more controllable. Tianqi Chen, a well-known machine learning researcher and assistant professor at CMU, also shared this work.
Not worried about competition from major companies such as ByteDance and Tencent: Insilico Medicine CEO Alex Zhavoronkov talks about AI drug discovery
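The kind of program described above, where several generation calls are chained and ordinary control flow decides which call comes next, can be sketched as follows. This is a minimal illustration of the workload only and does not use SGLang's actual API: `fake_llm` is a hypothetical stand-in that returns canned text instead of calling a model, and `answer` is an invented example function.

```python
from typing import Callable


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call; returns canned text."""
    if "Classify" in prompt:
        # Pretend the model routes anything containing digits to "math".
        return "math" if any(ch.isdigit() for ch in prompt) else "general"
    return f"[answer to: {prompt}]"


def answer(question: str, llm: Callable[[str], str] = fake_llm) -> str:
    # Call 1: ask the model to classify the question.
    topic = llm(f"Classify the topic of this question: {question}")

    # Control flow: the result of call 1 decides which calls follow.
    if topic == "math":
        # Calls 2 and 3: plan first, then answer using the plan.
        plan = llm(f"Outline the steps to solve: {question}")
        return llm(f"Follow this plan and give the final result: {plan}")

    # Call 2 (general branch): answer directly.
    return llm(f"Answer briefly: {question}")


print(answer("What is 2+2?"))
print(answer("Who wrote Hamlet?"))
```

Each branch issues a different number of dependent calls, which is the sort of multi-call pattern that a co-designed front-end language and runtime aims to express and execute efficiently.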
Link: https://news.miracleplus.com/share_link/16220
Artificial intelligence is increasingly used in biopharmaceuticals, with applications extending beyond drug discovery. It was a hot topic at the 42nd J.P. Morgan Healthcare Conference, held in San Francisco from January 8 to 11, 2024. Eli Lilly and Novartis even announced multimillion-dollar discovery deals with Alphabet's Isomorphic Labs just as the conference began. Amid the craze for artificial intelligence, foreign media spoke with Alex Zhavoronkov, CEO of Insilico Medicine. Last summer, the company became the first to enter Phase II clinical trials with a therapy developed using generative artificial intelligence. Zhavoronkov talked about the role of AI in the industry and when Insilico might bring a product to market. In addition to Phase II trials testing the treatment for the lung disease idiopathic pulmonary fibrosis in the U.S. and China this summer, Insilico signed a licensing agreement with Menarini Group this month to commercialize another drug.
Tencent releases video generation model VideoCrafter2, which significantly improves light and shadow effects
Link: https://news.miracleplus.com/share_link/16221
Tencent announced the launch of VideoCrafter2, an upgraded version of its video generation model VideoCrafter, which brings significant improvements in light and shadow effects, among other areas. VideoCrafter2 can generate a few seconds of high-quality video from user-supplied text. Compared with the previous version, the new version greatly improves picture quality, character motion and other aspects, and the generated video is more realistic.