Big Model Daily, November 11-12
[Big Model Daily, November 11-12] OpenAI lends Stable Diffusion a hand! Open-sources the same consistency decoder used in DALL·E 3, from Ilya Sutskever, Song Yang, and others; Google DeepMind grades AGI levels, guess where ChatGPT lands; NVIDIA's special-edition chips are coming to market: peak performance under 20% of the H100; Google large-model research lands in major controversy: completely unable to generalize beyond training data? Netizens: AGI singularity postponed
OpenAI lends Stable Diffusion a hand! Open-sources the same consistency decoder used in DALL·E 3, from Ilya Sutskever, Song Yang, and others
Link: https://news.miracleplus.com/share_link/11637
Unexpectedly, OpenAI has lent a hand to its "competitor" Stable Diffusion. At the much-hyped "AI Spring Festival Gala" (OpenAI's DevDay), OpenAI open-sourced two projects at once, one of which is the Consistency Decoder, a drop-in replacement for Stable Diffusion's VAE decoder. It makes image generation higher-quality and more stable, for example on multiple faces, images containing text, and line control. Prominent bloggers concluded that this decoder is the same one used in DALL·E 3; OpenAI also links the DALL·E 3 paper on the project's GitHub page. The consistency decoder has quite a pedigree: it comes from the Consistency Models work by OpenAI co-founder and chief scientist Ilya Sutskever, OpenAI's Chinese rising star Song Yang, and others. When that model was open-sourced in the first half of the year, it shook the industry and was hailed as potentially "ending diffusion models." Not long ago, Song Yang and colleagues further optimized the training method to improve image-generation quality.
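For readers who want to try it: the decoder is published on the Hugging Face Hub and wired into the diffusers library. A minimal sketch, assuming the public model IDs (not code from the announcement itself):

```python
# Sketch: swap the consistency decoder in for Stable Diffusion's stock
# VAE decoder via diffusers. Model IDs assume the public Hub releases.
import torch
from diffusers import ConsistencyDecoderVAE, StableDiffusionPipeline

vae = ConsistencyDecoderVAE.from_pretrained(
    "openai/consistency-decoder", torch_dtype=torch.float16
)
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", vae=vae, torch_dtype=torch.float16
).to("cuda")

# Faces, embedded text, and straight lines are where the consistency
# decoder is claimed to improve over the stock decoder.
image = pipe("a storefront sign that reads 'OPEN', photo").images[0]
image.save("consistency_decoder_sample.png")
```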
Google DeepMind grades AGI levels; guess where ChatGPT lands
Link: https://news.miracleplus.com/share_link/11638
How should AGI be graded? Google DeepMind has proposed a standard. To develop the framework, DeepMind analyzed existing definitions of AGI and distilled six principles: 1. Focus on capabilities, not processes. 2. Focus on both generality and performance. 3. Focus on cognitive and metacognitive tasks. 4. Focus on potential, not deployment. 5. Focus on ecological validity. 6. Focus on the path to AGI, not just the destination. Based on these principles, DeepMind proposed "Levels of AGI" along two dimensions, performance and generality: Level 0: No AI, e.g. Amazon Mechanical Turk; Level 1: Emerging, equal to or somewhat better than an unskilled human, e.g. ChatGPT, Bard, Llama 2; Level 2: Competent, at least the 50th percentile of skilled adults, not yet achieved for general tasks; Level 3: Expert, at least the 90th percentile of skilled adults, not yet achieved for general tasks, though Imagen and DALL·E 2 reach it on specific tasks; Level 4: Virtuoso, at least the 99th percentile of skilled adults, not yet achieved for general tasks, though Deep Blue and AlphaGo reach it on specific tasks; Level 5: Superhuman, outperforming 100% of humans, not yet achieved for general tasks, though AlphaFold, AlphaZero, and Stockfish reach it on specific tasks.
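The two-dimensional framework is compact enough to encode directly. A minimal Python sketch (my own encoding, not from the paper) of the performance axis plus the general-vs-narrow dimension:

```python
from dataclasses import dataclass
from enum import IntEnum

class AGILevel(IntEnum):
    """Performance axis of DeepMind's 'Levels of AGI' (names per the paper)."""
    NO_AI = 0       # e.g. Amazon Mechanical Turk (human-in-the-loop)
    EMERGING = 1    # >= unskilled human, e.g. ChatGPT, Bard, Llama 2
    COMPETENT = 2   # >= 50th percentile of skilled adults
    EXPERT = 3      # >= 90th percentile
    VIRTUOSO = 4    # >= 99th percentile
    SUPERHUMAN = 5  # outperforms 100% of humans

@dataclass
class Rating:
    level: AGILevel
    general: bool  # True = broad range of tasks, False = narrow/specific task

# Illustrative placements from the paper: chatbots sit at Emerging-general,
# while AlphaGo is Virtuoso only on the narrow task of Go.
chatgpt = Rating(AGILevel.EMERGING, general=True)
alphago = Rating(AGILevel.VIRTUOSO, general=False)
```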
Lex Fridman talks to Musk | The universe may be just a giant computer simulation; there is still a gap of 6 orders of magnitude between GPU clusters and the human brain….
Link: https://news.miracleplus.com/share_link/11639
On Friday, MIT scientist Lex Fridman held his fourth conversation with Musk. In this latest episode they discussed humanity, philosophy, competitive gaming, the economy, war, social media, AI, robots, and the trends in these fields over the next two to three years. On AI and robots, Musk said the human brain is remarkably computationally efficient and energy-frugal: higher brain functions run on less than 10 watts, excluding what is used to control the body, and on that thinking budget the brain can still produce a better novel than a 10 MW GPU cluster, a gap of six orders of magnitude. Musk believes AI has reached today's achievements mainly through massive compute and energy investment, but that is not its end state. With any given technology, you typically first make it work and then optimize it, so he expects these models to shrink over time and produce reasonable output with far less compute and energy….
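The six-orders-of-magnitude figure follows directly from the power numbers Musk cites:

```latex
\frac{10\,\mathrm{MW}}{10\,\mathrm{W}} = \frac{10^{7}\,\mathrm{W}}{10^{1}\,\mathrm{W}} = 10^{6}
```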
Google's large-model research falls into major controversy: completely unable to generalize beyond training data? Netizens: AGI singularity postponed
Link: https://news.miracleplus.com/share_link/11640
Google's Transformer is the infrastructure behind today's large models; the "T" in the familiar "GPT" stands for it. A series of large models have demonstrated strong in-context learning, quickly picking up examples to complete new tasks. But now researchers, also from Google, appear to have pointed out a fatal flaw: beyond its training data, that is, beyond humanity's existing knowledge, the Transformer is powerless. For a moment, many practitioners felt that AGI had slipped out of reach again.
Baidu CTO Wang Haifeng: Wenxinyiyan’s user base has reached 70 million
Link: https://news.miracleplus.com/share_link/11641
This year marks the tenth Wuzhen Summit. Wang Haifeng, Baidu's chief technology officer and director of the National Engineering Research Center for Deep Learning Technology and Applications, attended the Frontier Digital Technology Innovation and Security Forum and the Artificial Intelligence Empowering Industry Development Forum, where he explained the latest Wenxin Large Model 4.0 technology and elaborated on large models for industry. Wang Haifeng revealed that Wenxinyiyan has now reached 70 million users and 4,300 application scenarios.
Nvidia's special-edition chips are coming to market: peak performance under 20% of the H100
Link: https://news.miracleplus.com/share_link/11642
On October 17, the U.S. Department of Commerce issued its strictest export controls on China to date, and AI accelerators such as the H800 became a focus of the sanctions. Because the rules restrict computing performance and performance density, the custom chips Nvidia had designed to comply with the earlier requirements are, following the H100, also restricted. The new ban took effect on October 23. Some distributors said that after the A800 and H800 GPUs could no longer be imported, Nvidia developed a new server chip and two new GPUs specifically for the Chinese market and will deliver the three new chips to domestic manufacturers in the coming days: the HGX H20 and the L20 and L2 GPUs, targeting AI training, inference, and edge-side scenarios respectively, with release and mass production by the end of this year. Among them, the HGX H20 is an HGX accelerator card based on the Hopper architecture. It offers high-spec HBM3 memory with 96 GB of capacity and 4 TB/s of bandwidth. In compute performance, it delivers 296 TFLOPS of INT8, 148 TFLOPS of BF16 via Tensor Cores, 44 TFLOPS of FP32, and 1 TFLOP of FP64. Finally, it has a PCIe 5.0 interface along with a 900 GB/s NVLink connection.
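For scale: taking Nvidia's published dense BF16 figure for the H100 SXM, roughly 989 TFLOPS (my baseline, not from this report), the H20's 148 TFLOPS lands at about 15%, consistent with the sub-20% claim in the headline:

```latex
\frac{148\ \mathrm{TFLOPS}\ (\text{H20, BF16})}{989\ \mathrm{TFLOPS}\ (\text{H100 SXM, BF16})} \approx 0.15
```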
A new dawn for general anomaly detection: Huazhong University of Science and Technology and others reveal GPT-4V's all-round anomaly detection capability
Link: https://news.miracleplus.com/share_link/11643
Recently, large multimodal models (LMMs) have developed rapidly, and GPT-4V(ision), recently launched by OpenAI, performs best among them: it has powerful multimodal perception capabilities and has performed well on multiple tasks such as scene understanding and image generation. The authors believe the emergence of LMMs provides a new paradigm and new opportunities for research on general anomaly detection. To evaluate GPT-4V's performance on general anomaly detection, researchers from Huazhong University of Science and Technology, the University of Michigan, and the University of Toronto jointly tested it comprehensively on 15 anomaly detection datasets spanning 4 data modalities and 9 anomaly detection tasks. Specifically, the tested datasets cover images, point clouds, videos, time series, and other modalities, and span 9 tasks including industrial image anomaly detection/localization, medical image anomaly detection/localization, point cloud anomaly detection, logical anomaly detection, pedestrian anomaly detection, traffic anomaly detection, and time-series anomaly detection.
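The paper's exact prompts aren't reproduced here, but a zero-shot GPT-4V anomaly query of this kind looked roughly as follows through the OpenAI API at the time. A hypothetical sketch; the prompt wording and file name are my own:

```python
# Sketch: probing GPT-4V for zero-shot industrial anomaly detection,
# in the spirit of the study. Requires: pip install openai>=1.0
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def encode_image(path: str) -> str:
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

image_b64 = encode_image("screw_001.png")  # hypothetical test image

response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # the GPT-4V endpoint at the time
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "This is a photo of an industrial part. Is there any "
                     "defect or anomaly? Answer yes/no, then describe and "
                     "localize the anomalous region."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
    max_tokens=300,
)
print(response.choices[0].message.content)
```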
GPT-4 asks better questions than you: let large models rephrase questions for themselves and break down the barriers to dialogue with humans
Link: https://news.miracleplus.com/share_link/11644
Among the latest trends in artificial intelligence, the quality of human-written prompts has a decisive impact on the response accuracy of large language models (LLMs). OpenAI advises that precise, detailed, and specific questions are critical to these models' performance. But can ordinary users ensure their questions are clear enough for an LLM? Notably, there is a marked difference between how humans naturally understand certain phrasings and how machines interpret them. For example, to a human, "even months" obviously means February, April, and so on, but GPT-4 may misread it as months with an even number of days. This not only exposes the limits of AI in understanding everyday context, but also prompts reflection on how to communicate with these models more effectively. As AI technology advances, bridging the language-understanding gap between humans and machines is an important topic for future research. To this end, the general artificial intelligence laboratory led by Professor Gu Quanquan at the University of California, Los Angeles (UCLA) has released a research report proposing an innovative solution to the ambiguity large language models such as GPT-4 face in understanding questions. The research was completed by doctoral students Deng Yihe, Zhang Weitong, and Chen Zixiang.
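The core idea, having the model restate an ambiguous question before answering it, is easy to sketch. The prompt wording below is my approximation, not the paper's exact text:

```python
# Sketch of "let the model rephrase the question before answering"
# (the idea behind the UCLA work). Requires: pip install openai>=1.0
from openai import OpenAI

client = OpenAI()

def rephrase_and_respond(question: str) -> str:
    prompt = (f'"{question}"\n'
              "Rephrase and expand the question to remove ambiguity, "
              "then answer it.")
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# The "even months" case from the article: the restated question should
# make explicit whether "even" refers to the month's index or its day count.
print(rephrase_and_respond("Was Abraham Lincoln born in an even month?"))
```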
University of Toronto team uses AlphaFold to gain new insights into protein structure
Link: https://news.miracleplus.com/share_link/11645
The AlphaFold protein structure database contains predicted structures for millions of proteins. Most human proteins contain intrinsically disordered regions (IDRs), which do not adopt a stable structure; these regions generally receive low AlphaFold2 confidence scores, reflecting low-confidence structure predictions. A University of Toronto team showed that AlphaFold2 nevertheless assigns confident structure to nearly 15% of human IDRs. By comparing against experimental NMR data for a subset of IDRs known to fold conditionally (i.e., upon binding or under other specific conditions), the researchers found that AlphaFold2 can generally predict the structure of the conditionally folded state. Based on a database of known conditionally folding IDRs, the team estimates that AlphaFold2 can identify them with up to 88% accuracy at a 10% false-positive rate. The researchers also found that conditionally folding IDRs are nearly five times more enriched than IDRs in general among human disease mutations, and that up to 80% of prokaryotic IDRs are predicted to fold conditionally, compared with less than 20% of eukaryotic IDRs.
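AlphaFold DB models store per-residue pLDDT confidence in the PDB B-factor column, so the confidence signal the team analyzed can be read out in a few lines. A sketch with a hypothetical model file:

```python
# Read per-residue pLDDT from an AlphaFold DB model with Biopython
# (pip install biopython). The file name is hypothetical.
from Bio.PDB import PDBParser

parser = PDBParser(QUIET=True)
structure = parser.get_structure("model", "AF-P12345-F1-model_v4.pdb")

plddt = [
    residue["CA"].get_bfactor()   # CA atom's B-factor = residue pLDDT
    for residue in structure.get_residues()
    if "CA" in residue            # skip hetero/non-standard residues
]

# pLDDT >= 70 is the usual "confident" cutoff; disordered regions mostly
# score below it, which is why confidently predicted IDR stretches stand out.
confident = sum(p >= 70 for p in plddt) / len(plddt)
print(f"mean pLDDT = {sum(plddt)/len(plddt):.1f}, "
      f"fraction confident = {confident:.2%}")
```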
Generate 3D Gaussian splats directly from pictures and videos! Free to try, now officially commercialized
Link: https://news.miracleplus.com/share_link/11646
Polycam, a well-known 3D scanning application company, announced on its official website that it has ended beta testing of 3D GAUSSIAN SPLATS (hereinafter "3DGS") and officially put it into commercial use. 3DGS is a generative AI product that produces 3D Gaussian splats directly from pictures or videos. Through plug-ins, the results can be imported into game engines such as Unity and Unreal for use and further editing, which is a great help for 3D designers, real estate agents, museum displays, medical research, e-commerce showcases, and more. 3DGS is currently free to try online and can be accessed directly, with no region lock. Advanced features, such as uploading 1,000 pictures or a 15-minute video at once, require payment, but for most people the free tier is enough. In addition, because the uploaded material is your own, commercial use of the generated content carries no copyright risk.
Runway's new "Motion Brush" feature once again amazes the AI community: paint over a region and the picture starts to move.
Link: https://news.miracleplus.com/share_link/11647
A fifty-second trailer once again set the AI community abuzz. Yesterday, Runway announced that its video generation tool Gen-2 will soon gain a "Motion Brush" feature, a new way to control the motion of generated content.