Collection of Big Model Daily on November 6th


[Collection of Big Model Daily on November 6th] Kai-Fu Lee officially announced the “world’s most powerful” open source large model: it processes 400,000 Chinese characters at a time and tops both the Chinese and English leaderboards; Sci-fi shines into reality: the AI layout and cutting-edge applications of “Musk economics,” seen from the perspective of xAI; Yuanxiang XVERSE-65B: the largest open source model in China is here, with high performance and unconditionally free commercial use


Kai-Fu Lee officially announced the “world’s most powerful” open source large model: processing 400,000 Chinese characters at a time and topping both the Chinese and English leaderboards


Link: https://news.miracleplus.com/share_link/11465

The universe of open source large models has a heavyweight new member. This time it is the “Yi” series of open source large models launched by 01.AI (Lingyi Wanwu), the large model company founded by Kai-Fu Lee, chairman and CEO of Sinovation Ventures. It is reported that 01.AI was formally established at the end of March this year and began operations in June and July, with Dr. Kai-Fu Lee as founder and CEO.

On November 6, 01.AI officially released the “Yi” series of pre-trained open source large models, including the Yi-6B and Yi-34B versions, giving the open source large model community “a little shock.” According to the latest leaderboards of the Hugging Face English open source community platform and the C-Eval Chinese evaluation, the Yi-34B pre-trained model achieved state-of-the-art (SOTA) results on multiple benchmarks, becoming a “double champion” among global open source large models and beating open source competitors such as LLaMA 2 and Falcon.


Sci-fi shines into reality: the AI layout and cutting-edge applications of “Musk economics,” seen from the perspective of xAI


Link: https://news.miracleplus.com/share_link/11466

This past weekend, Musk released xAI’s product, Grok, another step in his vertically integrated AI layout. So what does “Grok” mean? The word first appeared in “Stranger in a Strange Land” by Robert A. Heinlein, one of the “Big Three” of 20th-century science fiction literature. “Grok” means to understand the nature of something. In Heinlein’s setting, “Grok” is a very rich and complex concept in the Martian language: it covers the physical act of “drinking water” while also implying a deep connection and understanding between man and the universe, denoting the complete perception, understanding, and assimilation of one being or thing by another.


Shocking! A record of GPT-4V’s optical illusion challenge: right where it should be wrong, and wrong where it shouldn’t be


Link: https://news.miracleplus.com/share_link/11467

GPT-4V took on a set of optical illusion images, and the results were “astonishing.” Why does this happen: why does it see through some illusions yet fail on others? First, for the color illusion image, netizens initially thought it was a prompt problem. However, some netizens pointed out that when we ask it which tree is brighter, if all pixels are averaged rigorously, GPT-4V’s answer is actually correct. As for its failure to recognize images that are only legible from a distance, some netizens believe this may be because GPT-4V reads images strictly from left to right. As for the question “Why does it sometimes get dizzy and misled by illusions just like humans, not like an intelligent AI at all?”, many people said this is not surprising: it is a training issue. Large models are trained on human data, human feedback, and human annotations, and will naturally make the same mistakes as humans.
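
The “rigorous pixel averaging” check the netizens describe is easy to reproduce. Below is a minimal sketch (mine, not from the article): it compares the mean brightness of two regions of an illusion image. The filename and region coordinates are hypothetical placeholders.

```python
# Compare mean brightness of two regions of an illusion image.
# Assumes Pillow and NumPy are installed; "illusion.png" and the
# region boxes are hypothetical placeholders.
import numpy as np
from PIL import Image

def mean_brightness(image: Image.Image, box: tuple[int, int, int, int]) -> float:
    """Average grayscale value of the region (left, top, right, bottom)."""
    region = image.convert("L").crop(box)
    return float(np.asarray(region, dtype=np.float64).mean())

img = Image.open("illusion.png")                     # hypothetical input file
left_tree = mean_brightness(img, (0, 0, 200, 400))   # placeholder boxes
right_tree = mean_brightness(img, (200, 0, 400, 400))
print(f"left={left_tree:.1f}, right={right_tree:.1f}")
# If the two means are (nearly) equal, GPT-4V's "same brightness" answer is
# literally correct even though human perception says otherwise.
```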


Yuanxiang XVERSE-65B: the largest open source model in China is here, with high performance and unconditionally free commercial use


Link: https://news.miracleplus.com/share_link/11468

Chinese vendors have previously open sourced a number of large models with 7 to 13 billion parameters; real-world results have emerged, and an open source ecosystem has begun to take shape. But as the complexity and data volume of tasks such as agents grow, the industry’s and community’s demand for larger models is becoming ever more urgent. Research shows that model performance keeps improving with more parameters and more high-quality training data, and the general consensus in the industry is that only past a threshold of 50 to 60 billion parameters do large models show emergent abilities and deliver strong performance across multiple tasks.

However, training a model of this magnitude is expensive and technically demanding, so such models are currently offered mainly as closed-source paid services. In the foreign open source ecosystem, benchmark models such as Llama 2-70B and Falcon-180B are only conditionally open source, with commercial caps on monthly active users or revenue, and they have obvious shortcomings in Chinese language capability due to a lack of training data. In addition, the recently announced US export ban on AI chips may further constrain the pace of China’s large model industry. The industry urgently needs a high-performance domestic large model to fill this ecological gap and provide stronger understanding, reasoning, and long-text generation capabilities for Chinese applications.

Against this backdrop, Yuanxiang (XVERSE) announced that it is open sourcing XVERSE-65B, a high-performance general-purpose large model with 65 billion parameters, and making it unconditionally free for commercial use, a first in the industry. In addition, its 13B model has been fully upgraded, raising the capability ceiling for smaller models. This lets a large number of small and medium-sized enterprises, researchers, and AI developers achieve “large model freedom” earlier: they can freely use, modify, or distill Yuanxiang’s large models according to their compute, resource constraints, and specific task requirements, driving breakthrough innovation in research and application.
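
For developers who want to try the model, a minimal sketch of loading an open checkpoint with the Hugging Face transformers library follows. The repo id `xverse/XVERSE-65B` and the hardware assumptions are mine, not stated in the article; a 65B-parameter model needs substantial GPU memory.

```python
# Minimal loading sketch; assumes transformers + accelerate are installed
# and that the model is published under "xverse/XVERSE-65B" (assumed id).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "xverse/XVERSE-65B"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,   # vendor checkpoints often ship custom model code
    torch_dtype="auto",       # load in the checkpoint's native precision
    device_map="auto",        # shard across the available GPUs
)

inputs = tokenizer("北京的景点:", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```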


GPU inference speed up 4x and a 256K context, the longest in the world: Infinigence-AI (Wuwen Xinqiong) refreshes the large model optimization record


Link: https://news.miracleplus.com/share_link/11469

Reducing the inference cost of LLMs is imperative, and improving inference speed has become a key path to doing so. The research community has proposed many techniques for accelerating LLM inference, including DeepSpeed, FlexGen, vLLM, OpenPPL, FlashDecoding, and TensorRT-LLM, each with its own strengths and weaknesses. Among them, FlashDecoding is a state-of-the-art method proposed last month by FlashAttention author Tri Dao and collaborators from the Stanford team; it greatly improves LLM inference speed by loading data in parallel and is considered to have great potential. At the same time, however, it introduces some unnecessary computational overhead, so there is still considerable room for optimization. To push further, a joint team from Infinigence-AI, Tsinghua University, and Shanghai Jiao Tong University recently proposed a new method, FlashDecoding++, which not only delivers stronger acceleration than its predecessor (GPU inference speed-ups of 2-4x) but, just as importantly, supports both NVIDIA and AMD GPUs. Its core ideas are to achieve true parallelism in attention computation through an asynchronous approach, and to optimize and accelerate the flat (“short and fat”) matrix multiplications of the decode stage.
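
The article only sketches the asynchronous-attention idea; the following NumPy toy is my illustration of one way it can work, not the team’s implementation. If every softmax block exponentiates against one shared, precomputed maximum (here the assumed constant `UNIFIED_MAX`) instead of its own running max, the blocks no longer need to synchronize before their results are combined.

```python
# Toy "asynchronous softmax" with a unified max value (my illustration,
# assuming attention logits stay below the chosen bound).
import numpy as np

UNIFIED_MAX = 8.0  # assumed a-priori bound on the logits

def partial_softmax(block: np.ndarray) -> tuple[np.ndarray, float]:
    """Exponentiate a block against the shared max; no cross-block sync."""
    e = np.exp(block - UNIFIED_MAX)
    return e, e.sum()

logits = np.random.randn(1024) * 2.0
blocks = np.split(logits, 8)                      # process blocks "in parallel"
parts = [partial_softmax(b) for b in blocks]
total = sum(s for _, s in parts)                  # one cheap final reduction
softmax = np.concatenate([e for e, _ in parts]) / total

# Matches the classic synchronized softmax (which must first find the max):
reference = np.exp(logits - logits.max())
assert np.allclose(softmax, reference / reference.sum())
```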


Significantly improving the versatility of user behavior representations: Ant’s new model wins the CIKM 2023 Best Applied Paper Award


Link: https://news.miracleplus.com/share_link/11470

CIKM 2023, an academic conference sponsored by the Association for Computing Machinery (ACM), was held in Birmingham, England. The conference attracted more than 8,000 academics and practitioners and selected its Best Applied Paper Award from 235 submissions to the applied research track; the award went to a research paper from Ant Group on user behavior representation modeling.


Peking University’s new embodied intelligence result: no training required, the robot moves flexibly by following spoken instructions


Link: https://news.miracleplus.com/share_link/11471

The latest embodied navigation result from Dong Hao’s team at Peking University is here: no additional mapping or training is required; you simply speak a navigation instruction such as “Walk forward across the room and walk through the pantry followed by the kitchen. Stand at the end of the kitchen.” The robot relies on active communication with an “expert team” composed of large models to complete a series of key visual-language navigation tasks such as instruction analysis, visual perception, completion estimation, and decision making.
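
A minimal sketch of the “expert team” pattern described above (hypothetical, not the team’s code): the agent routes each sub-task to a dedicated LLM prompt and takes the final expert’s output as its action. `ask_llm` is a stand-in for any chat-completion client, and the expert roles mirror the four tasks named in the article.

```python
# Hypothetical expert-team navigation loop; ask_llm is a stub standing in
# for a real chat-completion API call.
EXPERTS = [
    ("instruction analysis", "Break this navigation instruction into sub-goals:\n{ctx}"),
    ("visual perception", "List the landmarks visible in the current scene:\n{ctx}"),
    ("completion estimation", "Which sub-goals are already completed?\n{ctx}"),
    ("decision", "Output the robot's next action (e.g. 'move forward'):\n{ctx}"),
]

def ask_llm(prompt: str) -> str:
    # Stand-in for a real chat-completion call; returns a canned reply here.
    return f"<reply to: {prompt.splitlines()[0]}>"

def navigation_step(instruction: str, observation: str) -> str:
    context = f"Instruction: {instruction}\nObservation: {observation}"
    for role, template in EXPERTS:
        # Each expert sees the accumulated context and contributes its piece.
        context += f"\n[{role}] " + ask_llm(template.format(ctx=context))
    return context.rsplit("[decision] ", 1)[-1]  # the decision expert's answer

action = navigation_step(
    "Walk forward across the room and walk through the pantry followed by "
    "the kitchen. Stand at the end of the kitchen.",
    "You are at the room entrance; a pantry door is ahead.",
)
print(action)
```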
