December 25 Large Model Daily Collection


[December 25 Large Model Daily Collection] Built for AI acceleration: Intel’s Xeon can now run 20-billion-parameter large models; where does Mamba, the Transformer challenger, come from? The author’s doctoral thesis traces the evolution of SSMs; Musk responds to Grok going off the rails: it was led astray by netizens and will keep improving during the beta phase; an elegant fusion of Softmax attention and linear attention: Agent Attention brings a new upgrade to attention; is OCR finished? Megvii proposes a multimodal large model for document-level OCR that supports Chinese and English and is open source


Built for AI acceleration: Intel’s Xeon can now run 20-billion-parameter large models


Link: https://news.miracleplus.com/share_link/14275

Intel’s server CPUs have just taken another evolutionary step: the 5th Gen Intel® Xeon® Scalable processors are officially released. Intel says the new processor is built for AI acceleration and delivers stronger performance.


Where does Mamba, the Transformer challenger, come from? The author’s doctoral thesis traces the evolution of SSMs


Link: https://news.miracleplus.com/share_link/14276

Recently, a piece of research called “Mamba” has shown it can rival or even beat the Transformer at language modeling. The credit goes to a new architecture proposed by the authors, the selective state space model, which is a simple generalization of the S4 architecture (Structured State Spaces for Sequence Modeling) previously led by Albert Gu, an author of the Mamba paper.

After the Mamba paper was released, many researchers grew curious about SSMs (state space models), S4, and related work. One researcher said he would read all of these papers on a plane. Albert Gu offered better advice: his doctoral thesis already organizes all of these developments, and reading it may be a more systematic way in.
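For reference, the basic (non-selective) state space model that this line of work builds on can be written compactly in the standard notation used by the S4 and Mamba papers:

```latex
% Continuous-time SSM: hidden state h(t), input x(t), output y(t)
h'(t) = A\,h(t) + B\,x(t), \qquad y(t) = C\,h(t)

% Discretized (e.g., via zero-order hold) for sequence modeling:
h_t = \bar{A}\,h_{t-1} + \bar{B}\,x_t, \qquad y_t = C\,h_t
```

Mamba’s “selective” SSM makes B, C, and the discretization step depend on the current input x_t, which is what lets the model decide what to remember or forget along the sequence.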


Musk responds to Grok going off the rails: it was all led astray by netizens, and it will keep improving during the beta phase


Link: https://news.miracleplus.com/share_link/14277

Neutrality and the courage to speak out are the selling points that Grok, the “Musk version of ChatGPT,” has been promoting.

But recently, netizens have found that its answers are becoming more and more biased, and they couldn’t help posting complaints, even tagging @Musk himself.

Grok’s behavior also caught the attention of Musk himself.

He complained that the Internet data used to train Grok is too messy and overrun with “woke nonsense,” which amounts to indirectly conceding the netizens’ point.

Musk also said that the current Grok is only a beta version and will get better in the future.


An elegant fusion of Softmax attention and linear attention: Agent Attention brings a new upgrade to attention


Link: https://news.miracleplus.com/share_link/14278

Combining the advantages of Softmax attention and linear attention, the agent attention module has the following characteristics:

(1) Low computational complexity and strong expressive power. Previous research usually treats Softmax attention and linear attention as two distinct paradigms and tries to address the problems and limitations of each separately. Agent Attention elegantly blends the two forms of attention, naturally inheriting their advantages: low computational complexity together with high model expressiveness.

(2) A larger receptive field. Thanks to its linear computational complexity, Agent Attention can adopt a larger receptive field without increasing model computation. For example, the window size of Swin Transformer can be expanded from 7² to 56², i.e., directly to global self-attention, without introducing any additional computation; a sketch of the mechanism follows.
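Below is a minimal single-head sketch of the idea, assuming, as one common choice, that the agent tokens are obtained by pooling the queries; the function name agent_attention and the pooling step are our illustrative assumptions, not the authors’ implementation. A small set of agents first summarizes all keys and values with Softmax attention, then the queries read from the agents with a second Softmax attention, so the cost is linear in sequence length.

```python
import torch
import torch.nn.functional as F

def agent_attention(q, k, v, num_agents=49):
    """Single-head agent attention sketch (hypothetical helper).

    q, k, v: tensors of shape (batch, n, d).
    Cost is O(n * num_agents * d) instead of O(n^2 * d) for full
    Softmax attention, because each Softmax step involves only the
    num_agents agent tokens on one side.
    """
    b, n, d = q.shape
    scale = d ** -0.5
    # Agent tokens: pool the n queries down to num_agents tokens (b, m, d).
    a = F.adaptive_avg_pool1d(q.transpose(1, 2), num_agents).transpose(1, 2)
    # Step 1: agents aggregate the whole sequence via Softmax attention (b, m, d).
    agent_v = F.softmax(a @ k.transpose(1, 2) * scale, dim=-1) @ v
    # Step 2: queries read from the agents via Softmax attention (b, n, d).
    return F.softmax(q @ a.transpose(1, 2) * scale, dim=-1) @ agent_v

# Example: a 56x56 window (3136 tokens), far beyond a 7x7 Swin window,
# still runs in time linear in the token count.
x = torch.randn(2, 56 * 56, 64)
out = agent_attention(x, x, x)
print(out.shape)  # torch.Size([2, 3136, 64])
```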


Is OCR finished? Megvii proposes a multimodal large model that supports document-level OCR in both Chinese and English, and it is open source!


Link: https://news.miracleplus.com/share_link/14279

Vary has shown great potential and a very high ceiling: OCR no longer requires lengthy pipelines and can be done end to end, with the output format, such as LaTeX, Word, or Markdown, controlled by the user’s prompt.

Thanks to the large model’s very strong language prior, this architecture can also avoid typo-prone words in OCR, such as “leverage” and “dipole.” For blurry documents, the language prior is likewise expected to deliver stronger OCR results.

As soon as the project was released, it drew wide attention, with some netizens exclaiming that it “kills the game!”
