Big Model Daily on November 22

News1years go (2023)update AIWindVane

165 0 0

[Big Model Daily on November 22] News Anthropic brings Claude 2.1

Anthropic brings Claude 2.1

https://news.miracleplus.com/share_link/11905

Anthropic: “Our latest model, Claude 2.1, is now available via API in our console and powers our Claude AI chat experience. Claude 2.1 delivers key feature improvements for enterprises, including industry-leading 200K Token context window, a significant reduction in model hallucination rates, system prompts and our new beta feature: Tool Usage. We are also updating our pricing to improve cost efficiency for customers across models.”

Stable Video Diffusion is here, code weight is online

https://news.miracleplus.com/share_link/11906

Stability AI, a well-known company for AI drawing, has finally entered the AI-generated video industry. On Tuesday this week, Stable Video Diffusion, a video generation model based on Stable Diffusion, came out, and the AI community immediately started discussing it. Now you can generate a video of a few seconds based on an original still image.

Based on Stability AI’s original Stable Diffusion graph model, Stable Video Diffusion has become one of the few video generation models in the open source or commercial ranks.

New benefits for users who subscribe (monthly fee: $16), X (Twitter) has invited them to try out the Grok AI experience

https://news.miracleplus.com/share_link/11907

Source @nima_owji revealed that X (Twitter) has currently invited some users to experience the new Grok AI chat experience in its X application. Invited users can click to enter from the left entrance of the X application. Judging from the exposed screenshots, the Grok AI chat interface is very simple. If the invited user has not purchased a Premium + subscription, a “Get Grok with Premium+” prompt will pop up, recommending the user to purchase a Premium + subscription at a price of US$16 per month.

Microsoft will launch Windows Copilot for Chinese enterprise and education users on December 1

https://news.miracleplus.com/share_link/11908

According to news released by Microsoft, Microsoft will launch the Web AI chat function Copilot, formerly Bing Chat Enterprise, for enterprises and educational institutions in mainland China on December 1, 2023. Enterprises and educational institutions can use Windows Copilot, Bing Chat Enterprise, and Copilot in Microsoft Edge. The AI model supports the Internet to obtain data.

Pinduoduo enters the large model industry

https://news.miracleplus.com/share_link/11909

Pinduoduo has established a large model team with dozens of people located in Shanghai. The large model team will explore the application of large models in Pinduoduo customer service, dialogue and other scenarios, and will expand to its cross-border e-commerce platform TEMU intelligent customer service, search, recommendation and other business scenarios. Currently, the entire process is still in the research and development stage. Industry analysts believe that Pinduoduo’s large-scale model will serve its e-commerce system, including applications in AI shopping guides and intelligent generation of product images.

Nvidia partners with Roche Pharmaceuticals to develop AI platform

https://news.miracleplus.com/share_link/11910

NVIDIA announced that it will cooperate with Genentech, a subsidiary of Roche Pharmaceuticals, to conduct an AI platform research to accelerate drug discovery and development. Both will build AI models on NVIDIA DGX Cloud.

The PyTorch team rewrote the “split everything” model, which is 8 times faster than the original implementation

https://news.miracleplus.com/share_link/11911

Generative AI has developed rapidly since the beginning of the year. But many times, we have to face a difficult problem: how to speed up the training, reasoning, etc. of generative AI, especially when using PyTorch. In this article, researchers from the PyTorch team provide us with a solution. The article focuses on how to use pure native PyTorch to accelerate generative AI models. It also introduces new PyTorch features and practical examples of how to combine them. What was the result? The PyTorch team said they rewrote Meta’s “Split Everything” (SAM) model, resulting in code that is 8 times faster than the original implementation without losing accuracy, all optimized using native PyTorch.

Using deep hypnosis to induce LLM “jailbreak”, Hong Kong Baptist University initially explores a credible large language model

https://news.miracleplus.com/share_link/11912

Although LLM (Large Language Model) has achieved great success in various applications, it is also susceptible to the induction of some prompts, thereby overcoming the built-in safety protection of the model to provide some dangerous/illegal content, that is, Jailbreak. Deeply understanding the principles of this type of Jailbreak and strengthening related research can reversely promote people’s attention to the security protection of large models and improve the defense mechanism of large models. Different from the previous use of search optimization or computationally expensive inference methods to generate jailbreakable prompts, this article is inspired by the Milgram experiment and proposes a lightweight jailbreak method from a psychological perspective: DeepInception. Deep Hypnosis LLM makes it a jailbreaker and allows it to circumvent built-in security protections on its own.

Covering more than 500 studies and more than 50 models, a review of large code models is here

https://news.miracleplus.com/share_link/11913

Language modeling has made significant progress in recent years with the emergence of pre-trained Transformers such as BERT and GPT. As large language models (LLMs) scale to tens of millions of parameters, LLMs are beginning to show signs of general artificial intelligence, and their applications are no longer limited to text processing. Codex first demonstrated the excellent capabilities of LLM in code processing, and later commercial products such as GitHub Copilot and open source code models such as StarCoder and Code LLaMA appeared. However, the use of pretrained Transformers for code processing dates back to before decoder-only autoregressive models became mainstream, and there is yet to be a complete review in this area. A research team from Shanghai Jiao Tong University and Ant Group has filled this gap. They provide a panoramic summary of language models for code, covering more than 50 models, more than 30 downstream tasks, and more than 500 related research results. They classify code language models from giant models trained on a general domain to tiny models trained specifically for code understanding or generation tasks.