Big Model Daily, February 22


[Big Model Daily, February 22] The technology behind Sora's breakout, in an article summarizing the latest development directions of diffusion models; the open-source large-model throne changes hands as Google's Gemma enters the arena, runs on a laptop, and is licensed for commercial use; NVIDIA is making 570 million yuan a day, and Jensen Huang is lying on a money printer; will AI dark horse Groq upend NVIDIA? An interpretation of LPU performance and cost.

The technology behind Sora's breakout: an article summarizing the latest development directions of diffusion models

To enable machines to imitate human imagination, deep generative models have made significant progress. These models can create realistic samples, and the diffusion model in particular performs well across many domains. Diffusion models sidestep the limitations of other model families: the posterior-alignment problem of VAEs, the training instability of GANs, the computational cost of EBMs, and the network constraints of normalizing flows (NFs). As a result, diffusion models have attracted wide attention in fields such as computer vision and natural language processing. A diffusion model consists of two processes: a forward process that gradually transforms data into a simple prior distribution (typically Gaussian noise), and a reverse process that undoes this transformation, using a trained neural network to simulate the corresponding differential equation and generate data. Compared with other models, diffusion models offer a more stable training objective and better generation quality.
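As a toy illustration of the forward process described above (a minimal NumPy sketch under a common linear noise schedule, not code from any particular paper): given a variance schedule beta_t, a noised sample x_t can be drawn from x_0 in closed form.

```python
import numpy as np

def forward_diffusion(x0, t, betas, rng):
    # Closed-form forward process of a DDPM-style diffusion model:
    #   x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps
    alphas = 1.0 - betas
    alpha_bar_t = np.cumprod(alphas)[t]
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps

# Linear noise schedule over 1000 steps (a common, illustrative choice)
betas = np.linspace(1e-4, 0.02, 1000)
rng = np.random.default_rng(0)
x0 = rng.standard_normal(8)            # toy "data" sample
xT = forward_diffusion(x0, 999, betas, rng)
# By the final step, x_T is close to a standard normal prior; the reverse
# process trains a neural network to denoise step by step back toward x_0.
```

Because alpha_bar_t shrinks toward zero as t grows, the data term vanishes and only noise remains, which is exactly the "simple prior distribution" the reverse process starts from.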

Model fusion, mixture-of-experts, smaller LLMs: several papers to understand the direction of LLM development in 2024

Over the course of 2023, large language models (LLMs) grew rapidly in both capability and complexity. Looking ahead to open-source and research progress in 2024, we seem to be entering a welcome new phase: making models better without increasing model size, and even making models smaller. Now that the first month of 2024 has passed, it is a good time to take stock of how the new year began. Recently, AI researcher Sebastian Raschka released a report introducing four important papers related to this new stage. Their research topics are briefly summarized as follows:

1. Weight averaging and model fusion can combine multiple LLMs into a single better model, and this new model does not have the typical drawbacks of traditional ensemble methods, such as higher resource requirements.

2. Proxy-tuning technology can improve the performance of an existing large LLM by using two small LLMs. This process does not require changing the weights of the large model.

3. By combining multiple small modules into a mixture-of-experts (MoE) model, the resulting LLM can match or exceed its larger counterparts in both effectiveness and efficiency.

4. Pre-training a small 1.1B parameter LLM reduces development and operational costs and opens new possibilities for educational and research applications.
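On the first point, the simplest form of weight averaging can be sketched as follows (a minimal illustration in which plain NumPy arrays stand in for model parameters; the helper name `average_weights` is hypothetical, not from the papers):

```python
import numpy as np

def average_weights(state_dicts, weights=None):
    """Uniform (or weighted) average of parameter dicts: the core idea
    behind weight averaging / "model soup"-style model fusion."""
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    keys = state_dicts[0].keys()
    return {k: sum(w * sd[k] for w, sd in zip(weights, state_dicts))
            for k in keys}

# Two toy "models" with identical architectures (same parameter keys)
m1 = {"layer.w": np.array([1.0, 2.0]), "layer.b": np.array([0.0])}
m2 = {"layer.w": np.array([3.0, 4.0]), "layer.b": np.array([2.0])}
fused = average_weights([m1, m2])
# fused["layer.w"] -> [2.0, 3.0], fused["layer.b"] -> [1.0]
```

Unlike a traditional ensemble, the fused result is a single model of the original size, so inference cost does not grow with the number of source models.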

Inspired by ChatGPT, combining Transformer and RL-MCTS for de novo drug design

The discovery of novel therapeutic compounds through de novo drug design is a key challenge in pharmaceutical research. Traditional drug-discovery methods are often resource-intensive and time-consuming, prompting scientists to explore innovative ways to harness deep learning and reinforcement learning. Here, researchers at Chapman University in the United States have developed a novel drug-design method called drugAI that combines an encoder-decoder Transformer architecture with reinforcement-learning-guided Monte Carlo Tree Search (RL-MCTS), speeding up the drug-discovery process while ensuring the generated small molecules have drug-like properties and strong binding affinity for their targets. Compared with two existing baseline methods, drugAI generated compounds with significantly improved validity and drug-likeness, and it ensures that the generated molecules exhibit strong binding affinity for their respective targets.
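As a generic illustration of the MCTS ingredient (not drugAI's actual code), tree-search methods of this kind typically rank child nodes by a UCT score that balances exploiting high average reward against exploring rarely visited branches:

```python
import math

def uct_score(parent_visits, child_visits, child_value_sum, c=1.4):
    """UCT selection score used in Monte Carlo Tree Search."""
    if child_visits == 0:
        return float("inf")          # explore unvisited children first
    exploit = child_value_sum / child_visits   # mean reward so far
    explore = c * math.sqrt(math.log(parent_visits) / child_visits)
    return exploit + explore

# Toy example: pick the child with the highest UCT score
children = [
    {"visits": 10, "value_sum": 7.0},   # well explored, decent mean
    {"visits": 2,  "value_sum": 1.9},   # barely explored, high mean
]
parent_visits = 12
best = max(children,
           key=lambda ch: uct_score(parent_visits,
                                    ch["visits"], ch["value_sum"]))
# The under-explored child wins thanks to its exploration bonus
```

In a drug-design setting, the "reward" backed up through the tree would come from scoring generated molecules for validity, drug-likeness, and binding affinity.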

The open-source large-model throne changes hands! Google's Gemma enters the arena, runs on a laptop, and can be used commercially

The open-source large-model field has welcomed a major new player. Google has launched a new open model series, "Gemma". Compared with Gemini, Gemma is more lightweight while remaining free to use, and its model weights are open and licensed for commercial use. This release includes models at two weight scales, Gemma 2B and Gemma 7B, each available in pre-trained and instruction-tuned versions. Those who want to use it can access it through Kaggle, Google's Colab notebooks, or Google Cloud. Gemma also landed on Hugging Face and HuggingChat right away, so everyone can try out its generation capabilities.

Nvidia is making 570 million yuan a day, and Jensen Huang is lying on the money printer

Nvidia’s latest financial report is out, with three consecutive record highs:

1. Revenue for Q4 of fiscal year 2024 reached US$22.1 billion (net profit: US$12.2 billion), up 22% quarter-over-quarter and 265% year-over-year.

2. Data-center revenue, the leading segment, reached US$18.4 billion, up 27% from Q3 and 409% year-over-year.

3. Full-year fiscal 2024 figures are also in: revenue of US$60.9 billion (approximately 438.4 billion yuan), up 126% from last year, and net profit of US$29.7 billion (approximately 213.6 billion yuan), equivalent to roughly 5.7 "small goals" per day.
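The "5.7 small goals a day" figure is easy to sanity-check. A "small goal" is internet slang for 100 million yuan; the sketch below assumes a rough exchange rate of 7.0 yuan per US dollar and a 365-day year:

```python
# Back-of-the-envelope check of "5.7 small goals per day"
net_profit_usd = 29.7e9      # fiscal-2024 net profit from the report
cny_per_usd = 7.0            # assumed rough exchange rate
per_day_yuan = net_profit_usd * cny_per_usd / 365
small_goals_per_day = per_day_yuan / 1e8   # 1 "small goal" = 100M yuan
print(round(small_goals_per_day, 1))       # -> 5.7
```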

Will AI dark horse Groq subvert NVIDIA? An interpretation of LPU performance and cost

Groq is a technology company founded in 2016 by Jonathan Ross, the creator of Google’s first tensor processing unit (TPU). His founding philosophy stemmed from the idea that chip design should draw inspiration from software-defined networking (SDN). On February 13, 2024, Groq decisively won the latest LLM benchmark test, defeating eight participants on key performance indicators such as latency and throughput: Groq’s processing throughput reached four times that of other inference services, while it charges less than one third of what Mistral itself does.

The head of Samsung’s mobile division reveals Galaxy AI development plans, which will be expanded to wearable devices

TM Roh, head of Samsung’s mobile division, recently revealed the company’s future plans for artificial intelligence (AI) and how to expand its application scope. Roh said Samsung’s next plan is to expand the application scope of Galaxy AI to more devices and services, including wearables. He revealed plans to bring Galaxy AI capabilities to “select” Galaxy wearables in the “near future.”
