December 7 Big Model Daily Collection
[December 7 Big Model Daily Collection] Google’s heavy hitter is finally here: the largest Gemini gets a stunning release, a true rival to GPT-4, in three versions, and it runs directly on mobile phones; an open-source, commercially usable model delivers 2.5x the performance of Stable Diffusion; Meta launches a standalone AI image generator, currently free but supporting only English prompts
Small models can also “segment anything”: Meta improves SAM with only 5% of the original parameters
Link: https://news.miracleplus.com/share_link/12601
In the field of computer vision, the Segment Anything Model (SAM) was one of the most closely watched research developments of 2023. SAM’s key feature is a prompt-based Vision Transformer (ViT) model, trained on the SA-1B vision dataset of more than 1 billion masks from 11 million images, which can segment any given target in an image. This capability makes SAM a foundation model for vision, with application value in fields beyond vision as well. In a recent study, Meta researchers proposed a further improvement: SAM-leveraged masked image pre-training (SAMI), which combines the MAE pre-training method with the SAM model to obtain high-quality pre-trained ViT encoders.
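For readers unfamiliar with the prompt-based workflow the summary describes, below is a minimal sketch using Meta’s publicly released segment-anything package, where the image is encoded once and then segmented from a single point prompt; the checkpoint and image file names are placeholders.

```python
# A minimal sketch of SAM's prompt-based segmentation with the
# segment-anything package (pip install segment-anything).
# Checkpoint and image paths below are assumptions, not from the article.
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Load a pretrained ViT-H SAM checkpoint (file name is an assumption).
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth")
predictor = SamPredictor(sam)

# The heavy ViT image encoder runs once per image; prompts are cheap to decode.
image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# A single foreground point prompt (label 1) asks SAM to segment the object
# at pixel (500, 375); multimask_output returns several candidate masks.
masks, scores, logits = predictor.predict(
    point_coords=np.array([[500, 375]]),
    point_labels=np.array([1]),
    multimask_output=True,
)
print(masks.shape, scores)  # boolean masks of shape (3, H, W) with quality scores
```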
Bridging the dimensional wall between 2D and 3D generation, X-Dreamer achieves high-quality text-to-3D generation
Link: https://news.miracleplus.com/share_link/12602
This article introduces a framework called X-Dreamer, built around two key innovations: Camera-Guided Low-Rank Adaptation (CG-LoRA) and an Attention-Mask Alignment (AMA) loss. It bridges the domain gap between text-to-2D and text-to-3D generation and achieves high-quality 3D results.
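As a point of reference for CG-LoRA, here is a minimal sketch of the standard LoRA update it builds on: a frozen pretrained weight is adapted through a trainable low-rank correction. X-Dreamer’s camera-conditioned generation of the low-rank factors is not reproduced here, and the class name, rank, and shapes below are illustrative assumptions.

```python
# Plain LoRA adapter around a frozen linear layer (illustrative sketch only;
# CG-LoRA additionally conditions the low-rank factors on camera parameters).
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 1.0):
        super().__init__()
        self.base = base                      # frozen pretrained projection
        for p in self.base.parameters():
            p.requires_grad = False
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)    # start as an identity update
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus low-rank trainable correction.
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

layer = LoRALinear(nn.Linear(768, 768), rank=4)
print(layer(torch.randn(2, 16, 768)).shape)   # torch.Size([2, 16, 768])
```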
One photo smoothly replaces the protagonist of a video, and it works no matter how large the motion is | Meta & National University of Singapore
Link: https://news.miracleplus.com/share_link/12603
Post-production editors will be ecstatic: the main character of a video can now be replaced with just one picture, and the result is still smooth! Let’s take a look at this new video editing model, called VideoSwap. Whether the task is style transfer or subject/background replacement, the main challenge of this kind of video editing is extracting the motion trajectory from the source video and transferring it onto the new elements while ensuring temporal consistency. Most previous models (using approaches such as encoding source motion, attention maps, or optical flow) have fallen short, either doing poorly on temporal consistency or strictly limiting shape changes. VideoSwap instead proposes using a small number of semantic points to describe an object’s motion trajectory.
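To make the semantic-point idea concrete, here is an illustrative sketch (not VideoSwap’s actual code) of describing motion with a handful of tracked points and penalizing an edited video whose corresponding points drift from the source trajectories; the tracker is stubbed out, and all function names and shapes are assumptions.

```python
# Illustrative sketch of semantic-point motion transfer (assumed names/shapes).
import torch

def track_semantic_points(frames: torch.Tensor, init_points: torch.Tensor) -> torch.Tensor:
    """Placeholder tracker: frames (T, C, H, W), init_points (K, 2) -> (T, K, 2).
    In practice, trajectories would come from a point tracker or user clicks."""
    T = frames.shape[0]
    return init_points.unsqueeze(0).repeat(T, 1, 1)

def trajectory_loss(src_traj: torch.Tensor, gen_traj: torch.Tensor) -> torch.Tensor:
    """Mean Euclidean distance between source and edited-video point trajectories."""
    return ((src_traj - gen_traj) ** 2).sum(-1).sqrt().mean()

frames = torch.rand(16, 3, 256, 256)                              # 16 source frames
points = torch.tensor([[64.0, 80.0], [128.0, 100.0], [190.0, 150.0]])  # K = 3 semantic points
src = track_semantic_points(frames, points)
gen = src + 2.0 * torch.randn_like(src)    # stand-in for the edited video's trajectories
print(trajectory_loss(src, gen))           # scalar that would guide temporal consistency
```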