ByteDance released a new multimodal artificial intelligence (AI) model last week. Called BAGEL, it is a vision language model (VLM) that can understand, generate, and edit images. The Beijing-based technology giant has open-sourced the AI model, and it is available for download via popular repositories such as GitHub and Hugging Face. The company claims that BAGEL is capable of free-form visual manipulation, multiview synthesis, and world navigation, making it more capable at image editing than typical open-source VLMs.
ByteDance's BAGEL Surpasses Gemini-2-exp in Image Editing
The GitHub listing page sheds more light on the ByteDance BAGEL model, including its weights and datasets. However, the company did not share details about post-training processes or the model's architecture. It is currently available under the permissive Apache 2.0 licence, which allows both academic and commercial use.
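For those who want to fetch the openly released weights programmatically rather than through the website, the short Python sketch below shows one way to do it with the huggingface_hub library. The repository name used here is an assumption for illustration only and should be checked against ByteDance's official listing before running.

```python
# Minimal sketch: downloading the open BAGEL weights from Hugging Face.
# The repo_id below is assumed for illustration; verify the exact name
# on ByteDance's GitHub or Hugging Face listing.

from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="ByteDance-Seed/BAGEL-7B-MoT",  # assumed repository ID
    local_dir="./bagel_weights",            # where the files are saved
)
print("Model files downloaded to:", local_dir)
```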
BAGEL is a multimodal AI model that accepts both text and images as input. The open-source VLM features a total of 14 billion parameters, of which seven billion are active at any one time. ByteDance claims the model was trained on large-scale interleaved multimodal data. This means that different types of data, such as text and images, were combined when fed to the AI system. As a result, the model learns from both modalities jointly, rather than separately.
This approach allows foundation models to gain cross-modal context. For example, if BAGEL is fed images and their captions together, it can better understand exactly what the text represents in the visual medium. This would lead to greater efficiency, according to the company.
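To make the idea of "combined" multimodal data more concrete, here is a purely illustrative Python sketch of an interleaved text-and-image training sample; it is a toy representation, not ByteDance's actual data format or training pipeline.

```python
# Illustrative sketch only: a toy "interleaved" multimodal sample, where
# text and image segments sit in one sequence so a model can learn
# cross-modal context jointly rather than from separate datasets.

from dataclasses import dataclass
from typing import List, Union

@dataclass
class TextSegment:
    text: str

@dataclass
class ImageSegment:
    path: str          # reference to an image file
    caption: str = ""  # optional caption paired with the image

Sample = List[Union[TextSegment, ImageSegment]]

# One training sample mixes both modalities in document order.
sample: Sample = [
    TextSegment("A red bicycle leans against a brick wall."),
    ImageSegment(path="bicycle.jpg", caption="red bicycle, brick wall"),
    TextSegment("The same bicycle photographed at night."),
    ImageSegment(path="bicycle_night.jpg"),
]

# In a real pipeline, text would be tokenised and images encoded, then
# concatenated into a single sequence seen by the model.
for segment in sample:
    kind = "text" if isinstance(segment, TextSegment) else "image"
    print(kind, segment)
```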
ByteDance also claims that the AI model offers better image editing capabilities than typical open-source VLMs. It can perform complex tasks such as adding objects to an image, removing or replacing them, transferring styles, and making free-form edits. The company claims that, with this capability, BAGEL can deliver much higher quality output through its world model.
A world model refers to an AI system's internal understanding of how the real world works visually. This includes the relationships between different objects, material context, and the effects of physical factors such as light, wind, rain, and gravity.
Based on internal testing, ByteDance claims that BAGEL outperformed Qwen2.5-VL-7B, a similarly sized model, in image understanding. It is also said to score higher than Janus-Pro-7B and Flux.1-dev on benchmarks. Additionally, it is said to beat Gemini-2-exp on benchmarks for image editing.
Those who want to try the AI model without running it locally can head to Hugging Face, where ByteDance has created a cloud-based interface to test its image understanding, generation, and editing capabilities.