Alibaba's Qwen2.5-VL AI model can run Booking.com on Android and book tickets from Chongqing to Beijing (video)
Alibaba's Qwen team has released Qwen2.5-VL, a new family of AI models that can handle a range of text and image analysis tasks.
Here's What We Know
The models can process files, understand videos, count objects in images, and control PCs, a capability similar to that of the model behind OpenAI's Operator.
According to Alibaba's benchmark results, Qwen2.5-VL outperforms OpenAI's GPT-4o, Anthropic's Claude 3.5 Sonnet, and Google's Gemini 2.0 Flash in video understanding, maths, document analysis, and question answering. The model can analyse graphs and charts, extract data from scans of invoices and forms, and "understand" videos lasting several hours.
Qwen2.5-VL test results. Illustration: Alibaba
A notable feature of Qwen2.5-VL is its ability to interact with software on PCs and mobile devices. A video posted on X shows a Qwen2.5-VL model launching the Booking.com app on Android and booking a plane ticket from Chongqing to Beijing. In a test on a Linux desktop, however, the model proved less capable and did little more than switch tabs.
Don't Miss @Alibaba_Qwen 2.5 VL! Despite all the Deepseek Hype, Qwen just dropped the best open Multimodal! Qwen 2.5 VL is a Vision Language Model that can control your computer, similar to the @OpenAI operator, extract structured information from charts, and more!! TL;DR; 3️⃣... pic.twitter.com/GeEGVdl0tI
- Philipp Schmid (@_philschmid) 27 January 2025
The Qwen2.5-VL models also restrict the topics they will discuss, particularly in Qwen Chat, because China's internet regulator requires adherence to "core socialist values".
LMAO Qwen 2.5 VL can perform Computer Use, out of the box, taking on OpenAI Operator HEAD ON! pic.twitter.com/lwMECXzNSu
- Vaibhav (VB) Srivastav (@reach_vb) 27 January 2025
Qwen2.5-VL models are available for testing in the Qwen Chat app and on the Hugging Face platform. The Qwen2.5-VL-72B model is released under a special licence: companies with more than 100 million monthly active users must obtain authorisation for commercial use.
Source: @_philschmid