Alibaba's Qwen2.5-VL AI model can run Booking.com on Android and book tickets from Chongqing to Beijing (video)

By: Nastya Bobkova | 28.01.2025, 05:26
Alibaba releases AI models that can control PCs and phones. Source: CrossML

Alibaba's Qwen team has announced the release of Qwen2.5-VL, a new line of AI models capable of performing a range of text and image analysis tasks.

Here's What We Know

The models can process files, understand videos, count objects in images, and control PCs, much like the model that powers OpenAI's Operator.

According to test results published by Alibaba, Qwen2.5-VL outperforms OpenAI's GPT-4, Anthropic's Claude 3.5, and Google's Gemini 2.0 Flash in video understanding, maths, document analysis, and question answering. The model can analyse graphs and charts, extract data from scans of invoices and forms, and "understand" videos lasting several hours.

Qwen2.5-VL test results. Illustration: Alibaba

An interesting feature of Qwen2.5-VL is the ability to interact with software on PCs and mobile devices. A video posted on X shows a Qwen2.5-VL model launching the Booking.com app on Android and booking a plane ticket from Chongqing to Beijing. However, in a test on a Linux desktop, the model proved to be less efficient, limiting itself to switching tabs.

The Qwen2.5-VL models also restrict the topics they will discuss, particularly in Qwen Chat, because China's internet regulator requires adherence to "core socialist values".

Qwen2.5-VL models are available for testing in the Qwen Chat app and on the Hugging Face platform. The Qwen2.5-VL-72B model has a special licence under which companies with more than 100 million monthly active users must obtain authorisation for commercial use.

Source: @_philschmid