Google announces new PaliGemma 2 model for image and text processing

By: Nastya Bobkova | 05.12.2024, 22:38

Following the announcement of Gemma 2 at I/O 2024 in May, Google is introducing PaliGemma 2, a new version of its open-source PaliGemma model for image and text processing.

Here's What We Know

The first version of PaliGemma launched in May and was used for tasks such as captioning images and videos, recognising text in images, detecting and segmenting objects, and answering questions about visual content.

PaliGemma 2 offers a "long caption" feature that generates detailed descriptions of images, taking into account actions, emotions and the overall atmosphere of the scene. The model comes in three sizes (3B, 10B, and 28B parameters) and several input resolutions.

Text recognition and table structure analysis in documents have also been improved. According to Google, PaliGemma 2 also performs strongly at recognising chemical formulas and musical scores, at spatial reasoning, and at generating reports based on X-ray images.

Google notes that PaliGemma 2 can serve as a drop-in replacement for the earlier version of the model, delivering performance improvements without the need for major code changes.
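
To illustrate what swapping model versions might look like in practice, here is a minimal sketch using the Hugging Face transformers library. It is not taken from Google's announcement. The checkpoint IDs, the `caption` helper and the "caption en" prompt are assumptions for illustration; check the model cards for the exact names and prompts, and note that the checkpoints typically require accepting a licence on Hugging Face before download.

```python
# Checkpoint IDs below are assumptions based on Hugging Face Hub naming;
# verify them against the model cards before use.
PALIGEMMA_1 = "google/paligemma-3b-pt-224"
PALIGEMMA_2 = "google/paligemma2-3b-pt-224"


def caption(image, model_id=PALIGEMMA_2, max_new_tokens=64):
    """Caption a PIL image with the given PaliGemma checkpoint (hypothetical helper)."""
    # Imported lazily so this module can be imported without the heavy dependencies.
    from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

    processor = AutoProcessor.from_pretrained(model_id)
    model = PaliGemmaForConditionalGeneration.from_pretrained(model_id)

    # "caption en" is a captioning prompt assumed here for illustration.
    inputs = processor(text="<image>caption en", images=image, return_tensors="pt")
    prompt_len = inputs["input_ids"].shape[-1]
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens and decode only the newly generated text.
    return processor.decode(output[0][prompt_len:], skip_special_tokens=True)
```

Under these assumptions, moving from the first version to the second comes down to changing `model_id`, for example `caption(img, model_id=PALIGEMMA_1)` versus `caption(img)`, while the rest of the calling code stays the same.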

PaliGemma 2 models and code are already available on Kaggle, Hugging Face, and Ollama.

Source: 9to5Google