Google I/O 2024: the Gemini era in search - AI will create drawings, videos and music, and warn of phone scammers

By: Viktor Tsyrfa | 15.05.2024, 09:21

At this year's Google I/O conference, the search giant demonstrated that it has no intention of giving up its lead in rolling out artificial intelligence. AI will be more tightly integrated into Google Photos, Google Camera, Gmail, search, and other apps, and Gemini will become the primary personal assistant, one you can talk to via text, voice, and camera.

Google I/O 2024: highlights

Gemini will now become the primary assistant on Android. There is no word on the fate of Google Assistant, but we all know Google's habits.
Image generation - given a detailed description, Gemini will create images, even of objects that do not exist. You can also create attractive captions.
Video generation from a text description. The video can be extended to the desired duration. A very promising tool for bloggers who want free themed video inserts.
Music AI Sandbox - a tool that generates a music sample from a text description or processes an input audio track.
Google Photos will now analyse the full context of photos. It will be possible not only to search images by a description of what they show, but also to build themed collections, for example, your workout progress over the year.
The Gemini 1.5 Pro API gives access to a language model with a context window of 1 million tokens for processing queries, so it can remember conversation history and take into account the largest context among competitors. The number of available tokens will be doubled in the summer. Gemini 1.5 Flash is a fast AI API for near real-time results.
AI for learning will organise and explain study material, using everyday examples.
Circle to Search - the feature announced with the Galaxy S24 Ultra, which then made its way to the Google Pixel, will soon appear on all Android smartphones. As a reminder, you only need to circle an object on the screen for Google to determine what is depicted there and launch a search for that object.
Deep context analysis. For example, if you ask Gemini to organise a trip, it will not only buy a ticket, but will also suggest a place to stay and check the weather forecast. Or, when you order shoes, it will be able to work out your size from your Gmail correspondence.
Gemini can be queried not only by text or voice, but also by camera. In the demonstration video, Gemini explained what an object in the frame was doing, analysed software code on the fly and explained what it does, solved puzzles, and remembered where an object it had seen in the frame earlier was located.
The Gemini era of search. Search will not only immediately produce AI-generated summaries and answers, but will also be able to draw up plans and tasks. You'll be able to enter complex queries, and Gemini will plot a route, check an establishment's rating, or create a menu for the week if needed, immediately producing a shopping list and the places where you can buy everything.
Gmail will now provide summaries of emails, and it will be possible to combine multiple emails and summarise them together. AI will also be able to reply to all the senders of those emails, add tasks to your calendar, or even create a Google Sheets spreadsheet of all the similar offers that have arrived in your inbox.
For teamwork, Gemini will search and analyse the information in all your work chats and reply to the right colleagues, even if you don't know which chat the relevant discussion is taking place in.
You can send Gemini a PDF file of up to 1,500 pages for analysis and ask for a brief summary. Or you can send a video up to 1 hour long, and Gemini will analyse all of its fragments and, if necessary, show exactly the segment you need.
You can ask Gemini why a certain device, such as a DJ console or a camera, is not working, and the AI will explain how to fix it. To do this, the AI recognises the device in the frame, its model, and the action the user is trying to perform, uses this data to run a search, analyses the results, and extracts exactly the information needed.
AI for programmers will generate code from a description of the task, create a database of photos of objects, and look for errors.
Protection against fraudulent calls. AI will analyse your conversations in real time, and if the caller behaves suspiciously, for example, asks for your bank details, it will play an alert sound and show a warning about possible fraud.
The artificial intelligence will support 35 languages and will have the largest context window among competitors.

This year, Sundar Pichai responded to the jokes about last year's Google I/O presentation by supplying the statistic himself: the abbreviation "AI" was uttered 120 times in the nearly two-hour presentation. And then he said it one more time.

Source: Google I/O
