Apple researchers are developing an advanced AI system to improve voice assistants

By: Bohdan Kaminskyi | 03.04.2024, 00:22

Illustration: Jimmy Jin/Unsplash

A team of Apple researchers has unveiled a new artificial intelligence system called ReALM (Reference Resolution As Language Modeling). The system can interpret ambiguous references to on-screen objects and take conversational and background context into account, enabling more natural interaction with voice assistants.

Here's What We Know

ReALM uses large language models to recast the complex task of resolving on-screen references as a language modeling problem. This approach has shown significant performance gains over existing methods.
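The paper itself contains no code, but the core idea of treating reference resolution as a text-generation task can be sketched roughly as follows. The entity format, prompt wording, and the `query_llm` helper are illustrative assumptions, not Apple's implementation:

```python
# Hypothetical sketch: reference resolution framed as a language-modeling task.
# The entity list, prompt format and query_llm() helper are assumptions for
# illustration; they are not taken from the ReALM paper.

def build_prompt(user_request: str, entities: list[dict]) -> str:
    """Serialize candidate entities and ask the model which one is referenced."""
    lines = [f"{i}. {e['type']}: {e['text']}" for i, e in enumerate(entities, 1)]
    return (
        "Candidate entities:\n" + "\n".join(lines) + "\n\n"
        f"User request: {user_request}\n"
        "Which entity number does the request refer to?"
    )

entities = [
    {"type": "phone_number", "text": "+1 415 555 0100"},
    {"type": "address", "text": "1 Apple Park Way, Cupertino"},
]
prompt = build_prompt("Call that number", entities)
# answer = query_llm(prompt)  # assumed helper that sends the prompt to an LLM
```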

"Being able to understand context, including references, is essential for a conversational assistant" Apple researchers noted. They demonstrated that ReALM outperforms even GPT-4 on this task.

A key innovation of ReALM is reconstructing the screen as a textual representation that conveys the visual layout and the location of on-screen objects. Combined with fine-tuning of the language models, this yields significant improvements in screen reference resolution.
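As a rough illustration of what such a screen-to-text reconstruction might look like, the sketch below groups detected UI elements into rows by vertical position and reads them left to right. The element format and grouping tolerance are assumptions for illustration, not details from the paper:

```python
# Rough sketch of turning detected on-screen elements into a flat text layout.
# The element dictionaries and row_tolerance value are illustrative assumptions.

def screen_to_text(elements: list[dict], row_tolerance: int = 10) -> str:
    """Group elements into rows by vertical position, then read left to right."""
    rows: list[list[dict]] = []
    for el in sorted(elements, key=lambda e: (e["y"], e["x"])):
        if rows and abs(el["y"] - rows[-1][0]["y"]) <= row_tolerance:
            rows[-1].append(el)
        else:
            rows.append([el])
    return "\n".join(" ".join(el["text"] for el in row) for row in rows)

elements = [
    {"text": "Contact:", "x": 10, "y": 100},
    {"text": "555-0100", "x": 120, "y": 102},
    {"text": "Directions", "x": 10, "y": 160},
]
print(screen_to_text(elements))
# Contact: 555-0100
# Directions
```

The resulting text block can then be fed to the language model alongside the user's request, which is the general shape of the approach the article describes.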


ReALM understands references to on-screen objects, enabling more natural interaction with voice assistants

The research highlights the potential for specialized language models to solve specific problems in production systems where huge end-to-end models are difficult to use. Apple's publication signals its continued investment in improving the usability of Siri and other products.

However, the authors caution that automated screen analysis has its limits. More complex visual tasks are likely to require computer vision and multimodal approaches.

While competitors are aggressively adopting generative AI, Apple is trying to close the gap in this rapidly evolving field. The company is expected to unveil new features based on large language models and artificial intelligence at the upcoming WWDC conference.

Source: VentureBeat