OpenAI has introduced the world to its latest powerful AI model, GPT-4, and refreshingly, the first partnership built on its new capabilities is aimed at helping people with visual impairments. Be My Eyes, which lets blind and low-vision folks ask sighted people to describe what their phone sees, is getting a “virtual volunteer” that offers AI-powered help at any time.
We’ve written about Be My Eyes plenty of times since it was started in 2015, and of course the rise of computer vision and other tools has figured prominently in its story of helping the visually impaired more easily navigate everyday life. But the app itself can only do so much, and a core feature was always being able to get a helping hand from a volunteer, who could look through your phone’s camera view and give detailed descriptions or instructions.
The new version of the app is the first to integrate GPT-4’s multimodal capability, which is to say its ability to not just chat intelligibly, but to inspect and understand images it’s given:
Users can send images via the app to an AI-powered Virtual Volunteer, which will answer any question about that image and provide instantaneous visual assistance for a wide variety of tasks.
For example, if a user sends a picture of the inside of their refrigerator, the Virtual Volunteer will not only be able to correctly identify what’s in it, but also extrapolate and analyze what can be prepared with those ingredients. The tool can also then offer a number of recipes for those ingredients and send a step-by-step guide on how to make them.
But the video accompanying the description is more illuminating. In it, Be My Eyes user Lucy shows the app helping her with a bunch of things live. If you’re not familiar with the rapid-fire patois of a screen reader, you may miss some of the dialogue, but she has it describe the look of a dress, identify a plant, read a map, translate a label, direct her to a certain treadmill at the gym and tell her which buttons to push at a vending machine.
Source: TechCrunch