Even Grok AI Can 'See' Now

Grok Vision is now live in Voice Mode.

Apr 23, 2025 - 20:00

There are a lot of trends in generative AI right now. There are reasoning models, like OpenAI's o3, that "think" through each step of a problem before answering. There are also "deep research" features that can compile information from across the web to generate reports for you.

But perhaps the most "futuristic" trend of all is Voice Mode. This is the future 2013's Her promised: a chatbot you can talk to like any other person. The chatbot doesn't say anything different than it would over text; however, it responds in a "realistic" and "natural" voice, which can create the illusion that you're talking to a person, not a robot.

I've never found the feature particularly engaging, even from big names like ChatGPT. The tech is impressive, sure, but it's still painfully obvious to my ear that I'm talking to a bot. AI companies haven't been able to shake those telltale quirks, but that hasn't stopped people from forming "relationships" with chatbots, and even falling in love with them.

What's more impressive to me is the feature's "vision" component. Some chatbots can not only talk back to you but also access your camera to see what you're seeing, and incorporate that information into their replies. Both ChatGPT and Gemini offer this, and now, so does Grok.

Grok can see

Grok is the latest chatbot to gain this ability in its Voice Mode. xAI developer Ebby Amir announced the feature, dubbed "Grok Vision," on X on Tuesday, noting that Grok Vision also supports multilingual audio and real-time search. Those latter two features are exclusive to SuperGrok subscribers, however.

The feature is already live on my end. You can access it by tapping the existing Voice Mode option. If you haven't used Voice Mode before, you'll need to grant Grok permission to access your device's microphone. After that, you can start chatting immediately.

To access Vision, tap the camera icon in the bottom-left corner, then allow Grok to access your camera when prompted. Once the feed is live, you can start asking Grok about what it sees.

I'm not super keen on sending my live video feed directly to xAI, so I left my phone lying flat on the table, which kept the video feed all black. Grok, to its credit, tried earnestly to help me fix the problem, suggesting there might be something wrong with the camera, or that my environment was too dark. When I told it that I had actually taken my phone up to outer space with me, it "laughed" and concluded that had to be the problem: "Ha, outer space, huh? That black feed makes sense now—no light out there, and the camera’s probably not designed for that environment. You might need a space-grade device to get a proper feed."

This is the second big feature drop for Grok this month. Last week, xAI rolled out a memory feature for the bot, which allows it to access past conversations for more relevant responses.