ChatGPT has new voice and image recognition superpowers

Key Highlights

OpenAI is rolling out two new features for ChatGPT.
Users will now be able to prompt the chatbot with their voice or a picture.
The features will initially only be available to users who are subscribed to ChatGPT Plus.

On the heels of ChatGPT’s meteoric rise in popularity, OpenAI has announced plans to launch voice and image prompts within the next two weeks. The new capabilities aim to increase convenience, but also raise concerns over potential misuse as ChatGPT grows more powerful.

Voice Prompts Promise More Natural Conversations

The voice prompts feature will allow users to tap a button and ask ChatGPT questions aloud. Speech recognition will transcribe the query into text for the AI to process, and a text-to-speech model will then convert the text response back into natural-sounding speech.

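This evolution aims to enable more fluid, conversational interactions beyond just typing. OpenAI uses Whisper, its open-source speech recognition system, to transcribe spoken words into text, and a new text-to-speech model that can generate human-like audio from text and a few seconds of sample speech.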

ChatGPT can now see, hear, and speak. Rolling out over next two weeks, Plus users will be able to have voice conversations with ChatGPT (iOS & Android) and to include images in conversations (all platforms). https://t.co/uNZjgbR5Bm pic.twitter.com/paG0hMshXb

— OpenAI (@OpenAI) September 25, 2023

The company sees promising applications in voice tech, like developing AI voice translations for podcasts on Spotify. However, realistic voice cloning also risks enabling new impersonation scams and fraud if misused.

Analyzing Images Opens New Doors and Dangers

Image prompts are similarly intended to make interactions more dynamic. Like Google Lens, ChatGPT will inspect photos, drawings, and diagrams to discern the user’s intent.

Show ChatGPT one or more images. Troubleshoot why your grill won’t start, explore the contents of your fridge to plan a meal, or analyze a complex graph for work-related data.

— OpenAI (@OpenAI) September 25, 2023

Additional text and voice clarification options help fine-tune the image prompts. But this capability to extract insights from pictures provided by users also raises privacy issues. There are clear risks of bad actors attempting to uncover personal details.

Guardrails Exist But Scope Remains Unclear

OpenAI stated that guardrails are in place to prevent misuse and that the new features are limited to certain use cases through partnerships. However, the company provided few specifics on the breadth and limitations of those safeguards.

We’ve also taken technical measures to significantly limit ChatGPT’s ability to analyze and make direct statements about people since ChatGPT is not always accurate and these systems should respect individuals’ privacy.

Past incidents have shown that ChatGPT can be pushed into generating harmful content. As capabilities expand, so do the avenues for potential exploitation if protections falter.

Balancing Innovation With Responsible Development

The voice and image features underscore OpenAI’s central challenge: balancing rapid AI innovation that users find helpful against the need to prevent societal harms.

ChatGPT’s phenomenal growth makes this balancing act even more crucial. Only time will tell if OpenAI’s precautions are sufficient as its popular chatbot grows more powerful.

For now, users eager for added convenience may find the new capabilities worth the risks at an individual level. But on the societal scale, caution is warranted with any game-changing AI.