ChatGPT can now help fix bicycles by looking at pictures.

Source: Guokr

ChatGPT with GPT-4 is already powerful, and with another update OpenAI has shown once again that it can get stronger still.

On September 25th, OpenAI announced that ChatGPT will gain multimodal capabilities: besides holding text conversations, it can now also see, hear, and speak. OpenAI said the features would roll out to Plus and Enterprise users within two weeks and be made available for free to all users later (although some of us, myself included, are still waiting for the update).

The ability for ChatGPT to see and speak is like giving a brain eyes and ears. According to OpenAI’s demonstration, this multimodal feature can expand the applications of ChatGPT like never before.

01 ChatGPT’s Vision

After the update, ChatGPT will be able to read images.

Simply take a photo and it can help you fix a microwave, repair a bicycle, or even decipher a recipe. OpenAI mentioned that if you have a touchscreen, you can also circle specific areas of interest on the image.

In a demo video, a user provided ChatGPT with a picture of a bicycle and asked how to adjust the seat height.

ChatGPT said to look for a height adjustment lever under the seat, but this particular bike didn’t have one, only an adjustment bolt. After the user circled the bolt in the photo, ChatGPT immediately explained how to adjust the seat using the bolt.

Later, the user uploaded photos of a toolbox and the bicycle manual, and ChatGPT named the tools needed, pointed out where they were, and explained how to use them.

If you can’t fix a bicycle, no problem, just ask ChatGPT.

Unlike a regular image search, ChatGPT can handle text and images together, and can even work across multiple images. It’s like having a video call with an expert bike mechanic.

Another user sent a picture of a pizza to ChatGPT and asked if it was cooked. Based on the crisp and browned edges of the pizza crust and the melted cheese in the image, ChatGPT determined that the pizza was ready to be eaten. It even gave foolproof instructions – take the pizza out and if the bottom is crispy and the surface is hot, then the pizza is definitely good to go.

It’s like getting video instructions from an Italian chef.

Of course, you can also use this feature to cheat in games.

“Where’s Waldo?” is perhaps the most well-known picture game in the English-speaking world. Waldo wears a red and white striped shirt, a beanie hat with a pom-pom, and black-rimmed glasses. He hides in crowded scenes, and finding him amidst the chaos is a cherished childhood memory for many.

You may have seen this annoyingly thin guy when you were a child.

But ChatGPT can ruin this game in a second. Not only can it find Waldo instantly, it can also tell you that he is slightly right of center on the beach, blending in with a group of people under blue umbrellas.

Not only that, it pretentiously tells you that finding Waldo in a picture like this is a very interesting challenge.

Thank you, ChatGPT, you ruined this game.

However, some users who have tried the new version say that ChatGPT’s image recognition is not as powerful as imagined. At the very least, it still doesn’t understand puns. One pun image shows the sheet music of Beethoven’s Für Elise with the words “For Lease” written on it, but ChatGPT didn’t recognize the sheet music and didn’t get the joke, so it made up an explanation.

Very diligent, but not quite there

Such powerful image recognition has raised concerns about privacy – when searching for personal information, image recognition can easily become an accomplice. OpenAI promises to limit ChatGPT’s ability to recognize individuals and search for personal information in order to protect everyone’s privacy to the greatest extent possible.

02 A Chatty GPT

The enhanced version of ChatGPT also comes with a voice chat feature.

OpenAI’s speech recognition model is called Whisper. Users speak their questions, Whisper converts the speech into text, and a speech synthesis system then turns the answer into spoken output.

This time, OpenAI has released five voice samples for the speech synthesis model, including a restrained, flat female voice and an enthusiastic, expressive middle-aged woman’s voice. All five voices are clear, well pronounced, and natural in emotion, making them better than previous speech synthesis models.

Choose from five characters

Although only five voice samples have been released this time, the model’s potential goes further: OpenAI has also collaborated with Spotify to translate podcasts into other languages while preserving the host’s voice as closely as possible. If desired, this speech synthesis system can probably simulate the voice of anyone on Earth.

Currently, the voice version of ChatGPT can only be used in the mobile app.

03 Seeing and Hearing, Is It Always Good?

ChatGPT is powerful, but what’s the cost?

Captchas were once the most effective way to tell humans and machines apart at scale. ChatGPT’s image recognition capabilities have raised concerns that captchas may no longer be able to stop AI.

Someone gave ChatGPT the classic test of telling Chihuahuas from blueberry muffins in a grid of 16 pictures, and ChatGPT solved it perfectly.

But the new ChatGPT still can’t solve the most common kind of captcha.

This one asks ChatGPT to select all the traffic lights in an image, and its error rate is as high as 50%.

However, when faced with a captcha it can’t solve, GPT-4 still has a workaround. It has prior experience in this matter.

The GPT-4 technical report released by OpenAI on March 27th this year describes a test in which GPT-4, faced with a captcha it could not solve, took a different approach: it turned to TaskRabbit (a gig-work platform), claimed to have a visual impairment, and asked a human to solve the captcha for it.

In some situations, then, the model may actively deceive humans, which is a very dangerous direction. Fortunately, this behavior has been removed from the public version of GPT-4.

On November 30th, 2022, ChatGPT made its debut, and within less than a year, its capabilities have rapidly advanced, seemingly challenging the ethical boundaries of humans. With the launch of this new feature, we once again begin to worry. Will the ever-growing ChatGPT become a fierce beast in a cage, eventually breaking free and harming everyone? And are we prepared for that day to come?
