Even the mythologized GPT cannot build your dream car

GPT cannot create your dream vehicle.

Since ChatGPT took off, large AI models have become the hot pursuit of many technology companies. From chat, to image generation, to office software, AI seems to have gained the power to upend everything overnight.

The trend has spread to the automotive industry, and practitioners are beginning to think: is it feasible to let GPT build cars?

Some car companies have announced that they are applying large model technology, some have said they will integrate third-party large models, and some are rushing to release autonomous driving systems with “GPT” in the name.

Some practitioners told Shentu that smart cockpits and autonomous driving may be the first scenarios where large models are applied. Of the two, autonomous driving is the most anticipated.

Autonomous driving is an extremely difficult field. Besides technology giants such as Google and Baidu, a large number of talented entrepreneurs have poured billions of dollars into it without yet achieving satisfactory results.

Will things be different this time, now that large AI models are entering autonomous driving?

What’s the relationship between GPT and cars?

At first glance, GPT and cars have no direct relationship, but in fact they share deep roots. The story dates back six years.

In June 2017, Tesla’s boss, Elon Musk, poached a Slovak-born researcher from OpenAI: Andrej Karpathy, who went on to become Tesla’s director of AI.

At that time, Musk showed great interest in artificial intelligence and was one of OpenAI’s donors. Shortly after bringing Andrej Karpathy to his side, Musk left the OpenAI board of directors. He believed that because Tesla and OpenAI were both researching AI, a conflict of interest could arise in the future.

Later, Andrej Karpathy rewrote the autonomous driving algorithm at Tesla and developed its vision-only BEV perception technology, which brought Tesla’s autonomous driving into a new stage. His former employer, OpenAI, bet all its chips on artificial general intelligence and ultimately developed GPT.

From a product perspective, OpenAI’s GPT and Tesla’s BEV are completely different species. But at the level of underlying technology, both rely on artificial intelligence, and in particular on Google’s Transformer model.

Transformer is a neural network architecture for deep learning proposed by eight AI scientists at Google in 2017. It is an extremely important invention in the artificial intelligence industry. The “T” in today’s popular ChatGPT stands for Transformer.

Unlike traditional neural networks such as RNNs and CNNs, the Transformer uses a self-attention mechanism to mine the connections and correlations between the different elements of a sequence, which makes it well suited to processing sequential data. As a result, it performs outstandingly in tasks such as machine translation, text summarization, and question answering.
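To make the self-attention idea concrete, here is a minimal sketch in plain Python. It is an illustration under simplifying assumptions, not the code of any production model: queries, keys and values are the input vectors themselves, whereas a real Transformer learns separate projection matrices for each, and the three toy `tokens` are invented for the example.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Scaled dot-product self-attention with identity projections.

    Assumption: queries, keys and values are the inputs themselves.
    """
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # Similarity of this element to every element in the sequence.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        # Each output is a weighted mix of all elements in the sequence.
        mixed = [
            sum(w * value[i] for w, value in zip(weights, vectors))
            for i in range(dim)
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Toy sequence: the first two vectors are similar, the third is different.
tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
outputs, weights = self_attention(tokens)
```

In this toy run, the first element gives more weight to its near-duplicate than to the dissimilar third element, which is the "correlation between elements" the mechanism is meant to capture.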

Therefore, Transformer was originally used in the field of NLP (natural language processing) to understand human text and language.

By pre-training Transformer models and then continuously fine-tuning and iterating on them, OpenAI has successively launched pre-trained language models such as GPT-1, GPT-2, GPT-3, and GPT-4. ChatGPT is a conversational chatbot that OpenAI fine-tuned from a model in the GPT-3.5 series. Because it interacts through conversation, is easy for ordinary people to use, and appears smarter than previous chatbots, it has been highly praised.

Fundamentally, the GPT model of ChatGPT, Google’s LaMDA large model, and Baidu’s Wenxin large model are of the same origin.

Using the Transformer model for natural language, chat applications such as ChatGPT were born; and using it in computer vision also achieved amazing results, with Tesla being a pioneer in this field.

As the director of Tesla AI, Andrej Karpathy led the computer vision team for autonomous driving. By combining the Transformer model, Tesla successfully developed the BEV technology.

BEV stands for Bird’s Eye View. The technique stitches together the 2D images captured by the cameras and transforms them into a top-down 3D representation for processing, forming a “God’s-eye view.” The reason for doing this is that driving takes place in three-dimensional space: what people see is a 3D world, not a 2D image.

This new perception solution was demonstrated to the public by Andrej Karpathy at Tesla AI Day in August 2021. To build it, Tesla rewrote its autonomous driving algorithm and restructured its infrastructure for training deep neural networks.

This is the first time that large model technology has been applied to the autonomous driving industry.

Looking back now, GPT itself is mainly used for natural language processing, and we cannot let GPT drive a car. But the large model technology behind it, especially the Transformer architecture, has in fact been applied to autonomous driving for some time.

From natural language processing to computer vision, the two fields have achieved unification in modeling structure based on the Transformer architecture, making joint modeling easier.

As their understanding of AI deepens, automobile companies are becoming more like AI companies. Besides Tesla, Li Auto announced its vision earlier this year, saying it aims to become an AI enterprise by 2030. The city NOA (navigation-assisted driving) system it will launch this year is built on BEV perception and Transformer model technology.

Teaching AI to have a conversation with humans seems no different from teaching AI to drive a car, and the only difference is the application scenarios. In applying underlying technologies to specific products, humans are always full of imagination.

What GPT has taught us about autonomous driving

This year, the powerful capabilities demonstrated by GPT have made a big impression on the outside world. Artificial general intelligence is no longer just an idea. People in the autonomous driving industry have started to think that the way generative AI is applied in language models could be transferred to autonomous driving.

Fundamentally, the language model is a mathematical model built for human language. Computers still do not understand natural language, but through mathematical modeling, language problems have been transformed into mathematical problems. By predicting the probability of the next word appearing based on the history of the given text, the computer has indirectly understood natural language.
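A toy version of "predicting the probability of the next word" can be written in a few lines. This is only a sketch: it uses a count-based bigram model that looks at a single previous word, whereas GPT uses a neural network over a long context, and the three-sentence `corpus` is made up for illustration.

```python
from collections import Counter, defaultdict

def train_bigram(sentences):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def next_word_probs(counts, prev):
    """Estimate P(next word | previous word) from the counts."""
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    if total == 0:
        return {}  # the model has never seen this word
    return {word: n / total for word, n in followers.items()}

# Hypothetical miniature corpus.
corpus = [
    "the car turns left",
    "the car turns right",
    "the car stops",
]
model = train_bigram(corpus)
probs = next_word_probs(model, "car")
```

Here `probs` assigns two thirds of the probability to "turns" and one third to "stops". The language problem has become a counting and division problem, which is the sense in which modeling turns language into mathematics.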

Now switch to the driving scenario: given the current traffic environment, a navigation map, and a driver’s driving history, can a large model predict the next driving action?

Yu Kai, the founder of Horizon Robotics, said at an electric vehicle forum held in April this year that ChatGPT has given him a lot of inspiration. “We need to keep using big data, bigger data, and bigger models, and learn human driving behavior without supervision, just as you learn from a large amount of unsupervised, unlabeled natural text.” He believes that each driver’s sequence of driving controls is like our natural language text. Next, he wants to build an autoregressive large language model for autonomous driving.

In theory, this idea is feasible. Artificial intelligence already has learning ability. Based on the adaptive language model, the machine will continuously iterate and optimize based on user feedback, learn user habits, and then improve the model. The current ChatGPT is using this technology. Therefore, teaching machines to learn drivers’ driving habits is not a difficult task.

Tesla’s shadow mode feeds real drivers’ driving data into machine learning. The algorithm’s decisions are compared with the human driver’s behavior, and the comparison is used to train the algorithm.
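The comparison at the heart of shadow mode can be sketched as follows. This is a hypothetical illustration, not Tesla’s implementation: the `Frame` record, the rule-table `planner`, and the scene strings are all invented stand-ins for real sensor data and a real driving model.

```python
from dataclasses import dataclass

@dataclass
class Frame:
    scene: str         # hypothetical scene label standing in for sensor data
    human_action: str  # what the human driver actually did

def planner(scene):
    """Stand-in for the driving model: a toy rule table, not a real planner."""
    rules = {"red light": "brake", "clear road": "cruise"}
    return rules.get(scene, "cruise")

def shadow_mode(frames, model):
    """Run the model silently beside the human and keep the disagreements."""
    disagreements = []
    for frame in frames:
        predicted = model(frame.scene)  # the prediction never controls the car
        if predicted != frame.human_action:
            disagreements.append((frame.scene, predicted, frame.human_action))
    return disagreements

log = [
    Frame("red light", "brake"),
    Frame("clear road", "cruise"),
    Frame("overturned truck", "brake"),
]
training_candidates = shadow_mode(log, planner)
```

Only the overturned-truck frame ends up in `training_candidates`, because it is the one place where the model would have cruised while the human braked. Frames like that are the ones worth feeding back into training.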

After GPT sparked a new AI boom, one lesson the industry drew was this: if you keep increasing a model’s parameter count and grow the amount of data exponentially, which is what “large model” means, then past a certain critical point the model suddenly becomes very intelligent.

In the past, the data required by the model during the training phase was manually labeled. Taking autonomous driving as an example, data annotators tell the machine what cats and dogs are and how many types of cats and dogs there are through a large number of image annotations. The annotator is like the teacher of the machine, teaching it to recognize the world repeatedly.

The problem is that the machine still doesn’t know what the teacher hasn’t taught. A typical example is that Tesla has had multiple automatic driving accidents where the vehicle collided with overturned trucks because the machine couldn’t recognize them.

He Yuhua, founding partner of High Capital, gave Shentu an example: Guangzhou has frequent rainy days in summer, and in some dimly lit scenes the air is full of flying insects. When a car passes with its lights on, thousands of insects may fly at the front of the car. In this case, the car’s autonomous driving perception system may mistake them for a wall.

The autonomous driving system cannot exhaust all corner cases, which is a major difficulty in its development.

ChatGPT is trained on unlabeled data collected from across the internet. In self-supervised learning, the data itself serves as the supervision signal, instead of manually applied labels. At some point, people discovered that in digesting this data the large model had suddenly acquired the ability to generalize.

So, if the autonomous driving large model can also learn human driving behavior without supervision, without the need for a “teacher” to teach it step by step, does it mean that the system has become a “veteran driver”?

GPT “driving” is still not reliable

The dream is beautiful, and the road to realizing the dream is always bumpy.

AI large models like ChatGPT need to solve at least the following problems to exert their power in the field of autonomous driving.

First of all, let’s talk about the data source.

The data source of ChatGPT is very rich, including Wikipedia, books, news articles, scientific journals, etc. It can be said that all public data on the Internet are its nutrients.

Autonomous driving is different. Driver driving data and vehicle driving data are not public, and many of them involve privacy. Car manufacturers and autonomous driving companies are each acting on their own, and data is closed and not circulated, which makes it difficult to obtain data. Without data, autonomous driving is like water without a source.

He Zhiqiang, president of Lenovo Venture Capital, told Shentu that the core of autonomous driving is having data, which is essential for training models. Automakers like BYD have the data, but their algorithms still need polishing. New car-making forces such as “Wei-Xiao-Li” (NIO, Xpeng, and Li Auto) are good at algorithms, but they do not sell enough cars. Companies that have both data and algorithms can make full use of large models.

Secondly, there are restrictions on the deployment of the system.

Yu Kai points out that ChatGPT runs in the cloud, where energy and power supply are ample and the supporting systems are very good. A vehicle, by contrast, depends on its own battery and its own heat dissipation. That is a very big challenge, and it means autonomous driving cannot use models and computation on such a scale.

The computing power consumed by large models has made cloud computing companies the first to benefit from this AI boom. Big tech companies’ push into cloud computing also paves the way for large models. But on the vehicle side, the same appetite for computing power becomes a contradiction.

A bigger problem is that the reliability of large models has not been verified.

Anyone who has used ChatGPT knows that it sometimes talks nonsense, getting things right one moment and wrong the next. The industry calls this a tendency to hallucinate: producing fabricated content that cannot be traced to any source. Large models will make things up with no regard for truth or accuracy.

A chatbot can talk nonsense, but an autonomous driving system cannot. Any wrong output can have fatal consequences.

“ChatGPT has made great progress, but autonomous driving has not arrived yet, because autonomous driving, especially unmanned driving, may have zero fault tolerance, which is a matter of life and death.” Yu Kai said.

Long Zhiyong, former COO of an AI start-up in Silicon Valley, believes that being uncontrollable, unpredictable and unreliable is the biggest threat to the commercialization of large models. The typical manifestation is the tendency of large models to hallucinate.

It is not yet realistic to expect an autonomous driving system to learn to weigh and discriminate between options and to output the optimal solution reliably.

An insider from an AI company told Shentu: “There are indeed many breakthroughs in visual perception at the algorithm level. But the requirements for in-vehicle scenarios are too high, and I personally don’t think there will be major breakthroughs in the short term. You can keep an eye on what Tesla does.”

However, there is a trend in the technology circle recently, where companies big and small want to take advantage of the GPT trend. Some car manufacturers have announced that they will soon use GPT-like technology, and a bunch of cool concepts make people dizzy.

For example, an autonomous driving company under a traditional automaker released a generative model for autonomous driving, saying it would use the model to train its driving system and billing it as the “industry’s first”.

An investor who has long followed the intelligent vehicle sector asked an industry leader what he thought of the model, and the reply was blunt: “It’s complete nonsense.”

“It’s purely a PR move,” the investor told Shentu.

Will autonomous driving have to start over?

Driven by Tesla, and combined with this year’s AI trend, the autonomous driving industry is gradually moving towards large models, large computing power, and big data.

The impact of large models on autonomous driving is not yet dramatic, but the most perceptive people in the industry already have mixed feelings.

When Tesla used Transformer to convert multi-camera data from image space to BEV space, it was willing to tear down its original architecture and rewrite the algorithm. Applying large models now may likewise mean that existing autonomous driving algorithms have to be torn down and rebuilt.

He Zhiqiang believes that large models will have a huge impact on autonomous driving. Previously, autonomous driving relied on many small models; now that these are giving way to large models, companies may need to start over. The autonomous driving industry will be reshuffled.

Zhao Dongxiang, the autonomous driving director of an AI chip company, told Shentu that a wholesale shift to an end-to-end approach is equivalent to starting over.

A reshuffle is an opportunity for new entrants and a threat to leading players. Stories of overtaking on the curve often happen during periods of rapid technological change. When technology advances by leaps and bounds, the more that has been invested in the old route, the greater the sunk cost and the harder it is to turn around. Automakers and autonomous driving companies must consider not only the results a new technology delivers but also the cost of embracing it.

Zhao Dongxiang said that, at the current stage, it is meaningless to change the technical route of autonomous driving. “The industry’s technical capabilities are not bad now. Everyone has spent so much money and done it for so long, and there is no motivation to change if there is no significant improvement.”

At last year’s AI Day, Tesla upgraded its BEV approach to the Occupancy Network, further improving its generalization ability. With the Occupancy Network, Tesla’s autonomous driving perception system can decide whether to avoid an object without knowing what the object is, thereby solving more long-tail problems.

Whatever the technical route, the field is now in a phase of rapid change and iteration. The small models of the past may be replaced by large models, and today’s large models may in turn be replaced by some new species in the future.
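The idea of deciding whether to avoid something without knowing what it is can be sketched with a simple occupancy grid. This is a simplified illustration under stated assumptions, not Tesla’s system: the real network predicts occupancy in 3D, while this example uses a small 2D grid, and the probabilities, paths, and `threshold` value are hypothetical.

```python
def path_is_blocked(occupancy, path, threshold=0.5):
    """Return True if any cell on the planned path is likely occupied.

    No object class is consulted: the only question asked is whether
    something is there, not what it is.
    """
    return any(occupancy[row][col] >= threshold for row, col in path)

# Hypothetical grid of occupancy probabilities; the car starts at the
# bottom row and drives toward the top row.
grid = [
    [0.0, 0.1, 0.0],
    [0.0, 0.9, 0.0],  # something unidentified sits in the middle lane
    [0.0, 0.0, 0.0],
]

straight = [(2, 1), (1, 1), (0, 1)]  # keep the middle lane
swerve = [(2, 1), (1, 0), (0, 0)]    # move to the left lane

blocked_straight = path_is_blocked(grid, straight)
blocked_swerve = path_is_blocked(grid, swerve)
```

The straight path is reported as blocked and the swerve as clear, even though nothing in the grid says whether the obstacle is a truck, a wall, or something the system was never taught to recognize.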

But in any case, riding the wave and manufacturing gimmicks does nothing for technological progress. “Chasing hype is a bad habit. What is useful is building products in a down-to-earth way,” Zhao Dongxiang said.

The real “ace” of autonomous driving is still far from coming. What we need to do is to maintain awe for every round of technological change. The mythologized GPT cannot create your dream car, but at least, change has happened.
