Is the hype around AI Agents that Silicon Valley giants are talking about real or just a bubble?

AI Agents are destined to be a marathon.

After the huge success of ChatGPT, OpenAI is already moving towards the next goal – AI Agents.

“If a paper proposes a different training method, people inside OpenAI dismiss it as something we have already done. But when new AI Agents papers come out, we discuss them very seriously and enthusiastically. Ordinary people, entrepreneurs, and geeks have an advantage over companies like OpenAI in building AI Agents,” said Andrej Karpathy, co-founder of OpenAI and former Tesla AI director.

Karpathy’s public statement has added fuel to the enthusiasm for AI Agents. But he is not alone in this judgment.

As early as March, AutoGPT had collected 74,000 stars on GitHub, quickly becoming the fastest-growing open-source project in history. BabyAGI and AgentGPT then sprang up like mushrooms after rain, with agents ordering pizza, organizing email, writing blogs, and even hosting a Valentine’s Day party.

More and more AI Agents are appearing in various scenarios of people’s lives, and the trend is quickly spreading from Silicon Valley.

Because they can execute and operate on their own, AI Agents carry high expectations from technologists, who see them as “productivity tools that change society.” Some even see them as “the beginning of the era of artificial general intelligence (AGI).”

However, the enthusiasm cannot hide the existing problems.

“Large models are the premise of AI Agents. Only with a good enough hardware foundation can AI Agents be developed,” said Dai Yusen, managing partner of ZhenFund.

Strictly speaking, only one large model on the market, ChatGPT, currently “qualifies” as a base for agents. Constrained by the computing power behind its models, China still lacks fertile ground for developing AI Agents.

The future is bright, but the reality is harsh. Technology research and development and venture capital investment are both in flux. No one knows when the wave of large models will bring AI Agents their real dividend period. But one thing is certain: the change has quietly begun.

1. AI Agents: “Digital Assistants” that help you get things done

Instead of treating AI Agents as an upgraded version of ChatGPT, it is more appropriate to see them as “digital assistants” for humans.

AI Agents not only tell you “how to do it” but also “do it for you.” Acting as an intermediary, an AI Agent interacts repeatedly with a large language model (LLM) such as GPT on the human’s behalf. Given only a goal, it can simulate intelligent behavior: it autonomously creates tasks, reprioritizes the task list, completes the most important task, and iterates until the goal is achieved.
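The loop described above (execute the top task, create follow-up tasks, reprioritize, repeat) can be sketched in a few lines of Python. This is only an illustration in the spirit of projects like BabyAGI, not their actual code: `run_agent`, the `EXECUTE|CREATE|PRIORITIZE` prompt format, and the scripted `fake_llm` stand-in are all assumptions made so the example runs without a real model.

```python
from collections import deque
from typing import Callable, Deque, List


def run_agent(goal: str, llm: Callable[[str], str], max_steps: int = 10) -> List[str]:
    """Execute the top task, create follow-up tasks, reprioritize, repeat."""
    tasks: Deque[str] = deque([f"Make a plan for: {goal}"])
    results: List[str] = []
    for _ in range(max_steps):
        if not tasks:
            break  # no tasks left: treat the goal as achieved
        task = tasks.popleft()  # the most important task sits at the front
        result = llm(f"EXECUTE|{goal}|{task}")
        results.append(result)
        # Ask the model for follow-up tasks, one per line.
        created = llm(f"CREATE|{goal}|{result}").splitlines()
        tasks.extend(t for t in created if t.strip())
        # Ask the model to reorder the remaining task list.
        ordered = llm("PRIORITIZE|" + goal + "|" + "\n".join(tasks)).splitlines()
        tasks = deque(t for t in ordered if t.strip())
    return results


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call (assumes the goal contains no '|')."""
    kind, _goal, payload = prompt.split("|", 2)
    if kind == "EXECUTE":
        return f"done: {payload}"
    if kind == "CREATE":
        # In this toy script only the planning step spawns new work.
        return "choose toppings\nplace order" if "Make a plan" in payload else ""
    return "\n".join(sorted(payload.splitlines()))  # PRIORITIZE: toy ordering


results = run_agent("order a pizza", fake_llm)
print(results)
```

The `max_steps` cap is a deliberate design choice: a real model can keep inventing tasks indefinitely, so an agent loop generally needs a hard stop in addition to the "task list is empty" condition.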

Unlike traditional artificial intelligence, AI Agents can operate independently without human control. By accessing APIs, AI Agents can even browse web pages, use applications, read and write files, make credit card payments, and so on.

In simple terms, all you need to do is give an AI Agent a goal, and it completes the rest of the work. For example, HyperWrite’s AI agent can automatically order pizza for you by controlling the Chrome browser.

Source: HyperWrite CEO Matt Shumer’s Twitter account

Visions like this are easy to find in science fiction movies, but in artificial intelligence research the exploration has been going on for almost half a century.

As early as the 1980s, computer scientists began exploring how to develop intelligent software that can interact like humans. However, limited by data and computing power, AI Agents lacked the practical conditions they needed.

Joon Park, a computer science Ph.D. from Stanford University, once said in an interview, “We have been working in that direction, but none of the methods of the past few decades came close to what we have achieved with LLMs… That’s why we forgot this vision. But when LLMs appeared, we realized the opportunity had come.”

Large language models (LLMs) are the core brains of AI Agents: they decompose complex user needs into achievable tasks.

On the one hand, large models are trained on the Internet, which contains a large amount of human behavioral data. This supplies a key ingredient for building trustworthy AI Agents.

On the other hand, with their considerable knowledge capacity, large models have shown emergent abilities in in-context learning and reasoning. By building chains of thought that let the model reason and decide continuously, AI Agents can analyze complex problems and break them down into simple, detailed sub-tasks.

At the same time, LLMs, which use language as their medium, have changed the form of front-end interaction. Wen Yongteng, head of the AI application track and vice president of investment at BV Baidu Venture Capital, told “Jiazi Light Years”: “BV Baidu Venture Capital has been following the development of AI Agents for a long time. Through our analysis, we believe that the original graphical user interface (GUI) may be transformed into a language user interface (LUI), and the front-end applications of AI Agents will exist in every possible form of human interaction.”

Just breaking down tasks is far from being intelligent. AI Agents driven by LLM cannot do without three key components:

Planning: Breaking large tasks down into smaller, manageable sub-goals; then reflecting on and refining past actions by analyzing and summarizing them, to improve adaptability and the quality of the final results.
Memory: Short-term memory, provided by in-context learning; and long-term memory, the ability to retain and recall very large amounts of information over long periods, usually achieved with an external store that supports fast retrieval.
Tool use: Learning to call external APIs to obtain information that is missing from the model weights.

Overview of an LLM-driven AI Agent system

Image source: Lilian Weng’s personal blog

With the collaboration of three components, AI Agents can not only think like humans but also act like humans.

Just as humans reason between the steps of a complex task, AI Agents use the ReAct framework (short for “Reason + Act,” a prompting method that is unrelated to the React JavaScript library) to tightly interleave the reasoning ability of large models with action decisions, allowing language models to plan and act logically based on their knowledge.
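A rough sketch of that reason-then-act cycle is shown below: the model emits a thought and an action, the program runs the action, and the resulting observation is fed back into the transcript for the next turn. The `Action: tool[input]` text format, the `react` function, the `PRICES` lookup table, and the scripted model are assumptions chosen for illustration, not the exact format of any particular ReAct implementation.

```python
import re
from typing import Callable, Dict

# Matches lines such as "Action: lookup[pizza price]".
ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*?)\]")


def react(question: str, llm: Callable[[str], str],
          tools: Dict[str, Callable[[str], str]], max_turns: int = 5) -> str:
    """Interleave model reasoning (Thought/Action) with tool results (Observation)."""
    transcript = f"Question: {question}\n"
    for _ in range(max_turns):
        step = llm(transcript)  # expected: "Thought: ...\nAction: name[input]"
        transcript += step + "\n"
        match = ACTION_RE.search(step)
        if match is None:
            break  # the model produced no action we can parse
        name, arg = match.group(1), match.group(2)
        if name == "finish":
            return arg  # "finish" carries the final answer
        observation = tools[name](arg) if name in tools else f"unknown tool {name}"
        transcript += f"Observation: {observation}\n"
    return "no answer"


# Hypothetical tool and scripted model so the sketch runs without an LLM.
PRICES = {"pizza price": "12 USD"}
tools = {"lookup": lambda query: PRICES.get(query, "not found")}


def scripted_llm(transcript: str) -> str:
    if "Observation:" not in transcript:
        return "Thought: I should look up the price.\nAction: lookup[pizza price]"
    last_obs = transcript.strip().splitlines()[-1].split("Observation: ", 1)[1]
    return f"Thought: I now know the answer.\nAction: finish[{last_obs}]"


answer = react("How much is a pizza?", scripted_llm, tools)
print(answer)
```

Because every thought, action, and observation is appended to the same transcript, each new model call sees the full reasoning history so far.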

The Reflexion framework gives AI Agents dynamic memory and the ability to self-reflect. It reinforces language agents through verbal feedback rather than by updating model weights, which lets them revisit past action decisions, correct earlier mistakes, and keep improving their performance.
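The idea of improving through verbal feedback instead of weight updates can be sketched as a retry loop: after a failed attempt, a written reflection is stored and handed to the next attempt. The function and parameter names below (`reflexion_loop`, `actor`, `evaluator`, `reflect`) and the toy task are assumptions for illustration, not the framework's actual interface.

```python
from typing import Callable, List, Tuple


def reflexion_loop(task: str,
                   actor: Callable[[str, List[str]], str],
                   evaluator: Callable[[str], bool],
                   reflect: Callable[[str, str], str],
                   max_trials: int = 3) -> Tuple[str, List[str]]:
    """Retry a task, carrying verbal feedback forward; no weights are updated."""
    reflections: List[str] = []  # natural-language lessons kept across trials
    attempt = ""
    for _ in range(max_trials):
        attempt = actor(task, reflections)  # the actor sees earlier lessons
        if evaluator(attempt):
            break
        reflections.append(reflect(task, attempt))
    return attempt, reflections


# Toy stand-ins for the model calls (hypothetical behavior).
def toy_actor(task: str, reflections: List[str]) -> str:
    return "HELLO" if reflections else "hello"


def toy_evaluator(attempt: str) -> bool:
    return attempt == "HELLO"


def toy_reflect(task: str, attempt: str) -> str:
    return f"'{attempt}' was lower case; use upper case next time."


answer, notes = reflexion_loop("Reply with HELLO in upper case",
                               toy_actor, toy_evaluator, toy_reflect)
print(answer, notes)
```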

In the process of information acquisition, storage, retention, and retrieval, AI Agents also strive to mimic human memory construction and build efficient memory systems.

Mirroring human memory, AI Agents map sensory memory, short-term memory, and long-term memory to learned embeddings of raw inputs (such as text and images), in-context learning, and an external vector store, respectively. Tasks and results are saved in the memory module; when needed, the stored information is retrieved and fed back into the conversation with the user, giving the model a richer context.
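A minimal sketch of the short-term and long-term split follows. It assumes a bounded buffer for recent context and a toy "vector store" that ranks entries by cosine similarity; the word-count `embed` function only stands in for a learned embedding model, and `AgentMemory` is a hypothetical name.

```python
import math
from collections import Counter, deque
from typing import Deque, List, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real agent would use a learned embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class AgentMemory:
    def __init__(self, short_term_size: int = 4) -> None:
        # Short-term memory: only the most recent items fit in the context window.
        self.short_term: Deque[str] = deque(maxlen=short_term_size)
        # Long-term memory: every item, stored alongside its embedding.
        self.long_term: List[Tuple[Counter, str]] = []

    def remember(self, text: str) -> None:
        self.short_term.append(text)
        self.long_term.append((embed(text), text))

    def recall(self, query: str, k: int = 1) -> List[str]:
        """Return the k stored items most similar to the query."""
        q = embed(query)
        ranked = sorted(self.long_term, key=lambda item: cosine(q, item[0]), reverse=True)
        return [text for _, text in ranked[:k]]


memory = AgentMemory(short_term_size=2)
memory.remember("user likes pepperoni pizza")
memory.remember("user lives in Berlin")
memory.remember("meeting moved to Friday")
print(memory.recall("which pizza does the user like"))
```

Here the oldest fact has already dropped out of the two-item short-term buffer, yet it can still be recalled from long-term storage, which is the point of keeping an external store.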

One of the most prominent features of humans is the use and creation of tools. Equipped with external tools and using APIs to call various interfaces, AI Agents are able to simulate the use of tools by humans and complete more complex tasks.
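One common way to wire up tool use is to have the model emit a structured request that the program dispatches to an ordinary function. The sketch below assumes a JSON call format with `name` and `arguments` fields and two made-up tools (`add`, `word_count`); real agent frameworks define their own schemas.

```python
import json
from typing import Callable, Dict

TOOLS: Dict[str, Callable[..., object]] = {}


def tool(fn: Callable[..., object]) -> Callable[..., object]:
    """Register a function so the agent can call it by name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def add(a: float, b: float) -> float:
    return a + b


@tool
def word_count(text: str) -> int:
    return len(text.split())


def dispatch(call_json: str) -> str:
    """Run a model-issued call such as {"name": "add", "arguments": {"a": 2, "b": 3}}."""
    call = json.loads(call_json)
    fn = TOOLS.get(call["name"])
    if fn is None:
        # Report the problem as text so the model can react to it.
        return f"error: unknown tool {call['name']!r}"
    return str(fn(**call.get("arguments", {})))


print(dispatch('{"name": "add", "arguments": {"a": 2, "b": 3}}'))
print(dispatch('{"name": "word_count", "arguments": {"text": "order one large pizza"}}'))
```

Returning errors as plain strings, rather than raising exceptions, keeps the agent loop alive and lets the model choose a different tool on its next step.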

Although the technology is not yet fully mature and issues such as data management and long-term memory are still being worked out, the ability of AI Agents to execute autonomously, optimize iteratively, and free up human hands makes their rise to popularity inevitable.

2. Replacing LLM, AI Agents become the next AI hotspot

The birth of ChatGPT enabled AI to hold multi-turn conversations with humans and provide information and suggestions. The copilot wave then let AI produce first drafts of work for humans: GitHub Copilot, Microsoft 365 Copilot, and Midjourney became “intelligent co-pilots” in programming, office work, and image generation, respectively.

Tell the AI to complete a task and it can: writing copy, answering questions, or generating photos that the human eye can hardly tell from real ones. At the same time, however, people often need to give specific and clear instructions for each step the AI takes.

At this point, AI is like a newcomer, without any experience, and needs to be taught step by step like an intern. But what if you want an employee who listens to instructions, solves difficulties on their own during execution, and tries not to cause trouble for others?

In March and April, several AI Agents such as Camel, AutoGPT, BabyAGI, and Westworld Town exploded, seemingly showing people this possibility.

After Significant Gravitas open-sourced AutoGPT in March, the project gained 130,000 stars on GitHub in less than two months, making it the fastest-growing open-source project in history by star count.

Westworld Town, created by Stanford University

Image source: Paper “Generative Agents: Interactive Simulacra of Human Behavior”

Andrej Karpathy once stated on Twitter: “The next frontier of prompt engineering is AutoGPTs.” As of now, AutoGPT has received over 140,000 stars on the code hosting platform GitHub, ranking 25th in history.

Sam Altman, co-founder and CEO of OpenAI, has repeatedly stated that the era of building large AI models is over, and the challenge lies in intelligent agents.

In an article introducing autonomous agents, Matt Schlicht, co-founder and CEO of Octane AI (a data marketing platform provider), collected the views and opinions of hundreds of people from the industry, academia, and investment communities. The experts include professionals from companies such as Meta, Nvidia, and Stability AI, as well as instructors from Stanford CS and AI investors who have invested in companies including Hugging Face. The majority of them expressed their expectations and prospects for the potential of AI agents, even calling them “primitive AGI”.

While AI agents seem to be the next hot topic in AI, there are also voices of opposition.

Turing Award winner Yoshua Bengio wrote in his blog post “How Rogue AIs May Arise,” published in May of this year, that the fact that humans can control the overall tasks and goals of AI agents does not mean they can control the subtasks and subgoals that the agents derive with their own intelligence. Unless there is a breakthrough in AI alignment research, humans will not have strong safety guarantees.

The emergence of intelligent agents, together with the enthusiasm and the doubts of industry leaders, has made the wave of AI agents fast and heated.

However, AI agents are not a new term within the field of artificial intelligence.

DeepMind’s Go program AlphaGo, which first defeated a professional player in 2015, is actually a type of AI agent. Similarly, OpenAI unveiled OpenAI Five for playing Dota 2 in 2018, and DeepMind announced AlphaStar for playing StarCraft II in 2019.

At that time, the trend in the industry was to train and improve AI agents using reinforcement learning methods, mainly applied in game scenarios, especially in some competitive matches with clear winners and losers. However, achieving generality in the real world was still an unresolved problem.

In the following years, OpenAI shifted to large language models and released the GPT series one after another. Large models became the race that technology companies rushed into. It was precisely the development of large models that broke through the bottleneck for AI agents and gave them a chance to develop again.

Compared to a few years ago, when they were limited to game scenarios, what can AI agents built on large models achieve? Wen Yongteng, head of the AI application track and vice president of investment at BV Baidu Venture Capital, told “Jiazi Light Years”: “What we see is not only technological progress that greatly enhances the ability of AI to understand user intentions, collect information, and perform tasks, but more importantly, AI agents have the ability to completely reconstruct the future application ecology.”

Shortly after the launch of AutoGPT, many users began building automated personal assistants with it. For example, Udit Goenka, the founder and CEO of FirstSales.io, posted that he used AutoGPT to build a prospecting engine that searches for companies that received seed-round investment last year and compiles their details into a list.

Google software engineer Yew Jin Lim said that he used AutoGPT to create an email assistant that sends task details to AI Agents via email.

Dai Yusen, managing partner of ZhenFund, told “Jiazi Light Years”: “Agents are a direction that can greatly improve productivity, because as long as things are still done by humans, they are limited by what humans can do.”

“AI Agents will become productivity tools in daily life and work,” Matt Schlicht wrote. “From managing social media accounts and investment markets to publishing the best children’s books, AI Agents will exist in every industry and every imaginable task.” For example, aomni is an AI Agent that can search the Internet for information on any topic and work through a user’s goal step by step by creating a task list.

In addition to productivity needs, Inflection AI’s personal AI Agent Pi provides another possible application direction.

Unlike ChatGPT and Claude, which are positioned as general-purpose AI assistants, Pi focuses on emotional intelligence, companionship, and emotional value. Pi remembers the user’s conversation history; it not only takes part in and assists with people’s work and life but also learns how to connect with their friends and family. Inflection AI has so far received over 1.5 billion US dollars in investment, surpassing Anthropic and second only to OpenAI.

3. Will AI Agents be the next trend?

“Building a kind of JARVIS” is the latest update to Andrej Karpathy’s Twitter profile. JARVIS is the artificial intelligence assistant of the Marvel superhero Iron Man; it can think independently, help its owner handle all kinds of matters, and process all kinds of information.

Karpathy’s profile also implies that the starting gun for the AI Agents race has been fired.

The Information reported that Sam Altman privately told some developers in May that OpenAI hopes to turn ChatGPT into a personal work assistant. People familiar with the matter said that OpenAI has been focusing on how to use chatbots to create autonomous AI Agents, and that related features are likely to be built into the ChatGPT assistant.

Similarly, Meta also sees the opportunity of AI Agents.

As early as April, Zuckerberg told investors that Meta sees the opportunity to introduce AI Agents to billions of people in a useful and meaningful way, but he did not specify the specific applications at that time.

At a company-wide meeting held in June, Zuckerberg announced a series of technologies at different stages of development, one of which is to bring AI Agents with different personalities and abilities to provide help or entertainment, initially mainly for Messenger and WhatsApp.

In China, products related to AI Agents have also emerged one after another.

At the WAIC event in early July, Alibaba Cloud released its first intelligent agent, ModelScopeGPT, targeting developers, and plans to launch a series of intelligent agents in the future to address various application scenarios.

Huawei is also involved in this field, but focuses more on embodied AI, which combines large models with robots.

In addition to large companies, AI Agents also present opportunities for entrepreneurs. Karpathy, co-founder of OpenAI, specifically mentioned in a previous speech: “Ordinary people, entrepreneurs, and geeks have an advantage over companies like OpenAI in building AI Agents.”

Wen Yongteng, Head of AI Application Track at BV Baidu Venture Capital, and Vice President of Investment, expressed optimism about the opportunities for startups in the AI Agents field.

“The future application ecosystem will be diversified, rather than dominated by a single giant. The emergence of AI Agents brings an opportunity for a paradigm shift, and many traditional applications face the possibility of disruption and transformation. In this process, startups have a lot of opportunities to explore new areas. For each specific task, AI Agents have a lot of optimization space, including the construction of specific algorithms and services, user data, and product design, which are places where startups can establish differentiated advantages.”

“In addition, the current AI Agents ecosystem is not yet clear, which provides favorable development opportunities for startups, because they do not need to compete under established rules. From this perspective, startups and large companies are on the same starting line, and startups are more flexible and can quickly adjust their products.”

Drawing on years of accumulated insight into artificial intelligence, BV Baidu Venture Capital does not believe that model companies will monopolize opportunities at the application layer. For underlying model companies, building an ecosystem matters more than monopolizing a particular application. If they adopt exclusive strategies to gain competitive advantages at the application layer, they may harm their own ecosystems. Underlying model companies may build powerful AI Agents in one or two areas they focus on, but they do not need to compete with startups in all areas.

In an undetermined ecosystem, a game without established rules, everyone is back on the same starting line.

However, it is undeniable that so far, apart from many demonstrations, AI Agents have not yet produced real products.

Dai Yusen, managing partner of ZhenFund, compares the degrees of collaboration between AI and humans to the stages of autonomous driving, with AI Agents corresponding to the L4 stage. But just like L4 driving, AI Agents are easy to imagine and demonstrate yet difficult to implement. When they will see real application remains uncertain.

Comparing the degree of AI and human collaboration to different stages of autonomous driving

Image source: Dai Yusen’s Jike account @yusen

Dai Yusen emphasizes that in order to achieve usable AI Agents, it is necessary to greatly improve the capabilities of large models. Even for OpenAI at the top level, there is still a lot of room for improvement in terms of latency and performance.

“Take the steam engine as an analogy: steam is produced only when the water reaches 100 degrees. If the intelligence of AI Agents has not reached a certain level, it is like water heated to 50 degrees. A lot of energy has been consumed, but no steam is produced, so the result is still zero.”

The starting gun for the AI Agents track has already been fired. However, this is definitely not a short sprint within a few months, but a marathon that is destined to last for several years, or even ten years.
