Why Is the AI Agent the Ultimate Trump Card for AIGC?
Unleashing the Power of the AI Agent: The Advantages of AIGC's Ultimate Trump Card
Authors: Hu Xiaomeng and Chen Chuyi, Tencent Research Institute
The AI Agent is undoubtedly the most exciting development in large models today. It has been called the “next big battle of large models,” the “ultimate killer product,” and the start of the “Agent-centric era of the new industrial revolution.” On November 7th, OpenAI’s first Developer Day conference set off the AI Agent craze. OpenAI released GPTs, an early form of AI Agent product, and launched the corresponding production tool, GPT Builder. Users simply chat with GPT Builder and describe the functionality they want in order to generate a customized GPT tailored to daily life, specific tasks, work, or family. To support this, OpenAI also opened up a number of new APIs (including vision, DALL·E 3 image generation, and voice), as well as the newly launched Assistants API, making it easier for developers to build their own custom GPTs. Bill Gates recently published an article stating that AI Agents will thrive within the next five years and that every user will have a personalized AI Agent. Users will no longer need different apps for different functions; they will only need to tell their Agent, in everyday language, what they want to do.
Within one week of the release of GPTs, there were already more than 17,500 of them.
So, what exactly is an AI Agent? Why does it attract so much attention in the industry, with some scholars even asserting that “if the American Agent Store develops well, it will continue to widen the gap between China and the US in large models”?
What is an AI Agent?
In computer science and artificial intelligence, the term “agent” is generally translated as “intelligent entity.” It refers to a software or hardware entity that, in a given environment, exhibits one or more intelligent characteristics such as autonomy, reactivity, sociability, proactivity, reflection, and cognition.
OpenAI defines an AI Agent as a system driven by a large language model that can autonomously understand, perceive, plan, remember, and use tools, and that can automate the completion of complex tasks. The basic framework of an AI Agent is shown in the following figure:
Basic framework of an LLM-driven Agent
It consists of four main modules: memory, planning, action, and tool usage:
(1) Memory. The memory module is responsible for storing information, including past interactions, learned knowledge, and even temporary task information. For an intelligent entity, an effective memory mechanism ensures that it can draw on past experience and knowledge when faced with new or complex situations. For example, a chatbot with memory can remember user preferences or previous conversations, providing a more personalized and coherent experience. Memory is divided into short-term and long-term memory: a. Short-term memory: all in-context learning relies on short-term memory; b. Long-term memory: this gives the intelligent entity the ability to retain and recall (effectively unlimited) information over long periods. It is usually implemented with an external vector database and fast retrieval, for example over the large body of data and knowledge accumulated in a particular industry. With long-term memory, a great deal of data can be accumulated, making the intelligent entity more powerful, with advantages such as industry depth, personalization, and specialized capabilities.
(2) Planning. The planning module consists of two stages: pre-planning and post-reflection. In the pre-planning stage, the AI agent predicts future actions and makes decisions. For example, when executing a complex task, it decomposes a big goal into smaller, manageable sub-goals, so that it can efficiently plan a series of steps or actions to achieve the desired result. In the post-reflection stage, the AI agent examines the shortcomings of its plan, reflects on errors and deficiencies, and learns from experience. The lessons are written into long-term memory, helping the AI agent avoid repeating mistakes and update its understanding of the world.
(3) Tool use. The tool use module refers to the AI agent’s ability to use external resources or tools to perform tasks. For example, it can learn to call external APIs to obtain what is missing from the model weights, such as current information, code execution capabilities, and access to proprietary information sources, in order to compensate for the AI agent’s own weaknesses. For instance, if the AI agent’s training data is not updated in real time, it can use tools to access the internet and gather the latest information, or use specialized software to analyze large amounts of data. Numerous digital and intelligent tools are already available on the market, and the AI agent can use them more efficiently and skillfully than humans can. By invoking different APIs or tools, the AI agent can accomplish complex tasks and produce high-quality results. This use of tools is an important characteristic and advantage of the AI agent.
(4) Action. The action module is where the AI agent’s decisions and responses are actually executed. When facing different tasks, the AI agent system has a complete set of action strategies to choose from, such as memory retrieval, reasoning, learning, and programming.
In summary, these four modules work together to enable the AI agent to take action and make decisions in a broader range of contexts, and perform complex tasks in a more intelligent and efficient manner.
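The way the four modules fit together can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `Memory` class, the `plan` function, the `TOOLS` registry, and `run_agent` are hypothetical names, the planner returns a fixed decomposition where a real agent would call an LLM, and long-term recall uses naive keyword overlap as a stand-in for vector retrieval.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Memory:
    """Short-term memory is the running context; long-term memory persists across tasks."""
    short_term: List[str] = field(default_factory=list)
    long_term: List[str] = field(default_factory=list)

    def remember(self, note: str) -> None:
        self.short_term.append(note)

    def consolidate(self, lesson: str) -> None:
        # Post-reflection: keep a lesson for future tasks.
        self.long_term.append(lesson)

    def recall(self, query: str) -> List[str]:
        # Stand-in for vector search: naive keyword overlap.
        words = set(query.lower().split())
        return [m for m in self.long_term if words & set(m.lower().split())]


def plan(goal: str) -> List[str]:
    """Hypothetical planner: a real agent would ask an LLM to decompose the goal."""
    return [f"search: {goal}", f"summarize: {goal}"]


# Tool registry: plain Python callables standing in for external APIs.
TOOLS: Dict[str, Callable[[str], str]] = {
    "search": lambda arg: f"results for '{arg}'",
    "summarize": lambda arg: f"summary of '{arg}'",
}


def run_agent(goal: str, memory: Memory) -> List[str]:
    outputs = []
    for step in plan(goal):                          # planning: goal -> sub-steps
        tool_name, arg = step.split(": ", 1)
        result = TOOLS[tool_name](arg)               # tool use and action
        memory.remember(result)                      # short-term memory
        outputs.append(result)
    memory.consolidate(f"completed goal: {goal}")    # reflection into long-term memory
    return outputs
```

Calling `run_agent("agent frameworks", Memory())` walks through both planned steps and leaves one entry in long-term memory that a later `recall("agent")` can find.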
The AI Agent will bring more extensive human-machine fusion
Based on large-scale models, the AI agent not only enables everyone to have a personalized intelligent assistant with enhanced capabilities but also changes the pattern of human-machine collaboration, leading to more extensive human-machine fusion. The intelligent revolution of generative AI has evolved to present three modes of human-machine collaboration:
(1) Embedding mode. Users communicate with AI through language, using prompts to set goals, and then the AI assists in accomplishing those goals. For example, ordinary users input prompts to the generative AI to create novels, music compositions, 3D content, etc. In this mode, the role of the AI is similar to that of an executing tool, while humans act as decision-makers and commanders.
(2) Co-pilot mode. In this mode, humans and AI are more like partners, both actively participating in the workflow and playing their respective roles. The AI provides suggestions and assists at various stages of the process. For example, in software development, AI can help programmers write code, detect errors, or optimize performance. Humans and AI work together in this process, complementing each other’s abilities. The AI is more like a knowledgeable partner than a mere tool.
In fact, Microsoft first introduced the concept of Copilot on GitHub in 2021. GitHub Copilot is an AI service that assists developers in writing code. In May 2023, with the support of large models, Copilot underwent a comprehensive upgrade, introducing Dynamics 365 Copilot, Microsoft 365 Copilot, Power Platform Copilot, and others, and Microsoft proposed the idea that “Copilot is a new way of working.” Just as work needs a “Copilot,” so does life. Li Zhifei, the founder of Mobvoi, believes that the best job for large models is to be the “Copilot” of humans.
(3) Intelligent Agent mode. Humans set goals and provide necessary resources (such as computing power), then AI independently takes on most of the work, and humans supervise the process and evaluate the final results. In this mode, AI fully embodies the interactive, autonomous, and adaptive features of intelligent agents, approaching the behavior of independent actors, while humans play more of a supervisory and evaluative role.
Three ways of human-AI collaboration
Based on the functional analysis of the four main modules of intelligent agents (memory, planning, action, and tool usage), the agent mode is undoubtedly more efficient than the embedding mode and the co-pilot mode, and it may become the main mode of human-machine collaboration in the future.
In the agent-based mode of human-machine collaboration, every ordinary individual has the potential to become a super-individual. A super-individual has their own AI team and automated task workflows, and establishes more intelligent and automated collaborative relationships with other super-individuals through agents. The industry is already actively exploring one-person companies and super-individuals. On GitHub, there are automated agent-based teams such as the GPTeam project. GPTeam uses large models to create multiple intelligent agents with distinct roles and functions, and these agents collaborate to achieve predetermined goals. For example, Dev-GPT is a multi-agent team for automated development and operations, with roles such as product manager agents, developer agents, and operations agents. Such a multi-agent team can support the normal operation of a startup marketing company, which is essentially a one-person company. Another example is NexusGPT, which claims to be the world’s first AI freelancer platform. The platform integrates various AI-native data from open-source databases and has over 800 AI agents with specific skills. On it, you can find experts in different fields, such as designers, consultants, and sales representatives. Employers can choose an AI agent from the platform at any time to help them complete various tasks.
AI Agent will change the rules of the game for software and promote AI as infrastructure
AI Agent is redefining software. Bill Gates believes that AI Agent will completely disrupt the software industry and will impact how we use software and how software is written.
AI Agent will shift the paradigm of software architecture from process-oriented to goal-oriented. Existing software (including apps) fixes the workflow through a series of predefined commands, logic, rules, and heuristic algorithms to ensure that the software produces results that meet the user’s expectations. In other words, users follow the prescribed instructions, logic, and steps to achieve their objectives. This process-oriented software architecture is highly reliable and deterministic. However, it can only be applied in specific domains and cannot be universally applied to all domains. Therefore, finding a balance between standardization and customization is one of the challenges faced by the SaaS industry.
Software Architecture Paradigm Shift 
Under the AI Agent paradigm, the development of functionality, which was previously dominated by humans, gradually becomes primarily driven by AI. With large models as the technical infrastructure and Agents as the core product form, the layered tasks of predefined commands, logic, rules, and heuristic algorithms in traditional software evolve into goal-oriented intelligent agents that generate them autonomously. The previous architecture could only solve tasks within a limited range, whereas the future architecture can address tasks across an unlimited range of domains. The future software ecosystem will not only have Agents as everyone’s interface for interaction, but will also reshape the entire industry around the Agent, including underlying technologies, business models, intermediate components, and even people’s habits and behaviors. This marks the beginning of the Agent-centric era.
Comparison between the RPA paradigm (Robotic Process Automation) and the APA paradigm (Agentic Process Automation)
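The contrast between process-oriented and goal-oriented software can be made concrete with a small Python sketch. It rests on stated assumptions: `process_oriented_export`, `CAPABILITIES`, `choose_steps`, and `goal_oriented_run` are hypothetical names, and `choose_steps` is a trivial keyword matcher standing in for the LLM planner an actual agent would use.

```python
from typing import Callable, Dict, List


def process_oriented_export(rows: List[dict]) -> str:
    """Fixed workflow: the steps and their order are hard-coded by the developer."""
    valid = [r for r in rows if "name" in r]            # step 1: validate
    ordered = sorted(valid, key=lambda r: r["name"])    # step 2: sort
    return ",".join(r["name"] for r in ordered)         # step 3: format


# Goal-oriented: capabilities are registered once; the steps are chosen per goal.
CAPABILITIES: Dict[str, Callable[[List[str]], List[str]]] = {
    "sort": sorted,
    "dedupe": lambda xs: list(dict.fromkeys(xs)),
    "upper": lambda xs: [x.upper() for x in xs],
}


def choose_steps(goal: str) -> List[str]:
    """Hypothetical stand-in for an LLM planner: picks capabilities named in the goal.

    Steps come back in registry order, not in the order they appear in the goal.
    """
    return [name for name in CAPABILITIES if name in goal.lower()]


def goal_oriented_run(goal: str, items: List[str]) -> List[str]:
    for step in choose_steps(goal):
        items = CAPABILITIES[step](items)
    return items
```

The first function can only ever do the one job it was written for, while `goal_oriented_run("dedupe and sort the names", ["b", "a", "b"])` and `goal_oriented_run("upper", ["b", "a"])` reuse the same capabilities for different goals without new code.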
Take the ChatDev intelligent software development platform, the first “large model + Agent” SaaS-level product released by ModelBest, as an example. The platform is like a software development company composed entirely of AI Agents. It has Agent roles such as CEO, CTO, development manager, product manager, testing officer, and supervisor. Users only need to communicate their specific requirements to the CEO Agent, which organizes the entire software development process around those requirements. What is finally delivered to users includes not only the software product but also the code developed along the way, and everything is automated. This will reduce production costs and improve customization capabilities in the software industry, ushering in the era of “3D printing” for software development.
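The role-based division of labor described above can be sketched in a few lines of Python. This is not ChatDev’s actual implementation: `RoleAgent`, `CEOAgent`, `handle`, and `deliver` are hypothetical names, and each role simply annotates the artifact it receives where a real system would call a large model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RoleAgent:
    role: str

    def handle(self, artifact: str) -> str:
        # Stand-in for an LLM call: each role annotates the artifact it receives.
        return f"{artifact} -> {self.role}"


class CEOAgent:
    """Receives the user's requirement and routes it through the team in order."""

    def __init__(self, team: List[RoleAgent]) -> None:
        self.team = team
        self.log: List[str] = []

    def deliver(self, requirement: str) -> str:
        artifact = requirement
        for agent in self.team:
            artifact = agent.handle(artifact)
            self.log.append(artifact)  # keep intermediate outputs, not just the result
        return artifact
```

With a team of `RoleAgent("product manager")`, `RoleAgent("developer")`, and `RoleAgent("tester")`, a call to `deliver("todo app")` passes the requirement through each role in turn, and `log` retains the intermediate artifacts, mirroring the point that users receive the work produced along the way as well as the final product.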
Outlook and Challenges of AI Agents
AI Agents are a crucial driving force in turning artificial intelligence into infrastructure. Looking back at the history of technology, a technology reaches its pinnacle when it becomes infrastructure, as electricity and cloud computing did: an essential but often unnoticed part of our lives. This transition goes through three stages: the innovation and development stage, when a new technology is invented and starts to be applied; the popularization and application stage, when the technology matures and is widely applied across sectors, deeply affecting society and the economy; and the infrastructure stage, when the technology becomes pervasive and turns into a fundamental part of people’s daily lives. Almost everyone agrees that artificial intelligence will become the infrastructure of future society, and intelligent agents are driving that transformation. This is not only because Agents make software cheap to produce, but also because Agents can adapt to different tasks and environments, learn, and optimize their performance, which allows them to be applied in a wide range of domains and to become the fundamental support for various industries and social activities.
Overview of Artificial Intelligence Agents
Agents may evolve in two directions simultaneously. The first is the assistive agent, which helps humans by performing various tasks and emphasizes utility. The second is the human-like agent, which is capable of autonomous decision-making, possesses long-term memory, and exhibits certain human-like traits, with an emphasis on humanoid or superhuman attributes.
From a technological optimization and implementation perspective, the development of AI agents also faces some challenges:
Firstly, as seen in OpenAI’s GPTs, the complex reasoning capabilities of LLMs are not yet strong enough, and high latency keeps agent applications from truly maturing. These are directions for future engineering optimization and research breakthroughs.
Secondly, the development of multi-agent systems still faces significant challenges. Multi-agent systems are a very complex field of academic research, and as agents begin to enter the mass market they have also become a practical engineering problem. For example, Stanford’s virtual town is a multi-agent study with 25 agents. However, after the town framework was released, developers’ testing indicated that a single agent consumes about $20 worth of tokens per day, because it requires a large amount of memory and cognitive processing. This costs more than many human workers, so both the agent framework and LLM inference need further optimization.
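To put the reported figure in perspective, the arithmetic can be worked through in a short Python sketch. It assumes, purely for illustration, that the roughly $20 per day applies uniformly to each of the 25 agents and that agents run around the clock; the function names are hypothetical.

```python
AGENTS = 25                          # agents in the virtual town
COST_PER_AGENT_PER_DAY_USD = 20.0    # developers' reported estimate per agent


def daily_cost(agents: int, cost_per_agent: float) -> float:
    """Total token spend per day for the whole town."""
    return agents * cost_per_agent


def hourly_cost_per_agent(cost_per_agent_per_day: float, hours: float = 24.0) -> float:
    """Per-agent cost per hour, assuming the agent runs for `hours` hours a day."""
    return cost_per_agent_per_day / hours


town_per_day = daily_cost(AGENTS, COST_PER_AGENT_PER_DAY_USD)       # 25 * 20 = 500 USD
town_per_30_days = town_per_day * 30                                # 15,000 USD
per_agent_hour = hourly_cost_per_agent(COST_PER_AGENT_PER_DAY_USD)  # about 0.83 USD
```

Under these assumptions the 25-agent town costs about $500 per day, or $15,000 over 30 days, which shows how quickly token costs grow as the number of agents increases.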
Breaking through the development challenges of multi-agents is an important prerequisite for building an agent society in the future. The collaboration of multi-agents can form the highest form of technical social systems, an agent society. Agent society exhibits complex, dynamic, self-organizing, and adaptive characteristics, enabling cooperation, competition, and continuous evolution. In this social system, agents can execute complex and flexible tasks according to goals and environmental changes, engaging in high-level, multidimensional interaction and collaboration with humans and other agents. Agent society not only helps humans explore and expand the physical and virtual worlds but also enhances and expands human capabilities and experiences.
At the same time, these development trends indicate that AI agents may face challenges in various aspects, such as security and privacy, ethics and responsibility, and economic and social employment impact.
(1) Security and privacy are crucial characteristics of AI agents, essential for their stable operation and the protection of users and society. These two factors directly impact the trustworthiness and control of AI agents. If AI agents experience vulnerabilities, attacks, or data breaches, it may result in harm to users or society. For example, OpenAI’s GPTs had a security vulnerability shortly after release, leading to the leakage of user-uploaded data.
(2) Ethics and responsibility are core principles of AI agents, determining their values, goals, and respect and protection of users and society. These principles directly impact the credibility and controllability of AI agents. If AI agents exhibit unfair, opaque, or unreliable behavior, it may lead to user or societal rejection of the technology. Assigning responsibility is also a crucial issue for AI agents. Unclear or unfair allocation of responsibility between humans and AI agents in collaboration can result in serious consequences.
(3) Economic and social employment impact. One important challenge in future work is the competition between humans and intelligent agents. For example, the emergence of AI freelancer platform NexusGPT has disrupted traditional freelancers. In the future, there will also be an increasing number of intelligent agents in collaborative social work, and employers may try to minimize human involvement based on efficiency and effectiveness considerations. As intelligent agent technology matures, we must think ahead about the long-term impact of these technological developments on society and individual careers.
With the release of ChatGPT as a turning point, the number and income of writers/editors on global freelancer platforms have entered a steep decline.