NGC Ventures Is the AI track still worth starting a business in?

Is it still worth starting a business in the AI track?

Author: Cherry, Investment Manager, NGC Ventures

Foreword

This article was initially drafted at the end of August during a day off and was hastily published, receiving a lot of criticism. As a result, the author made some additions, changes, and deletions to avoid embarrassment.

The content of this article mainly evaluates the current situation of the AI industry from an investment perspective, reflects on and speculates on the technical/product roadmaps of different companies, and abstractly summarizes the strategies of AI industry companies. Therefore, there may be omissions in the specific technical parts, please forgive me.

However, in the end, the few companies that can publish papers are still tearing each other apart, and it seems that no one can judge the correctness of the content of this article. It’s like using GPT-4 to score GPT-3.5, which seems reasonable but is a bit abstract when you think about it.

Therefore, the author suggests that this article should be regarded as a “judgment” formed after collecting information about an uncertain industry. Since it is a judgment, the position must be clear and substantial. As for whether the judgment is right or wrong, let time be the judge.

The author always believes: in a new industry with a lot of noise, it is never wrong to use your brain more and dare to make judgments. For a multiple-choice question, the probability of guessing correctly is 50%, and the probability of guessing wrong three times in a row is 12.5%. Even if it is as simple as flipping a coin, making a judgment is meaningful. Making a judgment is not scary; what is scary is having an accuracy rate lower than flipping a coin.

Before officially starting this article, I would like to express my gratitude to the following works, which have provided valuable inspiration and data sources for this article. Of course, since many inferences in this article are based on these works, if there are errors or misunderstandings in the author’s understanding, the inferences in this article will also be unstable. Readers are advised to use their own judgment. This article does not constitute investment advice and is difficult to be regarded as investment advice.

•Chain-of-Thought Hub: A Continuous Effort to Measure Large Language Models’ Reasoning Performance (https://arxiv.org/abs/2305.17306)

•LIMA: Less Is More for Alignment (https://arxiv.org/abs/2305.11206)

•June 2023, A Stage Review of Instruction Tuning (https://yaofu.notion.site/June-2023-A-Stage-Review-of-Instruction-Tuning-f59dbfc36e2d4e12a33443bd6b2012c2)

•GPT-4 Architecture, Infrastructure, Training Dataset, Costs, Vision, MoE (https://www.semianalysis.com/p/gpt-4-architecture-infrastructure)

Alright, let’s officially start this article.

Large-scale Models: Launching Cyber Rockets

The first step in discussing AI in 2023 is to discuss whether it is still feasible to start a business with large-scale models.

Large-scale models (pre-training) have now become a matter of launching rockets. As long as you can afford the cost and have the right direction, anyone can do it. It can be said that training large-scale models is like launching cyber rockets.

One counterintuitive thing is that investors underestimate the difficulty of training large-scale models while overestimating the difficulty of launching real rockets. With the same cost of $60 million, investors may think that there is a second chance if the rocket fails to launch, but training large-scale models is considered a waste of funds if it fails.

GPT-4 still consumes $60 million in GPU efficiency (allegedly about 30%) at OpenAI. This is a problem of {performance = efficiency × cost}, and performance is the barrier. If other startups cannot achieve a performance effect greater than 30% × $60 million = $18 million, users might as well use GPT-4 directly.

Currently, the financing rounds of many companies claiming to train large models are in the range of $1 million to $5 million. This means that even for companies with the highest financing amount, their ammunition is only enough to support one launch. And even if the GPU utilization rate of this launch reaches 100%, it is still difficult to surpass GPT-4.

From this perspective, launching rockets is easier because most rockets are carrier rockets that carry satellites into space, and the payload for each launch is limited. Therefore, small rocket companies can take over satellites that others have not had a chance to launch.

Large models are different. The marginal cost of horizontal expansion for large models is only computing power cost, and computing power cost can be elastically expanded, which means that for large model companies, every single profit is a free profit and requires almost no additional cost. Their capacity to undertake projects is very large. It is difficult for newly established and low-quality large model companies to receive overflow demand.

Unless training costs decrease significantly, many companies will find it difficult to develop marketable large models in the short term, even if they know the complete architecture of GPT-4.

Customization: Facing the “winner-takes-all” problem directly

In the hardware industry, a common phenomenon is to achieve early profits through customized demand, and then achieve technological breakthroughs (or catch up) through early profits. However, customization in the large model industry is difficult to be the way out for newcomers.

Explaining this judgment is very simple: the vast majority of fine-tuned models cannot catch up with GPT-4. Even if they do catch up, the cost of directly using the generalization of GPT-4 is lower, requires fewer personnel, less luck, and less data. As long as there is still a performance gap between GPT-4 and other models, customization cannot be the way out for large model companies.

A very typical example is Jasper, which serves enterprise customers using fine-tuned GPT-3. However, after OpenAI released ChatGPT (GPT-3.5), its users quickly churned. This is because the output of Jasper can be obtained by simply inputting a simple prompt for GPT-3.5, without using a “lagging version” with poor generalization and limited internal use.

Compared to new companies, Jasper at least had a window of opportunity to develop from GPT-3 to GPT-3.5. However, new companies now need to face the squeeze of low-cost and high-speed GPT-3.5 and high-performance GPT-4 at the same time.

Therefore, the probability of survival is very low for those who hope to accumulate profits through customization and achieve technological breakthroughs.

Fine-tuning: Necessary but not blindly relied upon

The current AI industry has an unrealistic expectation of fine-tuning, which is overestimated both in terms of specific technical implementation and macro technical rhythm.

The fine-tuning currently discussed in the industry mostly refers to “making the model generate answers that align with human intent” based on pre-trained models. This kind of fine-tuning can be called “alignment,” which means aligning the answers with human intent rather than making the large model smarter.

According to the research results of multiple papers, the knowledge of the large model should mainly come from pre-training, and fine-tuning is mainly used for alignment.

In simple terms, pre-training determines the brain capacity, and fine-tuning determines the native language. Fine-tuning a pre-trained model is a process of “eradicating illiteracy”.

However, in the current industry, fine-tuning is often seen as a way to “make the model smarter”, that is, improving model performance and increasing model knowledge through fine-tuning, and it is believed that this can achieve the “holy grail of artificial intelligence”. This line of thinking is somewhat biased.

Firstly, the performance of the model itself does not improve, it just aligns better with human intent. If the complexity of the task exceeds the performance of the model, fine-tuning may not achieve the expected results. It’s like asking the human brain to perform quantum computations, which is not an issue of education.

Secondly, “knowledge supplementation” in the part of “intent alignment” is more like “parroting”. That is, the model only imitates experts’ speech without understanding the meaning behind it. Although many industries can get good solutions through “parroting” (after all, most industries are not complex…), this is obviously not the result we should pursue in the long run.

Finally, the training of “supplementing additional datasets, improving model performance, and increasing model knowledge” should be regarded as the model’s ability of “incremental learning/continuous learning”, that is, the full parameters of the model can be optimized through incremental datasets. This is not the same concept as the so-called “instructional fine-tuning”.

In summary, fine-tuning is very important, but it is wrong to have a “superstitious” attitude towards current fine-tuning, especially the rush to label current fine-tuning as the holy grail. It is somewhat reminiscent of “two dark clouds floating above the building of physics today”.

On a side note, if the demand for “enhancing intelligence” can really be solved through instructional fine-tuning, a simple vector search, directly embedding knowledge into the context, and writing a few prompt templates can likely achieve the same or even better results.

Everyone likes fine-tuning, which may also be a revival of alchemy in modern times…

Outlook on large models: Four arithmetic operations

(Note: This section is completely based on the data revealed by Dylan LianGuaitel and its reliability cannot be verified at present)

The training of GPT-4 is based on the A-series of N-card, with a training efficiency of 30%, a training time of about 2 months, a cost of about 60 million, and a total parameter size of {17 trillion = 110 billion * 16 expert models}. The parameters for processing a single problem are around 280 billion.

That is to say, there are several key parameters that can change the pattern of training for large models.

Training Efficiency: Increasing from 30% to 60% can directly reduce the time by half

Improved Computing Power Intensity: Upgrading from the A series to the H series and then to dedicated AI cards can increase computing power intensity, solving many efficiency issues in architecture

Reduced Computing Cost: Huang, the founder of Nvidia, offers discounts on graphics cards, resulting in significant cost reductions

Improved Parameter Efficiency: There is room for improvement in the parameter efficiency of models. Referring to the past, the parameter efficiency of new models is usually several times higher than that of old models. It is possible to achieve similar effects with only 30% of the parameters used in GPT-4

In summary, there may be a cost optimization space of 10-20 times when training a model with GPT-4 level performance from scratch, compressing it to $3-6 million. This cost is more acceptable for both start-ups and large companies in terms of cost control.

And this change may take about 2 years to complete.

Currently, the technology of mainstream large models is still based on transformers, and the underlying architecture has not changed. The idea of adding parameters to achieve miracles has not been exhausted. The training of GPT-4 is carried out under significant computing power constraints, and the training time is not long enough.

If the parameters increase linearly with training time, the upper limit of the parameters for models with architectures similar to GPT-4 may be around 10 trillion, that is, training time is doubled (×2), parallel graphics cards are doubled (×2), training efficiency is increased by half (×1.5), and parameter efficiency is increased by half (×1.5), resulting in a tenfold improvement. According to the risk preference style in Silicon Valley, this parameter volume will probably be reached within a year, regardless of whether the performance has improved or not.

However, after reaching 10 trillion parameters, it is completely unknown whether LLM can still achieve miracles by adding parameters.

If the increase in parameter volume has a diminishing marginal effect on model performance, then 10 trillion is likely to be a hurdle. However, there is also a conjecture that the increase in parameter volume has an increasing marginal effect on model performance, similar to “a person who is smart enough can learn quickly.” The former is acceptable, but if the latter comes true, the model performance may experience an exponential improvement, and what will happen then will be completely unpredictable.

Predicting alchemy is difficult, but predicting the pace of corporate strategy is easy. For most companies, whether they are giants like Google/MS/APPL or smaller ones like OpenAI, a model with a total of 10 trillion parameters is a milestone-level endpoint where they can pause and explore new technologies.

The preference for risk by companies/capital can be converted into a “tolerance time”. If the entire tolerance time is spent burning costs intensely, it is difficult to exceed 6 months. The rate of human technological growth is not fast enough, usually taking 5 years or even longer as a cycle. Therefore, within 5 years, the maximum parameter limit of the model can be estimated to be between 200 trillion and 500 trillion. Unless there is another major breakthrough in technology/architecture, the probability of exceeding this order of magnitude is very low.

Multi-modal: The Elephant in the Room

Multi-modal is the elephant in the room that could have a profound impact on the landscape of the industry.

The simple definition of multi-modal is: supporting input and output of multiple modalities of information. This definition is broad, for example, some products in the market that claim to support multi-modal input are actually just ChatBots with an added layer of OCR. There are also models that completely fit the definition of multi-modal, but their performance is not commendable. Even GPT-4’s image multi-modal input capability has not been widely released, indicating that this feature is not very stable.

However, the release of multi-modal is not a distant thing. GPT-5 is likely to natively support multi-modal, which means redesigning the structure and retraining the model. Based on previous reasoning, there is still a growth space of 10 to 50 times in the parameters of large models, which should be sufficient to incorporate multi-modal capabilities. Therefore, it can be expected that high-performance and highly available multi-modal models will appear within 2 years, and optimistically, within 1 year.

Multi-modal is the elephant in the room that everyone knows will eventually exist, but many products/research/strategies overlook its existence, leading to misjudgment in critical areas.

For example, a single-image model theoretically will be severely oppressed by multi-modal models, but most of the current research/investment overlooks this problem, resulting in overvaluation of some companies that focus on image models. These companies are likely to lose their technological barriers and transform into service providers in the future, and their valuation system should refer to service providers rather than technology companies.

If you want to tell the story of “investing in people, the same team can do business transformation,” then just pretend I didn’t say it. Legends always exist, but research cannot blindly believe in legends.

Who Can Train GPT-4: Anyone Can, but No Need

Alchemy doesn’t take that long, and big companies are all buying graphics cards. One very obvious thing is that one year from now, large companies will all have the ability to train models at the level of GPT-4. However, whether or not to train is another question.

In the gaming industry, there is a classic proposition called “Have Genshin Play Genshin,” which means that when players have the option to play Genshin or a competitor’s game that is not as good as Genshin, they will choose to play Genshin.

This “winner takes all” mindset also applies to the large model industry. If a company closely follows OpenAI and, after six months of research and development, releases its own large model with 90% of the performance comparable to GPT-4, and hopes to bring it to market. At this time, the company will face the following problems:

• OpenAI has a scale advantage in cloud resources, with lower costs

• OpenAI’s API is already widely used in product code, making it difficult to switch

• The company’s product performance still does not exceed GPT-4

• OpenAI’s next-generation product (possibly GPT-5) is about to be released

It can be seen that the pressure on the company is quite high. Instead of training GPT-4, it is better to directly bet on the next-generation model (comparable to GPT-5). Then the problem will shift from “competition among peers” to “technological innovation”. This is a burden that small companies find difficult to bear.

Therefore, discussing “who can train GPT-4” is a strategically futile question. Instead of thinking about this problem, it is better to find more certain and opportunistic directions.

Advice for AI startups: prioritize performance and avoid stagnation

I have written many articles criticizing langchain, the fundamental reason being that langchain did not leave room for developers to improve performance. It is called a “universal framework” and in order to ensure universality, many performance improvement opportunities of large models were sacrificed, such as multi-turn dialogues and format control achieved through fine-tuning. Similarly, guidance/Auto-GPT/BabyAGI, etc. all aim to be a “framework that can be used for a lifetime”.

An objective fact is that in May, OpenAI released Function Calling, which provided better implementation solutions for many troublesome parts of the code, but the cost of implementing better solutions is key parts of the product code being refactored. In August, OpenAI released the permission to fine-tune GPT-3.5, which brought new potential solutions to many aspects that require precise control of outputs.

Therefore, startups must face a crucial choice: whether to choose ① to improve performance and constantly refactor the product, or ② to reduce the use of new features and always develop with old features?

For startups applying new technologies, “development” not only represents the process of writing code, but also represents the “upper limit” of product functionality/strategy. The higher the manageable performance, the more theoretical functions the product can have, and the higher the strategic flexibility.

The development of technology cannot be predicted, and even minor technological innovations can bring about highly sensitive changes in the competitive landscape. Startups should have the ability to be antifragile towards technological development.

– In plain language: priority on performance, avoid stagnation. In terms of development, use new features more; in terms of product, consider what functions can be achieved with new features; in terms of strategy, consider the impact of new features on the strategy.

In “Guo Qin Lun”, it was mentioned that after the establishment of the Qin Dynasty, all metallic weapons in the world were confiscated and cast into twelve bronze statues to eliminate the possibility of popular uprisings. But the Qin Dynasty was famously short-lived. It is more advantageous to pay attention to changes rather than ignore them.

Advice for AI startups: confidently focus on application

There is a very common hidden danger for startups in application development: the entry of large companies. These large companies include not only application giants like Meta/ByteDance/Tencent, but also upstream players in the AI industry such as OpenAI.

The reasons for the entry of large companies usually fall into two categories: seizing product opportunities and cutting down from upstream to downstream.

“Seizing product opportunities” is as it sounds, the large company believes that this direction is worth pursuing, so they do it.

“Cutting downstream from upstream” is often a last resort. It may be because the company has developed large models that compete with OpenAI, but due to the winner-takes-all nature of large models, there are no users, leading to burning costs, no revenue, no data, and gradually falling behind in performance. At this point, entering the downstream, developing specific applications, and using one’s own technology becomes the only option.

Based on historical experience, due to organizational structure issues, the closer a company is to the downstream, the more likely its technology will lag behind. The more the technology lags behind, the more it has to move downstream. These so-called technology companies will eventually compete with application layer companies for the same ecological position.

However, in the battlefield of the application layer, AI technology has appeared for a very short time and there are no effective and reusable scale advantages. Big companies and startups have similar starting points. Compared to big companies, startups are more efficient, have deeper insights, and are more likely to take advantage.

One noteworthy situation is that almost all of MS Azure’s promotional materials are centered around OpenAI. The fact that such a large company like Microsoft has to rely entirely on OpenAI as its platform indirectly proves that startups have a natural advantage in the field of AI.

Of course, some cloud providers may not accept being led by startups and want to dominate the entire market themselves. However, their high costs and slow speed are not immediate threats.

The fact is that while some AI application tracks are very short-lived, there are still many long-lasting tracks that have not been discovered. AI applications are not winners-take-all. Extending from applications to platforms or technologies is also a more feasible path.

Therefore, we should have a rational view of the ability of large companies to invade the application layer. Our advice is that AI startups can confidently focus on applications.

Advice for AI startups: Pay attention to the lifeline of your products

As mentioned earlier, AI startups can confidently focus on applications, but they need to consider the performance issues of AI models and avoid stagnation. This situation directly manifests as AI products possibly losing their user base within a few months and gradually declining, which may happen frequently.

AI applications need to use services with large models, and the performance of these models keeps improving. This improvement is not just in terms of “speed” or other single dimensions, but in terms of overall improvements in output quality, output length, output control, and more. Each significant upgrade in technology will cause existing application layer products to fall behind technically and create new opportunities and competitors.

We refer to the time that AI applications maintain advantages and necessity in terms of strategy/products/technology as the “lifeline”.

Here are some examples of short lifelines:

• When ChatGPT/Claude supports file uploads, ChatPDF loses its necessity.

• When Office365 supports Copilot, products that use AI to create PowerPoint presentations lose their advantage.

• When GPT-3.5 emerges, Jasper loses its necessity.

Considering the rapid development of the AI industry, having a limited lifespan is the norm. Therefore, accepting the fact of having a limited lifespan and choosing a direction with a longer lifespan as much as possible is beneficial for maintaining long-term advantages and product necessity.

In general, the lifespan can be divided into levels of 3/6/12 months.

• 3 months: Functions that big companies don’t have time to develop (such as functions that Office/ChatGPT haven’t had time to develop)

• 6 months: Have a certain level of implementation difficulty, cannot be integrated into existing solutions, but advantages/necessity will disappear as AI performance improves (such as general AI frameworks)

• 12 months: Advantages/necessity can exist for a long time, not easily influenced by big companies/technological developments (such as Hugging Face)

* The lifespan of platform products is not necessarily long, after all, the prompt store is also a platform

For startup companies, as long as there is a 6-month lifespan level, they can proceed, but a 12-month lifespan level is hard to come by.

When the product’s lifespan comes to an end, there are generally two situations. The first situation is when the advantage disappears, requiring product reconstruction and technological upgrades, please refer to the previous section “Performance First”; the second situation is when the necessity disappears, and the product will gradually be replaced. At this time, the product still has several months of “operational lifespan,” which is enough for startup companies to choose the next direction.

Advice for AI startups: Web3+AI is a viable option

Currently, there are many projects focused on the theme of Web3+AI entrepreneurship. Considering the uncertainty of technological development and the early stage of the market, the topic of Web3+AI still has many variables in the future.

This article aims to find a high probability of correctness in uncertainty. Therefore, the author still hopes to provide some potential opportunities and directions for startup companies and interested researchers to refer to.

Decentralization

Currently, leaders in the AI industry only provide closed-source models, and the stability, transparency, and neutrality of their continued services are uncontrollable. Decentralization may become an important topic in the AI industry, namely: based on a decentralized architecture, providing stable, transparent, and neutral AI services.

Decentralization is an “alternative solution” and also a “deterrent” that can significantly increase the unethical cost of centralized/sovereign AI companies and prevent them from using AI models in military, cult, political, and other aspects.

In extreme cases, if centralized/sovereign AI services are no longer available/reliable for some reason, decentralized AI can continue to provide highly available services, preventing individual countries/regions and even humanity from losing AI services and falling into a paralyzed state.

Practical use of computing power

The transition of ETH from PoW to PoS is criticized for the dilemma of “mining not generating value,” but combining Web3 with AI can provide scenarios for the practical use of computing power, thus achieving the effects of digesting existing computing power and promoting the overall growth of computing power.

Tokenization of Virtual Assets

AI is an asset native to computing power and storage. The combination of Web3 and AI can provide a channel for transforming AI into virtual assets, creating true native virtual assets for Web3 while realizing the value of the AI industry.

Variability of Web3 Applications

The combination of Web3 and AI may bring new features and growth opportunities to Web3 applications, and existing Web3 applications can be completely redesigned.

Final Thoughts: Is AI Still Worth Startups in September?

Let’s start with the conclusion: Yes, and this conclusion is highly likely to hold until Chinese New Year.

People often have biases in their perception of situations, and I am no exception. Some people are overly optimistic, while others are overly pessimistic. I have had discussions with two teams, one team believes they can develop an AI agent by Q1 next year, while the other team believes AI is only suitable for knowledge base management work. Obviously, the former is too optimistic, and the latter is overly pessimistic.

When making long-term plans, being overly optimistic or overly pessimistic can both lead to pitfalls. The widely disseminated opinions often have significant biases, and independent thinking is precious. Therefore, whether readers can accept the viewpoints in this article or not, as long as readers generate independent thinking and judgment during the reading process, I will be extremely gratified.

Finally, let me advertise. If you have good AI startup ideas or already have mature projects, feel free to communicate with friends from NGC (like me) at any time.

We identify projects with disruptive innovation, aiming to solve problems with solutions that are characterized by simplicity, cost affordability, speed, uniqueness, and a compelling product market fit.

We will continue to update Blocking; if you have any questions or suggestions, please contact us!

Share:

Was this article helpful?

93 out of 132 found this helpful

Discover more

Market

Wu's Weekly Picks: HSBC launches cryptocurrency ETF, US SEC rejects spot ETF application, Azuki criticized by community, and top 10 news (June 24-30)

Author | Wu's Top 100 Blockchain News This Week. US SEC Returns Spot ETF File According to WSJ, the US...

Opinion

Bloomberg Thousands of Words Uncover How SBF's Elite Parents Helped Him Build a Cryptocurrency Empire?

A tall building rises from flat ground, and the success of FTX is not the result of one person's efforts. With the ba...

Policy

BlockFi Emerges from Bankruptcy, Ready to Pay Back Creditors and Recover Assets

In November, popular crypto lending platform BlockFi made headlines for their bankruptcy filing caused by the FTX con...

Blockchain

Hong Kong Cryptocurrency New Policy's One-Year Anniversary A Year of Major Leaps and Key Milestone Review

Over the past year, Hong Kong has made great progress and shown strategic development in virtual asset policies. Sinc...

News

A picture to understand the blockchain: expansion, going to sea, ending, a decade of exchange history

Expansion, going to sea, ending-ten years history of exchanges On November 14, the Central Bank's Shanghai Headq...

Blockchain

The consensus of using "money" to forge coins - a high-tech that condenses developers' miners' exchanges and users

In 1776, the American Revolutionary War broke out. Why is this war going to fight? The American side said that "...