Looking Back: A Pioneer Reveals How the Bitmain S9 Became the "King of a Generation"

Author: Dan Xie, former technical director at Bitmain

Source: Wu Said Blockchain

Foreword: As high-hashrate mining machines enter the market en masse, the Antminer S9, the king of its generation, is about to leave the stage of history. Looking back, the author believes that the Chengdu team achieved a cross-generational advantage over competitors through dynamic flip-flops, and that this was the main reason for the S9's success. The S9's design is a miracle among Bitcoin mining machines: it became the most widely produced Bitcoin miner, had a life cycle of more than three years, and is known as the "machine emperor". The views in this article are the author's own and do not represent the position of Wu Said Blockchain.

In August 2014, I opened a back-end design services company in Chengdu, intending to offer value-added back-end design services. While looking for customers, I searched online and figured that companies making Bitcoin mining machines should have this need, so I wrote an email to the address listed on the Bitmain website at the time:

At the time, I wrote to many integrated circuit design companies, and Bitmain was the fastest to respond. I later learned that this mailbox was used by Wu Jihan himself and had always been in use at Bitmain. Bitmain was then also looking for ways to make its chips more competitive, so Wu Jihan forwarded my email to Mr. Zhan Ketuan. Mr. Zhan passed through Chengdu and met me in September. We met twice, and the conversations went well. However, Mr. Zhan felt that the project would be long and risky, and he had concerns about intellectual property. I therefore proposed dissolving my company and bringing the team into Bitmain.

In October 2014, I officially joined Bitmain with a small team of two, and we became Bitmain's full-custom design department. Our initial direction was the domino logic proposed in my email.

Domino logic is a relatively mature dynamic logic structure, and its main form is this:

Compared with corresponding static circuits, domino logic circuits have the following advantages and disadvantages:

  1. Because the input signals only need to drive NMOS transistors, the load capacitance is smaller and the circuit is faster.
  2. Because point A is a dynamic node, the output inverter M3 / M4 is required.
  3. Point A suffers from charge sharing.
  4. M1, M2, M3, and M4 are added devices, while the PMOS devices are removed.
  5. Because M1 and M2 are driven by the clock, their power consumption is at least twice that of a transistor driven by an ordinary signal.

From the perspective of power consumption, then, M1 and M2 are driven by the clock, so each is counted as at least two ordinary transistors. Together, M1 + M2 + M3 + M4 add the power of 6 MOS transistors. In a typical standard cell library, few logic cells contain more than 12 MOS transistors, so the PMOS transistors saved rarely outweigh this overhead. In terms of area, M2 + M4 add the area of two NMOS transistors, and although there are fewer PMOS transistors, the saved area is hard to realize in the layout. In other words, although domino logic speeds up the chip, area and power consumption both increase slightly.
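The power argument above can be checked with a little arithmetic. The sketch below assumes, purely for illustration, that a static CMOS gate with n inputs uses n NMOS and n PMOS transistors, that its domino counterpart keeps the n NMOS transistors and adds M1 through M4, and that a clock-driven transistor costs twice as much as an ordinary one, as stated above. The function names are hypothetical and are not taken from any design tool.

```python
# Weight of a clock-driven transistor relative to an ordinary one (from the text).
CLOCK_WEIGHT = 2


def static_power_equiv(n_inputs: int) -> int:
    """Assumed static CMOS gate: n NMOS + n PMOS, each weighted 1."""
    return 2 * n_inputs


def domino_power_equiv(n_inputs: int) -> int:
    """Assumed domino gate: n NMOS, clocked M1 and M2, plus inverter M3 / M4."""
    overhead = 2 * CLOCK_WEIGHT + 2  # M1, M2 (clocked) + M3, M4 = 6 equivalents
    return n_inputs + overhead


def break_even_inputs() -> int:
    """Smallest input count at which domino costs no more than static."""
    n = 1
    while domino_power_equiv(n) > static_power_equiv(n):
        n += 1
    return n


if __name__ == "__main__":
    for n in (2, 3, 4, 6):
        print(n, static_power_equiv(n), domino_power_equiv(n))
    print("break-even at", break_even_inputs(), "inputs")
```

Under these assumptions the two styles break even only at six inputs, that is, at a 12-transistor static cell, which is consistent with the 12-transistor threshold mentioned above.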

When we submitted our conclusions in January 2015, we had essentially confirmed that domino logic is not suitable for mining chips. Because Bitcoin mining chips are purely parallel, speed is not that important. A miner's biggest cost is the electricity bill, so power consumption matters most. In the rule of thumb we used to trade off area, speed, and power, power was weighted more than three times as heavily as area or speed.

The failure of our domino logic attempt did not set back our exploration, because we found that standard cells with a large number of transistors are well suited to dynamic logic. One obvious example of such a cell is the flip-flop. So we went back to the beginning, this time with dynamic flip-flops as the goal.

In the early days of integrated circuits, in the 1970s, the cost per transistor was high, so the flip-flops of that era were dynamic. The domino flip-flop, the C2MOS edge-triggered flip-flop, the TSPC positive-edge flip-flop, and others are all products of that time. We found a treasure trove there.

For example, a TSPC positive-edge flip-flop has the following logic:

Still counting each clock-driven transistor as two, this flip-flop is equivalent to 4 * 2 + 7 = 15 transistors in total.

The structure of our most commonly used static flip-flop is this:

Including the clk inverter, this adds up to 8 * 2 + 12 + 4 = 32 transistor equivalents. The static flip-flop therefore consumes more than twice as many transistor equivalents as the dynamic one.

Similarly, in terms of area, the dynamic flip-flop has 11 devices while the static flip-flop has 22, which is exactly double the area.
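The two tallies above can be reproduced with a short sketch. It uses the weighting from the text (a clock-driven transistor counts as two) and one assumption of ours: the "+ 4" term for the static flip-flop is read as a two-transistor clk inverter whose devices are clock-driven. That reading is the one that makes the 32-equivalent and 22-device figures agree. The names are hypothetical.

```python
# Weight of a clock-driven transistor relative to an ordinary one (from the text).
CLOCK_WEIGHT = 2


def tally(clocked: int, unclocked: int) -> tuple[int, int]:
    """Return (power equivalents, device count) for a cell."""
    equivalents = clocked * CLOCK_WEIGHT + unclocked
    devices = clocked + unclocked
    return equivalents, devices


# TSPC dynamic flip-flop: 4 clocked transistors + 7 others (from the text).
DYNAMIC_FF = tally(clocked=4, unclocked=7)

# Static flip-flop: 8 clocked + 12 others, plus the clk inverter,
# assumed here to be 2 clock-driven transistors (our reading of the "+ 4").
STATIC_FF = tally(clocked=8 + 2, unclocked=12)

if __name__ == "__main__":
    print("dynamic (equivalents, devices):", DYNAMIC_FF)
    print("static  (equivalents, devices):", STATIC_FF)
    print("power ratio:", STATIC_FF[0] / DYNAMIC_FF[0])
    print("area ratio:", STATIC_FF[1] / DYNAMIC_FF[1])
```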

Having settled on flip-flops built with dynamic logic, the next step was to integrate them into our design flow. In the end, we added a timing constraint to the functional description of the static flip-flop, so that leakage current cannot drain the charge stored on the dynamic node's capacitance. In the timing and power libraries, we reused some parameters of the static flip-flop. Put simply, we put a wrapper around the dynamic logic so that, to front-end designers, it looks like an ordinary static flip-flop. Front-end design and synthesis are unchanged.
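To see why such a timing constraint is needed, consider how long a dynamic node can hold its charge. The sketch below uses the basic relation t = C * dV / I for a capacitor discharged by a constant leakage current. The numeric values, the safety margin, and the assumption that the node is refreshed once per clock cycle are hypothetical illustrations, not figures from the actual design.

```python
def max_retention_time(c_node: float, delta_v: float, i_leak: float) -> float:
    """Time (s) for leakage i_leak (A) to shift a node of capacitance
    c_node (F) by delta_v (V), using t = C * dV / I."""
    if i_leak <= 0:
        raise ValueError("leakage current must be positive")
    return c_node * delta_v / i_leak


def min_clock_frequency(c_node: float, delta_v: float, i_leak: float,
                        margin: float = 10.0) -> float:
    """Lowest clock frequency (Hz) that refreshes the node `margin` times
    faster than it would leak away (assumes one refresh per cycle)."""
    return margin / max_retention_time(c_node, delta_v, i_leak)


if __name__ == "__main__":
    # Hypothetical values: 1 fF node, 0.1 V tolerable droop, 1 nA leakage.
    t_hold = max_retention_time(1e-15, 0.1, 1e-9)
    f_min = min_clock_frequency(1e-15, 0.1, 1e-9)
    print(f"retention time: {t_hold:.3e} s")
    print(f"minimum clock:  {f_min:.3e} Hz")
```

A constraint of this kind is harmless for a chip that is always clocked at full speed, but it is the reason a dynamic flip-flop cannot simply be dropped into a design that stops or gates its clock.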

With the dynamic flip-flop library complete, we effectively had a new flip-flop with half the area and half the power consumption. A static flip-flop is needed when data must be held for long periods, but Bitcoin hashing computes continuously, so no stage has to hold its data for long. Bitcoin's distributed computation and fully pipelined logic are therefore particularly well suited to dynamic flip-flops. For the pipeline, the structure looks like this:

We can change it directly to:

Since the area and power consumption of the new dynamic flip-flop are only half those of the original, the new pipeline doubles the speed for the same area and power consumption. In our Bitcoin mining chip, we went from a 32-stage pipeline to a 64-stage pipeline, doubling the computing power.
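The trade described above can be sketched as a toy model. It assumes that the combinational logic splits evenly across stages, that flip-flop timing overhead is negligible, and that the pipeline produces one result per clock cycle. These are simplifications of ours, and the cost units are arbitrary.

```python
def register_cost(stages: int, ff_cost: float) -> float:
    """Total flip-flop cost (area or power, arbitrary units) of a pipeline."""
    return stages * ff_cost


def relative_throughput(stages: int, total_logic_delay: float = 1.0) -> float:
    """Results per unit time, assuming the logic delay splits evenly across
    stages and one result is produced per clock cycle."""
    stage_delay = total_logic_delay / stages
    return 1.0 / stage_delay


STATIC_FF_COST = 1.0   # reference cost of a static flip-flop
DYNAMIC_FF_COST = 0.5  # half the area and power, as stated in the text

if __name__ == "__main__":
    old_cost = register_cost(32, STATIC_FF_COST)
    new_cost = register_cost(64, DYNAMIC_FF_COST)
    speedup = relative_throughput(64) / relative_throughput(32)
    print("register cost, 32 static stages: ", old_cost)
    print("register cost, 64 dynamic stages:", new_cost)
    print("throughput ratio:", speedup)
```

In this model, 64 stages of dynamic flip-flops fit in the same register budget as 32 stages of static ones, while each stage has half as much logic to evaluate per cycle.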

We completed the design of the 28nm BM1385 chip (Antminer S7) in mid-2015 and the 16nm BM1387 chip (Antminer S9) at the end of 2015. In terms of performance, our 28nm chip was almost on par with our competitors' 16nm chips, and our 16nm chip cost half as much as theirs. With dynamic flip-flops, we achieved a cross-generational advantage over our competitors. The S9's design in particular is a miracle among Bitcoin mining machines: it became the most widely produced Bitcoin miner, had a life cycle of more than three years, and is known as the "machine emperor".

The digital currency mining industry needs almost no software ecosystem, so a product that costs only half as much as its competitors has a very large competitive advantage: you can wage a price war whenever you want. At a price where competitors made no money, Bitmain still had a gross margin of more than 50%. It was on the strength of this secret weapon, dynamic logic, and the sales success of the S7 and S9 that Bitmain went from one contender among many, with less than 20% market share, to the dominant player, with more than 70%.

A direct consequence of Bitmain's rise was the exit of foreign Bitcoin chip companies. KNC, Bitfury, Spondoolies-Tech, and 21 Inc., all high-profile in 2014 and 2015, quickly declared bankruptcy or left the mining chip market.

As some employees left Bitmain, dynamic flip-flop technology gradually spread to other domestic chip developers, but it has largely stayed within China. In 2017, Japan's GMO also tried to enter the field at 12nm and 7nm. Judging from the company's publicity materials, they still used static flip-flops. Combined with the 2018 bear market, it is no surprise that they lost money and withdrew from the market after a year.

Before 16nm, a mask set for a new process generation cost no more than a few million dollars, while the technology, manpower, and risk involved in redesigning with dynamic logic clearly cost more than that, so the advantages of dynamic logic did not pay off. After 16nm, however, the cost of the new 10 / 7nm nodes and the future 5nm node runs to tens of millions of dollars. The ability of dynamic logic to deliver more than a process node's worth of performance makes it look far more attractive and gives it new life. I look forward to more companies applying dynamic logic, so that this ancient design art can reappear in our time.