Introduction to Technology | Solidity Programming Language: Introduction to Basic Compilation Principles and Adding New Instructions

Objective of this paper

This paper has two goals: 1. Understand the basic principles of how Solidity code is compiled. 2. Learn, by way of example, how to add a new instruction. The syntax of the Solidity language itself is not covered.

Introduction to solidity

Solidity is a high-level language for developing smart contracts, with a syntax similar to JavaScript. Contract source code is compiled into bytecode that runs in the virtual machine.

Development documentation:

Common IDE: #Includes development environment, compiler, debugger

Solidity source code:

Solidity contract instance

Contract code

The Solidity example below is a smart contract that stores and retrieves a block number. Sending a transaction that calls the set interface stores the current block number in storedData; a static call to the get interface then returns the stored value.

pragma solidity >=0.5.0;

contract storenumber {
    uint storedData = 0;

    function set() public {
        storedData = block.number;
    }

    function get() public view returns (uint) {
        return storedData;
    }
}


The code above was compiled in Remix using the 0.5.1 commit version of the compiler, which produces the following abi and data.

abi = [{"constant":true,"inputs":[],"name":"get","outputs":[{"name":"","type":"uint256"}],"payable":false,"stateMutability":"view","type":"function"},{"constant":false,"inputs":[],"name":"set","outputs":[],"payable":false,"stateMutability":"nonpayable","type":"function"}]

data = "0x60806040526000805534801561001457600080fd5b5060c2806100236000396000f3fe6080604052600436106043576000357c0100000000000000000000000000000000000000000000000000000000900480636d4ce63c146048578063b8e010de146070575b600080fd5b348015605357600080fd5b50605a6084565b6040518082815260200191505060405180910390f35b348015607b57600080fd5b506082608d565b005b60008054905090565b4360008190555056fea165627a7a72305820825c534e94b487410e10fa0ba5da11584c0b0ad2bd9e56397a3dfa89e504ee1f0029"


Fixed instruction: PUSH1 0x80 PUSH1 0x40 MSTORE

Variable: PUSH1 0x0 DUP1 SSTORE // corresponding storedData=0

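Inline function: CALLVALUE DUP1 ISZERO PUSH2 0x14 JUMPI PUSH1 0x0 DUP1 REVERT JUMPDEST POP // value check used for error rollback: the deployment reverts if ether is sent with it, because the constructor is not payable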

Deployment code instructions: PUSH1 0xC2 DUP1 PUSH2 0x23 PUSH1 0x0 CODECOPY PUSH1 0x0 RETURN INVALID //The core instructions of the deployment contract

Fixed instruction: PUSH1 0x80 PUSH1 0x40 MSTORE

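Input check: PUSH1 0x4 CALLDATASIZE LT PUSH1 0x43 JUMPI // verifies the input size: calldata shorter than 4 bytes jumps to the REVERT at 0x43

Function dispatch: PUSH1 0x0 CALLDATALOAD PUSH29 0x100000000000000000000000000000000000000000000000000000000 SWAP1 DIV DUP1 PUSH4 0x6D4CE63C EQ PUSH1 0x48 JUMPI DUP1 PUSH4 0xB8E010DE EQ PUSH1 0x70 JUMPI JUMPDEST PUSH1 0x0 DUP1 REVERT JUMPDEST // extracts the first 4 bytes of the input and jumps to the matching function (0x48 for get, 0x70 for set), reverting if nothing matches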





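Other instructions: LOG1 PUSH6 0x627A7A723058 KECCAK256 DUP3 0x5c MSTORE8 0x4e SWAP5 0xb4 DUP8 COINBASE 0xe LT STATICCALL SIGNEXTEND 0xa5 0xda GT PC 0x4c SIGNEXTEND EXP 0xd2 0xbd SWAP15 JUMP CODECOPY PUSH27 0x3DFA89E504EE1F0029000000000000000000000000000000000000 // not executable code: these trailing bytes (0xa165627a7a72305820...0029) are the contract metadata hash that the compiler appends to the bytecode (0x627a7a7230 is the ASCII marker "bzzr0"), so disassembling them yields meaningless instructions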
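The dispatch sequence in the listing (CALLDATASIZE check, CALLDATALOAD, division by the PUSH29 constant, then two EQ comparisons) can be mimicked in a few lines of Go. This is only a sketch of the control flow: the function name dispatch and the returned labels are hypothetical, and the pairing of the two selectors with get and set is read off the jump targets in the bytecode (0x48 leads to an SLOAD, 0x70 to NUMBER and SSTORE).

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Selector values copied from the bytecode listing.
const (
	selectorGet uint32 = 0x6d4ce63c
	selectorSet uint32 = 0xb8e010de
)

// dispatch mirrors the runtime prologue: calldata shorter than 4 bytes is
// rejected, otherwise the leading 4 bytes select the function to run.
// The returned strings are illustrative labels for the jump targets.
func dispatch(calldata []byte) string {
	if len(calldata) < 4 { // PUSH1 0x4 CALLDATASIZE LT PUSH1 0x43 JUMPI
		return "revert"
	}
	// CALLDATALOAD reads a 32-byte word; dividing it by 2^224 (the PUSH29
	// constant) leaves the top 4 bytes, which is what Uint32 reads here.
	switch binary.BigEndian.Uint32(calldata[:4]) {
	case selectorGet:
		return "get" // JUMPI to 0x48
	case selectorSet:
		return "set" // JUMPI to 0x70
	default:
		return "revert" // falls through to PUSH1 0x0 DUP1 REVERT
	}
}

func main() {
	fmt.Println(dispatch([]byte{0x6d, 0x4c, 0xe6, 0x3c}))
	fmt.Println(dispatch([]byte{0xb8, 0xe0, 0x10, 0xde}))
	fmt.Println(dispatch([]byte{0x01}))
}
```

Calling dispatch with the four bytes 6d 4c e6 3c corresponds to a static call of get; any input that is too short or carries an unknown selector ends in the revert branch.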

The abi and data above are what is needed to deploy and execute the contract. The abi lists the name of each function used in the contract, its inputs and outputs, and its attributes. The opcodes are the concrete instructions executed by the virtual machine, and data is the hexadecimal encoding of the opcodes; the two forms can be converted into each other. The following sections describe how the abi and opcodes are generated.
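To illustrate how data and opcodes convert into each other, here is a minimal Go sketch of a disassembler. It is not the compiler's own tooling: the function name disassemble is hypothetical, and the opcode table covers only the instructions that occur in the deployment prologue of this example. The one non-trivial rule is that PUSH1 to PUSH32 (0x60 to 0x7f) are followed by 1 to 32 bytes of immediate data.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// opNames lists only the opcodes needed for the deployment prologue;
// a full table would cover every EVM opcode.
var opNames = map[byte]string{
	0x15: "ISZERO", 0x34: "CALLVALUE", 0x39: "CODECOPY", 0x50: "POP",
	0x52: "MSTORE", 0x55: "SSTORE", 0x57: "JUMPI", 0x5b: "JUMPDEST",
	0x80: "DUP1", 0xf3: "RETURN", 0xfd: "REVERT", 0xfe: "INVALID",
}

// disassemble converts a hex string into a space-separated opcode listing.
func disassemble(data string) (string, error) {
	code, err := hex.DecodeString(strings.TrimPrefix(data, "0x"))
	if err != nil {
		return "", err
	}
	var out []string
	for i := 0; i < len(code); i++ {
		op := code[i]
		if op >= 0x60 && op <= 0x7f { // PUSH1..PUSH32 carry immediate bytes
			n := int(op-0x60) + 1
			end := i + 1 + n
			if end > len(code) {
				end = len(code) // truncated immediate at the end of the code
			}
			// Print the immediate without leading zeros, as in the listing.
			imm := strings.TrimLeft(fmt.Sprintf("%X", code[i+1:end]), "0")
			if imm == "" {
				imm = "0"
			}
			out = append(out, fmt.Sprintf("PUSH%d 0x%s", n, imm))
			i = end - 1
			continue
		}
		name, ok := opNames[op]
		if !ok {
			name = fmt.Sprintf("UNKNOWN(0x%02x)", op)
		}
		out = append(out, name)
	}
	return strings.Join(out, " "), nil
}

func main() {
	// The first 35 bytes of the data string from the example.
	text, err := disassemble("0x60806040526000805534801561001457600080fd5b5060c2806100236000396000f3fe")
	if err != nil {
		panic(err)
	}
	fmt.Println(text)
}
```

Run on the first 35 bytes of data, the sketch reproduces the first four annotated groups of the listing, from PUSH1 0x80 PUSH1 0x40 MSTORE through RETURN INVALID.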

Brief description of solidity compilation principle

Taking the contract code above as an example, this section briefly walks through the analysis process.

1. Read the complete contract code as a string, then go to step 2.

2. Strip the leading whitespace, then traverse the string and split it on separators such as the space, '{', '}', ';', '(' and ')'. Compare each piece with the tokens defined in TOKEN_LIST and replace every match with the corresponding token, then go to step 3.

3. The first token is pragma. Read from pragma up to the terminating ';' to determine that the language is solidity and that the required version is greater than or equal to 0.5.0, check that the current compiler version satisfies this, then go to step 4.

4. Continue traversing. The token is contract (contract, interface, and library are handled the same way). The string that follows, storenumber, is the contract name. The contract body starts at '{' and ends at the matching '}'; its contents are processed in steps 5 to 8. Once the body of the contract named storenumber has been determined, go to step 9.

5. Continue traversing. The token is uint, which is a data type, and the statement ends at ';'. This determines a variable of type uint named storedData. Go to step 6.

6. Continue traversing. The token is function, and the string that follows, set, is the function name. The span from '(' to ')' shows that the input is empty. The next token, public, determines the function's attributes, and the span from '{' to the matching '}' determines the function body. Go to step 7.

7. Continue traversing. The token is function again, and the processing is the same as in step 6, except that this function also has the view attribute and a returns clause. The parsed returns clause corresponds to the outputs entry of the abi. Go to step 8.

8. Continue traversing to the '}' that matches the contract's opening '{', then return to step 4 to continue processing.

9. At the end of the traversal, check validity (syntax check, naming-rule check, instruction check, and so on), then go to step 10.

10. Start compiling the contract, that is, generating the opcodes. Compilation can be divided into three phases. Go to step 11.

11. Compiler initialization. The initialization instructions are fixed: PUSH1 0x80 PUSH1 0x40 MSTORE. Then all state variables are collected and compiled; in this example the state variable is compiled to PUSH1 0x0 DUP1 SSTORE. Go to step 12.

Remarks: 1. The instructions are not produced in this final form from the start; they are translated later. For example, PUSH1 0x80 is first represented as AssemblyItem (type: pushdata, data: 0x80) and is then converted to an instruction through the correspondence between tokens and instructions. 2. The state variable instructions PUSH1 0x0 DUP1 SSTORE mean that the variable is initialized to zero and stored at position offset 0. If the variable were initialized to 1, the instructions would be PUSH1 0x1 PUSH1 0x0 SSTORE. If a second variable initialized to 3 were added, the result would be PUSH1 0x1 PUSH1 0x0 SSTORE PUSH1 0x3 PUSH1 0x1 SSTORE.

12. Continue compiling. Before the functions themselves are compiled, an inline function for checking and rolling back is added. Corresponding instructions: CALLVALUE DUP1 ISZERO PUSH2 0x14 JUMPI PUSH1 0x0 DUP1 REVERT JUMPDEST POP. Go to step 13.

13. Add the contract initialization: PUSH1 0xC2 DUP1 PUSH2 0x23 PUSH1 0x0 CODECOPY PUSH1 0x0 RETURN. At this point the main opcodes for deploying the contract have been generated. Compilation of the functions starts next. Go to step 14.

14. First generate an identifier (the function selector, the first 4 bytes of the Keccak-256 hash of the function signature) for every function, such as 0x6D4CE63C for get and 0xB8E010DE for set in the example. When a function is actually called, the input of the transaction carries this value. Go to step 15.

15. Compile the functions and generate the instructions for each one; refer to the earlier example. Go to step 16.

16. Finally, compile missingFunctions. Go to step 17.

17. Print the results. Compilation is finished.

The analysis above only conveys the basic idea. The actual process is much more complicated, because contracts can involve classes, inheritance, polymorphism, interfaces, libraries, and so on, all of which need additional processing.
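Steps 1 and 2 of the walkthrough (read the source as a string, split it on separators, and classify each piece against a token list) can be sketched as follows. This is a toy under stated assumptions, not the real scanner: the keywords table is a small hypothetical subset of TOKEN_LIST, only the separators { } ( ) ; = are recognized, and the names token and tokenize are invented for the example.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// keywords is a tiny illustrative subset of the compiler's token list.
var keywords = map[string]bool{
	"pragma": true, "contract": true, "function": true, "public": true,
	"view": true, "returns": true, "return": true, "uint": true,
}

// token is a classified piece of source text.
type token struct {
	kind string // "keyword", "punct", or "word" (identifier or literal)
	text string
}

// tokenize splits src on whitespace and punctuation and classifies each piece.
func tokenize(src string) []token {
	var tokens []token
	var word strings.Builder
	// flush emits the word collected so far, if any.
	flush := func() {
		if word.Len() == 0 {
			return
		}
		kind := "word"
		if keywords[word.String()] {
			kind = "keyword"
		}
		tokens = append(tokens, token{kind, word.String()})
		word.Reset()
	}
	for _, r := range src {
		switch {
		case unicode.IsSpace(r):
			flush()
		case strings.ContainsRune("{}();=", r):
			flush()
			tokens = append(tokens, token{"punct", string(r)})
		default:
			word.WriteRune(r)
		}
	}
	flush()
	return tokens
}

func main() {
	for _, t := range tokenize("contract storenumber{ uint storedData=0; }") {
		fmt.Printf("%-8s %s\n", t.kind, t.text)
	}
}
```

The later steps (matching braces, recognizing the contract name, the state variable, and the functions) would then operate on this token slice rather than on the raw string.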

Generate abi:

The content of the abi describes the functions of the contract: the constant, name, inputs, outputs, payable, stateMutability, and type of each function. All of this information is obtained from the parsing in steps 2 to 8 above; it is then wrapped as JSON and returned to the front end.
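As a rough illustration of this last step, the Go sketch below fills a struct with the information parsed for get and set and serializes it with encoding/json. The struct and function names (abiParam, abiFunction, buildABI) are hypothetical; only the JSON field names and values are taken from the abi shown earlier.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// abiParam describes one input or output of a function.
type abiParam struct {
	Name string `json:"name"`
	Type string `json:"type"`
}

// abiFunction holds the fields of one abi entry, in the order used above.
type abiFunction struct {
	Constant        bool       `json:"constant"`
	Inputs          []abiParam `json:"inputs"`
	Name            string     `json:"name"`
	Outputs         []abiParam `json:"outputs"`
	Payable         bool       `json:"payable"`
	StateMutability string     `json:"stateMutability"`
	Type            string     `json:"type"`
}

// buildABI encodes the two functions of the storenumber contract.
func buildABI() ([]byte, error) {
	functions := []abiFunction{
		{
			Constant:        true, // view function: does not modify state
			Inputs:          []abiParam{},
			Name:            "get",
			Outputs:         []abiParam{{Name: "", Type: "uint256"}},
			StateMutability: "view",
			Type:            "function",
		},
		{
			Constant:        false,
			Inputs:          []abiParam{},
			Name:            "set",
			Outputs:         []abiParam{},
			StateMutability: "nonpayable",
			Type:            "function",
		},
	}
	return json.Marshal(functions)
}

func main() {
	out, err := buildABI()
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Note that the returns (uint) clause appears as uint256 in the abi, since uint is an alias for uint256.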

Generate opcodes:

Steps 10 to 16 above are the process of generating the opcodes. In practice, the hexadecimal form of the opcodes is what is actually used.
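The state variable initialization described in the remarks of step 11 can be sketched as a small generator. This is a simplification under several assumptions: every variable occupies its own storage slot in declaration order, initial values and slot numbers fit in one byte (so PUSH1 suffices), and the DUP1 shortcut for equal value and slot is inferred from the example listing rather than taken from the compiler source. The names stateVar and initCode are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// stateVar is a state variable with a small constant initial value.
type stateVar struct {
	name    string
	initial uint8
}

// initCode emits the storage initialization for each variable:
// push the value, push the slot number, then SSTORE.
func initCode(vars []stateVar) string {
	ops := make([]string, 0, len(vars))
	for slot, v := range vars {
		if int(v.initial) == slot {
			// Value and slot are equal, so the pushed value is duplicated,
			// as in PUSH1 0x0 DUP1 SSTORE for storedData = 0.
			ops = append(ops, fmt.Sprintf("PUSH1 0x%X DUP1 SSTORE", v.initial))
			continue
		}
		ops = append(ops, fmt.Sprintf("PUSH1 0x%X PUSH1 0x%X SSTORE", v.initial, slot))
	}
	return strings.Join(ops, " ")
}

func main() {
	fmt.Println(initCode([]stateVar{{"storedData", 0}}))
	fmt.Println(initCode([]stateVar{{"a", 1}, {"b", 3}}))
}
```

The two calls in main correspond to the cases discussed in the remarks: a single variable initialized to 0, and two variables initialized to 1 and 3.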

Add new instructions

Sphere of influence

According to the analysis of the compilation process above, adding a new instruction requires considering the following 5 points.

1. Token definition: the grammar definition. For example, with the token {Add, +}, + corresponds to Add, and during parsing the + in the code is replaced with Add.

2. Instruction definition: the instruction provided to the virtual machine. The same definition must be added to both the compiler and the virtual machine.

3. Token handling (case): the correspondence between token and instruction. During compilation, Token::Add is replaced with Instruction::ADD so that the virtual machine can recognize it.

4. Impact of the new instruction on the compiler: for example, the impact on functions (whether it affects the pure, view, or payable attributes of a function) and the impact on storage. For these changes, refer to existing instructions of the same kind: when adding an operator, refer to the addition, subtraction, multiplication, and division instructions; when adding a block attribute, refer to the existing number and gaslimit instructions.

5. Definition and processing of the new instruction in the virtual machine.

Example: adding a RANDOM instruction, which reads a random-number attribute of the block. It is modeled on the number attribute, so a contract uses block.random in the same way as block.number. The steps below show where the code has to be added.

Modify the compiler code

1. Check the token definition. Code location: liblangutil/Token.h. TOKEN_LIST defines two kinds of tokens: keyword tokens and non-keyword tokens such as parentheses, operators, and data types. The random member to be added is neither of these, so no token definition is required.

// Token definition example. The format is M(name, string, precedence), where M is T or K:
// T marks a non-keyword token and K marks a keyword token. name is the token name,
// string is the token's literal text, and precedence is its priority.
#define TOKEN_LIST(T, K) \
    ......
    T(LParen, "(", 0) \
    T(RParen, ")", 0) \
    T(LBrack, "[", 0) \
    T(RBrack, "]", 0) \
    T(AssignShr, ">>>=", 2) \
    T(AssignAdd, "+=", 2) \
    T(AssignSub, "-=", 2) \
    ......
    K(Continue, "continue", 0) \
    K(Contract, "contract", 0) \
    K(Do, "do", 0) \
    K(Else, "else", 0) \
    ......

2. Instruction definition. Code location: libevmasm/Instruction.h. Find the block-related attributes in enum class Instruction and append the RANDOM instruction after them. As shown below, RANDOM = 0x46. Note that the number of the added instruction must not conflict with an existing one. For example, you cannot add another instruction at 0x40, because it would conflict with the existing BLOCKHASH instruction.

enum class Instruction: uint8_t
{
    ……
    BLOCKHASH = 0x40,   ///< get hash of most recent complete block
    COINBASE,           ///< get the block's coinbase address
    TIMESTAMP,          ///< get the block's timestamp
    NUMBER,             ///< get the block's number
    DIFFICULTY,         ///< get the block's difficulty
    GASLIMIT,           ///< get the block's gas limit
    RANDOM,             ///< new: get the block's random number (0x46)
    ……
};


The definition above only assigns a hexadecimal value; the string "RANDOM" must also be mapped to the instruction. The code is located in libevmasm/Instruction.cpp.

std::map<std::string, Instruction> const dev::solidity::c_instructions =
{
    ……
    { "NUMBER", Instruction::NUMBER },
    { "DIFFICULTY", Instruction::DIFFICULTY },
    { "GASLIMIT", Instruction::GASLIMIT },
    { "RANDOM", Instruction::RANDOM },
    ……
};

static std::map<Instruction, InstructionInfo> const c_instructionInfo =
{
    ……
    { Instruction::ADD,        { "ADD", 0, 2, 1, false, Tier::VeryLow } },
    { Instruction::NUMBER,     { "NUMBER", 0, 0, 1, false, Tier::Base } },
    { Instruction::DIFFICULTY, { "DIFFICULTY", 0, 0, 1, false, Tier::Base } },
    { Instruction::GASLIMIT,   { "GASLIMIT", 0, 0, 1, false, Tier::Base } },
    { Instruction::RANDOM,     { "RANDOM", 0, 0, 1, false, Tier::Base } },
    ……
};
// The values 0, 0, 1, false, Tier::Base that follow the name vary with the needs of the instruction.
// The first value defaults to 0, the second 0 is the number of arguments, and 1 means the
// instruction produces one return value. false can be understood as "used only inside the
// virtual machine"; if the instruction reads or writes the database, use true instead.
// The final Tier::Base is the gas price tier; fill it in as needed.
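The rule from step 2 that a new opcode value must not collide with an existing one can be checked mechanically. The Go sketch below mirrors the auto-incrementing enum with iota and scans a table for duplicate values. The names opcode, entry, blockOps, and findConflict are hypothetical, and only the block-related opcodes from the excerpt are listed.

```go
package main

import "fmt"

type opcode byte

// Values follow the enum above: each constant is one more than the previous.
const (
	BLOCKHASH opcode = 0x40 + iota
	COINBASE
	TIMESTAMP
	NUMBER
	DIFFICULTY
	GASLIMIT
	RANDOM // the new instruction takes the next value, 0x46
)

// entry pairs an instruction name with its opcode value.
type entry struct {
	name string
	code opcode
}

// blockOps lists the block-related instructions from the excerpt.
var blockOps = []entry{
	{"BLOCKHASH", BLOCKHASH}, {"COINBASE", COINBASE}, {"TIMESTAMP", TIMESTAMP},
	{"NUMBER", NUMBER}, {"DIFFICULTY", DIFFICULTY}, {"GASLIMIT", GASLIMIT},
	{"RANDOM", RANDOM},
}

// findConflict returns the names of the first two entries that share a value.
func findConflict(entries []entry) (string, string, bool) {
	seen := map[opcode]string{}
	for _, e := range entries {
		if prev, ok := seen[e.code]; ok {
			return prev, e.name, true
		}
		seen[e.code] = e.name
	}
	return "", "", false
}

func main() {
	fmt.Printf("RANDOM = 0x%02x\n", byte(RANDOM))
	_, _, clash := findConflict(blockOps)
	fmt.Println("conflict in table:", clash)

	// Adding a second instruction at 0x40 collides with BLOCKHASH.
	bad := append(append([]entry{}, blockOps...), entry{"OTHER", 0x40})
	first, second, clash := findConflict(bad)
	fmt.Println(first, second, clash)
}
```

A check of this kind would have to run over the full instruction table of both the compiler and the virtual machine to be meaningful, since the two definitions must stay identical.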

3. Instruction handling. Code location: libsolidity/codegen/ExpressionCompiler.cpp

bool ExpressionCompiler::visit(MemberAccess const& _memberAccess)
{
    ……
    case Type::Category::Magic:
        if (member == "coinbase")
            m_context << Instruction::COINBASE;
        else if (member == "timestamp")
            m_context << Instruction::TIMESTAMP;
        else if (member == "difficulty")
            m_context << Instruction::DIFFICULTY;
        else if (member == "number")
            m_context << Instruction::NUMBER;
        else if (member == "gaslimit")
            m_context << Instruction::GASLIMIT;
        else if (member == "random")
            m_context << Instruction::RANDOM;
    ……
}

// Different instructions are handled in different cases. For example, Token::Add is handled as follows:
void ExpressionCompiler::appendArithmeticOperatorCode(Token _operator, Type const& _type)
{
    ……
    switch (_operator)
    {
    case Token::Add:
        m_context << Instruction::ADD;
        break;
    case Token::Sub:
        m_context << Instruction::SUB;
        break;
    case Token::Mul:
        m_context << Instruction::MUL;
        break;
    ……
}
// If you are adding another type of instruction, find the corresponding case and add it there.
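The idea behind this step, replacing a source-level member such as block.random with the instruction the virtual machine understands, can be shown with a table-driven Go sketch. It stands in for the if/else chain above and is not how the compiler is written; blockMemberOps and compileMemberAccess are hypothetical names, and instructions are represented as plain strings.

```go
package main

import (
	"fmt"
	"strings"
)

// blockMemberOps maps members of the block object to instruction names,
// following the branches of the Magic case above.
var blockMemberOps = map[string]string{
	"coinbase":   "COINBASE",
	"timestamp":  "TIMESTAMP",
	"difficulty": "DIFFICULTY",
	"number":     "NUMBER",
	"gaslimit":   "GASLIMIT",
	"random":     "RANDOM", // the newly added member
}

// compileMemberAccess translates an expression of the form block.<member>
// into the instruction that places the value on the stack.
func compileMemberAccess(expr string) (string, error) {
	object, member, found := strings.Cut(expr, ".")
	if !found || object != "block" {
		return "", fmt.Errorf("unsupported expression %q", expr)
	}
	op, ok := blockMemberOps[member]
	if !ok {
		return "", fmt.Errorf("unknown block member %q", member)
	}
	return op, nil
}

func main() {
	for _, expr := range []string{"block.number", "block.random", "block.nonce"} {
		op, err := compileMemberAccess(expr)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(expr, "->", op)
	}
}
```

Without the "random" entry, block.random would be rejected in the same way as block.nonce, which is why the compiler needs the extra branch.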

4. Impact on functions and storage:

Determine the data type. Code location: libsolidity/ast/Types.cpp

MemberList::MemberMap MagicType::nativeMembers(ContractDefinition const*) const
{
    // Specify the stored data type
    …
    case Kind::Block:
        return MemberList::MemberMap({
            {"coinbase", make_shared<AddressType>(StateMutability::Payable)},
            {"timestamp", make_shared<IntegerType>(256)},
            {"blockhash", make_shared<FunctionType>(strings{"uint"}, strings{"bytes32"}, FunctionType::Kind::BlockHash, false, StateMutability::View)},
            {"difficulty", make_shared<IntegerType>(256)},
            {"number", make_shared<IntegerType>(256)},
            {"gaslimit", make_shared<IntegerType>(256)},
            // Note that the data type is uint256. If you need another data type,
            // refer to the type definitions in libsolidity/ast/Types.h.
            {"random", make_shared<IntegerType>(256)}
        });
    ……

Impact on functions. Code location: libevmasm/SemanticInformation.cpp

bool SemanticInformation::invalidInPureFunctions(Instruction _instruction)
{
    switch (_instruction)
    {
    ......
    case Instruction::TIMESTAMP:
    case Instruction::NUMBER:
    case Instruction::DIFFICULTY:
    case Instruction::GASLIMIT:
    // The added RANDOM instruction affects the function's pure attribute.
    // Returning true means a function using it cannot be declared pure.
    case Instruction::RANDOM:
    case Instruction::STATICCALL:
    case Instruction::SLOAD:
        return true;
    default:
        break;
    }
    return invalidInViewFunctions(_instruction);
}
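The effect of this check can be illustrated with a small Go sketch that scans the instructions of a function body and reports the first one that rules out the pure keyword. It covers only the instructions visible in the excerpt and leaves out the fall-through to invalidInViewFunctions; the names invalidInPure and firstImpure are hypothetical, and instructions are again plain strings.

```go
package main

import "fmt"

// invalidInPure lists the instructions from the excerpt that read block or
// storage state and therefore cannot appear in a pure function.
var invalidInPure = map[string]bool{
	"TIMESTAMP":  true,
	"NUMBER":     true,
	"DIFFICULTY": true,
	"GASLIMIT":   true,
	"RANDOM":     true, // the newly added instruction
	"STATICCALL": true,
	"SLOAD":      true,
}

// firstImpure returns the first instruction that prevents a function body
// from being declared pure, if there is one.
func firstImpure(body []string) (string, bool) {
	for _, op := range body {
		if invalidInPure[op] {
			return op, true
		}
	}
	return "", false
}

func main() {
	bodies := map[string][]string{
		"usesRandom": {"RANDOM", "PUSH1", "ADD"},
		"arithmetic": {"PUSH1", "PUSH1", "ADD"},
	}
	for _, name := range []string{"usesRandom", "arithmetic"} {
		if op, impure := firstImpure(bodies[name]); impure {
			fmt.Printf("%s: cannot be pure because of %s\n", name, op)
		} else {
			fmt.Printf("%s: may be declared pure\n", name)
		}
	}
}
```

A function that reads block.random therefore has to be declared view at most, just like one that reads block.number.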

Modify virtual machine code

Definition of the RANDOM instruction. Code location: hvm/evm/opcodes.go

const (
    // 0x40 range - block operations
    BLOCKHASH OpCode = 0x40 + iota
    COINBASE
    TIMESTAMP
    NUMBER
    DIFFICULTY
    GASLIMIT
    RANDOM // new
)

var opCodeToString = map[OpCode]string{
    ......
    NUMBER:     "NUMBER",
    DIFFICULTY: "DIFFICULTY",
    GASLIMIT:   "GASLIMIT",
    RANDOM:     "RANDOM", // new
    ......
}

var stringToOp = map[string]OpCode{
    ......
    "NUMBER":     NUMBER,
    "DIFFICULTY": DIFFICULTY,
    "GASLIMIT":   GASLIMIT,
    "RANDOM":     RANDOM, // new
    ......
}

Definition of the instruction's operation. Code location: hvm/evm/jump_table.go. Add the operation attributes of the instruction.

instructionSet[RANDOM] = operation{
    execute:       opRandom,
    gasCost:       constGasFunc(GasQuickStep),
    validateStack: makeStackFunc(0, 1),
    valid:         true,
}

The operation above refers to the function opRandom, which must be defined. Code location: hvm/evm/instructions.go. It is modeled on the definition of the opNumber function.

func opNumber(pc *uint64, evm *EVM, contract *Contract, memory *Memory, stack *Stack) ([]byte, error) {
    stack.push(math.U256(new(big.Int).Set(evm.BlockNumber)))
    return nil, nil
}

func opRandom(pc *uint64, evm *EVM, contract *Contract, memory *Memory, stack *Stack) ([]byte, error) {
    stack.push(math.U256(new(big.Int).Set(evm.Random)))
    return nil, nil
}

The opRandom function above uses evm.Random, so a Random attribute must be added to the evm structure. Code location: hvm/evm/evm.go

type Context struct {
    ......
    Coinbase    common.Address // Provides information for COINBASE
    GasLimit    *big.Int       // Provides information for GASLIMIT
    BlockNumber *big.Int       // Provides information for NUMBER
    Time        *big.Int       // Provides information for TIME
    Difficulty  *big.Int       // Provides information for DIFFICULTY
    Random      *big.Int       // new
}

The above adds the Random property, which needs to be initialized. The code location is: hvm/hvm.go

func NewEVMContext(msg Message, header *types.Header, chain ChainContext, author *common.Address) evm.Context {
    ......
    return evm.Context{
        CanTransfer: CanTransfer,
        Transfer:    Transfer,
        GetHash:     GetHashFn(header, chain),
        Origin:      msg.From(),
        Coinbase:    beneficiary,
        BlockNumber: new(big.Int).Set(header.Number),
        Time:        new(big.Int).Set(header.Time),
        Difficulty:  new(big.Int).Set(header.Difficulty),
        GasLimit:    new(big.Int).Set(header.GasLimit),
        Random:      new(big.Int).Set(header.Random), // new
        GasPrice:    new(big.Int).Set(msg.GasPrice()),
    }
}

The header obtained above is the header of the currently checked block. The addition and generation of header.Random is not covered here.

At this point, the changes to the compiler source code and the virtual machine source code needed to add the RANDOM instruction are complete.
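To see how the virtual machine pieces fit together (opcode constant, jump table entry, execute function, and context field), here is a self-contained Go sketch of a toy interpreter. It is deliberately much simpler than the hvm code: there is no gas accounting, memory, or stack validation, and all type and function names (opCode, blockContext, run, newContext) are hypothetical stand-ins for the real ones.

```go
package main

import (
	"fmt"
	"math/big"
)

type opCode byte

const (
	NUMBER opCode = 0x43
	RANDOM opCode = 0x46 // the new instruction
)

// blockContext carries the block data the instructions read.
type blockContext struct {
	BlockNumber *big.Int
	Random      *big.Int // new field, filled from the block header
}

type stack struct{ data []*big.Int }

func (s *stack) push(v *big.Int) { s.data = append(s.data, v) }

// operation is one jump table entry.
type operation struct {
	execute func(ctx *blockContext, st *stack)
	valid   bool
}

func opNumber(ctx *blockContext, st *stack) { st.push(new(big.Int).Set(ctx.BlockNumber)) }

// opRandom is modeled on opNumber and pushes the block's random value.
func opRandom(ctx *blockContext, st *stack) { st.push(new(big.Int).Set(ctx.Random)) }

var jumpTable = map[opCode]operation{
	NUMBER: {execute: opNumber, valid: true},
	RANDOM: {execute: opRandom, valid: true},
}

// newContext builds a context from plain integers for the example.
func newContext(number, random int64) *blockContext {
	return &blockContext{BlockNumber: big.NewInt(number), Random: big.NewInt(random)}
}

// run executes the code and returns the final stack contents.
func run(code []opCode, ctx *blockContext) ([]*big.Int, error) {
	st := &stack{}
	for _, op := range code {
		entry, ok := jumpTable[op]
		if !ok || !entry.valid {
			return nil, fmt.Errorf("invalid opcode 0x%02x", byte(op))
		}
		entry.execute(ctx, st)
	}
	return st.data, nil
}

func main() {
	result, err := run([]opCode{NUMBER, RANDOM}, newContext(7, 42))
	if err != nil {
		panic(err)
	}
	fmt.Println(result)
}
```

If the jump table entry for RANDOM is missing, run reports an invalid opcode, which is what a contract compiled with the modified compiler would hit on an unmodified virtual machine.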

Build compiler

1. Download the source code: git clone

2. cd solidity && git checkout v0.5.7 #This example is based on version v0.5.7

3. Modify the relevant code as described above.

4. Compile the source code to build the compiler

Binary compiler: mkdir build && cd build && cmake .. && make #Generate binary after execution: solc

Js compiler: Execute ./scripts/ #Generate js file after execution: soljson.js

5. Use the compiler to compile the contract code

Using the binary compiler: solc --abi test.sol #generates the abi

solc --bin test.sol #generates data

solc --opcodes test.sol #shows the opcodes

Using the js compiler: you can replace the soljson.js used by Remix for testing. This requires setting up a Remix environment and changing the path from which soljson.js is loaded, or writing your own js script for testing.

6. Modify the virtual machine code as described earlier and deploy it to the test chain. Use the generated abi and data to test on the chain. The contract deployment and calling process is not described here.


Wang Xiaoming blog:

Wang Xiaoming: Founder of HPB core chain and Babbitt columnist. He has more than ten years of experience in financial big data and blockchain technology development and participated in the creation of UnionPay big data. He is the main creator of the blockchain teaching video program "Ming Shuo", which has run for more than 30 issues, compiled the Chinese version of the Ethereum official website documentation, and was the main author of the "Blockchain Development Guide". He is known in the Chinese blockchain community under the ID "blue lotus".