Technical Guide | Libra Technical Interpretation! Explain the move syntax, interpreter, and introducer

This week will mainly introduce:

Move syntax, interpreter and validator

Move the small bench to listen to the class!

Move syntax

The white paper is described using a semi-formal description language. As for this set of description languages, the main symbols are explained as follows:

  • =: definition
  • ::= : Assignment
  • ⇀: mapping
  • ×: product type, that is, the structure
  • ∈: indicates that it belongs to a type or an element in the collection

With these symbols, Move defines the following syntax types:

Global state: The address to the account map, the account consists of Resource and Module. The formal definition is as follows:

Modules: consists of a name, a structure declaration, and a procedure declaration, which can be simply understood as a C++ class. The module is externally indexed (accessed) by ModuleId, and the structure is externally indexed by structId. The structure declaration is a product type of <kind, FieldName->non-reference type>.

Module defines the scope of the resource, similar to the function of the c++ namespace. There are several important functions built into the Module for type operations:

  • Pack and Unpack resources for creating and destroying modules from unrestricted types;
  • MoveToSender/MoveFrom publishes or deletes the corresponding resource under the current account address;
  • BorrowField : Get a reference to a field of the structure

Types: contains basic types (bytes are fixed-size strings, AccountAddress is 256bits), structure types, non-reference types, and

The type declared as a resource (labeled resource kind) is removed from the Module, and the remaining types are collectively referred to as unrestricted types. Variables or fields of a resource type can only be moved, and references to resource type variables cannot be dereferenced and cannot be repeatedly assigned. In addition, an unrestricted struct cannot contain a restricted field. The reason is very simple. When an unrestricted structure is assigned or copied, if there is a restricted field, the field will not be actually manipulated.

Values:

Move supports references. References are short-lived, so they cannot be used to define fields of a structure, nor can they reference references. The declaration period of an application is the execution of a transaction script. By using Borrow{Loc, Field, Global}, you can get a reference to a local variable, a structure variable, or a global variable (click on the blackboard, learn rust).

In addition, because the reference cannot be stored in the struct, it can be guaranteed that the struct must be a tree without backedge. This is also the most important point that move is easier than rust, so move does not require a complicated lifetime. Therefore, it is also impossible for Resource to have a graph structure. This really simplifies the processing of the language.

Procedures and transaction scripts:

The signature of the procedure contains the access control modifiers for the function, the parameter type, and the return value type. The procedure declaration includes a procedure signature, a local variable, and a series of instructions. (The author believes that this statement is understood to be more appropriate for definition). A transaction script is a process that is not associated with a specific module, so it will not be reused, the global state transition of the transaction script operation, and the modification of these states will either all succeed or all fail.

ProcedureID identifies a procedure that is uniquely identified by the moduleId and the procedure signature, and Call specifies that it be called as the first argument. This also means that function calls are statically determined, and there are no function pointers or function tables. At the same time, the process dependencies in the module are acyclic, and there is no dynamic assignment of the module itself, which enhances the immutability of function calls during execution: that is, the call frame of a procedure in the execution process must be adjacent. Therefore, it also prevents a re-entrancy attack similar to that in Ethereum (this is a famous attack that causes the fork to exit ETC).

Move interpreter

Move's bytecode instructions are executed in a stack-type interpreter. The benefits of the stacker are easy to implement and control, and require less hardware environment, making it ideal for blockchain scenarios. At the same time, it is mentioned that compared with the register emulator, the stack interpreter makes it easier to control and detect copy and move between different variables.

The definition of the interpreter is as follows:

The interpreter consists of a Value Stack, a Call Stack, and a global variable reference counter and a GasUnits (like Ethereum's Gas Limits). CallStackFrame contains all the context information for a process execution as well as the instruction number (the instructions are uniquely encoded, reducing the code size, and the usual processing methods). Locals is a map of the variable name to the value of the runtime.

The bytecode interpreter supports procedure calls (nonsense). When a Call instruction is executed in a procedure to call another procedure, a new CallStackFrame object is created, and the corresponding call parameters are stored on Locals. Finally, the interpreter starts executing the new contract instructions. When the execution process encounters a branch instruction, a static jump occurs within the process (that is, the jump before the Basic Block). The so-called static jump actually means that the offset of the jump is determined in advance, not Will jump dynamically like evm. This is the no dynamic dispath mentioned earlier. Finally, return is called to end the call, and the return value is placed at the top of the stack.

The method of Gas measurement is the same as that of EVM. Each instruction has a corresponding Gas Units, which is executed once, minus the consumption of the corresponding instruction, until it is reduced to 0 or all instructions are executed.

Move's instructions include 6 categories:

  • Variable operation: CopyLoc/MoveLoc implements copying and moving of data from local variables to the stack, and StoreLoc is storing data back to the local
  • Constant/numerical/logical operations
  • Resource operations such as Pack/Unpack/MoveToSender/MoveFrom/BorrowField, the specific explanation can be seen in the previous article.
  • Reference related instructions, including ReadRef/WriteRef/ReleaseRef/FreezeRef, where FreezeRef converts a mutable reference to an immutable reference
  • Control flow structure, including call and return, branch, BranchIfTrue, BranchIfFalse, etc.
  • Blockchain-specific operations, including getting a sender of a transaction script or creating an account.

A detailed list of instructions is listed in the White Paper's Appendix A.

Bytecode validator

Validators are seen in many compilers, such as the commonly used SMT prover Z3. The core function of the validator is to ensure that the security features of the language (contract) are satisfied and enhanced during the compilation phase. Validator static verification is a necessary step in the issuance of contract scripts.

The status of the validator is as follows:

For a transaction script that may contain multiple modules, verify, the verification result returns ok, or various errors that do not meet the conditions.

The binary format of the Move module encodes a collection of entities that are placed in a table, including constants, type tags, structure definitions, and process definitions. There are three main types of testing processes:

  • Structure legal check: ensure the integrity of the bytecode table (well-formed), detect the error of the illegal table index, duplicate resource entities and illegal type signatures, such as referencing a reference, etc.
  • Semantic detection of process definitions: including parameter type errors, dangling indexes, and resource duplication definitions.
  • Error when linking, illegally calling internal procedures, or linking a process that declares a definition and does not match.

The following focuses on semantic detection and link-time error detection.

Control-flow graph construction

The validator will first create a control flow graph of the bass code of the BasicBlock. A BasicBlock can be understood as a block of instructions without a branch instruction in the middle.

Stack balance checking

Detects the access scope of the callee in the stack, ensuring that the callee of the contract cannot access the caller's stack space. For example, when a process is executed, the caller first initializes the local variable in the CallStackFrame, and then puts the local variable into the stack. If the height of the current stack is n, then the valid bytecode must satisfy the invariance: When it reaches the basic block At the end, the height of the stack is still n. The verifier is mainly to analyze the possible impact of the instructions of each basic block on the stack, and to ensure that the stack space with a height lower than n is not operated. The one exception here is that a block that ends with return must have a height of n+m when it exits, where m is the number of values ​​returned by the procedure. (This special operation is a bit ridiculous. Is it that the height of the stack is placed in the first parameter of the process by default? This can be further tested when exiting. It is confirmed later, because it does not support multiple return values ​​at present, so Just added together).

Type checking

In the binary format, the type of the local variable is well defined, but the value of the stack does need to be derived. In each basic fast, this derivation and type detection are performed independently, because the height of the stack that is guaranteed to be accessed by the calling process is legal, so that the type of variables in the stack can be safely derived at this time. The specific test is to maintain a corresponding Type Stack for the Value Stack. When executing, the TypeStack also pops and pushes with this instruction.

Kind checking

The difference between kind and type is that type may contain an alias. Kind of inspection main inspection resources are satisfied

  • Can't double flower
  • Not destroyable
  • Must have attribution (return value must be accepted)

For non-resource types, there are no such restrictions.

Reference checking

The syntax of a reference includes a variable reference, an immutable reference. So reference detection combines dynamic and static analysis. Static analysis uses a borrowing mechanism similar to the rust type system to ensure that: 1. So the reference must point to an already allocated storage to prevent dangling; 2. All references have secure read and write permissions, and reference access can be Sharing can also be excluded (that is, limited read and write permissions).

In order to guarantee 2 points, BorrowGlobal will dynamically count the references to global variables when it is called. The interpreter will judge each released resource. If it is borrowed or moved, it will report an error again.

Linking with global state

When linking, you need to check whether the linked object and the declaration match, the access control of the process, etc.

The above is the static verification of most of the current Move. It can be seen that each process has very strict analysis and restrictions to ensure the safe transfer and access of Resouce to the greatest extent.

Finally, summarize all the states and transitions of the virtual machine as follows:

The Move virtual machine implements the transfer of the global state by executing a script in the block transaction. E represents the state modification set generated by the transaction script for an account (can be understood as the read/write set of XuperChain):

The virtual machine sequentially executes each transaction of the block, producing a series of Es, and the previous E is valid when the subsequent transaction is executed.

The current vm is a serial execution transaction and then produces a series of read and write sets. However, when Design is designed, Move has considered predictive execution to generate read and write sets, and then merges to resolve conflicts based on resource access paths (which can be compared with XuperChain's read-write set version). .

Finally, we will talk about the future planning, focusing on improving the type system and providing more class library support.

This white paper sharing series ends here, you can see as a whole, Move uses the logic type, module. System has done a lot of static detection on the transfer control of resources to ensure the security of asset transfer. Compared with EVM, it avoids many problems. After Libra main online line, it may be caused by DeFi Dapp in Ethereum. Small impact.

Author: Baidu super chain XUPER

Source: Baidu Super Chain (WeChat Public Number)