A Quick Read: Is a Blockchain a Database?

Source of this article: stakefish
Original title: "A Database is not a Blockchain". Source and compilation: stakefish

Many people treat "distributed database" and "distributed ledger" as other names for blockchain. The three often "look the same" and are even "used in almost the same way."

So can a blockchain be equated with a database?

An article from the Cardano community explores this question, with data as its central theme. stakefish summarizes its main points here so that readers can reach their own answers.

It is often said that a blockchain is a slow, expensive database with poor scalability. Is this really the case?

Indeed, blockchains will never be as fast as traditional databases. However, blockchains offer advantages that databases lack, and we need to understand them.

In this article, we will discuss what a blockchain is from a data perspective, and then look at the most important differences between a blockchain and a database.

From "Chain of Blocks" to "Blockchain"


From a data perspective, a blockchain stores data in blocks, in a structure very similar to a linked list. A linked list is a linear data structure whose entry point is called the head. Each element of the list is a separate object consisting of data and a pointer to the next element, and the pointer of the last element is null.

Linked List Structure
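To make the analogy concrete, here is a minimal singly linked list. It is an illustrative sketch only; the names `Node`, `LinkedList`, `append`, and `items` are hypothetical choices, not taken from any blockchain codebase.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Iterator, Optional


@dataclass
class Node:
    data: Any
    next: Optional[Node] = None  # None plays the role of the null pointer


class LinkedList:
    def __init__(self) -> None:
        self.head: Optional[Node] = None  # entry point of the list

    def append(self, data: Any) -> None:
        """Attach a new node after the current last node."""
        node = Node(data)
        if self.head is None:
            self.head = node
            return
        current = self.head
        while current.next is not None:
            current = current.next
        current.next = node

    def items(self) -> Iterator[Any]:
        """Walk from the head, following pointers until reaching None."""
        current = self.head
        while current is not None:
            yield current.data
            current = current.next


chain = LinkedList()
for value in ["a", "b", "c"]:
    chain.append(value)
print(list(chain.items()))  # ['a', 'b', 'c']
```

One difference worth noting: in a blockchain each block refers back to its predecessor, rather than forward to a successor as in this list.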

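On top of this structure, a blockchain adds a mechanism that protects historical data from tampering: each block records a cryptographic hash of the previous block, so altering an old block breaks every link after it.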

In 1991, Stuart Haber and W. Scott Stornetta first described a cryptographically secured "chain of blocks", aiming to build a timestamping system in which the timestamps of documents could not be tampered with.
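The tamper-evidence idea can be sketched by linking each block to the hash of its predecessor. This is a minimal illustration under stated assumptions: SHA-256 over a JSON encoding and the helper names `block_hash`, `append_block`, and `is_valid` are choices made here, not the construction from the 1991 paper.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS_PREV = "0" * 64  # placeholder "null pointer" for the first block


def block_hash(block: Dict[str, Any]) -> str:
    """SHA-256 over a canonical (key-sorted) JSON encoding of the block."""
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_block(chain: List[Dict[str, Any]], data: str) -> None:
    """Append a block that stores the hash of the current last block."""
    prev = block_hash(chain[-1]) if chain else GENESIS_PREV
    chain.append({"index": len(chain), "data": data, "prev_hash": prev})


def is_valid(chain: List[Dict[str, Any]]) -> bool:
    """Recompute every link; editing an earlier block breaks a later link."""
    for i, block in enumerate(chain):
        expected = block_hash(chain[i - 1]) if i > 0 else GENESIS_PREV
        if block["prev_hash"] != expected:
            return False
    return True


chain: List[Dict[str, Any]] = []
for doc in ["doc-1", "doc-2", "doc-3"]:
    append_block(chain, doc)
print(is_valid(chain))  # True

chain[0]["data"] = "forged"  # tamper with history
print(is_valid(chain))  # False
```

This check alone cannot catch an edit to the newest block; that block is protected only once its hash is known independently, for example because a later block references it.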

In 1992, Bayer, Haber, and Stornetta incorporated Merkle trees into the design, which allowed several document certificates to be collected into one block and made the chain more efficient.

Cryptography-based "blockchain" structure
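The efficiency gain comes from hashing many certificates into a single root value that one block can store. Below is a minimal sketch of computing a Merkle root with SHA-256. Duplicating the last hash on odd-sized levels is one common convention (Bitcoin uses it) and is an assumption here, not necessarily the 1992 design.

```python
import hashlib
from typing import List


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: List[bytes]) -> bytes:
    """Hash leaves pairwise, level by level, until one root hash remains."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last hash on odd levels
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


certificates = [b"cert-A", b"cert-B", b"cert-C"]
root = merkle_root(certificates)
print(root.hex())
```

Changing any single certificate changes the root, so a block only needs to carry the root to commit to all of its certificates.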

Note that the term "blockchain" was not used at the time; the authors spoke of a "chain of blocks". The Bitcoin white paper by Satoshi Nakamoto likewise does not use the single word "blockchain"; it, too, describes a chain of blocks.

Today the concept has been redefined, and many projects and IT giants talk about "blockchain technology". Originally, the term referred only to the cryptography-based "chain of blocks" data structure. As people kept using it, its meaning broadened: it is now also applied to the distributed networks that share this data structure, which are more often called "distributed ledgers".

Difference 1: Data access: only "CR", no "UD"

Common databases use tables rather than blocks. A table is a collection of related data held in a database in tabular form, consisting of columns and rows.

In a relational database, a table is a set of data elements (values) organized as vertical columns (identified by name) and horizontal rows, with a cell at each intersection of a row and a column. A table has a fixed number of columns but can have any number of rows.

Data table

A database supports four basic operations on data: create, read, update, and delete (CRUD).

A blockchain, however, allows only two: create and read. New data can only be added by appending a complete block (containing transactions) to the end of the chain, and once added it cannot be updated or deleted.
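The create-and-read-only interface can be illustrated with a toy append-only ledger. This is a hypothetical sketch: the class `AppendOnlyLedger` only shows the shape of the interface. In a real blockchain the restriction is enforced by hash links and consensus, not by a class definition; Python's leading underscore is merely a convention.

```python
from typing import List, Tuple


class AppendOnlyLedger:
    """Toy ledger that exposes only create (append) and read."""

    def __init__(self) -> None:
        self._blocks: List[Tuple[str, ...]] = []

    def append(self, transactions: List[str]) -> int:
        """Create: add a complete block of transactions to the end."""
        self._blocks.append(tuple(transactions))  # tuple: block contents are immutable
        return len(self._blocks) - 1  # position (height) of the new block

    def read(self, height: int) -> Tuple[str, ...]:
        """Read: return a stored block by its position."""
        return self._blocks[height]

    # Deliberately no update() or delete() methods.


ledger = AppendOnlyLedger()
h = ledger.append(["alice->bob:5", "bob->carol:2"])
print(ledger.read(h))
```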

Databases let users change or even delete previously stored data at any time. A blockchain deliberately keeps historical data unchanged and always available.

Difference 2: Data authority: "a group of administrators" ≠ blockchain nodes

Besides "which operations are allowed", "who performs them" is another important difference between a blockchain and a database.

A database is maintained by one administrator or a group of administrators, who can do whatever they want with the data (all four CRUD operations). Administrators are usually employees of a company and must follow rules set by its owners, which grant users limited rights to create, read, modify, or delete data.

Even if a user enters correct data, an administrator can still modify or delete it. In a dispute over whether the data is correct, the user has no rights, or only limited rights, to change it, while the administrator always has more rights than the user.

A blockchain has no administrators with permission to modify or delete data. The nodes in the network must agree on any data before it is added. Once blocks have been added and confirmed, no one can easily change historical data, so anyone can always verify what happened in the past by consulting the blockchain.

In short, a blockchain replaces the single server maintained by an administrator with a set of independent nodes that reach consensus on what is added.

From the perspective of its direct participants, a private blockchain shared among a few entities can be viewed as a distributed, decentralized system. A private blockchain used within a single company, however, remains a centralized solution, even though it gains some of the advantages of a distributed system. For a single company, a database may be the better choice.

Difference 3: Data backup: "redundant database" ≠ blockchain

In traditional databases, "data replication" mainly serves to prevent data loss. It cannot stop historical data from being tampered with, nor administrators from rewriting data. And if one server accepts a change while another does not, the copies may become inconsistent.

Data replication

A blockchain solves these problems neatly through decentralized consensus. Once all or most nodes in the network agree to add a new block, the data is written to many hard disks. Even if the node that produced the block crashes right after this synchronization, the data remains safe on other nodes, and the crashed node can later retrieve valid copies of all blocks.

Data is written to all disks after all nodes agree

Keeping multiple backup databases does not make data as secure as a blockchain does.

"Data replication" means that one server sends data to another server for backup. The servers do not reach consensus on which version of the data to store before storing it. If one server sends invalid or erroneous data, the other simply receives and stores it (apart from whatever data validation may be in place).

In a blockchain, by contrast, a majority of nodes must agree before a block is stored on the chain.
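The contrast between blind replication and majority agreement can be shown with a toy model. It assumes a single-round, strict-majority vote; real consensus protocols such as PoW or PoS are far more involved, and the validity rule in `honest` is hypothetical.

```python
from typing import Callable, List

Validator = Callable[[str], bool]


def replicate(block: str, replicas: List[List[str]]) -> None:
    """Plain replication: every replica stores whatever it is sent."""
    for store in replicas:
        store.append(block)


def commit_with_majority(
    block: str, nodes: List[Validator], stores: List[List[str]]
) -> bool:
    """Store the block on every node only if a strict majority validates it."""
    votes = sum(1 for validate in nodes if validate(block))
    if votes * 2 > len(nodes):
        for store in stores:
            store.append(block)
        return True
    return False


def honest(block: str) -> bool:
    # Hypothetical validity rule for illustration: reject blocks marked invalid.
    return not block.startswith("INVALID")


def faulty(block: str) -> bool:
    return True  # accepts anything, like a careless or malicious node


nodes = [honest, honest, faulty]
stores: List[List[str]] = [[] for _ in nodes]
print(commit_with_majority("INVALID: double spend", nodes, stores))  # False
print(commit_with_majority("tx: alice->bob 5", nodes, stores))  # True

backups: List[List[str]] = [[], []]
replicate("INVALID: double spend", backups)
print(backups[0])  # ['INVALID: double spend']
```

With plain replication the invalid entry reaches every backup; with the majority rule, the single faulty node cannot push it through.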

Difference 4: Data transmission: distributed system ≠ decentralized solution

A traditional database uses a client-server architecture, a software model consisting of a client system and a server system that communicate over a computer network or on the same computer. A client-server application is a distributed system made up of client and server software, but it is still a centralized solution with the server at its center.

The database runs on the server. If there is only one server, it is therefore a so-called single point of failure: once it stops running, no client can communicate with it, and the clients cannot communicate with each other either.

From a data perspective, all clients must rely on the server, trusting that it is honest and adequately secured.

Nowadays, networks with only one server are rare; most include redundant servers. If one server crashes or is temporarily unavailable, another takes over all of its requests, but only if the data has already been replicated between them.

When a transaction or request is sent to the server, the data is first written to one database and only later backed up to another. There is usually some delay, during which the copies can be inconsistent.
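The delay described above can be modeled as asynchronous replication: the primary accepts writes immediately, and the replica applies them later from a queue. This is a minimal sketch; the class `PrimaryWithLaggingReplica` is hypothetical and ignores networking, failures, and conflict resolution.

```python
from collections import deque
from typing import Deque, Dict, Tuple


class PrimaryWithLaggingReplica:
    """Writes hit the primary immediately; the replica applies them later."""

    def __init__(self) -> None:
        self.primary: Dict[str, int] = {}
        self.replica: Dict[str, int] = {}
        self._pending: Deque[Tuple[str, int]] = deque()  # replication backlog

    def write(self, key: str, value: int) -> None:
        self.primary[key] = value
        self._pending.append((key, value))  # shipped to the replica later

    def replicate_one(self) -> None:
        """Simulate the delayed backup step for the oldest pending write."""
        if self._pending:
            key, value = self._pending.popleft()
            self.replica[key] = value


db = PrimaryWithLaggingReplica()
db.write("balance:alice", 100)
db.write("balance:alice", 40)
db.replicate_one()
print(db.primary["balance:alice"], db.replica["balance:alice"])  # 40 100
db.replicate_one()
print(db.replica["balance:alice"])  # 40
```

If the primary crashed between the two `replicate_one` calls, a failover to the replica would serve the stale balance of 100.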

Difference 5: Data storage: immutability and proof of value

Databases suit scenarios such as security monitoring, signaling, information collection, and authorization, and many of them provide useful functionality in the form of database triggers. With a cloud database, the data usually matters to only a few people, and the security built into the database system is sufficient. Users can trust the database owner because other mechanisms, such as the law, can resolve any problems that arise.

The advantages of a blockchain show when immutable information must be stored, such as proof that state X was valid for user Y at time Z. This makes a blockchain well suited to recording and proving ownership, which is why digital currencies can be built on it. Such information must not be alterable by any individual and must be highly secure. Adding a block is, in effect, recording many such X states for a large number of users in a trustless manner.

Differences and trade-offs

Databases are very powerful and can implement almost any desired functionality, but they cannot reproduce the unique features of a blockchain.

Let's review the characteristics of a blockchain that a traditional database cannot achieve:

Data cannot be changed. A blockchain is essentially a decentralized, distributed network: once agreement is reached, data is written to many disks at the same time, which makes changing historical data very difficult and practically impossible. The key difference is whether the data store is implemented in a decentralized manner.

Data is appended securely. As mentioned above, a new block is added only if most entities agree, so data considered invalid cannot be inserted. Participants must strictly follow the rules, and many independent entities watch over how those rules are enforced.

No administrator. A blockchain has no administrator role with the power to change any content. Nodes coordinate with one another and share responsibility, which makes a blockchain trustless and resistant to deletion.

No single point of failure. This applies mainly to PoS and PoW consensus mechanisms. Under DPoS, where a small set of elected delegates produces blocks, problems may occur if several of those nodes are unavailable at the same time.

People can choose between a traditional database and blockchain technology according to their needs. When adopting a blockchain, choosing a private or a public chain determines the degree of decentralization in data management.

It is difficult to achieve high scalability while maintaining a high degree of decentralization. Data must be distributed around the world, so network latency has to be taken into account, and reaching global consensus takes time. A blockchain will never be as efficient as a database, but it offers trustlessness, decentralization, and protection against tampering with historical data.

The decentralized approach also has the potential to replace traditional Internet giants, allowing more people to control and benefit from data.