Don't let analytics companies sell you out: understanding Bitcoin privacy

Bitcoin is neither completely anonymous nor completely transparent. Bitcoin privacy lies in a gray area: how exposed a user's financial activity is ultimately depends on the capabilities of the investigator and the sophistication of the tools the user chooses. There is no perfect privacy solution for any activity on the internet, and in many cases no one-size-fits-all solution, so privacy-conscious choices usually involve trade-offs in cost and convenience. Moreover, privacy has never been static: it evolves as an ongoing contest between those who build privacy tools and those who build surveillance tools.

01 Preface

As we have seen with encrypted messaging, virtual private networks, and free-software and free-knowledge projects such as the Tor Project, Wikipedia, and Signal, technology can be a tool for freedom when the right values are built into it. But as centralized platforms such as Facebook have shown, technology can also become a tool for surveillance and even social control.

Unless we deliberately support platforms and protocols built with user privacy and decentralization in mind, mass surveillance and social credit systems will be our inevitable future.

02 Key points

  • Bitcoin is only pseudonymous: the protocol does not know your real name, but your transactions can still be linked to you in various ways.
  • Blockchain analysis companies specialize in de-anonymizing Bitcoin activity and selling the resulting data to businesses and law enforcement agencies.
  • Understanding how Bitcoin works, using tools such as Tor, coin control, and CoinJoin, and avoiding address reuse are important for keeping your identity and transaction history from being exposed.

03 Why care about cryptocurrency privacy?

Viewed at the protocol level, cryptocurrencies are clearly more privacy-oriented than traditional digital payment methods. At the base layer there is usually no mapping between a user's keys and their real-world identity, and these protocols give us the freedom to store and transfer wealth globally at an unprecedented level.

The degree to which cryptocurrencies protect privacy is neither trivial nor binary: it varies greatly with the user's choice of core and supporting technologies, their usage patterns, and the capabilities and sophistication of the adversary.

Adoption of cryptocurrencies, especially bitcoin, is growing in countries where residents' economic freedom is limited, such as Venezuela. Cryptocurrencies offer significant advantages: censorship-resistant payment networks and monetary policy that governments cannot undermine. But these advantages are of little use if authoritarian regimes can de-anonymize users and arbitrarily prosecute them. We need to take this issue seriously.

04 Bitcoin Privacy Introduction

The Bitcoin protocol has evolved over time, and this has significantly changed its privacy properties. Changes to the core protocol are rarely a simple choice between privacy and transparency; they usually involve trade-offs in security, scalability, and backward compatibility of the software. Historically, the Bitcoin community has leaned toward privacy over transparency, but it has been more conservative in this respect than privacy-focused cryptocurrencies.

People considering bitcoin as a way to escape surveillance by authoritarian governments or corporations therefore need to understand what traces they leave when they use it, and whether its privacy properties are sufficient for their needs. Reaching that level of understanding takes some effort.

05 Tracing transactions

When you transact on the Bitcoin network, you leave two kinds of traces: "on-chain information" and "off-chain information." On-chain information does not directly link your identity to your transactions, but it reveals information that can link your transactions to other transactions. Off-chain information links your identity to your transactions.

06 Off-chain information

When you transact on the Bitcoin network, you sometimes send bitcoin to, or receive it from, people who know you. Those people hold off-chain information that links your identity to your transactions.

Combine this with the fact that your transactions can be linked to other transactions, and a motivated entity can sometimes work out how you spend your bitcoin, how much bitcoin you hold, and with whom you transact. Even if you never transact with anyone who knows you, there are still countless ways to link your transactions to you, because Bitcoin transactions are usually broadcast over the network unencrypted, and their source IP address can be traced in several ways. If you broadcast through a full node such as Bitcoin Core, estimating the source IP address requires some triangulation or targeted traffic analysis. "Light" wallets, such as mobile wallets (Blockchain wallet, Coinbase wallet), usually work through a company-operated server that can directly see your IP address and your full transaction history.

IP geolocation databases can usually estimate your physical location from your IP address. You can test this yourself at this link (https://www.maxmind.com/en/locate-my-ip-address) and enter the resulting coordinates into Google Maps. More importantly, your IP address reveals your Internet Service Provider (ISP), which knows the real-world identity of the subscriber behind that IP address and is usually legally required to retain this information for several months.

Even if you use public Wi-Fi to transact, you may still inadvertently link your real identity to the websites you visit and to the backend services your devices connect to. When you start your laptop, your Dropbox app will happily connect to Dropbox's servers, which associates your IP address with your Dropbox account in the company's server logs. Even if you never log in to a personal account on a website, cookies stored on your laptop can be matched with your earlier browsing history and reveal your identity to the site you are visiting. Many websites let third parties track their users for analytics; Google alone is estimated to track users across about 80% of the web.

Even if you clear your cookies, website operators can still track you across different sites as long as your browser fingerprint is unique, and can link your IP address to your identity. Even if you run no such services and avoid browsing altogether, your device's MAC address is still visible to the network provider and may, in various indirect ways, be linked to your real identity. So even if your IP address cannot be traced back to you through ISP records, your personal devices may leave other traces. The worst option for privacy is, of course, using a third-party service that requires KYC (identity verification) as your bitcoin wallet, since such services record all of your transactions together with your real-world identity.

You can also link yourself to addresses and transactions simply by looking them up with web-based tools, since few people besides you would have a reason to search for them. When you look up transactions or make one, the best-known way to hide your device and IP address is to use the Tor anonymity network. Many wallets, including Bitcoin Core, offer Tor as an option, and some have it built in. Tor Browser is a useful tool for your web-based bitcoin activity: besides hiding your IP address, it clears cookies every time you close it, blocks third-party cookies, and resists most browser fingerprinting techniques.

06 On-chain information

We can use a block explorer to see, step by step, what kinds of information the Bitcoin blockchain reveals. For this walkthrough we will use the open-source block explorer blockstream.info.

At the time of writing, the latest block (March 8, 2019, block #563899) contains 2,122 transactions. Let's see what an arbitrarily chosen transaction from it reveals.

A transaction consists of inputs and outputs and is identified by its transaction ID (visible at the top of the figure above). Every transaction your Bitcoin wallet creates is identified by a similar ID.
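As a minimal sketch of where that identifier comes from: a Bitcoin txid is the double SHA-256 hash of the serialized transaction (for SegWit transactions, the serialization without witness data), and explorers display it in reverse byte order. The helper names below are hypothetical, and the input bytes are a placeholder rather than a real transaction.

```python
import hashlib

def double_sha256(data: bytes) -> bytes:
    """SHA-256 applied twice, the hash Bitcoin uses for transaction and block IDs."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def txid_from_raw(raw_tx: bytes) -> str:
    """Return a txid in the byte order block explorers display.

    Assumes raw_tx is the serialization *without* witness data, which is what
    the txid commits to for SegWit transactions. Explorers print the hash
    bytes reversed, hence the [::-1].
    """
    return double_sha256(raw_tx)[::-1].hex()

if __name__ == "__main__":
    placeholder = bytes.fromhex("0200000000")  # hypothetical bytes, not a real transaction
    txid = txid_from_raw(placeholder)
    print(len(txid))  # 64 hex characters, like the IDs shown on blockstream.info
```

Because the ID is a hash of the transaction's contents, anyone holding the same raw data derives the same ID, which is what lets explorers, wallets, and analysts all refer to one transaction unambiguously.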

At a high level, a transaction reveals:

  • The approximate time the transaction was mined (from the block it was included in)
  • The addresses bitcoin is sent to and the amounts (the transaction outputs)
  • The source of the funds (the transaction inputs)

Let's look at each of these parts for the transaction above (https://blockstream.info/tx/e70c2ed31c05fbf2865a15a696a7ca0cb8f3afef92c34f4e41051dc2356827c8).

07 Time

Transactions carry no timestamp, but blocks do. Block timestamps are not necessarily accurate, but blocks mined by honest miners carry the correct time, and since most miners report the time truthfully, all block timestamps should be correct to within a few hours. That does not mean a block's timestamp is within a few hours of when a given transaction was broadcast, because it sometimes takes longer for a transaction to be included in a block. Some block explorers also show the time at which they first saw a transaction on the network.
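Bitcoin's consensus rules put bounds on how wrong a block timestamp can be: as implemented in Bitcoin Core, a block's time must be greater than the median of the previous 11 blocks' timestamps (the "median time past") and no more than two hours ahead of the validating node's network-adjusted time. The sketch below checks only these two conditions; `timestamp_acceptable` is a hypothetical name and all other block validation is omitted.

```python
from statistics import median

MAX_FUTURE_DRIFT = 2 * 60 * 60  # two hours, in seconds

def timestamp_acceptable(new_ts: int, prev_timestamps: list[int], node_time: int) -> bool:
    """Check a block timestamp against Bitcoin's two timestamp rules (simplified).

    prev_timestamps: timestamps of the preceding blocks, oldest first; only the last 11 are used.
    node_time: the checking node's (network-adjusted) current time.
    """
    mtp = median(prev_timestamps[-11:])  # median time past
    return mtp < new_ts <= node_time + MAX_FUTURE_DRIFT

if __name__ == "__main__":
    history = list(range(1000, 1011))  # hypothetical timestamps; their median is 1005
    print(timestamp_acceptable(1006, history, node_time=2000))  # True
    print(timestamp_acceptable(1005, history, node_time=2000))  # False: not above the median
```

Because the lower bound is a median rather than the parent block's time, a valid block can even carry an earlier timestamp than the block before it.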

The time at which the transaction was included in a block can be read from the block header (here block #563899, whose timestamp is 2019-02-20 14:45 UTC).

08 Destination addresses and amounts

The receiving addresses in this transaction are:

1: 32Z63LVtUERdEEwz275JHt3o4cewPfE8YC 0.26119849 BTC

2: 31w3iWUN5EMJMW2YRCc5m4RFqm3zN61xK2 0.2214705 BTC

An address is not as simple as it looks, and it is not always just an identifier for a user's key. An address is really a description of the spending rules that whoever wants to move the bitcoin next must satisfy. For example, if you send bitcoin to the address 37k7toV1Nv4DfmQbmZ8KuZDQCYK9x5KpzP, the spending rule is not that the owner of a particular private key can spend it; instead, the coins can be claimed by anyone who can supply two different strings with the same SHA-1 hash (which would mean that SHA-1 has been broken, as happened in 2017, so don't send any bitcoin to that address). Note that many of the address formats used today contain only a hash of the spending rules, so we usually don't know what the rules are until someone spends from the address, because the spender must reveal the hashed content to do so.
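To make "an address describes a spending rule" concrete, here is a toy Python model of the rule behind that address: the coins unlock only for two different inputs with equal SHA-1 digests. This models the rule's logic only; it is not a Bitcoin Script interpreter, and `can_spend` is a hypothetical name.

```python
import hashlib

def can_spend(preimage_a: bytes, preimage_b: bytes) -> bool:
    """Toy model of a SHA-1 collision spending rule.

    Returns True only for two *different* byte strings that share a SHA-1 digest,
    i.e. only for someone who can exhibit a SHA-1 collision. No private key is involved.
    """
    if preimage_a == preimage_b:
        return False  # identical inputs are not a collision
    return hashlib.sha1(preimage_a).digest() == hashlib.sha1(preimage_b).digest()

if __name__ == "__main__":
    print(can_spend(b"hello", b"hello"))  # False: same string
    print(can_spend(b"hello", b"world"))  # False: different digests
```

Until someone spends from such an address, the chain shows only a hash of this rule; the rule itself becomes public at spending time.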

In our example transaction, the blockchain shows that bitcoin has since been spent from both output addresses, so their spending rules are known. In transaction f491dfe9867c36e85950116a90a6128060d6070866ad0f3598d70d146750162f (https://blockstream.info/tx/f491dfe9867c36e85950116a90a6128060d6070866ad0f3598d70d146750162f), 32Z63LVtUERdEEwz275JHt3o4cewPfE8YC (https://blockstream.info/address/32Z63LVtUERdEEwz275JHt3o4cewPfE8YC) is revealed to be a 2-of-2 multisig address. We will see how this information is revealed in the next section.

Similarly, 31w3iWUN5EMJMW2YRCc5m4RFqm3zN61xK2 (https://blockstream.info/address/31w3iWUN5EMJMW2YRCc5m4RFqm3zN61xK2) is a frequently used 2-of-3 multisig address that held approximately 2,700 bitcoins (worth $10.6 million) at the time of writing. More advanced blockchain tools (such as oxt.me) can even chart an address's balance over time and show, with reasonable precision, when it is most active.

Historical balance and activity of address 31w3iWUN5EMJMW2YRCc5m4RFqm3zN61xK2 | Source: oxt.me

The address is least active between 18:00 and 22:00 UTC. A reasonable assumption is that this corresponds to 01:00-05:00 or 02:00-06:00 local time for whoever controls the address. Taking into account the address's active hours, its transaction volume, and its multisig setup, one can guess that its owner is located in the GMT+7 or GMT+8 time zone.
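The time-zone reasoning above can be sketched as: find the quietest four-hour window in an address's activity (bucketed by UTC hour) and assume it is the owner's night. The function names and the activity data are hypothetical; the assumption that the quiet window starts at about 01:00 local time is the one the text makes.

```python
from collections import Counter

def quietest_window(utc_hours: list[int], width: int = 4) -> int:
    """Return the UTC hour at which the least active `width`-hour window starts.

    Windows wrap around midnight; ties go to the earliest starting hour.
    """
    counts = Counter(h % 24 for h in utc_hours)

    def activity(start: int) -> int:
        return sum(counts[(start + k) % 24] for k in range(width))

    return min(range(24), key=activity)

def guess_utc_offset(utc_hours: list[int], local_quiet_start: int = 1, width: int = 4) -> int:
    """Guess the owner's UTC offset in hours (-12..+12).

    Assumes the quietest window is the owner's night, starting at
    `local_quiet_start` o'clock local time.
    """
    start_utc = quietest_window(utc_hours, width)
    offset = (local_quiet_start - start_utc) % 24
    return offset - 24 if offset > 12 else offset

if __name__ == "__main__":
    # Hypothetical activity log: transactions at every hour except 18:00-22:00 UTC.
    observed = [h for h in range(24) if not 18 <= h < 22] * 3
    print(quietest_window(observed))                        # 18
    print(guess_utc_offset(observed))                       # 7 (quiet 01:00-05:00 local)
    print(guess_utc_offset(observed, local_quiet_start=2))  # 8 (quiet 02:00-06:00 local)
```

Real analysis would weigh many more signals, but even this crude histogram narrows the owner down to a couple of time zones.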

For good privacy practice, never reuse Bitcoin addresses; this helps break the links between your transactions. It is especially important for script-hash addresses (P2SH addresses, which start with "3", and 62-character native SegWit addresses starting with "bc1"): once you have revealed the spending rules of such an address, your next payment should go to a fresh address whose hashed spending rules are still unknown. HD (hierarchical deterministic) wallets can generate many addresses while needing only a single backup seed to recover all the funds, and they can automatically generate a new address each time you receive a payment.
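To illustrate why a single seed is enough, here is a toy deterministic derivation in which each "receiving identifier" is computed from the seed and an index with HMAC-SHA512. This is not BIP32 (real HD wallets derive secp256k1 key pairs and encode proper addresses), and `derive_child` and the seed are hypothetical; it only shows that one secret can reproducibly generate an unbounded sequence of values that look unrelated to anyone without the seed.

```python
import hashlib
import hmac

def derive_child(seed: bytes, index: int) -> str:
    """Toy derivation: HMAC-SHA512(seed, index), truncated to 20 bytes and hex-encoded.

    Stands in for "the index-th receiving address"; it is NOT BIP32 and NOT a valid address.
    """
    mac = hmac.new(seed, index.to_bytes(4, "big"), hashlib.sha512).digest()
    return mac[:20].hex()

if __name__ == "__main__":
    seed = b"example seed, do not use"  # hypothetical backup seed
    for i in range(3):
        print(i, derive_child(seed, i))  # a fresh identifier for each incoming payment
```

Restoring from the seed simply replays the same indexes, so the wallet can find every payment without the user ever backing up individual addresses.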

Now let's see what else we can infer from the outputs of this transaction.

Bitcoin transactions usually have two outputs: one is the actual payment, and the other, called the "change output", returns the remainder to the sender. This is similar to paying for a $3 item with $5: $3 goes to the seller, and $2 comes back to you as change.

Identifying the change output requires heuristics. Examples include round numbers (a round amount, in bitcoin or in fiat currency at the time of the transaction, is more likely to be the payment), the order in which outputs appear in the transaction, and so on. In our example transaction the change output is easy to spot, because it goes back to the same address the bitcoin came from, as we will see below.
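A minimal sketch of two of these heuristics, assuming outputs are given as (address, amount) pairs; `guess_change` and `is_round` are hypothetical helpers, and treating "at most four decimal places" as round is an arbitrary choice. Using the example transaction's two outputs and assuming, as the text indicates, that the coins came from 31w3iWUN5EMJMW2YRCc5m4RFqm3zN61xK2, the address check flags the second output as change.

```python
from decimal import Decimal
from typing import Optional

def is_round(amount: Decimal, places: int = 4) -> bool:
    """True if the amount has at most `places` decimal digits (e.g. 0.5 or 0.0125 BTC)."""
    return amount == amount.quantize(Decimal(1).scaleb(-places))

def guess_change(input_addresses: set[str], outputs: list[tuple[str, Decimal]]) -> Optional[int]:
    """Return the index of the output that is probably change, or None if unsure."""
    # Heuristic 1: an output paying back to one of the input addresses is almost certainly change.
    for i, (address, _) in enumerate(outputs):
        if address in input_addresses:
            return i
    # Heuristic 2: with two outputs and exactly one round amount,
    # the round one is probably the payment and the other the change.
    round_outputs = [i for i, (_, amount) in enumerate(outputs) if is_round(amount)]
    if len(outputs) == 2 and len(round_outputs) == 1:
        return 1 - round_outputs[0]
    return None

if __name__ == "__main__":
    outputs = [("32Z63LVtUERdEEwz275JHt3o4cewPfE8YC", Decimal("0.26119849")),
               ("31w3iWUN5EMJMW2YRCc5m4RFqm3zN61xK2", Decimal("0.2214705"))]
    print(guess_change({"31w3iWUN5EMJMW2YRCc5m4RFqm3zN61xK2"}, outputs))  # 1
```

Output ordering, the other heuristic mentioned above, is not modeled here; wallets that randomize output order make it useless.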

In general, different Bitcoin wallets behave differently and leave different traces on the blockchain, much as a web browser reveals information about itself while you browse. As a result, it is sometimes possible to tell that certain transactions were created by a particular Bitcoin wallet application.

If an investigator knows which wallet app you use, this helps them link your identity to your transactions and weakens your privacy. Every piece of information helps them understand who you are and what you are doing.

09 Source of funds

In a Bitcoin transaction, the "source of funds" always comes from earlier transactions, or more precisely, from unspent transaction outputs (UTXOs). What a block explorer shows you is a mix of decoded raw blockchain data and derived data. One block explorer might display a transaction like this:

From Bitcoin.com

Here the source of funds is shown as an address. The Blockstream explorer, by contrast, shows the source of funds as a transaction.

Blockstream does not show the source of funds as an address because the address is not strictly part of the transaction input, and the sending address cannot always be inferred. In addition, since address reuse is discouraged, not displaying a sender address helps break the mental model carried over from traditional payment systems, which can mislead users into believing that funds can be sent back to the address they came from.

Technically, if you run a full node (or have access to one you trust), you can inspect the decoded raw transaction data in your local copy of the Bitcoin blockchain, as shown below:

In the decoded data for e70c2ed31c05fbf2865a15a696a7ca0cb8f3afef92c34f4e41051dc2356827c8, the source of funds is described in the vin array. It does not mention an address; instead, it points to an output of a previous transaction: transaction 593e2d5c65b3505d897a13033741037d6c59e683b3345314a58253a8f1572758, vout: 0, which is that transaction's first output (vout: 1 would be its second output, and so on). This unspent transaction output (UTXO) is the source of funds. So the "source of funds" is neither an address nor a transaction, but a specific output of a specific transaction. Understanding this clearly will help you protect your privacy when using Bitcoin, as the following sections explain.
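The "specific output of a specific transaction" idea maps naturally onto a (txid, vout) pair, usually called an outpoint. Below is a minimal sketch of a UTXO set keyed by outpoints, using the example's funding outpoint; the value 0.48298999 BTC is the amount this article gives for that output, and `OutPoint` and `spend` are hypothetical names. Subtracting the outputs from the input yields the miner fee.

```python
from dataclasses import dataclass
from decimal import Decimal

@dataclass(frozen=True)
class OutPoint:
    """Points at one specific output: the transaction that created it and its index."""
    txid: str
    vout: int  # 0 = first output, 1 = second output, ...

def spend(utxo_set: dict[OutPoint, Decimal], inputs: list[OutPoint], outputs: list[Decimal]) -> Decimal:
    """Remove the spent outpoints from utxo_set and return the fee (inputs minus outputs)."""
    total_in = sum((utxo_set[op] for op in inputs), Decimal(0))  # KeyError if already spent
    total_out = sum(outputs, Decimal(0))
    if total_out > total_in:
        raise ValueError("outputs exceed inputs")
    for op in inputs:
        del utxo_set[op]  # a spent output cannot be spent again
    return total_in - total_out

if __name__ == "__main__":
    source = OutPoint("593e2d5c65b3505d897a13033741037d6c59e683b3345314a58253a8f1572758", 0)
    utxos = {source: Decimal("0.48298999")}
    fee = spend(utxos, [source], [Decimal("0.26119849"), Decimal("0.2214705")])
    print(fee)              # 0.00032100
    print(source in utxos)  # False
```

The fee, 32,100 satoshis here, is never listed as an output; it is simply whatever the inputs cover beyond the outputs.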

Source of funds for e70c2ed31c05fbf2865a15a696a7ca0cb8f3afef92c34f4e41051dc2356827c8.

The decoded raw data tells us more about the source of funds, for example through txinwitness. The last hexadecimal string in txinwitness is a 2-of-3 multisig script, from which we can infer that the funds may come from an exchange wallet.

The other two hexadecimal strings in .txinwitness are just signatures that satisfy the 2-of-3 multiple condition.
Now that we have defined the source of funds, we can see that in this example it is a single output worth 0.48298999 BTC, even though the payment itself was worth only about $1,000. This has an unfortunate consequence: imagine a friend pays you $10, but the payment shows that they hold a million dollars and can send the whole amount at any time. That is not good for privacy. If you are concerned about revealing your wealth when sending bitcoin, pay attention to which inputs your transactions spend.

10 Linking transactions

Because every transaction must reference its source of funds, transactions are chained together into what is called the transaction graph. If you send bitcoin to a friend, your friend sees your inputs in the transaction, and you in turn can see when your friend spends those coins and to which addresses.
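Following coins through the transaction graph is an ordinary graph traversal. In this sketch the relation "which later transactions spend outputs of this one" is given as a dictionary; `trace_forward` and the transaction names are hypothetical.

```python
from collections import deque

def trace_forward(spent_by: dict[str, list[str]], start_txid: str) -> list[str]:
    """Breadth-first walk of the transaction graph, following coins forward.

    spent_by maps a txid to the txids of later transactions that spend any of its outputs.
    Returns every transaction reachable from start_txid, in BFS order.
    """
    seen = {start_txid}
    order = []
    queue = deque([start_txid])
    while queue:
        txid = queue.popleft()
        order.append(txid)
        for child in spent_by.get(txid, []):
            if child not in seen:  # paths can merge when one tx spends several parents
                seen.add(child)
                queue.append(child)
    return order

if __name__ == "__main__":
    graph = {  # hypothetical labels standing in for txids
        "you_pay_friend": ["friend_pays_shop"],
        "friend_pays_shop": ["shop_sweeps_to_exchange"],
    }
    print(trace_forward(graph, "you_pay_friend"))
```

Walking the same relation backwards (from outputs to the outputs they spent) traces where coins came from instead.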

Some bitcoin addresses are well known, such as Bitfinex's cold wallet or the coins seized from Silk Road. An address can become known because it belongs to an entity, such as a company or a charity, that publishes payment or donation addresses on its website, or because it is inadvertently disclosed in forum posts or law-enforcement records. Blockchain analysis companies regularly crawl the web to collect such information.

Other addresses can be revealed by association, through cluster analysis.

11 Cluster analysis

Let's return to the transaction examined earlier, e70c2ed31c05fbf2865a15a696a7ca0cb8f3afef92c34f4e41051dc2356827c8. In its transaction graph we can see where its funds came from, and that our transaction (red dot) was later used to fund a third transaction (blue dot).

Transaction graph for e70c2ed31c05fbf2865a15a696a7ca0cb8f3afef92c34f4e41051dc2356827c8
In particular, the blue-dot transaction spent the first output of our transaction together with the second output of another transaction. These outputs had previously been sent to the following two addresses:
3Qt1YaJwQwtHMb4mjJ41DZVawWXih9LGMq
32Z63LVtUERdEEwz275JHt3o4cewPfE8YC
In a block explorer these look like two separate addresses, each with just one seemingly unrelated incoming and outgoing transaction. But because both were signed for in the same blue-dot transaction, they now belong to a cluster (together with the 407 other addresses whose coins that transaction spent), and we can assume they belong to the same user. This heuristic has gone by many names over the years; the most recent is the common-input-ownership heuristic.
Transaction graph for the blue-dot transaction f491dfe9867c36e85950116a90a6128060d6070866ad0f3598d70d146750162f
Blockchain analysis companies use this heuristic to build very large clusters. The block explorer WalletExplorer places these two addresses in a cluster of 162,787 addresses. Analysis companies tag clusters with labels (IP addresses, user accounts, organizations, and real names) and can map each address to a cluster, charting the Bitcoin economy. They then sell access to these datasets to law enforcement agencies and other companies. Many blockchain analysis companies also receive transaction information directly from their customers, such as cryptocurrency exchanges. However, two leading analysis companies, Chainalysis and Elliptic, have stated that they do not attribute the transaction information they acquire to individuals, only to exchanges and other commercial entities.

De-anonymizing a single address in a cluster is enough to de-anonymize the entire cluster.
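The common-input-ownership heuristic and the "one label de-anonymizes the whole cluster" effect can be sketched with a union-find structure. The class and function names, the third address, and the label are hypothetical; the two real addresses are the ones the blue-dot transaction spent.

```python
from typing import Optional

class AddressClusters:
    """Union-find over addresses, merged by the common-input-ownership heuristic."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, address: str) -> str:
        """Return the representative address of the cluster containing `address`."""
        self.parent.setdefault(address, address)
        root = address
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[address] != root:  # path compression
            next_address = self.parent[address]
            self.parent[address] = root
            address = next_address
        return root

    def add_transaction(self, input_addresses: list[str]) -> None:
        """Assume every input address of one transaction has the same owner."""
        roots = {self.find(a) for a in input_addresses}
        if not roots:
            return
        keep = roots.pop()
        for other in roots:
            self.parent[other] = keep

def identify(clusters: AddressClusters, known_labels: dict[str, str], address: str) -> Optional[str]:
    """Return a known label from anywhere in the address's cluster, if there is one."""
    root = clusters.find(address)
    for labeled_address, label in known_labels.items():
        if clusters.find(labeled_address) == root:
            return label
    return None

if __name__ == "__main__":
    clusters = AddressClusters()
    clusters.add_transaction(["3Qt1YaJwQwtHMb4mjJ41DZVawWXih9LGMq",
                              "32Z63LVtUERdEEwz275JHt3o4cewPfE8YC",
                              "hypothetical_address_C"])
    labels = {"hypothetical_address_C": "Example Exchange (hypothetical label)"}
    print(identify(clusters, labels, "3Qt1YaJwQwtHMb4mjJ41DZVawWXih9LGMq"))
```

A single labeled address anywhere in a cluster is enough for `identify` to attach that label to every other member.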

12 Countering cluster analysis

We have now seen that user identities can be linked to Bitcoin addresses and transactions in many ways, and that Bitcoin transactions can be linked to one another in many ways. Combined, these leaks can expose our entire financial history.

 
Some Bitcoin users deliberately try to defeat the methods these analytics companies use, applying tools and techniques that make analysis harder. Some techniques reduce the accuracy of analysis by obscuring patterns of behavior, while others try to avoid analysis altogether. Bitcoin wallets can help users by automating these techniques or offering them in their user interface.
Some of these techniques are:
  • Randomizing the order of outputs when creating a transaction, which makes change detection less accurate.
  • Using HD wallets to avoid address reuse.
  • PayNym, a publicly shareable ID that lets you receive payments at different, unlinkable addresses known only to the sender and yourself. PayNym gives each payment a new address without you having to hand out a new address each time. This is very useful if you want to accept Bitcoin donations online.
  • Coin selection / coin control. A wallet can be designed to choose transaction inputs more carefully so that fewer addresses get linked by cluster analysis, or it can let the user pick inputs manually to avoid revealing ownership of certain coins.
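One possible coin-control policy, sketched under stated assumptions: the user labels each coin by where it came from, and the wallet spends a single coin from one label, choosing the smallest coin that covers the payment plus fee. The `Coin` type, the labels, and the policy itself are hypothetical illustrations, not how any particular wallet behaves.

```python
from decimal import Decimal
from typing import NamedTuple, Optional

class Coin(NamedTuple):
    outpoint: str    # "txid:vout" of the UTXO
    amount: Decimal  # value in BTC
    label: str       # user-assigned note about where the coin came from

def select_single_coin(coins: list[Coin], target: Decimal, label: str) -> Optional[Coin]:
    """Pick the smallest coin carrying `label` that covers `target` (payment plus fee) alone.

    - a single input links no other addresses via the common-input-ownership heuristic;
    - staying within one label avoids merging coins from unrelated sources;
    - the smallest sufficient coin reveals as little of your balance as possible.
    Returns None when no single coin with that label is large enough.
    """
    candidates = [coin for coin in coins if coin.label == label and coin.amount >= target]
    return min(candidates, key=lambda coin: coin.amount, default=None)

if __name__ == "__main__":
    wallet = [  # hypothetical coins
        Coin("aa:0", Decimal("0.5"), "salary"),
        Coin("bb:1", Decimal("0.05"), "salary"),
        Coin("cc:0", Decimal("0.02"), "donations"),
    ]
    print(select_single_coin(wallet, Decimal("0.03"), "salary"))     # the 0.05 BTC coin
    print(select_single_coin(wallet, Decimal("0.03"), "donations"))  # None
```

When no single coin suffices, a real wallet has to decide whether combining coins (and thereby linking them) is acceptable; that trade-off is left to the user here.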

Coin control in Bitcoin Core

A more advanced privacy technique is CoinJoin: inputs from several different users are combined into one joint transaction before it is broadcast.

In our example, we saw that each transaction input refers to a specific output of a previous transaction, not to the whole transaction:

Source of funds for e70c2ed31c05fbf2865a15a696a7ca0cb8f3afef92c34f4e41051dc2356827c8.
Within a transaction, inputs and outputs are not tied to one another in any way; as long as the inputs hold enough bitcoin to pay for the outputs, the transaction is valid.
A CoinJoin transaction created with Wasabi Wallet splits the funds into many equal-sized outputs, making it impossible to tell which input paid for which output and fee. As a result, a single transaction has many sources of funds and many destinations that are hard to tell apart. Technically, the sources and destinations of the transaction are not hidden, but because they are mixed together, it is difficult to prove which input funded which output, that is, which user's bitcoin went to which address.
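The output side of an equal-output CoinJoin can be sketched as follows: every participant receives one output of the same denomination plus a change output for the rest, and the outputs are shuffled. The denomination, the equal per-participant fee, and the function names are assumptions for illustration; real implementations also need a coordination protocol so that the coordinator itself cannot link inputs to outputs, which is omitted here.

```python
import random
from decimal import Decimal

def build_coinjoin(inputs: list[Decimal], denomination: Decimal, fee_each: Decimal) -> list[Decimal]:
    """Return the output amounts of a toy equal-output CoinJoin.

    Each input contributes one `denomination` output and, if anything is left
    after `fee_each`, a change output. Outputs are shuffled so their order
    reveals nothing about which participant is which.
    """
    outputs = []
    for amount in inputs:
        change = amount - denomination - fee_each
        if change < 0:
            raise ValueError("input too small for this denomination")
        outputs.append(denomination)
        if change > 0:
            outputs.append(change)
    random.SystemRandom().shuffle(outputs)
    return outputs

def anonymity_set(outputs: list[Decimal], denomination: Decimal) -> int:
    """Number of indistinguishable equal outputs a mixed coin hides among."""
    return outputs.count(denomination)

if __name__ == "__main__":
    outs = build_coinjoin([Decimal("0.13"), Decimal("0.2"), Decimal("0.11")], Decimal("0.1"), Decimal("0.001"))
    print(anonymity_set(outs, Decimal("0.1")))  # 3
```

The change outputs remain linkable to their owners' inputs; only the equal-denomination outputs gain privacy.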

Another interesting aspect of these transactions is that they undermine the common-input-ownership heuristic: all of their inputs would be marked as belonging to the same owner, which is not the case here. These images show false clusters of independent payments created by mixing.

Wasabi Wallet CoinJoin transactions, from left to right: 72046c65fa25724f11c91f35799f69b66072bc07b2b4e3fc363852c2506b2b90, d7a428a8e3d69f236519cb999dbcb47b3b283548875371da567259be806e35ea, 20cf4fa2f685167f46682dd30c7720a06618656939fadbd1f20e3d471d08dfbb (source: oxt.me).
Because these transactions have a distinctive pattern of many equal-sized outputs, they are easy to recognize and can easily be excluded by cluster analysis tools. Equal-output CoinJoins are therefore best understood as an obfuscation tool: they help users hide the source and destination of their payments from outsiders, but the mixing itself remains visible.
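A minimal sketch of how an analysis tool might flag such transactions so that it skips them before applying the common-input-ownership heuristic. The function name and the threshold of three identical outputs are arbitrary assumptions; real tools use more elaborate rules.

```python
from collections import Counter
from decimal import Decimal

def looks_like_equal_output_coinjoin(n_inputs: int, output_amounts: list[Decimal], min_equal: int = 3) -> bool:
    """Flag transactions with several inputs and at least `min_equal` outputs of one identical amount."""
    if n_inputs < 2 or not output_amounts:
        return False
    _, most_common_count = Counter(output_amounts).most_common(1)[0]
    return most_common_count >= min_equal

if __name__ == "__main__":
    mix = [Decimal("0.1")] * 5 + [Decimal("0.03")]  # hypothetical CoinJoin outputs
    print(looks_like_equal_output_coinjoin(5, mix))  # True
    print(looks_like_equal_output_coinjoin(1, [Decimal("0.26119849"), Decimal("0.2214705")]))  # False
```

This is why equal-output CoinJoins do not poison clustering data: they are easy to recognize and set aside.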
A more recent proposal called PayJoin, or Pay-to-EndPoint (P2EP), uses the same principle to create transactions that are indistinguishable from ordinary payments. In this type of transaction, both the payer and the payee contribute inputs, and the payment is made by moving the payment amount from the sender's inputs into the recipient's output, thereby paying the recipient the corresponding amount.
A PayJoin transaction template: the sender pays the recipient 0.5 BTC while inputs from both parties are combined in the process
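The template can be sketched as a two-input, two-output transaction. The input amounts, the fee, and the choice that the sender pays the whole fee are hypothetical assumptions; the point is that neither output equals the payment amount, while a clustering tool would wrongly treat both inputs as belonging to one owner.

```python
from decimal import Decimal

def build_payjoin(sender_input: Decimal, receiver_input: Decimal, payment: Decimal, fee: Decimal) -> dict[str, Decimal]:
    """Toy PayJoin: both parties contribute one input; the sender pays the fee (an assumption).

    Returns the two outputs. Note that neither output equals the payment amount.
    """
    sender_change = sender_input - payment - fee
    if sender_change < 0:
        raise ValueError("sender input too small")
    return {
        "sender_change": sender_change,
        "receiver_output": receiver_input + payment,  # receiver's own coin plus the payment
    }

if __name__ == "__main__":
    outs = build_payjoin(Decimal("1.2"), Decimal("0.7"), Decimal("0.5"), Decimal("0.001"))
    print(outs["sender_change"], outs["receiver_output"])  # 0.699 1.2
```

An observer sees inputs of 1.2 and 0.7 BTC and outputs of 0.699 and 1.2 BTC, which looks like an ordinary payment whose amount and direction are both misleading.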
 
On its own, this transaction does not obscure much, but it falsely triggers the common-input-ownership heuristic. More importantly, when it does, it leaves the analysis company no clue that the inputs should not be clustered together (a clue analysts need in order to avoid false positives). If such transactions become widespread, false common-input-ownership links will be so common that the heuristic itself becomes unreliable, which would be a major blow to blockchain analysis companies.

13 Lightning Network

The Lightning Network is a beta-stage technology built on top of the Bitcoin protocol to enable cheap, instant payments, and it is available today to users of Lightning-enabled wallets. Lightning payments differ from base-layer transactions in many ways, and several of these differences are advantages for privacy.

Lightning payments are not recorded on a public ledger:
  • Lightning payments are onion-routed, and the final recipient is not broadcast to the rest of the network.
  • Lightning payments do not create or spend on-chain outputs for each payment, so they cannot be merged by cluster analysis.
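Onion routing means the sender wraps the route in layers, one per hop, so that each node can remove only its own layer and learns only where to forward next. The toy below shows that layering with a made-up XOR keystream and pre-shared keys; it is not Lightning's actual Sphinx-based packet format (BOLT #4), which among other things keeps the packet a fixed size so a hop cannot tell its position on the route, whereas this toy's onion shrinks at every hop. All names and keys are hypothetical, and the cipher is not secure.

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256(key || counter) blocks. Not a secure cipher."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(out[:length])

def xor_layer(key: bytes, data: bytes) -> bytes:
    """Add or remove one encryption layer (XOR is its own inverse)."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

def build_onion(route: list[str], keys: dict[str, bytes]) -> bytes:
    """Wrap one layer per hop, innermost (recipient) first; each layer names only the next hop."""
    onion = b"FINAL"  # what the recipient finds at the core
    for i in reversed(range(len(route))):
        next_hop = route[i + 1] if i + 1 < len(route) else "-"
        onion = xor_layer(keys[route[i]], next_hop.encode() + b"|" + onion)
    return onion

def peel(onion: bytes, key: bytes) -> tuple[str, bytes]:
    """What one hop learns: the next hop's name and an inner blob it cannot read."""
    next_hop, _, inner = xor_layer(key, onion).partition(b"|")
    return next_hop.decode(), inner

if __name__ == "__main__":
    keys = {"Bob": b"key-bob", "Carol": b"key-carol", "Dave": b"key-dave"}  # hypothetical
    packet = build_onion(["Bob", "Carol", "Dave"], keys)
    for hop in ["Bob", "Carol", "Dave"]:
        next_hop, packet = peel(packet, keys[hop])
        print(hop, "forwards to", next_hop)
```

Bob learns only that the payment goes on to Carol; he cannot tell from his layer whether Carol is the final recipient.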

The Lightning Network is a system of channels that requires liquidity. Merchants and users who accept Lightning payments are currently a small fraction of Bitcoin users, and not every payment (especially a large one) can be routed over Lightning, though this should improve over time. It also means that although Lightning offers better privacy for payments within its channels, those channels still have to be funded with regular Bitcoin transactions, which are subject to the privacy issues described in this article.

Another issue is that, unlike a base-layer bitcoin payment, receiving a Lightning payment requires running a Lightning node. Your node communicates with other nodes over TCP/IP. Whenever your node interacts with others on the network (sending, receiving, or routing payments for others), they learn of your node's existence, its public key, and its IP address. From the public key, it is easy to see which channels you have open with other nodes and how much bitcoin you committed when opening each one. For private channels, your IP address is revealed only to the peers you open channels with, but for public channels it is announced to the entire network, and someone may even probe your channels' current balances to decide whether you are worth targeting.
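Balance probing works because a failed payment attempt tells the prober whether a channel could forward a given amount. The sketch below treats that as a black-box oracle `can_forward` (hypothetical; in practice a probe is a payment that cannot settle, for example because its payment hash has no known preimage) and binary-searches the balance in about log2(capacity) attempts.

```python
def probe_balance(can_forward, capacity_sat: int) -> int:
    """Estimate how many satoshis a channel can forward, by binary search.

    can_forward(amount) -> bool is a hypothetical oracle standing for "a probe
    of this amount got through this channel". Returns the largest accepted amount (0 if none).
    """
    low, high = 0, capacity_sat  # the answer always lies in [low, high]
    while low < high:
        mid = (low + high + 1) // 2
        if can_forward(mid):
            low = mid
        else:
            high = mid - 1
    return low

if __name__ == "__main__":
    secret_balance = 123_456  # hypothetical balance the prober does not know
    print(probe_balance(lambda amount: amount <= secret_balance, 1_000_000))  # 123456
```

With a channel capacity of one million satoshis, about 20 probes are enough to pin down the balance exactly.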

When you run a Lightning node, you should assume that your channel balances are known and can be linked to your IP address. Running your Lightning node over Tor is therefore a good choice for privacy.

Lightning Network is currently in a period of rapid development, and many properties will change dramatically in the near future.

14 Protocol innovations

Here are some privacy-improving techniques being developed at the base layer of the Bitcoin protocol:

  • Schnorr Signature: A signature scheme that, among other improvements, makes the multi-signature address and single-signature address indistinguishable.
  • Scriptless scripts: a way to enforce spending conditions without revealing the actual payment rules on-chain.
  • Taproot: a technique that makes transactions with different spending rules look alike on-chain.
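One idea behind Taproot's script tree is to commit to several spending rules through a Merkle root and, when spending, reveal only the rule actually used plus a short proof. Below is a toy illustration with plain SHA-256 and made-up rule strings; it is not the BIP341 construction, which differs in several details (such as tagged hashes and tweaking a public key with the root).

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root_and_proof(leaves: list[bytes], index: int) -> tuple[bytes, list[tuple[bytes, bool]]]:
    """Return the Merkle root over hashed leaves and the sibling path for leaves[index].

    Each proof step is (sibling_hash, sibling_is_right). Odd levels duplicate
    their last node (a toy choice).
    """
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = index ^ 1
        proof.append((level[sibling], sibling > index))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return level[0], proof

def verify(leaf: bytes, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    node = h(leaf)
    for sibling, sibling_is_right in proof:
        node = h(node + sibling) if sibling_is_right else h(sibling + node)
    return node == root

if __name__ == "__main__":
    rules = [b"alice_and_bob_sign", b"alice_after_one_year", b"bob_with_preimage"]  # hypothetical
    root, proof = merkle_root_and_proof(rules, 1)
    print(verify(rules[1], proof, root))  # True: only this rule and two hashes are revealed
```

The unused rules stay hidden behind their hashes, so observers cannot see which alternative conditions the coins were locked with.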

15 Summary

This article has introduced how privacy works in Bitcoin. Because Bitcoin is neither fully anonymous nor fully transparent, privacy ultimately depends on the tools users choose and on the capabilities of those doing the surveillance. With companies actively analyzing the blockchain, users who take no privacy precautions are likely to leak financial information, which can be very dangerous.

16 Further reading

To understand how Bitcoin works in depth, Andreas Antonopoulos's "Mastering Bitcoin" is a good resource and has been translated into many languages.

The privacy page on the Bitcoin Wiki, recently updated by Chris Belcher, covers several of these topics in more depth. The Blockstream block explorer has also recently been updated to show a "privacy rating" for each transaction, which you can use to learn what conclusions can be drawn from a transaction's data.

Special thanks to Adam Gibson, Tomislav Dugandzic and Simon Bohlin for their thoughts and feedback.
Author: Eric Wall
Translator: Wang Zelong, Fang Chen
Produced: Carbon Chain Value (ID: cc-value)