Technical Guide | Using Pumba for chaos testing of the network

When developing a consortium blockchain, we need to simulate network jitter and test how the blockchain behaves under it. This article uses `pumba` to run chaos tests (`chaos-testing`) against the hyperchain platform.

First, defining the problem

We currently need to simulate the following scenarios: 1. node behavior under high network latency; 2. node behavior under a high packet loss rate.

Some bugs only show up under harsh network conditions. The traditional approach is to deploy the environment on the public Internet and test there, but that method is crude and the results depend heavily on chance, so the same result is very hard to reproduce. We therefore use container technology to simulate different network conditions and obtain a controllable test environment for network jitter.

Second, an introduction to the pumba tool

First, a brief introduction to `pumba`, a chaos-testing tool for container networks, and how to use it.

What is Pumba?

Readers born in the 90s will probably remember Pumbaa, a character in the movie The Lion King. In Swahili, "pumbaa" means roughly "to be foolish, ignorant, and lazy". The tool's name presumably refers to simulating a "foolish, unpredictable environment".

What can Pumba do?

Simply put, Pumba can `kill`, `stop`, `remove`, and `pause` Docker containers.

Pumba can also emulate network problems, including delay, packet loss (with different loss models), bandwidth limits, and more. For network emulation, Pumba uses the Linux kernel's `tc` `netem`. If the target container does not provide `tc`, Pumba attaches a sidekick container to the target container to control it.
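To make the `tc` `netem` mechanism more concrete, the sketch below prints the kind of `tc` command that such an emulation corresponds to. The helper name `netem_cmd` and the default interface `eth0` are assumptions made for illustration, and the exact commands Pumba issues inside the container may differ.

```shell
# Print (do not run) a tc netem command for a given impairment.
# Hypothetical helper: netem_cmd and the default device eth0 are illustrative.
# The body runs in a subshell, so its variables do not leak to the caller.
netem_cmd() (
  kind="$1"        # "delay" or "loss"
  value="$2"       # delay in ms, or loss in percent
  jitter="$3"      # optional jitter in ms (delay only)
  dev="${4:-eth0}" # network interface inside the container (assumed)
  case "$kind" in
    delay)
      if [ -n "$jitter" ]; then
        echo "tc qdisc add dev $dev root netem delay ${value}ms ${jitter}ms"
      else
        echo "tc qdisc add dev $dev root netem delay ${value}ms"
      fi
      ;;
    loss)
      echo "tc qdisc add dev $dev root netem loss ${value}%"
      ;;
    *)
      echo "unknown impairment: $kind" >&2
      exit 1
      ;;
  esac
)
```

For example, `netem_cmd delay 3000 50` prints a rule that delays outgoing packets by 3000ms with 50ms of jitter, and `netem_cmd loss 20` prints a rule that drops 20% of outgoing packets.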

How to use Pumba?

You can pass a list of containers to Pumba, or write a regular expression that selects the matching containers. If you don't specify a container, Pumba will act on all running containers.

If you use the `--random` option, Pumba randomly selects a target from the list of provided containers.

You can also control the chaos more finely by passing a repeat interval and a duration.
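Before running a destructive command, it can help to preview which containers a regular expression would select. The sketch below is one way to do that. `match_names` is a hypothetical helper, and it uses `grep -E`, whose syntax agrees with Pumba's RE2 expressions only for simple patterns such as `^test`.

```shell
# Filter container names read from stdin by a regular expression.
# Hypothetical preview helper: it only lists names and never touches containers.
match_names() {
  # "|| true": an empty selection is a valid preview result, not an error.
  grep -E -- "$1" || true
}
```

A typical use is `docker ps --format '{{.Names}}' | match_names '^test'`, which lists the running containers whose names start with `test`.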

How to install Pumba?

# Download binary from https://github.com/gaia-adm/pumba/releases
# macOS
curl -L https://github.com/alexei-led/pumba/releases/download/0.5.2/pumba_darwin_amd64 --output /usr/local/bin/pumba
# Linux
curl -L https://github.com/alexei-led/pumba/releases/download/0.5.2/pumba_linux_amd64 --output /usr/local/bin/pumba
chmod +x /usr/local/bin/pumba && pumba --help
# Install with Homebrew (MacOS only)
brew install pumba && pumba --help
# Use Docker image
docker run gaiaadm/pumba pumba --help

Pumba usage examples

1. You can check the help with `--help`:

# pumba help
pumba --help
# pumba kill help
pumba kill --help
# pumba netem delay help
pumba netem delay --help

2. Randomly kill Docker containers whose names match the regular expression `^test`

# In the first terminal, run 8 test containers (test0 to test7) that do nothing
for i in {0..7}; do docker run -d --rm --name test$i alpine tail -f /dev/null; done
# Then run a container named `skipme`
docker run -d --rm --name skipme alpine tail -f /dev/null
# In another terminal, watch the current Docker containers
watch docker ps -a
# Back in the first terminal, kill one container whose name starts with 'test' every 10s, ignoring the `skipme` container
pumba --random --interval 10s kill re2:^test
# You can press Ctrl-C at any time to stop Pumba

3. Add a `3000ms` (`+-50ms`) delay to the ping container for 20s, using the `normal` distribution

# In terminal 1, run the "ping" container
docker run -it --rm --name ping alpine ping 8.8.8.8
# In terminal 2, run the pumba netem delay command against the "ping" container, using a "tc" helper container
pumba netem --duration 20s --tc-image gaiadocker/iproute2 delay --time 3000 --jitter 50 --distribution normal ping
# pumba exits after 20s, or you can stop it with Ctrl-C

4. Simulate packet loss. To do this we need three terminals, and we use the [iperf](https://iperf.fr/) tool to monitor the current network bandwidth.

In the first terminal, we run a `server` Docker container and start iperf inside it as a UDP server, which lets us monitor the traffic it receives.

In the second terminal, we start a client container that uses iperf to send UDP packets to the server container. Then, in the third terminal, we run the pumba netem loss command to add packet loss to the client container.

# Create a docker network
docker network create -d bridge testnet
# Terminal 1
# Run the server container
docker run -it --name server --network testnet --rm alpine sh -c "apk add --no-cache iperf; sh"
# Shell inside the server container: run a UDP server listening on UDP port 5001
sh$ iperf -s -u -i 1
# Terminal 2
# Run the client container
docker run -it --name client --network testnet --rm alpine sh -c "apk add --no-cache iperf; sh"
# Shell inside the client container: send UDP datagrams to the server; there is no packet loss
sh$ iperf -c server -u
# Terminal 1
# The server side shows no packet loss
# Terminal 3
# Inject 20% packet loss into the client container for 1m
pumba netem --duration 1m --tc-image gaiadocker/iproute2 loss --percent 20 client
# Terminal 2
# Send datagrams from the client container again; 20% packet loss is now visible
sh$ iperf -c server -u
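To check that the observed loss is close to the injected 20%, the lost and total datagram counts reported by the iperf server can be turned into a percentage. The sketch below assumes you read those two counts from the server report by hand; `loss_percent` is a hypothetical helper, not part of iperf or Pumba.

```shell
# Compute a packet loss percentage from lost and total datagram counts.
# Hypothetical helper; the counts are assumed to come from the iperf server report.
loss_percent() {
  # $1 = lost datagrams, $2 = total datagrams
  awk -v lost="$1" -v total="$2" 'BEGIN {
    if (total <= 0) { print "n/a"; exit }   # avoid division by zero
    printf "%.1f\n", 100 * lost / total
  }'
}
```

For example, `loss_percent 1 4` prints `25.0`. With 20% injected loss, the value computed from a real run should land near 20, with some variation because the loss is random.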

Third, Weave Network

Weave Network is a widely used, easy-to-use container network solution that connects Docker containers across hosts. The current Docker network connections can be monitored with weave-scope.

Weave enables docker containers to communicate across hosts and automatically discover each other by creating virtual networks.

With the weave network, a microservice-based application made up of multiple containers can run anywhere: on a single host, across multiple hosts, in the cloud, or in a data center.

Applications use the network as if the container were plugged into the same network switch, without the need to configure port mapping, connections, and so on.

In a weave network, the services provided by the application container can be exposed to the outside, regardless of where they run. Similarly, existing internal systems can accept requests from application containers regardless of where the container is running.

We use the weave network to handle the network connections between the containers, and we can also observe the resource usage of each container. Weave-network and weave-scope work together to provide network monitoring.

Install weave-network and weave-scope

Weave-network is a network plugin for Docker and is usually installed together with docker-compose.

curl -sSL https://get.docker.com/ | sh
apt-get install -yq python-pip build-essential python-dev
pip install docker-compose
curl -L git.io/weave -o /usr/local/bin/weave
chmod a+x /usr/local/bin/weave

Weave-scope is a network monitoring platform that is usually launched locally:

sudo curl -L git.io/scope -o /usr/local/bin/scope
sudo chmod a+x /usr/local/bin/scope

Start the weave network, and scope monitoring

1. Start the weave scope:

scope launch

After startup, you can see the Docker status of the current machine at http://localhost:4040.

2. Start the weave network

weave launch

Fourth, docker-compose and weave network

Usually the hyperchain cluster is a group of 4 nodes, which we can configure with the following docker-compose:

```yaml
# docker-compose.yml
version: "3"
services:
  node1:
    image: hyperchain/hpc:latest
    ports:
      - "5001:5003"
    command: ['-n', '4', '-i', '1', '-p', '5000']
    dns: 172.17.0.1
    networks:
      - internal
  node2:
    image: hyperchain/hpc:latest
    ports:
      - "5002:5003"
    command: ['-n', '4', '-i', '2', '-p', '5000']
    dns: 172.17.0.1
    networks:
      - internal
  node3:
    image: hyperchain/hpc:latest
    ports:
      - "5003:5003"
    command: ['-n', '4', '-i', '3', '-p', '5000']
    dns: 172.17.0.1
    networks:
      - internal
  node4:
    image: hyperchain/hpc:latest
    ports:
      - "5004:5003"
    command: ['-n', '4', '-i', '4', '-p', '5000']
    dns: 172.17.0.1
    networks:
      - internal
networks:
  internal:
    driver: bridge # Note: this must be bridge mode, otherwise network intervention is not possible
    # driver: weavemesh # For a multi-host cluster, weavemesh is recommended
```

Briefly, each of the four nodes specifies its own ID and the number of nodes allowed to connect, and the corresponding configuration file is generated automatically inside the container. Each node's connection service listens on port `5000`. All containers must be named `node${i}`, because the nodes connect to each other by hostname inside the containers.

Note that for a single-host cluster the internal network driver must be **bridge**. The demonstration below uses a single-host cluster.
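The four service entries follow a regular pattern, which the sketch below spells out for an arbitrary node. `node_entry` is a hypothetical helper, and reading `-n` as the cluster size, `-i` as the node ID, and `-p` as the listening port is an assumption based on the description above.

```shell
# Print the name, port mapping, and arguments for node i of an n-node cluster,
# following the pattern of the compose file.
# Assumption: -n is the cluster size, -i the node ID, -p the listening port.
node_entry() (
  n="$1"   # number of nodes in the cluster
  i="$2"   # ID of this node, starting at 1
  echo "node${i}: ports=$((5000 + $i)):5003 command=-n ${n} -i ${i} -p 5000"
)
```

Running `for i in 1 2 3 4; do node_entry 4 "$i"; done` lists the four entries used in this article, which makes it easy to see what would change for a larger cluster.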

Start the container cluster:

docker-compose up

We can see that the service is working.

At this point we can look at the state inside the weave-scope:

We can see that the four containers are connected to each other, as expected. A simple P2P network cluster has started normally, so next we use `pumba` to add network jitter.

Fifth, network jitter chaos test

Our network cluster has started normally, so now we add some network chaos to the nodes.

Network delay

Run the following command in a new terminal:

pumba --random --interval 6s --log-level info netem --tc-image="gaiadocker/iproute2" --duration 5s delay --time 3000 --jitter 30 --correlation 20 re2:^docker

Every 6s, the above command randomly selects a container whose name starts with `docker` and delays its packets by `3000ms` (`+-30ms`, with 20% correlation) for 5s.

In our test, system throughput dropped sharply.
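The timing flags determine how much of the test a disturbance is active: with `--interval 6s` and `--duration 5s`, each 6s cycle contains 5s of delay. The sketch below computes that share as an integer percentage. `disruption_ratio` is a hypothetical helper, and it assumes the duration stays below the interval, as it does in every command in this article.

```shell
# Share of each cycle (in percent, rounded down) during which a disturbance is active.
# Hypothetical helper; both arguments are whole seconds.
disruption_ratio() (
  duration="$1"
  interval="$2"
  # Assumption: a disturbance must end before the next cycle starts.
  if [ "$duration" -ge "$interval" ]; then
    echo "duration must be shorter than interval" >&2
    exit 1
  fi
  echo $((100 * $duration / $interval))
)
```

For the delay command above, `disruption_ratio 5 6` prints `83`, so some node is disturbed for roughly five sixths of the run.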

Network packet loss

Run the following command in a terminal to simulate packet loss:

pumba --random --interval 6s --log-level info netem --tc-image="gaiadocker/iproute2" --duration 5s loss --percent 80 --correlation 20 re2:^docker

The above command applies an 80% packet loss rate to a random node. Note that both delay and packet loss also affect requests initiated by the client.

Before turning on packet loss interference

After turning on packet loss (80%)

Note: The tests on my PC (MacBook Pro, early 2015) are subject to a lot of extra interference. The above data is for reference only and should not be treated as conclusive.

Run the following command to keep the interference on a single node:

pumba --random --log-level info netem --tc-image="gaiadocker/iproute2" --duration 10m loss --percent 60 --correlation 20 re2:^docker

After removing the `--interval` parameter, pumba runs the command only once. We can see that it applies packet loss (60%) to `docker-compose_node4_1`, which is our node4.

We can see that the interference on `node2` affects all network requests of this container.
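The two packet loss commands differ only in whether `--interval` is present. The sketch below builds both variants from the same parameters and prints the command instead of running it. `pumba_loss_cmd` is a hypothetical helper, and the fixed values (`--correlation 20`, the tc image, the `re2:^docker` pattern) are copied from the commands in this article.

```shell
# Print (do not run) a pumba packet loss command.
# Hypothetical helper: with an interval the command repeats, without one it runs once.
pumba_loss_cmd() (
  percent="$1"    # packet loss in percent
  duration="$2"   # how long each disturbance lasts, e.g. 5s or 10m
  interval="$3"   # optional repeat interval, e.g. 6s
  repeat=""
  [ -n "$interval" ] && repeat="--interval $interval "
  echo "pumba --random ${repeat}--log-level info netem --tc-image=\"gaiadocker/iproute2\" --duration $duration loss --percent $percent --correlation 20 re2:^docker"
)
```

`pumba_loss_cmd 80 5s 6s` prints the repeating command, and `pumba_loss_cmd 60 10m` prints the single-run command that keeps the interference on one node.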

Request blocking for the client

We can also see that over longer durations a high packet loss rate completely disrupts normal operation of the service.

You can see that packets sent to node4 are unreachable, while the other nodes are normal.

Run the following commands to block the network of node2 and node3 (100% loss for 10 minutes):

pumba --log-level info netem --tc-image="gaiadocker/iproute2" --duration 10m loss --percent 100 docker-compose_node3_1
pumba --log-level info netem --tc-image="gaiadocker/iproute2" --duration 10m loss --percent 100 docker-compose_node2_1

We use this command to observe the state between the containers:

Containers cannot connect to node2/3, and node2/3 cannot access all nodes.
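The expected connectivity after blocking a set of nodes can be written down as a small rule: two nodes can still talk only if neither of them is blocked. The sketch below encodes that rule so it can be compared with what weave-scope shows. `can_talk` is a hypothetical helper, and it assumes that a node under 100% packet loss cannot take part in any exchange.

```shell
# Predict whether two nodes can still talk, given the nodes under 100% loss.
# Hypothetical helper: can_talk <node-a> <node-b> [blocked-node ...]
can_talk() (
  a="$1"
  b="$2"
  shift 2
  for blocked in "$@"; do
    # Assumption: one blocked endpoint is enough to break the exchange.
    if [ "$a" = "$blocked" ] || [ "$b" = "$blocked" ]; then
      echo no
      exit 0
    fi
  done
  echo yes
)
```

With node2 and node3 blocked, `can_talk node1 node4 node2 node3` prints `yes`, while every pair that involves node2 or node3 prints `no`.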

Run the following command to duplicate packets:

pumba --random --interval 11s --log-level="info" netem --tc-image="gaiadocker/iproute2" --duration 10s duplicate --percent 80 --correlation 20 re2:^docker

It can be observed that the 4 nodes received duplicate packets at the same time:

We can also corrupt the data packets (possibly more, possibly fewer). The following command corrupts 100% of the packets of the node1 node for 10m:

pumba --log-level="info" netem --tc-image="gaiadocker/iproute2" --duration 10m corrupt --percent 100 docker-compose_node1_1

We can get the following results:

Before the disturbance, each node receives packets as in normal operation. After the disturbance, the number of packets is unchanged, but their arrival times have changed.

Sixth, summary

In the docker + docker-compose setup, pumba supports many kinds of chaos testing and is very flexible to configure. It can inject network noise including delay, packet loss, packet corruption, and duplicated packets. Pumba can also stop, rm, pause, and otherwise act on Docker containers themselves, which simulates abnormal conditions such as service downtime. In this way pumba helps us discover potential software problems in complex network and physical scenarios.

However, I also ran into some problems while using pumba. For example, macOS itself does not have the `tc` command, so `--tc-image` must be specified. And when a tc image is used together with `--interval`, pumba creates a large number of containers. This is a bug in pumba, and the author has filed a related issue with the project.

For more chaos tests and pumba-related content, see the project: [pumba](https://github.com/alexei-led/pumba)
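Until that bug is fixed, the leftover containers can be cleaned up by hand. The sketch below is one possible approach, not something provided by pumba. `cleanup_tc_helpers` is a hypothetical helper, and it assumes the leftover containers were all started from the image passed to `--tc-image`.

```shell
# Remove leftover tc helper containers created from the given image.
# Hypothetical helper; assumes the leftovers were started from the --tc-image image.
cleanup_tc_helpers() (
  image="${1:-gaiadocker/iproute2}"
  # List the IDs of all containers (running or stopped) created from the image.
  ids=$(docker ps -a -q --filter "ancestor=$image")
  if [ -z "$ids" ]; then
    echo "no helper containers found"
    exit 0
  fi
  # Word splitting is intended here: each ID becomes one argument.
  docker rm -f $ids
)
```

Calling `cleanup_tc_helpers` with no argument targets `gaiadocker/iproute2`. Because `docker rm -f` also removes running containers, it is safer to call it only after pumba has stopped.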

Seventh, references

[https://codefresh.io/docker-tutorial/chaos_testing_docker/](https://codefresh.io/docker-tutorial/chaos_testing_docker/)

[https://github.com/alexei-led/pumba/blob/master/README.md](https://github.com/alexei-led/pumba/blob/master/README.md)

[weave network introduction](https://blog.csdn.net/happyanger6/article/details/71104577)

[https://microservices-demo.github.io/deployment/docker-compose-weave.html](https://microservices-demo.github.io/deployment/docker-compose-weave.html)

[https://github.com/microservices-demo/microservices-demo/blob/master/deploy/docker-compose-weave/docker-compose.yml](https://github.com/microservices-demo/microservices-demo/blob/master/deploy/docker-compose-weave/docker-compose.yml)

[https://www.weave.works/docs/net/latest/install/using-weave/#peer-connections](https://www.weave.works/docs/net/latest/install/using-weave/#peer-connections)

Author: Chen right, interest chain technology Technology