The TCP/IP network stack is irreplaceable for Web services on datacenter front-end servers, and demand for it is growing rapidly, especially for emerging high-concurrency network service applications (including the Internet, the Internet of Things, and the mobile Internet). Industry has therefore posed the C10M problem: how to enable a single commercial server to handle millions of clients simultaneously and support tens of millions of concurrent connections. Existing network stack designs often face a dilemma between high concurrency and low tail latency for application services. Our research group breaks this dilemma with QStack, a flexible architectural design that serves as a solution to the C10M problem. This report covers QStack and three test tools (MCC, LightShaper, and HCMonitor), all of which are open source. As the architecture shows, these tools can work together to test high concurrency. They can also be used separately with other tools; the only drawback is that some functions or performance, such as concurrency, may be compromised when they are combined with other tools.
We also propose an abstract benchmark for C10M, MCCBench (published in Bench'22), which defines the methodology for 1) load generation, 2) service function, and 3) service performance evaluation. In addition, we provide a C10M test case, MCCBench-IoT, designed from a real scenario of a well-known IoT company and built on our open-source tools. MCCBench-IoT has completed a C10M test on one server and a 100-million-concurrency test on multiple servers.
Please feel free to contact us at firstname.lastname@example.org if you have any questions. We also sincerely invite anyone interested to communicate, study, and work together with us.
QStack is a user-space TCP/IP network stack that enables highly concurrent network service, up to 10 million concurrent connections on a single server, with a good user experience (i.e., low tail latency).
Full-datapath zero copy and full-stack lock-free design
Low-overhead processing in user space
Application-definable full-datapath priority
Low tail latency, with request feature labels guiding cross-layer prioritization at low overhead
High CPU efficiency and high concurrency, achieved by adaptively adjusting the CPU resources used by the stack, from as few as one core up to the whole server, to follow fluctuating datacenter workloads. When scaled to the whole server, C10M is supported.
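To make the idea of label-guided prioritization concrete, here is a minimal sketch in Python. It is not QStack's API or implementation: the `latency_sensitive` label, the `classify` rule, and the `PriorityDispatcher` class are hypothetical names chosen only for illustration. It shows an application-defined label being mapped to a priority level, with requests served by level and in arrival order within a level.

```python
import heapq
from itertools import count


def classify(request):
    """Hypothetical application-defined rule: map a request label to a priority.

    Lower numbers are served first. The 'latency_sensitive' key is an
    assumption made for this sketch, not a QStack field.
    """
    return 0 if request.get("latency_sensitive") else 1


class PriorityDispatcher:
    """Toy dispatcher: strict priority across levels, FIFO within a level."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # tie-breaker that preserves arrival order

    def submit(self, request, priority):
        heapq.heappush(self._heap, (priority, next(self._seq), request))

    def next_request(self):
        """Return the next request to process, or None if nothing is queued."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


if __name__ == "__main__":
    dispatcher = PriorityDispatcher()
    arrivals = [
        {"id": "bulk-1"},
        {"id": "urgent-1", "latency_sensitive": True},
        {"id": "bulk-2"},
    ]
    for req in arrivals:
        dispatcher.submit(req, classify(req))
    while (req := dispatcher.next_request()) is not None:
        print(req["id"])
```

In this sketch the labeled request overtakes the unlabeled ones that arrived earlier, which is the effect a full-datapath priority is meant to have on tail latency for the labeled class.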
MCC is a distributed load generator that simulates massive numbers of clients and enables high concurrency, up to 10 million concurrent connections on one server.
Kernel bypass with a lightweight user-level stack (mTCP)
Scalability in multi-core and multi-server systems
Shared-nothing architecture, distributed with a multi-threaded model
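The shared-nothing idea can be illustrated with a small Python sketch. It is only an assumption about how connections might be partitioned, not MCC's actual code: the function names and the CRC32-based mapping are hypothetical. Each connection is assigned to exactly one worker, so every worker owns a private connection table and no lock is needed between workers.

```python
import zlib


def owner_of(flow, n_workers):
    """Map a (src_ip, src_port, dst_ip, dst_port) tuple to exactly one worker.

    CRC32 is used here only because it is deterministic across runs;
    the real mapping in MCC is not described in this document.
    """
    key = "|".join(str(field) for field in flow).encode()
    return zlib.crc32(key) % n_workers


def partition(flows, n_workers):
    """Build one private connection table per worker; no table is shared."""
    tables = [[] for _ in range(n_workers)]
    for flow in flows:
        tables[owner_of(flow, n_workers)].append(flow)
    return tables


if __name__ == "__main__":
    # Hypothetical client flows toward a single server address.
    flows = [("10.0.0.1", 10000 + i, "10.0.1.1", 80) for i in range(1000)]
    tables = partition(flows, n_workers=4)
    print([len(t) for t in tables])
```

Because the owner depends only on the flow itself, adding threads or servers changes the number of partitions without introducing shared state.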
LightShaper is a low-cost, pure-software network traffic transformation tool based on DPDK, serving as an optional auxiliary to the network load generator.
Waveform shaping, speed control, and simulation of WAN traffic characteristics (e.g., out-of-order delivery, high latency, packet drops) for network load
Placeholder-packet filling to achieve microsecond accuracy in packet interval control
Decoupling of traffic feature management from the load generator for independent regulation
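The arithmetic behind interval control with placeholder packets can be sketched as follows. This is not LightShaper's algorithm; it assumes an Ethernet link that transmits back to back at line rate, and the function names are hypothetical. On such a link, time maps directly to bytes on the wire: 1 Gbit/s carries 1 bit per nanosecond, so a 1 microsecond gap on a 10 Gbit/s link corresponds to 1250 wire bytes. Each Ethernet frame also occupies 20 extra bytes on the wire (7-byte preamble, 1-byte start delimiter, 12-byte inter-frame gap).

```python
WIRE_OVERHEAD = 20  # preamble (7) + start delimiter (1) + inter-frame gap (12)
MIN_FRAME = 64      # minimum Ethernet frame size in bytes
MAX_FRAME = 1518    # maximum standard (untagged) Ethernet frame size in bytes


def wire_bytes_for_gap(gap_ns, link_gbps):
    """Bytes that fit on the wire during gap_ns at link_gbps.

    1 Gbit/s carries exactly 1 bit per nanosecond.
    """
    return gap_ns * link_gbps // 8


def plan_fillers(gap_ns, link_gbps):
    """Choose placeholder frame sizes that occupy the link for about gap_ns.

    Returns (frame_sizes, residual_wire_bytes). The residual is the part of
    the gap too small to be covered by one more minimum-size frame.
    """
    remaining = wire_bytes_for_gap(gap_ns, link_gbps)
    frames = []
    while remaining >= MIN_FRAME + WIRE_OVERHEAD:
        frame = min(MAX_FRAME, remaining - WIRE_OVERHEAD)
        frames.append(frame)
        remaining -= frame + WIRE_OVERHEAD
    return frames, remaining


if __name__ == "__main__":
    for gap in (1000, 2000, 1280):
        frames, residual = plan_fillers(gap, link_gbps=10)
        print(gap, "ns ->", frames, "residual bytes:", residual)
```

Under these assumptions the timing error is bounded by the residual, which is always smaller than one minimum frame plus overhead (84 wire bytes, about 67 ns at 10 Gbit/s).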
HCMonitor is a full-traffic monitoring system that supports high concurrency, up to 10 million concurrent TCP connections.
Accurate latency based on full traffic
Computes server-side latency, excluding client-side queuing delay, etc.
Full-traffic statistics instead of sampling, accurate to the latency of each request
High concurrency monitoring
Lock-free, pipelined processing with multiple threads based on DPDK, displaying results (including latency CDF and concurrency) in real time
Transparent to network services
Uses switch port mirroring
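A minimal Python sketch of per-request latency measurement from mirrored traffic is given below. It is an illustration under stated assumptions, not HCMonitor's implementation: the event format, the function names, and the restriction to one outstanding request per connection are all hypothetical. Both timestamps are taken where the mirrored packets are observed, so time spent queuing inside the client is not counted.

```python
from bisect import bisect_right


def server_side_latencies(events):
    """Compute one latency per request from mirrored packet events.

    events: iterable of (timestamp_us, conn_id, kind), kind in {"req", "resp"},
    in arrival order. Assumes at most one outstanding request per connection.
    """
    pending = {}    # conn_id -> timestamp of the request awaiting a response
    latencies = []
    for ts, conn, kind in events:
        if kind == "req":
            pending[conn] = ts
        elif kind == "resp" and conn in pending:
            latencies.append(ts - pending.pop(conn))
    return latencies


def cdf_at(latencies, threshold_us):
    """Fraction of requests whose latency is at most threshold_us."""
    if not latencies:
        return 0.0
    ordered = sorted(latencies)
    return bisect_right(ordered, threshold_us) / len(ordered)


if __name__ == "__main__":
    events = [
        (0, "a", "req"),
        (5, "b", "req"),
        (30, "a", "resp"),
        (105, "b", "resp"),
    ]
    lat = server_side_latencies(events)
    print(lat, cdf_at(lat, 50))
```

Every request contributes a sample here, which mirrors the full-traffic (non-sampled) statistics described above; a production monitor would keep a histogram rather than sorting all samples.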