Rust, Databend and the Cloud Warehouse (4) How Databend Community Does Testing [ Bohu's Blog ]

Databend is an open-source, cloud-native data warehouse built in Rust. It aims to provide lightning-fast elastic scaling and deliver an on-demand, pay-as-you-go Data Cloud experience.
Open source repo: https://github.com/datafuselabs/databend

Databend has been open source from day one. The testing system is also built on the open-source ecosystem, heavily using GitHub CI (for free), which has supported our rapid iteration for the past six months.

For an open-source database project, testability is the key to accelerating iteration. When a Pull Request (commonly called a Patch) goes from submission to merging into the main branch, reviewers typically focus on these questions:

Does it break functionality?
Does it affect distributed execution?
Are there cross-platform compilation issues?
Does it degrade performance?

This post follows a Pull Request through its testing cycle, from creation to merging into the main branch, to show which tests Databend runs and how they answer the four questions above, so that every Pull Request comes with quality assurance.

Unit Tests

Unit tests are the smallest testing units.

Every function we write should be independently testable. If the function has state dependencies, those states should be mockable.

In Databend, unit tests are placed in separate files, like x_test.rs:

#[test]
fn test_y() -> Result<()> {
    // ... test body ...
    Ok(())
}

Currently, Databend has 500+ unit tests. We’ve globally mocked some states to make it easier for developers to write test cases. This ensures functions execute as expected at the code level, catching and fixing problems early.
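
As a rough illustration of a function whose state dependency can be mocked, here is a minimal sketch. All names in it (Clock, MockClock, elapsed_secs) are hypothetical and do not come from the Databend codebase; it only shows the general pattern of hiding state behind a trait so a test can swap in a fixed value.

```rust
// Hypothetical example: none of these names exist in Databend.

/// State the function under test depends on, hidden behind a trait
/// so that tests can replace it with a mock.
pub trait Clock {
    fn now_secs(&self) -> u64;
}

/// Test double that always reports the same fixed time.
pub struct MockClock(pub u64);

impl Clock for MockClock {
    fn now_secs(&self) -> u64 {
        self.0
    }
}

/// Function under test: seconds elapsed since `start`,
/// or an error if `start` lies in the future.
pub fn elapsed_secs(clock: &dyn Clock, start: u64) -> Result<u64, String> {
    clock
        .now_secs()
        .checked_sub(start)
        .ok_or_else(|| format!("start {} is in the future", start))
}

#[test]
fn test_elapsed_secs() -> Result<(), String> {
    // The mocked clock makes the result deterministic.
    let clock = MockClock(100);
    assert_eq!(elapsed_secs(&clock, 40)?, 60);
    assert!(elapsed_secs(&clock, 101).is_err());
    Ok(())
}
```

Because the clock is injected, the test never touches real system time and produces the same result on every run and platform.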

Databend’s unit tests run on both Ubuntu and macOS (the two main systems Databend developers use).

Functional Tests

Passing unit tests doesn’t necessarily guarantee correctness, because functionality usually comes from multiple functions working together logically.

Functional tests are divided into Stateless and Stateful models. Stateless tests don’t require loading datasets, while Stateful tests need pre-loaded datasets. Let’s focus on the Stateless model.

Databend follows ClickHouse’s approach, using the numbers_mt table function for convenient Stateless testing.

For example, this slightly “complex” SQL:

SELECT number%3 as c1, number%2 as c2 FROM numbers_mt(10000) WHERE number > 2 GROUP BY number%3, number%2 ORDER BY c1, c2;

It filters data by condition, then performs GROUP BY aggregation, and finally sorts. This SQL execution involves many functions, so we need a convenient mechanism to ensure the combined functionality is correct.
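
To make the expected behavior concrete, the same filter, group, and sort pipeline can be emulated in plain Rust. This is only an illustrative sketch: emulate_query is a hypothetical helper, not Databend code, and it assumes numbers_mt(n) yields the integers 0 through n-1.

```rust
use std::collections::BTreeSet;

/// Emulates, outside any database:
///   SELECT number%3 AS c1, number%2 AS c2 FROM numbers_mt(n)
///   WHERE number > 2 GROUP BY number%3, number%2 ORDER BY c1, c2
/// Assumption: numbers_mt(n) produces 0, 1, ..., n-1.
pub fn emulate_query(n: u64) -> Vec<(u64, u64)> {
    (0..n)
        // WHERE number > 2
        .filter(|number| *number > 2)
        // Project the two grouping expressions.
        .map(|number| (number % 3, number % 2))
        // GROUP BY without aggregates keeps one row per distinct group;
        // a BTreeSet also iterates in (c1, c2) order, which covers ORDER BY.
        .collect::<BTreeSet<_>>()
        .into_iter()
        .collect()
}
```

For n = 10000 this yields the six rows (0,0), (0,1), (1,0), (1,1), (2,0), (2,1), since every combination of remainders occurs among the numbers greater than 2.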

How does Databend do it?

We first define a set of SQLs to test, x.sql:

SELECT number%3 as c1, number%2 as c2 FROM numbers_mt(10000) WHERE number > 2 GROUP BY number%3, number%2 ORDER BY c1, c2;

Then we define the expected result set in a companion file, x.result.

During functional testing, Databend runs the x.sql file, then compares the result set with the x.result file. If there’s a discrepancy, it errors out and provides hints.
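
The comparison step can be sketched as follows. This is not Databend's actual test runner; check_result is a hypothetical helper that compares the actual output with the expected text line by line and returns a hint about the first difference.

```rust
/// Compares actual query output with the expected text, line by line.
/// Returns Ok(()) when they match, or a hint describing the first
/// differing line (1-based). A missing line is reported as `None`.
pub fn check_result(actual: &str, expected: &str) -> Result<(), String> {
    let mut actual_lines = actual.lines();
    let mut expected_lines = expected.lines();
    let mut line_no = 1;
    loop {
        match (actual_lines.next(), expected_lines.next()) {
            // Both inputs ended at the same time: everything matched.
            (None, None) => return Ok(()),
            // Same line on both sides: move on to the next one.
            (a, e) if a == e => line_no += 1,
            // Different content, or one side ended early.
            (a, e) => {
                return Err(format!(
                    "line {}: expected {:?}, got {:?}",
                    line_no, e, a
                ))
            }
        }
    }
}
```

A runner built on this idea would execute x.sql, pass the output and the contents of x.result to the function, and fail the test with the returned hint on a mismatch.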

Since Databend has distributed MPP capabilities, functional tests run in both Standalone and Cluster modes for regression testing to ensure patches don’t break functionality.

Performance Tests

After unit and functional tests pass, we also care about an important metric: does this patch degrade performance? Or if it’s a performance optimization patch, how much does it improve?

To answer this question with measurable numbers, we comment /run-perf master on the Pull Request. CI then automatically compiles the current branch, runs the performance tests, compares the results with master, and generates a performance comparison report.

This way, reviewers can clearly see the patch’s performance impact from the report, ensuring every patch’s performance is under control.
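
The core of such a comparison is simple arithmetic. The sketch below is an assumption about what a report could contain, not the format of Databend's real report; Timing, percent_change, and render_report are hypothetical names.

```rust
/// One benchmark measured on both branches, in milliseconds.
/// Hypothetical shape, not the real report format.
pub struct Timing {
    pub query: String,
    pub master_ms: f64,
    pub branch_ms: f64,
}

/// Relative change of the branch versus master, in percent.
/// Negative values mean the branch is faster.
pub fn percent_change(master_ms: f64, branch_ms: f64) -> f64 {
    (branch_ms - master_ms) / master_ms * 100.0
}

/// Renders one line per benchmark with both timings and the change.
pub fn render_report(timings: &[Timing]) -> String {
    timings
        .iter()
        .map(|t| {
            format!(
                "{}: master {:.1} ms, branch {:.1} ms, change {:+.1}%",
                t.query,
                t.master_ms,
                t.branch_ms,
                percent_change(t.master_ms, t.branch_ms)
            )
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```

A benchmark that takes 200 ms on master and 250 ms on the branch would show a change of +25.0%, which a reviewer can read as a regression at a glance.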

Compilation Tests

Databend aims to build a cross-platform Cloud Warehouse, so every patch must compile and work properly on these platforms:

- {os: ubuntu-latest, toolchain: stable, target: x86_64-unknown-linux-gnu, cross: false}
- {os: ubuntu-latest, toolchain: stable, target: aarch64-unknown-linux-gnu, cross: true}
- {os: ubuntu-latest, toolchain: stable, target: arm-unknown-linux-gnueabi, cross: true}
- {os: ubuntu-latest, toolchain: stable, target: armv7-unknown-linux-gnueabihf, cross: true}
- {os: macos-latest, toolchain: stable, target: x86_64-apple-darwin, cross: false}

After this CI completes, we can confirm the current patch has no impact on cross-platform compilation.
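
To show how such a matrix could translate into build steps, here is a small sketch. It assumes that entries with cross: true are built with the cross tool and the others with plain cargo; the source does not show the actual CI commands, and BuildTarget, MATRIX, and build_command are hypothetical names.

```rust
/// One entry of the build matrix listed above.
pub struct BuildTarget {
    pub os: &'static str,
    pub toolchain: &'static str,
    pub target: &'static str,
    pub cross: bool,
}

/// The five platforms from the matrix.
pub static MATRIX: [BuildTarget; 5] = [
    BuildTarget { os: "ubuntu-latest", toolchain: "stable", target: "x86_64-unknown-linux-gnu", cross: false },
    BuildTarget { os: "ubuntu-latest", toolchain: "stable", target: "aarch64-unknown-linux-gnu", cross: true },
    BuildTarget { os: "ubuntu-latest", toolchain: "stable", target: "arm-unknown-linux-gnueabi", cross: true },
    BuildTarget { os: "ubuntu-latest", toolchain: "stable", target: "armv7-unknown-linux-gnueabihf", cross: true },
    BuildTarget { os: "macos-latest", toolchain: "stable", target: "x86_64-apple-darwin", cross: false },
];

/// Builds the command line for one matrix entry.
/// Assumption: cross-compiled targets use `cross`, native ones use `cargo`.
pub fn build_command(t: &BuildTarget) -> String {
    let tool = if t.cross { "cross" } else { "cargo" };
    format!("{} build --release --target {}", tool, t.target)
}
```

Under this assumption, the first entry maps to cargo build --release --target x86_64-unknown-linux-gnu, while the three ARM targets go through cross.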

Summary

Only after all the above CI tests pass is a Pull Request considered qualified and ready to merge into the main branch.

Without these automated testing CIs as safeguards, every issue would consume massive amounts of reviewer energy for verification. This model wouldn’t be sustainable and would seriously slow down product iteration and community pace.

From day one, Databend has worked hard to build a testable system. We’ve developed test-infra and the fusebot for community collaboration to accelerate Databend’s product iteration and deliver a usable Alpha version as soon as possible.
