Supplementary material for AUGUST.

Two-sample testing is a fundamental problem in statistics. While many powerful nonparametric methods exist in both the univariate and multivariate settings, comparatively few offer a framework for determining which data features lead to rejection of the null. In this paper, we propose a new nonparametric two-sample test named AUGUST, which incorporates a framework for interpretation while maintaining power comparable to existing methods. AUGUST tests for inequality in distribution up to a predetermined resolution using symmetry statistics from binary expansion. Designed for univariate and low- to moderate-dimensional multivariate data, this construction allows us to understand distributional differences as a combination of fundamental orthogonal signals. Asymptotic theory for the test statistic facilitates p-value computation and power analysis, and an efficient algorithm enables computation on large data sets. In empirical studies, we show that our test has power comparable to that of popular existing methods, as well as greater power in some circumstances. We illustrate the interpretability of our method using NBA shooting data.

Two-sample tests are among the most frequently used methods for statistical inference. While rooted in classical statistics, the two-sample problem is relevant to numerous cutting-edge applications, including high-energy physics [

We begin with two samples

While we face no shortage of effective two-sample tests, certain factors may hinder their real-world applicability. For one, a nonparametric test's power is rarely spread evenly across the range of potential alternatives. A method may, for example, excel at detecting location and scale shifts but struggle to catch bimodality when mean and variance are held constant. We explore this phenomenon in Section

Furthermore, many tests offer non-transparent rejections of the null hypothesis. While data visualizations and summary statistics offer some degree of explanation for a test, such analyses do not quantify the contribution of various data features to the test’s rejection. For multivariate tests, this problem is compounded, as distributional alternatives may easily exceed human intuition for higher dimensions.

Here, we formulate a new nonparametric two-sample test called AUGUST, an abbreviation of Augmented CDF for Uniform Statistic Transformation. Our method explicitly tests for multiple orthogonal sources of distributional inequality up to a predetermined resolution

Some well-known rank-based tests are designed for the univariate context, including [

For nonparametric multivariate methods, the range of approaches is quite broad. Tests based on geometric graphs, including [

As for interpretable methods, one line of work [

Regarding methodological relatives, the use of subsampling has been considered for the two-sample location and scale problems [

We begin by introducing our procedure in the context of univariate data. Given independent samples

To begin, imagine a one-sample setting where we test

Returning to the two-sample setting, the same intuition holds true: we might construct transformed variables that are nearly uniform in
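The intuition above can be sketched numerically. The snippet below is a minimal illustration (not the paper's exact construction): under the null, transforming one sample by the empirical CDF of the other yields values that are approximately uniform on the unit interval. The variable names are ours, chosen for exposition.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=500)  # sample from F
y = rng.normal(size=500)  # sample from G; here F = G, so the null holds

# Transform y by the empirical CDF of x: under the null, the result
# is approximately Uniform(0, 1).
u = np.searchsorted(np.sort(x), y) / len(x)

# A quick sanity check of near-uniformity (illustrative only).
ks = stats.kstest(u, "uniform")
print(round(ks.pvalue, 3))
```

Deviations from uniformity in `u` then signal a difference between the two underlying distributions.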

Since the transformed variables

We can think of

One possible choice of test statistic is the quantity

For our testing purposes, recall that we are only interested in the uniformity of

Let

To collect information about every

Importantly, the cell probabilities in
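To make the binary-expansion construction concrete, the sketch below computes symmetry statistics in the style of binary expansion testing: each binary digit of a value in [0, 1) defines a ±1 variable, and the statistic for a subset of digits is the sample mean of their product. Under uniformity, every such statistic has mean zero. The function name and depth convention here are ours, for illustration only; the paper's exact definitions may differ.

```python
import itertools
import numpy as np

def symmetry_stats(u, depth):
    """Binary-expansion symmetry statistics for values u in [0, 1).

    Each binary digit of u defines a +/-1 sign variable; the statistic
    for a subset of digits is the sample mean of the product of those
    signs. Under uniformity, every statistic has expectation zero.
    """
    u = np.asarray(u)
    # digits[k] holds the (k+1)-th binary digit of each value of u
    digits = np.array([np.floor(u * 2 ** (k + 1)) % 2 for k in range(depth)])
    signs = 1 - 2 * digits  # map {0, 1} -> {+1, -1}
    out = {}
    for r in range(1, depth + 1):
        for subset in itertools.combinations(range(depth), r):
            out[subset] = np.prod(signs[list(subset)], axis=0).mean()
    return out

rng = np.random.default_rng(1)
s = symmetry_stats(rng.uniform(size=10_000), depth=3)
# Under uniformity, all 2^3 - 1 = 7 statistics are near zero.
print(max(abs(v) for v in s.values()))
```

Each statistic corresponds to one orthogonal form of asymmetry; a large value flags the particular cells of the binary partition where the two samples disagree.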

Theorem

To combine information on all forms of asymmetry, we propose the statistic

The negative sign in

Informally, we could say the following: if

One may use entries of

Before performing the test based on

For

Higher depths

Below, Algorithms

Augmented CDF

AUGUST

The two samples

However, by first sorting the concatenated

This improved algorithm, named AUGUST+, is stated explicitly in the supplementary materials. The time complexity is asymptotically equivalent to that of an efficient sorting algorithm applied to the concatenated data. The constant factor in this comparison depends on the resolution
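The sorting idea behind this speedup is generic and can be sketched as follows. Evaluating an empirical CDF at many query points via one sort plus binary searches costs O((n + m) log(n + m)), replacing the naive O(nm) double loop. This is only the generic building block, not the AUGUST+ pseudocode itself; the function name is ours.

```python
import numpy as np

def ecdf_at(sample, points):
    """Empirical CDF of `sample` evaluated at `points`, via sorting.

    One O(n log n) sort of `sample` plus a binary search per query
    point replaces the naive O(n * m) double loop.
    """
    s = np.sort(sample)
    return np.searchsorted(s, points, side="right") / len(s)

rng = np.random.default_rng(2)
x = rng.normal(size=1_000)
y = rng.normal(size=1_000)
u = ecdf_at(x, y)  # one building block of the CDF transformation
```

With the concatenated data sorted once, all required empirical CDF evaluations reduce to lookups of this form.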

With an appropriate transformation, we can extend the univariate test to the problem of multivariate two-sample testing. For the purposes of this subsection, let

Given a mean

From the univariate method, recall that vectors of symmetry statistics quantify regions of imbalance between the univariate samples, as imbalances in distribution appear as non-uniformity in the vector of cell probabilities. Under a Mahalanobis distance transformation, cells in the domain of
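As a rough numerical companion to this discussion, the sketch below computes Mahalanobis distances of multivariate observations from a center, which gives one univariate summary to which a univariate procedure can then be applied. This only illustrates the Mahalanobis-distance idea; the paper's exact multivariate transformation may differ, and the function name is ours.

```python
import numpy as np

def mahalanobis_dist(x, mean, cov):
    """Mahalanobis distances of the rows of x from `mean` under `cov`.

    Uses a Cholesky solve rather than an explicit matrix inverse,
    for numerical stability.
    """
    centered = x - mean
    z = np.linalg.solve(np.linalg.cholesky(cov), centered.T)
    return np.sqrt((z ** 2).sum(axis=0))

rng = np.random.default_rng(3)
x = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 2]], size=1_000)
d = mahalanobis_dist(x, x.mean(axis=0), np.cov(x.T))
```

Cells of the resulting univariate binary partition then correspond to elliptical shells in the original space, which is what lets the symmetry statistics be read back as regions of multivariate imbalance.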

As in the univariate case, it is desirable for the test statistic to be invariant to the transposition of

In practice, we calculate

To simplify

Because the exact form of Σ is notation-intensive, we state it in the supplementary materials. In light of the above result, given distributions

Building on these ideas, the limit

So that inverse functions are well defined, we assume the CDFs

The importance of this theorem is as follows. In Section

From this perspective, fixing

We conclude by remarking that as a heuristic, this discussion is relevant to interpreting the multivariate version of AUGUST, whose symmetry statistics measure imbalance in the transformed collections

Here, we compare AUGUST to a sampling of other nonparametric two-sample tests: Kolmogorov–Smirnov distance [

The first two plots of Fig.

Univariate comparison of power between AUGUST in red, Kolmogorov–Smirnov distance in black, Wasserstein distance in green, DTS in blue, and energy distance in yellow. Our method performs comparably to existing approaches, with superior power in some circumstances.

For the location alternatives, the power of each method depends on the shape of the distribution. DTS, Wasserstein, and energy distance tests perform slightly better than ours for normal and beta distributions, and ours in turn outperforms Kolmogorov–Smirnov. In contrast, for a Laplace location shift, Kolmogorov–Smirnov outperforms every test, with our test in second place and DTS last. For the Laplace scale family, Kolmogorov–Smirnov performs poorly, with DTS and our test leading. DTS has the edge on the gamma skewness family, while we outperform all other tests at detecting normal versus symmetric normal mixture.
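Power comparisons of this kind are straightforward to reproduce by Monte Carlo. The sketch below estimates the power of the two-sample Kolmogorov–Smirnov test under a Laplace location shift; the sample size, repetition count, and shift are our own illustrative settings, not those of the paper's simulation design.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

def ks_power(shift, n=100, reps=200, alpha=0.05):
    """Monte Carlo power of the two-sample KS test for a Laplace
    location shift (illustrative settings only)."""
    rejections = 0
    for _ in range(reps):
        x = rng.laplace(size=n)
        y = rng.laplace(loc=shift, size=n)
        if stats.ks_2samp(x, y).pvalue < alpha:
            rejections += 1
    return rejections / reps

# Rejection rate near alpha under the null, and well above it
# under a unit location shift.
print(ks_power(0.0), ks_power(1.0))
```

Swapping in other test statistics and alternative families in the inner loop yields the kind of power curves compared in this section.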

As expected, no single test performs best in all situations. Even for simple alternatives such as location families, the precise shape of the distribution strongly influences the tests’ relative performance. In fact, the performance rankings of DTS, Wasserstein, energy distance, and Kolmogorov–Smirnov in the Laplace location trials are exactly reversed compared to the normal location trials. We theorize that because the symmetry statistics

In Fig.

We consider a variety of alternatives. In order:

In Fig.

Overall, our test is robust against a wide range of possible alternatives, and it performs particularly well against a scale alternative, where it outperforms all other methods considered. We theorize that, in part, this is because some of the other methods rely heavily on interpoint distances. The scale alternative does not result in good separation between

In the supplementary materials, we include additional comparisons with

Multivariate comparison of power between AUGUST in red, [

We demonstrate the interpretability of AUGUST using 2015–2016 NBA play-by-play data.

Source:

Data were split according to successful shots versus missed shots and early game versus late game. Four separate AUGUST tests at a depth of

To demonstrate interpretability, we provide visualizations in Fig.

Each plot in Fig.

Greatest asymmetries in NBA data. Successful shots are closer to the net than missed shots and come from more extreme angles. Shots in the early game come from more intermediate distances and more extreme angles than shots in the late game.

An important future direction involves refining the multivariate approach. The simulations of Section

The interpretability of our two-sample test also sheds light on transformations of data from one distribution to the other. This problem is a fundamental subject in transportation theory [

The authors thank the editor, associate editor, and reviewers for their helpful feedback. The authors additionally thank Shankar Bhamidi, Hao Chen, Jan Hannig, Michael Kosorok, Xiao-Li Meng, and Richard Smith for valuable comments and suggestions.