Algorithm-agnostic significance testing in supervised learning with multimodal data (2025)


Brief Bioinform. 2024 Nov; 25(6): bbae475.

Published online 2024 Sep 25. doi:10.1093/bib/bbae475

PMCID: PMC11424510

PMID: 39323092

Lucas Kook and Anton Rask Lundborg


Abstract

Motivation

Valid statistical inference is crucial for decision-making but difficult to obtain in supervised learning with multimodal data, e.g. combinations of clinical features, genomic data, and medical images. Multimodal data often warrants the use of black-box algorithms, for instance, random forests or neural networks, which impede the use of traditional variable significance tests.

Results

We address this problem by proposing the use of COvariance MEasure Tests (COMETs), which are calibrated and powerful tests that can be combined with any sufficiently predictive supervised learning algorithm. We apply COMETs to several high-dimensional, multimodal data sets to illustrate (i) variable significance testing for finding relevant mutations modulating drug activity, (ii) modality selection for predicting survival in liver cancer patients with multiomics data, and (iii) modality selection with clinical features and medical imaging data. In all applications, COMETs yield results consistent with domain knowledge without requiring data-driven pre-processing, which may invalidate type I error control. These novel applications with high-dimensional multimodal data corroborate prior results on the power and robustness of COMETs for significance testing.

Availability and implementation

COMETs are implemented in the comets R package available on CRAN and the pycomets Python library available on GitHub. Source code for reproducing all results is available at https://github.com/LucasKook/comets. All data sets used in this work are openly available.

Keywords: Conditional independence, Generalised Covariance Measure, multimodal data, Projected Covariance Measure, significance testing

Introduction

A fundamental challenge of modern bioinformatics is dealing with the increasingly multimodal nature of data [1–3]. The task of supervised learning, that is, the problem of predicting a response variable Y from features X, has received considerable attention in recent years, resulting in a plethora of algorithms for a wide range of settings that permit prediction using several data modalities simultaneously [4]. With the advent of deep learning, even non-tabular data modalities, such as text or image data, can be included without requiring manual feature engineering [5]. Methods such as these are highly regularized (if trained correctly), which minimizes the statistical price of adding too many irrelevant variables. However, continuing to collect features or modalities that do not contribute to the predictiveness of a model still has an economic cost and, perhaps more importantly, it is of scientific interest to determine whether a particular feature or modality X adds predictive power in the presence of additional features or modalities Z [6].

The problem of determining which features or modalities are significantly associated with the response is usually addressed by means of conditional independence testing. The response Y is independent of the modality X given further modalities Z if the probability that Y takes any particular value knowing both X and Z is the same as the probability knowing just Z. In particular, X does not help in predicting Y if Z is taken into account already (see Background on conditional independence for a more precise definition).

Traditional variable significance tests start by posing a parametric relationship between the response Y and features X and Z, for instance, the Wald test in a generalized linear model. When X or Z are complicated data modalities, it is seldom possible to write down a realistic model for their relationship with Y; thus a different approach is required. Furthermore, even when models can be explicitly parametrized, it is not clear that the resulting tests remain valid when the model is not specified correctly [7].

More recently, kernel-based conditional independence tests have been proposed which use a characterization of conditional independence by means of kernel embeddings to construct tests [8, 9]. However, these tests are difficult to calibrate in practice and rely intimately on kernel ridge regression. Several alternative algorithm-agnostic tests have been developed under the so-called ‘Model-X’ assumption, where one supposes that a model is known (or at least estimable to high precision) for the full distribution of X, given Z [10, 11]. Given the difficulty of learning conditional distributions, such an assumption is rarely tenable. Algorithm-agnostic variable importance measures have also been developed with statistically optimal estimators [12, 13]. However, efficient estimation of an importance measure does not necessarily translate to an optimal test to distinguish between conditional dependence and independence [see, e.g. the introduction of 14].

In this paper, we describe a family of significance tests referred to collectively as COvariance MEasure Tests (COMETs) that are algorithm-agnostic and valid (in the sense of controlling the probability of false positives) as long as the algorithms employed are sufficiently predictive. We will primarily focus on the Generalised Covariance Measure (GCM) test [15], which we think of as an ‘all-purpose’ test that should be well-behaved in most scenarios, and the more complicated Projected Covariance Measure (PCM) test, which is more flexible but may require a more careful choice of algorithms. Figure 1 gives an overview of the proposed algorithm-agnostic significance testing framework based on COMETs and the types of applications that are presented in this manuscript. The main contribution of this work is to illustrate the use of the GCM and PCM test in the context of multimodal, non-tabular data.


Figure 1

Overview of the proposed algorithm-agnostic significance testing framework for multimodal data using COMETs. Variable significance: differential gene expression can be assessed in the presence of a potentially high-dimensional/non-tabular confounder Z. Modality selection: entire modalities can be subjected to significance testing, which lends itself to modality selection in multi-omics applications. MSPE: mean squared prediction error.

Methods

In this section, we first provide some background on conditional independence. We then move on to describe the computation of the GCM and PCM tests in addition to the assumptions required for their validity. Finally, we describe the datasets that we will analyze in Results.

Background on conditional independence

For a real-valued response Y and features X and Z, we say that Y is conditionally independent of X, given Z, and write Y ⊥⊥ X | Z, if

$$E\bigl[f(Y)\mid X, Z\bigr] = E\bigl[f(Y)\mid Z\bigr] \quad\text{for all bounded, measurable functions } f. \tag{1}$$

That is, for any transformation f(Y) of Y, the best predictor (in a mean-squared error sense) of f(Y) using both X and Z is equal to the best predictor using just Z. (An alternative characterization, when Y has a conditional density given X and Z denoted by p(y | x, z), is given by: Y ⊥⊥ X | Z if and only if p(Y | X, Z) is independent of X, given Z.)

A helpful starting point for the construction of a conditional independence test is to consider the product of a population residual from a Y on Z regression, Y − E[Y | Z], and, for now considering a one-dimensional X, a population residual from an X on Z regression, X − E[X | Z]. As these are population residuals, Z is no longer helpful in predicting their values, so E[Y − E[Y | Z] | Z] = 0 and similarly E[X − E[X | Z] | Z] = 0. When Y ⊥⊥ X | Z, we can say more: the product of the residuals is also mean zero since

$$\rho := E\bigl[(Y - E[Y\mid Z])(X - E[X\mid Z])\bigr] = E\Bigl[E\bigl[(Y - E[Y\mid Z])(X - E[X\mid Z])\mid X, Z\bigr]\Bigr] = E\bigl[(X - E[X\mid Z])\,E[Y - E[Y\mid Z]\mid X, Z]\bigr] = E\bigl[(X - E[X\mid Z])\,(E[Y\mid Z] - E[Y\mid Z])\bigr] = 0, \tag{2}$$

where the second equality uses that X − E[X | Z] is perfectly predicted using X and Z, the third equality uses (1) with f(y) = y, and the final equality holds because the second factor vanishes identically. The GCM test is based on testing whether ρ = 0 and we will describe the details of how to compute it in Covariance measure tests. For the GCM test to perform well, it is important to determine when we can expect ρ to be non-zero under conditional dependence. When Y follows a partially linear model given X and Z, that is, Y = θX + g(Z) + ε for some parameter θ and function g with E[ε | X, Z] = 0, then ρ ≠ 0 exactly when θ ≠ 0 and the magnitude of ρ is proportional to that of θ. This includes as a special case the linear model for Y, given X and Z. There is a natural generalization of (2) to the case where X is a multi-dimensional vector, where the equation is interpreted component-wise in X. Although the GCM is also defined in these settings, computing the test involves many regressions when X is high-dimensional, which can be impractical (see Comparison of the GCM and PCM tests).
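To make the partially linear claim concrete, a short calculation (a sketch under the model just stated) gives

$$Y - E[Y\mid Z] = \theta\,(X - E[X\mid Z]) + \varepsilon \quad\Longrightarrow\quad \rho = \theta\,E\bigl[(X - E[X\mid Z])^{2}\bigr] + E\bigl[(X - E[X\mid Z])\,E[\varepsilon\mid X, Z]\bigr] = \theta\,E[\mathrm{Var}(X\mid Z)],$$

so ρ vanishes exactly when θ = 0, provided X is not a deterministic function of Z.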

Unfortunately, it is not difficult to come up with examples where Y is conditionally dependent on X, given Z, but ρ = 0. For instance, if X and Z are independent and standard normally distributed and Y = X², then E[Y | Z] = 1 (since Z carries no information about Y, so the best predictor is just the mean of Y); hence,

$$\rho = E\bigl[(Y - 1)(X - E[X\mid Z])\bigr] = E\bigl[(X^{2} - 1)\,X\bigr] = E[X^{3}] - E[X] = 0,$$

using that E[X³] = 0 for a standard normal variable. A more elaborate example is given in Fig. 2 (left and middle panel) and even more examples exist when X and Z are dependent (see [14], Section 6 and [16], Section 3.1.2). We now describe a test that can detect such dependencies.
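A quick numerical check of this example (a minimal simulation sketch; the variable names are ours):

```r
## X, Z independent standard normal and Y = X^2: the population residual product
## E[(Y - E[Y|Z])(X - E[X|Z])] = E[(X^2 - 1) X] is zero, even though Y is a
## deterministic function of X.
set.seed(1)
n <- 1e6
X <- rnorm(n)
Z <- rnorm(n)
Y <- X^2

mean((Y - 1) * (X - 0))            # close to 0: E[Y|Z] = 1 and E[X|Z] = 0
c(mean(Y), mean(Y[abs(X) > 1.5]))  # yet the mean of Y clearly shifts with X
```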


Figure 2

Illustration of the GCM and PCM test under the alternative that Y is not conditionally independent of X, given Z, in a simulated example in which the conditional dependence of Y on X is purely non-linear. The GCM test first computes the residual for the regression of Y on Z, which shows no correlation with the corresponding residual for the regression of X on Z. Thus, the GCM fails to reject (the empirical correlation between the residuals is close to zero). The PCM, in addition, learns the optimal transformation of X (depending on Z) to test conditional mean independence of Y and the transformed X, given Z. Thus, in this example, the PCM test correctly rejects. Although the residuals in the second panel are clearly not independent, it is not valid to conclude conditional dependence from rejecting an independence test here [see 15, Example 1].

A more ambitious target is to detect whenever an arbitrary (e.g. non-tabular) X is helpful for the prediction of Y in the presence of Z, measured in terms of mean-squared error. To achieve this goal, we can use the fact, derived in the same way as (2), that

$$E\bigl[(Y - E[Y\mid Z])\,\bigl(f(X, Z) - E[f(X, Z)\mid Z]\bigr)\bigr] = 0 \quad\text{for any (suitably integrable) function } f,$$

whenever Y ⊥⊥ X | Z. The GCM targets the quantity obtained with the function f(x, z) = x. However, by instead using f(x, z) = E[Y | X = x, Z = z] (which depends on the joint distribution of Y and (X, Z)), we obtain that

$$E\bigl[(Y - E[Y\mid Z])\,\bigl(E[Y\mid X, Z] - E\bigl[E[Y\mid X, Z]\mid Z\bigr]\bigr)\bigr] = E\Bigl[\bigl(E[Y\mid X, Z] - E[Y\mid Z]\bigr)^{2}\Bigr]. \tag{3}$$

This quantity is strictly greater than zero if and only if X is helpful for the prediction of Y in the presence of Z. The PCM test is based on testing whether the quantity in (3) is zero and we will describe the details of how to compute it in Covariance measure tests. In fact, the PCM is based on an alternative, variance-weighted choice of f that turns out to result in a more powerful test [see Fig. 2 and 14, Section 1.1]. An added benefit of tests targeting (3) is that no regressions are needed with X as the response, which can vastly reduce the computational burden when compared to tests that target (2).

The targets mentioned above rely intimately on population quantities that are unknown and hence need to be estimated when computing tests in practice. To ensure that the estimation errors do not interfere with the performance of the tests, we need to be able to learn the functions to a sufficient degree of accuracy. These requirements put restrictions on when the GCM and PCM are valid tests, but such restrictions are not unique to these tests. In fact, unless Z is discrete, it is impossible to construct an assumption-free conditional independence test that simultaneously controls the probability of false rejections and is able to detect dependence [15, 17]. This result implies that additional assumptions need to be imposed to ensure the feasibility of testing for conditional independence.

Covariance measure tests

We now describe the specifics of computing the GCM and the PCM. For the remainder of this section, we assume that we have a dataset consisting of n independent observations of a real-valued response Y and some additional features or modalities X and Z.

GCM test

The GCM test is based on (2), but to compute the test in practice, we need to form an empirical version of the equation. For simplicity, we consider, for now, a one-dimensional X. Let ε̂_i denote the residual for the ith observation from regressing Y on Z and similarly ξ̂_i from regressing X on Z. We now test the null hypothesis that Y ⊥⊥ X | Z by comparing

$$T := \frac{\Bigl(\sqrt{n}\,\tfrac{1}{n}\sum_{i=1}^{n}\hat\varepsilon_i\hat\xi_i\Bigr)^{2}}{\tfrac{1}{n}\sum_{i=1}^{n}(\hat\varepsilon_i\hat\xi_i)^{2} - \Bigl(\tfrac{1}{n}\sum_{i=1}^{n}\hat\varepsilon_i\hat\xi_i\Bigr)^{2}} \tag{4}$$

to a χ² distribution with one degree of freedom. The term inside the square in the numerator is √n times an estimate of the left-hand side of (2), while the denominator standardizes the variance of the test statistic. The test statistic in (4) is approximately χ²-distributed for large enough sample sizes if the regression methods employed are sufficiently predictive and Y ⊥⊥ X | Z [15, Theorem 6]. Note that the procedure above did not use anything special about Z other than the existence of a regression method that can approximate the conditional expectations of Y and X, given Z. The computations above naturally generalize to settings where X is multivariate and we summarize the general procedure in Algorithm 1.

[Algorithm 1: The GCM test.]
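A minimal sketch of the GCM computation for one-dimensional X, using plain linear regressions for the two nuisance fits (any sufficiently predictive learner, e.g. a random forest, can be substituted); the function and variable names are ours and not part of the comets package:

```r
## GCM test for one-dimensional X: regress Y on Z and X on Z, multiply the
## residuals and compare the normalised squared mean to a chi-squared(1) law.
gcm_test <- function(Y, X, Z) {
  Zdf <- as.data.frame(Z)
  eps <- residuals(lm(Y ~ ., data = Zdf))  # residuals of Y on Z
  xi  <- residuals(lm(X ~ ., data = Zdf))  # residuals of X on Z
  R   <- eps * xi
  n   <- length(R)
  stat <- n * mean(R)^2 / (mean(R^2) - mean(R)^2)
  list(statistic = stat,
       p.value = pchisq(stat, df = 1, lower.tail = FALSE))
}

## Illustration in a well-specified linear setting.
set.seed(1)
n <- 500
Z  <- rnorm(n)
X  <- Z + rnorm(n)
Y0 <- Z + rnorm(n)          # null: Y independent of X given Z
Y1 <- X + Z + rnorm(n)      # alternative: Y depends on X given Z
gcm_test(Y0, X, Z)$p.value  # typically well above 0.05
gcm_test(Y1, X, Z)$p.value  # typically very small
```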

PCM test

The computation of the PCM test is more challenging than that of the GCM test, since the PCM requires learning a transformation f̂ of X and Z to be able to estimate the target in (3). Furthermore, f̂ cannot be learned on the same observations that are used to compute the test statistic, as this would potentially result in dependence between the residuals constituting the test statistic and thus in many false rejections when Y ⊥⊥ X | Z.

The first step when computing the test statistic of the PCM test is therefore to split the dataset into two halves D₁ and D₂ of equal size (for simplicity, we assume that we have 2n observations, so both D₁ and D₂ are of size n). On D₁, we compute an estimate f̂ of f by first regressing Y on X and Z, yielding an estimate m̂ of E[Y | X, Z], and regressing the squared residuals from this fit on X and Z, yielding an estimate v̂ of the conditional variance of Y, given X and Z. We then regress m̂(X, Z) on Z on D₁, yielding an estimate of E[m̂(X, Z) | Z], which we denote by m̂_Z. We now set f̂(x, z) := (m̂(x, z) − m̂_Z(z)) / v̂(x, z) and, working on D₂, we regress Y on Z, yielding a residual ε̂_i for the ith observation, and we regress f̂(X, Z) on Z, yielding a residual ξ̂_i. Finally, we compute

$$T := \frac{\sqrt{n}\,\tfrac{1}{n}\sum_{i=1}^{n}\hat\varepsilon_i\hat\xi_i}{\sqrt{\tfrac{1}{n}\sum_{i=1}^{n}(\hat\varepsilon_i\hat\xi_i)^{2} - \Bigl(\tfrac{1}{n}\sum_{i=1}^{n}\hat\varepsilon_i\hat\xi_i\Bigr)^{2}}} \tag{5}$$

and reject the null by comparing to a standard normal distribution. In fact, as the target in (3) is positive under conditional dependence, we perform a one-sided test which rejects when T is large. The test statistic in (5) is approximately standard Gaussian if the regression methods employed for the Y on Z and f̂(X, Z) on Z regressions are sufficiently predictive, the estimates f̂ are not too complicated and Y ⊥⊥ X | Z [14, Theorem 4]. The test is powerful against alternatives where f̂(X, Z) is correlated with the true projection E[Y | X, Z] − E[Y | Z] and the aforementioned regression methods remain powerful [14, Theorem 5]. We summarize the procedure in Algorithm 2 below. (In this description and in Algorithm 2, we have omitted a few minor corrections to the estimation of f̂ that are done for numerical stability or as finite sample corrections. The full version of the algorithm with these additions is given in [14, Algorithm 1].)

[Algorithm 2: The PCM test.]
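A simplified sketch of Algorithm 2 for one-dimensional X and Z (our own function names; it uses simple parametric regressions, omits the conditional-variance weighting and the finite-sample corrections mentioned above, and is only meant to make the sample-splitting logic concrete):

```r
## Simplified PCM: learn a projection f_hat(x, z) approximating
## E[Y | X = x, Z = z] - E[Y | Z = z] on one half of the data, then test
## conditional mean independence on the other half.
pcm_statistic <- function(Y, X, Z, idx1) {
  d  <- data.frame(Y = Y, X = X, Z = Z)
  d1 <- d[idx1, ]                              # D1: used to learn f_hat
  d2 <- d[-idx1, ]                             # D2: used for the test statistic
  m_hat  <- lm(Y ~ poly(X, 2) + Z, data = d1)  # estimate of E[Y | X, Z]
  d1$m   <- fitted(m_hat)
  mz_hat <- lm(m ~ Z, data = d1)               # estimate of E[m_hat(X, Z) | Z]
  f_hat  <- function(nd) predict(m_hat, nd) - predict(mz_hat, nd)
  d2$f <- f_hat(d2)
  eps  <- residuals(lm(Y ~ Z, data = d2))      # residuals of Y on Z
  xi   <- residuals(lm(f ~ Z, data = d2))      # residuals of f_hat(X, Z) on Z
  R <- eps * xi
  sqrt(length(R)) * mean(R) / sqrt(mean(R^2) - mean(R)^2)
}

set.seed(1)
n <- 1000
Z <- rnorm(n)
X <- rnorm(n)
Y <- X^2 + Z + rnorm(n)            # purely non-linear effect of X on Y
idx1 <- sample(n, n / 2)
stat <- pcm_statistic(Y, X, Z, idx1)
pnorm(stat, lower.tail = FALSE)    # one-sided p-value; typically tiny here
```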

Due to the sample splitting, the p-value of the PCM is a random quantity. We can compute the PCM on several different splits to produce multiple p-values that can be dealt with using standard corrections for multiple testing. In practice, we follow the recommendation of the original paper and compute the p-value as in step 9 of Algorithm 2, but using the average of the test statistics from the different splits. We denote the number of splits by M and use 5–10 in the applications. The resulting test should be conservative, which results in a power loss; however, the test based on the averaged statistic should still be more powerful than a single application of the PCM due to more efficient use of the data. If one desires a perfectly calibrated p-value from multiple splits, it is possible to use the method in [18], but we do not pursue this further here.
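As a small illustration of this aggregation rule (the statistics below are made-up values, not taken from any of the applications):

```r
## Average the PCM statistics obtained on M random splits and convert the
## average to a one-sided p-value; this is conservative but reuses the data.
pcm_stats <- c(2.1, 1.8, 2.5, 1.6, 2.2)    # hypothetical statistics, M = 5
pnorm(mean(pcm_stats), lower.tail = FALSE)
```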

Comparison of the GCM and PCM tests

The GCM and PCM tests not only differ in terms of their target quantities, but also regarding computational aspects. The GCM test requires the regression of Y on Z and of X on Z. This prohibits the use of the GCM in settings where X is a high-dimensional or non-tabular data modality and cannot be represented as, or reduced to, a low-dimensional tabular modality. The PCM test, on the other hand, does not require regressing X on Z. Thus, the PCM test allows the end-to-end use of non-tabular data modalities, such as images or text, for instance, via the use of deep neural networks. In contrast to the GCM, the PCM relies on sample splitting and requires more regressions, and may thus be less data-efficient. This is addressed, in part, by repeating the PCM test with multiple random splits, as described above.

Data sets

Variable significance testing: CCLE data

We consider a subset of the anti-cancer drug dataset from the Cancer Cell Line Encyclopedia [CCLE, 19] that contains the response to the PLX4720 drug as a one-dimensional, continuous summary measure obtained from a dose-response curve, together with binary mutation features (absence/presence coded as 0/1, respectively) for a collection of cancer cell lines. To obtain comparable results, we follow the pre-processing steps in [20] and [21] by screening for mutations that are sufficiently marginally correlated with the drug response; each of the 10 genes in Table 1 is then tested conditional on the 465 other retained mutations. See Variable significance testing for a discussion of the effect of data-driven pre-screening of mutations on type I error control.

Modality selection: TCGA data

We consider the openly available TCGA HCC multiomics data set used in [22, 23]. The preprocessed data consist of survival times for liver cancer patients together with RNA-seq, miRNA, and DNA methylation modalities. Pre-processing involved the removal of features and samples that contained more than 20% missing values and imputation of the remaining missing values. Further detail can be found in [22].
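A small sketch of the 20% missingness filter described above (the matrix `omics` is a hypothetical stand-in, the order in which features and samples are filtered is our assumption, and the imputation step is not shown):

```r
## Drop features (columns) and then samples (rows) with more than 20% missing
## values, mirroring the TCGA pre-processing described above.
filter_missing <- function(mat, max_frac = 0.2) {
  mat <- mat[, colMeans(is.na(mat)) <= max_frac, drop = FALSE]
  mat[rowMeans(is.na(mat)) <= max_frac, , drop = FALSE]
}

omics <- matrix(rnorm(1000), nrow = 50)        # toy stand-in for an omics block
omics[sample(length(omics), 150)] <- NA        # sprinkle in missing values
dim(filter_missing(omics))
```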

Modality selection with imaging: MIMIC data

We consider the MIMIC Chest X-Ray data set [24, 25], which contains, for each patient, race (with levels ‘white’, ‘black’, ‘asian’), sex (with levels ‘male’, ‘female’), age (in years), pre-trained embeddings of chest X-rays, and (among other response variables) whether a pleural effusion was visible on the X-ray. The dimension of the image embedding was reduced by using the first 111 components of a singular value decomposition, which explain 98% of the variance.
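A sketch of this dimension-reduction step via principal components (i.e. an SVD of the centred embedding matrix); whether the original analysis centred the embeddings, and the resulting number of components, are assumptions here:

```r
## Keep the leading principal components of an embedding matrix until a target
## fraction of the variance (here 98%) is explained.
reduce_embedding <- function(emb, target = 0.98) {
  pc <- prcomp(emb)
  k  <- which(cumsum(pc$sdev^2) / sum(pc$sdev^2) >= target)[1]
  pc$x[, seq_len(k), drop = FALSE]
}

emb <- matrix(rnorm(500 * 40), nrow = 500)     # toy stand-in for X-ray embeddings
ncol(reduce_embedding(emb))                    # number of retained components
```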

Computational details

All analyses were carried out using the R language for statistical computing [26]. The COMETs are implemented in comets [27], which relies on ranger [28] and glmnet [29] for the random forest (RF) and LASSO regressions, respectively. Code for reproducing all results is available at https://github.com/LucasKook/comets. In the following, unless specified otherwise, GCM and PCM tests are run with RFs for all regressions. LASSO regressions are used for analyzing the TCGA data in Modality selection. A Python implementation of COMETs, the pycomets library [30], is available on GitHub https://github.com/shimenghuang/pycomets.
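For orientation, a hypothetical usage sketch of the comets package is shown below; we assume that gcm() and pcm() take the response, the candidate modality, and the conditioning modality as their first three arguments, and we do not spell out the arguments for selecting the regression learners, which should be checked against the current package documentation:

```r
## Hypothetical usage sketch of the comets R package; consult the package
## documentation for the exact interface and regression options.
# install.packages("comets")
library("comets")

set.seed(1)
n <- 300
Z <- matrix(rnorm(2 * n), ncol = 2)   # conditioning features
X <- matrix(rnorm(2 * n), ncol = 2)   # candidate modality
Y <- X[, 1]^2 + Z[, 1] + rnorm(n)     # response

gcm(Y, X, Z)   # GCM test of Y independent of X given Z
pcm(Y, X, Z)   # PCM test of the same null hypothesis
```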

Results

With our analyses, we aim to show how testing with covariance measures can be used to tackle two of the most common supervised learning problems in biomedical applications with multimodal data: Variable significance testing and modality selection (see Fig. 1). Throughout, we compare COMETs with existing methods (if applicable) on openly available real data sets (see Data sets for an overview of the data sets).

Variable significance testing

We apply COMETs to the anti-cancer drug dataset from the Cancer Cell Line Encyclopedia [19] and compare with the results obtained using the CRT [10], GCIT [20], and DGCIT [21]. See Introduction for information on the CRT and Model-X based tests. For each mutation, the null hypothesis that the drug response is conditionally independent of that mutation, given all remaining mutations, is tested to detect mutations that are significantly associated with PLX4720 drug response.

COMETs identify mutations associated with PLX4720 drug activity

Table 1 summarizes the results for the GCIT, DGCIT, GCM, and PCM test and the 10 selected mutations in ([20], Fig. 4). Overall, there is large agreement among the tests, which reject the null hypothesis for the BRAF_V600E, BRAF_MC, HIP1, FLT3, THBS3, and DNMT1 mutations, corroborating previously reported results. For the PRKD1, PIP5K1A, and MAP3K5 mutations, the PCM test rejects, while the GCM test does not, which is consistent with the PCM test having power against a larger class of alternatives (Fig. 2).

Table 1

Results for the CCLE data in Variable significance testing. The table shows variable importance ranks and p-values for the relation of mutations of 10 genes with the response to PLX4720, conditional on the 465 other mutations in the data. The PCM test was run with multiple random splits (see Methods). p-values below 0.001 are shown as <0.001. The variable importance ranks (obtained via random forests, RF, or elastic net regression, EN) and the CRT, GCIT, and DGCIT results were obtained from [20] and [21].

| Method | BRAF_V600E | BRAF_MC | HIP1 | FLT3 | CDC42BPA | THBS3 | DNMT1 | PRKD1 | PIP5K1A | MAP3K5 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| EN (rank) | 1 | 3 | 4 | 5 | 7 | 8 | 9 | 10 | 19 | 78 |
| RF (rank) | 1 | 2 | 3 | 14 | 8 | 34 | 28 | 18 | 7 | 9 |
| CRT | <0.001 | <0.001 | <0.001 | 0.017 | 0.009 | 0.017 | 0.022 | 0.002 | 0.024 | 0.012 |
| GCIT | <0.001 | <0.001 | <0.001 | 0.521 | 0.050 | 0.013 | 0.020 | 0.002 | 0.001 | <0.001 |
| DGCIT | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 | 0.794 |
| GCM | 0.030 | 0.033 | 0.010 | 0.005 | 0.004 | 0.042 | 0.010 | 0.165 | 0.464 | 0.504 |
| PCM | 0.001 | 0.012 | 0.008 | 0.009 | 0.014 | 0.027 | 0.014 | 0.011 | 0.022 | 0.022 |
| GCM (no screening) | 0.003 | 0.021 | 0.006 | 0.002 | 0.002 | 0.068 | 0.007 | 0.007 | 0.223 | 0.216 |
| PCM (no screening) | 0.002 | 0.007 | 0.082 | 0.151 | 0.186 | 0.134 | 0.138 | 0.108 | 0.198 | 0.122 |


Figure 4

Computation times (in seconds; y-axis) for the GCM and PCM test using random forest regressions, for varying dimensionality of X (panels) and sample size (x-axis).

COMETs detect relevant mutations without pre-screening

Prior results rely on pre-screening genes based on their marginal correlation with the drug response. However, marginal correlation cannot inform subsequent conditional independence tests in general, and the data-driven pre-screening may have led to inflated false positive rates [31]. The GCM and PCM test, in contrast, can be applied without pre-screening and still consistently reject the null hypothesis of conditional independence for the BRAF_V600E and BRAF_MC mutations (see the rows in Table 1 labelled ‘no screening’). When correcting the p-values (Holm) to attain a family-wise error rate of 5% for the 10 mutations of interest, the GCM and PCM still reject the null hypothesis for BRAF_V600E (the Holm-adjusted p-values remain below 0.05 for both tests). This rejection is expected because PLX4720 was designed as a BRAF inhibitor [19].
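The correction itself is a one-liner in R; applying it, for instance, to the PCM ‘no screening’ row of Table 1:

```r
## Holm adjustment of the ten PCM (no screening) p-values from Table 1; only
## BRAF_V600E remains below 0.05 after the correction.
p_pcm <- c(BRAF_V600E = 0.002, BRAF_MC = 0.007, HIP1 = 0.082, FLT3 = 0.151,
           CDC42BPA = 0.186, THBS3 = 0.134, DNMT1 = 0.138, PRKD1 = 0.108,
           PIP5K1A = 0.198, MAP3K5 = 0.122)
p.adjust(p_pcm, method = "holm")
```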

Modality selection

The goal of our analysis is to identify modalities among RNA-seq, miRNA, and DNA methylation that are important for predicting survival of liver cancer patients, by testing whether the survival outcome is independent of each candidate modality, given the two remaining modalities. This is a challenging problem due to the high dimensionality of both the candidate modality and the conditioning modalities.

Evidence of DNA methylation being important for predicting survival in liver cancer patients

Table 2 (PCM-RF) shows p-values for the PCM test (computed over multiple random splits) testing for significance of the RNA-seq, miRNA, and DNA methylation modalities conditional on the remaining two, without pre-screening features in any of the modalities, using an RF regression. There is some evidence that the DNA methylation modality is important for predicting death in liver cancer patients. Conversely, the PCM test does not provide evidence that survival depends on the RNA-seq or miRNA modalities when already conditioning on the DNA methylation data. Comparable results are obtained when substituting the RF regression of the survival outcome on the candidate and conditioning modalities with a cross-validated LASSO regression using the optimal tuning parameter: after a multiple testing correction (Holm), both PCM tests reject the null hypothesis only for the DNA methylation modality.

Table 2

Results (p-values) for the multiomics application in Modality selection using the PCM with multiple random splits, once using an RF for the regression of the survival outcome on the candidate and conditioning modalities, and once using a cross-validated high-dimensional linear regression (LASSO).

| Null hypothesis | PCM-RF | PCM-LASSO |
| --- | --- | --- |
| Survival ⊥⊥ RNA-seq, given miRNA and DNA methylation | 0.178 | 0.066 |
| Survival ⊥⊥ miRNA, given RNA-seq and DNA methylation | 0.165 | 0.044 |
| Survival ⊥⊥ DNA methylation, given RNA-seq and miRNA | 0.014 | 0.002 |


Modality selection with imaging data

Using deep learning methods, [32] provide evidence that both the race and the response (pleural effusion) can be predicted from the X-ray embedding with high accuracy. The goal of our analysis is to test whether race helps predict the response when already conditioning on age, sex, and the X-rays and, vice versa, whether the X-rays contain information for predicting pleural effusion given sex, age, and race.

Strong evidence for X-ray imaging and race being important for predicting pleural effusion

There is strong evidence against the null hypotheses of pleural effusion being independent of either X-ray imaging or race, given the other and, additionally, sex and age of a patient (Table 3).

Table 3

Results (negative log₁₀-transformed p-values) for the GCM and PCM applied to the full MIMIC data set in Modality selection with imaging data. Both tests reject both hypotheses. See Fig. 3 for an uncertainty assessment.

| Null hypothesis | GCM | PCM |
| --- | --- | --- |
| Pleural effusion ⊥⊥ race, given X-ray, sex, and age | 6.158 | 77.762 |
| Pleural effusion ⊥⊥ X-ray, given race, sex, and age | 13805.802 | 1270.361 |


To gauge the uncertainty in the results of the COMETs, we repeat the tests on 75 random (non-overlapping) subsamples of different sample sizes (150, 600, 2400) of the data. Only the PCM rejects the null hypothesis of pleural effusion (PE) being independent of race given the X-ray, sex, and age of a patient at any of the considered sample sizes, which provides evidence that the GCM target (the expected product of residuals) is close to zero, yet the conditional mean of PE still varies non-linearly with race. At full sample size, the GCM does reject, indicating the presence of a weak linear signal (the estimated correlations between the pleural effusion and race residuals are small in magnitude). It is somewhat unsurprising to see both COMETs reject the null hypothesis at such large sample sizes ([33], Modality selection with imaging data).


Figure 3

Results (negative log₁₀-transformed p-values) for the GCM and PCM applied to 75 random non-overlapping splits of different sample sizes (150, 600, and 2400) of the MIMIC data set in Modality selection with imaging data. Splitting the data enables an analysis of the uncertainty in the tests’ rejections and the strength of evidence against the null.

Both tests reject the null hypothesis of pleural effusion (PE) being independent of X-ray given race, sex, and age of a patient at any sample size, but the GCM in fact produces smaller p-values. This indicates that there is a significant component of the conditional mean of PE that varies linearly with the X-ray features; in such cases, the PCM will not outperform the GCM for a fixed sample size.

Computation times

The computation time of the GCM and PCM test depends on the dimensionality of X, the sample size, and the chosen regression methods. For low-dimensional X, the PCM test requires more regressions than the GCM test, which results in slower computation times (see Fig. 4). However, for higher-dimensional X, the GCM test requires more regressions, resulting in longer computation times. For moderate dimensions and sample sizes, the computation times are similar.

Discussion

We present COMETs for algorithm-agnostic significance testing with multimodal, potentially non-tabular data, which rely on tests of conditional independence based on covariance measures. The versatility of the GCM and PCM tests is shown in several applications involving variable significance testing and modality selection in the presence of high-dimensional conditioning variables. In the following, we discuss the applications in more detail and end with a discussion of computational aspects and recommendations for using COMETs in supervised learning applications with multimodal data.

Variable significance testing

The GCM and PCM test show comparable results to competing methods and can be applied without relying on data-driven pre-screening, which can otherwise invalidate p-values and lead to inflated type I error rates. Type I error control additionally suffers from the number of tests performed. After correcting for multiple testing, the COMETs provide evidence that BRAF_V600E is associated with PLX4720 activity while controlling for all other mutations. As highlighted before, this is expected since PLX4720 was designed as a BRAF inhibitor.

Modality selection

The PCM test is applied to the TCGA data set to test which modalities (RNAseq, miRNA, DNAm) are important (conditional on the others) for predicting survival in liver cancer patients and rejects the null hypothesis for the DNA methylation modality. Failure to reject the null hypothesis for the RNA-seq and miRNA modalities may be due to the low sample size and extremely high dimensionality of the problem and ought to be interpreted as lack of evidence that RNA-seq and miRNA data contain information for predicting survival beyond DNA methylation in the data at hand. Taken together, this application demonstrates that the PCM test can be used for modality selection with high-dimensional candidate and conditioning modalities. COMETs could, for instance, be used to trade off the economic cost of measuring an omics (or imaging, as in the MIMIC application) modality with the gain in predictive power at a given significance level. It is worth noting that a naive test based on the comparison of cross-validated mean-squared errors using all variables and all but one variable does not result in a valid statistical test [12, 14]. Lastly, the validity of conditional independence tests applied to the TCGA data depends on the validity of the imputation procedure used during data pre-processing.

Modality selection with imaging data

The large and openly available MIMIC data set serves as an example application of how image and other non-tabular modalities may enter an analysis based on COMETs. The PCM does not require pre-trained embeddings and could, in principle, also be used in combination with deep convolutional neural networks if the raw imaging data are available. The 111-dimensional embedding further enables the use of the GCM test to serve as a benchmark. However, it is important to properly choose the regressions involved in COMETs, as the tests rely on their quality and asymptotic properties [14, 15]. Nevertheless, to the best of our knowledge, no other tests exist with theoretical guarantees that also permit testing the hypothesis Y ⊥⊥ X | Z when X is a non-tabular modality.

Recommendations and outlook

As outlined in Comparison of the GCM and PCM tests, the regression of X on Z required by the GCM can become computationally challenging if X is high-dimensional (which is why the GCM test is not applied in Modality selection for modality selection) or non-tabular (this was circumvented by using the relatively low-dimensional tabular embedding of the chest X-ray images in Modality selection with imaging data; see also the computation times in Computation times). The PCM test, in contrast, does not rely on this regression and is thus directly applicable in cases where X and Z are high-dimensional or non-tabular modalities. The GCM has further been adapted to settings with functional outcomes [34], continuous time stochastic processes [35], censored outcomes [36], and extended to powerful weighted [16] and kernel-based [37] versions. These are all COMETs proposed in the literature and we leave their applicability in biomedical contexts as a topic for future work.

In the applications presented in this paper, RF and LASSO regressions were used. RFs are computationally fast and require little hyperparameter tuning to obtain well-performing regression estimates. However, for very high-dimensional applications in which the number of features exceeds the number of observations, the LASSO is a fast and computationally stable alternative.
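The same residualization step can be carried out with the LASSO when the number of features is large relative to the sample size. The sketch below uses cv.glmnet from the glmnet package in place of the random forests in the previous sketch, purely as an illustration of swapping the regression method; it is not the package's internal implementation.

# Hedged sketch: cross-validated LASSO as the residualizing regression when p >> n.
library(glmnet)

set.seed(4)
n <- 100; p <- 1000
Z <- matrix(rnorm(n * p), n, p)   # p >> n conditioning features
X <- rnorm(n) + Z[, 1]
Y <- rnorm(n) + Z[, 1]

# Cross-validated LASSO regressions of Y on Z and of X on Z.
res_Y <- Y - as.numeric(predict(cv.glmnet(Z, Y), newx = Z, s = "lambda.min"))
res_X <- X - as.numeric(predict(cv.glmnet(Z, X), newx = Z, s = "lambda.min"))

R <- res_Y * res_X
sqrt(n) * mean(R) / sd(R)   # GCM-type statistic, as in the previous sketch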

Overall, we believe that COMETs provide a useful tool for bioinformaticians to assess significance in applications with high-dimensional and potentially non-tabular omics and biomedical data while appropriately controlling error probabilities. The increasing familiarity of data analysts with supervised learning methods, on which COMETs rely, helps safeguard the validity of the statistical inference. Further, the algorithm-agnostic nature of the procedures makes COMETs easily adaptable to future developments in predictive modeling.

Key Points

  • We show how COvariance MEasure Tests (COMETs) for conditional independence can be applied for the ubiquitous tasks of variable significance testing and modality selection in high-dimensional multimodal and non-tabular data sets.

  • The algorithm-agnostic nature of COMETs allows the data analyst to control for complex, high-dimensional confounders with potentially non-linear confounding mechanisms.

  • Using COMETs, we (i) screen for the significance of mutations in predicting PLX4720 drug activity in the CCLE data set, (ii) select entire omics modalities for predicting survival in liver cancer patients in the TCGA data set, and (iii) investigate the significance of image and tabular modalities for predicting the presence of pleural effusion in the MIMIC data set.

  • We provide a user-friendly open-source implementation of several covariance measure tests in both R and Python to foster their use and usability in the bioinformatics community. We give recommendations for choosing and tuning the supervised learning algorithms used in COMETs.

Acknowledgments

We thank Niklas Pfister and David Rügamer for helpful discussions. We thank Klemens Fröhlich, Witold Wolski, and Shimeng Huang for helpful comments on the manuscript.

Contributor Information

Lucas Kook, Institute for Statistics and Mathematics, Vienna University of Economics and Business, Welthandelsplatz 1, AT-1020 Vienna, Austria.

Anton Rask Lundborg, Department of Mathematical Sciences, University of Copenhagen, Universitetsparken 5, DK-2100 Copenhagen, Denmark.

Funding

L.K. was supported by the Swiss National Science Foundation (grant no. 214457). A.R.L. was supported by a research grant (0069071) from Novo Nordisk Fonden.

References

1. Cheerla A, Gevaert O. Deep learning with multimodal representation for pancancer prognosis prediction. Bioinformatics 2019; 35:i446–54. 10.1093/bioinformatics/btz342.

2. Ahmed KT, Sun J, Cheng S et al. Multi-omics data integration by generative adversarial network. Bioinformatics 2021; 38:179–86. 10.1093/bioinformatics/btab608.

3. Stahlschmidt SR, Ulfenborg B, Synnergren J. Multimodal deep learning for biomedical data fusion: a review. Brief Bioinform 2022; 23:bbab569. 10.1093/bib/bbab569.

4. Hastie T, Tibshirani R, Friedman JH. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Vol. 2. New York: Springer, 2009.

5. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015; 521:436–44. 10.1038/nature14539.

6. Smucler E, Rotnitzky A. A note on efficient minimum cost adjustment sets in causal graphical models. J Causal Inference 2022; 10:174–89. 10.1515/jci-2022-0015.

7. Shah RD, Bühlmann P. Double-estimation-friendly inference for high-dimensional misspecified models. Stat Sci 2023; 38:68–91. 10.1214/22-STS850.

8. Zhang K, Peters J, Janzing D et al. Kernel-based conditional independence test and application in causal discovery. In: Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence (UAI'11). Arlington, VA: AUAI Press, 2011, 804–13.

9. Strobl EV, Zhang K, Visweswaran S. Approximate kernel-based conditional independence tests for fast non-parametric causal discovery. J Causal Inference 2019; 7:20180017. 10.1515/jci-2018-0017.

10. Candès E, Fan Y, Janson L et al. Panning for gold: ‘Model-X’ knockoffs for high dimensional controlled variable selection. J R Stat Soc Series B Stat Methodology 2018; 80:551–77. 10.1111/rssb.12265.

11. Berrett TB, Wang Y, Barber RF et al. The conditional permutation test for independence while controlling for confounders. J R Stat Soc Series B Stat Methodology 2019; 82:175–97.

12. Williamson BD, Gilbert PB, Carone M et al. Nonparametric variable importance assessment using machine learning techniques. Biometrics 2021; 77:9–22. 10.1111/biom.13392.

13. Williamson BD, Gilbert PB, Simon NR et al. A general framework for inference on algorithm-agnostic variable importance. J Am Stat Assoc 2023; 118:1645–58. 10.1080/01621459.2021.2003200.

14. Lundborg AR, Kim I, Shah RD et al. The Projected Covariance Measure for assumption-lean variable significance testing. arXiv preprint arXiv:2211.02039, 2022.

15. Shah RD, Peters J. The hardness of conditional independence testing and the Generalised Covariance Measure. Ann Stat 2020; 48:1514–38.

16. Scheidegger C, Hörrmann J, Bühlmann P. The weighted Generalised Covariance Measure. J Mach Learn Res 2022; 23:12517–84.

17. Kim I, Neykov M, Balakrishnan S et al. Local permutation tests for conditional independence. Ann Stat 2022; 50:3388–414.

18. Guo FR, Shah RD. Rank-transformed subsampling: inference for multiple data splitting and exchangeable p-values. J R Stat Soc Series B Stat Methodology 2024. 10.1093/jrsssb/qkae091.

19. Barretina J, Caponigro G, Stransky N et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 2012; 483:603–7. 10.1038/nature11003.

20. Bellot A, van der Schaar M. Conditional independence testing using generative adversarial networks. In: Wallach H, Larochelle H, Beygelzimer A et al. (eds), Advances in Neural Information Processing Systems, Vol. 32. Curran Associates, Inc., 2019.

21. Shi C, Xu T, Bergsma W et al. Double generative adversarial networks for conditional independence testing. J Mach Learn Res 2021; 22:13029–60.

22. Chaudhary K, Poirion OB, Lu L et al. Deep learning–based multi-omics integration robustly predicts survival in liver cancer. Clin Cancer Res 2018; 24:1248–59. 10.1158/1078-0432.CCR-17-0853.

23. Poirion OB, Jing Z, Chaudhary K et al. DeepProg: an ensemble of deep-learning and machine-learning models for prognosis prediction using multi-omics data. Genome Med 2021; 13:1–15. 10.1186/s13073-021-00930-x.

24. Johnson AE, Pollard TJ, Greenbaum NR et al. MIMIC-CXR-JPG, a large publicly available database of labeled chest radiographs. arXiv preprint arXiv:1901.07042, 2019.

25. Sellergren AB, Chen C, Nabulsi Z et al. Simplified transfer learning for chest radiography models using less data. Radiology 2022; 305:454–65. 10.1148/radiol.212482.

26. R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing, 2021.

27. Kook L. comets: Covariance Measure Tests for Conditional Independence. R package version 0.0-2, 2024. 10.32614/CRAN.package.comets.

28. Wright MN, Ziegler A. ranger: a fast implementation of random forests for high dimensional data in C++ and R. J Stat Softw 2017; 77:1–17. 10.18637/jss.v077.i01.

29. Tay JK, Narasimhan B, Hastie T. Elastic net regularization paths for all generalized linear models. J Stat Softw 2023; 106:1–31. 10.18637/jss.v106.i01.

30. Huang S, Kook L. pycomets: Covariance Measure Tests for Conditional Independence. Python library, 2024. https://github.com/shimenghuang/pycomets.

31. Berk R, Brown L, Buja A et al. Valid post-selection inference. Ann Stat 2013; 41:802–37. 10.1214/12-AOS1077.

32. Glocker B, Jones C, Roschewitz M et al. Risk of bias in chest radiography deep learning foundation models. Radiol Artif Intell 2023; 5:e230060. 10.1148/ryai.230060.

33. Greenland S. Valid p-values behave exactly as they should: some misleading criticisms of p-values and their resolution with s-values. Am Stat 2019; 73:106–14. 10.1080/00031305.2018.1529625.

34. Lundborg AR, Shah RD, Peters J. Conditional independence testing in Hilbert spaces with applications to functional data analysis. J R Stat Soc Series B Stat Methodology 2022; 84:1821–50. 10.1111/rssb.12544.

35. Christgau AM, Petersen L, Hansen NR. Nonparametric conditional local independence testing. Ann Stat 2023; 51:2116–44.

36. Kook L, Saengkyongam S, Lundborg AR et al. Model-based causal feature selection for general response types. J Am Stat Assoc 2024; 1–23 (just accepted). 10.1080/01621459.2024.2395588.

37. Fernández T, Rivera N. A general framework for the analysis of kernel-based tests. J Mach Learn Res 2024; 25:1–40.
