Statistics Theory
Showing new listings for Wednesday, 22 January 2025
- [1] arXiv:2501.10897 [pdf, html, other]
Title: Unfolding Tensors to Identify the Graph in Discrete Latent Bipartite Graphical Models
Subjects: Statistics Theory (math.ST); Machine Learning (cs.LG)
We use a tensor unfolding technique to prove a new identifiability result for discrete bipartite graphical models, which have a bipartite graph between an observed and a latent layer. This model family includes popular models such as Noisy-Or Bayesian networks for medical diagnosis and Restricted Boltzmann Machines in machine learning. These models are also building blocks for deep generative models. Our result on identifying the graph structure enjoys the following nice properties. First, our identifiability proof is constructive: we unfold the population tensor under the model into matrices and inspect the rank properties of the resulting matrices to uncover the graph. This proof itself gives a population-level structure learning algorithm that outputs both the number of latent variables and the bipartite graph. Second, we allow various forms of nonlinear dependence among the variables, unlike many continuous latent variable graphical models that rely on linearity to show identifiability. Third, our identifiability condition is interpretable, only requiring each latent variable to connect to at least two "pure" observed variables in the bipartite graph. The new result not only brings novel advances in algebraic statistics, but also has useful implications for these models' trustworthy applications in scientific disciplines and interpretable machine learning.
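A minimal sketch of the mechanical step the proof builds on, unfolding a tensor into a matrix and inspecting its rank; the random tensor and `unfold` helper below are hypothetical illustrations, not the paper's population construction.

```python
import numpy as np

# Hypothetical illustration of mode-k unfolding and a rank check; the paper
# applies this to the population tensor under the model, not to random data.
rng = np.random.default_rng(0)
T = rng.random((4, 3, 5))   # a 4 x 3 x 5 tensor
T /= T.sum()                # normalize so entries sum to one

def unfold(tensor, mode):
    """Mode-`mode` unfolding: move that axis to the front, flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

M0 = unfold(T, 0)           # shape (4, 15)
print(M0.shape, np.linalg.matrix_rank(M0))
```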
- [2] arXiv:2501.10898 [pdf, html, other]
Title: High-dimensional Sobolev tests on hyperspheres
Comments: 24 pages, 5 figures, 4 tables
Subjects: Statistics Theory (math.ST); Methodology (stat.ME)
We derive the limit null distribution of the class of Sobolev tests of uniformity on the hypersphere when the dimension and the sample size diverge to infinity at arbitrary rates. The limiting non-null behavior of these tests is obtained for a sequence of integrated von Mises-Fisher local alternatives. The asymptotic results are applied to test for high-dimensional rotational symmetry and spherical symmetry. Numerical experiments illustrate the derived behavior of the uniformity and spherical symmetry tests under the null and under local and fixed alternatives.
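For context, the simplest member of the Sobolev class is the classical Rayleigh test; a small sketch (fixed-$p$ asymptotics shown here, whereas the paper's regime lets $p$ and $n$ grow jointly):

```python
import numpy as np
from scipy import stats

# Rayleigh test of uniformity on S^{p-1}: under the null, n * p * ||mean||^2
# is asymptotically chi-squared with p degrees of freedom (p fixed, n -> inf).
rng = np.random.default_rng(1)
n, p = 500, 50
X = rng.standard_normal((n, p))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # uniform points on the sphere

stat = n * p * np.sum(X.mean(axis=0) ** 2)
print(stat, stats.chi2.sf(stat, df=p))          # statistic and p-value
```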
- [3] arXiv:2501.11208 [pdf, html, other]
Title: On Testing Kronecker Product Structure in Tensor Factor Models
Comments: 48 pages, 0 figures
Subjects: Statistics Theory (math.ST); Methodology (stat.ME)
We propose a test for the Kronecker product structure of a factor loading matrix implied by a tensor factor model with Tucker decomposition in the common component. By defining a Kronecker product structure set, we formalize when a tensor time series response $\{\mathcal{Y}_t\}$ has a Kronecker product structure, which is equivalent to $\{\mathcal{Y}_t\}$ admitting a decomposition according to a tensor factor model. Our test is built on analysing and comparing the residuals from fitting a full tensor factor model and the residuals from fitting a (tensor) factor model on a reshaped version of the data. In the most extreme case, the reshaping is the vectorisation of the tensor data, and the factor loading matrix in such a case can be general if there is no Kronecker product structure present. Theoretical results are developed through asymptotic normality results on estimated residuals. Numerical experiments suggest that the size of the tests gets closer to the pre-set nominal value as the sample size or the order of the tensor gets larger, while the power increases with mode dimensions and the number of combined modes. We demonstrate our tests on NYC taxi traffic data and a Fama-French matrix portfolio of returns.
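The structure being tested is easiest to see in the order-two (matrix) case; a sketch in my own notation, using the standard identity $\operatorname{vec}(ABC) = (C^\top \otimes A)\operatorname{vec}(B)$:

```latex
% Matrix factor model with a Tucker-type common component. Vectorizing shows
% that the loading matrix of the flattened series is a Kronecker product;
% absent such structure, it can be a general matrix.
\[
  Y_t = A_1 F_t A_2^\top + E_t
  \quad\Longrightarrow\quad
  \operatorname{vec}(Y_t) = (A_2 \otimes A_1)\operatorname{vec}(F_t)
    + \operatorname{vec}(E_t).
\]
```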
- [4] arXiv:2501.11280 [pdf, html, other]
Title: Empirical Bayes Estimation for Lasso-Type Regularizers: Analysis of Automatic Relevance Determination
Comments: 8 pages, 1 figure
Subjects: Statistics Theory (math.ST); Information Theory (cs.IT); Machine Learning (cs.LG)
This paper focuses on linear regression models with non-conjugate sparsity-inducing regularizers such as the lasso and group lasso. Although the empirical Bayes approach enables us to estimate the regularization parameter, little is known about the properties of the resulting estimators. In particular, the specific conditions under which the mechanism of automatic relevance determination (ARD) occurs remain largely unexplained. In this paper, we derive the empirical Bayes estimators for group lasso regularized linear regression models with a limited number of parameters. We show that the estimators diverge under a certain condition, giving rise to the ARD mechanism. We also prove that empirical Bayes methods can produce the ARD mechanism in general regularized linear regression models, and we clarify the conditions under which models such as ridge, lasso, and group lasso exhibit it.
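A classical one-parameter instance of the ARD effect may help fix ideas (an illustration in the normal-means setting, not the paper's group-lasso derivation):

```latex
% Observe y ~ N(w, sigma^2) with prior w ~ N(0, sigma_w^2). Empirical Bayes
% maximizes the evidence over the prior variance:
\[
  \hat{\sigma}_w^2
  = \arg\max_{\sigma_w^2 \ge 0}
    \log \mathcal{N}\!\left(y \mid 0,\; \sigma^2 + \sigma_w^2\right)
  = \max\{\, y^2 - \sigma^2,\; 0 \,\},
\]
% so the coefficient is pruned (sigma_w^2 = 0, forcing w to 0) exactly when
% the observation carries no signal beyond the noise level; this is the ARD
% effect.
```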
- [5] arXiv:2501.12179 [pdf, html, other]
Title: Block Adaptive Progressive Type-II Censored Sampling for the Inverted Exponentiated Pareto Distribution: Parameter Inference and Reliability Assessment
Comments: 26 pages, 13 figures
Subjects: Statistics Theory (math.ST)
This article explores the estimation of unknown parameters and reliability characteristics under the assumption that the lifetimes of the testing units follow an Inverted Exponentiated Pareto (IEP) distribution. Both point and interval estimates are calculated by employing the classical maximum likelihood and pivotal estimation methods. The existence and uniqueness of the maximum likelihood estimates are verified. Further, asymptotic confidence intervals are derived by using the asymptotic normality of the maximum likelihood estimator, and generalized confidence intervals are obtained by utilizing the pivotal quantities. Additionally, some mathematical developments of the IEP distribution are discussed based on the concept of order statistics. All the estimations are performed under a block censoring procedure, where an adaptive progressive Type-II censoring scheme is applied to every block. In this regard, the performances of the two estimation methods, maximum likelihood and pivotal estimation, are evaluated and compared through a simulation study. Finally, a real dataset is analyzed to demonstrate the flexibility of the proposed IEP model.
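For readers unfamiliar with the sampling scheme, a sketch of plain progressive Type-II censoring, the building block that the paper applies adaptively within each block (exponential lifetimes are a placeholder here, not the IEP model):

```python
import numpy as np

# Progressive Type-II censoring: n units on test; at the i-th observed
# failure, withdraw R_i surviving units at random; stop after m failures.
rng = np.random.default_rng(10)
n, m = 20, 8
R = np.array([2, 1, 1, 2, 0, 1, 2, 3])     # withdrawals; n = m + sum(R)
assert n == m + R.sum()

alive = list(rng.exponential(scale=1.0, size=n))   # placeholder lifetimes
observed = []
for R_i in R:
    alive.sort()
    observed.append(alive.pop(0))          # next observed failure time
    for _ in range(R_i):                   # withdraw R_i survivors at random
        alive.pop(rng.integers(len(alive)))
print(np.round(observed, 3))
```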
New submissions (showing 5 of 5 entries)
- [6] arXiv:2501.10538 (cross-list from cs.LG) [pdf, html, other]
Title: Universality of Benign Overfitting in Binary Linear Classification
Comments: 66 pages, 5 figures
Subjects: Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)
The practical success of deep learning has led to the discovery of several surprising phenomena. One of these phenomena, which has spurred intense theoretical research, is "benign overfitting": deep neural networks seem to generalize well in the over-parametrized regime even though the networks fit noisy training data perfectly. It is now known that benign overfitting also occurs in various classical statistical models. For linear maximum margin classifiers, benign overfitting has been established theoretically in a class of mixture models with very strong assumptions on the covariate distribution. However, even in this simple setting, many questions remain open. For instance, most of the existing literature focuses on the noiseless case where all true class labels are observed without errors, whereas the more interesting noisy case remains poorly understood. We provide a comprehensive study of benign overfitting for linear maximum margin classifiers. We discover a previously unknown phase transition in the test error bounds for the noisy model and provide some geometric intuition behind it. We further considerably relax the required covariate assumptions in both the noisy and noiseless cases. Our results demonstrate that benign overfitting of maximum margin classifiers holds in a much wider range of scenarios than was previously known and provide new insights into the underlying mechanisms.
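A quick numerical illustration of the phenomenon itself (my own toy demo, not the paper's setting or proofs): in the over-parametrized regime $d \gg n$, a near-maximum-margin linear classifier interpolates training labels corrupted at 10% and still attains test accuracy near the noise ceiling.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(11)
n, d, flip = 50, 2000, 0.10
mu = np.zeros(d); mu[0] = 8.0                        # class-mean separation

def make_data(m):
    y = rng.choice([-1, 1], size=m)
    X = y[:, None] * mu + rng.standard_normal((m, d))
    y_noisy = np.where(rng.random(m) < flip, -y, y)  # 10% label noise
    return X, y_noisy

X_tr, y_tr = make_data(n)
X_te, y_te = make_data(2000)
clf = LinearSVC(C=1e6, max_iter=100_000).fit(X_tr, y_tr)  # ~ hard margin
print("train acc:", clf.score(X_tr, y_tr))           # 1.0: noise interpolated
print("test acc:", clf.score(X_te, y_te))            # near 0.9 nonetheless
```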
- [7] arXiv:2501.10660 (cross-list from math.NA) [pdf, html, other]
Title: Blind free deconvolution over one-parameter sparse families via eigenmatrix
Subjects: Numerical Analysis (math.NA); Statistics Theory (math.ST)
This note considers the blind free deconvolution problems of sparse spectral measures from one-parameter families. These problems pose significant challenges since they involve nonlinear sparse recovery. The main technical tool is the eigenmatrix method for solving unstructured sparse recovery problems. The key idea is to turn the nonlinear inverse problem into a linear inverse problem by leveraging the R-transform for free addition and the S-transform for free product. The resulting linear problem is solved with the eigenmatrix method tailored to the domain of the parametric family. Numerical results are provided for both the additive and multiplicative free deconvolutions.
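The two linearizing transforms being leveraged are standard free-probability facts; for reference:

```latex
% The R-transform linearizes free additive convolution, and the S-transform
% linearizes free multiplicative convolution:
\[
  R_{\mu \boxplus \nu}(z) = R_{\mu}(z) + R_{\nu}(z),
  \qquad
  S_{\mu \boxtimes \nu}(z) = S_{\mu}(z)\, S_{\nu}(z).
\]
```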
- [8] arXiv:2501.10815 (cross-list from cs.LG) [pdf, html, other]
Title: An Interpretable Measure for Quantifying Predictive Dependence between Continuous Random Variables -- Extended Version
Comments: This is the extended version of a paper accepted at 2025 SIAM International Conference on Data Mining (SDM'25)
Subjects: Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)
A fundamental task in statistical learning is quantifying the joint dependence or association between two continuous random variables. We introduce a novel, fully non-parametric measure that assesses the degree of association between continuous variables $X$ and $Y$, capable of capturing a wide range of relationships, including non-functional ones. A key advantage of this measure is its interpretability: it quantifies the expected relative loss in predictive accuracy when the distribution of $X$ is ignored in predicting $Y$. This measure is bounded within the interval [0,1] and is equal to zero if and only if $X$ and $Y$ are independent. We evaluate the performance of our measure on over 90,000 real and synthetic datasets, benchmarking it against leading alternatives. Our results demonstrate that the proposed measure provides valuable insights into underlying relationships, particularly in cases where existing methods fail to capture important dependencies.
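The stated interpretation can be mimicked with a crude plug-in sketch (my own illustration of the interpretation only; this is not the paper's estimator, and the regressor choice is arbitrary):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict

# Relative reduction in squared-error loss for predicting Y with vs. without
# X: near 0 when X is uninformative, near 1 when X predicts Y almost exactly.
rng = np.random.default_rng(2)
x = rng.uniform(-2, 2, size=1000)
y = np.sin(3 * x) + 0.3 * rng.standard_normal(1000)    # non-linear dependence

yhat = cross_val_predict(
    RandomForestRegressor(n_estimators=200, random_state=0),
    x.reshape(-1, 1), y, cv=5)
loss_with_x = np.mean((y - yhat) ** 2)
loss_without_x = np.mean((y - y.mean()) ** 2)           # best constant guess
print(1 - loss_with_x / loss_without_x)
```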
- [9] arXiv:2501.11210 (cross-list from math.LO) [pdf, html, other]
Title: Schnorr Randomness and Effective Bayesian Consistency and Inconsistency
Subjects: Logic (math.LO); Statistics Theory (math.ST)
We study Doob's Consistency Theorem and Freedman's Inconsistency Theorem from the vantage point of computable probability and algorithmic randomness. We show that the Schnorr random elements of the parameter space are computably consistent, when there is a map from the sample space to the parameter space satisfying many of the same properties as limiting relative frequencies. We show that the generic inconsistency in Freedman's Theorem is effectively generic, which implies the existence of computable parameters which are not computably consistent. Taken together, this work provides a computability-theoretic solution to Diaconis and Freedman's problem of "know[ing] for which [parameters] the rule [Bayes' rule] is consistent", and it strengthens recent similar results of Takahashi on Martin-Löf randomness in Cantor space.
- [10] arXiv:2501.11314 (cross-list from math.PR) [pdf, html, other]
Title: A Bayesian sequential soft classification problem for a Brownian motion's drift
Comments: 12 pages, 2 figures
Subjects: Probability (math.PR); Statistics Theory (math.ST)
In this note we introduce and solve a soft classification version of the famous Bayesian sequential testing problem for a Brownian motion's drift. We establish that the value function is the unique non-trivial solution to a free boundary problem, and that the continuation region is characterized by two boundaries which may coincide if the observed signal is not strong enough. By exploiting the solution structure we are able to characterize the functional dependence of the stopping boundaries on the signal-to-noise ratio. We illustrate this relationship and compare our stopping boundaries to those derived in the classical setting.
- [11] arXiv:2501.11421 (cross-list from cs.LG) [pdf, html, other]
Title: Online Clustering with Bandit Information
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Statistics Theory (math.ST)
We study the problem of online clustering within the multi-armed bandit framework under the fixed confidence setting. In this multi-armed bandit problem, we have $M$ arms, each providing i.i.d. samples that follow a multivariate Gaussian distribution with an unknown mean and a known unit covariance. The arms are grouped into $K$ clusters by applying the Single Linkage (SLINK) clustering algorithm to the arm means. Since the true means are unknown, the objective is to recover this clustering of the arms with the minimum number of samples drawn from the arms, subject to an upper bound on the error probability. We introduce a novel algorithm, Average Tracking Bandit Online Clustering (ATBOC), and prove that it is order optimal, meaning that the upper bound on its expected sample complexity for given error probability $\delta$ is within a factor of 2 of an instance-dependent lower bound as $\delta \rightarrow 0$. Furthermore, we propose a computationally more efficient algorithm, Lower and Upper Confidence Bound-based Bandit Online Clustering (LUCBBOC), inspired by the LUCB algorithm for best arm identification. We assess the effectiveness of the proposed algorithms through numerical experiments on both synthetic datasets and the real-world MovieLens dataset, which demonstrate that the performance of LUCBBOC is comparable to that of ATBOC. To the best of our knowledge, this is the first work on bandit online clustering that allows arms with different means within a cluster and $K$ greater than 2.
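The clustering step itself is standard; a sketch of grouping arms by single linkage on estimated means (this illustrates the objective, not the ATBOC/LUCBBOC sampling rules):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Estimate each arm's mean from its samples, then cut a single-linkage
# (SLINK) dendrogram on the estimated means into K clusters.
rng = np.random.default_rng(3)
true_means = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.2, 5.0]])
K, samples_per_arm = 2, 200

est_means = np.array([
    rng.multivariate_normal(m, np.eye(2), samples_per_arm).mean(axis=0)
    for m in true_means])
Z = linkage(est_means, method="single")
print(fcluster(Z, t=K, criterion="maxclust"))   # e.g. [1 1 2 2]
```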
- [12] arXiv:2501.11689 (cross-list from cs.LG) [pdf, html, other]
Title: Randomness, exchangeability, and conformal prediction
Comments: 14 pages, 1 figure
Subjects: Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)
This note continues development of the functional theory of randomness, a modification of the algorithmic theory of randomness getting rid of unspecified additive constants. It introduces new kinds of confidence predictors, including randomness predictors (the most general confidence predictors based on the assumption of IID observations) and exchangeability predictors (the most general confidence predictors based on the assumption of exchangeable observations). The main result implies that both are close to conformal predictors and quantifies the difference between them.
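For orientation, a minimal split-conformal predictor, the benchmark construction that the note's randomness and exchangeability predictors are shown to be close to (the toy regressor and data are mine):

```python
import numpy as np

# Split conformal: calibrate absolute residuals on held-out data, then emit
# an interval with finite-sample coverage >= 1 - alpha under exchangeability.
rng = np.random.default_rng(4)
x = rng.uniform(0, 1, 400)
y = 2 * x + 0.1 * rng.standard_normal(400)
x_tr, y_tr, x_cal, y_cal = x[:200], y[:200], x[200:], y[200:]

slope = np.sum(x_tr * y_tr) / np.sum(x_tr ** 2)   # toy least-squares fit
scores = np.abs(y_cal - slope * x_cal)            # calibration scores
alpha = 0.1
level = np.ceil((1 - alpha) * (len(scores) + 1)) / len(scores)
q = np.quantile(scores, level)

x_new = 0.5
print(slope * x_new - q, slope * x_new + q)       # conformal interval
```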
- [13] arXiv:2501.11773 (cross-list from stat.ML) [pdf, html, other]
Title: Can Bayesian Neural Networks Make Confident Predictions?
Comments: Mathematics of Modern Machine Learning Workshop at NeurIPS 2024
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
Bayesian inference promises a framework for principled uncertainty quantification of neural network predictions. Barriers to adoption include the difficulty of fully characterizing posterior distributions on network parameters and the interpretability of posterior predictive distributions. We demonstrate that under a discretized prior for the inner layer weights, we can exactly characterize the posterior predictive distribution as a Gaussian mixture. This setting allows us to define equivalence classes of network parameter values which produce the same likelihood (training error) and to relate the elements of these classes to the network's scaling regime -- defined via ratios of the training sample size, the size of each layer, and the number of final layer parameters. Of particular interest are distinct parameter realizations that map to low training error and yet correspond to distinct modes in the posterior predictive distribution. We identify settings that exhibit such predictive multimodality, and thus provide insight into the accuracy of unimodal posterior approximations. We also characterize the capacity of a model to "learn from data" by evaluating contraction of the posterior predictive in different scaling regimes.
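Schematically, the exact characterization takes the following form (notation mine; the weights, means, and variances arise from conditioning on each inner-layer configuration):

```latex
% With a prior supported on finitely many inner-layer configurations
% theta_1, ..., theta_K, the posterior predictive at input x is a mixture of
% Gaussians, one component per configuration:
\[
  p(y \mid x, \mathcal{D})
  = \sum_{k=1}^{K} w_k\,
    \mathcal{N}\!\big(y \mid m_k(x),\, s_k^2(x)\big),
  \qquad
  w_k \propto \pi(\theta_k)\, p(\mathcal{D} \mid \theta_k).
\]
```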
- [14] arXiv:2501.11868 (cross-list from stat.ME) [pdf, html, other]
Title: Automatic Debiased Machine Learning for Smooth Functionals of Nonparametric M-Estimands
Subjects: Methodology (stat.ME); Statistics Theory (math.ST); Machine Learning (stat.ML)
We propose a unified framework for automatic debiased machine learning (autoDML) to perform inference on smooth functionals of infinite-dimensional M-estimands, defined as population risk minimizers over Hilbert spaces. By automating debiased estimation and inference procedures in causal inference and semiparametric statistics, our framework enables practitioners to construct valid estimators for complex parameters without requiring specialized expertise. The framework supports Neyman-orthogonal loss functions with unknown nuisance parameters requiring data-driven estimation, as well as vector-valued M-estimands involving simultaneous loss minimization across multiple Hilbert space models. We formalize the class of parameters efficiently estimable by autoDML as a novel class of nonparametric projection parameters, defined via orthogonal minimum loss objectives. We introduce three autoDML estimators based on one-step estimation, targeted minimum loss-based estimation, and the method of sieves. For data-driven model selection, we derive a novel decomposition of model approximation error for smooth functionals of M-estimands and propose adaptive debiased machine learning estimators that are superefficient and adaptive to the functional form of the M-estimand. Finally, we illustrate the flexibility of our framework by constructing autoDML estimators for long-term survival under a beta-geometric model.
- [15] arXiv:2501.12212 (cross-list from stat.ML) [pdf, other]
Title: Quantitative Error Bounds for Scaling Limits of Stochastic Iterative Algorithms
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Probability (math.PR); Statistics Theory (math.ST)
Stochastic iterative algorithms, including stochastic gradient descent (SGD) and stochastic gradient Langevin dynamics (SGLD), are widely utilized for optimization and sampling in large-scale and high-dimensional problems in machine learning, statistics, and engineering. Numerous works have bounded the parameter error in, and characterized the uncertainty of, these approximations. One common approach has been to use scaling limit analyses to relate the distribution of algorithm sample paths to a continuous-time stochastic process approximation, particularly in asymptotic setups. Focusing on the univariate setting, in this paper, we build on previous work to derive non-asymptotic functional approximation error bounds between the algorithm sample paths and the Ornstein-Uhlenbeck approximation using an infinite-dimensional version of Stein's method of exchangeable pairs. We show that this bound implies weak convergence under modest additional assumptions and leads to a bound on the error of the variance of the iterate averages of the algorithm. Furthermore, we use our main result to construct error bounds in terms of two common metrics: the Lévy-Prokhorov and bounded Wasserstein distances. Our results provide a foundation for developing similar error bounds for the multivariate setting and for more sophisticated stochastic approximation algorithms.
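The univariate approximation being quantified can be eyeballed in a few lines (my own toy comparison; the paper's contribution is the non-asymptotic error bound, not the simulation):

```python
import numpy as np

# Constant-step SGD on the quadratic loss x^2/2 with noisy gradients versus
# an Euler discretization (step eta) of the matching OU dynamics
# dX = -X ds + sqrt(eta) * sigma dW; both settle near std = sigma*sqrt(eta/2).
rng = np.random.default_rng(5)
eta, sigma, T = 0.01, 1.0, 200_000
x_sgd, x_ou = np.zeros(T), np.zeros(T)
for t in range(T - 1):
    z = rng.standard_normal()
    x_sgd[t + 1] = x_sgd[t] - eta * (x_sgd[t] + sigma * z)   # noisy gradient
    x_ou[t + 1] = x_ou[t] - eta * x_ou[t] + eta * sigma * z  # Euler OU step
print(x_sgd.std(), x_ou.std(), sigma * np.sqrt(eta / 2))
```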
Cross submissions (showing 10 of 10 entries)
- [16] arXiv:1901.09214 (replaced) [pdf, html, other]
Title: On the unification of zero-adjusted cure survival models
Authors: Francisco Louzada, Pedro Luiz Ramos, Hayala C. C. Souza, Lawal Oyeneyin, Gleici da Silva Castro Perdona
Subjects: Statistics Theory (math.ST); Applications (stat.AP)
This paper proposes a unified version of survival models that accounts for both zero-adjustment and cure proportions in various latent competing causes, useful in data where survival times may be zero or cure proportions are present. These models are particularly relevant in scenarios like childbirth duration in sub-Saharan Africa. Different competing cause distributions were considered, including Binomial, Geometric, Poisson, and Negative Binomial. The model's maximum likelihood point estimators and asymptotic confidence intervals were evaluated through simulation, demonstrating improved accuracy with larger sample sizes. The model best fits real obstetric data when assuming geometrically distributed causes. This flexible model, capable of considering different distributions for the lifetime of susceptible individuals and competing causes, is an effective tool for adjusting survival data, indicating broad application potential.
- [17] arXiv:2208.11481 (replaced) [pdf, html, other]
Title: Some Notes of Inequalities Under $\mathcal{C}$-mixing Conditions and Their Applications to Variance Estimation
Subjects: Statistics Theory (math.ST); Methodology (stat.ME)
As a mixing condition that includes many interesting dynamical systems as special cases, the $\mathcal{C}$-mixing condition has drawn significant attention in recent years. This paper contributes on the following points. First, we show a Bernstein-type inequality under $\mathcal{C}$-mixing conditions. Compared with the pioneering work on this point, Hang and Steinwart (2017), our inequality is sharper under more general assumptions. Second, since the general definition of the $\mathcal{C}$-mixing condition is based on a covariance inequality whose upper bound relies on a given $\mathcal{C}$-norm, a natural difficulty arises when the $\mathcal{C}$-norm is infinitely large. Under these circumstances, we show some inequalities bounding the variance of partial sums without requiring finite $\mathcal{C}$-norms. Finally, to our knowledge, little literature discusses central limit theorems under $\mathcal{C}$-mixing conditions as general as those of Hang and Steinwart (2017). Thus, under these $\mathcal{C}$-mixing conditions, we take a step forward on this point by deriving a central limit theorem under mild moment conditions. As for applications, we apply the aforementioned results to establish Bahadur representations and asymptotic normality of weighted $M$-estimators.
- [18] arXiv:2305.10416 (replaced) [pdf, html, other]
Title: Minimax rate for multivariate data under componentwise local differential privacy constraints
Subjects: Statistics Theory (math.ST)
Our research delves into the balance between maintaining privacy and preserving statistical accuracy when dealing with multivariate data that is subject to \textit{componentwise local differential privacy} (CLDP). With CLDP, each component of the private data is made public through a separate privacy channel. This allows for varying levels of privacy protection for different components or for the privatization of each component by different entities, each with their own distinct privacy policies. We develop general techniques for establishing minimax bounds that shed light on the statistical cost of privacy in this context, as a function of the privacy levels $\alpha_1, ... , \alpha_d$ of the $d$ components. We demonstrate the versatility and efficiency of these techniques by presenting various statistical applications. Specifically, we examine nonparametric density and covariance estimation under CLDP, providing upper and lower bounds that match up to constant factors, as well as an associated data-driven adaptive procedure. Furthermore, we quantify the probability of extracting sensitive information from one component by exploiting the fact that, on another component which may be correlated with the first, a smaller degree of privacy protection is guaranteed.
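One standard channel meeting the CLDP constraint is a per-coordinate Laplace mechanism; a sketch (the bounded-data assumption and the privacy levels below are mine):

```python
import numpy as np

# Each coordinate j of a vector in [-1, 1]^d is released through its own
# Laplace mechanism at level alpha_j (sensitivity 2, so scale 2 / alpha_j);
# smaller alpha_j means stronger privacy and a noisier release.
rng = np.random.default_rng(6)
n, d = 1000, 3
alphas = np.array([0.5, 2.0, 8.0])             # per-component privacy levels
X = rng.uniform(-1, 1, size=(n, d))

Z = X + rng.laplace(scale=2.0 / alphas, size=(n, d))
print(np.abs(Z.mean(axis=0) - X.mean(axis=0)))  # noisier where alpha is small
```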
- [19] arXiv:2306.00476 (replaced) [pdf, html, other]
Title: From sparse to dense functional data in high dimensions: Revisiting phase transitions from a non-asymptotic perspective
Subjects: Statistics Theory (math.ST)
Nonparametric estimation of the mean and covariance functions is ubiquitous in functional data analysis and local linear smoothing techniques are most frequently used. Zhang and Wang (2016) explored different types of asymptotic properties of the estimation, which reveal interesting phase transition phenomena based on the relative order of the average sampling frequency per subject $T$ to the number of subjects $n$, partitioning the data into three categories: "sparse", "semi-dense", and "ultra-dense". In an increasingly available high-dimensional scenario, where the number of functional variables $p$ is large in relation to $n$, we revisit this open problem from a non-asymptotic perspective by deriving comprehensive concentration inequalities for the local linear smoothers. Besides being of interest by themselves, our non-asymptotic results lead to elementwise maximum rates of $L_2$ convergence and uniform convergence serving as a fundamentally important tool for further convergence analysis when $p$ grows exponentially with $n$ and possibly $T$. With the presence of extra $\log p$ terms to account for the high-dimensional effect, we then investigate the scaled phase transitions and the corresponding elementwise maximum rates from sparse to semi-dense to ultra-dense functional data in high dimensions. We also discuss a couple of applications of our theoretical results. Finally, numerical studies are carried out to confirm the established theoretical properties.
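The estimator under study is the textbook local linear smoother; a minimal univariate version for reference:

```python
import numpy as np

def local_linear(t0, t, y, h):
    """Local linear estimate of m(t0) from y_i = m(t_i) + noise."""
    w = np.maximum(0.75 * (1.0 - ((t - t0) / h) ** 2), 0.0)  # Epanechnikov
    X = np.column_stack([np.ones_like(t), t - t0])           # local design
    WX = w[:, None] * X
    beta = np.linalg.solve(X.T @ WX, X.T @ (w * y))
    return beta[0]                                           # fit at t0

rng = np.random.default_rng(7)
t = np.sort(rng.uniform(0, 1, 200))
y = np.sin(2 * np.pi * t) + 0.2 * rng.standard_normal(200)
print(local_linear(0.5, t, y, h=0.1))   # truth: sin(pi) = 0
```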
- [20] arXiv:2310.01970 (replaced) [pdf, other]
Title: Functional Data-Driven Quantile Model Averaging with Application to Cryptocurrencies
Subjects: Statistics Theory (math.ST); Methodology (stat.ME)
Given the high volatility and susceptibility to extreme events in the cryptocurrency market, forecasting tail risk is of paramount importance. Value-at-Risk (VaR), a quantile-based risk measure, is widely used for assessing tail risk and is central to monitoring financial market stability. In data-rich environments, functional data from various domains are employed to forecast conditional quantiles. However, the infinite-dimensional nature of functional data introduces uncertainty. This paper addresses this uncertainty problem by proposing a novel data-driven conditional quantile model averaging (MA) approach. With a set of candidate models varying by the number of components, MA assigns weights to each model determined by a K-fold cross-validation criterion. We prove the asymptotic optimality of the selected weights in terms of minimizing the excess final prediction error when all candidate models are misspecified. Additionally, when the true regression relationship belongs to the set of candidate models, we provide consistency results for the averaged estimators. Numerical studies indicate that, in most cases, the proposed method outperforms other model selection and averaging methods, particularly for extreme quantiles in cryptocurrency markets.
- [21] arXiv:2311.04318 (replaced) [pdf, html, other]
Title: Estimation for multistate models subject to reporting delays and incomplete event adjudication
Subjects: Statistics Theory (math.ST); Methodology (stat.ME)
Complete observation of event histories is often impossible due to sampling effects such as right-censoring and left-truncation, but also due to reporting delays and incomplete event adjudication. This is for example the case for health insurance claims and during interim stages of clinical trials. In this paper, we develop a parametric method that takes the aforementioned effects into account, treating the latter two as partially exogenous. The method, which takes the form of a two-step M-estimation procedure, is applicable to multistate models in general, including competing risks and recurrent event models. The effect of reporting delays is derived via thinning, offering an alternative to existing results for Poisson models. To address incomplete event adjudication, we propose an imputed likelihood approach which, compared to existing methods, has the advantage of allowing for dependencies between the event history and adjudication processes as well as allowing for unreported events and multiple event types. We establish consistency and asymptotic normality under standard identifiability, integrability, and smoothness conditions, and we demonstrate the validity of the percentile bootstrap. Finally, a simulation study shows favorable finite sample performance of our method compared to other alternatives, while an application to disability insurance data illustrates its practical potential.
- [22] arXiv:2407.04194 (replaced) [pdf, other]
Title: Regularization Using Synthetic Data in High-Dimensional Models
Comments: 98 pages, 12 figures
Subjects: Statistics Theory (math.ST)
To overcome challenges in fitting complex models with small samples, catalytic priors were recently proposed to stabilize the inference by supplementing observed data with synthetic data generated from simpler models. The resulting Maximum A Posteriori (MAP) estimator is a regularized method that maximizes the weighted likelihood of the combined data. While this estimator is computationally straightforward and empirically promising, its theoretical properties are unexplored. This paper provides a theoretical analysis of this MAP estimator in generalized linear models, focusing on logistic regression. We first establish the existence and stability of the estimator, even in high dimensions. We then prove its consistency when the dimension of the covariates diverges. Furthermore, we use the convex Gaussian min-max theorem to characterize the asymptotic behavior of the MAP estimator when the dimension grows linearly with the sample size. Our theory clarifies the role of the tuning parameters and provides practical guidance, particularly for high-dimensional inference tasks such as constructing confidence intervals and performing variable selection. We demonstrate the effectiveness of our methods on simulations and real-world data. Our work provides a theoretically justified framework for enhancing statistical inference using synthetic data.
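The MAP estimator being analyzed can be sketched in a few lines for logistic regression (a toy version, assuming the catalytic-prior recipe of synthetic data weighted by tau/M; the intercept-only generator and all constants are mine):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Draw M synthetic points from a simpler fitted model (here intercept-only),
# then maximize the weighted likelihood of observed + synthetic data, each
# synthetic point carrying weight tau / M.
rng = np.random.default_rng(8)
n, p, M, tau = 30, 10, 400, 5.0
X = rng.standard_normal((n, p))
y = (X[:, 0] + 0.5 * rng.standard_normal(n) > 0).astype(int)

X_syn = rng.standard_normal((M, p))
y_syn = rng.binomial(1, y.mean(), size=M)        # intercept-only generator

X_aug = np.vstack([X, X_syn])
y_aug = np.concatenate([y, y_syn])
w = np.concatenate([np.ones(n), np.full(M, tau / M)])
map_fit = LogisticRegression(penalty=None, max_iter=1000).fit(
    X_aug, y_aug, sample_weight=w)
print(map_fit.coef_.round(2))
```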
- [23] arXiv:2410.16004 (replaced) [pdf, html, other]
Title: Are Bayesian networks typically faithful?
Subjects: Statistics Theory (math.ST); Probability (math.PR); Machine Learning (stat.ML)
Faithfulness is a ubiquitous assumption in causal inference, often motivated by the fact that the faithful parameters of linear Gaussian and discrete Bayesian networks are typical, and the folklore belief that this should also hold for other classes of Bayesian networks. We address this open question by showing that among all Bayesian networks over a given DAG, the faithful Bayesian networks are indeed 'typical': they constitute a dense, open set with respect to the total variation metric. However, this does not imply that faithfulness is typical in restricted classes of Bayesian networks, as are often considered in statistical applications. To this end we consider the class of Bayesian networks parametrised by conditional exponential families, for which we show that under mild regularity conditions, the faithful parameters constitute a dense, open set and the unfaithful parameters have Lebesgue measure zero, extending the existing results for linear Gaussian and discrete Bayesian networks. Finally, we show that the aforementioned results also hold for Bayesian networks with latent variables.
- [24] arXiv:2501.07571 (replaced) [pdf, html, other]
Title: Statistical learnability of smooth boundaries via pairwise binary classification with deep ReLU networks
Subjects: Statistics Theory (math.ST)
The topic of nonparametric estimation of smooth boundaries is extensively studied in the conventional setting where pairs of a single covariate and a response variable are observed. However, this traditional setting often suffers from the cost of data collection. Recent years have witnessed the steady development of learning algorithms for binary classification problems in which one can instead observe paired covariates and a binary variable representing the statistical relationship between the covariates. In this work, we theoretically study whether multiple smooth boundaries are learnable in this pairwise binary classification setting. We exploit the statistical dependence of the paired covariates to develop a learning algorithm based on vector-valued functions. The main theorem shows that there is an empirical risk minimization algorithm in a class of deep ReLU networks that produces a consistent estimator for indicator functions defined with smooth boundaries. We also discuss how the pairwise binary classification setting differs from the conventional settings, focusing on the structural condition of function classes. As a by-product, we apply the main theorem to a multiclass nonparametric classification problem where the estimation performance is measured by the excess risk in terms of misclassification.
- [25] arXiv:2206.14275 (replaced) [pdf, other]
Title: Dynamic CoVaR Modeling and Estimation
Subjects: Econometrics (econ.EM); Statistics Theory (math.ST); Risk Management (q-fin.RM); Methodology (stat.ME)
The popular systemic risk measure CoVaR (conditional Value-at-Risk) and its variants are widely used in economics and finance. In this article, we propose joint dynamic forecasting models for the Value-at-Risk (VaR) and CoVaR. The CoVaR version we consider is defined as a large quantile of one variable (e.g., losses in the financial system) conditional on some other variable (e.g., losses in a bank's shares) being in distress. We introduce a two-step M-estimator for the model parameters drawing on recently proposed bivariate scoring functions for the pair (VaR, CoVaR). We prove consistency and asymptotic normality of our parameter estimator and analyze its finite-sample properties in simulations. Finally, we apply a specific subclass of our dynamic forecasting models, which we call CoCAViaR models, to log-returns of large US banks. A formal forecast comparison shows that our CoCAViaR models generate CoVaR predictions which are superior to forecasts issued from current benchmark models.
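The CoVaR variant considered can be written compactly (the levels $\alpha$ and $\beta$ and the loss sign convention are my notation):

```latex
% With X the bank's losses and Y the system's losses, CoVaR is the
% beta-quantile of Y conditional on X being in distress, i.e., at or above
% its own Value-at-Risk at level alpha:
\[
  \mathrm{CoVaR}_{\alpha,\beta}
  = \inf\big\{\, y \in \mathbb{R} :
      \Pr\big(Y \le y \mid X \ge \mathrm{VaR}_\alpha(X)\big) \ge \beta \,\big\}.
\]
```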
- [26] arXiv:2401.07876 (replaced) [pdf, html, other]
Title: Characterization of the asymptotic behavior of $U$-statistics on row-column exchangeable matrices
Subjects: Probability (math.PR); Statistics Theory (math.ST)
We consider $U$-statistics on row-column exchangeable matrices. We present a new decomposition based on orthogonal projections onto probability spaces generated by sets of Aldous-Hoover-Kallenberg variables. These sets are indexed by bipartite graphs, enabling the application of graph-theoretic concepts to describe the decomposition. This framework provides new insights into the characterization of $U$-statistics on row-column exchangeable matrices, particularly their asymptotic behavior, including in degenerate cases. Notably, the limit distribution depends only on specific terms in the decomposition, corresponding to non-zero components indexed by the smallest graphs, namely the principal support graphs. We show that the asymptotic behavior of a $U$-statistic is characterized by the properties of its principal support graphs. The number of nodes in these graphs dictates the convergence rate to the limit distribution, with degeneracy occurring if and only if this number is strictly greater than 1. Furthermore, when the principal support graphs are connected, the limit distribution is Gaussian, even in degenerate cases. Applications to network analysis illustrate these findings.
- [27] arXiv:2405.09003 (replaced) [pdf, other]
Title: Nonparametric Inference on Dose-Response Curves Without the Positivity Condition
Comments: Substantial revision with some corrected identification conditions, improved convergence rates, and added experiments. The updated version has 80 pages (27 pages for the main paper), 5 figures
Subjects: Methodology (stat.ME); Statistics Theory (math.ST); Applications (stat.AP)
Existing statistical methods in causal inference often assume the positivity condition, where every individual has some chance of receiving any treatment level regardless of covariates. This assumption could be violated in observational studies with continuous treatments. In this paper, we develop identification and estimation theories for causal effects with continuous treatments (i.e., dose-response curves) without relying on the positivity condition. Our approach identifies and estimates the derivative of the treatment effect for each observed sample, integrating it to the treatment level of interest to mitigate bias from the lack of positivity. The method is grounded in a weaker assumption, satisfied by additive confounding models. We propose a fast and reliable numerical recipe for computing our integral estimator in practice and derive its asymptotic properties. To enable valid inference on the dose-response curve and its derivative, we use the nonparametric bootstrap and establish its consistency. The performances of our proposed estimators are validated through simulation studies and an analysis of the effect of air pollution exposure (PM$_{2.5}$) on cardiovascular mortality rates.
- [28] arXiv:2410.24163 (replaced) [pdf, html, other]
Title: Improve the Precision of Area Under the Curve Estimation for Recurrent Events Through Covariate Adjustment
Subjects: Methodology (stat.ME); Statistics Theory (math.ST); Applications (stat.AP)
The area under the curve (AUC) of the mean cumulative function (MCF) has recently been introduced as a novel estimand for evaluating treatment effects in recurrent event settings, capturing a totality of evidence in relation to disease progression. While the Lin-Wei-Yang-Ying (LWYY) model is commonly used for analyzing recurrent events, it relies on the proportional rate assumption between treatment arms, which might be violated in practice. In contrast, the AUC under MCFs does not depend on such proportionality assumptions and offers a clinically interpretable measure of treatment effect. To improve the precision of the AUC estimation while preserving its unconditional interpretability, we propose a nonparametric covariate adjustment approach. This approach guarantees efficiency gain compared to unadjusted analysis, as demonstrated by theoretical asymptotic distributions, and is universally applicable to various randomization schemes, including both simple and covariate-adaptive designs. Extensive simulations across different scenarios further support its advantage in increasing statistical power. Our findings highlight the importance of covariate adjustment for the analysis of AUC in recurrent event settings, offering practical guidance for its application in randomized clinical trials.
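The estimand in one line, in my notation:

```latex
% N(t) counts a subject's recurrent events by time t; the mean cumulative
% function and its area under the curve up to the follow-up horizon tau are
\[
  \mu(t) = \mathbb{E}[N(t)],
  \qquad
  \mathrm{AUC}(\tau) = \int_0^{\tau} \mu(t)\,\mathrm{d}t,
\]
% and the treatment effect contrasts AUC(tau) between arms.
```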
- [29] arXiv:2411.17109 (replaced) [pdf, html, other]
Title: On the maximal correlation of some stochastic processes
Subjects: Probability (math.PR); Statistics Theory (math.ST)
We consider the maximal correlation coefficient $R(X,Y)$ between two stochastic processes $X$ and $Y$. When $(X,Y)$ is a random walk, the result is a consequence of the Csáki-Fischer identity and the lower semicontinuity of $\operatorname{Law}(X,Y) \mapsto R(X,Y)$. When $(X,Y)$ are two-dimensional Lévy processes, we give an expression for $R(X,Y)$ via the covariance $\Sigma$ and the Lévy measure $\nu$ appearing in the Lévy-Khinchine formula. As a consequence, for two-dimensional $\alpha$-stable random variables $(X,Y)$ with $0<\alpha<2$, we give an expression for $R(X,Y)$ via the stability index $\alpha$ and the spectral measure $\tau$ of the $\alpha$-stable distribution. We also prove analogues and generalizations of the Dembo-Kagan-Shepp-Yu inequality and the Madiman-Barron inequality. Roughly speaking, we investigate the maximal correlation coefficient between two randomly selected "subvectors" $Y$ and $Z$ of a common random vector $X$. Using these new results, we also recover several classical results.
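For reference, the classical Hirschfeld-Gebelein-Rényi definition underlying the paper:

```latex
\[
  R(X,Y) = \sup_{f,\, g}\ \operatorname{Corr}\big(f(X),\, g(Y)\big),
\]
% where the supremum runs over measurable f and g with
% 0 < Var f(X) < infinity and 0 < Var g(Y) < infinity.
```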
- [30] arXiv:2412.17779 (replaced) [pdf, html, other]
Title: Ergodic Network Stochastic Differential Equations
Subjects: Methodology (stat.ME); Statistics Theory (math.ST)
We propose a novel framework for Network Stochastic Differential Equations (N-SDE), where each node in a network is governed by an SDE influenced by interactions with its neighbors. The evolution of each node is driven by the interplay of three key components: the node's intrinsic dynamics (\emph{momentum effect}), feedback from neighboring nodes (\emph{network effect}), and a \emph{stochastic volatility} term modeled by Brownian motion. Our primary objective is to estimate the parameters of the N-SDE system from high-frequency discrete-time observations. The motivation behind this model lies in its ability to analyze very high-dimensional time series by leveraging the inherent sparsity of the underlying network graph. We consider two distinct scenarios: \textit{i) known network structure}: the graph is fully specified, and we establish conditions under which the parameters can be identified, considering the linear growth of the parameter space with the number of edges. \textit{ii) unknown network structure}: the graph must be inferred from the data. For this, we develop an iterative procedure using adaptive Lasso, tailored to a specific subclass of N-SDE models. In this work, we assume the network graph is oriented, paving the way for novel applications of SDEs in causal inference, enabling the study of cause-effect relationships in dynamic systems. Through extensive simulation studies, we demonstrate the performance of our estimators across various graph topologies in high-dimensional settings. We also showcase the framework's applicability to real-world datasets, highlighting its potential for advancing the analysis of complex networked systems.
- [31] arXiv:2501.08330 (replaced) [pdf, html, other]
Title: Gradient Equilibrium in Online Learning: Theory and Applications
Comments: Code available at this https URL
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Statistics Theory (math.ST); Machine Learning (stat.ML)
We present a new perspective on online learning that we refer to as gradient equilibrium: a sequence of iterates achieves gradient equilibrium if the average of gradients of losses along the sequence converges to zero. In general, this condition is not implied by nor implies sublinear regret. It turns out that gradient equilibrium is achievable by standard online learning methods such as gradient descent and mirror descent with constant step sizes (rather than decaying step sizes, as is usually required for no regret). Further, as we show through examples, gradient equilibrium translates into an interpretable and meaningful property in online prediction problems spanning regression, classification, quantile estimation, and others. Notably, we show that the gradient equilibrium framework can be used to develop a debiasing scheme for black-box predictions under arbitrary distribution shift, based on simple post hoc online descent updates. We also show that post hoc gradient updates can be used to calibrate predicted quantiles under distribution shift, and that the framework leads to unbiased Elo scores for pairwise preference prediction.
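The debiasing scheme admits a very small sketch (my own toy version of the post hoc update; the drift, step size, and black-box predictor are arbitrary choices):

```python
import numpy as np

# Wrap a black-box prediction f with an additive correction b, updated by
# constant-step online gradient descent on squared loss. Gradient equilibrium
# means the average gradient (here, the average residual) tends to zero, so
# predictions are debiased on average even under the abrupt shift below.
rng = np.random.default_rng(9)
T, eta, b = 5000, 0.05, 0.0
resids = []
for t in range(T):
    y = 2.0 * (t > T // 2) + rng.standard_normal()  # shift at midstream
    f = 0.0                                         # black-box prediction
    g = (f + b) - y                                 # grad in b of (f+b-y)^2/2
    resids.append(-g)
    b -= eta * g
print(np.mean(resids[-1000:]))                      # near zero after the shift
```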