Deep neural networks with controlled variable selection for the identification of putative causal genetic variants

Kassani, Peyman H.; Lu, Fred; Le Guen, Yann; He, Zihuai

arXiv:2109.14719 (cs)

[Submitted on 29 Sep 2021]

Abstract: Deep neural networks (DNNs) have been used successfully in many scientific problems for their high prediction accuracy, but their application to genetic studies remains challenging due to their poor interpretability. In this paper, we consider the problem of scalable, robust variable selection in DNNs for the identification of putative causal genetic variants in genome sequencing studies. We identified pronounced randomness in feature selection in DNNs due to their stochastic nature, which may hinder interpretability and give rise to misleading results. We propose an interpretable neural network model, stabilized using ensembling, with controlled variable selection for genetic studies. The merits of the proposed method include: (1) flexible modelling of the non-linear effects of genetic variants to improve statistical power; (2) multiple knockoffs in the input layer to rigorously control the false discovery rate; (3) hierarchical layers to substantially reduce the number of weight parameters and activations and thereby improve computational efficiency; (4) de-randomized feature selection to stabilize identified signals. We evaluated the proposed method in extensive simulation studies and applied it to the analysis of Alzheimer's disease genetics. We showed that the proposed method, when compared to conventional linear and nonlinear methods, can lead to substantially more discoveries.
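To make the idea of knockoff-based false discovery rate control with de-randomized selection more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the simpler single-knockoff setting and uses the standard knockoff+ threshold rule, whereas the paper uses multiple knockoffs. It also assumes that each feature already has an importance statistic W_j (for example, importance of the original variant minus importance of its knockoff), and that ensembling is done by plain averaging of these statistics over several training runs. The function names are hypothetical and chosen only for illustration.

```python
import math


def average_importance(runs):
    """Average per-feature statistics W_j over several training runs.

    Assumption: plain averaging stands in for the ensembling step that
    stabilizes feature selection; the paper's exact scheme may differ.
    """
    n_runs = len(runs)
    return [sum(column) / n_runs for column in zip(*runs)]


def knockoff_plus_threshold(w, q):
    """Return the knockoff+ threshold for statistics w at target FDR q.

    The threshold is the smallest t among the nonzero |W_j| such that
    (1 + #{j: W_j <= -t}) / max(1, #{j: W_j >= t}) <= q.
    If no such t exists, return infinity (nothing is selected).
    """
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        estimated_false = 1 + sum(1 for x in w if x <= -t)
        n_selected = sum(1 for x in w if x >= t)
        if estimated_false / max(1, n_selected) <= q:
            return t
    return math.inf


def select_features(w, q):
    """Indices of features whose statistic reaches the knockoff+ threshold."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= t]


if __name__ == "__main__":
    # Hypothetical statistics from two runs of the same model.
    runs = [
        [5.2, 3.8, 3.1, 2.4, 2.1, -0.6, 0.2, 1.6, 0.9, -0.1],
        [4.8, 4.2, 2.9, 2.6, 1.9, -0.4, 0.4, 1.4, 1.1, -0.3],
    ]
    w = average_importance(runs)
    print(select_features(w, q=0.2))
```

Large positive W_j values suggest that the original variant carries more signal than its knockoff copy, while negative values act as an estimate of how many selected features are false. Averaging across runs before thresholding is one simple way to reduce the run-to-run variability that the abstract describes.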

Subjects:	Machine Learning (cs.LG); Applications (stat.AP); Computation (stat.CO); Machine Learning (stat.ML)
Cite as:	arXiv:2109.14719 [cs.LG]
	(or arXiv:2109.14719v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2109.14719

Submission history

From: Peyman Hosseinzadeh Kassani
[v1] Wed, 29 Sep 2021 20:57:48 UTC (895 KB)
