Phylogenetic Corrections and Higher-Order Sequence Statistics in Protein Families: The Potts Model vs MSA Transformer

Khatri, Kisan; Levy, Ronald M.; Haldane, Allan

arXiv:2503.00289 (physics)

[Submitted on 1 Mar 2025]

Abstract: Recent generative learning models applied to protein multiple sequence alignment (MSA) datasets include simple and interpretable physics-based Potts covariation models and other machine learning models such as MSA-Transformer (MSA-T). The best models accurately reproduce MSA statistics induced by the biophysical constraints within proteins, raising the question of which functional forms best model the underlying physics. The Potts model is usually specified by an effective potential including pairwise residue-residue interaction terms, but it has been suggested that MSA-T can capture the effects induced by effective potentials which include more than pairwise interactions and implicitly account for phylogenetic structure in the MSA. Here we compare the ability of the Potts model and MSA-T to reconstruct higher-order sequence statistics reflecting complex biological sequence constraints. We find that the model performance depends greatly on the treatment of phylogenetic relationships between the sequences, which can induce non-biophysical mutational covariation in MSAs. When using explicit corrections for phylogenetic dependencies, we find the Potts model outperforms MSA-T in detecting epistatic interactions of biophysical origin.
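
As a rough illustration of two ideas named in the abstract, the sketch below evaluates a pairwise Potts effective potential for one sequence and computes simple per-sequence weights that down-weight closely related sequences in an MSA. Everything here is an assumption made for illustration: the function names `potts_energy` and `phylogenetic_weights`, the array layouts, the sign convention (lower energy meaning higher probability, P(S) proportional to exp(-E(S))), and the identity threshold are hypothetical. The similarity-based reweighting is a common heuristic in covariation analysis and is not necessarily the explicit phylogenetic correction used in the paper.

```python
import numpy as np


def potts_energy(seq, h, J):
    """Pairwise Potts energy E(S) = sum_i h_i(s_i) + sum_{i<j} J_ij(s_i, s_j).

    seq: length-L sequence of integer residue codes in [0, q).
    h:   (L, q) array of single-site fields.
    J:   (L, L, q, q) array of couplings; only entries with i < j are read.
    Sign convention (an assumption): P(S) is proportional to exp(-E(S)).
    """
    L = len(seq)
    # Single-site contributions.
    energy = sum(h[i, seq[i]] for i in range(L))
    # Pairwise contributions over each unordered position pair once.
    for i in range(L):
        for j in range(i + 1, L):
            energy += J[i, j, seq[i], seq[j]]
    return float(energy)


def phylogenetic_weights(msa, identity_threshold=0.8):
    """Weight each sequence by 1 / (number of sequences at or above the
    identity threshold, counting itself).

    msa: (N, L) array-like of integer residue codes.
    The threshold value is an illustrative assumption.
    """
    msa = np.asarray(msa)
    # Fraction of identical positions for every pair of sequences, shape (N, N).
    identity = (msa[:, None, :] == msa[None, :, :]).mean(axis=2)
    # Each sequence is always its own neighbor, so counts are at least 1.
    neighbor_counts = (identity >= identity_threshold).sum(axis=1)
    return 1.0 / neighbor_counts
```

In a sketch like this, the weights would typically multiply each sequence's contribution when estimating site frequencies from the MSA, so that a cluster of near-identical sequences counts roughly as one observation.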

Comments:	7 pages, 5 figures; also presented at the BPS2025 Annual Meeting, Los Angeles, California
Subjects:	Biological Physics (physics.bio-ph); Populations and Evolution (q-bio.PE)
Cite as:	arXiv:2503.00289 [physics.bio-ph]
	(or arXiv:2503.00289v1 [physics.bio-ph] for this version)
	https://doi.org/10.48550/arXiv.2503.00289

Submission history

From: Kisan Khatri
[v1] Sat, 1 Mar 2025 01:43:49 UTC (94 KB)
