Quantitative Biology
Showing new listings for Thursday, 6 March 2025
- [1] arXiv:2503.02981 [pdf, other]
Title: Modeling Iodine Deficiency
Comments: 46 pages, 8 figures
Subjects: Tissues and Organs (q-bio.TO); Biomolecules (q-bio.BM)
This paper presents a four-unit, four-component mathematical model of iodine metabolism and its impact on thyroid hormone levels in the body. We focus on the relationships between iodine (I-), triiodothyronine (T3), thyroxine (T4), and thyroid-stimulating hormone (TSH) through the mixer, thyroid, sensor (pituitary gland), and metabolism. Iodine plays a fundamental role in maintaining metabolic homeostasis, as it is essential for the synthesis of T3 and T4, which regulate weight, energy, and other physiological functions. Iodine deficiency, which is one of the most common nutrient deficiencies in the world, can lead to hypothyroidism, a condition characterized by fatigue, weight gain, and cognitive impairments [1].
Our model tracks the movement of iodine through dietary intake, thyroid absorption, hormone synthesis, feedback regulation via TSH, and deiodination in metabolism. By evaluating different forms of the accounting equation governing these processes, we reach three key results. 1) Iodide availability directly impacts T3 and T4 production, with both component flow rates declining at day 70 in a mildly diseased state and at day 60 in a severely diseased state. 2) TSH is an early diagnostic indicator of thyroid issues, with TSH levels rising to magnitudes of 25x in just ten days, nearly six times quicker than other biological indicators. 3) There is a 5-day difference in iodine storage depletion between mild and severe iodide deficiency. Understanding these results quantitatively provides insights into thyroid disorders and informs strategies for managing iodine deficiency on both individual and public health levels.
- [2] arXiv:2503.02997 [pdf, html, other]
Title: Enabling Fast, Accurate, and Efficient Real-Time Genome Analysis via New Algorithms and Techniques
Comments: PhD Thesis submitted to ETH Zurich
Subjects: Genomics (q-bio.GN); Hardware Architecture (cs.AR); Data Structures and Algorithms (cs.DS); Emerging Technologies (cs.ET)
The advent of high-throughput sequencing technologies has revolutionized genome analysis by enabling the rapid and cost-effective sequencing of large genomes. Despite these advancements, the increasing complexity and volume of genomic data present significant challenges related to accuracy, scalability, and computational efficiency. These challenges are mainly due to various forms of unwanted and unhandled variations in sequencing data, collectively referred to as noise. In this dissertation, we address these challenges by providing a deep understanding of different types of noise in genomic data and developing techniques to mitigate the impact of noise on genome analysis.
First, we introduce BLEND, a noise-tolerant hashing mechanism that quickly identifies both exactly matching and highly similar sequences with arbitrary differences using a single lookup of their hash values. Second, to enable scalable and accurate analysis of noisy raw nanopore signals, we propose RawHash, a novel mechanism that effectively reduces noise in raw nanopore signals and enables accurate, real-time analysis by proposing the first hash-based similarity search technique for raw nanopore signals. Third, we extend the capabilities of RawHash with RawHash2, an improved mechanism that 1) provides a better understanding of noise in raw nanopore signals to reduce it more effectively and 2) improves the robustness of mapping decisions. Fourth, we explore the broader implications and new applications of raw nanopore signal analysis by introducing Rawsamble, the first mechanism for all-vs-all overlapping of raw signals using hash-based search. Rawsamble enables the construction of de novo assemblies directly from raw signals without basecalling, which opens up new directions and uses for raw nanopore signal analysis.
- [3] arXiv:2503.03001 [pdf, html, other]
Title: Multicellular self-organization in Escherichia coli
Comments: 19 pages, 5 figures
Subjects: Cell Behavior (q-bio.CB)
Escherichia coli has long been a trusty companion, maintaining health in our guts and advancing biological knowledge in the laboratory. In light of recent findings, we discuss multicellular self-organization in E. coli and develop general ideas for multicellularity, including the necessity for multicellular dynamics and interpretation by dynamic graphs, applicable to both unicellular and multicellular organisms. In this context, we next discuss the documented behaviors of E. coli self-organization (rosette formation, multicellular extension, and attached dormancy) and two potential behaviors (internal communication and mating). Finally, by comparing the dynamic graphs for different communities, we develop principles relevant to the theory of multicellularity.
- [4] arXiv:2503.03131 [pdf, html, other]
Title: Spatially-Structured Models of Viral Dynamics: A Scoping Review
Subjects: Quantitative Methods (q-bio.QM)
There is growing recognition in both the experimental and modelling literature of the importance of spatial structure to the dynamics of viral infections in tissues. Aided by the evolution of computing power and motivated by recent biological insights, there has been an explosion of new, spatially-explicit models for within-host viral dynamics in recent years. This development has only been accelerated in the wake of the COVID-19 pandemic. Spatially-structured models offer improved biological realism and can account for dynamics which cannot be well-described by conventional, mean-field approaches. However, despite their growing popularity, spatially-structured models of viral dynamics are underused in biological applications. One major obstacle to the wider application of such models is the huge variety in approaches taken, with little consensus as to which features should be included and how they should be implemented for a given biological context. Previous reviews of the field have focused on specific modelling frameworks or on models for particular viral species. Here, we instead apply a scoping review approach to the literature of spatially-structured viral dynamics models as a whole to provide an exhaustive update of the state of the field. Our analysis is structured along two axes, methodology and viral species, in order to examine the breadth of techniques used and the requirements of different biological applications. We then discuss the contributions of mathematical and computational modelling to our understanding of key spatially-structured aspects of viral dynamics, and suggest key themes for future model development to improve robustness and biological utility.
- [5] arXiv:2503.03310 [pdf, html, other]
Title: Optimal virulence strategies in epidemiological models with asymptomatic transmission
Subjects: Populations and Evolution (q-bio.PE)
Asymptomatic infection has gained notoriety as an important feature of infectious disease dynamics. Despite increasing attention, there have been few rigorous examinations of how asymptomatic transmission influences pathogen evolution. In this study, we apply evolutionary invasion analysis to compute optimal strategies for viruses evolving in a system with a distinct asymptomatic transmission stage. We ask how pathogens would evolve under three conditions: with an increase in the mean infectious period in the symptomatic state, with an increase in the mean infectious period in the asymptomatic stage, and with an increase in the proportion proceeding through the "mild recovery route" (where the symptomatic state is bypassed entirely). We find that an increased proportion of cases moving through a "mild recovery route" -- which can occur with different host susceptibility or increased public health intervention -- leads to a model structure in which mutant pathogens are transmitted largely through the asymptomatic route, with slightly increased evolved virulence levels. In addition, we find that an increase in the mean infectious period of the symptomatic state has a small overall influence on the fitness of the pathogen when effective transmission can occur via the asymptomatic route. Further, we find that virulence levels change very slightly for both the asymptomatic and symptomatic populations. In sum, our results highlight the evolutionary implications of variation in host susceptibility and public health interventions in the context of asymptomatic transmission. More generally, the findings speak to the need for more nuanced interrogations of subtle routes of transmission, as they can have profound implications in disease evolution, ecology, and epidemiology.
- [6] arXiv:2503.03503 [pdf, html, other]
Title: Collaborative Expert LLMs Guided Multi-Objective Molecular Optimization
Subjects: Biomolecules (q-bio.BM); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Molecular optimization is a crucial yet complex and time-intensive process that often acts as a bottleneck for drug development. Traditional methods rely heavily on trial and error, making multi-objective optimization both time-consuming and resource-intensive. Current AI-based methods have shown limited success in handling multi-objective optimization tasks, hampering their practical utilization. To address this challenge, we present MultiMol, a collaborative large language model (LLM) system designed to guide multi-objective molecular optimization. MultiMol comprises two agents: a data-driven worker agent and a literature-guided research agent. The data-driven worker agent is a large language model fine-tuned to generate optimized molecules considering multiple objectives, while the literature-guided research agent searches task-related literature for useful prior knowledge that helps identify the most promising optimized candidates. In evaluations across six multi-objective optimization tasks, MultiMol significantly outperforms existing methods, achieving an 82.30% success rate, in sharp contrast to the 27.50% success rate of the strongest current methods. To further validate its practical impact, we tested MultiMol on two real-world challenges. First, we enhanced the selectivity of Xanthine Amine Congener (XAC), a promiscuous ligand that binds both A1R and A2AR, successfully biasing it towards A1R. Second, we improved the bioavailability of Saquinavir, an HIV-1 protease inhibitor with known bioavailability limitations. Overall, these results indicate that MultiMol represents a highly promising approach for multi-objective molecular optimization, holding great potential to accelerate the drug development process and contribute to the advancement of pharmaceutical research.
New submissions (showing 6 of 6 entries)
- [7] arXiv:2503.02923 (cross-list from physics.bio-ph) [pdf, other]
Title: Electron spin dynamics guide cell motility
Authors: Kai Wang, Gabrielle Gilmer, Matheus Candia Arana, Hirotaka Iijima, Juliana Bergmann, Antonio Woollard, Boris Mesits, Meghan McGraw, Brian Zoltowski, Paola Cappellaro, Alex Ungar, David Pekker, David H. Waldeck, Sunil Saxena, Seth Lloyd, Fabrisia Ambrosio
Comments: Article with supplementary material
Subjects: Biological Physics (physics.bio-ph); Cell Behavior (q-bio.CB); Quantum Physics (quant-ph)
Diverse organisms exploit the geomagnetic field (GMF) for migration. Migrating birds employ an intrinsically quantum mechanical mechanism for detecting the geomagnetic field: absorption of a blue photon generates a radical pair whose two electrons precess at different rates in the magnetic field, thereby sensitizing cells to the direction of the GMF. In this work, using an in vitro injury model, we discovered a quantum-based mechanism of cellular migration. Specifically, we show that migrating cells detect the GMF via an optically activated, electron spin-based mechanism. Cell injury provokes acute emission of blue photons, and these photons sensitize muscle progenitor cells to the magnetic field. We show that the magnetosensitivity of muscle progenitor cells is (a) activated by blue light, but not by green or red light, and (b) disrupted by the application of an oscillatory field at the frequency corresponding to the energy of the electron-spin/magnetic field interaction. A comprehensive analysis of protein expression reveals that the ability of blue photons to promote cell motility is mediated by activation of calmodulin calcium sensors. Collectively, these data suggest that cells possess a light-dependent magnetic compass driven by electron spin dynamics.
- [8] arXiv:2503.03126 (cross-list from physics.bio-ph) [pdf, html, other]
Title: Controlling tissue size by active fracture
Comments: 6+10 pages, 3+4 figures, +1 table
Subjects: Biological Physics (physics.bio-ph); Cell Behavior (q-bio.CB); Quantitative Methods (q-bio.QM); Tissues and Organs (q-bio.TO)
Groups of cells, including clusters of cancerous cells, multicellular organisms, and developing organs, may both grow and break apart. What physical factors control these fractures? In these processes, what sets the eventual size of clusters? We develop a framework for understanding cell clusters that can fragment due to cell motility using an active particle model. We compute analytically how the break rate of cell-cell junctions depends on cell speed, cell persistence, and cell-cell junction properties. Next, we find the cluster size distributions, which differ depending on whether all cells can divide or only the cells on the edge of the cluster divide. Cluster size distributions depend solely on the ratio of the break rate to the growth rate - allowing us to predict how cluster size and variability depend on cell motility and cell-cell mechanics. Our results suggest that organisms can achieve better size control when cell division is restricted to the cluster boundaries or when fracture can be localized to the cluster center. Our results link the general physics problem of a collective active escape over a barrier to size control, providing a quantitative measure of how motility can regulate organ or organism size.
- [9] arXiv:2503.03152 (cross-list from eess.IV) [pdf, html, other]
Title: UnPuzzle: A Unified Framework for Pathology Image Analysis
Authors: Dankai Liao, Sicheng Chen, Nuwa Xi, Qiaochu Xue, Jieyu Li, Lingxuan Hou, Zeyu Liu, Chang Han Low, Yufeng Wu, Yiling Liu, Yanqin Jiang, Dandan Li, Yueming Jin, Shangqing Lyu
Comments: 11 pages, 2 figures
Subjects: Image and Video Processing (eess.IV); Quantitative Methods (q-bio.QM)
Pathology image analysis plays a pivotal role in medical diagnosis, with deep learning techniques significantly advancing diagnostic accuracy and research. While numerous studies have been conducted to address specific pathological tasks, the lack of standardization in pre-processing methods and model/database architectures complicates fair comparisons across different approaches. This highlights the need for a unified pipeline and comprehensive benchmarks to enable consistent evaluation and accelerate research progress. In this paper, we present UnPuzzle, a novel and unified framework for pathological AI research that covers a broad range of pathology tasks with benchmark results. From high-level to low-level, upstream to downstream tasks, UnPuzzle offers a modular pipeline that encompasses data pre-processing, model composition, task configuration, this http URL, it facilitates efficient benchmarking for both Whole Slide Images (WSIs) and Region of Interest (ROI) tasks. Moreover, the framework supports various learning paradigms, including self-supervised learning, multi-task learning, and multi-modal learning, enabling comprehensive development of pathology AI models. Through extensive benchmarking across multiple datasets, we demonstrate the effectiveness of UnPuzzle in streamlining pathology AI research and promoting reproducibility. We envision UnPuzzle as a cornerstone for future advancements in pathology AI, providing a more accessible, transparent, and standardized approach to model evaluation. The UnPuzzle repository is publicly available at this https URL.
- [10] arXiv:2503.03199 (cross-list from eess.IV) [pdf, html, other]
Title: PathRWKV: Enabling Whole Slide Prediction with Recurrent-Transformer
Authors: Sicheng Chen, Tianyi Zhang, Dankai Liao, Dandan Li, Low Chang Han, Yanqin Jiang, Yueming Jin, Shangqing Lyu
Comments: 11 pages, 2 figures
Subjects: Image and Video Processing (eess.IV); Quantitative Methods (q-bio.QM)
Pathological diagnosis plays a critical role in clinical practice, where whole slide images (WSIs) are widely applied. Through a two-stage paradigm, recent deep learning approaches enhance WSI analysis with tile-level feature extraction and slide-level feature modeling. Current Transformer models have improved efficiency and accuracy over previous multiple-instance-learning-based approaches. However, three core limitations persist, as they do not (1) robustly address modeling at variable scales for different slides, (2) effectively balance model complexity and data availability, or (3) balance training efficiency and inference performance. To address these limitations explicitly, we propose a novel model for slide modeling, PathRWKV. Via a recurrent structure, we enable the model to handle dynamically perceptible tiles in slide-level modeling, which enables prediction on all tiles in the inference stage. Moreover, we employ linear attention instead of conventional matrix multiplication attention to reduce model complexity and overfitting. Lastly, we use multi-task learning to model versatile tasks simultaneously, improving training efficiency, and an asynchronous structure design to draw an effective conclusion on all tiles during inference, enhancing inference performance. Experimental results suggest that PathRWKV outperforms the current state-of-the-art methods in various downstream tasks on multiple datasets. The code and datasets are publicly available.
- [11] arXiv:2503.03246 (cross-list from physics.chem-ph) [pdf, html, other]
Title: Time-dependent DFT-based study of bacteriochlorophyll a optical properties within the B800 part of Rhodoblastus acidophilus light-harvesting complex
Subjects: Chemical Physics (physics.chem-ph); Optics (physics.optics); Biomolecules (q-bio.BM)
We use time-dependent density functional theory-based approaches, TD-DFT and TD-DFTB, to investigate the optical absorption of the B800 part of the Rhodoblastus acidophilus light-harvesting complex 2 (LH2). Both methods are shown to give qualitative agreement with experimental spectra for a single BChl a molecule and for the optimized structure of the B800 complex containing nine such molecules. We proved the absence of any sizable effects originating from the interaction between adjacent molecules; thus, the optical features of the B800 part of LH2 should not be attributed to the structural organization of pigments. In addition, the time-dependent procedure itself was found to be crucial for the correct description of the BChl a absorption spectrum.
- [12] arXiv:2503.03485 (cross-list from cs.LG) [pdf, html, other]
Title: TEDDY: A Family Of Foundation Models For Understanding Single Cell Biology
Authors: Alexis Chevalier, Soumya Ghosh, Urvi Awasthi, James Watkins, Julia Bieniewska, Nichita Mitrea, Olga Kotova, Kirill Shkura, Andrew Noble, Michael Steinbaugh, Julien Delile, Christoph Meier, Leonid Zhukov, Iya Khalil, Srayanta Mukherjee, Judith Mueller
Subjects: Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)
Understanding the biological mechanism of disease is critical for medicine, and in particular for drug discovery. AI-powered analysis of genome-scale biological data holds great potential in this regard. The increasing availability of single-cell RNA sequencing data has enabled the development of large foundation models for disease biology. However, existing foundation models either do not improve or only modestly improve over task-specific models in downstream applications. Here, we explored two avenues for improving the state of the art. First, we scaled the pre-training dataset to 116 million cells, which is larger than those used by previous models. Second, we leveraged the availability of large-scale biological annotations as a form of supervision during pre-training. We trained the TEDDY family of models comprising six transformer-based state-of-the-art single-cell foundation models with 70 million, 160 million, and 400 million parameters. We vetted our models on two downstream evaluation tasks -- identifying the underlying disease state of held-out donors not seen during training and distinguishing healthy cells from diseased ones for disease conditions and donors not seen during training. Scaling experiments showed that performance improved predictably with both data volume and parameter count. Our models showed substantial improvement over existing work on the first task and more muted improvements on the second.
- [13] arXiv:2503.03540 (cross-list from math.DS) [pdf, html, other]
Title: An SIRS model with hospitalizations: economic impact by disease severity
Subjects: Dynamical Systems (math.DS); Populations and Evolution (q-bio.PE)
We introduce a two-timescale SIRS-type model in which a fraction $\theta$ of infected individuals experiences a severe course of the disease, requiring hospitalization. During hospitalization, these individuals do not contribute to further infections. We analyze the model's equilibria, perform a bifurcation analysis, and explore its two-timescale nature (using techniques from Geometric Singular Perturbation Theory). Our main result provides an explicit expression for the value of $\theta$ that maximizes the total number of hospitalized individuals for long times, revealing that this optimal fraction can be lower than 1. This highlights the interesting effect that a severe disease, by necessitating widespread hospitalization, can indirectly suppress contagions and, consequently, reduce hospitalizations. Numerical simulations illustrate the growth in the number of hospitalizations for short times. The model can also be interpreted as a scenario where only a fraction $\theta$ of infected individuals develops symptoms and self-quarantines.
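The structure described in this abstract can be sketched numerically. The following is a minimal illustration and not the authors' model: the compartments (S, I, H, R), the rate names `beta`, `gamma`, `eta`, `delta`, the choice to route a fraction `theta` of new infections directly into the non-transmitting hospitalized class, and all parameter values are assumptions made for illustration. Only the fraction `theta` and the rule that hospitalized individuals do not cause further infections come from the abstract. A small `delta` (slow loss of immunity) stands in for the two-timescale structure.

```python
def rhs(state, theta, beta, gamma, eta, delta):
    """Right-hand side of a hypothetical SIRS model with hospitalization."""
    s, i, h, r = state
    new_infections = beta * s * i  # only I transmits; H is isolated
    return (
        -new_infections + delta * r,                   # dS/dt
        (1.0 - theta) * new_infections - gamma * i,    # dI/dt
        theta * new_infections - eta * h,              # dH/dt
        gamma * i + eta * h - delta * r,               # dR/dt
    )


def rk4_step(state, dt, f):
    """One classical fourth-order Runge-Kutta step for a tuple-valued state."""
    def shifted(k, c):
        return tuple(y + c * dk for y, dk in zip(state, k))

    k1 = f(state)
    k2 = f(shifted(k1, dt / 2))
    k3 = f(shifted(k2, dt / 2))
    k4 = f(shifted(k3, dt))
    return tuple(
        y + dt / 6 * (a + 2 * b + 2 * c + d)
        for y, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(theta, beta=0.5, gamma=0.2, eta=0.1, delta=0.01,
             i0=1e-3, dt=0.05, t_end=400.0):
    """Return the trajectory as a list of (S, I, H, R) population fractions."""
    def f(y):
        return rhs(y, theta, beta, gamma, eta, delta)

    state = (1.0 - i0, i0, 0.0, 0.0)  # fractions sum to 1
    trajectory = [state]
    for _ in range(int(round(t_end / dt))):
        state = rk4_step(state, dt, f)
        trajectory.append(state)
    return trajectory


if __name__ == "__main__":
    # Scan a few illustrative severity fractions and report H at the end.
    for theta in (0.2, 0.4, 0.8):
        s, i, h, r = simulate(theta)[-1]
        print(f"theta={theta:.1f}  hospitalized fraction at t_end: {h:.4f}")
```

In this sketch the infectious class grows from a nearly susceptible population only when `(1 - theta) * beta > gamma`, so a large `theta` suppresses transmission. That is the qualitative effect the abstract describes, although the paper's explicit expression for the optimal fraction is not reproduced here.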
- [14] arXiv:2503.03688 (cross-list from physics.bio-ph) [pdf, html, other]
Title: A model for boundary-driven tissue morphogenesis
Authors: Daniel S. Alber, Shiheng Zhao, Alexandre O. Jacinto, Eric F. Wieschaus, Stanislav Y. Shvartsman, Pierre A. Haas
Comments: 18 pages, 9 figures, supplemental movie available on request
Subjects: Biological Physics (physics.bio-ph); Soft Condensed Matter (cond-mat.soft); Tissues and Organs (q-bio.TO)
Tissue deformations during morphogenesis can be active, driven by internal processes, or passive, resulting from stresses applied at their boundaries. Here, we introduce the Drosophila hindgut primordium as a model for studying boundary-driven tissue morphogenesis. We characterize its deformations and show that its complex shape changes can be a passive consequence of the deformations of the active regions of the embryo that surround it. First, we find an intermediate characteristic triangular shape in the 3D deformations of the hindgut. We construct a minimal model of the hindgut primordium as an elastic ring deformed by active midgut invagination and germ band extension on an ellipsoidal surface, which robustly captures the symmetry-breaking into this triangular shape. We then quantify the 3D kinematics of the tissue by a set of contours and discover that the hindgut deforms in two stages: an initial translation on the curved embryo surface followed by a rapid breaking of shape symmetry. We extend our model to show that the contour kinematics in both stages are consistent with our passive picture. Our results suggest that the role of in-plane deformations during hindgut morphogenesis is to translate the tissue to a region with anisotropic embryonic curvature and show that uniform boundary conditions are sufficient to generate the observed nonuniform shape change. Our work thus provides a possible explanation for the various characteristic shapes of blastopore-equivalents in different organisms and a framework for the mechanical emergence of global morphologies in complex developmental systems.
Cross submissions (showing 8 of 8 entries)
- [15] arXiv:2309.00061 (replaced) [pdf, other]
Title: GeneFEAST: the pivotal, gene-centric step in functional enrichment analysis interpretation
Comments: This article has been accepted for publication in Bioinformatics, published by Oxford University Press. This version has been peer-reviewed, is the Version of Record, and replaces the previous version deposited here. Main text: 5 pages, 2 figures. Supplementary information is available at Bioinformatics online
Journal-ref: Bioinformatics, 2025, btaf100.
Subjects: Quantitative Methods (q-bio.QM)
Summary: GeneFEAST, implemented in Python, is a gene-centric functional enrichment analysis summarisation and visualisation tool that can be applied to large functional enrichment analysis (FEA) results arising from upstream FEA pipelines. It produces a systematic, navigable HTML report, making it easy to identify sets of genes putatively driving multiple enrichments and to explore gene-level quantitative data first used to identify input genes. Further, GeneFEAST can compare FEA results from multiple studies, making it possible, for example, to highlight patterns of gene expression amongst genes commonly differentially expressed in two sets of conditions, and giving rise to shared enrichments under those conditions. GeneFEAST offers a novel, effective way to address the complexities of linking up many overlapping FEA results to their underlying genes and data, advancing gene-centric hypotheses, and providing pivotal information for downstream validation experiments.
Availability: GeneFEAST is available at this https URL
Contact: this http URL@well.this http URL
- [16] arXiv:2406.13839 (replaced) [pdf, html, other]
Title: RNA-FrameFlow: Flow Matching for de novo 3D RNA Backbone Design
Authors: Rishabh Anand, Chaitanya K. Joshi, Alex Morehead, Arian R. Jamasb, Charles Harris, Simon V. Mathis, Kieran Didi, Bryan Hooi, Pietro Liò
Comments: Oral presentation at Machine Learning in Computational Biology (MLCB), 2024. Also presented as an Oral at ICML 2024 Structured Probabilistic Inference & Generative Modeling Workshop, and a Spotlight at ICML 2024 AI4Science Workshop
Subjects: Biomolecules (q-bio.BM); Machine Learning (cs.LG); Genomics (q-bio.GN)
We introduce RNA-FrameFlow, the first generative model for 3D RNA backbone design. We build upon SE(3) flow matching for protein backbone generation and establish protocols for data preparation and evaluation to address unique challenges posed by RNA modeling. We formulate RNA structures as a set of rigid-body frames and associated loss functions which account for larger, more conformationally flexible RNA backbones (13 atoms per nucleotide) vs. proteins (4 atoms per residue). Toward tackling the lack of diversity in 3D RNA datasets, we explore training with structural clustering and cropping augmentations. Additionally, we define a suite of evaluation metrics to measure whether the generated RNA structures are globally self-consistent (via inverse folding followed by forward folding) and locally recover RNA-specific structural descriptors. The most performant version of RNA-FrameFlow generates locally realistic RNA backbones of 40-150 nucleotides, over 40% of which pass our validity criteria as measured by a self-consistency TM-score >= 0.45, at which two RNAs have the same global fold. Open-source code: this https URL
- [17] arXiv:2408.07618 (replaced) [pdf, html, other]
Title: Accounting for the geometry of the respiratory tract in viral infections
Subjects: Quantitative Methods (q-bio.QM)
Increasingly, experimentalists and modellers alike have come to recognise the important role of spatial structure in infection dynamics. Almost invariably, spatial computational models of viral infections - as with in vitro experimental systems - represent the tissue as wide and flat, which is often assumed to be representative of the entire affected tissue within the host. However, this assumption fails to take into account the distinctive geometry of the respiratory tract in the context of viral infections. The respiratory tract is characterised by a tubular, branching structure, and moreover is spatially heterogeneous: deeper regions of the lung are composed of far narrower airways and are associated with more severe infection. Here, we extend a typical multicellular model of viral dynamics to account for two essential features of the geometry of the respiratory tract: the tubular structure of airways, and the branching process between airway generations. We show that, with this more realistic tissue geometry, the dynamics of infection are substantially changed compared to standard computational and experimental approaches, and that the resulting model is equipped to tackle important biological phenomena that do not arise in a flat host tissue, including viral lineage dynamics and heterogeneity in immune responses to infection in different regions of the respiratory tree. Our findings suggest aspects of viral dynamics which current in vitro systems may be insufficient to describe, and point to several features of respiratory infections which can be experimentally assessed.
- [18] arXiv:2411.15684 (replaced) [pdf, html, other]
Title: Disentangling the Complex Multiplexed DIA Spectra in De Novo Peptide Sequencing
Subjects: Biomolecules (q-bio.BM); Machine Learning (cs.LG)
Data-Independent Acquisition (DIA) was introduced to improve sensitivity by covering all peptides in a range rather than only sampling high-intensity peaks as in Data-Dependent Acquisition (DDA) mass spectrometry. However, it is not clear how useful DIA data are for de novo peptide sequencing, as they are marred by coeluted peptides, high noise, and varying data quality. We present a new deep learning method, DIANovo, that addresses each of these difficulties and improves on the previously established system DeepNovo-DIA by 25% to 81% (averaging 48%) for amino acid recall and by 27% to 89% (averaging 57%) for peptide recall, by equipping the model with a deeper understanding of coeluted DIA spectra. This paper also provides criteria for when DIA data can and cannot be used for de novo peptide sequencing, based on a comparison between DDA and DIA in both de novo and database search modes. We find that while DIA excels with narrow isolation windows on older-generation instruments, it loses its advantage with wider windows. However, with Orbitrap Astral, DIA consistently outperforms DDA because its narrow-window mode is enabled. We also provide a theoretical explanation of this phenomenon, emphasizing the critical role of the signal-to-noise profile in the successful application of de novo sequencing.
- [19] arXiv:2412.20245 (replaced) [pdf, other]
Title: Machine Learning-Enabled Multidimensional Data Utilization Through Multi-Resonance Architecture: A Pathway to Enhanced Accuracy in Biosensing
Comments: 37 pages
Subjects: Quantitative Methods (q-bio.QM); Signal Processing (eess.SP)
A novel framework is proposed that combines multi-resonance biosensors with machine learning (ML) to significantly enhance the accuracy of parameter prediction in biosensing. Unlike traditional single-resonance systems, which are limited to one-dimensional datasets, this approach leverages multi-dimensional data generated by a custom-designed nanostructure, a periodic array of silicon nanorods with a triangular cross-section over an aluminum reflector. High bulk sensitivity values are achieved for this multi-resonant structure, with certain resonant peaks reaching up to 1706 nm/RIU. The field analysis reveals Mie resonances as the physical reason behind the peaks. The predictive power of multiple resonant peaks from transverse magnetic (TM) and transverse electric (TE) polarizations is evaluated using Ridge Regression modeling. Systematic analysis reveals that incorporating multiple resonances yields up to three orders of magnitude improvement in refractive index detection precision compared to single-peak analyses. This precision enhancement is achieved without modifications to the biosensor hardware, highlighting the potential of data-centric strategies in biosensing. The findings establish a new paradigm in biosensing, demonstrating that the synergy between multi-resonance data acquisition and ML-based analysis can significantly enhance detection accuracy. This study provides a scalable pathway for advancing high-precision biosensing technologies.
- [20] arXiv:2312.10892 (replaced) [pdf, html, other]
-
Title: Deep Learning-based MRI Reconstruction with Artificial Fourier Transform Network (AFTNet)
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Quantitative Methods (q-bio.QM)
Deep complex-valued neural networks (CVNNs) provide a powerful way to leverage complex number operations and representations and have succeeded in several phase-based applications. However, previous networks have not fully explored the impact of complex-valued networks in the frequency domain. Here, we introduce a unified complex-valued deep learning framework, the Artificial Fourier Transform Network (AFTNet), which combines domain-manifold learning and CVNNs. AFTNet can be readily used to solve image inverse problems in domain transformation, especially for accelerated magnetic resonance imaging (MRI) reconstruction and other applications. While conventional methods typically utilize magnitude images or treat the real and imaginary components of k-space data as separate channels, our approach directly processes raw k-space data in the frequency domain, utilizing complex-valued operations. This allows a mapping between the frequency (k-space) domain and the image domain to be determined through cross-domain learning. We show that AFTNet achieves superior accelerated MRI reconstruction compared to existing approaches. Furthermore, our approach can be applied to various tasks, such as denoised magnetic resonance spectroscopy (MRS) reconstruction and datasets with various contrasts. The AFTNet presented here is a valuable preprocessing component for different preclinical studies and provides an innovative alternative for solving inverse problems in imaging and spectroscopy. The code is available at: this https URL.
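As background for the k-space to image mapping described above, the following sketch shows the fixed (non-learned) inverse discrete Fourier transform on a tiny hypothetical image. It is not AFTNet and contains no learning; it only illustrates that the mapping is a complex-valued linear transform and that discarding phase information loses the image. The function names and the 4x4 test image are assumptions made for this example.

```python
import cmath


def dft2(data, inverse=False):
    """Naive 2D DFT, O(N^4). Forward: image -> k-space. Inverse: k-space -> image."""
    rows, cols = len(data), len(data[0])
    sign = 1 if inverse else -1
    out = [[0j] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            total = 0j
            for r in range(rows):
                for c in range(cols):
                    angle = 2 * cmath.pi * (u * r / rows + v * c / cols)
                    total += data[r][c] * cmath.exp(sign * 1j * angle)
            # Normalization is applied on the inverse transform only.
            out[u][v] = total / (rows * cols) if inverse else total
    return out


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two 2D arrays."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Hypothetical 4x4 "image" used only for illustration.
IMAGE = [
    [0, 0, 0, 0],
    [0, 1, 2, 0],
    [0, 3, 4, 0],
    [0, 0, 0, 0],
]

if __name__ == "__main__":
    kspace = dft2(IMAGE)
    # Full complex k-space: the inverse DFT recovers the image.
    recon = dft2(kspace, inverse=True)
    print("error with complex k-space:", max_abs_diff(recon, IMAGE))
    # Magnitude-only k-space: phase is lost and the image is not recovered.
    magnitude_only = [[abs(v) for v in row] for row in kspace]
    recon_mag = dft2(magnitude_only, inverse=True)
    print("error with magnitude only:", max_abs_diff(recon_mag, IMAGE))
```

A learned approach would replace or augment this fixed transform with trainable complex-valued layers; the sketch only motivates why operating on the complex data directly preserves information that magnitude-based inputs discard.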
- [21] arXiv:2406.09983 (replaced) [pdf, html, other]
-
Title: Epidemic-induced local awareness behavior inferred from surveys and genetic sequence data
Subjects: Physics and Society (physics.soc-ph); Social and Information Networks (cs.SI); Populations and Evolution (q-bio.PE)
Behavior-disease models suggest that pandemics can be contained cost-effectively if individuals take preventive actions when disease prevalence rises among their close contacts. However, assessing local awareness behavior in real-world datasets remains a challenge. Through the analysis of mutation patterns in clinical genetic sequence data, we propose an efficient approach to quantify the impact of local awareness by identifying superspreading events and assigning containment scores to them.
We validate the proposed containment score as a proxy for local awareness in simulation experiments, and find that it was correlated positively with policy stringency during the COVID-19 pandemic. Finally, we observe a temporary drop in the containment score during the Omicron wave in the United Kingdom, matching a survey experiment we carried out in Hungary during the corresponding period of the pandemic. Our findings bring important insight into the field of awareness modeling through the analysis of large-scale genetic sequence data, one of the most promising data sources in epidemics research.
- [22] arXiv:2410.05327 (replaced) [pdf, html, other]
-
Title: Investigating the Trade-off between Infections and Social Interactions Using a Compact Model of Endemic Infections on Networks
Comments: 19 pages; 7 figures
Subjects: Physics and Society (physics.soc-ph); Populations and Evolution (q-bio.PE)
In many epidemiological and ecological contexts, there is a trade-off between infections and interactions. This arises because the links between individuals capable of spreading infections are also often associated with beneficial activities. Here, we consider how the presence of explicit network structure changes the optimal solution of a class of infection-interaction trade-offs. In order to do this, we develop and analyse a low-dimensional dynamical system approximating the network SIS epidemic. We find that network structure in the form of heterogeneous numbers of contacts can have a significant impact on the optimal number of contacts that comes out of a trade-off model.
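To make the idea of a low-dimensional SIS approximation concrete, here is a minimal sketch of the simplest such system: the homogeneous mean-field SIS model, in which every individual has the same number of contacts k. This is an assumption made for illustration and is not the authors' model, which accounts for heterogeneous contact numbers. The parameter names (beta, gamma, k) and the values used are hypothetical.

```python
def sis_prevalence(beta, gamma, k, i0=0.01, dt=0.05, steps=20_000):
    """Euler-integrate di/dt = beta*k*i*(1 - i) - gamma*i; return final prevalence."""
    i = i0
    for _ in range(steps):
        i += dt * (beta * k * i * (1.0 - i) - gamma * i)
    return i


def endemic_equilibrium(beta, gamma, k):
    """Closed-form endemic prevalence of the homogeneous model: max(0, 1 - gamma/(beta*k))."""
    return max(0.0, 1.0 - gamma / (beta * k))


if __name__ == "__main__":
    beta, gamma = 0.1, 0.5  # hypothetical per-contact infection and recovery rates
    for k in (2, 4, 6, 10, 20):
        simulated = sis_prevalence(beta, gamma, k)
        predicted = endemic_equilibrium(beta, gamma, k)
        print(f"k={k:2d}  simulated={simulated:.4f}  closed form={predicted:.4f}")
```

In this homogeneous version, prevalence is zero until the number of contacts exceeds gamma/beta and then rises with k, which is the basic tension a trade-off model has to resolve: more contacts bring more benefit but also more infection.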
- [23] arXiv:2411.05316 (replaced) [pdf, html, other]
-
Title: Aligning Large Language Models and Geometric Deep Models for Protein Representation
Comments: 37 pages, 10 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Biomolecules (q-bio.BM)
Latent representation alignment has become a foundational technique for constructing multimodal large language models (MLLMs) by mapping embeddings from different modalities into a shared space, often aligned with the embedding space of large language models (LLMs) to enable effective cross-modal understanding. While preliminary protein-focused MLLMs have emerged, they have predominantly relied on heuristic approaches, lacking a fundamental understanding of optimal alignment practices across representations. In this study, we explore the alignment of multimodal representations between LLMs and Geometric Deep Models (GDMs) in the protein domain. We comprehensively evaluate three state-of-the-art LLMs (Gemma2-2B, LLaMa3.1-8B, and LLaMa3.1-70B) with four protein-specialized GDMs (GearNet, GVP, ScanNet, GAT). Our work examines alignment factors from both model and protein perspectives, identifying challenges in current alignment methodologies and proposing strategies to improve the alignment process. Our key findings reveal that GDMs incorporating both graph and 3D structural information align better with LLMs, larger LLMs demonstrate improved alignment capabilities, and protein rarity significantly impacts alignment performance. We also find that increasing GDM embedding dimensions, using two-layer projection heads, and fine-tuning LLMs on protein-specific data substantially enhance alignment quality. These strategies offer potential enhancements to the performance of protein-related multimodal models. Our code and data are available at this https URL.
- [24] arXiv:2501.18945 (replaced) [pdf, other]
-
Title: Solving Inverse Problem for Multi-armed Bandits via Convex Optimization
Subjects: Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG); Optimization and Control (math.OC); Neurons and Cognition (q-bio.NC)
We consider the inverse problem of multi-armed bandits (IMAB), which are widely used in neuroscience and psychology research for behavior modelling. We first show that the IMAB problem is not convex in general, but can be relaxed to a convex problem via variable transformation. Based on this result, we propose a two-step sequential heuristic for (approximately) solving the IMAB problem. We discuss a condition under which our method provides a global solution to the IMAB problem with a certificate, as well as approximations that further save computing time. Numerical experiments indicate that our heuristic method is more robust than directly solving the IMAB problem via repeated local optimization, and can match the performance of Monte Carlo methods in significantly less running time. We provide an implementation of our method based on CVXPY, which allows straightforward application by users not well versed in convex optimization.
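For readers unfamiliar with the setting, the sketch below shows what an inverse bandit problem looks like in its most common behavioral form: a softmax Q-learning agent generates choices, and the inverse problem asks which learning rate and inverse temperature best explain them. The agent model, parameter names (alpha, beta), and reward probabilities are assumptions made for illustration; the abstract does not specify them. The code only evaluates the negative log-likelihood that a direct (non-convex) fit would minimize. It does not implement the authors' convex relaxation or their CVXPY-based method.

```python
import math
import random


def softmax_probs(q, beta):
    """Choice probabilities proportional to exp(beta * q), computed stably."""
    m = max(q)
    w = [math.exp(beta * (v - m)) for v in q]
    s = sum(w)
    return [x / s for x in w]


def simulate_agent(alpha, beta, reward_probs, n_trials, seed=0):
    """Forward problem: a Q-learning agent with softmax choice on a Bernoulli bandit."""
    rng = random.Random(seed)
    q = [0.0] * len(reward_probs)
    choices, rewards = [], []
    for _ in range(n_trials):
        p = softmax_probs(q, beta)
        a = rng.choices(range(len(q)), weights=p)[0]
        r = 1.0 if rng.random() < reward_probs[a] else 0.0
        q[a] += alpha * (r - q[a])  # delta-rule value update
        choices.append(a)
        rewards.append(r)
    return choices, rewards


def negative_log_likelihood(alpha, beta, choices, rewards, n_arms):
    """Inverse-problem objective: how poorly (alpha, beta) explains the observed choices."""
    q = [0.0] * n_arms
    nll = 0.0
    for a, r in zip(choices, rewards):
        nll -= math.log(softmax_probs(q, beta)[a])
        q[a] += alpha * (r - q[a])
    return nll


if __name__ == "__main__":
    # Hypothetical ground-truth parameters and two-armed bandit.
    choices, rewards = simulate_agent(0.3, 3.0, [0.8, 0.2], n_trials=500)
    for alpha, beta in [(0.3, 3.0), (0.3, 0.0), (0.9, 1.0)]:
        nll = negative_log_likelihood(alpha, beta, choices, rewards, n_arms=2)
        print(f"alpha={alpha}  beta={beta}  NLL={nll:.1f}")
```

Because the Q-values depend on alpha through a recursive update, this objective is generally not convex in (alpha, beta) jointly, which is the kind of difficulty that motivates looking for a convex reformulation.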