Electrical Engineering and Systems Science
Showing new listings for Thursday, 21 November 2024
- [1] arXiv:2411.12755 [pdf, html, other]
Title: SAM-I2I: Unleash the Power of Segment Anything Model for Medical Image Translation
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Medical image translation is crucial for reducing the need for redundant and expensive multi-modal imaging in clinical practice. However, current approaches based on Convolutional Neural Networks (CNNs) and Transformers often fail to capture fine-grained semantic features, resulting in suboptimal image quality. To address this challenge, we propose SAM-I2I, a novel image-to-image translation framework based on the Segment Anything Model 2 (SAM2). SAM-I2I utilizes a pre-trained image encoder to extract multiscale semantic features from the source image and a decoder, based on the mask unit attention module, to synthesize target-modality images. Our experiments on multi-contrast MRI datasets demonstrate that SAM-I2I outperforms state-of-the-art methods, offering more efficient and accurate medical image translation.
- [2] arXiv:2411.12756 [pdf, other]
Title: FedCL-Ensemble Learning: A Framework of Federated Continual Learning with Ensemble Transfer Learning Enhanced for Alzheimer's MRI Classifications while Preserving Privacy
Authors: Rishit Kapoor (1), Jesher Joshua (2), Muralidharan Vijayarangan (3), Natarajan B (4) ((1) Vellore Institute of Technology, (2) Vellore Institute of Technology, (3) Vellore Institute of Technology, (4) Vellore Institute of Technology)
Comments: 6 pages, 4 figures
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR); Machine Learning (cs.LG)
This research work introduces a novel approach to the classification of Alzheimer's disease by combining advanced deep learning techniques with secure data processing methods. It primarily uses transfer learning models such as ResNet, ImageNet, and VNet to extract high-level features from medical image data. These pre-trained models were then fine-tuned on Alzheimer's-related subtle patterns so that the model is capable of robust feature extraction over varying data sources. Federated learning approaches were further incorporated to tackle other challenges related to classification, aiming to provide better prediction performance while protecting data privacy. The proposed model was built using federated learning without sharing sensitive patient data; this way, the decentralized model benefits from the large and diversified dataset that it is trained upon while ensuring confidentiality. A cipher-based encryption mechanism is added to secure the transportation of data and further ensure the privacy and integrity of patient information throughout training and classification. The results of the experiments not only improve the accuracy of Alzheimer's classification but also provide a framework for secure and collaborative analysis of healthcare data.
- [3] arXiv:2411.12776 [pdf, html, other]
Title: Cross-Layer Encrypted Semantic Communication Framework for Panoramic Video Transmission
Authors: Haixiao Gao, Mengying Sun, Xiaodong Xu, Bingxuan Xu, Shujun Han, Bizhu Wang, Sheng Jiang, Chen Dong, Ping Zhang
Subjects: Image and Video Processing (eess.IV); Cryptography and Security (cs.CR); Multimedia (cs.MM)
In this paper, we propose a cross-layer encrypted semantic communication (CLESC) framework for panoramic video transmission, incorporating feature extraction, encoding, encryption, cyclic redundancy check (CRC), and retransmission processes to achieve compatibility between semantic communication and traditional communication systems. Additionally, we propose an adaptive cross-layer transmission mechanism that dynamically adjusts CRC, channel coding, and retransmission schemes based on the importance of semantic information. This ensures that important information is prioritized under poor transmission conditions. To verify the aforementioned framework, we also design an end-to-end adaptive panoramic video semantic transmission (APVST) network that leverages a deep joint source-channel coding (Deep JSCC) structure and attention mechanism, integrated with a latitude adaptive module that facilitates adaptive semantic feature extraction and variable-length encoding of panoramic videos. The proposed CLESC is also applicable to the transmission of data of other modalities. Simulation results demonstrate that the proposed CLESC effectively achieves compatibility and adaptation between semantic communication and traditional communication systems, improving both transmission efficiency and channel adaptability. Compared to traditional cross-layer transmission schemes, the CLESC framework can reduce bandwidth consumption by 85% while showing significant advantages under low signal-to-noise ratio (SNR) conditions.
- [4] arXiv:2411.12812 [pdf, html, other]
Title: DIETS: Diabetic Insulin Management System in Everyday Life
Subjects: Systems and Control (eess.SY)
People with diabetes need insulin delivery to effectively manage their blood glucose levels, especially after meals, because their bodies either do not produce enough insulin or cannot fully utilize it. Accurate insulin delivery starts with estimating the nutrients in meals and is followed by developing a detailed, personalized insulin injection strategy. These tasks are particularly challenging in daily life, especially without professional guidance. Existing solutions usually assume prior knowledge of the nutrients in meals and primarily rely on feedback from professional clinicians or simulators to develop Reinforcement Learning-based models for insulin management, leading to extensive consumption of medical resources and difficulties in adapting the models to new patients due to individual differences. In this paper, we propose DIETS, a novel diabetic insulin management framework built on the transformer architecture, to help people with diabetes effectively manage insulin delivery in everyday life. Specifically, DIETS tailors a Large Language Model (LLM) to estimate the nutrients in meals and employs a titration model to generate recommended insulin injection strategies, which are further validated by a glucose prediction model to prevent potential risks of hyperglycemia or hypoglycemia. DIETS has been extensively evaluated on three public datasets, and the results show it achieves superior performance in providing effective insulin delivery recommendations to control blood glucose levels.
- [5] arXiv:2411.12830 [pdf, html, other]
Title: Class-Incremental Learning for Sound Event Localization and Detection
Subjects: Audio and Speech Processing (eess.AS)
This paper investigates the feasibility of class-incremental learning (CIL) for Sound Event Localization and Detection (SELD) tasks. The method features an incremental learner that can learn new sound classes independently while preserving knowledge of old classes. Continual learning is achieved through a mean-square-error-based distillation loss that minimizes output discrepancies between subsequent learners. The experiments are conducted on the TAU-NIGENS Spatial Sound Events 2021 dataset, which includes 12 different sound classes, and demonstrate the efficacy of the proposed method. We begin by learning 8 classes and introduce 4 new classes at the next stage. After the incremental phase, the system is evaluated on the full set of learned classes. Results show that, for this realistic dataset, our proposed method successfully maintains baseline performance across all metrics.
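As a concrete illustration of the distillation term described above, the sketch below computes an MSE loss between the frozen previous learner's outputs and the old-class outputs of the new learner, simplified to plain classification (SELD adds localization outputs). All names (`new_model`, `old_model`, `n_old`) are illustrative assumptions, not the authors' code.

```python
# Sketch: MSE-based distillation for class-incremental learning (PyTorch).
import torch
import torch.nn.functional as F

def cil_loss(new_model, old_model, x, y_new, n_old, alpha=1.0):
    out = new_model(x)                          # logits for old + new classes
    with torch.no_grad():
        out_old = old_model(x)                  # frozen previous learner
    # task loss on the newly introduced classes (targets indexed from 0)
    task = F.cross_entropy(out[:, n_old:], y_new)
    # distillation: keep old-class responses close to the old learner's
    distill = F.mse_loss(out[:, :n_old], out_old)
    return task + alpha * distill
```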
- [6] arXiv:2411.12833 [pdf, html, other]
Title: Efficient Medicinal Image Transmission and Resolution Enhancement via GAN
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
While X-ray imaging is indispensable in medical diagnostics, it inherently suffers from noise and resolution limits that mask details necessary for diagnosis. B/W X-ray images require a careful balance between noise suppression and high-detail preservation to ensure clarity in soft-tissue structures and bone edges. While traditional methods, such as CNNs and early super-resolution models like ESRGAN, have enhanced image resolution, they often perform poorly regarding high-frequency detail preservation and noise control for B/W imaging. In this paper, we present an efficient approach that improves image quality while optimizing network transmission. X-ray images are pre-processed into low-resolution files to reduce server load and transmission bandwidth, and are then upscaled at the receiving end using Real-ESRGAN, a refined version of ESRGAN fine-tuned for real-world image degradation. The model integrates Residual-in-Residual Dense Blocks with perceptual and adversarial loss functions to produce high-quality upscaled images with low noise. We further fine-tune Real-ESRGAN to the specific noise and contrast characteristics of B/W X-rays, suppressing noise artifacts without compromising detail. A comparative evaluation shows that our approach achieves superior noise reduction and detail clarity compared to state-of-the-art CNN-based and ESRGAN models, in addition to reducing network bandwidth requirements. These benefits are confirmed both by quantitative metrics, including Peak Signal-to-Noise Ratio and Structural Similarity Index, and by qualitative assessments, which indicate the potential of Real-ESRGAN for diagnostic-quality X-ray imaging and efficient medical data transmission.
- [7] arXiv:2411.12852 [pdf, html, other]
Title: Enhanced Cross-Dataset Electroencephalogram-based Emotion Recognition using Unsupervised Domain Adaptation
Comments: In press: Computers in Biology and Medicine
Subjects: Signal Processing (eess.SP)
Emotion recognition has significant potential in healthcare and affect-sensitive systems such as brain-computer interfaces (BCIs). However, challenges such as the high cost of labeled data and variability in electroencephalogram (EEG) signals across individuals limit the applicability of EEG-based emotion recognition models across domains. These challenges are exacerbated in cross-dataset scenarios due to differences in subject demographics, recording devices, and presented stimuli. To address these issues, we propose a novel approach to improve cross-domain EEG-based emotion classification. Our method, Gradual Proximity-guided Target Data Selection (GPTDS), incrementally selects reliable target domain samples for training. By evaluating their proximity to source clusters and the model's confidence in predicting them, GPTDS minimizes negative transfer caused by noisy and diverse samples. Additionally, we introduce Prediction Confidence-aware Test-Time Augmentation (PC-TTA), a cost-effective augmentation technique. Unlike traditional TTA methods, which are computationally intensive, PC-TTA activates only when model confidence is low, improving inference performance while drastically reducing computational costs. Experiments on the DEAP and SEED datasets validate the effectiveness of our approach. When trained on DEAP and tested on SEED, our model achieves 67.44% accuracy, a 7.09% improvement over the baseline. Conversely, training on SEED and testing on DEAP yields 59.68% accuracy, a 6.07% improvement. Furthermore, PC-TTA reduces computational time by a factor of 15 compared to traditional TTA methods. Our method excels in detecting both positive and negative emotions, demonstrating its practical utility in healthcare applications. Code available at: this https URL
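The gating idea behind PC-TTA can be sketched in a few lines: augmented copies are evaluated only when the model's confidence falls below a threshold, so the expensive averaging path is skipped for confident predictions. This is a minimal sketch under assumed names (`model`, `augment_fns`, threshold `tau`) and processes a single example (batch of one).

```python
# Sketch: confidence-gated test-time augmentation (PyTorch).
import torch
import torch.nn.functional as F

def pc_tta_predict(model, x, augment_fns, tau=0.8):
    probs = F.softmax(model(x), dim=-1)
    conf, pred = probs.max(dim=-1)
    if conf.item() >= tau:            # confident: return the cheap prediction
        return pred
    # low confidence: average probabilities over augmented copies
    votes = [probs] + [F.softmax(model(aug(x)), dim=-1) for aug in augment_fns]
    return torch.stack(votes).mean(dim=0).argmax(dim=-1)
```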
- [8] arXiv:2411.12869 [pdf, html, other]
Title: Omnidirectional Wireless Power Transfer for Millimetric Magnetoelectric Biomedical Implants
Authors: Wei Wang, Zhanghao Yu, Yiwei Zou, Joshua E Woods, Prahalad Chari, Yumin Su, Jacob T Robinson, Kaiyuan Yang
Comments: 13 pages, 27 figures
Journal-ref: IEEE Journal of Solid-State Circuits, Volume: 59, Issue: 11, Page(s): 3599-3611, November 2024
Subjects: Systems and Control (eess.SY); Medical Physics (physics.med-ph)
Miniature bioelectronic implants promise revolutionary therapies for cardiovascular and neurological disorders. Wireless power transfer (WPT) is a significant method for miniaturization, eliminating the need for bulky batteries in devices. Despite successful demonstrations of millimetric battery-free implants in animal models, the robustness and efficiency of WPT are known to degrade significantly under misalignment incurred by body movements, respiration, heartbeat, and limited control of implant orientation during surgery. This article presents an omnidirectional WPT platform for millimetric bioelectronic implants, employing the emerging magnetoelectric (ME) WPT modality and a magnetic field steering technique based on multiple transmitter (TX) coils. To accurately sense the weak coupling in a miniature implant and adaptively control the multicoil TX array in a closed loop, we develop an active echo (AE) scheme using a tiny coil on the implant. Our prototype comprises a fully integrated 14.2 mm^3 implantable stimulator embedding a custom low-power system-on-chip (SoC) powered by an ME film, a TX with a custom three-channel AE RX chip, and a multicoil TX array with mutual inductance cancellation. The AE RX achieves -161 dBm/Hz input-referred noise with a 64 dB gain tuning range to reliably sense the AE signal, and offers fast polarity detection for driver control. AE simultaneously enhances the robustness, efficiency, and charging range of ME WPT. Under a 90-degree rotation from the ideal position, our omnidirectional WPT system achieves 6.8x higher power transfer efficiency (PTE) than a single-coil baseline. The tracking error of AE degrades the PTE only negligibly, by less than 2 percent relative to ideal control.
- [9] arXiv:2411.12874 [pdf, html, other]
Title: Residual Vision Transformer (ResViT) Based Self-Supervised Learning Model for Brain Tumor Classification
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Deep learning has proven very promising for interpreting MRI in brain tumor diagnosis. However, deep learning models suffer from a scarcity of brain MRI datasets for effective training. Self-supervised learning (SSL) models provide data-efficient and remarkable solutions to limited-dataset problems. Therefore, this paper introduces a generative SSL model for brain tumor classification in two stages. The first stage pre-trains a Residual Vision Transformer (ResViT) model for MRI synthesis as a pretext task, and the second stage fine-tunes a ResViT-based classifier model as a downstream task. Accordingly, we aim to leverage local features via CNNs and global features via ViTs, employing a hybrid CNN-transformer architecture for ResViT in both pretext and downstream tasks. Moreover, synthetic MRI images are utilized to balance the training set. The proposed model is evaluated on the public BraTS 2023, Figshare, and Kaggle datasets. Furthermore, we compare the proposed model with various deep learning models, including A-UNet, ResNet-9, pix2pix, and pGAN for MRI synthesis, and ConvNeXtTiny, ResNet101, DenseNet12, Residual CNN, and ViT for classification. According to the results, pretraining the proposed model on the MRI dataset is superior to pretraining on the ImageNet dataset. Overall, the proposed model attains the highest accuracy, achieving 90.56% on the BraTS dataset with the T1 sequence, 98.53% on the Figshare dataset, and 98.47% on the Kaggle brain tumor dataset. As a result, the proposed model demonstrates a robust, effective, and successful approach to handling insufficient-dataset challenges in MRI analysis by incorporating SSL, fine-tuning, data augmentation, and a combined CNN-ViT architecture.
- [10] arXiv:2411.12899 [pdf, other]
Title: Adaptive Control Barrier Functions with Vanishing Conservativeness Under Persistency of Excitation
Comments: 8 pages, 11 figures, submitted for conference
Subjects: Systems and Control (eess.SY)
This article presents a closed-form adaptive control-barrier-function (CBF) approach for satisfying state constraints in systems with parametric uncertainty. This approach uses a sampled-data recursive-least-squares algorithm to estimate the unknown model parameters and construct a nonincreasing upper bound on the norm of the estimation error. Together, this estimate and upper bound are used to construct a CBF-based constraint with nonincreasing conservativeness. Furthermore, if a persistency-of-excitation condition is satisfied, then the CBF-based constraint has vanishing conservativeness in the sense that it converges to the ideal constraint corresponding to the case where the uncertainty is known. In addition, the approach incorporates a monotonically improving estimate of the unknown model parameters; thus, this estimate can be effectively incorporated into a desired control law. We demonstrate constraint satisfaction and performance using two numerical examples, namely, a nonlinear pendulum and a nonholonomic robot.
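The recursive-least-squares ingredient can be sketched as follows; this is the textbook RLS update, not the paper's sampled-data variant or its error bound, and all variable names are illustrative.

```python
# Sketch: one recursive least-squares step for y_k = phi_k^T theta*.
import numpy as np

def rls_update(theta, P, phi, y, lam=1.0):
    """Update the parameter estimate theta and covariance P
    (lam is a forgetting factor; lam=1 recovers plain RLS)."""
    phi = phi.reshape(-1, 1)
    K = P @ phi / (lam + phi.T @ P @ phi)           # gain vector
    theta = theta + (K * (y - phi.T @ theta)).ravel()
    P = (P - K @ phi.T @ P) / lam                   # covariance update
    return theta, P

# typical initialization: theta0 = np.zeros(n), P0 = 1e3 * np.eye(n)
```

Under persistency of excitation the information matrix grows without bound, which is what drives the shrinking error bound and the vanishing conservativeness described above.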
- [11] arXiv:2411.12906 [pdf, html, other]
Title: Experimental Study of Underwater Acoustic Reconfigurable Intelligent Surfaces with In-Phase and Quadrature Modulation
Comments: 12 pages, 17 figures
Subjects: Systems and Control (eess.SY)
This paper presents an underwater acoustic reconfigurable intelligent surface (UA-RIS) designed for long-range, high-speed, and environmentally friendly communication in oceanic environments. The proposed UA-RIS comprises multiple pairs of acoustic reflectors that utilize in-phase and quadrature (IQ) modulation to flexibly control the amplitude and phase of reflected waves. This capability enables precise beam steering to enhance or attenuate sound levels in specific directions. A prototype UA-RIS with 4x6 acoustic reflection units is constructed and tested in both tank and lake environments to evaluate performance. The experimental results indicate that the prototype is capable of effectively pointing reflected waves in targeted directions while minimizing side lobes using passive IQ modulation. Field tests reveal that deploying the UA-RIS on the sender side considerably extends communication range, by 28% in deep water and 46% in shallow water. Furthermore, with a fixed communication distance, positioning the UA-RIS at the transmitter side substantially boosts data rates, with an average increase of 63.8% and peaks up to 96%. When positioned on the receiver side, the UA-RIS can expand the communication range in shallow and deep water environments by 40.6% and 66%, respectively. Moreover, placing the UA-RIS close to the receiver enhances data rates by an average of 80.3%, reaching up to 163% under certain circumstances.
- [12] arXiv:2411.12919 [pdf, html, other]
Title: Enhancing Deep Learning-Driven Multi-Coil MRI Reconstruction via Self-Supervised Denoising
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
We examine the effect of incorporating self-supervised denoising as a pre-processing step for training deep learning (DL) based reconstruction methods on data corrupted by Gaussian noise. K-space data employed for training are typically multi-coil and inherently noisy. Although DL-based reconstruction methods trained on fully sampled data can enable high reconstruction quality, obtaining large, noise-free datasets is impractical. We leverage Generalized Stein's Unbiased Risk Estimate (GSURE) for denoising. We evaluate two DL-based reconstruction methods: Diffusion Probabilistic Models (DPMs) and Model-Based Deep Learning (MoDL). We evaluate the impact of denoising on the performance of these DL-based methods in solving accelerated multi-coil magnetic resonance imaging (MRI) reconstruction. The experiments were carried out on T2-weighted brain and fat-suppressed proton-density knee scans. We observed that self-supervised denoising enhances the quality and efficiency of MRI reconstructions across various scenarios. Specifically, employing denoised images rather than noisy counterparts when training DL networks results in lower normalized root mean squared error (NRMSE), higher structural similarity index measure (SSIM) and peak signal-to-noise ratio (PSNR) across different SNR levels, including 32 dB, 22 dB, and 12 dB for T2-weighted brain data, and 24 dB, 14 dB, and 4 dB for fat-suppressed knee data. Overall, we showed that denoising is an essential pre-processing technique capable of improving the efficacy of DL-based MRI reconstruction methods under diverse conditions. By refining the quality of input data, denoising can enable the training of more effective DL networks, potentially bypassing the need for noise-free reference MRI scans.
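For readers unfamiliar with SURE-type objectives, the sketch below shows the plain Monte Carlo SURE estimate for i.i.d. Gaussian noise of known variance; the paper's GSURE formulation generalizes this idea, and the denoiser `f` here is an assumed placeholder.

```python
# Sketch: Monte Carlo SURE risk estimate for a denoiser f (PyTorch).
import torch

def mc_sure(f, y, sigma, eps=1e-3):
    """Unbiased estimate of E||f(y) - x||^2 up to a constant, using only
    the noisy observation y (noise std sigma) and a divergence probe."""
    n = y.numel()
    fy = f(y)
    b = torch.randn_like(y)                          # random probe vector
    div = (b * (f(y + eps * b) - fy)).sum() / eps    # MC divergence estimate
    return ((fy - y) ** 2).sum() - n * sigma**2 + 2 * sigma**2 * div
```

Minimizing such an estimate lets a denoiser be trained from noisy data alone, which is what makes the pre-processing step self-supervised.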
- [13] arXiv:2411.12935 [pdf, html, other]
Title: Improving Low-Fidelity Models of Li-ion Batteries via Hybrid Sparse Identification of Nonlinear Dynamics
Authors: Samuel Filgueira da Silva, Mehmet Fatih Ozkan, Faissal El Idrissi, Prashanth Ramesh, Marcello Canova
Comments: 6 pages
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Accurate modeling of lithium-ion (Li-ion) batteries is essential for enhancing the safety and efficiency of electric vehicles and renewable energy systems. This paper presents a data-inspired approach for improving the fidelity of reduced-order Li-ion battery models. The proposed method combines a Genetic Algorithm with Sequentially Thresholded Ridge Regression (GA-STRidge) to identify and compensate for discrepancies between a low-fidelity model (LFM) and data generated either from testing or from a high-fidelity model (HFM). The hybrid model, combining physics-based and data-driven methods, is tested across different driving cycles and demonstrates the ability to significantly reduce the voltage prediction error compared to the baseline LFM, while preserving computational efficiency. The model robustness is also evaluated under various operating conditions, showing low prediction errors and high Pearson correlation coefficients for terminal voltage in unseen environments.
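The STRidge building block named in GA-STRidge is a standard sparse-regression routine; a library-free sketch follows (the genetic-algorithm layer that tunes `lam` and `tol` is omitted, and all names are illustrative).

```python
# Sketch: Sequentially Thresholded Ridge Regression (STRidge) in NumPy.
import numpy as np

def stridge(Theta, y, lam=1e-5, tol=0.1, iters=10):
    """Fit y ~ Theta @ xi by ridge regression, repeatedly zeroing
    coefficients with magnitude below tol and refitting the rest."""
    n = Theta.shape[1]
    xi = np.linalg.solve(Theta.T @ Theta + lam * np.eye(n), Theta.T @ y)
    for _ in range(iters):
        small = np.abs(xi) < tol
        xi[small] = 0.0
        big = ~small
        if not big.any():
            break
        A = Theta[:, big]
        xi[big] = np.linalg.solve(A.T @ A + lam * np.eye(big.sum()), A.T @ y)
    return xi
```

In this setting, `Theta` would hold candidate nonlinear features of the battery states and `y` the LFM residual to be compensated.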
- [14] arXiv:2411.12939 [pdf, html, other]
Title: Stabilization of Switched Affine Systems With Dwell-Time Constraint
Comments: 12 pages, 10 figures
Subjects: Systems and Control (eess.SY)
This paper addresses the problem of stabilization of switched affine systems under a dwell-time constraint, giving guarantees on the bound of the quadratic cost associated with the proposed state-switching control law. Specifically, two switching rules are presented, relying on the solution of differential Lyapunov inequalities and Lyapunov-Metzler inequalities, from which the stability conditions are expressed. The first allows the state of linear switched systems to be regulated to zero, whereas the second is designed for switched affine systems, proving practical stability of the origin. In both cases, the determination of a guaranteed cost associated with each control strategy is shown. For linear and affine systems, the existence of a solution to the Lyapunov-Metzler condition is discussed, and guidelines for selecting a solution ensuring suitable performance of the system evolution are provided. The theoretical results are finally assessed by means of three examples.
- [15] arXiv:2411.12955 [pdf, other]
Title: Matrix-Scheduling of QSR-Dissipative Systems
Comments: Submitted to IEEE Transactions on Automatic Control (TAC)
Subjects: Systems and Control (eess.SY)
This paper considers gain-scheduling of QSR-dissipative subsystems using scheduling matrices. The corresponding QSR-dissipative properties of the overall matrix-gain-scheduled system, which depend on the QSR properties of the scheduled subsystems, are explicitly derived. The use of scheduling matrices is a generalization of the scalar scheduling signals used in the literature and allows for greater design freedom when scheduling systems, such as in the case of gain-scheduled control. Furthermore, this work extends the existing gain-scheduling results to a broader class of QSR-dissipative systems. The matrix-scheduling of important special cases, such as passive, input strictly passive, output strictly passive, finite L2-gain, very strictly passive, and conic systems, is presented. The proposed gain-scheduling architecture is used in the context of controlling a planar three-link robot subject to model uncertainty. A novel control synthesis technique is used to design QSR-dissipative subcontrollers that are gain-scheduled using scheduling matrices. Numerical simulation results highlight the greater design freedom of scheduling matrices, leading to improved performance.
- [16] arXiv:2411.12963 [pdf, html, other]
Title: Probabilistic Dynamic Line Rating Forecasting with Line Graph Convolutional LSTM
Comments: 5 pages, 5 figures
Subjects: Systems and Control (eess.SY)
Dynamic line rating (DLR) is a promising solution to increase the utilization of transmission lines by adjusting ratings based on real-time weather conditions. Accurate DLR forecast at the scheduling stage is thus necessary for system operators to proactively optimize power flows, manage congestion, and reduce the cost of grid operations. However, the DLR forecast remains challenging due to weather uncertainty. To reliably predict DLRs, we propose a new probabilistic forecasting model based on line graph convolutional LSTM. Like standard LSTM networks, our model accounts for temporal correlations between DLRs across the planning horizon. The line graph-structured network additionally allows us to leverage the spatial correlations of DLR features across the grid to improve the quality of predictions. Simulation results on the synthetic Texas 123-bus system demonstrate that the proposed model significantly outperforms the baseline probabilistic DLR forecasting models regarding reliability and sharpness while using the fewest parameters.
- [17] arXiv:2411.12985 [pdf, html, other]
Title: Disco Intelligent Omni-Surfaces: 360-degree Fully-Passive Jamming Attacks
Authors: Huan Huang, Hongliang Zhang, Jide Yuan, Luyao Sun, Yitian Wang, Weidong Mei, Boya Di, Yi Cai, Zhu Han
Comments: This paper has been submitted to IEEE TWC for possible publication
Subjects: Signal Processing (eess.SP)
Intelligent omni-surfaces (IOSs) with 360-degree electromagnetic radiation significantly improve the performance of wireless systems, yet an adversarial IOS also poses a significant potential risk to physical-layer security. In this paper, we propose a "DISCO" IOS (DIOS) based fully-passive jammer (FPJ) that can launch omnidirectional fully-passive jamming attacks. In the proposed DIOS-based FPJ, the interrelated refractive and reflective (R&R) coefficients of the adversarial IOS are randomly generated, acting like a "DISCO" that distributes the wireless energy radiated by the base station. By introducing active channel aging (ACA) during the channel coherence time, the DIOS-based FPJ can perform omnidirectional fully-passive jamming with neither jamming power nor channel knowledge of the legitimate users (LUs). To characterize the impact of the DIOS-based FPJ, we derive the statistical characteristics of DIOS-jammed channels for two widely used IOS models, i.e., the constant-amplitude model and the variable-amplitude model. Consequently, an asymptotic analysis of the ergodic achievable sum rates under DIOS-based omnidirectional fully-passive jamming is given based on the derived stochastic characteristics for both IOS models. The analysis shows that the omnidirectional jamming impact of the proposed DIOS-based FPJ implemented by a constant-amplitude IOS depends on neither the quantization number nor the stochastic distribution of the DIOS coefficients, while this conclusion does not hold when a variable-amplitude IOS is used. Numerical results based on one-bit quantization of the IOS phase shifts are provided to verify the effectiveness of the derived theoretical analysis. The proposed DIOS-based FPJ can not only launch omnidirectional fully-passive jamming, but also improve the jamming impact by about 55% at 10 dBm transmit power per LU.
- [18] arXiv:2411.12999 [pdf, html, other]
Title: From Signal Space To STP-CS
Subjects: Systems and Control (eess.SY)
Under the assumption that finite signals differing only in sampling length or sampling frequency are considered equivalent, the signal space is taken to be the quotient space of $\mathbb{R}^{\infty}$ modulo this equivalence. The topological structure and properties of the signal space are investigated. Using these, some characteristics of semi-tensor product based compressed sensing (STP-CS) are revealed. Finally, a systematic analysis of the construction of sensing matrices based on balanced incomplete block designs (BIBDs) is presented.
- [19] arXiv:2411.13006 [pdf, other]
Title: Automating Sonologists USG Commands with AI and Voice Interface
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
This research presents an advanced AI-powered ultrasound imaging system that incorporates real-time image processing, organ tracking, and voice commands to enhance the efficiency and accuracy of diagnoses in clinical practice. Traditional ultrasound diagnostics often require significant time and introduce a degree of subjectivity due to user interaction. The goal of this innovative solution is to provide sonologists with a more predictable and productive imaging procedure utilizing artificial intelligence, computer vision, and voice technology. The system employs computer vision and deep learning algorithms, specifically adopting the Mask R-CNN model from Detectron2 for semantic segmentation of organs and key landmarks. This automation improves diagnostic accuracy by enabling the extraction of valuable information with minimal human input. Additionally, it includes a voice recognition feature that allows for hands-free operation, enabling users to control the system with commands such as "freeze" or "liver" while maintaining their focus on the patient. The architecture comprises video processing and real-time segmentation modules that prepare the system to perform essential imaging functions, such as freezing and zooming in on frames. The liver histopathology module, optimized for detecting fibrosis, achieved an impressive accuracy of 98.6%. Furthermore, the organ segmentation module produces output confidence levels between 50% and 95%, demonstrating its efficacy in organ detection.
- [20] arXiv:2411.13022 [pdf, html, other]
Title: Training Physics-Driven Deep Learning Reconstruction without Raw Data Access for Equitable Fast MRI
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Physics-driven deep learning (PD-DL) approaches have become popular for improved reconstruction of fast magnetic resonance imaging (MRI) scans. Even though PD-DL offers higher acceleration rates compared to existing clinical fast MRI techniques, their use has been limited outside specialized MRI centers. One impediment to their deployment is the difficulty of generalizing to pathologies or population groups that are not well-represented in training sets. This has been noted in several studies, and fine-tuning on target populations to improve reconstruction has been suggested. However, current approaches for PD-DL training require access to raw k-space measurements, which is typically only available at specialized MRI centers that have research agreements for such data access. This is especially an issue for rural and underserved areas, where commercial MRI scanners only provide access to a final reconstructed image. To tackle these challenges, we propose Compressibility-inspired Unsupervised Learning via Parallel Imaging Fidelity (CUPID) for high-quality PD-DL training, using only routine clinical reconstructed images exported from an MRI scanner. CUPID evaluates the goodness of the output with a compressibility-based approach, while ensuring that the output stays consistent with the clinical parallel imaging reconstruction through well-designed perturbations. Our results show that CUPID achieves similar quality compared to well-established PD-DL training strategies that require raw k-space data access, while outperforming conventional compressed sensing (CS) and state-of-the-art generative methods. We also demonstrate its effectiveness in a zero-shot training setup for retrospectively and prospectively sub-sampled acquisitions, attesting to its minimal training burden.
- [21] arXiv:2411.13033 [pdf, html, other]
Title: LMM-driven Semantic Image-Text Coding for Ultra Low-bitrate Learned Image Compression
Comments: IEEE VCIP 2024 poster
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Supported by powerful generative models, low-bitrate learned image compression (LIC) models utilizing perceptual metrics have become feasible. Some of the most advanced models achieve high compression rates and superior perceptual quality by using image captions as sub-information. This paper demonstrates that using a large multi-modal model (LMM), it is possible to generate captions and compress them within a single model. We also propose a novel semantic-perceptual-oriented fine-tuning method applicable to any LIC network, resulting in a 41.58% improvement in LPIPS BD-rate compared to existing methods. Our implementation and pre-trained weights are available at this https URL.
- [22] arXiv:2411.13108 [pdf, other]
Title: Demonstrating the Suitability of Neuromorphic, Event-Based, Dynamic Vision Sensors for In Process Monitoring of Metallic Additive Manufacturing and Welding
Authors: David Mascareñas, Andre Green, Ashlee Liao, Michael Torrez, Alessandro Cattaneo, Amber Black, John Bernardin, Garrett Kenyon
Comments: This work is a derivative work of a conference proceedings paper submitted to the International Modal Analysis Conference 2024, and is subject to some copyright restrictions associated with the Society of Experimental Mechanics. A variation of this paper is also published in the Weapons Engineering Symposium and Journal (WESJ), which is not publicly accessible
Journal-ref: International Modal Analysis Conference, Orlando, FL, 2024
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
We demonstrate the suitability of high dynamic range, high-speed, neuromorphic event-based, dynamic vision sensors for in-process monitoring of metallic additive manufacturing and welding. In-process monitoring to enable quality control of mission-critical components produced using metallic additive manufacturing is of high interest. However, the extreme light environment and high-speed dynamics of metallic melt pools have made this a difficult environment in which to make measurements. Event-based sensing is an alternative measurement paradigm where data is only transmitted/recorded when a measured quantity exceeds a threshold resolution. The result is that event-based sensors consume less power and less memory/bandwidth, and they operate across a wide range of timescales and dynamic ranges. Event-driven imagers stand out from conventional imager technology in that they have a very high dynamic range of approximately 120 dB, whereas conventional 8-bit imagers have a dynamic range of only about 48 dB. This high dynamic range makes them a good candidate for monitoring manufacturing processes that feature high-intensity light sources/generation, such as metallic additive manufacturing and welding. In addition, event-based imagers are able to capture data at timescales on the order of 100 μs, which makes them attractive for capturing fast dynamics in a metallic melt pool. In this work we demonstrate that event-driven imagers can observe tungsten inert gas (TIG) and laser welding melt pools. The results of this effort suggest that, with additional engineering effort, neuromorphic event imagers should be capable of 3D geometry measurements of the melt pool and anomaly detection/classification/prediction.
- [23] arXiv:2411.13140 [pdf, html, other]
Title: Robust Convergency Indicator using High-dimension PID Controller in the presence of disturbance
Comments: 12 pages, 11 figures
Subjects: Systems and Control (eess.SY)
The PID controller currently occupies a prominent position as the most prevalent control architecture and has achieved groundbreaking success across extensive applications. However, online regulation of its parameters remains a formidable challenge. The majority of existing theories hinge on linear time-invariant system structures and contemplate only Single-Input, Single-Output (SISO) scenarios. Limited research has been conducted on the intricate PID control problem in high-dimensional, Multi-Input, Multi-Output (MIMO) nonlinear systems that incorporate disturbances. This research, building on the velocity form of the nonlinear system, aims to bolster the controller's robustness. It establishes a quantitative metric to assess the robustness of high-dimensional PID controllers, elucidates the pivotal theory regarding the impact of robustness on exponential error convergence, and introduces a localized compensation strategy to optimize the robustness indicator. Guided by these theoretical insights, we develop a robust high-dimensional PID (RH-PID) controller without the crutch of oversimplifying assumptions. Experimental results demonstrate the controller's commendable exponential stabilization efficacy, and the controller exhibits exceptional robustness under the robust indicator's guidance. Notably, the robust convergence indicator can also effectively evaluate comprehensive performance.
- [24] arXiv:2411.13172 [pdf, other]
Title: Enhanced average for event-related potential analysis using dynamic time warping
Comments: 11 pages
Journal-ref: Biomed. Signal Proces. and Cont., Vol. 87, 2024
Subjects: Signal Processing (eess.SP)
Electroencephalography (EEG) provides a way to understand and evaluate neurotransmission. In this context, time-locked EEG activity, or event-related potentials (ERPs), is often used to capture neural activity related to specific mental processes. Normally, ERPs are considered on the basis of averages across a number of trials. However, there exists notable variability in latency, jitter, and amplitude across trials and also across users; this causes the average ERP waveform to blur and, furthermore, diminishes the amplitude of underlying waves. For these reasons, a strategy is proposed for obtaining ERP waveforms based on dynamic time warping (DTW): individual trials are adapted and adjusted to the previously calculated averaged ERP, and an enhanced average is built from these warped signals. In experiments carried out on publicly available datasets, this strategy reduces the attenuation in amplitude of ERP components by reducing the influence of latency and jitter variability, and thus improves the averaged ERP waveforms.
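A compact, library-free sketch of the warping-and-averaging strategy is given below: each trial is aligned to the ordinary average with DTW and the warped trials are re-averaged. The O(N^2) DTW and the per-sample averaging are simplifications of the paper's procedure, and all names are illustrative.

```python
# Sketch: DTW-enhanced ERP averaging (NumPy).
import numpy as np

def dtw_path(a, b):
    """Return the optimal alignment path between 1-D signals a and b."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = abs(a[i - 1] - b[j - 1])
            D[i, j] = c + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

def enhanced_average(trials):
    ref = trials.mean(axis=0)               # ordinary average as reference
    warped = np.empty_like(trials)
    for k, tr in enumerate(trials):
        acc = np.zeros_like(ref)
        cnt = np.zeros_like(ref)
        for i, j in dtw_path(tr, ref):      # map trial samples onto ref time
            acc[j] += tr[i]
            cnt[j] += 1
        warped[k] = acc / np.maximum(cnt, 1)
    return warped.mean(axis=0)              # enhanced average
```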
- [25] arXiv:2411.13184 [pdf, html, other]
Title: Quantitative Fairness -- A Framework For The Design Of Equitable Cybernetic Societies
Subjects: Systems and Control (eess.SY)
Advancements in computer science, artificial intelligence, and control systems in recent years have catalyzed the emergence of cybernetic societies, where algorithms play a significant role in decision-making processes affecting almost every aspect of humans' daily lives. Algorithmic decision-making expands into almost every industry, government process, and piece of critical infrastructure, and shapes the life-reality of people and the very fabric of social interaction and communication. Besides the great potential to improve efficiency and reduce corruption, misspecified cybernetic systems harbor the threat of creating societal inequities, systematic discrimination, and dystopian, totalitarian societies. Fairness is a crucial component in the design of cybernetic systems: to promote cooperation between selfish individuals, to achieve better outcomes at the system level, to confront public resistance, to gain trust and acceptance for rules and institutions, to perforate self-reinforcing cycles of poverty through social mobility, to incentivize motivation, contribution, and satisfaction of people through inclusion, to increase social cohesion in groups, and ultimately to improve quality of life. Quantitative descriptions of fairness are crucial for reflecting equity in algorithms, but only few works in the fairness literature offer such measures; the existing quantitative measures in the literature are either too application-specific, suffer from undesirable characteristics, or are not ideology-agnostic. Therefore, this work proposes a quantitative, transactional, distributive fairness framework, which enables the systematic design of socially feasible decision-making systems. Moreover, it emphasizes the importance of fairness and transparency when designing algorithms for equitable, cybernetic societies.
- [26] arXiv:2411.13188 [pdf, html, other]
Title: Coexistence of Radar and Communication with Rate-Splitting Wireless Access
Subjects: Signal Processing (eess.SP)
This work investigates the coexistence of sensing and communication functionalities in a base station (BS) serving a communication user in the uplink and simultaneously detecting a radar target with the same frequency resources. To address inter-functionality interference, we employ rate-splitting (RS) at the communication user and successive interference cancellation (SIC) at the joint radar-communication receiver at the BS. This approach is motivated by RS's proven effectiveness in mitigating inter-user interference among communication users. Building on the proposed system model based on RS, we derive inner bounds on performance in terms of ergodic data information rate for communication and ergodic radar estimation information rate for sensing. Additionally, we present a closed-form solution for the optimal power split in RS that maximizes the communication user's performance. The bounds achieved with RS are compared to conventional methods, including spectral isolation and full spectral sharing with SIC. We demonstrate that RS offers a superior performance trade-off between sensing and communication functionalities compared to traditional approaches. Pertinently, while the original concept of RS deals only with digital signals, this work brings forward RS as a general method for including non-orthogonal access for sensing signals. As a consequence, the work done in this paper provides a systematic and parametrized way to effectuate non-orthogonal sensing and communication waveforms.
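A toy calculation conveys the rate-splitting mechanics: the uplink user splits its signal into two parts with power fractions alpha and 1-alpha; the receiver decodes part 1 treating part 2 plus the radar return as noise, cancels it via SIC, then decodes part 2. The flat-fading expressions below are illustrative stand-ins, not the paper's ergodic-rate bounds, and all parameters are assumed.

```python
# Sketch: two-layer rate splitting with SIC over a toy flat channel.
import numpy as np

def rs_rates(P, g, P_r, g_r, N0, alpha):
    """Rates (bits/s/Hz) of the two sub-messages; P, g are the user's power
    and channel gain, P_r, g_r model the radar return, N0 is noise power."""
    r1 = np.log2(1 + alpha * P * g / (N0 + (1 - alpha) * P * g + P_r * g_r))
    r2 = np.log2(1 + (1 - alpha) * P * g / (N0 + P_r * g_r))  # after SIC
    return r1, r2

for a in (0.2, 0.5, 0.8):        # sweep the power split
    print(a, rs_rates(P=1.0, g=1.0, P_r=0.5, g_r=0.3, N0=0.1, alpha=a))
```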
- [27] arXiv:2411.13191 [pdf, html, other]
Title: Experimental Assessment of Human Blockage at sub-THz and mmWave Frequency Bands
Authors: Juan E. Galeote-Cazorla, Alejandro Ramírez-Arroyo, José-María Molina-García-Pardo, María-Teresa Martínez-Inglés, Juan F. Valenzuela Valdés
Subjects: Signal Processing (eess.SP)
The fifth generation (5G) of mobile communications relies on extremely high data transmissions using a large variety of frequency bands, such as FR1 (sub-6 GHz) and FR2 (mmWave). Future mobile communications envisage using the electromagnetic spectrum beyond FR2, i.e., above 100 GHz, known as the sub-THz band. These new frequencies open up challenging scenarios where communications must rely on a dominant contribution such as the line-of-sight (LoS) component. To the best of the authors' knowledge, this work is the first in the literature to study human blockage effects over an extremely wide frequency band from 75 GHz to 215 GHz given: (i) the distance between the blocker and the antennas, and (ii) the body orientation. Furthermore, the obtained results are modeled with classical path loss models and compared to 3GPP alternatives. The average losses increase from 42 dB to 56 dB when the frequency rises from 75 GHz to 215 GHz. In terms of distance, an 18 dB increment in the received power is found when the Tx-Rx separation is increased from 1 m to 2.5 m. Finally, the blocker orientation induces variations of up to 4.6 dB.
- [28] arXiv:2411.13192 [pdf, html, other]
Title: Coexistence of Real-Time Source Reconstruction and Broadband Services Over Wireless Networks
Authors: Anup Mishra, Nikolaos Pappas, Čedomir Stefanović, Onur Ayan, Xueli An, Yiqun Wu, Petar Popovski, Israel Leyva-Mayorga
Subjects: Signal Processing (eess.SP)
Achieving a flexible and efficient sharing of wireless resources among a wide range of novel applications and services is one of the major goals of the sixth-generation of mobile systems (6G). Accordingly, this work investigates the performance of a real-time system that coexists with a broadband service in a frame-based wireless channel. Specifically, we consider real-time remote tracking of an information source, where a device monitors its evolution and sends updates to a base station (BS), which is responsible for real-time source reconstruction and, potentially, remote actuation. To achieve this, the BS employs a grant-free access mechanism to serve the monitoring device together with a broadband user, which share the available wireless resources through orthogonal or non-orthogonal multiple access schemes. We analyse the performance of the system with time-averaged reconstruction error, time-averaged cost of actuation error, and update-delivery cost as performance metrics. Furthermore, we analyse the performance of the broadband user in terms of throughput and energy efficiency. Our results show that an orthogonal resource sharing between the users is beneficial in most cases where the broadband user requires maximum throughput. However, sharing the resources in a non-orthogonal manner leads to a far greater energy efficiency.
- [29] arXiv:2411.13198 [pdf, html, other]
Title: Intensity-Spatial Dual Masked Autoencoder for Multi-Scale Feature Learning in Chest CT Segmentation
Comments: 10 pages, 6 figures, 3 tables
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
In the field of medical image segmentation, challenges such as indistinct lesion features, ambiguous boundaries, and multi-scale characteristics have long prevailed. This paper proposes an improved method named Intensity-Spatial Dual Masked Autoencoder (ISD-MAE). Based on the tissue-contrast semi-masked autoencoder, a Masked Autoencoder (MAE) branch is introduced to perform intensity masking and spatial masking operations on chest CT images for multi-scale feature learning and segmentation tasks. The model utilizes a dual-branch structure and contrastive learning to enhance its ability to learn tissue features and boundary details. Experiments are conducted on multiple 2D and 3D datasets. The results show that ISD-MAE significantly outperforms other methods in 2D pneumonia and mediastinal tumor segmentation tasks. For example, the Dice score reaches 90.10% on the COVID19 LESION dataset, and the performance is relatively stable. However, there is still room for improvement on 3D datasets. In response, improvement directions are proposed, including optimizing the loss function, using enhanced 3D convolution blocks, and processing datasets from multiple sources. The code is available at: this https URL.
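The two masking operations named above can be sketched directly; the patch size, masking ratio, and the assumption of intensities normalized to [0, 1] are illustrative choices, not the paper's settings.

```python
# Sketch: spatial and intensity masking for a masked autoencoder (PyTorch).
import torch

def spatial_mask(x, patch=16, ratio=0.5):
    """Zero out random patches; x: (B, C, H, W), H and W divisible by patch."""
    B, C, H, W = x.shape
    keep = (torch.rand(B, 1, H // patch, W // patch, device=x.device) > ratio)
    keep = keep.float().repeat_interleave(patch, 2).repeat_interleave(patch, 3)
    return x * keep

def intensity_mask(x, band=0.2):
    """Hide one random intensity band per image; assumes x in [0, 1]."""
    lo = torch.rand(x.shape[0], 1, 1, 1, device=x.device) * (1 - band)
    mask = ((x < lo) | (x > lo + band)).float()
    return x * mask
```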
- [30] arXiv:2411.13213 [pdf, html, other]
Title: Identification of Black-Box Inverter-Based Resource Control Using Hammerstein-Wiener Models
Comments: 7 pages, 14 figures, conference paper
Subjects: Systems and Control (eess.SY)
The development of more complex inverter-based resource (IBR) controls is becoming essential as a result of the growing share of renewable energy sources in power systems. Given the diverse range of control schemes, grid operators are typically provided with black-box models of IBRs from various equipment manufacturers. As such, they are integrated into simulation models of the entire power system for analysis and, due to their nature, can only be simulated in the time domain. Other system analysis approaches, like eigenvalue analysis, cannot be applied, making comprehensive analysis of the defined systems more challenging. This work introduces an approach for the identification of three-phase IBR models for grid-forming and grid-following inverters using Hammerstein-Wiener models. To this end, we define a simulation framework for the identification process and select suitable evaluation metrics for the results. Finally, we evaluate the approach on generic grid-forming and grid-following inverter models, showing good identification results.
- [31] arXiv:2411.13217 [pdf, other]
Title: Energy-based features and bi-LSTM neural network for EEG-based music and voice classification
Comments: 12 pages
Journal-ref: Neural Comput and Applic 36, 791-802, 2024
Subjects: Signal Processing (eess.SP); Sound (cs.SD); Audio and Speech Processing (eess.AS)
The human brain receives stimuli in multiple ways; among them, audio constitutes an important source of relevant stimuli for the brain regarding communication, amusement, warning, etc. In this context, the aim of this manuscript is to advance the classification of brain responses to music of diverse genres and to sounds of different nature: speech and music. For this purpose, two different experiments have been designed to acquire EEG signals from subjects listening to songs of different musical genres and sentences in various languages. With this, a novel scheme is proposed to characterize brain signals for their classification; this scheme is based on the construction of a feature matrix built on relations between the energy measured at the different EEG channels and the usage of a bi-LSTM neural network. With the data obtained, evaluations regarding EEG-based classification between speech and music, between different musical genres, and of whether the subject likes the song listened to or not are carried out. The experiments unveil the satisfactory performance of the proposed scheme. The results obtained for binary audio-type classification attain 98.66% accuracy. In multi-class classification between 4 musical genres, the accuracy attained is 61.59%, and results for binary classification of musical taste rise to 96.96%.
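The described pipeline can be sketched as follows: per-channel energies over sliding windows form a matrix of pairwise energy ratios, which feeds a small bidirectional LSTM classifier. All sizes and the exact feature relations are assumptions for illustration.

```python
# Sketch: channel-energy ratio features + bi-LSTM classifier (PyTorch).
import torch
import torch.nn as nn

def energy_ratio_features(eeg, win=128):
    """eeg: (channels, samples) -> (time, channels*channels) ratio features."""
    C, S = eeg.shape
    frames = eeg[:, : S - S % win].reshape(C, -1, win)
    E = (frames ** 2).sum(dim=-1)                      # (C, T) window energies
    R = E.unsqueeze(0) / (E.unsqueeze(1) + 1e-8)       # (C, C, T) energy ratios
    return R.permute(2, 0, 1).reshape(E.shape[1], -1)  # (T, C*C)

class BiLSTMClassifier(nn.Module):
    def __init__(self, n_feat, n_classes, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_feat, hidden, batch_first=True,
                            bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):               # x: (batch, time, n_feat)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])    # classify from the last time step
```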
- [32] arXiv:2411.13230 [pdf, html, other]
Title: OceanLens: An Adaptive Backscatter and Edge Correction using Deep Learning Model for Enhanced Underwater Imaging
Comments: Submitted to ICRA 2025
Subjects: Image and Video Processing (eess.IV)
Underwater environments pose significant challenges due to the selective absorption and scattering of light by water, which affect image clarity, contrast, and color fidelity. To overcome these challenges, we introduce OceanLens, a method that models underwater image physics, encompassing both backscatter and attenuation, using neural networks. Our model incorporates adaptive backscatter and edge-correction losses, specifically Sobel and LoG losses, to manage image variance and luminance, resulting in clearer and more accurate outputs. Additionally, we demonstrate the relevance of pre-trained monocular depth estimation models for generating underwater depth maps. Our evaluation compares the performance of various loss functions against state-of-the-art methods using the SeeThru dataset, revealing significant improvements. Specifically, we observe an average 65% reduction in Grayscale Patch Mean Angular Error (GPMAE) and a 60% increase in the Underwater Image Quality Metric (UIQM) compared to the SeeThru and DeepSeeColor methods. Further, the results improve with additional convolution layers that capture subtle image details more effectively with OceanLens. This architecture is validated on the UIEB dataset, with model performance assessed using Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM) metrics. OceanLens with multiple convolutional layers achieves up to a 12-15% improvement in SSIM.
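The edge-correction losses can be sketched with fixed convolution kernels: a Sobel pair for gradients and a plain Laplacian as a lightweight stand-in for the LoG response (a true LoG would add Gaussian smoothing first). Kernel sizes and the grayscale assumption are illustrative.

```python
# Sketch: Sobel and Laplacian(-of-Gaussian) edge losses (PyTorch).
import torch
import torch.nn.functional as F

SOBEL_X = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
LAP = torch.tensor([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]])

def _filt(img, k):                  # img: (B, 1, H, W) grayscale
    return F.conv2d(img, k.view(1, 1, 3, 3).to(img), padding=1)

def edge_losses(pred, target):
    sobel = (_filt(pred, SOBEL_X) - _filt(target, SOBEL_X)).abs().mean() \
          + (_filt(pred, SOBEL_X.t()) - _filt(target, SOBEL_X.t())).abs().mean()
    lap = (_filt(pred, LAP) - _filt(target, LAP)).abs().mean()
    return sobel, lap
```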
- [33] arXiv:2411.13252 [pdf, html, other]
Title: Unified Performance Control for Non-Square Nonlinear Systems with Relaxed Controllability
Comments: 9 pages, 13 figures, submitted to journal
Subjects: Systems and Control (eess.SY)
In this paper, we investigate the problem of unified prescribed performance tracking for a class of non-square strict-feedback nonlinear systems in the presence of actuator faults under relaxed controllability conditions. Using a skillful matrix decomposition and introducing feasible auxiliary matrices, a controllability condition more general than the current state of the art is constructed, which can be applied to both square and non-square nonlinear systems subject to actuator faults and unknown yet time-varying control gains. Incorporating the relaxed controllability conditions and the uniform performance specifications into the backstepping design procedure, a prescribed-performance fault-tolerant controller is developed that can meet different performance demands without modifying the controller structure, which is more flexible and practical. In addition, destabilization caused by unknown auxiliary matrices and unknown nonlinearities is circumvented by embedding the available core information of the state-dependent uncertainties into the design procedure. Both theoretical analysis and numerical simulation demonstrate the effectiveness and benefits of the proposed method.
- [34] arXiv:2411.13288 [pdf, other]
Title: EEG Signal Denoising Using pix2pix GAN: Enhancing Neurological Data Analysis
Comments: 17 pages, 6 figures
Subjects: Signal Processing (eess.SP)
Electroencephalography (EEG) is essential in neuroscience and clinical practice, yet it suffers from physiological artifacts, particularly electromyography (EMG), which distort signals. We propose a deep learning model using pix2pix GAN to remove such noise and generate reliable EEG signals. Leveraging the EEGdenoiseNet dataset, we created synthetic datasets with controlled EMG noise levels for model training and testing across signal-to-noise ratios (SNRs) from -7 to 2. Our evaluation metrics included RRMSE and Pearson's CC, assessing both time and frequency domains, and we compared our model with others. The pix2pix GAN model excelled, especially under high-noise conditions, showing significant improvements with lower RRMSE and higher CC values. This demonstrates the model's superior accuracy and stability in purifying EEG signals, offering a robust solution for EEG analysis challenges and advancing clinical and neuroscience applications.
- [35] arXiv:2411.13295 [pdf, html, other]
Title: Efficient Localization with Base Station-Integrated Beyond Diagonal RIS
Comments: 6 pages, 5 figures, conference paper, submitted to IEEE
Subjects: Signal Processing (eess.SP)
This paper introduces a novel approach to efficient localization in next-generation communication systems through base station (BS)-enabled passive beamforming utilizing beyond-diagonal reconfigurable intelligent surfaces (BD-RISs). Unlike conventional diagonal RISs (D-RISs), which suffer from limited beamforming capability, a BD-RIS provides enhanced control over both phase and amplitude, significantly improving localization accuracy. By conducting a comprehensive Cramér-Rao lower bound (CRLB) analysis across various system parameters in both near-field and far-field scenarios, we establish the BD-RIS structure as a competitive alternative to traditional active antenna arrays. Our results reveal that BD-RISs achieve localization precision close to that of active antenna arrays, overcoming the limitations of D-RISs and underscoring their potential for high-accuracy positioning in future communication networks. This work envisions the use of BD-RISs for passive beamforming-based localization, setting the stage for more efficient and scalable localization strategies in sixth-generation networks and beyond.
- [36] arXiv:2411.13298 [pdf, html, other]
Title: A CSI Feedback Framework based on Transmitting the Important Values and Generating the Others
Subjects: Signal Processing (eess.SP)
The application of deep learning (DL)-based channel state information (CSI) feedback frameworks in massive multiple-input multiple-output (MIMO) systems has significantly improved reconstruction accuracy. However, the limited generalization of widely adopted autoencoder-based networks for CSI feedback challenges consistent performance under dynamic wireless channel conditions and varying communication overhead constraints. To enhance the robustness of DL-based CSI feedback across diverse channel scenarios, we propose a novel framework, ITUG, where the user equipment (UE) transmits only a selected portion of critical values in the CSI matrix, while a generative model deployed at the base station (BS) reconstructs the remaining values. Specifically, we introduce a scoring algorithm to identify important values based on amplitude and contrast, an encoding algorithm to convert these values into a bit stream for transmission using adaptive bit lengths and a modified Huffman codebook, and a Transformer-based generative network named TPMVNet to recover the untransmitted values based on the received important values. Experimental results demonstrate that the ITUG framework, equipped with a single TPMVNet, achieves superior reconstruction performance compared to several high-performance autoencoder models across various channel conditions.
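The selection step can be sketched as a scoring pass over the CSI matrix: score entries by amplitude and local contrast, keep the top-k, and leave reconstruction of the rest to the BS-side generator. The weights, neighborhood, and k below are illustrative; the adaptive bit-length/Huffman encoding and TPMVNet itself are omitted.

```python
# Sketch: amplitude-and-contrast scoring of CSI entries (NumPy).
import numpy as np

def select_important(H, k, w_amp=0.7, w_con=0.3):
    """Return the (row, col) indices and values of the k highest-scoring
    entries of a (possibly complex) CSI matrix H."""
    amp = np.abs(H)
    pad = np.pad(amp, 1, mode='edge')          # 4-neighborhood mean
    neigh = (pad[:-2, 1:-1] + pad[2:, 1:-1] +
             pad[1:-1, :-2] + pad[1:-1, 2:]) / 4.0
    score = w_amp * amp + w_con * np.abs(amp - neigh)
    idx = np.argsort(score, axis=None)[-k:]    # top-k flat indices
    return np.unravel_index(idx, H.shape), H.flat[idx]
```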
- [37] arXiv:2411.13305 [pdf, html, other]
-
Title: Mutual Information-oriented ISAC Beamforming Design under Statistical CSIComments: 14 pages, 5 figures, submitted to IEEE journal for possible publicationSubjects: Signal Processing (eess.SP)
Existing integrated sensing and communication (ISAC) beamforming designs were mostly developed under the assumption of perfect instantaneous channel state information (CSI), limiting their use in practical dynamic environments. In this paper, we study the beamforming design for multiple-input multiple-output (MIMO) ISAC systems based on statistical CSI, with the weighted mutual information (MI), comprising sensing and communication components, adopted as the performance metric. In particular, operator-valued free probability theory is utilized to derive the closed-form expression for the weighted MI under statistical CSI. Subsequently, an efficient projected gradient ascent (PGA) algorithm is proposed to optimize the transmit beamforming matrix with the aim of maximizing the weighted MI. Numerical results validate that the derived closed-form expression matches well with the Monte Carlo simulation results and that the proposed optimization algorithm improves the weighted MI significantly. We also illustrate the trade-off between sensing and communication MI.
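The PGA step itself is generic; a minimal sketch with a Frobenius-norm power constraint and a placeholder gradient (the closed-form weighted-MI gradient from the paper is not reproduced):

```python
import numpy as np

def pga(grad_f, W0, power, step=0.05, iters=200):
    """Projected gradient ascent: take an ascent step on grad_f, then project
    the beamforming matrix back onto the power ball ||W||_F^2 <= power.
    grad_f stands in for the gradient of the weighted-MI objective."""
    W = W0
    for _ in range(iters):
        W = W + step * grad_f(W)
        n = np.linalg.norm(W)
        if n ** 2 > power:
            W = W * np.sqrt(power) / n
    return W

# Toy concave surrogate objective: f(W) = -||W - W_target||_F^2.
rng = np.random.default_rng(0)
W_target = rng.standard_normal((4, 4))
W_opt = pga(lambda W: -2 * (W - W_target), np.zeros((4, 4)), power=4.0)
```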
- [38] arXiv:2411.13307 [pdf, other]
-
Title: Analytic Design of Flat-Wire Inductors for High-Current and Compact DC-DC ConvertersSubjects: Systems and Control (eess.SY)
This paper presents an analytic study and design considerations of flat-wire inductors with distributed gaps for high-power and compact DC-DC converters. The focus is on eddy-current loss components within the conductors due to fringing and leakage fluxes. A magnetic equivalent circuit (MEC) is proposed in which eddy currents are modeled by MMFs opposing the primary flux as well as by frequency-dependent reluctances, which finally leads to a frequency-dependent inductance describing the behavior of the inductor at high frequencies. Three formulations for the DC resistance, depending on the required accuracy, are developed. Calculations of the AC resistance based on the vector potential obtained from FEM are provided. To provide insight into the optimized design of such inductors, the components of the magnetic flux and induced eddy currents, along with the sensitivity of the main inductor quantities, such as DCR, ESR, loss components, and inductance values, to the design parameters, are investigated. Finally, an inductor is prototyped and experimentally tested to verify the design.
- [39] arXiv:2411.13310 [pdf, html, other]
-
Title: Moving Horizon Estimation for Simultaneous Localization and Mapping with Robust Estimation Error BoundsComments: 8 pages, 3 figuresSubjects: Systems and Control (eess.SY); Robotics (cs.RO)
This paper presents a robust moving horizon estimation (MHE) approach with provable estimation error bounds for solving the simultaneous localization and mapping (SLAM) problem. We derive sufficient conditions to guarantee robust stability in ego-state estimates and bounded errors in landmark position estimates, even under limited landmark visibility which directly affects overall system detectability. This is achieved by decoupling the MHE updates for the ego-state and landmark positions, enabling individual landmark updates only when the required detectability conditions are met. The decoupled MHE structure also allows for parallelization of landmark updates, improving computational efficiency. We discuss the key assumptions, including ego-state detectability and Lipschitz continuity of the landmark measurement model, with respect to typical SLAM sensor configurations, and introduce a streamlined method for the range measurement model. Simulation results validate the considered method, highlighting its efficacy and robustness to noise.
- [40] arXiv:2411.13339 [pdf, other]
-
Title: Multipath Mitigation Technology-integrated GNSS Direct Position Estimation Plug-in ModuleSubjects: Signal Processing (eess.SP)
Direct position estimation (DPE) is an effective solution to the multipath (MP) issue at the signal processing level. Unlike two-step positioning (2SP) receivers, DPE directly solves for the receiver position, velocity, and time (PVT) in the navigation domain, without estimating intermediate measurements, thus allowing it to provide more robust and accurate PVT estimates in the presence of MP and weak signals. However, GNSS positioning with DPE remains largely unapplied commercially, and research into DPE has stayed relatively stagnant over the past few years. To encourage further research on DPE by the GNSS community, we propose a DPE plug-in module that can be integrated into conventional 2SP software-defined receivers (SDRs). Programmed in MATLAB, the proposed DPE plug-in module aims to promote better understanding of, and familiarity with, a practical implementation of DPE. Its plug-in architecture allows it to be incorporated into 2SP MATLAB SDRs, both vector tracking and scalar tracking, with minimal changes, making it easy to use and giving researchers working with various 2SP SDRs greater flexibility. Since the proposed DPE implementation makes use of tracking observables from 2SP to propagate the channel, we propose to further improve the performance of DPE against MP by using MP-compensated observables generated from Multipath Mitigation Technology (MMT)-aided tracking. Referred to as MMT-integrated DPE, this variant of DPE is better suited for urban environments. Results show that while in MP-only conditions an MMT-integrated 2SP receiver performs similarly to MMT-integrated DPE, the proposed MMT-integrated DPE shows great superiority against non-line-of-sight (NLOS) reception, making it the preferable option for applications in urban environments.
- [41] arXiv:2411.13344 [pdf, other]
-
Title: Abstracted Model Reduction: A General Framework for Efficient Interconnected System ReductionComments: 16 pages, 13 figures, to appear in IEEE Transactions on Control Systems TechnologySubjects: Systems and Control (eess.SY)
This paper introduces the concept of abstracted model reduction: a framework to improve the tractability of structure-preserving methods for the complexity reduction of interconnected system models. To effectively reduce high-order, interconnected models, it is usually not sufficient to consider the subsystems separately. Instead, structure-preserving reduction methods should be employed, which consider the interconnected dynamics to select which subsystem dynamics to retain in reduction. However, structure-preserving methods are often not computationally tractable. To overcome this issue, we propose to connect each subsystem model to a low-order abstraction of its environment to reduce it both effectively and efficiently. By means of a high-fidelity structural-dynamics model from the lithography industry, we show, on the one hand, significantly increased accuracy with respect to standard subsystem reduction and, on the other hand, similar accuracy to direct application of expensive structure-preserving methods, while significantly reducing computational cost. Furthermore, we formulate a systematic approach to automatically determine sufficient abstraction and reduction orders to preserve stability and guarantee a given frequency-dependent error specification. We apply this approach to the lithography equipment use case and show that the environment model can indeed be reduced by over 80% without significant loss in the accuracy of the reduced interconnected model.
- [42] arXiv:2411.13345 [pdf, html, other]
-
Title: IoT-Based Coma Patient Monitoring SystemSubjects: Systems and Control (eess.SY); Networking and Internet Architecture (cs.NI)
Continuous monitoring of coma patients is essential but challenging, especially in developing countries with limited resources, staff, and infrastructure. This paper presents a low-cost IoT-based system designed for such environments. It uses affordable hardware and robust software to monitor patients without constant internet access or extensive medical personnel. The system employs cost-effective sensors to track vital signs, including heart rate, body temperature, blood pressure, eye movement, and body position. An energy-efficient microcontroller processes data locally, synchronizing with a central server when network access is available. A locally hosted app provides on-site access to patient data, while a GSM module sends immediate alerts for critical events, even in areas with limited cellular coverage. This solution emphasizes ease of deployment, minimal maintenance, and resilience to power and network disruptions. Using open-source software and widely available hardware, it offers a scalable, adaptable system for resource-limited settings. At under $30, the system is a sustainable, cost-effective solution for continuous patient monitoring, bridging the gap until more advanced healthcare infrastructure is available.
- [43] arXiv:2411.13362 [pdf, html, other]
-
Title: RTSR: A Real-Time Super-Resolution Model for AV1 Compressed ContentSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Super-resolution (SR) is a key technique for improving the visual quality of video content by increasing its spatial resolution while reconstructing fine details. SR has been employed in many applications including video streaming, where compressed low-resolution content is typically transmitted to end users and then reconstructed with a higher resolution and enhanced quality. To support real-time playback, it is important to implement fast SR models while preserving reconstruction quality; however, most existing solutions, in particular those based on complex deep neural networks, fail to do so. To address this issue, this paper proposes a low-complexity SR method, RTSR, designed to enhance the visual quality of compressed video content, focusing on resolution up-scaling from (a) 360p to 1080p and (b) 540p to 4K. The proposed approach utilizes a CNN-based network architecture, which was optimized for AV1 (SVT)-encoded content at various quantization levels based on a dual-teacher knowledge distillation method. This method was submitted to the AIM 2024 Video Super-Resolution Challenge, specifically targeting the Efficient/Mobile Real-Time Video Super-Resolution competition. It achieved the best trade-off between complexity and coding performance (measured in PSNR, SSIM and VMAF) among all six submissions. The code will be available soon.
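As a rough illustration of the dual-teacher distillation idea, one plausible loss composition is sketched below; the weights and the L1 choice are assumptions, not the paper's recipe:

```python
import numpy as np

def dual_teacher_distill_loss(student, teacher_a, teacher_b, target,
                              w_gt=0.5, w_a=0.25, w_b=0.25):
    """Hypothetical dual-teacher distillation objective: an L1 term against the
    ground-truth high-resolution frame plus L1 terms pulling the lightweight
    student toward each teacher's output. The paper's actual loss composition
    and weights are not specified here."""
    l1 = lambda a, b: np.mean(np.abs(a - b))
    return (w_gt * l1(student, target)
            + w_a * l1(student, teacher_a)
            + w_b * l1(student, teacher_b))

# Dummy tensors standing in for network outputs (downscaled for the demo).
rng = np.random.default_rng(0)
hr = rng.random((64, 64))            # ground-truth high-resolution frame
t_a, t_b = hr + 0.01, hr - 0.01      # two "teacher" reconstructions
student = hr + 0.05
print(dual_teacher_distill_loss(student, t_a, t_b, hr))
```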
- [44] arXiv:2411.13383 [pdf, html, other]
-
Title: Adversarial Diffusion Compression for Real-World Image Super-ResolutionSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Real-world image super-resolution (Real-ISR) aims to reconstruct high-resolution images from low-resolution inputs degraded by complex, unknown processes. While many Stable Diffusion (SD)-based Real-ISR methods have achieved remarkable success, their slow, multi-step inference hinders practical deployment. Recent SD-based one-step networks like OSEDiff and S3Diff alleviate this issue but still incur high computational costs due to their reliance on large pretrained SD models. This paper proposes a novel Real-ISR method, AdcSR, by distilling the one-step diffusion network OSEDiff into a streamlined diffusion-GAN model under our Adversarial Diffusion Compression (ADC) framework. We meticulously examine the modules of OSEDiff, categorizing them into two types: (1) Removable (VAE encoder, prompt extractor, text encoder, etc.) and (2) Prunable (denoising UNet and VAE decoder). Since direct removal and pruning can degrade the model's generation capability, we pretrain our pruned VAE decoder to restore its ability to decode images and employ adversarial distillation to compensate for performance loss. This ADC-based diffusion-GAN hybrid design effectively reduces complexity by 73% in inference time, 78% in computation, and 74% in parameters, while preserving the model's generation capability. Experiments show that our proposed AdcSR achieves competitive recovery quality on both synthetic and real-world datasets, offering up to 9.3$\times$ speedup over previous one-step diffusion-based methods. Code and models will be made available.
- [45] arXiv:2411.13404 [pdf, other]
-
Title: Issues with Input-Space Representation in Nonlinear Data-Based Dissipativity EstimationComments: Preprint of conference manuscript, currently under reviewSubjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
In data-based control, dissipativity can be a powerful tool for attaining stability guarantees for nonlinear systems if that dissipativity can be inferred from data. This work provides a tutorial on several existing methods for data-based dissipativity estimation of nonlinear systems. The interplay between the underlying assumptions of these methods and their sample complexity is investigated. It is shown that methods based on delta-covering result in an intractable trade-off between sample complexity and robustness. A new method is proposed to quantify the robustness of machine learning-based dissipativity estimation. It is shown that this method achieves a more tractable trade-off between robustness and sample complexity. Several numerical case studies demonstrate the results.
- [46] arXiv:2411.13456 [pdf, other]
-
Title: Why Anticipatory Sensing Matters in Commercial ACC Systems under Cut-In Scenarios: A Perspective from Stochastic Safety AnalysisSubjects: Systems and Control (eess.SY)
This study presents an analytical solution for the vehicle state evolution of Adaptive Cruise Control (ACC) systems under cut-in scenarios, incorporating sensing delays and anticipation using the Lambert W function. The theoretical analysis demonstrates that the vehicle state evolution and the corresponding safety of ACC in cut-in situations are influenced by multiple factors, including the original leading vehicle's state, the initial conditions of the cut-in vehicle, subsequent cut-in maneuvers, sensing delays, and the ACC's anticipation capabilities. To quantitatively assess these influences, a series of numerical experiments were conducted to perform a stochastic safety analysis of ACC systems, accounting for embedded sensing delays and anticipation, using empirically calibrated control parameters from real-world data. The experiments revealed that the impact of sensing delays on ACC is multifaceted. Specifically, sensing delays negatively affect ACC stability, with the severity increasing as the delay lengthens. Furthermore, collision risk in cut-in scenarios becomes more significant with sensing delays, particularly when the cut-in vehicle is slower than the following vehicle and when cut-ins are aggressive. However, anticipation plays a crucial role in mitigating these risks. Even with a 0.6-second anticipation, collision risk can be reduced by 91% in highly adverse scenarios. Finally, both sensing delays and anticipation have effects that intensify with their duration. An anticipation period of 2 seconds effectively ensures safety in aggressive cut-in conditions, even in the presence of sensing delays.
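The Lambert W function enters because delayed-feedback dynamics admit closed-form characteristic roots through it; a minimal illustration on a generic delayed-feedback equation, not the paper's ACC-specific expressions:

```python
import numpy as np
from scipy.special import lambertw

# Lambert W inverts w * exp(w) = z, the building block for closed-form
# solutions of delay differential equations.
z = 2.0
w = lambertw(z).real
print(w * np.exp(w))          # ~= 2.0

# Example: dominant root s of the delayed-feedback characteristic equation
# s + k * exp(-s * tau) = 0, which governs stability under a sensing delay tau.
# Rearranging gives (s*tau) * exp(s*tau) = -k*tau, so s = W(-k*tau) / tau
# (real for k*tau <= 1/e).
k, tau = 0.5, 0.5
s = lambertw(-k * tau).real / tau
print(s + k * np.exp(-s * tau))   # ~= 0, so s solves the equation
```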
- [47] arXiv:2411.13475 [pdf, other]
-
Title: Efficient and Physically-Consistent Modeling of Reconfigurable Electromagnetic StructuresComments: Submitted to a journalSubjects: Signal Processing (eess.SP); Information Theory (cs.IT)
Reconfigurable electromagnetic structures (REMSs), such as reconfigurable reflectarrays (RRAs) or reconfigurable intelligent surfaces (RISs), hold significant potential to improve wireless communication and sensing systems. Even though several REMS modeling approaches have been proposed in recent years, the literature lacks models that are both computationally efficient and physically consistent. As a result, algorithms that control the reconfigurable elements of REMSs (e.g., the phase shifts of an RIS) are often built on simplistic models that are inaccurate. To enable physically accurate REMS-parameter tuning, we present a new framework for efficient and physically consistent modeling of general REMSs. Our modeling method combines a circuit-theoretic approach with a new formalism that describes a REMS's interaction with the electromagnetic (EM) waves in its far-field region. Our modeling method enables efficient computation of the entire far-field radiation pattern for arbitrary configurations of the REMS reconfigurable elements once a single full-wave EM simulation of the non-reconfigurable parts of the REMS has been performed. The predictions made by the proposed framework align with the physical laws of classical electrodynamics and model effects caused by inter-antenna coupling, non-reciprocal materials, polarization, ohmic losses, matching losses, influence of metallic housings, noise from low-noise amplifiers, and noise arising in or received by antennas. In order to validate the efficiency and accuracy of our modeling approach, we (i) compare our modeling method to EM simulations and (ii) conduct a case study involving a planar RRA that enables simultaneous multiuser beam- and null-forming using a new, computationally efficient, and physically accurate parameter tuning algorithm.
- [48] arXiv:2411.13490 [pdf, html, other]
-
Title: Efficient Brain Imaging Analysis for Alzheimer's and Dementia Detection Using Convolution-Derivative OperationsSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE); Performance (cs.PF)
Alzheimer's disease (AD) is characterized by progressive neurodegeneration and results in detrimental structural changes in human brains. Detecting these changes is crucial for early diagnosis and timely intervention in disease progression. Jacobian maps, derived from spatial normalization in voxel-based morphometry (VBM), have been instrumental in interpreting volume alterations associated with AD. However, the computational cost of generating Jacobian maps limits their clinical adoption. In this study, we explore alternative methods and propose Sobel kernel angle difference (SKAD) as a computationally efficient alternative. SKAD is a derivative operation that offers an optimized approach to quantifying volumetric alterations through localized analysis of the gradients. By efficiently extracting gradient amplitude changes at critical spatial regions, this derivative operation captures regional volume variations. Evaluation of SKAD over various medical datasets demonstrates that it is 6.3x faster than Jacobian maps while still maintaining comparable accuracy. This makes it an efficient and competitive approach in neuroimaging research and clinical practice.
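The core of SKAD is a comparison of Sobel-gradient orientations; a minimal 2D sketch of the angle-difference idea (the paper's exact 3D formulation over registered brain volumes is not reproduced):

```python
import numpy as np
from scipy.ndimage import sobel

def sobel_angle_difference(img_a, img_b):
    """Gradient-orientation difference between two registered images via Sobel
    derivatives; a 2D illustration of the angle-difference idea only."""
    ang_a = np.arctan2(sobel(img_a, axis=0), sobel(img_a, axis=1))
    ang_b = np.arctan2(sobel(img_b, axis=0), sobel(img_b, axis=1))
    d = ang_a - ang_b
    # Wrap to (-pi, pi] so opposite-direction gradients score maximally.
    return np.abs(np.arctan2(np.sin(d), np.cos(d)))

rng = np.random.default_rng(0)
a = rng.random((64, 64))
b = np.roll(a, 1, axis=0)   # small structural shift between "scans"
print(sobel_angle_difference(a, b).mean())
```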
- [49] arXiv:2411.13535 [pdf, other]
-
Title: Comparative Analysis of Machine Learning and Deep Learning Models for Classifying Squamous Epithelial Cells of the CervixComments: 15 pages, 4 figuresSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
The cervix is the narrow end of the uterus that connects to the vagina in the female reproductive system. Abnormal cell growth in the squamous epithelial lining of the cervix leads to cervical cancer in females. A Pap smear is a diagnostic procedure used to detect cervical cancer by gently collecting cells from the surface of the cervix with a small brush and analyzing their changes under a microscope. For population-based cervical cancer screening, visual inspection with acetic acid is a cost-effective method with high sensitivity. However, Pap smears are also suitable for mass screening due to their higher specificity. The current Pap smear analysis method is manual, time-consuming, labor-intensive, and prone to human error. Therefore, an artificial intelligence (AI)-based approach for automatic cell classification is needed. In this study, we aimed to classify cells in Pap smear images into five categories: superficial-intermediate, parabasal, koilocytes, dyskeratotic, and metaplastic. Various machine learning (ML) algorithms, including Gradient Boosting, Random Forest, Support Vector Machine, and k-Nearest Neighbor, as well as deep learning (DL) approaches like ResNet-50, were employed for this classification task. The ML models demonstrated high classification accuracy; however, ResNet-50 outperformed the others, achieving a classification accuracy of 93.06%. This study highlights the efficiency of DL models for cell-level classification and their potential to aid in the early diagnosis of cervical cancer from Pap smear images.
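For readers unfamiliar with the transfer-learning setup, a hypothetical ResNet-50 fine-tuning skeleton for the five cell categories could look as follows; data handling and hyperparameters here are illustrative assumptions, not the paper's configuration:

```python
import torch
import torch.nn as nn
from torchvision import models

# Load an ImageNet-pretrained ResNet-50 and replace the head for 5 classes.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = nn.Linear(model.fc.in_features, 5)   # 5 Pap smear cell categories

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# One illustrative training step on a dummy batch of 224x224 RGB crops;
# a real pipeline would iterate over a labeled Pap smear DataLoader.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 5, (8,))
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```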
New submissions (showing 49 of 49 entries)
- [50] arXiv:2411.12791 (cross-list from cs.CV) [pdf, html, other]
-
Title: Mitigating Perception Bias: A Training-Free Approach to Enhance LMM for Image Quality AssessmentSubjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Despite the impressive performance of large multimodal models (LMMs) in high-level visual tasks, their capacity for image quality assessment (IQA) remains limited. One main reason is that LMMs are primarily trained for high-level tasks (e.g., image captioning), emphasizing unified image semantics extraction under varied quality. Such semantic-aware yet quality-insensitive perception bias inevitably leads to a heavy reliance on image semantics when those LMMs are forced to produce quality ratings. In this paper, instead of costly retraining or tuning of an LMM, we propose a training-free debiasing framework, in which the image quality prediction is rectified by mitigating the bias caused by image semantics. Specifically, we first explore several semantic-preserving distortions that can significantly degrade image quality while maintaining identifiable semantics. By applying these specific distortions to the query or test images, we ensure that the degraded images are recognized as poor quality while their semantics remain. During quality inference, both a query image and its corresponding degraded version are fed to the LMM along with a prompt indicating that the query image quality should be inferred under the condition that the degraded one is deemed poor quality. This prior condition effectively aligns the LMM's quality perception, as all degraded images are consistently rated as poor quality, regardless of their semantics. Finally, the quality scores of the query image inferred under different prior conditions (degraded versions) are aggregated using a conditional probability model. Extensive experiments on various IQA datasets show that our debiasing framework consistently enhances LMM performance; the code will be publicly available.
- [51] arXiv:2411.12846 (cross-list from cs.CY) [pdf, html, other]
-
Title: Towards Fairness in AI for Melanoma Detection: Systemic Review and RecommendationsComments: 22 pages, 4 figures, 7 tables,accepted for publication in Future of Information and Communication Conference (FICC) 2025, whose proceedings will be published in 'Lecture Notes in Networks and Systems' by Springer NatureSubjects: Computers and Society (cs.CY); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Image and Video Processing (eess.IV)
Early and accurate melanoma detection is crucial for improving patient outcomes. Recent advancements in artificial intelligence (AI) have shown promise in this area, but the technology's effectiveness across diverse skin tones remains a critical challenge. This study conducts a systematic review and preliminary analysis of AI-based melanoma detection research published between 2013 and 2024, focusing on deep learning methodologies, datasets, and skin tone representation. Our findings indicate that while AI can enhance melanoma detection, there is a significant bias towards lighter skin tones. To address this, we propose including skin hue, in addition to skin tone as represented by the L'Oreal Color Chart Map, for a more comprehensive skin tone assessment technique. This research highlights the need for diverse datasets and robust evaluation metrics to develop AI models that are equitable and effective for all patients. By adopting best practices outlined in a PRISMA-Equity framework tailored for healthcare and melanoma detection, we can work towards reducing disparities in melanoma outcomes.
- [52] arXiv:2411.12888 (cross-list from cs.IT) [pdf, html, other]
-
Title: An Experimental Multi-Band Channel Characterization in the Upper Mid-BandRoberto Bomfin, Ahmad Bazzi, Hao Guo, Hyeongtaek Lee, Marco Mezzavilla, Sundeep Rangan, Junil Choi, Marwa ChafiiSubjects: Information Theory (cs.IT); Signal Processing (eess.SP)
The following paper provides a multi-band channel measurement analysis in frequency range 3 (FR3). This study focuses on the lower FR3 frequencies of 6.5 GHz and 8.75 GHz with a setup tailored to the context of integrated sensing and communication (ISAC), where the data are collected with and without the presence of a target. A method based on multiple signal classification (MUSIC) is used to refine the delays of the channel impulse response estimates. The results reveal that the channel at the lower frequency of 6.5 GHz exhibits additional distinguishable multipath components in the presence of the target, while the channel at the higher frequency of 8.75 GHz experiences more blockage. The set of results reported in this paper serves as a benchmark for future multi-band studies in the FR3 spectrum.
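For context, delay refinement with MUSIC follows the standard subspace recipe; a generic delay-domain sketch (the campaign-specific preprocessing is omitted):

```python
import numpy as np

def music_delays(H_snapshots, freqs, taus, n_paths):
    """Delay-domain MUSIC. H_snapshots is (n_snapshots, n_freqs) frequency-domain
    channel measurements; returns the MUSIC pseudospectrum over candidate delays."""
    R = H_snapshots.conj().T @ H_snapshots / H_snapshots.shape[0]
    eigvals, eigvecs = np.linalg.eigh(R)
    En = eigvecs[:, :-n_paths]                       # noise subspace
    A = np.exp(-2j * np.pi * np.outer(freqs, taus))  # delay steering vectors
    denom = np.sum(np.abs(En.conj().T @ A) ** 2, axis=0)
    return 1.0 / denom

rng = np.random.default_rng(0)
freqs = np.arange(64) * 1e6          # 64 tones with 1 MHz spacing
true_tau = 120e-9                    # one path at 120 ns
snapshots = (np.exp(-2j * np.pi * freqs * true_tau)
             + 0.05 * (rng.standard_normal((10, 64))
                       + 1j * rng.standard_normal((10, 64))))
taus = np.linspace(0, 500e-9, 1001)
spec = music_delays(snapshots, freqs, taus, n_paths=1)
print(taus[np.argmax(spec)])         # ~= 120 ns
```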
- [53] arXiv:2411.12930 (cross-list from cs.LG) [pdf, html, other]
-
Title: LEDRO: LLM-Enhanced Design Space Reduction and Optimization for Analog CircuitsSubjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
Traditional approaches for designing analog circuits are time-consuming and require significant human expertise. Existing automation efforts using methods like Bayesian Optimization (BO) and Reinforcement Learning (RL) are sub-optimal and costly to generalize across different topologies and technology nodes. In our work, we introduce a novel approach, LEDRO, utilizing Large Language Models (LLMs) in conjunction with optimization techniques to iteratively refine the design space for analog circuit sizing. LEDRO is highly generalizable compared to other RL and BO baselines, eliminating the need for design annotation or model training for different topologies or technology nodes. We conduct a comprehensive evaluation of our proposed framework and baseline on 22 different Op-Amp topologies across four FinFET technology nodes. Results demonstrate the superior performance of LEDRO as it outperforms our best baseline by an average of 13% FoM improvement with 2.15x speed-up on low complexity Op-Amps and 48% FoM improvement with 1.7x speed-up on high complexity Op-Amps. This highlights LEDRO's effective performance, efficiency, and generalizability.
- [54] arXiv:2411.12962 (cross-list from cs.RO) [pdf, html, other]
-
Title: Bring the Heat: Rapid Trajectory Optimization with Pseudospectral Techniques and the Affine Geometric Heat Flow EquationComments: 26 pages, 8 figuresSubjects: Robotics (cs.RO); Systems and Control (eess.SY)
Generating optimal trajectories for high-dimensional robotic systems in a time-efficient manner while adhering to constraints is a challenging task. To address this challenge, this paper introduces PHLAME, which applies pseudospectral collocation and spatial vector algebra to efficiently solve the Affine Geometric Heat Flow (AGHF) Partial Differential Equation (PDE) for trajectory optimization. Unlike traditional PDE approaches like the Hamilton-Jacobi-Bellman (HJB) PDE, which solve for a function over the entire state space, computing a solution to the AGHF PDE scales more efficiently because its solution is defined over a two-dimensional domain, thereby avoiding the intractability of state-space scaling. To solve the AGHF one usually applies the Method of Lines (MOL), which works by discretizing one variable of the AGHF PDE, effectively converting the PDE into a system of ordinary differential equations (ODEs) that can be solved using standard time-integration methods. Though powerful, this method requires a fine discretization to generate accurate solutions and still requires evaluating the AGHF PDE which can be computationally expensive for high-dimensional systems. PHLAME overcomes this deficiency by using a pseudospectral method, which reduces the number of function evaluations required to yield a high accuracy solution thereby allowing it to scale efficiently to high-dimensional robotic systems. To further increase computational speed, this paper presents analytical expressions for the AGHF and its Jacobian, both of which can be computed efficiently using rigid body dynamics algorithms. The proposed method PHLAME is tested across various dynamical systems, with and without obstacles and compared to a number of state-of-the-art techniques. PHLAME generates trajectories for a 44-dimensional state-space system in $\sim3$ seconds, much faster than current state-of-the-art techniques.
- [55] arXiv:2411.12970 (cross-list from cs.RO) [pdf, html, other]
-
Title: Validation of Tumbling Robot Dynamics with Posture Manipulation for Closed-Loop Heading Angle ControlSubjects: Robotics (cs.RO); Systems and Control (eess.SY)
Navigating rugged terrain and steep slopes is a challenge for mobile robots. Conventional legged and wheeled systems struggle with these environments due to limited traction and stability. Northeastern University's COBRA (Crater Observing Bio-inspired Rolling Articulator), a novel multi-modal snake-like robot, addresses these issues by combining traditional snake gaits for locomotion on flat and inclined surfaces with a tumbling mode for controlled descent on steep slopes. Through dynamic posture manipulation, COBRA can modulate its heading angle and velocity during tumbling. This paper presents a reduced-order cascade model for COBRA's tumbling locomotion and validates it against a high-fidelity rigid-body simulation, presenting simulation results that show that the model captures key system dynamics.
- [56] arXiv:2411.13000 (cross-list from cs.IT) [pdf, html, other]
-
Title: NCAirFL: CSI-Free Over-the-Air Federated Learning Based on Non-Coherent DetectionComments: 6 pages, 2 figures, submitted for possible publicationSubjects: Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP)
Over-the-air federated learning (FL), i.e., AirFL, leverages over-the-air computation over multiple-access channels. A long-standing challenge in AirFL is to achieve coherent signal alignment without relying on expensive channel estimation and feedback. This paper proposes NCAirFL, a CSI-free AirFL scheme based on unbiased non-coherent detection at the edge server. By exploiting binary dithering and a long-term memory based error-compensation mechanism, NCAirFL achieves a convergence rate of order $\mathcal{O}(1/\sqrt{T})$ in terms of the average square norm of the gradient for general non-convex and smooth objectives, where $T$ is the number of communication rounds. Experiments demonstrate the competitive performance of NCAirFL compared to vanilla FL with ideal communications and to coherent transmission-based benchmarks.
- [57] arXiv:2411.13042 (cross-list from cs.CV) [pdf, html, other]
-
Title: Attentive Contextual Attention for Cloud RemovalComments: 13 pages, 7 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Cloud cover can significantly hinder the use of remote sensing images for Earth observation, prompting urgent advancements in cloud removal technology. Recently, deep learning strategies have shown strong potential in restoring cloud-obscured areas. These methods utilize convolution to extract intricate local features and attention mechanisms to gather long-range information, improving the overall comprehension of the scene. However, a common drawback of these approaches is that the resulting images often suffer from blurriness, artifacts, and inconsistencies. This is partly because attention mechanisms apply weights to all features based on generalized similarity scores, which can inadvertently introduce noise and irrelevant details from cloud-covered areas. To overcome this limitation and better capture relevant distant context, we introduce a novel approach named Attentive Contextual Attention (AC-Attention). This method enhances conventional attention mechanisms by dynamically learning data-driven attentive selection scores, enabling it to filter out noise and irrelevant features effectively. By integrating the AC-Attention module into the DSen2-CR cloud removal framework, we significantly improve the model's ability to capture essential distant information, leading to more effective cloud removal. Our extensive evaluation on various datasets shows that our method outperforms existing ones in terms of image reconstruction quality. Additionally, we conducted ablation studies by integrating AC-Attention into multiple existing methods and widely used network architectures. These studies demonstrate the effectiveness and adaptability of AC-Attention and reveal its ability to focus on relevant features, thereby improving the overall performance of the networks. The code is available at \url{this https URL}.
- [58] arXiv:2411.13081 (cross-list from cs.CV) [pdf, html, other]
-
Title: Practical Compact Deep Compressed SensingComments: Accepted by IEEE T-PAMISubjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Recent years have witnessed the success of deep networks in compressed sensing (CS), which allows for a significant reduction in sampling cost and has gained growing attention since its inception. In this paper, we propose a new practical and compact network dubbed PCNet for general image CS. Specifically, in PCNet, a novel collaborative sampling operator is designed, which consists of a deep conditional filtering step and a dual-branch fast sampling step. The former learns an implicit representation of a linear transformation matrix into a few convolutions and first performs adaptive local filtering on the input image, while the latter then uses a discrete cosine transform and a scrambled block-diagonal Gaussian matrix to generate under-sampled measurements. Our PCNet is equipped with an enhanced proximal gradient descent algorithm-unrolled network for reconstruction. It offers flexibility, interpretability, and strong recovery performance for arbitrary sampling rates once trained. Additionally, we provide a deployment-oriented extraction scheme for single-pixel CS imaging systems, which allows for the convenient conversion of any linear sampling operator to its matrix form to be loaded onto hardware like digital micro-mirror devices. Extensive experiments on natural image CS, quantized CS, and self-supervised CS demonstrate the superior reconstruction accuracy and generalization ability of PCNet compared to existing state-of-the-art methods, particularly for high-resolution images. Code is available at this https URL.
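The dual-branch sampling step lends itself to a compact sketch. Below is a toy rendition of the second branch (DCT, coefficient scrambling, block-diagonal Gaussian projection); block size, sampling ratio, and scrambling details are illustrative assumptions, not PCNet's actual operator:

```python
import numpy as np
from scipy.fft import dctn

def scrambled_bdg_sample(x, block=16, ratio=0.25, seed=0):
    """Toy fast-sampling branch: 2D DCT, random scrambling of coefficients,
    then a block-diagonal Gaussian projection at the given sampling ratio."""
    rng = np.random.default_rng(seed)
    c = dctn(x, norm="ortho").ravel()
    c = c[rng.permutation(c.size)]                   # scramble coefficients
    m = int(block * ratio)
    y = []
    for i in range(0, c.size - block + 1, block):    # block-diagonal projection
        Phi = rng.standard_normal((m, block)) / np.sqrt(m)
        y.append(Phi @ c[i:i + block])
    return np.concatenate(y)

x = np.random.default_rng(1).random((32, 32))
y = scrambled_bdg_sample(x)
print(y.shape)   # ~25% as many measurements as pixels
```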
- [59] arXiv:2411.13089 (cross-list from cs.CV) [pdf, html, other]
-
Title: ESARM: 3D Emotional Speech-to-Animation via Reward Model from Automatically-Ranked DemonstrationsComments: Accepted by the 26th IEEE International Conference on High Performance Computing and Communications (HPCC2024)Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
This paper proposes a novel 3D speech-to-animation (STA) generation framework designed to address the shortcomings of existing models in producing diverse and emotionally resonant animations. Current STA models often generate animations that lack emotional depth and variety, failing to align with human expectations. To overcome these limitations, we introduce a novel STA model coupled with a reward model. This combination enables the decoupling of emotion and content under audio conditions through a cross-coupling training approach. Additionally, we develop a training methodology that leverages automatic quality evaluation of generated facial animations to guide the reinforcement learning process. This methodology encourages the STA model to explore a broader range of possibilities, resulting in the generation of diverse and emotionally expressive facial animations of superior quality. We conduct extensive empirical experiments on a benchmark dataset, and the results validate the effectiveness of our proposed framework in generating high-quality, emotionally rich 3D animations that are better aligned with human preferences.
- [60] arXiv:2411.13137 (cross-list from cs.LG) [pdf, html, other]
-
Title: Domain Adaptive Unfolded Graph Neural NetworksSubjects: Machine Learning (cs.LG); Signal Processing (eess.SP)
Over the last decade, graph neural networks (GNNs) have made significant progress in numerous graph machine learning tasks. In real-world applications, where domain shifts occur and labels are often unavailable for a new target domain, graph domain adaptation (GDA) approaches have been proposed to facilitate knowledge transfer from the source domain to the target domain. Previous efforts in tackling distribution shifts across domains have mainly focused on aligning the node embedding distributions generated by the GNNs in the source and target domains. However, as the core part of GDA approaches, the impact of the underlying GNN architecture has received limited attention. In this work, we explore this orthogonal direction, i.e., how to facilitate GDA with architectural enhancement. In particular, we consider a class of GNNs that are designed explicitly based on optimization problems, namely unfolded GNNs (UGNNs), whose training process can be represented as bi-level optimization. Empirical and theoretical analyses demonstrate that when transferring from the source domain to the target domain, the lower-level objective value generated by the UGNNs significantly increases, resulting in an increase in the upper-level objective as well. Motivated by this observation, we propose a simple yet effective strategy called cascaded propagation (CP), which is guaranteed to decrease the lower-level objective value. The CP strategy is widely applicable to general UGNNs, and we evaluate its efficacy with three representative UGNN architectures. Extensive experiments on five real-world datasets demonstrate that the UGNNs integrated with CP outperform state-of-the-art GDA baselines.
- [61] arXiv:2411.13159 (cross-list from cs.CL) [pdf, html, other]
-
Title: Hard-Synth: Synthesizing Diverse Hard Samples for ASR using Zero-Shot TTS and LLMJiawei Yu, Yuang Li, Xiaosong Qiao, Huan Zhao, Xiaofeng Zhao, Wei Tang, Min Zhang, Hao Yang, Jinsong SuSubjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Text-to-speech (TTS) models have been widely adopted to enhance automatic speech recognition (ASR) systems using text-only corpora, thereby reducing the cost of labeling real speech data. Existing research primarily utilizes additional text data and predefined speech styles supported by TTS models. In this paper, we propose Hard-Synth, a novel ASR data augmentation method that leverages large language models (LLMs) and advanced zero-shot TTS. Our approach employs LLMs to generate diverse in-domain text through rewriting, without relying on additional text data. Rather than using predefined speech styles, we introduce a hard prompt selection method with zero-shot TTS to clone speech styles that the ASR model finds challenging to recognize. Experiments demonstrate that Hard-Synth significantly enhances the Conformer model, achieving relative word error rate (WER) reductions of 6.5%/4.4% on LibriSpeech dev/test-other subsets. Additionally, we show that Hard-Synth is data-efficient and capable of reducing bias in ASR.
- [62] arXiv:2411.13179 (cross-list from cs.SD) [pdf, other]
-
Title: SONNET: Enhancing Time Delay Estimation by Leveraging Simulated AudioSubjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
Time delay estimation, or Time-Difference-Of-Arrival (TDOA) estimation, is a critical component for multiple localization applications such as multilateration, direction of arrival, and self-calibration. The task is to estimate the time difference between a signal arriving at two different sensors. For the audio sensor modality, most current systems are based on classical methods such as the Generalized Cross-Correlation Phase Transform (GCC-PHAT) method. In this paper, we demonstrate that learning based methods can, even based on synthetic data, significantly outperform GCC-PHAT on novel real world data. To overcome the lack of data with ground truth for the task, we train our model on a simulated dataset which is sufficiently large and varied, and that captures the relevant characteristics of the real world problem. We provide our trained model, SONNET (Simulation Optimized Neural Network Estimator of Timeshifts), which is runnable in real-time and works on novel data out of the box for many real data applications, i.e., without re-training. We further demonstrate greatly improved performance on the downstream task of self-calibration when using our model compared to classical methods.
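For reference, the classical GCC-PHAT baseline mentioned above can be written in a few lines; this is the standard textbook formulation, not the authors' exact implementation:

```python
import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None):
    """Generalized Cross-Correlation with Phase Transform: estimate the time
    delay (in seconds) of `sig` relative to `ref`."""
    n = sig.size + ref.size
    S = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
    S /= np.abs(S) + 1e-12                  # PHAT weighting: keep phase only
    cc = np.fft.irfft(S, n=n)
    max_shift = n // 2 if max_tau is None else min(int(fs * max_tau), n // 2)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs

fs = 16000
x = np.random.default_rng(0).standard_normal(fs)    # 1 s of noise
delay_samples = 40
y = np.concatenate((np.zeros(delay_samples), x))[:fs]
print(gcc_phat(y, x, fs))                           # ~= 40 / 16000 s
```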
- [63] arXiv:2411.13201 (cross-list from cs.IT) [pdf, other]
-
Title: Simultaneous Communication and Tracking using Fused Bistatic MeasurementsSubjects: Information Theory (cs.IT); Signal Processing (eess.SP)
In this paper, we propose a bistatic sensing-assisted beam tracking method for simultaneous communication and tracking of user vehicles navigating arbitrary-shaped road trajectories. Prior work on simultaneous communication and tracking assumes a colocated radar receiver at the transmitter for sensing measurements using the reflected Integrated Sensing and Communication (ISAC) signals in the mmWave band. Full isolation between transmitter and receiver is required here to avoid self-interference. We consider the bistatic setting where the sensing receivers are not colocated and can be realized in practice using traditional half-duplex transmit or receive nodes. First, we process the echoes reflected from the vehicle at multiple multi-antenna nodes at various locations, facilitating estimation of the vehicle's current position. Then, we propose selection criteria for the estimates and a maximum likelihood (ML) fusion scheme to fuse these selected estimates based on the estimated error covariance matrices of these measurements. This fusion scheme is important in bistatic and multistatic settings as the localization error depends significantly on the geometry of the transmitter, target, and receiver locations. Finally, we predict the vehicle's next location using a simple kinematic equation-based model. Through extensive simulation, we study the average spectral efficiency of communication with a moving user using the proposed simultaneous communication and tracking scheme. The proposed fusion-based scheme achieves almost the same average spectral efficiency as an ideal scheme that knows the exact trajectory. We also show that the proposed scheme can be easily extended to systems with Hybrid Digital-Analog architectures and performs similarly even in these systems.
- [64] arXiv:2411.13209 (cross-list from cs.SD) [pdf, html, other]
-
Title: Comparative Analysis of Audio Feature Extraction for Real-Time Talking Portrait SynthesisPegah Salehi, Sajad Amouei Sheshkal, Vajira Thambawita, Sushant Gautam, Saeed S. Sabet, Dag Johansen, Michael A. Riegler, Pål HalvorsenComments: 16 pages, 6 figures, 3 tables. Submitted to the MDPI journal Big Data and Cognitive ComputingSubjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
This paper examines the integration of real-time talking-head generation for interviewer training, focusing on overcoming challenges in Audio Feature Extraction (AFE), which often introduces latency and limits responsiveness in real-time applications. To address these issues, we propose and implement a fully integrated system that replaces conventional AFE models with OpenAI's Whisper, leveraging its encoder to optimize processing and improve overall system efficiency. Our evaluation of two open-source real-time models across three different datasets shows that Whisper not only accelerates processing but also improves specific aspects of rendering quality, resulting in more realistic and responsive talking-head interactions. These advancements make the system a more effective tool for immersive, interactive training applications, expanding the potential of AI-driven avatars in interviewer training.
- [65] arXiv:2411.13224 (cross-list from cs.HC) [pdf, other]
-
Title: Building music with Lego bricks and Raspberry PiComments: 21 pagesJournal-ref: Multimedia Tools and Applications, 83, 10503-10523, 2024Subjects: Human-Computer Interaction (cs.HC); Sound (cs.SD); Audio and Speech Processing (eess.AS)
In this paper, a system to build music in an intuitive and accessible way, with Lego bricks, is presented. The system makes use of the new powerful and cheap possibilities that technology offers for doing old things in new ways. The Raspberry Pi is used to control the system and run the necessary algorithms, customized Lego bricks are used for building melodies, and custom electronic designs, software pieces, and 3D printed parts complete the items employed. The system is modular: it allows creating melodies with chords and percussion, or just melodies, or performing as a beatbox or a melody box. The main interaction with the system is made using Lego-type building blocks. Tests have demonstrated its versatility and ease of use, as well as its usefulness in music learning for both children and adults.
- [66] arXiv:2411.13234 (cross-list from math.OC) [pdf, html, other]
-
Title: Extremum and Nash Equilibrium Seeking with Delays and PDEs: Designs & ApplicationsComments: Preprint submitted to IEEE Control Systems Magazine (Special Issue: Into the Second Century of Extremum Seeking Control, 38 pages and 34 figures)Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
The development of extremum seeking (ES) has progressed, over the past hundred years, from static maps, to finite-dimensional dynamic systems, to networks of static and dynamic agents. Extensions from ODE dynamics to maps and agents that incorporate delays or even partial differential equations (PDEs) is the next natural step in that progression through ascending research challenges. This paper reviews results on algorithm design and theory of ES for such infinite-dimensional systems. Both hyperbolic and parabolic dynamics are presented: delays or transport equations, heat-dominated equation, wave equations, and reaction-advection-diffusion equations. Nash equilibrium seeking (NES) methods are introduced for noncooperative game scenarios of the model-free kind and then specialized to single-agent optimization. Even heterogeneous PDE games, such as a duopoly with one parabolic and one hyperbolic agent, are considered. Several engineering applications are touched upon for illustration, including flow-traffic control for urban mobility, oil-drilling systems, deep-sea cable-actuated source seeking, additive manufacturing modeled by the Stefan PDE, biological reactors, light-source seeking with flexible-beam structures, and neuromuscular electrical stimulation.
- [67] arXiv:2411.13276 (cross-list from math.OC) [pdf, html, other]
-
Title: Analysis and Synthesis Denoisers for Forward-Backward Plug-and-Play AlgorithmsSubjects: Optimization and Control (math.OC); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV); Signal Processing (eess.SP)
In this work we study the behavior of the forward-backward (FB) algorithm when the proximity operator is replaced by a sub-iterative procedure to approximate a Gaussian denoiser, in a Plug-and-Play (PnP) fashion. In particular, we consider both analysis and synthesis Gaussian denoisers within a dictionary framework, obtained by unrolling dual-FB iterations or FB iterations, respectively. We analyze the associated minimization problems as well as the asymptotic behavior of the resulting FB-PnP iterations. In particular, we show that the synthesis Gaussian denoising problem can be viewed as a proximity operator. For each case, analysis and synthesis, we show that the FB-PnP algorithms solve the same problem whether we use only one or an infinite number of sub-iterations to solve the denoising problem at each iteration. To this aim, we show that each "one sub-iteration" strategy within the FB-PnP can be interpreted as a primal-dual algorithm when a warm-restart strategy is used. We further present similar results when using a Moreau-Yosida smoothing of the global problem, for an arbitrary number of sub-iterations. Finally, we provide numerical simulations to illustrate our theoretical results. In particular we first consider a toy compressive sensing example, as well as an image restoration problem in a deep dictionary framework.
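The FB-PnP iteration under study alternates a gradient step on the data-fidelity term with a denoiser in place of the proximity operator. A minimal sketch, with a Gaussian filter standing in for the paper's unrolled dictionary denoisers:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def fb_pnp(x0, grad_f, denoiser, step=0.5, iters=100):
    """Plug-and-Play forward-backward iteration: a gradient step on the data
    fidelity f, followed by a denoiser standing in for the proximity operator."""
    x = x0
    for _ in range(iters):
        x = denoiser(x - step * grad_f(x))
    return x

# Toy denoising example: f(x) = 0.5 * ||x - y||^2, so grad_f(x) = x - y,
# with Gaussian smoothing as a stand-in denoiser.
rng = np.random.default_rng(0)
truth = np.zeros((64, 64)); truth[16:48, 16:48] = 1.0
y = truth + 0.3 * rng.standard_normal((64, 64))
x_hat = fb_pnp(np.zeros_like(y), lambda x: x - y,
               lambda z: gaussian_filter(z, sigma=1.0))
print(np.mean((x_hat - truth) ** 2) < np.mean((y - truth) ** 2))  # noise reduced
```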
- [68] arXiv:2411.13314 (cross-list from cs.SD) [pdf, html, other]
-
Title: I2TTS: Image-indicated Immersive Text-to-speech Synthesis with Spatial PerceptionComments: 5pages,4figuresSubjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
Controlling the style and characteristics of speech synthesis is crucial for adapting the output to specific contexts and user requirements. Previous Text-to-speech (TTS) works have focused primarily on the technical aspects of producing natural-sounding speech, such as intonation, rhythm, and clarity. However, they overlook the fact that there is a growing emphasis on spatial perception of synthesized speech, which may provide immersive experience in gaming and virtual reality. To solve this issue, in this paper, we present a novel multi-modal TTS approach, namely Image-indicated Immersive Text-to-speech Synthesis (I2TTS). Specifically, we introduce a scene prompt encoder that integrates visual scene prompts directly into the synthesis pipeline to control the speech generation process. Additionally, we propose a reverberation classification and refinement technique that adjusts the synthesized mel-spectrogram to enhance the immersive experience, ensuring that the involved reverberation condition matches the scene accurately. Experimental results demonstrate that our model achieves high-quality scene and spatial matching without compromising speech naturalness, marking a significant advancement in the field of context-aware speech synthesis. Project demo page: this https URL. Index terms: speech synthesis, scene prompt, spatial perception.
- [69] arXiv:2411.13360 (cross-list from cs.IT) [pdf, html, other]
-
Title: Geometry-informed Channel Statistics Prediction Based upon Uncalibrated Digital TwinsComments: 6 pages, 10 figuresSubjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Digital twins (DTs) of wireless environments can be utilized to predict the propagation channel and reduce the overhead required to estimate the channel statistics. However, direct channel prediction requires data-intensive calibration of the DT to capture the environment properties relevant for the propagation of electromagnetic signals. We introduce a framework that starts from a satellite image of the environment to produce an uncalibrated DT, which has no or imprecise information about the materials and their electromagnetic properties. The key idea is to use the uncalibrated DT to implicitly provide a geometric prior for the environment. This prior informs a Gaussian process (GP), which permits the use of few channel measurements to attain an accurate prediction of the channel statistics. Additionally, the framework is able to quantify the uncertainty in channel statistics prediction and to select a rate for ultra-reliable low-latency communication (URLLC) that complies with statistical guarantees. The efficacy of the proposed geometry-informed GP is validated using experimental data obtained through a measurement campaign. Furthermore, the proposed prediction framework is shown to provide significant improvements compared to benchmarks where (i) direct channel statistics prediction is obtained using an uncalibrated DT and (ii) the GP predicts channel statistics using information about the location.
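The GP component can be illustrated with plain GP regression; in the paper the uncalibrated DT supplies a geometric prior, which is simplified away to a zero mean in this sketch:

```python
import numpy as np

def gp_predict(X_train, y_train, X_test, length=5.0, sigma_f=1.0, sigma_n=0.1):
    """Plain Gaussian-process regression with an RBF kernel; returns the
    predictive mean and variance at the test locations."""
    def k(A, B):
        d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
        return sigma_f ** 2 * np.exp(-0.5 * d2 / length ** 2)
    K = k(X_train, X_train) + sigma_n ** 2 * np.eye(len(X_train))
    Ks = k(X_test, X_train)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    var = np.diag(k(X_test, X_test)) - np.sum(v ** 2, axis=0)
    return mean, var

# Few "path-loss" measurements at known 2D positions -> prediction elsewhere.
rng = np.random.default_rng(0)
X = rng.random((20, 2)) * 100
y = -60 - 20 * np.log10(1 + np.linalg.norm(X, axis=1))
mu, var = gp_predict(X, y, np.array([[50.0, 50.0]]))
```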
- [70] arXiv:2411.13365 (cross-list from cs.AI) [pdf, html, other]
-
Title: Explainable Finite-Memory Policies for Partially Observable Markov Decision ProcessesComments: Preprint -- Under ReviewSubjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO); Systems and Control (eess.SY)
Partially Observable Markov Decision Processes (POMDPs) are a fundamental framework for decision-making under uncertainty and partial observability. Since in general optimal policies may require infinite memory, they are hard to implement and often render most problems undecidable. Consequently, finite-memory policies are mostly considered instead. However, the algorithms for computing them are typically very complex, and so are the resulting policies. Facing the need for their explainability, we provide a representation of such policies, both (i) in an interpretable formalism and (ii) typically of smaller size, together yielding higher explainability. To that end, we combine models of Mealy machines and decision trees; the latter describing simple, stationary parts of the policies and the former describing how to switch among them. We design a translation for policies of the finite-state-controller (FSC) form from standard literature and show how our method smoothly generalizes to other variants of finite-memory policies. Further, we identify specific properties of recently used "attractor-based" policies, which allow us to construct yet simpler and smaller representations. Finally, we illustrate the higher explainability in a few case studies.
- [71] arXiv:2411.13369 (cross-list from cs.RO) [pdf, html, other]
-
Title: REVISE: Robust Probabilistic Motion Planning in a Gaussian Random FieldSubjects: Robotics (cs.RO); Systems and Control (eess.SY)
This paper presents Robust samplE-based coVarIance StEering (REVISE), a multi-query algorithm that generates robust belief roadmaps for dynamic systems navigating through spatially dependent disturbances modeled as a Gaussian random field. Our proposed method develops a novel robust sample-based covariance steering edge controller to safely steer a robot between state distributions, satisfying state constraints along the trajectory. Our proposed approach also incorporates an edge rewiring step into the belief roadmap construction process, which provably improves the coverage of the belief roadmap. When compared to state-of-the-art methods, REVISE improves median plan accuracy (as measured by Wasserstein distance between the actual and planned final state distribution) by 10x in multi-query planning and reduces median plan cost (as measured by the largest eigenvalue of the planned state covariance at the goal) by 2.5x in single-query planning for a 6DoF system. We will release our code at this https URL.
- [72] arXiv:2411.13424 (cross-list from cs.SD) [pdf, other]
-
Title: CAFE: A Novel Code-Switching Dataset for Algerian Dialect, French, and EnglishHoussam Eddine-Othman Lachemat, Akli Abbas, Nourredine Oukas, Yassine El Kheir, Samia Haboussi, Absar Showdhury ShammurComments: 24 pages, submitted to TALLIPSubjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
The paper introduces and publicly releases (data download link available after acceptance) CAFE -- the first code-switching dataset between the Algerian dialect, French, and English. The CAFE speech data is unique for (a) its spontaneous speaking style in vivo human-human conversation, capturing phenomena like code-switching and overlapping speech; (b) addressing distinct linguistic challenges in the North African Arabic dialect; and (c) capturing dialectal variations from various parts of Algeria within different sociolinguistic contexts. CAFE contains approximately 37 hours of speech, with a subset, CAFE-small, of 2 hours and 36 minutes released with manual human annotation, including speech segmentation, transcription, explicit annotation of code-switching points, overlapping speech, and other events such as noises and laughter. The remaining approximately 34.58 hours contain pseudo-label transcriptions. In addition to the data release, the paper also highlights the challenges of using state-of-the-art Automatic Speech Recognition (ASR) models such as Whisper large-v2/v3 and PromptingWhisper to handle such content. We then benchmark CAFE data with the aforementioned Whisper models and show how well-designed data processing pipelines and advanced decoding techniques can improve ASR performance, achieving a Mixed Error Rate (MER) of 0.310, a Character Error Rate (CER) of 0.329, and a Word Error Rate (WER) of 0.538.
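The error rates reported above follow the usual edit-distance definitions; a minimal sketch of WER and CER (MER, which mixes word- and character-level scoring across scripts in code-switching ASR, is omitted here):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            # min over deletion, insertion, and (possibly free) substitution
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (r != h))
    return dp[-1]

def wer(ref, hyp):
    """Word Error Rate: word-level edit distance over reference length."""
    r, h = ref.split(), hyp.split()
    return edit_distance(r, h) / len(r)

def cer(ref, hyp):
    """Character Error Rate: character-level edit distance over reference length."""
    return edit_distance(list(ref), list(hyp)) / len(ref)

print(wer("the cat sat", "the cat sit"))   # 1/3
print(cer("the cat sat", "the cat sit"))   # 1/11
```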
- [73] arXiv:2411.13440 (cross-list from cs.NI) [pdf, html, other]
-
Title: Eco-Friendly 0G Networks: Unlocking the Power of Backscatter Communications for a Greener FutureSubjects: Networking and Internet Architecture (cs.NI); Emerging Technologies (cs.ET); Signal Processing (eess.SP)
Backscatter Communication (BackCom) technology has emerged as a promising paradigm for the Green Internet of Things (IoT) ecosystem, offering advantages such as low power consumption, cost-effectiveness, and ease of deployment. While traditional BackCom systems, such as RFID technology, have found widespread applications, the advent of ambient backscatter presents new opportunities for expanding applications and enhancing capabilities. Moreover, ongoing standardization efforts are actively focusing on BackCom technologies, positioning them as a potential solution to meet the near-zero power consumption and massive connectivity requirements of next-generation wireless systems. 0G networks have the potential to provide advanced solutions by leveraging BackCom technology to deliver ultra-low-power, ubiquitous connectivity for the expanding IoT ecosystem, supporting billions of devices with minimal energy consumption. This paper investigates the integration of BackCom and 0G networks to enhance the capabilities of traditional BackCom systems and enable Green IoT. We conduct an in-depth analysis of BackCom-enabled 0G networks, exploring their architecture and operational objectives, and examine the Waste Factor (WF) metric for evaluating energy efficiency and minimizing energy waste within integrated systems. By examining both structural and operational aspects, we demonstrate how this synergy enhances the performance, scalability, and sustainability of next-generation wireless networks. Moreover, we highlight possible applications, open challenges, and future directions, offering valuable insights for guiding future research and practical implementations aimed at achieving large-scale, sustainable IoT deployments.
- [74] arXiv:2411.13441 (cross-list from cs.DC) [pdf, html, other]
-
Title: A Case Study of API Design for Interoperability and Security of the Internet of ThingsComments: To appear in Proceedings of the 2nd EAI International Conference on Security and Privacy in Cyber-Physical Systems and Smart Vehicles (SmartSP 2024)Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Systems and Control (eess.SY)
Heterogeneous distributed systems, including the Internet of Things (IoT) and distributed cyber-physical systems (CPS), often suffer from a lack of interoperability and security, which hinders their wider deployment. In particular, differing levels of security requirements and heterogeneous communication models (for instance, point-to-point vs. publish-subscribe) are examples of the challenges in IoT and distributed CPS consisting of heterogeneous devices and applications. In this paper, we propose a working application programming interface (API) and runtime to enhance interoperability and security while addressing the challenges that stem from this heterogeneity. In our case study, we design and implement our API approach using open-source software, and with this working implementation, we evaluate the effectiveness of the proposed approach. Our experimental results suggest that our approach can achieve both interoperability and security in the IoT and distributed CPS with reasonably small overhead and better-managed software.
- [75] arXiv:2411.13506 (cross-list from cs.RO) [pdf, html, other]
-
Title: Bezier Reachable Polytopes: Efficient Certificates for Robust Motion Planning with Layered ArchitecturesSubjects: Robotics (cs.RO); Systems and Control (eess.SY)
Control architectures are often implemented in a layered fashion, combining independently designed blocks to achieve complex tasks. Providing guarantees for such hierarchical frameworks requires considering the capabilities and limitations of each layer and their interconnections at design time. To address this holistic design challenge, we introduce the notion of Bezier Reachable Polytopes -- certificates of reachable points in the space of Bezier polynomial reference trajectories. This approach captures the set of trajectories that can be tracked by a low-level controller while satisfying state and input constraints, and leverages the geometric properties of Bezier polynomials to maintain an efficient polytopic representation. As a result, these certificates serve as a constructive tool for layered architectures, enabling long-horizon tasks to be reasoned about in a computationally tractable manner.
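The polytopic representation rests on the convex-hull property of Bezier curves: a curve lies in any convex set containing its control points, so membership in a polytope can be certified by finitely many linear checks. A small sketch of that idea, with hypothetical constraint data:

```python
# Certify that an entire Bezier curve stays inside a polytope {x : A x <= b}
# by checking only its control points -- a sufficient condition via the
# convex hull property of Bezier polynomials.
import numpy as np

def bezier_in_polytope(ctrl_pts: np.ndarray, A: np.ndarray, b: np.ndarray) -> bool:
    # ctrl_pts: (n+1, d) control points of a degree-n Bezier curve
    return bool(np.all(A @ ctrl_pts.T <= b[:, None]))

# Unit box |x_i| <= 1 in 2D, written as A x <= b.
A = np.vstack([np.eye(2), -np.eye(2)])
b = np.ones(4)
ctrl = np.array([[-0.8, 0.0], [0.2, 0.9], [0.7, -0.5]])  # quadratic Bezier
print(bezier_in_polytope(ctrl, A, b))  # True => the whole curve is inside
```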
- [76] arXiv:2411.13507 (cross-list from cs.RO) [pdf, html, other]
-
Title: Dynamically Feasible Path Planning in Cluttered Environments via Reachable Bezier PolytopesComments: 7 pages, 6 figures, submitted to ICRA 2025Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
The deployment of robotic systems in real-world environments requires the ability to quickly produce paths through cluttered, non-convex spaces. These planned trajectories must be both kinematically feasible (i.e., collision-free) and dynamically feasible (i.e., satisfy the underlying system dynamics), necessitating consideration of both the free space and the dynamics of the robot in the path planning phase. In this work, we explore the application of reachable Bezier polytopes as an efficient tool for generating trajectories satisfying both kinematic and dynamic requirements. Furthermore, we demonstrate that by offloading specific computation tasks to the GPU, such an algorithm can meet tight real-time requirements. We propose a layered control architecture that efficiently produces collision-free and dynamically feasible paths for nonlinear control systems, and demonstrate the framework on the task of 3D hopping in a cluttered environment.
Cross submissions (showing 27 of 27 entries)
- [77] arXiv:2305.13947 (replaced) [pdf, html, other]
-
Title: Deep-Learning-Aided Alternating Least Squares for Tensor CP Decomposition and Its Application to Massive MIMO Channel EstimationSubjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI)
CANDECOMP/PARAFAC (CP) decomposition is the most widely used model to formulate the received tensor signal in a massive MIMO system, as the receiver generally sums the components from different paths or users. To achieve accurate and low-latency channel estimation, good and fast CP decomposition (CPD) algorithms are desired. CP alternating least squares (CPALS) is the workhorse algorithm for calculating the CPD. However, its performance depends on the initialization, and good starting values can lead to more efficient solutions. Existing initialization strategies are decoupled from the CPALS and are not necessarily favorable for solving the CPD. This paper proposes a deep-learning-aided CPALS (DL-CPALS) method that uses a deep neural network (DNN) to generate favorable initializations. The proposed DL-CPALS integrates the DNN and CPALS into a model-based deep learning paradigm, where the DNN is trained to generate an initialization that facilitates fast and accurate CPD. Moreover, benefiting from CP low-rankness, the proposed method is trained using noisy data and does not require paired clean data. DL-CPALS is applied to millimeter-wave MIMO-OFDM channel estimation. Experimental results demonstrate significant improvements of the proposed method in both speed and accuracy for CPD and channel estimation.
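For reference, one sweep of the CPALS baseline updates each factor matrix in turn by a linear least-squares solve against a matricized tensor; a minimal numpy sketch for a 3-way tensor (illustrative, not the paper's implementation):

```python
# One alternating-least-squares sweep for a rank-R CP decomposition of a
# 3-way tensor T, T[i,j,k] ~ sum_r A[i,r] * B[j,r] * C[k,r].
import numpy as np

def khatri_rao(X, Y):
    # Column-wise Kronecker product: (I*J, R) from (I, R) and (J, R).
    return (X[:, None, :] * Y[None, :, :]).reshape(-1, X.shape[1])

def cpals_sweep(T, A, B, C):
    T0 = T.reshape(T.shape[0], -1)                     # mode-0 unfolding
    T1 = np.moveaxis(T, 1, 0).reshape(T.shape[1], -1)  # mode-1 unfolding
    T2 = np.moveaxis(T, 2, 0).reshape(T.shape[2], -1)  # mode-2 unfolding
    A = T0 @ np.linalg.pinv(khatri_rao(B, C).T)        # LS update per factor
    B = T1 @ np.linalg.pinv(khatri_rao(A, C).T)
    C = T2 @ np.linalg.pinv(khatri_rao(A, B).T)
    return A, B, C

rng = np.random.default_rng(0)
I, J, K, R = 6, 5, 4, 2
A0, B0, C0 = (rng.normal(size=(n, R)) for n in (I, J, K))
T = np.einsum('ir,jr,kr->ijk', A0, B0, C0)             # exact rank-2 tensor
A, B, C = (rng.normal(size=(n, R)) for n in (I, J, K)) # random initialization
for _ in range(50):
    A, B, C = cpals_sweep(T, A, B, C)
print(np.linalg.norm(T - np.einsum('ir,jr,kr->ijk', A, B, C))
      / np.linalg.norm(T))  # relative error, typically near zero
```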
- [78] arXiv:2307.01081 (replaced) [pdf, html, other]
-
Title: Waveform Optimization and Beam Focusing for Near-field Wireless Power Transfer with Dynamic Metasurface Antennas and Non-linear Energy HarvestersComments: Accepted in IEEE Transactions on Wireless CommunicationsSubjects: Signal Processing (eess.SP)
Radio frequency (RF) wireless power transfer (WPT) is a promising technology for future wireless systems. However, low power transfer efficiency (PTE) is a critical challenge for practical implementations. One of the main sources of inefficiency is the power consumption and loss introduced by key components such as the high-power amplifier (HPA) and the rectenna; these must therefore be carefully considered in PTE optimization. Herein, we consider a near-field RF-WPT system with a dynamic metasurface antenna (DMA) at the transmitter and non-linear energy harvesters. We provide a mathematical framework to calculate the power consumption and harvested power from multi-tone signal transmissions. Based on this, we propose an approach relying on alternating optimization and successive convex approximation for waveform optimization and beam focusing to minimize power consumption while meeting energy harvesting requirements. Numerical results show that increasing the number of transmit tones reduces the power consumption by exploiting the rectifier's non-linearity more efficiently. Moreover, they demonstrate that increasing the antenna length improves performance, while the DMA outperforms a fully-digital architecture in terms of power consumption. Finally, our results verify that the transmitter focuses the energy on receivers located in the near field, while energy beams are formed in the receivers' direction in the far-field region.
- [79] arXiv:2309.00559 (replaced) [pdf, other]
-
Title: Signal Processing and Learning for Next Generation Multiple Access in 6GSubjects: Signal Processing (eess.SP); Information Theory (cs.IT)
Wireless communication systems to date primarily rely on the orthogonality of resources to facilitate design and implementation, from user access to data transmission. Emerging applications and scenarios in sixth-generation (6G) wireless systems will require massive connectivity and the transmission of a deluge of data, which calls for more flexibility in design concepts that go beyond orthogonality. Furthermore, recent advances in signal processing and learning, e.g., deep learning, provide promising approaches to deal with complex and previously intractable problems. This article provides an overview of research efforts to date in the field of signal processing and learning for next-generation multiple access (NGMA), with an emphasis on massive random access and non-orthogonal multiple access. The promising interplay with new technologies and the challenges of learning-based NGMA are discussed.
- [80] arXiv:2311.01995 (replaced) [pdf, html, other]
-
Title: From Discrete to Continuous Binary Best-Response Dynamics: Discrete Fluctuations Almost Surely Vanish with Population SizeComments: Adding Proofs of Theorem 1 and Corollary 4Subjects: Systems and Control (eess.SY)
In binary decision-making, individuals often opt for either a common or a rare action. In the framework of evolutionary game theory, the best-response update rule can be used to model this dichotomy. Those who prefer a common action are called \emph{coordinators}, and those who prefer a rare one are called \emph{anticoordinators}. A finite mixed population of the two types may undergo perpetual fluctuations, whose characterization appears to be challenging. In particular, it is unknown whether the fluctuations persist as the population size grows. To fill this gap, we approximate the discrete finite population dynamics of coordinators and anticoordinators by the associated mean dynamics in the form of semicontinuous differential inclusions. We show that the family of state sequences of the discrete dynamics for increasing population sizes forms a generalized stochastic approximation process for the differential inclusion. On the other hand, we show that the differential inclusions always converge to an equilibrium. This implies that the reported perpetual fluctuations in the finite discrete dynamics of coordinators and anticoordinators almost surely vanish with population size. The results encourage first analyzing the often simpler semicontinuous mean dynamics of a discrete population dynamics, as the semicontinuous dynamics partly reveal the asymptotic behaviour of the discrete dynamics.
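A toy simulation of the finite discrete dynamics, under our own simplifying assumptions (asynchronous updates; a coordinator plays action 1 exactly when at least half the population does, an anticoordinator when fewer than half do), illustrates the persistent fluctuation band:

```python
# Finite population of coordinators (prefer the common action) and
# anticoordinators (prefer the rare action) under asynchronous best-response
# updates; the fraction playing action 1 keeps fluctuating around 1/2.
import random

def simulate(n_coord, n_anti, steps=5000, seed=1):
    random.seed(seed)
    n = n_coord + n_anti
    x = [random.randint(0, 1) for _ in range(n)]  # first n_coord are coordinators
    history = []
    for _ in range(steps):
        i = random.randrange(n)
        frac1 = sum(x) / n
        if i < n_coord:                 # coordinator joins the majority
            x[i] = 1 if frac1 >= 0.5 else 0
        else:                           # anticoordinator joins the minority
            x[i] = 0 if frac1 >= 0.5 else 1
        history.append(sum(x) / n)
    return history

h = simulate(n_coord=40, n_anti=60)
print(min(h[-1000:]), max(h[-1000:]))   # a persistent fluctuation band for finite n
```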
- [81] arXiv:2312.15701 (replaced) [pdf, html, other]
-
Title: Rotation Equivariant Proximal Operator for Deep Unfolding Methods in Image RestorationComments: Published in TPAMI 2024Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
The deep unfolding approach has attracted significant attention in computer vision tasks, as it connects conventional image processing modeling with more recent deep learning techniques. Specifically, by establishing a direct correspondence between algorithm operators at each implementation step and network modules within each layer, one can rationally construct an almost ``white box'' network architecture with high interpretability. In this architecture, only the predefined component of the proximal operator, known as a proximal network, needs manual configuration, enabling the network to automatically extract intrinsic image priors in a data-driven manner. In current deep unfolding methods, such a proximal network is generally designed as a CNN architecture, whose necessity has been proven by a recent theory: the CNN structure substantially delivers the translation-invariant image prior, the most universally possessed structural prior across various types of images. However, standard CNN-based proximal networks have essential limitations in capturing the rotation symmetry prior, another universal structural prior underlying general images. This leaves considerable room for further performance improvement in deep unfolding approaches. To address this issue, this study proposes a high-accuracy rotation equivariant proximal network that effectively embeds rotation symmetry priors into the deep unfolding framework. In particular, we deduce, for the first time, the theoretical equivariant error for such a proximal network with arbitrary layers under arbitrary rotation degrees. This analysis is the most refined theoretical conclusion for such error evaluation to date and is also indispensable for supporting the rationale behind networks with intrinsic interpretability requirements.
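One common way to obtain rotation equivariance in a convolutional layer, sketched generically below and not to be read as the paper's specific construction, is a lifting group convolution: convolve with rotated copies of each filter, so that rotating the input merely permutes and rotates the output maps.

```python
# Lifting convolution over the four 90-degree rotations: convolve with each
# rotated copy of the kernel, so the output carries an explicit group axis.
import torch
import torch.nn.functional as F

def p4_lifting_conv(x: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    # x: (N, C_in, H, W); weight: (C_out, C_in, k, k) with k odd
    outs = [F.conv2d(x, torch.rot90(weight, r, dims=(-2, -1)), padding='same')
            for r in range(4)]
    return torch.stack(outs, dim=2)  # (N, C_out, 4, H, W)

x = torch.randn(1, 3, 16, 16)
w = torch.randn(8, 3, 3, 3)
y = p4_lifting_conv(x, w)
y_rot = p4_lifting_conv(torch.rot90(x, 1, dims=(-2, -1)), w)
# Equivariance: rotating the input equals rotating the maps and cyclically
# shifting the group axis of the original output.
print(torch.allclose(torch.rot90(y, 1, dims=(-2, -1)).roll(1, dims=2), y_rot,
                     atol=1e-5))
```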
- [82] arXiv:2401.15562 (replaced) [pdf, html, other]
-
Title: A Survey on Integrated Sensing and Communication with Intelligent Metasurfaces: Trends, Challenges, and OpportunitiesComments: Submitted to IEEE for possible publicationSubjects: Signal Processing (eess.SP); Information Theory (cs.IT)
The emergence of technologies demanding high data rates and precise sensing, such as autonomous vehicles and IoT devices, has driven the popularity of integrated sensing and communication (ISAC) in recent years. ISAC provides a framework for communication and sensing, where both functionalities are performed simultaneously or in a coordinated manner. There are two levels of integration in ISAC: radio-communications coexistence (RCC), where communication and radar systems use distinct hardware, waveforms, and signal processing but share the spectrum; and dual-function radar-communications (DFRC), where communication and sensing share the same hardware, waveform, and signal processing. At the architectural level, intelligent metasurfaces are a key enabler for the sixth-generation (6G) of wireless communication due to their ability to control the propagation environment efficiently. With the potential to enhance communication and sensing performance, numerous studies have explored the gains of metasurfaces for ISAC. Moreover, certain ISAC frameworks address limitations associated with reconfigurable intelligent surfaces (RIS) for communication. Thus, integrating ISAC with metasurfaces enhances both technologies. This survey reviews the literature on metasurface-assisted ISAC, detailing challenges and opportunities. To provide a comprehensive overview, we begin with fundamentals of ISAC and metasurfaces. The paper summarizes state-of-the-art studies on metasurface-assisted ISAC, focusing on metasurfaces as separate entities between the transmitter and receiver (known as RIS) and emphasizing RCC and DFRC. We also review work on holographic ISAC, where metasurfaces are part of the transmitter and receiver. For each category, lessons learned, challenges, opportunities, and research directions are highlighted.
- [83] arXiv:2403.10362 (replaced) [pdf, html, other]
-
Title: CPGA: Coding Priors-Guided Aggregation Network for Compressed Video Quality EnhancementComments: 11 pages, 8 figures, 6 tablesSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Recently, numerous approaches have achieved notable success in compressed video quality enhancement (VQE). However, these methods usually ignore valuable coding priors inherently embedded in compressed videos, such as motion vectors and residual frames, which carry abundant temporal and spatial information. To remedy this problem, we propose the Coding Priors-Guided Aggregation (CPGA) network to utilize temporal and spatial information from coding priors. The CPGA mainly consists of an inter-frame temporal aggregation (ITA) module and a multi-scale non-local aggregation (MNA) module. Specifically, the ITA module aggregates temporal information from consecutive frames and coding priors, while the MNA module globally captures spatial information guided by residual frames. In addition, to facilitate research on the VQE task, we construct the Video Coding Priors (VCP) dataset, comprising 300 videos with various coding priors extracted from the corresponding bitstreams, remedying the lack of coding information in previous datasets. Experimental results demonstrate the superiority of our method over existing state-of-the-art methods. The code and dataset will be released at this https URL.
- [84] arXiv:2406.10082 (replaced) [pdf, html, other]
-
Title: Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and TranslationAndrew Rouditchenko, Yuan Gong, Samuel Thomas, Leonid Karlinsky, Hilde Kuehne, Rogerio Feris, James GlassComments: Interspeech 2024. V3: Added results on LRS2. Code at this https URLSubjects: Audio and Speech Processing (eess.AS); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)
Audio-Visual Speech Recognition (AVSR) uses lip-based video to improve performance in noise. Since videos are harder to obtain than audio, the video training data of AVSR models is usually limited to a few thousand hours. In contrast, speech models such as Whisper are trained with hundreds of thousands of hours of data, and thus learn a better speech-to-text decoder. The huge training data difference motivates us to adapt Whisper to handle video inputs. Inspired by Flamingo which injects visual features into language models, we propose Whisper-Flamingo which integrates visual features into the Whisper speech recognition and translation model with gated cross attention. Our models achieve state-of-the-art ASR WER (0.68%) and AVSR WER (0.76%) on LRS3, and state-of-the-art ASR WER (1.3%) and AVSR WER (1.4%) on LRS2. Audio-visual Whisper-Flamingo outperforms audio-only Whisper on English speech recognition and En-X translation for 6 languages in noisy conditions. Moreover, Whisper-Flamingo is versatile and conducts all of these tasks using one set of parameters, while prior methods are trained separately on each language.
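The gated cross-attention idea borrowed from Flamingo inserts an attention layer over visual features whose contribution is scaled by a tanh gate initialized at zero, so training starts from the unmodified backbone. A minimal sketch (ours; sizes and names are illustrative):

```python
import torch
import torch.nn as nn

class GatedCrossAttention(nn.Module):
    """Decoder hidden states attend to visual features; a zero-initialized
    tanh gate makes the layer an identity map at the start of training."""
    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)
        self.gate = nn.Parameter(torch.zeros(1))  # tanh(0) = 0

    def forward(self, h: torch.Tensor, visual: torch.Tensor) -> torch.Tensor:
        # h: (B, T, d) decoder states; visual: (B, S, d) visual features
        attended, _ = self.attn(self.norm(h), visual, visual)
        return h + torch.tanh(self.gate) * attended

layer = GatedCrossAttention(d_model=256)
h, v = torch.randn(2, 10, 256), torch.randn(2, 40, 256)
print(torch.allclose(layer(h, v), h))  # True at initialization
```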
- [85] arXiv:2406.17215 (replaced) [pdf, html, other]
-
Title: Enabling Large Language Models to Perform Power System Simulations with Previously Unseen Tools: A Case of DalineSubjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI)
The integration of experiment technologies with large language models (LLMs) is transforming scientific research, extending AI beyond specialized problem-solving toward serving as research assistants for human scientists. In power systems, simulations are essential for research. However, LLMs face significant challenges in power system simulations due to limited pre-existing knowledge and the complexity of power grids. To address this issue, this work proposes a modular framework that integrates expertise from both the power system and LLM domains, enhancing LLMs' ability to perform power system simulations with previously unseen tools. Validated on 34 simulation tasks in Daline, an (optimal) power flow simulation and linearization toolbox not yet exposed to LLMs, the proposed framework improved GPT-4o's simulation coding accuracy from 0% to 96.07%, also outperforming the ChatGPT-4o web interface's 33.8% accuracy (with the entire knowledge base uploaded). These results highlight the potential of LLMs as research assistants in power systems.
- [86] arXiv:2407.10689 (replaced) [pdf, other]
-
Title: Classification of Heart Sounds Using Multi-Branch Deep Convolutional Network and LSTM-CNNComments: 22 pages,Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
This paper presents a fast and cost-effective method for diagnosing cardiac abnormalities with high accuracy and reliability using low-cost systems in clinics. The primary obstacle to automatic diagnosis of cardiac disease is the scarcity of correctly labeled, acceptable samples, which can be expensive to prepare. To address this issue, two methods are proposed. The first is a unique Multi-Branch Deep Convolutional Neural Network (MBDCN) architecture inspired by human auditory processing, designed to optimize feature extraction by employing convolutional filters of various sizes, with the audio signal's power spectrum as input. The second, the Long Short-Term Memory-Convolutional Network (LSCN) model, additionally includes LSTM blocks to improve feature extraction in the time domain. The innovative approach of combining multiple parallel branches of one-dimensional convolutional layers with LSTM blocks helps achieve superior results in audio signal processing tasks. The experimental results demonstrate the superiority of the proposed methods over state-of-the-art techniques: the overall classification accuracy of heart sounds with the LSCN network exceeds 96%, and its efficiency is notable compared with common feature extraction methods such as Mel-Frequency Cepstral Coefficients (MFCC) and the wavelet transform. The proposed method therefore shows promising results for the automatic analysis of heart sounds, with potential applications in the diagnosis and early detection of cardiovascular disease.
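A minimal sketch of the multi-branch idea, with hypothetical layer sizes of our own choosing: parallel 1D convolutional branches with different kernel widths over the power spectrum, concatenated before classification.

```python
import torch
import torch.nn as nn

class MultiBranch1D(nn.Module):
    """Parallel 1D conv branches with different receptive fields, concatenated."""
    def __init__(self, n_classes: int = 2, kernel_sizes=(3, 9, 27)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv1d(1, 16, k, padding=k // 2), nn.ReLU(),
                nn.AdaptiveAvgPool1d(8),
            ) for k in kernel_sizes
        )
        self.head = nn.Linear(16 * 8 * len(kernel_sizes), n_classes)

    def forward(self, spectrum: torch.Tensor) -> torch.Tensor:
        # spectrum: (B, 1, F) power spectrum of a heart-sound segment
        feats = [b(spectrum).flatten(1) for b in self.branches]
        return self.head(torch.cat(feats, dim=1))

model = MultiBranch1D()
print(model(torch.randn(4, 1, 256)).shape)  # torch.Size([4, 2])
```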
- [87] arXiv:2409.03883 (replaced) [pdf, html, other]
-
Title: Data-informativity conditions for structured linear systems with implications for dynamic networksPaul M.J. Van den Hof, Shengling Shi, Stefanie J.M. Fonken, Karthik R. Ramaswamy, Håkan Hjalmarsson, Arne G. DankersComments: 16 pages, 4 figuresSubjects: Systems and Control (eess.SY)
When estimating models of a multivariable dynamic system, a typical condition for consistency is to require the input signals to be persistently exciting, which is guaranteed if the input spectrum is positive definite for a sufficient number of frequencies. In this paper it is investigated how such a condition can be relaxed by exploiting prior structural information on the multivariable system, such as structural zero elements in the transfer matrix or entries that are a priori known and therefore not parametrized. It is shown that in particular situations the data-informativity condition can be decomposed into different MISO (multiple input single output) situations, leading to relaxed conditions for the MIMO (multiple input multiple output) model. When estimating a single module in a linear dynamic network, the data-informativity conditions can generically be formulated as path-based conditions on the graph of the network. The new relaxed conditions for data-informativity will then also lead to relaxed path-based conditions on the network graph. Additionally the new expressions are shown to be closely related to earlier derived conditions for (generic) single module identifiability.
- [88] arXiv:2409.05702 (replaced) [pdf, html, other]
-
Title: Almost Global Trajectory Tracking for Quadrotors Using Thrust Direction Control on $\mathcal{S}^2$Subjects: Systems and Control (eess.SY)
Many of the existing works on quadrotor control address the trajectory tracking problem by employing a cascade design in which the translational and rotational dynamics are stabilized by two separate controllers. The stability of the cascade is often proved by employing trajectory-based arguments, most notably, integral input-to-state stability. In this paper, we follow a different route and present a control law ensuring that a composite function constructed from the translational and rotational tracking errors is a Lyapunov function for the closed-loop cascade. In particular, starting from a generic control law for the double integrator, we develop a suitable attitude control extension, by leveraging a backstepping-like procedure. Using this construction, we provide an almost global stability certificate. The proposed design employs the unit sphere $\mathcal{S}^2$ to describe the rotational degrees of freedom required for position control. This enables a simpler controller tuning and an improved tracking performance with respect to previous global solutions. The new design is demonstrated via numerical simulations and on real-world experiments.
- [89] arXiv:2409.13075 (replaced) [pdf, html, other]
-
Title: Demons registration for 2D empirical wavelet transformsSubjects: Image and Video Processing (eess.IV)
The empirical wavelet transform is a fully adaptive time-scale representation that has been widely used over the last decade. Inspired by the empirical mode decomposition, it consists of filter banks based on harmonic mode supports. Recently, it has been generalized to build the filter banks from any generating function using mappings. In practice, the harmonic mode supports can have weakly constrained shapes in 2D, leading to numerical difficulties in computing the mappings and, consequently, the related wavelet filters. This work proposes an efficient numerical scheme to compute empirical wavelet coefficients using the demons registration algorithm. Results show that the proposed approach yields a numerically robust wavelet transform. An application to texture segmentation of scanning tunnelling microscope images is also presented.
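For background, one iteration of the classic demons update displaces along the fixed image gradient, scaled by the intensity mismatch, followed by Gaussian smoothing of the field; a compact 2D sketch of Thirion's formula (simplified, not the paper's scheme for wavelet filter computation):

```python
# One iteration of the classic demons update: displacement along the fixed
# image gradient, scaled by the intensity mismatch (Thirion's formula),
# followed by Gaussian smoothing of the displacement field.
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def demons_step(fixed, moving, u, v, sigma=2.0, eps=1e-9):
    yy, xx = np.mgrid[:fixed.shape[0], :fixed.shape[1]].astype(float)
    warped = map_coordinates(moving, [yy + v, xx + u], order=1, mode='nearest')
    gy, gx = np.gradient(fixed)
    diff = warped - fixed
    denom = gx**2 + gy**2 + diff**2 + eps
    u = gaussian_filter(u - diff * gx / denom, sigma)  # update, then regularize
    v = gaussian_filter(v - diff * gy / denom, sigma)
    return u, v

f = gaussian_filter(np.random.default_rng(0).random((64, 64)), 3)
m = np.roll(f, 2, axis=1)                  # moving = fixed shifted by 2 px
u, v = np.zeros_like(f), np.zeros_like(f)
for _ in range(30):
    u, v = demons_step(f, m, u, v)
print(round(u.mean(), 2))                  # close to 2: the recovered shift
```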
- [90] arXiv:2410.07908 (replaced) [pdf, html, other]
-
Title: ONCOPILOT: A Promptable CT Foundation Model For Solid Tumor EvaluationLéo Machado, Hélène Philippe, Élodie Ferreres, Julien Khlaut, Julie Dupuis, Korentin Le Floch, Denis Habip Gatenyo, Pascal Roux, Jules Grégory, Maxime Ronot, Corentin Dancette, Tom Boeken, Daniel Tordjman, Pierre Manceron, Paul HérentSubjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Carcinogenesis is a proteiform phenomenon, with tumors emerging in various locations and displaying complex, diverse shapes. At the crucial intersection of research and clinical practice, it demands precise and flexible assessment. However, current biomarkers, such as RECIST 1.1's long and short axis measurements, fall short of capturing this complexity, offering an approximate estimate of tumor burden and a simplistic representation of a more intricate process. Additionally, existing supervised AI models face challenges in addressing the variability in tumor presentations, limiting their clinical utility. These limitations arise from the scarcity of annotations and the models' focus on narrowly defined tasks.
To address these challenges, we developed ONCOPILOT, an interactive radiological foundation model trained on approximately 7,500 CT scans covering the whole body, from both normal anatomy and a wide range of oncological cases. ONCOPILOT performs 3D tumor segmentation using visual prompts like point-click and bounding boxes, outperforming state-of-the-art models (e.g., nnUnet) and achieving radiologist-level accuracy in RECIST 1.1 measurements. The key advantage of this foundation model is its ability to surpass state-of-the-art performance while keeping the radiologist in the loop, a capability that previous models could not achieve. When radiologists interactively refine the segmentations, accuracy improves further. ONCOPILOT also accelerates measurement processes and reduces inter-reader variability, facilitating volumetric analysis and unlocking new biomarkers for deeper insights.
This AI assistant is expected to enhance the precision of RECIST 1.1 measurements, unlock the potential of volumetric biomarkers, and improve patient stratification and clinical care, while seamlessly integrating into the radiological workflow.
- [91] arXiv:2411.06033 (replaced) [pdf, html, other]
-
Title: Speech-Based Estimation of Schizophrenia Severity Using Feature FusionComments: Submitted to ICASSP-SPADE workshop 2025Subjects: Audio and Speech Processing (eess.AS)
Speech-based assessment of the schizophrenia spectrum has been widely researched in recent years. In this study, we develop a deep learning framework to estimate schizophrenia severity scores from speech using a feature fusion approach that fuses articulatory features with different self-supervised speech features extracted from pre-trained audio models. We also propose an auto-encoder-based self-supervised representation learning framework to extract compact articulatory embeddings from speech. Our top-performing speech-based fusion model with Multi-Head Attention (MHA) reduces Mean Absolute Error (MAE) by 9.18% and Root Mean Squared Error (RMSE) by 9.36% for schizophrenia severity estimation, compared with previous models that combined speech and video inputs.
- [92] arXiv:2411.07976 (replaced) [pdf, html, other]
-
Title: DINO-LG: A Task-Specific DINO Model for Coronary Calcium ScoringComments: Developed by Center for Applied Artificial Intelligence (CAAI), University of KentuckySubjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Coronary artery disease (CAD) is one of the most common causes of mortality worldwide. Coronary artery calcium (CAC) scoring using computed tomography (CT) is key for risk assessment to prevent coronary disease. Previous studies on risk assessment and calcification detection in CT scans primarily use approaches based on the UNET architecture, frequently implemented on pre-built models. However, these models are limited by the availability of annotated CT scans containing CAC and suffer from imbalanced datasets, which decreases the performance of CAC segmentation and scoring. In this study, we extend this approach by incorporating the self-supervised learning (SSL) technique of DINO (self-distillation with no labels) to mitigate the scarcity of annotated CT scans. The DINO model's ability to train without requiring CAC area annotations enhances its robustness in generating distinct features. The DINO model is further trained with labels to focus specifically on calcified areas, aiming to generate features that effectively capture and highlight key characteristics. This label-guided DINO (DINO-LG) enhances classification by distinguishing CT slices that contain calcification from those that do not, performing 57% better than the standard DINO model on this task. CAC scoring and segmentation are then performed by a basic U-NET architecture fed specifically with the CT slices identified by DINO-LG as containing calcified areas. This targeted identification improves CAC segmentation performance by approximately 10% and significantly increases CAC scoring accuracy.
- [93] arXiv:2411.11886 (replaced) [pdf, html, other]
-
Title: How Much Data is Enough? Optimization of Data Collection for Artifact Detection in EEG RecordingsLu Wang-Nöth, Philipp Heiler, Hai Huang, Daniel Lichtenstern, Alexandra Reichenbach, Luis Flacke, Linus Maisch, Helmut MayerComments: Several changes of wording. Caption of figure 10 correctedSubjects: Signal Processing (eess.SP); Machine Learning (cs.LG)
Objective. Electroencephalography (EEG) is a widely used neuroimaging technique known for its cost-effectiveness and user-friendliness. However, various artifacts, particularly biological artifacts like Electromyography (EMG) signals, lead to a poor signal-to-noise ratio, limiting the precision of analyses and applications. The currently reported EEG data cleaning performance largely depends on the data used for validation, and in the case of machine learning approaches, also on the data used for training. The data are typically gathered either by recruiting subjects to perform specific artifact tasks or by integrating existing datasets. Prevailing approaches, however, tend to rely on intuitive, concept-oriented data collection with minimal justification for the selection of artifacts and their quantities. Given the substantial costs associated with biological data collection and the pressing need for effective data utilization, we propose an optimization procedure for data-oriented data collection design using deep learning-based artifact detection. Approach. We apply a binary classification between artifact epochs (time intervals containing artifacts) and non-artifact epochs (time intervals containing no artifact) using three different neural architectures. Our aim is to minimize data collection efforts while preserving the cleaning efficiency. Main results. We were able to reduce the number of artifact tasks from twelve to three and decrease repetitions of isometric contraction tasks from ten to three or sometimes even just one. Significance. Our work addresses the need for effective data utilization in biological data collection, offering a systematic and dynamic quantitative approach. By providing clear justifications for the choices of artifacts and their quantity, we aim to guide future studies toward more effective and economical data collection in EEG and EMG research.
- [94] arXiv:2212.07203 (replaced) [pdf, html, other]
-
Title: Collision-free Source Seeking Control Methods for Unicycle RobotsComments: Published in IEEE Transactions on Automatic ControlSubjects: Robotics (cs.RO); Systems and Control (eess.SY)
In this work, we propose a collision-free source-seeking control framework for a unicycle robot traversing an unknown cluttered environment. In this framework, obstacle avoidance is guided by control barrier functions (CBFs) embedded in a quadratic program, and the source-seeking control relies solely on onboard sensors that measure the signal strength of the source. To tackle the mixed relative degree and avoid an undesired position offset for the nonholonomic unicycle model, we propose a novel construction of a control barrier function that can be directly integrated with our recent gradient-ascent source-seeking control law. We present a rigorous analysis of the approach. The efficacy of the proposed approach is evaluated via Monte-Carlo simulations, as well as in a realistic dynamic environment with moving obstacles in Gazebo/ROS.
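The CBF-embedded quadratic program has a standard shape: minimally modify a nominal input so that the barrier condition holds. A generic sketch with cvxpy, using single-integrator dynamics and a circular obstacle as stand-in assumptions rather than the paper's unicycle construction:

```python
# CBF safety filter: min ||u - u_nom||^2  s.t.  grad(h) . u >= -alpha * h(x),
# here for single-integrator dynamics xdot = u and a circular obstacle,
# with h(x) = ||x - c||^2 - r^2 >= 0 encoding safety.
import numpy as np
import cvxpy as cp

def cbf_filter(x, u_nom, c, r, alpha=1.0):
    h = np.sum((x - c) ** 2) - r ** 2
    grad_h = 2 * (x - c)
    u = cp.Variable(2)
    prob = cp.Problem(cp.Minimize(cp.sum_squares(u - u_nom)),
                      [grad_h @ u >= -alpha * h])
    prob.solve()
    return u.value

x = np.array([1.5, 0.0])                 # robot position
u_nom = np.array([-1.0, 0.0])            # nominal input heads at the obstacle
print(cbf_filter(x, u_nom, c=np.zeros(2), r=1.0))  # braked/deflected input
```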
- [95] arXiv:2301.03641 (replaced) [pdf, html, other]
-
Title: Toward Multi-Layer Networking for Satellite Network OperationsComments: To be published in the Proceedings of 12th Annual IEEE International Conference on Wireless for Space and Extreme Environments (WISEE 2024), Dec. 16 - 18, 2024, Daytona Beach, FL, USASubjects: Networking and Internet Architecture (cs.NI); Systems and Control (eess.SY)
Recent advancements in low-Earth-orbit (LEO) satellites aim to bring resilient, ubiquitous, and high-quality service to future Internet infrastructure. However, the soaring number of space assets, the increasing dynamics of LEO satellites, and the expanding dimensions of network threats call for an enhanced approach to efficient satellite operations. To address these pressing challenges, we propose an approach for satellite network operations based on multi-layer satellite networking (MLSN), called "SatNetOps". Two SatNetOps schemes are proposed, referred to as LEO-LEO MLSN (LLM) and GEO-LEO MLSN (GLM). The performance of the proposed schemes is evaluated in 24-hr satellite scenarios with typical payload setups in simulations, where key metrics such as latency and reliability are discussed with consideration of Consultative Committee for Space Data Systems (CCSDS) standard-compliant telemetry and telecommand missions. Although the SatNetOps approach is promising, we analyze the factors affecting the performance of the LLM and GLM schemes. Discussions of the results and concluding remarks are given at the end.
- [96] arXiv:2305.01626 (replaced) [pdf, html, other]
-
Title: Basic syntax from speech: Spontaneous concatenation in unsupervised deep neural networksSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Computational models of syntax are predominantly text-based. Here we propose that the most basic first step in the evolution of syntax can be modeled directly from raw speech in a fully unsupervised way. We focus on one of the most ubiquitous and elementary suboperations of syntax -- concatenation. We introduce spontaneous concatenation: a phenomenon where convolutional neural networks (CNNs) trained on acoustic recordings of individual words start generating outputs with two or even three words concatenated, without ever accessing data with multiple words in the input. We replicate this finding in several independently trained models with different hyperparameters and training data. Additionally, networks trained on two words learn to embed words into novel unobserved word combinations. We also show that the concatenated outputs contain precursors to compositionality. To our knowledge, this is a previously unreported property of CNNs trained in the ciwGAN/fiwGAN setting on raw speech, and it has implications both for our understanding of how these architectures learn and for modeling syntax and its evolution in the brain from raw acoustic inputs. We also propose a potential neural mechanism called disinhibition that outlines a possible neural pathway towards concatenation and compositionality, suggesting that our modeling is useful for generating testable predictions for biological and artificial neural processing of speech.
- [97] arXiv:2305.11367 (replaced) [pdf, html, other]
-
Title: Smart Pressure e-Mat for Human Sleeping Posture and Dynamic Activity RecognitionSubjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Signal Processing (eess.SP)
With the emphasis on healthcare, early childhood education, and fitness, non-invasive measurement and recognition methods have received more attention. Pressure sensing has been extensively studied because of its advantages of simple structure, easy access, visualization application, and harmlessness. This paper introduces a Smart Pressure e-Mat (SPeM) system based on piezoresistive material, Velostat, for human monitoring applications, including recognition of sleeping postures, sports, and yoga. After a subsystem scans the e-mat readings and processes the signal, it generates a pressure image stream. Deep neural networks (DNNs) are used to fit and train the pressure image stream and recognize the corresponding human behavior. Four sleeping postures and 13 dynamic activities inspired by Nintendo Switch Ring Fit Adventure (RFA) are used as a preliminary validation of the proposed SPeM system. The SPeM system achieves high accuracies in both applications, demonstrating the high accuracy and generalizability of the models. Compared with other pressure sensor-based systems, SPeM possesses more flexible applications and commercial application prospects, with reliable, robust, and repeatable properties.
- [98] arXiv:2402.10115 (replaced) [pdf, html, other]
-
Title: Generating Visual Stimuli from EEG Recordings using Transformer-encoder based EEG encoder and GANSubjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Signal Processing (eess.SP); Neurons and Cognition (q-bio.NC)
In this study, we tackle a modern research challenge within the field of perceptual brain decoding, which revolves around synthesizing images from EEG signals using an adversarial deep learning framework. The specific objective is to recreate images belonging to various object categories by leveraging EEG recordings obtained while subjects view those images. To achieve this, we employ a Transformer-encoder based EEG encoder to produce EEG encodings, which serve as inputs to the generator component of the GAN network. Alongside the adversarial loss, we also incorporate perceptual loss to enhance the quality of the generated images.
- [99] arXiv:2403.09327 (replaced) [pdf, html, other]
-
Title: Perspective-Equivariance for Unsupervised Imaging with Camera GeometryComments: ECCV camera-readySubjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Ill-posed image reconstruction problems appear in many scenarios such as remote sensing, where obtaining high quality images is crucial for environmental monitoring, disaster management and urban planning. Deep learning has seen great success in overcoming the limitations of traditional methods. However, these inverse problems rarely come with ground truth data, highlighting the importance of unsupervised learning from partial and noisy measurements alone. We propose perspective-equivariant imaging (EI), a framework that leverages classical projective camera geometry in optical imaging systems, such as satellites or handheld cameras, to recover information lost in ill-posed camera imaging problems. We show that our much richer non-linear class of group transforms, derived from camera geometry, generalises previous EI work and is an excellent prior for satellite and urban image data. Perspective-EI achieves state-of-the-art results in multispectral pansharpening, outperforming other unsupervised methods in the literature. Code at this https URL.
- [100] arXiv:2407.02182 (replaced) [pdf, html, other]
-
Title: Occlusion-Aware Seamless SegmentationYihong Cao, Jiaming Zhang, Hao Shi, Kunyu Peng, Yuhongxuan Zhang, Hui Zhang, Rainer Stiefelhagen, Kailun YangComments: Accepted to ECCV 2024. The fresh dataset and source code are available at this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO); Image and Video Processing (eess.IV)
Panoramic images can broaden the Field of View (FoV), occlusion-aware prediction can deepen the understanding of the scene, and domain adaptation can transfer across viewing domains. In this work, we introduce a novel task, Occlusion-Aware Seamless Segmentation (OASS), which simultaneously tackles all these three challenges. For benchmarking OASS, we establish a new human-annotated dataset for Blending Panoramic Amodal Seamless Segmentation, i.e., BlendPASS. Besides, we propose the first solution UnmaskFormer, aiming at unmasking the narrow FoV, occlusions, and domain gaps all at once. Specifically, UnmaskFormer includes the crucial designs of Unmasking Attention (UA) and Amodal-oriented Mix (AoMix). Our method achieves state-of-the-art performance on the BlendPASS dataset, reaching a remarkable mAPQ of 26.58% and mIoU of 43.66%. On public panoramic semantic segmentation datasets, i.e., SynPASS and DensePASS, our method outperforms previous methods and obtains 45.34% and 48.08% in mIoU, respectively. The fresh BlendPASS dataset and our source code are available at this https URL.
- [101] arXiv:2409.14489 (replaced) [pdf, html, other]
-
Title: A New Twist on Low-Complexity Digital BackpropagationComments: The manuscript has been submitted to the Journal of Lightwave Technology on November 2024Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
This work proposes a novel low-complexity digital backpropagation (DBP) method, with the goal of optimizing the trade-off between backpropagation accuracy and complexity. The method combines a split-step Fourier method (SSFM)-like structure with a simplified logarithmic perturbation method to obtain high accuracy with a small number of DBP steps. Subband processing and asymmetric steps with optimized splitting ratios are also employed to further reduce the number of steps required to achieve a prescribed performance. The first part of the manuscript is dedicated to the derivation of a simplified logarithmic-perturbation model for the propagation of a signal in an optical fiber, which serves as the basis for the development of the proposed coupled-band enhanced split-step Fourier method (CB-ESSFM) and for the analytical calculation of the model coefficients. Next, the manuscript presents a DSP algorithm for the implementation of DBP based on a discrete-time version of the model and an overlap-and-save processing strategy. Practical approaches for the optimization of the coefficients used in the algorithm and of the splitting ratio of the asymmetric steps are also discussed. A detailed analysis of the computational complexity is presented. Finally, the performance and complexity of the proposed DBP method are investigated through simulations. In a five-channel 100 GHz-spaced wavelength division multiplexing system over a 15x80 km single-mode-fiber link, the proposed CB-ESSFM achieves a gain of about 1 dB over simple dispersion compensation with only 15 steps (corresponding to 681 real multiplications per 2D symbol), an improvement of 0.9 dB over the conventional SSFM and almost 0.4 dB over our previously proposed ESSFM. Significant gains and improvements are obtained also at lower complexity. A similar analysis is performed for longer links, confirming the good performance of the proposed method.
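For reference, the SSFM structure that CB-ESSFM builds on alternates linear (dispersive) steps in the frequency domain with nonlinear phase rotations in the time domain; a minimal single-polarization sketch with illustrative parameters (one common sign convention of the NLSE; DBP runs the same steps with negated parameters):

```python
# One symmetric split-step Fourier step for the scalar NLSE: half a dispersion
# step in the frequency domain, a full Kerr nonlinear phase rotation in time,
# then the second half dispersion step.
import numpy as np

def ssfm_step(A, dz, dt, beta2=-21.7e-27, gamma=1.3e-3):
    w = 2 * np.pi * np.fft.fftfreq(A.size, d=dt)   # angular frequency grid
    half_disp = np.exp(0.25j * beta2 * w**2 * dz)  # exp(j*beta2*w^2*(dz/2)/2)
    A = np.fft.ifft(np.fft.fft(A) * half_disp)
    A = A * np.exp(1j * gamma * np.abs(A)**2 * dz) # Kerr nonlinearity
    return np.fft.ifft(np.fft.fft(A) * half_disp)

# Digital backpropagation applies the same steps to the received signal with
# negated parameters (beta2 -> -beta2, gamma -> -gamma) to undo the channel.
t = np.linspace(-10e-12, 10e-12, 1024)
A = np.exp(-0.5 * (t / 2e-12) ** 2).astype(complex)  # Gaussian pulse
out = ssfm_step(A, dz=1e3, dt=t[1] - t[0])
print(np.abs(out).max())  # peak drops slightly as dispersion broadens the pulse
```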
- [102] arXiv:2411.03127 (replaced) [pdf, html, other]
-
Title: Receiver-Centric Generative Semantic CommunicationsComments: Demo video has been made available at: this https URLSubjects: Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP)
This paper investigates semantic communications between a transmitter and a receiver, where original data, such as videos of interest to the receiver, is stored at the transmitter. Although significant progress has been made in semantic communications, a fundamental design problem is that the semantic information is extracted based on criteria chosen at the transmitter alone, without considering the receiver's specific information needs. As a result, critical information of primary concern to the receiver may be lost. In such cases, the semantic transmission becomes meaningless to the receiver, as all received information is irrelevant to its interests. To solve this problem, this paper presents a receiver-centric generative semantic communication system, where each transmission is initialized by the receiver. Specifically, the receiver first sends its request for the desired semantic information to the transmitter at the start of each transmission. Then, the transmitter extracts the required semantic information accordingly. A key challenge is how the transmitter understands the receiver's requests for semantic information and extracts the required semantic information in a reasonable and robust manner. We address this challenge by designing a well-structured framework and leveraging off-the-shelf generative AI products, such as GPT-4, along with several specialized tools for detection and estimation. Evaluation results demonstrate the feasibility and effectiveness of the proposed new semantic communication system.
- [103] arXiv:2411.06317 (replaced) [pdf, html, other]
-
Title: Harpocrates: A Statically Typed Privacy Conscious Programming FrameworkComments: Draft workSubjects: Cryptography and Security (cs.CR); Systems and Control (eess.SY)
In this paper, we introduce Harpocrates, a compiler plugin and framework pair for Scala that binds privacy policies to data at creation time in the form of oblivious membranes. Harpocrates eliminates raw data for a policy-protected type from the application, ensuring it can only exist in protected form, and centralizes policy checking at the policy declaration site, making the privacy logic easy to maintain and verify. Instead of approaching privacy from an information flow verification perspective, Harpocrates allows the data to flow freely throughout the application inside the policy membranes, but enforces the policies when the data is accessed, mutated, declassified, or passed across the application boundary. The centralization of the policies allows maintainers to change the enforced logic simply by updating a single function, while keeping the rest of the application oblivious to the change. Especially in a setting where the data definition is shared by multiple applications, the publisher can update the policies without requiring the dependent applications to make any changes beyond updating the dependency version.
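To illustrate the membrane idea in language-neutral terms, here is a toy Python analogue of the concept (ours, not Harpocrates' Scala API): the raw value never leaves its wrapper, and the single policy declared with the type mediates every crossing of the membrane.

```python
# Toy analogue of a policy membrane: the protected value only exists inside
# the wrapper, and the one policy declared with the type decides every
# crossing of the membrane. Updating the policy touches a single place.
class PolicyViolation(Exception):
    pass

class Protected:
    def __init__(self, value, policy):
        self.__value = value          # name-mangled; no raw-data field exposed
        self.__policy = policy

    def map(self, fn):
        # Data flows freely *inside* the membrane, staying protected.
        return Protected(fn(self.__value), self.__policy)

    def declassify(self, context: str):
        # The membrane is crossed only here, under the declared policy.
        if not self.__policy(context):
            raise PolicyViolation(f"{context!r} may not read this value")
        return self.__value

email = Protected("Alice@Example.com", policy=lambda ctx: ctx == "billing")
masked = email.map(str.lower)          # allowed: the result stays protected
print(masked.declassify("billing"))    # alice@example.com
try:
    masked.declassify("analytics")
except PolicyViolation as e:
    print("blocked:", e)
```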
- [104] arXiv:2411.07603 (replaced) [pdf, html, other]
-
Title: $\mathscr{H}_2$ Model Reduction for Linear Quantum SystemsComments: 13 pages,3 figuresSubjects: Quantum Physics (quant-ph); Systems and Control (eess.SY)
In this paper, an $\mathscr{H}_2$ norm-based model reduction method for linear quantum systems is presented, which obtains a physically realizable reduced-order model that closely approximates the original system. The model reduction problem is posed as an optimization problem whose objective is the $\mathscr{H}_2$ norm of the difference between the transfer function of the original system and that of the reduced one. Unlike classical model reduction problems, physical realizability conditions, which guarantee that the reduced-order system is also a quantum system, must be included as nonlinear constraints in the optimization. To solve the optimization problem with such nonlinear constraints, we employ a matrix inequality approach that transforms the nonlinear inequality constraints into readily solvable linear matrix inequalities (LMIs) and nonlinear equality constraints, so that the optimization problem can be solved by a lifting-variables approach. We emphasize that, unlike existing work, which only introduces a criterion to evaluate performance after model reduction, our method is guided to obtain a reduced model that is optimal with respect to the $\mathscr{H}_2$ norm. In addition, the approach is extended to passive linear quantum systems. Finally, examples of active and passive linear quantum systems validate the efficacy of the proposed method.
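For context, the $\mathscr{H}_2$ norm appearing in the objective can be computed classically from a Lyapunov equation: for a stable realization (A, B, C), $\|G\|_{\mathscr{H}_2}^2 = \mathrm{tr}(C P C^T)$ with $AP + PA^T + BB^T = 0$; applied to the error system, this evaluates the reduction objective. A small sketch of that standard computation (not the quantum-constrained optimization itself):

```python
# H2 norm of G(s) = C (sI - A)^{-1} B via the controllability Gramian:
# solve A P + P A^T + B B^T = 0, then ||G||_H2^2 = trace(C P C^T).
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def h2_norm(A, B, C):
    P = solve_continuous_lyapunov(A, -B @ B.T)  # controllability Gramian
    return np.sqrt(np.trace(C @ P @ C.T))

A = np.array([[-1.0, 0.5], [0.0, -2.0]])
B = np.array([[1.0], [1.0]])
C = np.array([[1.0, 0.0]])
print(h2_norm(A, B, C))
```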
- [105] arXiv:2411.10592 (replaced) [pdf, html, other]
-
Title: A Systematic LMI Approach to Design Multivariable Sliding Mode ControllersComments: 8 pages, 4 figuresSubjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
This paper deals with sliding mode control for multivariable polytopic uncertain systems. We provide systematic procedures to design variable structure controllers (VSCs) and unit-vector controllers (UVCs). Based on suitable representations for the closed-loop system, we derive sufficient conditions in the form of linear matrix inequalities (LMIs) to design the robust sliding mode controllers such that the origin of the closed-loop system is globally stable in finite time. Moreover, by noticing that the reaching time depends on the initial condition and the decay rate, we provide convex optimization problems to design robust controllers by considering the minimization of the reaching time associated with a given set of initial conditions. Two examples illustrate the effectiveness of the proposed approaches.
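As a reminder of the LMI machinery involved, such feasibility conditions are solved numerically with semidefinite programming; a generic sketch with cvxpy, using a plain Lyapunov-stability LMI as a stand-in for the paper's sliding-mode design conditions:

```python
# Generic LMI feasibility: find P = P^T > 0 with A^T P + P A < 0, certifying
# stability -- the same SDP machinery used for the paper's design conditions.
import numpy as np
import cvxpy as cp

A = np.array([[0.0, 1.0], [-2.0, -3.0]])
P = cp.Variable((2, 2), symmetric=True)
eps = 1e-6
constraints = [P >> eps * np.eye(2),
               A.T @ P + P @ A << -eps * np.eye(2)]
prob = cp.Problem(cp.Minimize(0), constraints)
prob.solve()
print(prob.status, np.round(P.value, 3))
```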
- [106] arXiv:2411.12254 (replaced) [pdf, html, other]
-
Title: Predicting User Intents and Musical Attributes from Music Discovery ConversationsComments: 8 pages, 4 figuresSubjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Intent classification is a text understanding task that identifies user needs from input text queries. While intent classification has been extensively studied in various domains, it has received little attention in the music domain. In this paper, we investigate intent classification models for music discovery conversation, focusing on pre-trained language models. Rather than predicting only functional needs (intent classification), we also include a task for classifying musical needs (musical attribute classification). Additionally, we propose a method of concatenating previous chat history with the single-turn user query in the input text, allowing the model to better understand the overall conversation context. Our proposed model significantly improves the F1 score for both user intent and musical attribute classification, and surpasses the zero-shot and few-shot performance of the pretrained Llama 3 model.
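The history-concatenation idea reduces to a simple input construction before tokenization; one plausible formatting is sketched below, where the separator scheme and the backbone tokenizer are our own assumptions:

```python
# Concatenate prior turns with the current user query so a standard text
# classifier sees the whole conversation context. The separator tokens and
# the backbone model are illustrative assumptions, not the paper's setup.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def build_input(history: list, query: str, max_length: int = 256):
    text = " [SEP] ".join(history + [query])
    return tokenizer(text, truncation=True, max_length=max_length,
                     return_tensors="pt")

history = ["I want something mellow for studying",
           "More acoustic than the last one please"]
enc = build_input(history, "maybe with female vocals?")
print(enc["input_ids"].shape)  # ready for a sequence-classification head
```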