Electrical Engineering and Systems Science

New submissions

New submissions for Thu, 2 May 24

[1]  arXiv:2405.00055 [pdf, other]
Title: A Hybrid Probabilistic Battery Health Management Approach for Robust Inspection Drone Operations
Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Health monitoring of remote critical infrastructure is a complex and expensive activity due to the limited infrastructure accessibility. Inspection drones are ubiquitous assets that enhance the reliability of critical infrastructures through improved accessibility. However, due to the harsh operation environment, it is crucial to monitor their health to ensure successful inspection operations. The battery is a key component that determines the overall reliability of the inspection drones and, with an appropriate health management approach, contributes to reliable and robust inspections. In this context, this paper presents a novel hybrid probabilistic approach for battery end-of-discharge (EOD) voltage prediction of Li-Po batteries. The hybridization is achieved in an error-correction configuration, which combines physics-based discharge and probabilistic error-correction models to quantify the aleatoric and epistemic uncertainty. The performance of the hybrid probabilistic methodology was empirically evaluated on a dataset comprising EOD voltage under varying load conditions. The dataset was obtained from real inspection drones operated on different flights, focused on offshore wind turbine inspections. The proposed approach has been tested with different probabilistic methods and demonstrates 14.8% improved performance in probabilistic accuracy compared to the best probabilistic method. In addition, aleatoric and epistemic uncertainties provide robust estimations to enhance the diagnosis of battery health-states.

[2]  arXiv:2405.00056 [pdf, other]
Title: Age of Information Minimization using Multi-agent UAVs based on AI-Enhanced Mean Field Resource Allocation
Comments: 13 pages, 6 figures. arXiv admin note: substantial text overlap with arXiv:2312.09953
Subjects: Systems and Control (eess.SY); Computer Science and Game Theory (cs.GT)

Unmanned Aerial Vehicle (UAV) swarms play an effective role in timely data collection from ground sensors in remote and hostile areas. Optimizing the collective behavior of swarms can improve data collection performance. This paper puts forth a new mean field flight resource allocation optimization to minimize age of information (AoI) of sensory data, where balancing the trade-off between the UAVs' movements and AoI is formulated as a mean field game (MFG). The MFG optimization yields an expansive solution space encompassing continuous state and action, resulting in significant computational complexity. To address practical situations, we propose a new mean field hybrid proximal policy optimization (MF-HPPO) scheme to minimize the average AoI by optimizing the UAVs' trajectories and data collection scheduling of the ground sensors given mixed continuous and discrete actions. Furthermore, a long short-term memory (LSTM) network is leveraged in MF-HPPO to predict the time-varying network state and stabilize the training. Numerical results demonstrate that the proposed MF-HPPO reduces the average AoI by up to 45 percent and 57 percent in the considered simulation setting, as compared to the multi-agent deep Q-learning (MADQN) method and a non-learning random algorithm, respectively.
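To make the AoI objective in the abstract above concrete, the following is a minimal sketch of how a time-average AoI could be computed for a single ground sensor in discrete time. It is not the paper's MF-HPPO scheme. The function name `average_aoi`, the slot-based timing, and the rule that the age resets to zero in a slot where the sensor's data is collected are all assumptions made for illustration.

```python
def average_aoi(collection_slots, horizon):
    """Time-average age of information of one sensor over `horizon` slots.

    Assumed model (illustrative only): the age grows by one per slot and
    resets to zero in any slot where a UAV collects a fresh sample.
    """
    if horizon <= 0:
        raise ValueError("horizon must be positive")
    collected = set(collection_slots)
    age = 0
    total = 0
    for t in range(horizon):
        if t in collected:
            age = 0   # fresh sample collected in this slot
        else:
            age += 1  # no collection: the stored data gets one slot older
        total += age
    return total / horizon


# Collecting more often lowers the average age.
frequent = average_aoi([0, 2, 4], horizon=6)
rare = average_aoi([0], horizon=4)
```

Under this toy model, a scheduler that visits the sensor more often yields a lower average age, which is the quantity the abstract's optimization trades off against UAV movement.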

[3]  arXiv:2405.00069 [pdf, other]
Title: Estimation of Time-to-Total Knee Replacement Surgery
Comments: 11 pages, 3 figures, 4 tables, submitted to a conference
Subjects: Image and Video Processing (eess.IV)

A survival analysis model for predicting time-to-total knee replacement (TKR) was developed using features from medical images and clinical measurements. Supervised and self-supervised deep learning approaches were utilized to extract features from radiographs and magnetic resonance images. Extracted features were combined with clinical and image assessments for survival analysis using random survival forests. The proposed model demonstrated high discrimination power by combining deep learning features and clinical and image assessments using a fusion of multiple modalities. The model achieved an accuracy of 75.6% and a C-Index of 84.8% for predicting the time-to-TKR surgery. Accurate time-to-TKR predictions have the potential to help physicians personalize treatment strategies and improve patient outcomes.
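The C-Index reported above measures how well predicted risks order the observed event times. The sketch below shows one common form, Harrell's concordance index, in plain Python. It is an illustration and not the paper's evaluation code. The function name `concordance_index` and the convention that a higher risk score means an earlier expected event are assumptions.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored data.

    times:  observed time for each subject
    events: 1 if the event (e.g. surgery) was observed, 0 if censored
    risks:  predicted risk score (assumed: higher risk = earlier event)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i had an observed event
            # strictly before subject j's observed time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: the third subject is censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
perfect = concordance_index(times, events, [0.9, 0.7, 0.5, 0.1])
reversed_order = concordance_index(times, events, [0.1, 0.5, 0.7, 0.9])
```

A value of 1.0 means every comparable pair is ordered correctly, 0.5 corresponds to random ordering, and 0.0 means every pair is ordered incorrectly.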

[4]  arXiv:2405.00075 [pdf, ps, other]
Title: Charting the Path Forward: CT Image Quality Assessment -- An In-Depth Review
Subjects: Image and Video Processing (eess.IV)

Computed Tomography (CT) is a frequently utilized imaging technology that is employed in the clinical diagnosis of many disorders. However, the huge volume of CT data with non-homogeneous imaging quality poses major challenges for clinical diagnosis, data storage, and management. As a result, the quality assessment of CT images is a crucial problem that demands consideration. The history, advancements in research, and current developments in CT image quality assessment (IQA) are examined in this paper. In this review, we collected and studied more than 500 CT-IQA publications published before August 2023. We also provide a visualization analysis of keywords and co-citations in the knowledge graph of these papers. Prospects and obstacles for the continued development of CT-IQA are also covered. At present, significant research branches in the CT-IQA domain include phantom studies, artificial intelligence deep-learning reconstruction algorithms, dose reduction opportunities, and virtual monoenergetic reconstruction. Artificial intelligence (AI)-based CT-IQA is also becoming a trend. It increases the accuracy of the CT scanning apparatus, amplifies the impact of the CT system reconstruction algorithm, and creates an effective algorithm for post-processing CT images. AI-based medical IQA offers excellent application opportunities in clinical work. AI can provide uniform quality assessment criteria and more comprehensive guidance among various healthcare facilities, and encourage them to recognize one another's images. It will help lower the number of unnecessary tests and associated costs, and enhance the quality of medical imaging and assessment efficiency.

[5]  arXiv:2405.00121 [pdf, other]
Title: Indoor Synthetic Aperture Radar Measurements of Point-Like Targets Using a Wheeled Mobile Robot
Comments: 6 pages, 11 figures. This paper was presented at the 15th European Conference on Synthetic Aperture Radar (EUSAR 2024), 23-26 Apr 2024, Munich, Germany
Subjects: Signal Processing (eess.SP)

Small, low-cost radar sensors offer a lighting-independent sensing capability for indoor mobile robots that is useful for localization and mapping. Synthetic aperture radar (SAR) offers an attractive way to increase the angular resolution of small radar sensors for use on mobile robots to generate high-resolution maps of the indoor environment. This work quantifies the maximum synthesizable aperture length of our mobile robot measurement setup using radar-inertial odometry localization and offers insights into challenges for robotic millimeter-wave SAR imaging.

[6]  arXiv:2405.00130 [pdf, other]
Title: A Flexible 2.5D Medical Image Segmentation Approach with In-Slice and Cross-Slice Attention
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Deep learning has become the de facto method for medical image segmentation, with 3D segmentation models excelling in capturing complex 3D structures and 2D models offering high computational efficiency. However, segmenting 2.5D images, which have high in-plane but low through-plane resolution, is a relatively unexplored challenge. While applying 2D models to individual slices of a 2.5D image is feasible, it fails to capture the spatial relationships between slices. On the other hand, 3D models face challenges such as resolution inconsistencies in 2.5D images, along with computational complexity and susceptibility to overfitting when trained with limited data. In this context, 2.5D models, which capture inter-slice correlations using only 2D neural networks, emerge as a promising solution due to their reduced computational demand and simplicity in implementation. In this paper, we introduce CSA-Net, a flexible 2.5D segmentation model capable of processing 2.5D images with an arbitrary number of slices through an innovative Cross-Slice Attention (CSA) module. This module uses the cross-slice attention mechanism to effectively capture 3D spatial information by learning long-range dependencies between the center slice (for segmentation) and its neighboring slices. Moreover, CSA-Net utilizes the self-attention mechanism to understand correlations among pixels within the center slice. We evaluated CSA-Net on three 2.5D segmentation tasks: (1) multi-class brain MRI segmentation, (2) binary prostate MRI segmentation, and (3) multi-class prostate MRI segmentation. CSA-Net outperformed leading 2D and 2.5D segmentation methods across all three tasks, demonstrating its efficacy and superiority. Our code is publicly available at https://github.com/mirthAI/CSA-Net.

[7]  arXiv:2405.00141 [pdf, ps, other]
Title: RIS-aided Wireless Communication with Movable Elements Geometry Impact on Performance
Comments: 5 pages, 4 figures
Subjects: Systems and Control (eess.SY)

Reconfigurable Intelligent Surfaces (RIS) are known as a promising technology to improve the performance of wireless communication networks, and have been extensively studied. Movable Antennas (MA) are a novel technology that fully exploits the antenna placement for enhancing the system performance. This article aims at evaluating the impact of transmit power and number of antenna elements on the outage probability performance of an MA-enabled RIS structure (MA-RIS), compared to existing Fixed-Position Antenna RIS (FPA-RIS). The change in geometry caused by the movement of antennas and its implications for the effective number of illuminated elements are studied for 1D and 2D array structures. Our numerical results confirm the performance advantage provided by MA-RIS, achieving 24% improvement in outage probability and 2 dB gain in Signal-to-Noise Ratio (SNR), as compared to FPA-RIS.

[8]  arXiv:2405.00157 [pdf, other]
Title: Information-Theoretic Opacity-Enforcement in Markov Decision Processes
Subjects: Systems and Control (eess.SY)

The paper studies information-theoretic opacity, an information-flow privacy property, in a setting involving two agents: a planning agent who controls a stochastic system and an observer who partially observes the system states. The goal of the observer is to infer some secret, represented by a random variable, from its partial observations, while the goal of the planning agent is to make the secret maximally opaque to the observer while achieving a satisfactory total return. Modeling the stochastic system as a Markov decision process, we consider two classes of opacity properties: last-state opacity ensures that the observer is uncertain whether the last state is in a specific set, and initial-state opacity ensures that the observer is unsure of the realization of the initial state. As the measure of opacity, we employ the Shannon conditional entropy, which captures the information about the secret revealed by the observable. Then, we develop primal-dual policy gradient methods for opacity-enforcement planning subject to constraints on total returns. We propose novel algorithms to compute the policy gradient of entropy for each observation, leveraging message passing within the hidden Markov models. This gradient computation enables us to have stable and fast convergence. We demonstrate our solution of opacity-enforcement control through a grid world example.
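The opacity measure named in the abstract above, the Shannon conditional entropy of a secret S given an observation O, is H(S|O) = -sum over (s, o) of P(s, o) log2( P(s, o) / P(o) ). The sketch below evaluates it for a small joint distribution table. It illustrates the measure only and not the paper's policy gradient algorithm. The function name `conditional_entropy` and the list-of-lists table layout are assumptions.

```python
import math


def conditional_entropy(joint):
    """Shannon conditional entropy H(S | O) in bits.

    joint[s][o] is the joint probability P(S = s, O = o); all entries
    are assumed to be non-negative and to sum to one.
    """
    num_obs = len(joint[0])
    # Marginal P(O = o), obtained by summing the joint over the secret.
    p_obs = [sum(row[o] for row in joint) for o in range(num_obs)]
    h = 0.0
    for row in joint:
        for o, p in enumerate(row):
            if p > 0:  # zero-probability terms contribute nothing
                h -= p * math.log2(p / p_obs[o])
    return h


# Observation independent of a uniform binary secret: fully opaque.
opaque = conditional_entropy([[0.25, 0.25], [0.25, 0.25]])
# Observation determines the secret: nothing stays hidden.
revealed = conditional_entropy([[0.5, 0.0], [0.0, 0.5]])
```

In the first case the observer learns nothing and one full bit of uncertainty remains. In the second the observation identifies the secret and the conditional entropy is zero, so a planner maximizing this quantity pushes toward the first situation.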

[9]  arXiv:2405.00180 [pdf, other]
Title: Heart Rate and Body Temperature Relationship in Children Admitted to PICU -- A Machine Learning Approach
Comments: In preprint. Under review
Subjects: Signal Processing (eess.SP)

Vital signs have been essential clinical measures. Among these, body temperature (BT) and heart rate (HR) are particularly significant, and numerous studies have explored their association in hospitalized adults and children. However, a lack of in-depth research persists in children admitted to the pediatric intensive care unit (PICU) despite their critical condition requiring particular attention. Objective: In this study, we explore the relationship between HR and BT in children from 0 to 18 years old admitted to the PICU of CHU Sainte-Justine Hospital. Methods: We applied machine learning (ML) techniques to unravel subtle patterns and dependencies within our dataset to achieve this objective. Each algorithm undergoes meticulous hyperparameter tuning to optimize the model performance. Results: Our findings align with prior research, revealing a consistent trend of decreasing HR with increasing patient age, confirming the observed inverse correlation. Furthermore, a thorough analysis identifies Gradient Boosting Machines (GBM) implemented with Quantile Regression (QR) as the most fitting model, effectively capturing the non-linear relationship between HR, BT, and age. In testing the HR prediction model based on age and BT, the predictive model between the 5th and 95th percentiles accurately demonstrates the declining trend of HR with age, while HR increases with BT. Based on that, we have developed a user-friendly interface tailored to generate HR predictions at different percentiles based on three key input parameters: current HR, current BT, and patient's age. The resulting output enables caregivers to quickly determine whether a patient's HR falls within or outside the normal range, facilitating informed clinical decision-making. Thus, our results challenge previous studies' presumed direct linear association between HR and BT.
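Quantile regression, as mentioned in the abstract above, is typically trained by minimizing the pinball (quantile) loss, which penalizes under- and over-prediction asymmetrically so that the fitted value tracks a chosen percentile instead of the mean. The sketch below shows that loss in plain Python. It is an illustration of the standard loss and not the authors' model. The function name `pinball_loss` and the sample heart-rate values are hypothetical.

```python
def pinball_loss(y_true, y_pred, q):
    """Mean pinball loss for quantile level q, with 0 < q < 1.

    For each sample the loss is q * (y - y_hat) when the prediction is
    too low and (1 - q) * (y_hat - y) when it is too high.
    """
    if not 0 < q < 1:
        raise ValueError("q must lie strictly between 0 and 1")
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, y_hat in zip(y_true, y_pred):
        diff = y - y_hat
        total += max(q * diff, (q - 1) * diff)
    return total / len(y_true)


# Hypothetical heart-rate values in beats per minute.
under = pinball_loss([100.0], [90.0], q=0.95)   # prediction too low
over = pinball_loss([100.0], [110.0], q=0.95)   # prediction too high
```

At the 95th percentile, predicting 10 bpm too low costs far more than predicting 10 bpm too high, which is what pushes the fitted curve up toward the upper percentile band.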

[10]  arXiv:2405.00239 [pdf, other]
Title: IgCONDA-PET: Implicitly-Guided Counterfactual Diffusion for Detecting Anomalies in PET Images
Comments: 12 pages, 6 figures, 1 table
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Minimizing the need for pixel-level annotated data for training PET anomaly segmentation networks is crucial, particularly due to time and cost constraints related to expert annotations. Current un-/weakly-supervised anomaly detection methods rely on autoencoder or generative adversarial networks trained only on healthy data, although these are more challenging to train. In this work, we present a weakly supervised and Implicitly guided COuNterfactual diffusion model for Detecting Anomalies in PET images, branded as IgCONDA-PET. The training is conditioned on image class labels (healthy vs. unhealthy) along with implicit guidance to generate counterfactuals for an unhealthy image with anomalies. The counterfactual generation process synthesizes the healthy counterpart for a given unhealthy image, and the difference between the two facilitates the identification of anomaly locations. The code is available at: https://github.com/igcondapet/IgCONDA-PET.git

[11]  arXiv:2405.00345 [pdf, ps, other]
Title: Multi-task Learning-based Joint CSI Prediction and Predictive Transmitter Selection for Security
Subjects: Signal Processing (eess.SP)

In mobile communication scenarios, the acquired channel state information (CSI) rapidly becomes outdated due to fast-changing channels. Opportunistic transmitter selection based on current CSI for secrecy improvement may be outdated during actual transmission, negating the diversity benefit of transmitter selection. Motivated by this problem, we propose a joint CSI prediction and predictive selection of the optimal transmitter strategy based on historical CSI by exploiting the temporal correlation among CSIs. The proposed solution utilizes the multi-task learning (MTL) framework by employing a single Long Short-Term Memory (LSTM) network architecture that simultaneously learns two tasks of predicting the CSI and selecting the optimal transmitter in parallel instead of learning these tasks sequentially. The proposed LSTM architecture outperforms a convolutional neural network (CNN)-based architecture due to its superior ability to capture temporal features in the data. Compared to the sequential task learning models, the MTL architecture provides superior predicted secrecy performance for a large variation in the number of transmitters and the speed of mobile nodes. It also offers significant computational and memory efficiency, leading to a substantial saving in computational time of around 40 percent.

[12]  arXiv:2405.00372 [pdf, other]
Title: High-Precision Positioning with Continuous Delay and Doppler Shift using AFT-MC Waveforms
Subjects: Signal Processing (eess.SP)

This paper explores a novel integrated localization and communication (ILAC) system using the affine Fourier transform multicarrier (AFT-MC) waveform. Specifically, we consider a multiple-input multiple-output (MIMO) AFT-MC system with ILAC and derive a continuous delay and Doppler shift channel matrix model. Based on the derived signal model, we develop a two-step algorithm with low complexity for estimating channel parameters. Furthermore, we derive the Cramér-Rao lower bound (CRLB) of location estimation as the fundamental limit of localization. Finally, we provide some insights about the AFT-MC parameters by explaining the impact of the parameters on localization performance. Simulation results demonstrate that the AFT-MC waveform is able to provide significant localization performance improvement compared to orthogonal frequency division multiplexing (OFDM) while achieving the CRLB of location estimation.

[13]  arXiv:2405.00407 [pdf, other]
Title: Compressive Sensing Imaging Using Caustic Lens Mask Generated by Periodic Perturbation in a Ripple Tank
Comments: 6 Pages, 3 Figures, 1 Table
Subjects: Signal Processing (eess.SP)

Terahertz imaging shows significant potential across diverse fields, yet the cost-effectiveness of multi-pixel imaging equipment remains an obstacle for many researchers. To tackle this issue, single-pixel imaging arises as a lower-cost option; however, the data collection process necessary for reconstructing images is time-consuming. Compressive Sensing offers a promising solution by enabling image generation with fewer measurements than required by Nyquist's theorem, yet long processing times remain an issue, especially for large-sized images. Our proposed solution to this issue involves using the caustic lens effect induced by perturbations in a ripple tank as a sampling mask. The dynamic characteristics of the ripple tank introduce randomness into the sampling process, thereby reducing measurement time through exploitation of the inherent sparsity of THz band signals. In this study, a Convolutional Neural Network was used to conduct target classification, based on the distinctive signal patterns obtained via the caustic lens mask. The suggested classifier obtained a 95.16% accuracy rate in differentiating targets resembling Latin letters.

[14]  arXiv:2405.00472 [pdf, other]
Title: DmADs-Net: Dense multiscale attention and depth-supervised network for medical image segmentation
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Deep learning has made important contributions to the development of medical image segmentation. Convolutional neural networks, as a crucial branch, have attracted strong attention from researchers. Through the tireless efforts of numerous researchers, convolutional neural networks have yielded numerous outstanding algorithms for processing medical images. The ideas and architectures of these algorithms have also provided important inspiration for the development of later technologies. Through extensive experimentation, we have found that currently mainstream deep learning algorithms are not always able to achieve ideal results when processing complex datasets and different types of datasets. These networks still have room for improvement in lesion localization and feature extraction. Therefore, we have created the Dense Multiscale Attention and Depth-Supervised Network (DmADs-Net). We use ResNet for feature extraction at different depths and create a Multi-scale Convolutional Feature Attention Block to improve the network's attention to weak feature information. The Local Feature Attention Block is created to enable enhanced local feature attention for high-level semantic information. In addition, in the feature fusion phase, a Feature Refinement and Fusion Block is created to enhance the fusion of different semantic information. We validated the performance of the network using five datasets of varying sizes and types. Results from comparative experiments show that DmADs-Net outperformed mainstream networks. Ablation experiments further demonstrated the effectiveness of the created modules and the rationality of the network architecture.

[15]  arXiv:2405.00542 [pdf, other]
Title: UWAFA-GAN: Ultra-Wide-Angle Fluorescein Angiography Transformation via Multi-scale Generation and Registration Enhancement
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Fundus photography, in combination with the ultra-wide-angle fundus (UWF) techniques, has become an indispensable diagnostic tool in clinical settings by offering a more comprehensive view of the retina. Nonetheless, UWF fluorescein angiography (UWF-FA) necessitates the administration of a fluorescent dye via injection into the patient's hand or elbow, unlike UWF scanning laser ophthalmoscopy (UWF-SLO). To mitigate potential adverse effects associated with injections, researchers have proposed the development of cross-modality medical image generation algorithms capable of converting UWF-SLO images into their UWF-FA counterparts. Current image generation techniques applied to fundus photography encounter difficulties in producing high-resolution retinal images, particularly in capturing minute vascular lesions. To address these issues, we introduce a novel conditional generative adversarial network (UWAFA-GAN) to synthesize UWF-FA from UWF-SLO. This approach employs multi-scale generators and an attention transmit module to efficiently extract both global structures and local lesions. Additionally, to counteract the image blurriness issue that arises from training with misaligned data, a registration module is integrated within this framework. Our method performs well on inception scores and detail generation. Clinical user studies further indicate that the UWF-FA images generated by UWAFA-GAN are clinically comparable to authentic images in terms of diagnostic reliability. Empirical evaluations on our proprietary UWF image datasets show that UWAFA-GAN outperforms existing methods. The code is accessible at https://github.com/Tinysqua/UWAFA-GAN.

[16]  arXiv:2405.00567 [pdf, other]
Title: Remote Sensing Data Assimilation with a Chained Hydrologic-hydraulic Model for Flood Forecasting
Comments: 13 pages, 14 figures. Submitted to the IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Subjects: Image and Video Processing (eess.IV)

A chained hydrologic-hydraulic model is implemented using predicted runoff from a large-scale hydrologic model (namely ISBA-CTRIP) as inputs to local hydrodynamic models (TELEMAC-2D) to issue forecasts of water level and flood extent. The uncertainties in the hydrological forcing and in friction parameters are reduced by an Ensemble Kalman Filter that jointly assimilates in-situ water levels and flood extent maps derived from remote sensing observations. The data assimilation framework is cycled in a real-time forecasting configuration. A cycle consists of a reanalysis and a forecast phase. During the reanalysis phase, observations up to the present are assimilated. An ensemble is then initialized from the last analyzed states and used to issue forecasts for the next 36 hr. Three strategies of forcing data for this forecast are investigated: (i) using CTRIP runoff for reanalysis and forecast, (ii) using observed discharge for reanalysis, then CTRIP runoff for forecast, and (iii) using observed discharge for reanalysis and keeping a persistent discharge value for forecast. It was shown that the data assimilation strategy provides a reliable reanalysis in hindcast mode. The combination of observed discharge and CTRIP runoff provides the most accurate results. For all strategies, the quality of the forecast decreases as the lead time increases. When the errors in CTRIP forcing are non-stationary, the forecast capability may be reduced. This work demonstrates that the forcing provided by a hydrologic model, while imperfect, can be efficiently used as input to a hydraulic model to issue reanalysis and forecasts, thanks to the assimilation of in-situ and remote sensing observations.

[17]  arXiv:2405.00572 [pdf, other]
Title: A Modular Pragmatic Architecture for Multiuser MIMO with Array-Fed RIS
Comments: 5 pages, 8 figures
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT); Systems and Control (eess.SY)

We propose a power- and hardware-efficient, pragmatic, modular, multiuser/multibeam array-fed RIS architecture particularly suited to operate in very high frequency bands (high mmWave and sub-THz), where channels are typically sparse in the beamspace and line-of-sight (LOS) is required to achieve an acceptable received signal level. The key module is an active multi-antenna feeder (AMAF) with a small number of active antennas placed in the near field of a RIS with a much larger number of passive controllable reflecting elements. We propose a pragmatic approach to obtain a steerable beam with high gain and very low sidelobes. Then, $K$ independently controlled beams can be achieved by stacking $K$ of such AMAF-RIS modules. Our analysis takes into full account: 1) the near-end crosstalk (NEXT) between the modules; 2) the far-end crosstalk (FEXT) due to the sidelobes; 3) a thorough energy efficiency comparison with respect to conventional active arrays with the same beamforming performance. Overall, we show that the proposed architecture is very attractive in terms of spectral efficiency, ease of implementation (hardware complexity), and energy efficiency.

[18]  arXiv:2405.00598 [pdf, ps, other]
Title: Pseudo-noise pulse-compression thermography: a powerful tool for time-domain thermography analysis
Comments: 24 pages, 20 figures
Subjects: Signal Processing (eess.SP); Applied Physics (physics.app-ph)

Pulse-compression is a correlation-based measurement technique successfully used in many NDE applications to increase the SNR in the presence of huge noise, strong signal attenuation or when high excitation levels must be avoided. In thermography, the pulse-compression approach was first introduced in 2005 by Mulaveesala and co-workers, and then further developed by Mandelis and co-authors, who applied to thermography the concept of the thermal-wave radar developed for photothermal measurements. Since then, many measurement schemes and applications have been reported in the literature by several groups using various heating sources, coded excitation signals, and processing algorithms. The variety of such techniques is known as pulse-compression thermography or thermal-wave radar imaging. Despite the continuous improvement of these techniques over the years, the advantages of using a correlation-based approach in thermography are still not fully exploited and recognized by the community. This is because, up to now, the reconstructed thermograms' time sequences after pulse-compression were affected by so-called sidelobes. This is a severe drawback since it hampers an easy interpretation of the data and their comparison with other thermography techniques. To overcome this issue and unleash the full potential of the approach, this paper shows how it is possible to implement a pulse-compression thermography procedure capable of suppressing any sidelobe by using a pseudo-noise excitation and a proper processing algorithm.

[19]  arXiv:2405.00627 [pdf, other]
Title: Koopman-based Deep Learning for Nonlinear System Estimation
Comments: 11 pages
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)

Nonlinear differential equations are encountered as models of fluid flow, spiking neurons, and many other systems of interest in the real world. Common features of these systems are that their behaviors are difficult to describe exactly and invariably unmodeled dynamics present challenges in making precise predictions. In many cases the models exhibit extremely complicated behavior due to bifurcations and chaotic regimes. In this paper, we present a novel data-driven linear estimator that uses Koopman operator theory to extract finite-dimensional representations of complex nonlinear systems. The extracted model is used together with a deep reinforcement learning network that learns the optimal stepwise actions to predict future states of the original nonlinear system. Our estimator is also adaptive to a diffeomorphic transformation of the nonlinear system which enables transfer learning to compute state estimates of the transformed system without relearning from scratch.

[20]  arXiv:2405.00637 [pdf, ps, other]
Title: A Distributed Model Identification Algorithm for Multi-Agent Systems
Comments: 6 pages, 4 figures
Subjects: Systems and Control (eess.SY)

In this study, we investigate an agent-based approach for system model identification with an emphasis on power distribution system applications. Departing from conventional practices of relying on historical data for offline model identification, we adopt an online update approach utilizing real-time data by employing the latest data points for gradient computation. This methodology offers advantages including a large reduction in the communication network's bandwidth requirements by minimizing the data exchanged at each iteration and enabling the model to adapt in real-time to disturbances. Furthermore, we extend our model identification process from linear frameworks to more complex non-linear convex models. This extension is validated through numerical studies demonstrating improved control performance for a synthetic IEEE test case.

Cross-lists for Thu, 2 May 24

[21]  arXiv:2405.00003 (cross-list from cs.DC) [pdf, other]
Title: TALICS$^3$: Tape Library Cloud Storage System Simulator
Comments: 19 pages, 11 figures. Submitted to Simulation Modelling Practice and Theory
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Systems and Control (eess.SY)

High performance computing data is surging fast into the exabyte-scale world, where tape libraries are the main platform for long-term durable data storage besides high-cost DNA. Tape libraries are extremely hard to model, but accurate modeling is critical for system administrators to obtain valid performance estimates for their designs. This research introduces a discrete event tape simulation platform that realistically models tape library behavior in a networked cloud environment, by incorporating real-world phenomena and effects. The platform addresses several challenges, including precise estimation of data access latency, rates of robot exchange, data collocation, deduplication/compression ratio, and attainment of durability goals through replication or erasure coding. The suggested simulator has the capability to compare the single enterprise configuration with multiple commodity library (RAIL) configurations, making it a useful tool for system administrators and reliability engineers. They can use the simulator to obtain practical and reliable performance estimates for their long-term, durable, and cost-effective cold data storage architecture designs.

[22]  arXiv:2405.00027 (cross-list from cs.CV) [pdf, other]
Title: Multidimensional Compressed Sensing for Spectral Light Field Imaging
Comments: 8 pages, published in VISAPP 2024
Journal-ref: In Proceedings of the 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP 2024, ISBN 978-989-758-679-8, ISSN 2184-4321, pages 349-356
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG); Image and Video Processing (eess.IV)

This paper considers a compressive multi-spectral light field camera model that utilizes a one-hot spectral-coded mask and a microlens array to capture spatial, angular, and spectral information using a single monochrome sensor. We propose a model that employs compressed sensing techniques to reconstruct the complete multi-spectral light field from undersampled measurements. Unlike previous work where a light field is vectorized to a 1D signal, our method employs a 5D basis and a novel 5D measurement model, hence matching the intrinsic dimensionality of multi-spectral light fields. We mathematically and empirically show the equivalence of 5D and 1D sensing models, and most importantly that the 5D framework achieves orders of magnitude faster reconstruction while requiring a small fraction of the memory. Moreover, our new multidimensional sensing model opens new research directions for designing efficient visual data acquisition algorithms and hardware.

[23]  arXiv:2405.00031 (cross-list from cs.CV) [pdf, other]
Title: SegNet: A Segmented Deep Learning based Convolutional Neural Network Approach for Drones Wildfire Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)

This research addresses the pressing challenge of improving processing times and detection capabilities in Unmanned Aerial Vehicle (UAV)/drone imagery for global wildfire detection, despite limited datasets. Proposing a Segmented Neural Network (SegNet) selection approach, we focus on reducing feature maps to boost both time resolution and accuracy, significantly advancing processing speed and accuracy in real-time wildfire detection. This paper contributes increased processing speeds that enable real-time wildfire detection, increased wildfire detection accuracy, and improved detection of early wildfire, by proposing a new direction for image classification of amorphous objects such as fire, water, and smoke. We employ Convolutional Neural Networks (CNNs) for image classification, emphasizing the reduction of irrelevant features, which is vital for deep learning processes, especially on live feed data for fire detection. Amidst the complexity of live feed data in fire detection, our study focuses on the image feed, highlighting the urgency of enhancing real-time processing. Our proposed algorithm combats feature overload through segmentation, addressing challenges arising from diverse features like objects, colors, and textures. Notably, a delicate balance of feature map size and dataset adequacy is pivotal. Several research papers use smaller image sizes, compromising feature richness, which necessitates a new approach. We illuminate the critical role of pixel density in retaining essential details, especially for early wildfire detection. By carefully selecting the number of filters during training, we underscore the significance of higher pixel density for proper feature selection. The proposed SegNet approach is rigorously evaluated using a real-world dataset obtained by a drone flight and compared to state-of-the-art literature.

[24]  arXiv:2405.00077 (cross-list from cs.LG) [pdf, other]
Title: BrainODE: Dynamic Brain Signal Analysis via Graph-Aided Neural Ordinary Differential Equations
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)

Brain network analysis is vital for understanding the neural interactions regarding brain structures and functions, and identifying potential biomarkers for clinical phenotypes. However, widely used brain signals such as Blood Oxygen Level Dependent (BOLD) time series generated from functional Magnetic Resonance Imaging (fMRI) often manifest three challenges: (1) missing values, (2) irregular samples, and (3) sampling misalignment, due to instrumental limitations, impacting downstream brain network analysis and clinical outcome predictions. In this work, we propose a novel model called BrainODE to achieve continuous modeling of dynamic brain signals using Ordinary Differential Equations (ODE). By learning latent initial values and neural ODE functions from irregular time series, BrainODE effectively reconstructs brain signals at any time point, mitigating the aforementioned three data challenges of brain signals altogether. Comprehensive experimental results on real-world neuroimaging datasets demonstrate the superior performance of BrainODE and its capability of addressing the three data challenges.

[25]  arXiv:2405.00135 (cross-list from cs.IT) [pdf, other]
Title: Improving Channel Resilience for Task-Oriented Semantic Communications: A Unified Information Bottleneck Approach
Comments: This work has been submitted to the IEEE Communications Letters
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

Task-oriented semantic communications (TSC) enhance radio resource efficiency by transmitting task-relevant semantic information. However, current research often overlooks the inherent semantic distinctions among encoded features. Due to unavoidable channel variations from time and frequency-selective fading, semantically sensitive feature units could be more susceptible to erroneous inference if corrupted by dynamic channels. Therefore, this letter introduces a unified channel-resilient TSC framework via information bottleneck. This framework complements existing TSC approaches by controlling information flow to capture fine-grained feature-level semantic robustness. Experiments on a case study for real-time subchannel allocation validate the framework's effectiveness.

[26]  arXiv:2405.00136 (cross-list from cs.LG) [pdf, other]
Title: Data-Driven Permissible Safe Control with Barrier Certificates
Subjects: Machine Learning (cs.LG); Robotics (cs.RO); Systems and Control (eess.SY)

This paper introduces a method of identifying a maximal set of safe strategies from data for stochastic systems with unknown dynamics using barrier certificates. The first step is learning the dynamics of the system via Gaussian process (GP) regression and obtaining probabilistic errors for this estimate. Then, we develop an algorithm for constructing piecewise stochastic barrier functions to find a maximal permissible strategy set using the learned GP model, which is based on sequentially pruning the worst controls until a maximal set is identified. The permissible strategies are guaranteed to maintain probabilistic safety for the true system. This is especially important for learning-enabled systems, because a rich strategy space enables additional data collection and complex behaviors while remaining safe. Case studies on linear and nonlinear systems demonstrate that increasing the size of the dataset for learning the system grows the permissible strategy set.

[27]  arXiv:2405.00213 (cross-list from cs.LG) [pdf, other]
Title: Block-As-Domain Adaptation for Workload Prediction from fNIRS Data
Subjects: Machine Learning (cs.LG); Human-Computer Interaction (cs.HC); Signal Processing (eess.SP)

Functional near-infrared spectroscopy (fNIRS) is a non-intrusive way to measure cortical hemodynamic activity. Predicting cognitive workload from fNIRS data has taken on a diffuse set of methods. To be applicable in real-world settings, models are needed that perform well across different sessions as well as different subjects. However, most existing works assume that training and testing data come from the same subjects and/or cannot generalize well to never-before-seen subjects. Additional challenges imposed by fNIRS data include the high variations in inter-subject fNIRS data and also in intra-subject data collected across different blocks of sessions. To address these issues, we propose an effective method, referred to as class-aware-block-aware domain adaptation (CABA-DA), which explicitly minimizes intra-session variance by viewing different blocks from the same subject and session as different domains. We minimize the intra-class domain discrepancy and maximize the inter-class domain discrepancy accordingly. In addition, we propose an MLPMixer-based model for cognitive load classification. Experimental results demonstrate that the proposed model performs better than three different baseline models on three publicly available datasets of cognitive workload. Two of them are collected from n-back tasks and one of them is from finger tapping. Our experiments also show that the proposed contrastive learning method can improve the baseline models we compared against.

[28]  arXiv:2405.00230 (cross-list from math.OC) [pdf, other]
Title: A decomposition-based approach for large-scale pickup and delivery problems
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)

With the advent of self-driving cars, experts envision autonomous mobility-on-demand services in the near future to cope with overloaded transportation systems in cities worldwide. Efficient operations are imperative to unlock such a system's maximum improvement potential. Existing approaches either consider a narrow planning horizon or ignore essential characteristics of the underlying problem. In this paper, we develop an algorithmic framework that allows the study of very large-scale pickup and delivery routing problems with more than 20 thousand requests, which arise in the context of integrated request pooling and vehicle-to-request dispatching. We conduct a computational study and present comparative results showing the characteristics of the developed approaches. Furthermore, we apply our algorithm to related benchmark instances from the literature to show its efficacy. Finally, we solve very large-scale instances and derive insights on upper-bound improvements regarding fleet sizing and customer delay acceptance from a practical perspective.

[29]  arXiv:2405.00233 (cross-list from cs.SD) [pdf, other]
Title: SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound
Comments: Demo and code: this https URL
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)

Large language models (LLMs) have significantly advanced audio processing through audio codecs that convert audio into discrete tokens, enabling the application of language modelling techniques to audio data. However, traditional codecs often operate at high bitrates or within narrow domains such as speech and lack the semantic clues required for efficient language modelling. Addressing these challenges, we introduce SemantiCodec, a novel codec designed to compress audio into fewer than a hundred tokens per second across diverse audio types, including speech, general audio, and music, without compromising quality. SemantiCodec features a dual-encoder architecture: a semantic encoder using a self-supervised AudioMAE, discretized using k-means clustering on extensive audio data, and an acoustic encoder to capture the remaining details. The semantic and acoustic encoder outputs are used to reconstruct audio via a diffusion-model-based decoder. SemantiCodec is presented in three variants with token rates of 25, 50, and 100 per second, supporting a range of ultra-low bit rates between 0.31 kbps and 1.43 kbps. Experimental results demonstrate that SemantiCodec significantly outperforms the state-of-the-art Descript codec on reconstruction quality. Our results also suggest that SemantiCodec contains significantly richer semantic information than all evaluated audio codecs, even at significantly lower bitrates. Our code and demos are available at https://haoheliu.github.io/SemantiCodec/.

[30]  arXiv:2405.00248 (cross-list from cs.SD) [pdf, other]
Title: Who is Authentic Speaker
Authors: Qiang Huang
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)

Voice conversion (VC) using deep learning technologies can now generate high quality one-to-many voices and thus has been used in some practical application fields, such as entertainment and healthcare. However, voice conversion can pose potential social issues when manipulated voices are employed for deceptive purposes. Moreover, it is a big challenge to find who the real speakers are from the converted voices, as the acoustic characteristics of source speakers are changed greatly. In this paper, we attempt to explore the feasibility of identifying authentic speakers from converted voices. This study is conducted with the assumption that certain information from the source speakers persists, even when their voices undergo conversion into different target voices. Therefore, our experiments are geared towards recognising the source speakers given the converted voices, which are generated by using FragmentVC on the randomly paired utterances from source and target speakers. To improve the robustness against converted voices, our recognition model is constructed by using hierarchical vector of locally aggregated descriptors (VLAD) in deep neural networks. The authentic speaker recognition system is mainly tested in two aspects: the impact of the quality of converted voices and the variations of VLAD. The dataset used in this work is the VCTK corpus, where source and target speakers are randomly paired. The results obtained on the converted utterances show promising performance in recognising authentic speakers from converted voices.

[31]  arXiv:2405.00259 (cross-list from physics.med-ph) [pdf, ps, other]
Title: Optimization of Dark-Field CT for Lung Imaging
Subjects: Medical Physics (physics.med-ph); Image and Video Processing (eess.IV)

Background: X-ray grating-based dark-field imaging can sense the small angle scattering caused by an object's micro-structure. This technique is sensitive to the lung's porous alveoli and is able to detect lung disease at an early stage. Up to now, a human-scale dark-field CT has been built for lung imaging. Purpose: This study aimed to develop a more thorough optimization method for dark-field lung CT and summarize principles for system design. Methods: We proposed a metric in the form of contrast-to-noise ratio (CNR) for system parameter optimization, and designed a phantom with a concentric circle shape to fit the task of lung disease detection. Finally, we developed the calculation method of the CNR metric, and analyzed the relation between CNR and system parameters. Results: We showed that with other parameters held constant, the CNR first increases and then decreases with the system auto-correlation length (ACL). The optimal ACL is almost unaffected by the system's visibility, and is only related to the phantom's properties, i.e., the scattering material's size and the phantom's absorption. For our phantom, the optimal ACL is about 0.21 $\mu$m. As for system geometry, larger source-detector and isocenter-detector distances can increase the system's maximal ACL, helping the system meet the optimal ACL more easily. Conclusions: This study proposed a more reasonable metric and a task-based process for optimization, and demonstrated that the system's optimal ACL is only related to the phantom's properties.

[32]  arXiv:2405.00307 (cross-list from cs.SD) [pdf, other]
Title: Active Learning with Task Adaptation Pre-training for Speech Emotion Recognition
Comments: Accepted by Journal of Natural Language Processing. arXiv admin note: text overlap with arXiv:2310.00283
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)

Speech emotion recognition (SER) has garnered increasing attention due to its wide range of applications in various fields, including human-machine interaction, virtual assistants, and mental health assistance. However, existing SER methods often overlook the information gap between the pre-training speech recognition task and the downstream SER task, resulting in sub-optimal performance. Moreover, current methods require much time for fine-tuning on each specific speech dataset, such as IEMOCAP, which limits their effectiveness in real-world scenarios with large-scale noisy data. To address these issues, we propose an active learning (AL)-based fine-tuning framework for SER, called After, that leverages task adaptation pre-training (TAPT) and AL methods to enhance performance and efficiency. Specifically, we first use TAPT to minimize the information gap between the pre-training speech recognition task and the downstream speech emotion recognition task. Then, AL methods are employed to iteratively select a subset of the most informative and diverse samples for fine-tuning, thereby reducing time consumption. Experiments demonstrate that our proposed method After, using only 20% of samples, improves accuracy by 8.45% and reduces time consumption by 79%. The additional extension of After and ablation studies further confirm its effectiveness and applicability to various real-world scenarios. Our source code is available on GitHub for reproducibility (https://github.com/Clearloveyuan/AFTER).

[33]  arXiv:2405.00316 (cross-list from cs.RO) [pdf, other]
Title: Enhance Planning with Physics-informed Safety Controllor for End-to-end Autonomous Driving
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

Recent years have seen a growing research interest in applications of Deep Neural Networks (DNN) to autonomous vehicle technology. The trend started with perception and prediction a few years ago, and DNNs are gradually being applied to motion planning tasks. Although network performance improves over time, DNN planners inherit the natural drawbacks of Deep Learning. Learning-based planners have limitations in achieving perfect accuracy on the training dataset, and network performance can be affected by the out-of-distribution problem. In this paper, we propose FusionAssurance, a novel trajectory-based end-to-end driving fusion framework which incorporates physics-informed control for safety assurance. By incorporating Potential Field into Model Predictive Control, FusionAssurance is capable of navigating through scenarios that are not included in the training dataset and scenarios where neural networks fail to generalize. The effectiveness of the approach is demonstrated by extensive experiments under various scenarios on the CARLA benchmark.

[34]  arXiv:2405.00365 (cross-list from cs.IT) [pdf, other]
Title: Robust Continuous-Time Beam Tracking with Liquid Neural Network
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

Millimeter-wave (mmWave) technology is increasingly recognized as a pivotal technology of the sixth-generation communication networks due to the large amounts of available spectrum at high frequencies. However, the huge overhead associated with beam training imposes a significant challenge in mmWave communications, particularly in urban environments with high background noise. To reduce this high overhead, we propose a novel solution for robust continuous-time beam tracking with a liquid neural network, which dynamically adjusts the narrow mmWave beams to ensure real-time beam alignment with mobile users. Through extensive simulations, we validate the effectiveness of our proposed method and demonstrate its superiority over existing state-of-the-art deep-learning-based approaches. Specifically, our scheme achieves up to 46.9% higher normalized spectral efficiency than the baselines when the user is moving at 5 m/s, demonstrating the potential of liquid neural networks to enhance mmWave mobile communication performance.

[35]  arXiv:2405.00367 (cross-list from cs.IR) [pdf, other]
Title: Distance Sampling-based Paraphraser Leveraging ChatGPT for Text Data Manipulation
Comments: Accepted at SIGIR 2024 short paper track
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)

There has been growing interest in audio-language retrieval research, where the objective is to establish the correlation between audio and text modalities. However, most audio-text paired datasets lack rich expression of the text data compared to the audio samples. One of the significant challenges facing audio-text datasets is the presence of similar or identical captions despite different audio samples. Therefore, under many-to-one mapping conditions, audio-text datasets lead to poor performance on retrieval tasks. In this paper, we propose a novel approach to tackle the data imbalance problem in audio-language retrieval tasks. To overcome the limitation, we introduce a method that employs a distance sampling-based paraphraser leveraging ChatGPT, utilizing a distance function to generate a controllable distribution of manipulated text data. For a set of sentences with the same context, the distance is used to calculate a degree of manipulation for any two sentences, and ChatGPT's few-shot prompting is performed using a text cluster with a similar distance defined by the Jaccard similarity. Therefore, ChatGPT, when applied to few-shot prompting with text clusters, can adjust the diversity of the manipulated text based on the distance. The proposed approach is shown to significantly enhance performance in audio-text retrieval, outperforming conventional text augmentation techniques.
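
To make the distance notion in this abstract concrete, here is a minimal sketch of a Jaccard-based distance between two captions. The abstract only states that the distance is defined by the Jaccard similarity; the whitespace tokenization, lowercasing, and the function name `jaccard_distance` are assumptions for illustration and may differ from the paper's setup.

```python
def jaccard_distance(sentence_a, sentence_b):
    """Return 1 - |A & B| / |A | B| over the word sets of two sentences."""
    # Assumed preprocessing: lowercase and split on whitespace.
    tokens_a = set(sentence_a.lower().split())
    tokens_b = set(sentence_b.lower().split())
    union = tokens_a | tokens_b
    if not union:
        # Two empty sentences are treated as identical.
        return 0.0
    return 1.0 - len(tokens_a & tokens_b) / len(union)


# Hypothetical captions: a larger distance means a stronger manipulation.
d_same = jaccard_distance("a dog barks", "a dog barks")
d_diff = jaccard_distance("a dog barks", "a cat meows")
```

Under this sketch, identical captions have distance 0, and captions sharing no words have distance 1, so the value can serve as the degree of manipulation between two sentences.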

[36]  arXiv:2405.00384 (cross-list from cs.CV) [pdf, other]
Title: Visual and audio scene classification for detecting discrepancies in video: a baseline method and experimental protocol
Comments: Accepted for publication, 3rd ACM Int. Workshop on Multimedia AI against Disinformation (MAD'24) at ACM ICMR'24, June 10, 2024, Phuket, Thailand. This is the "accepted version"
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)

This paper presents a baseline approach and an experimental protocol for a specific content verification problem: detecting discrepancies between the audio and video modalities in multimedia content. We first design and optimize an audio-visual scene classifier, to compare with existing classification baselines that use both modalities. Then, by applying this classifier separately to the audio and the visual modality, we can detect scene-class inconsistencies between them. To facilitate further research and provide a common evaluation platform, we introduce an experimental protocol and a benchmark dataset simulating such inconsistencies. Our approach achieves state-of-the-art results in scene classification and promising outcomes in audio-visual discrepancy detection, highlighting its potential in content verification applications.

[37]  arXiv:2405.00387 (cross-list from cs.NI) [pdf, other]
Title: Cell Switching in HAPS-Aided Networking: How the Obscurity of Traffic Loads Affects the Decision
Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG); Systems and Control (eess.SY)

This study aims to introduce the cell load estimation problem of cell switching approaches in cellular networks, specifically presented in a high-altitude platform station (HAPS)-assisted network. The problem arises from the fact that the traffic loads of sleeping base stations for the next time slot cannot be perfectly known but can only be estimated, and any estimation error could result in divergence from the optimal decision, which subsequently affects energy efficiency performance. The traffic loads of the sleeping base stations for the next time slot are required because the switching decisions are made proactively in the current time slot. Two different Q-learning algorithms are developed; one is full-scale, focusing solely on the performance, while the other one is lightweight and addresses the computational cost. Results confirm that the estimation error is capable of changing cell switching decisions, which yields performance divergence compared to no-error scenarios. Moreover, the developed Q-learning algorithms perform well, since an insignificant difference (i.e., 0.3%) is observed between them and the optimum algorithm.

[38]  arXiv:2405.00389 (cross-list from math.OC) [pdf, other]
Title: Employing Federated Learning for Training Autonomous HVAC Systems
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Systems and Control (eess.SY)

Buildings account for 40% of global energy consumption. A considerable portion of building energy consumption stems from heating, ventilation, and air conditioning (HVAC), and thus implementing smart, energy-efficient HVAC systems has the potential to significantly impact the course of climate change. In recent years, model-free reinforcement learning algorithms have been increasingly assessed for this purpose due to their ability to learn and adapt purely from experience. They have been shown to outperform classical controllers in terms of energy cost and consumption, as well as thermal comfort. However, their weakness lies in their relatively poor data efficiency, requiring long periods of training to reach acceptable policies, making them inapplicable to real-world controllers directly. Hence, common research goals are to improve the learning speed, as well as to improve their ability to generalize, in order to facilitate transfer learning to unseen building environments. In this paper, we take a federated learning approach to training the reinforcement learning controller of an HVAC system. A global control policy is learned by aggregating local policies trained on multiple data centers located in different climate zones. The goal of the policy is to simultaneously minimize energy consumption and maximize thermal comfort. The federated optimization strategy indirectly increases both the rate at which experience data is collected and the variation in the data. We demonstrate through experimental evaluation that these effects lead to a faster learning speed, as well as greater generalization capabilities in the federated policy compared to any individually trained policy.
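
The aggregation step mentioned in this abstract can be pictured with a small sketch. The abstract does not say which aggregation rule is used, so the weighted parameter average below (in the style of federated averaging) is an assumption, as are the function name `federated_average` and the representation of each local policy as a flat list of floats.

```python
def federated_average(local_params, weights=None):
    """Combine per-site policy parameters into one global parameter vector.

    local_params: one flat list of floats per site (assumed representation).
    weights: optional per-site weights, e.g. amounts of local experience.
    """
    if weights is None:
        # Default assumption: every site contributes equally.
        weights = [1.0] * len(local_params)
    total_weight = sum(weights)
    n_params = len(local_params[0])
    return [
        sum(w * params[i] for w, params in zip(weights, local_params)) / total_weight
        for i in range(n_params)
    ]


# Hypothetical usage with two sites in different climate zones.
global_uniform = federated_average([[1.0, 2.0], [3.0, 4.0]])
global_weighted = federated_average([[1.0, 2.0], [3.0, 4.0]], weights=[1.0, 3.0])
```

In a training loop, the global vector would typically be sent back to each site as the starting point for the next round of local training.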

[39]  arXiv:2405.00391 (cross-list from cs.IT) [pdf, other]
Title: Beamforming Inferring by Conditional WGAN-GP for Holographic Antenna Arrays
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

The beamforming technology with large holographic antenna arrays is one of the key enablers for the next generation of wireless systems, which can significantly improve the spectral efficiency. However, the deployment of large antenna arrays implies high algorithm complexity and resource overhead at both receiver and transmitter ends. To address this issue, advanced technologies such as artificial intelligence have been developed to reduce beamforming overhead. Intuitively, if we can implement near-optimal beamforming using only a tiny subset of all the channel information, the overhead for channel estimation and beamforming would be reduced significantly compared with traditional beamforming methods, which usually need full channel information and the inversion of a large-dimensional matrix. In light of this idea, we propose a novel scheme that utilizes a Wasserstein generative adversarial network with gradient penalty to infer the full beamforming matrices based on very little channel information. Simulation results confirm that it can achieve performance comparable to the weighted minimum mean-square error algorithm, while reducing the overhead by over 50%.

[40]  arXiv:2405.00426 (cross-list from cs.CR) [pdf, other]
Title: On the Potential of RIS in the Context of PLA in Wireless Communication Systems
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC); Signal Processing (eess.SP)

Re-configurable Intelligent Surfaces (RIS) technology has proven itself a promising candidate for the next generation of wireless networks through its enhanced performance in terms of throughput, spectral, and energy efficiency. However, the broadcast nature of RIS-assisted wireless communication makes it vulnerable to malicious attacks at the physical layer. On the other hand, physical layer authentication is an emerging area in the security domain to thwart different attacks such as cloning, spoofing, and impersonation by using the random features of the physical layer. In this paper, we investigate RIS-assisted wireless communication systems to unlock the potential of using RIS for physical layer authentication (PLA). Specifically, we exploit two distinct features of the physical layer: pathloss and channel impulse response (CIR) for PLA in RIS-assisted wireless communication. We construct hypothesis tests for the estimated features and derive closed-form expressions for the errors. Further, we choose the critical error, i.e., missed detection, as our objective function for minimization by optimizing the phase shift of the RIS panel. We compare the performance of our proposed mechanisms with baseline mechanisms, which are PLA schemes using the same features but with no RIS assistance. Furthermore, we thoroughly evaluate our proposed schemes using performance metrics such as the probability of false alarm (PFA), the probability of missed detection (PMD), and the receiver operating characteristic (ROC) curves. The results demonstrate the significant positive impact of RIS on PLA, as it effectively reduces PMD values to zero when determining the optimal phase shift.

[41]  arXiv:2405.00447 (cross-list from math.OC) [pdf, other]
Title: A Modelling Framework for Energy-Management and Eco-Driving Problems using Convex Relaxations
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)

This paper presents a convex optimization framework for eco-driving and vehicle energy management problems. We will first show that several types of eco-driving and vehicle energy management problems can be modelled using the same notions of energy storage buffers and energy storage converters that are connected to a power network. It will be shown that these problems can be formulated as optimization problems with linear cost functions and linear dynamics, and nonlinear constraints representing the power converters. We will show that under some mild conditions, the (non-convex) optimization problem has the same (globally) optimal solution as a convex relaxation. This means that the problems can be solved efficiently and that the solution is guaranteed to be globally optimal. Finally, a numerical example of the eco-driving problem is used to illustrate this claim.

[42]  arXiv:2405.00495 (cross-list from math.NA) [pdf, other]
Title: The Loewner framework for parametric systems: Taming the curse of dimensionality
Comments: 32 pages, 4 figures
Subjects: Numerical Analysis (math.NA); Systems and Control (eess.SY)

The Loewner framework is an interpolatory framework for the approximation of linear and nonlinear systems. The purpose here is to extend this framework to linear parametric systems with an arbitrary number n of parameters. One main innovation established here is the construction of data-based realizations for any number of parameters. Equally importantly, we show how to alleviate the computational burden, by avoiding the explicit construction of large-scale n-dimensional Loewner matrices of size $N \times N$. This reduces the complexity from $O(N^3)$ to about $O(N^{1.4})$, thus taming the curse of dimensionality and making the solution scalable to very large data sets. To achieve this, a new generalized multivariate rational function realization is defined. Then, we introduce the n-dimensional multivariate Loewner matrices and show that they can be computed by solving a coupled set of Sylvester equations. The null space of these Loewner matrices then allows the construction of the multivariate barycentric transfer function. The principal result of this work is to show how the null space of the n-dimensional Loewner matrix can be computed using a sequence of 1-dimensional Loewner matrices, leading to a drastic computational burden reduction. Finally, we suggest two algorithms (one direct and one iterative) to construct, directly from data, multivariate (or parametric) realizations ensuring (approximate) interpolation. Numerical examples highlight the effectiveness and scalability of the method.
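
For readers unfamiliar with the building block this abstract refers to, the following sketch constructs a standard 1-dimensional (single-variable) Loewner matrix from two disjoint sets of sample points, with entries $(v_i - w_j)/(\mu_i - \lambda_j)$. It does not implement the paper's n-dimensional construction or its algorithms; the function name `loewner_matrix` and the test function $H(s) = 1/(s+1)$ are assumptions chosen for illustration.

```python
def loewner_matrix(mu, v, lam, w):
    """Build the Loewner matrix with entries (v_i - w_j) / (mu_i - lam_j).

    mu, v: left sample points and the values measured there.
    lam, w: right sample points and the values measured there.
    The two point sets must be disjoint so no denominator is zero.
    """
    return [
        [(v_i - w_j) / (mu_i - lam_j) for lam_j, w_j in zip(lam, w)]
        for mu_i, v_i in zip(mu, v)
    ]


def H(s):
    """Hypothetical first-order transfer function used to generate data."""
    return 1.0 / (s + 1.0)


mu = [1.0, 2.0]
lam = [3.0, 4.0]
L = loewner_matrix(mu, [H(s) for s in mu], lam, [H(s) for s in lam])
# For a first-order system the 2x2 Loewner matrix is rank deficient,
# so its determinant vanishes up to rounding.
det_L = L[0][0] * L[1][1] - L[0][1] * L[1][0]
```

The rank deficiency seen here is what makes the null space of the Loewner matrix informative about the underlying system, which is the property the abstract builds on in the multivariate case.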

[43]  arXiv:2405.00577 (cross-list from cs.LG) [pdf, ps, other]
Title: Discovering robust biomarkers of neurological disorders from functional MRI using graph neural networks: A Review
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Neurons and Cognition (q-bio.NC)

Graph neural networks (GNNs) have emerged as a popular tool for modelling functional magnetic resonance imaging (fMRI) datasets. Many recent studies have reported significant improvements in disorder classification performance via more sophisticated GNN designs and highlighted salient features that could be potential biomarkers of the disorder. In this review, we provide an overview of how GNNs and model explainability techniques have been applied to fMRI datasets for disorder prediction tasks, with a particular emphasis on the robustness of biomarkers produced for neurodegenerative diseases and neuropsychiatric disorders. We found that while most studies report performant models, the salient features they highlight vary greatly across studies of the same disorder, and little has been done to evaluate their robustness. To address these issues, we suggest establishing new standards that are based on objective evaluation metrics to determine the robustness of these potential biomarkers. We further highlight gaps in the existing literature and put together a prediction-attribution-evaluation framework that could set the foundations for future research on improving the robustness of potential biomarkers discovered via GNNs.

[44]  arXiv:2405.00603 (cross-list from cs.SD) [pdf, other]
Title: Learning Expressive Disentangled Speech Representations with Soft Speech Units and Adversarial Style Augmentation
Comments: Accepted by the 2024 International Joint Conference on Neural Networks (IJCNN 2024)
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

Voice conversion is the task of transforming the voice characteristics of source speech while preserving its content information. Self-supervised representation learning models are increasingly used for content extraction. However, these representations retain hidden speaker information, which leads to timbre leakage, while the prosodic information in the hidden units goes unused. To address these issues, we propose a novel framework for expressive voice conversion called "SAVC", based on soft speech units from HuBert-soft. Taking soft speech units as input, we design an attribute encoder to extract content and prosody features, respectively. Specifically, we first introduce statistic perturbation imposed by adversarial style augmentation to eliminate speaker information. Then the prosody is implicitly modeled on soft speech units with knowledge distillation. Experimental results show that the intelligibility and naturalness of the converted speech outperform previous work.

[45]  arXiv:2405.00665 (cross-list from cs.IT) [pdf, other]
Title: Optimizing Profitability in Timely Gossip Networks
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)

We consider a communication system where a group of users, interconnected in a bidirectional gossip network, wishes to follow a time-varying source, e.g., updates on an event, in real time. The users wish to maintain their expected version ages below a threshold, and can either rely on gossip from their neighbors or, if gossip does not meet the timeliness requirements, subscribe directly to a server publishing about the event. The server wishes to maximize its profit by increasing subscriptions from users and minimizing event sampling frequency to reduce costs. This leads to a Stackelberg game between the server and the users, where the server is the leader deciding its sampling frequency and the users are the followers deciding their subscription strategies. We investigate equilibrium strategies for low-connectivity and high-connectivity topologies.

[46]  arXiv:2405.00670 (cross-list from cs.CV) [pdf, other]
Title: Adapting Pretrained Networks for Image Quality Assessment on High Dynamic Range Displays
Comments: 7 pages, 3 figures, 3 tables. Submitted to Human Vision and Electronic Imaging 2024 (HVEI)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Conventional image quality metrics (IQMs), such as PSNR and SSIM, are designed for perceptually uniform gamma-encoded pixel values and cannot be directly applied to perceptually non-uniform linear high-dynamic-range (HDR) colors. Similarly, most of the available datasets consist of standard-dynamic-range (SDR) images collected in standard and possibly uncontrolled viewing conditions. Popular pre-trained neural networks are likewise intended for SDR inputs, restricting their direct application to HDR content. On the other hand, training HDR models from scratch is challenging due to limited available HDR data. In this work, we explore more effective approaches for training deep learning-based models for image quality assessment (IQA) on HDR data. We leverage networks pre-trained on SDR data (source domain) and re-target these models to HDR (target domain) with additional fine-tuning and domain adaptation. We validate our methods on the available HDR IQA datasets, demonstrating that models trained with our combined recipe outperform previous baselines, converge much quicker, and reliably generalize to HDR inputs.
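
As background for the first claim, the following minimal sketch shows the standard PSNR definition, $10 \log_{10}(\mathrm{peak}^2 / \mathrm{MSE})$. The `peak=255.0` default assumes 8-bit gamma-encoded SDR pixel values, and the sample pixel lists are made up for illustration. Linear HDR luminance has no comparable fixed peak and is not perceptually uniform, which is one way to read the abstract's point that such metrics do not transfer directly.

```python
import math


def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB for two equal-length pixel sequences.

    `peak` is the maximum representable pixel value; 255.0 assumes
    8-bit gamma-encoded SDR data (an assumption of this sketch).
    """
    if len(reference) != len(test):
        raise ValueError("inputs must have the same length")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(peak ** 2 / mse)


# Made-up 8-bit pixel values, for illustration only.
reference_pixels = [0, 128, 255, 64]
distorted_pixels = [2, 126, 255, 66]
score = psnr(reference_pixels, distorted_pixels)
print(f"PSNR: {score:.2f} dB")
```

The same function applied to linear HDR values would require choosing `peak` arbitrarily, so scores would not be comparable across displays or content.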

Replacements for Thu, 2 May 24

[47]  arXiv:2109.10561 (replaced) [pdf, ps, other]
Title: A Few-Shot Learning Approach for Sound Source Distance Estimation Using Relation Networks
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[48]  arXiv:2208.02355 (replaced) [pdf, other]
Title: CAVE: Cerebral Artery-Vein Segmentation in Digital Subtraction Angiography
Comments: Published in Computerized Medical Imaging and Graphics
Subjects: Image and Video Processing (eess.IV)
[49]  arXiv:2211.00219 (replaced) [pdf, other]
Title: TITAN: Bringing The Deep Image Prior to Implicit Representations
Comments: 6 pages, 4 figures
Subjects: Image and Video Processing (eess.IV); Signal Processing (eess.SP)
[50]  arXiv:2301.02608 (replaced) [pdf, other]
Title: An interpretable machine learning system for colorectal cancer diagnosis from pathology slides
Comments: Accepted at npj Precision Oncology. Available at: this https URL
Journal-ref: npj Precis. Onc. 8, 56 (2024)
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[51]  arXiv:2303.15318 (replaced) [pdf, other]
Title: Closed-Loop Koopman Operator Approximation
Comments: 13 pages, 11 figures, 3 tables, accepted for publication in Machine Learning: Science and Technology
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Dynamical Systems (math.DS)
[52]  arXiv:2304.01698 (replaced) [pdf, other]
Title: Inverse Unscented Kalman Filter
Comments: 20 pages, 5 figures. arXiv admin note: text overlap with arXiv:2210.00359
Subjects: Optimization and Control (math.OC); Signal Processing (eess.SP); Systems and Control (eess.SY); Machine Learning (stat.ML)
[53]  arXiv:2306.02176 (replaced) [pdf, other]
Title: TransRUPNet for Improved Polyp Segmentation
Comments: Accepted at EMBC 2024
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[54]  arXiv:2307.06675 (replaced) [pdf, other]
Title: Meta-State-Space Learning: An Identification Approach for Stochastic Dynamical Systems
Comments: Accepted in Automatica
Subjects: Systems and Control (eess.SY)
[55]  arXiv:2307.10182 (replaced) [pdf, other]
Title: Enhancing Super-Resolution Networks through Realistic Thick-Slice CT Simulation
Comments: 11 pages, 4 figures
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Medical Physics (physics.med-ph)
[56]  arXiv:2307.15615 (replaced) [pdf, other]
Title: A survey on deep learning in medical image registration: new technologies, uncertainty, evaluation metrics, and beyond
Comments: A list of open-sourced code from the papers reviewed has been organized and is available at this https URL
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[57]  arXiv:2309.01513 (replaced) [pdf, other]
Title: RGI-Net: 3D Room Geometry Inference from Room Impulse Responses in the Absence of First-order Echoes
Comments: 5 pages, 3 figures, 3 tables
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD)
[58]  arXiv:2310.03289 (replaced) [pdf, other]
Title: Collaborative Safety-Critical Control for Networked Dynamic Systems
Comments: This work is under review for publication in the IEEE Transactions on Automatic Control
Subjects: Optimization and Control (math.OC); Multiagent Systems (cs.MA); Systems and Control (eess.SY); Dynamical Systems (math.DS)
[59]  arXiv:2312.01009 (replaced) [pdf, other]
Title: Perceptive, Resilient, and Efficient Networks assisted by Reconfigurable Intelligent Surfaces
Comments: Submitted for publication in IEEE Vehicular Technology Magazine
Subjects: Signal Processing (eess.SP)
[60]  arXiv:2312.09197 (replaced) [pdf, other]
Title: Model-Free Change Point Detection for Mixing Processes
Comments: 20 pages, 4 figures. Accepted by IEEE OJ-CSYS
Subjects: Systems and Control (eess.SY)
[61]  arXiv:2401.11608 (replaced) [pdf, other]
Title: $\texttt{immrax}$: A Parallelizable and Differentiable Toolbox for Interval Analysis and Mixed Monotone Reachability in JAX
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Optimization and Control (math.OC)
[62]  arXiv:2401.15166 (replaced) [pdf, other]
Title: Probabilistic Design of Multi-Dimensional Spatially-Coupled Codes
Comments: 12 pages (double column), 5 figures, the short version has been accepted at the IEEE International Symposium on Information Theory (ISIT)
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
[63]  arXiv:2402.00384 (replaced) [pdf, ps, other]
Title: Adaptive FRIT-based Recursive Robust Controller Design Using Forgetting Factors
Comments: This work has been accepted to The 32nd Mediterranean Conference on Control and Automation (MED2024)
Subjects: Systems and Control (eess.SY)
[64]  arXiv:2403.00549 (replaced) [pdf, other]
Title: Relaxometry Guided Quantitative Cardiac Magnetic Resonance Image Reconstruction
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[65]  arXiv:2403.01699 (replaced) [pdf, other]
Title: Brilla AI: AI Contestant for the National Science and Maths Quiz
Comments: 14 pages. Accepted for the WideAIED track at the 25th International Conference on AI in Education (AIED 2024)
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[66]  arXiv:2403.10560 (replaced) [pdf, other]
Title: Holographic Phase Retrieval via Wirtinger Flow: Cartesian Form with Auxiliary Amplitude
Subjects: Information Theory (cs.IT); Graphics (cs.GR); Image and Video Processing (eess.IV); Numerical Analysis (math.NA)
[67]  arXiv:2403.13890 (replaced) [pdf, other]
Title: Towards Learning Contrast Kinetics with Multi-Condition Latent Diffusion Models
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[68]  arXiv:2403.16411 (replaced) [pdf, other]
Title: A Geometric Perspective on Fusing Gaussian Distributions on Lie Groups
Comments: Preprint for L-CSS
Subjects: Systems and Control (eess.SY)
[69]  arXiv:2403.18257 (replaced) [pdf, other]
Title: Dual-path Mamba: Short and Long-term Bidirectional Selective Structured State Space Models for Speech Separation
Comments: work in progress
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[70]  arXiv:2404.00247 (replaced) [pdf, ps, other]
Title: Facilitating Reinforcement Learning for Process Control Using Transfer Learning: Perspectives
Comments: Final Version of Asian Control Conference (ASCC 2024)
Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[71]  arXiv:2404.19242 (replaced) [pdf, other]
Title: A Minimal Set of Parameters Based Depth-Dependent Distortion Model and Its Calibration Method for Stereo Vision Systems
Comments: This paper has been accepted for publication in IEEE Transactions on Instrumentation and Measurement
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV); Methodology (stat.ME)
[72]  arXiv:2404.19265 (replaced) [pdf, other]
Title: Mapping New Realities: Ground Truth Image Creation with Pix2Pix Image-to-Image Translation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)