Computer Science

New submissions

New submissions for Wed, 1 May 24

[1]  arXiv:2404.18932 [pdf, ps, other]
Title: Dynamic Model Switching for Improved Accuracy in Machine Learning
Subjects: Machine Learning (cs.LG)

In the dynamic landscape of machine learning, where datasets vary widely in size and complexity, selecting the most effective model poses a significant challenge. Rather than fixating on a single model, our research propels the field forward with a novel emphasis on dynamic model switching. This paradigm shift allows us to harness the inherent strengths of different models based on the evolving size of the dataset.
Consider the scenario where CatBoost demonstrates exceptional efficacy in handling smaller datasets, providing nuanced insights and accurate predictions. However, as datasets grow in size and intricacy, XGBoost, with its scalability and robustness, becomes the preferred choice.
Our approach introduces an adaptive ensemble that intuitively transitions between CatBoost and XGBoost. This seamless switching is not arbitrary; instead, it's guided by a user-defined accuracy threshold, ensuring a meticulous balance between model sophistication and data requirements. The user sets a benchmark, say 80% accuracy, prompting the system to dynamically shift to the new model only if it guarantees improved performance.
This dynamic model-switching mechanism aligns with the evolving nature of data in real-world scenarios. It offers practitioners a flexible and efficient solution, catering to diverse dataset sizes and optimising predictive accuracy at every juncture. Our research, therefore, stands at the forefront of innovation, redefining how machine learning models adapt and excel in the face of varying dataset dynamics.
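The threshold-guided switching rule described in this abstract could be read roughly as in the sketch below. This is a minimal illustration under stated assumptions, not the authors' implementation: the function `choose_model`, the `evaluate` callable, and the toy accuracy numbers are all hypothetical, and plain lookups stand in for trained CatBoost and XGBoost models.

```python
from typing import Callable, Dict


def choose_model(
    current: str,
    candidate: str,
    evaluate: Callable[[str], float],
    threshold: float = 0.80,
) -> str:
    """Return the name of the model to use next.

    Assumed rule: switch to `candidate` only if it reaches the
    user-defined accuracy threshold AND beats the current model
    on the same validation data; otherwise keep `current`.
    """
    current_acc = evaluate(current)
    candidate_acc = evaluate(candidate)
    if candidate_acc >= threshold and candidate_acc > current_acc:
        return candidate
    return current


# Toy validation accuracies (made-up numbers, for illustration only).
small_data: Dict[str, float] = {"catboost": 0.86, "xgboost": 0.79}
large_data: Dict[str, float] = {"catboost": 0.82, "xgboost": 0.88}

print(choose_model("catboost", "xgboost", small_data.__getitem__))
print(choose_model("catboost", "xgboost", large_data.__getitem__))
```

In a real setting, `evaluate` would train or load the named model and score it on held-out data; the comparison logic itself would stay the same.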

[2]  arXiv:2404.18933 [pdf, other]
Title: Learning Low-Rank Feature for Thorax Disease Classification
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Deep neural networks, including Convolutional Neural Networks (CNNs) and Visual Transformers (ViT), have achieved stunning success in the medical image domain. We study thorax disease classification in this paper. Effective extraction of features for the disease areas is crucial for disease classification on radiographic images. While various neural architectures and training techniques, such as self-supervised learning with contrastive/restorative learning, have been employed for disease classification on radiographic images, there are no principled methods which can effectively reduce the adverse effect of noise and background, or non-disease areas, on the radiographic images for disease classification. To address this challenge, we propose a novel Low-Rank Feature Learning (LRFL) method in this paper, which is universally applicable to the training of all neural networks. The LRFL method is both empirically motivated by the low frequency property observed on all the medical datasets in this paper, and theoretically motivated by our sharp generalization bound for neural networks with low-rank features. In the empirical study, using a neural network such as a ViT or a CNN pre-trained on unlabeled chest X-rays by Masked Autoencoders (MAE), our novel LRFL method is applied to the pre-trained neural network and demonstrates better classification results in terms of both multiclass area under the receiver operating characteristic curve (mAUC) and classification accuracy.

[3]  arXiv:2404.18934 [pdf, ps, other]
Title: The Visual Experience Dataset: Over 200 Recorded Hours of Integrated Eye Movement, Odometry, and Egocentric Video
Comments: 36 pages, 1 table, 7 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)

We introduce the Visual Experience Dataset (VEDB), a compilation of over 240 hours of egocentric video combined with gaze- and head-tracking data that offers an unprecedented view of the visual world as experienced by human observers. The dataset consists of 717 sessions, recorded by 58 observers ranging from 6 to 49 years old. This paper outlines the data collection, processing, and labeling protocols undertaken to ensure a representative sample and discusses the potential sources of error or bias within the dataset. The VEDB's potential applications are vast, including improving gaze tracking methodologies, assessing spatiotemporal image statistics, and refining deep neural networks for scene and activity recognition. The VEDB is accessible through established open science platforms and is intended to be a living dataset with plans for expansion and community contributions. It is released with an emphasis on ethical considerations, such as participant privacy and the mitigation of potential biases. By providing a dataset grounded in real-world experiences and accompanied by extensive metadata and supporting code, the authors invite the research community to utilize and contribute to the VEDB, facilitating a richer understanding of visual perception and behavior in naturalistic settings.

[4]  arXiv:2404.18935 [pdf, other]
Title: What's in the Flow? Exploiting Temporal Motion Cues for Unsupervised Generic Event Boundary Detection
Comments: Accepted in WACV-2024. Supplementary at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The Generic Event Boundary Detection (GEBD) task aims to recognize generic, taxonomy-free boundaries that segment a video into meaningful events. Current methods typically involve a neural model trained on a large volume of data, demanding substantial computational power and storage space. We explore two pivotal questions pertaining to GEBD: Can non-parametric algorithms outperform unsupervised neural methods? Does motion information alone suffice for high performance? This inquiry drives us to algorithmically harness motion cues for identifying generic event boundaries in videos. In this work, we propose FlowGEBD, a non-parametric, unsupervised technique for GEBD. Our approach entails two algorithms utilizing optical flow: (i) Pixel Tracking and (ii) Flow Normalization. By conducting thorough experimentation on the challenging Kinetics-GEBD and TAPOS datasets, our results establish FlowGEBD as the new state-of-the-art (SOTA) among unsupervised methods. FlowGEBD exceeds the neural models on the Kinetics-GEBD dataset by obtaining an F1@0.05 score of 0.713 with an absolute gain of 31.7% compared to the unsupervised baseline and achieves an average F1 score of 0.623 on the TAPOS validation dataset.

[5]  arXiv:2404.18940 [pdf, ps, other]
Title: Conceptual Mapping of Controversies
Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI)

With our work, we contribute towards a qualitative analysis of the discourse on controversies in online news media. For this, we employ Formal Concept Analysis and the economics of conventions to derive conceptual controversy maps. In our experiments, we analyze two maps from different news journals with methods from ordinal data science. We show how these methods can be used to assess the diversity, complexity and potential bias of controversies. In addition to that, we discuss how the diagrams of concept lattices can be used to navigate between news articles.

[6]  arXiv:2404.18942 [pdf, other]
Title: GuideWalk -- Heterogeneous Data Fusion for Enhanced Learning -- A Multiclass Document Classification Case
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Social and Information Networks (cs.SI)

One of the prime problems of computer science and machine learning is to extract information efficiently from large-scale, heterogeneous data. Text data, with its syntax, semantics, and even hidden information content, possesses an exceptional place among the data types in concern. The processing of the text data requires embedding, a method of translating the content of the text to numeric vectors. A correct embedding algorithm is the starting point for obtaining the full information content of the text data. In this work, a new embedding method based on the graph structure of the meaningful sentences is proposed. The design of the algorithm aims to construct an embedding vector that constitutes syntactic and semantic elements as well as the hidden content of the text data. The success of the proposed embedding method is tested in classification problems. Among the wide range of application areas, text classification is the best laboratory for embedding methods; the classification power of the method can be tested using dimensional reduction without any further processing. Furthermore, the method can be compared with different embedding algorithms and machine learning methods. The proposed method is tested with real-world data sets and eight well-known and successful embedding algorithms. The proposed embedding method shows significantly better classification for binary and multiclass datasets compared to well-known algorithms.

[7]  arXiv:2404.18943 [pdf, ps, other]
Title: Using artificial intelligence methods for the studyed visual analyzer
Comments: in Russian
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)

The paper describes various techniques for applying artificial intelligence to the study of human eyes. The first dataset was collected using computerized perimetry to investigate the visualization of the human visual field and the diagnosis of glaucoma. A method to analyze the image using software tools is proposed. The second dataset was obtained as part of a Russian-Swiss experiment to collect and analyze eye movement data using the Tobii Pro Glasses 3 device on VR video. Eye movements and focus on the recorded route of a virtual journey through the canton of Vaud were investigated. Methods are being developed to investigate the dependencies of eye pupil movements using mathematical modelling. VR-video users can use these studies in medicine to assess the course and deterioration of glaucoma patients and to study the mechanisms of attention to tourist attractions.

[8]  arXiv:2404.18944 [pdf, ps, other]
Title: Investigating the dissemination of STEM content on social media with computational tools
Comments: 17 pages, 3 figures, 3 supplemental figures
Subjects: Social and Information Networks (cs.SI); Computers and Society (cs.CY); Machine Learning (cs.LG)

Social media platforms can quickly disseminate STEM content to diverse audiences, but their operation can be mysterious. We used open-source machine learning methods such as clustering, regression, and sentiment analysis to analyze over 1000 videos and metrics thereof from 6 social media STEM creators. Our data provide insights into how audiences generate interest signals (likes, bookmarks, comments, shares) and into the correlation of various signals with views, and suggest that content from newer creators is disseminated differently. We also share insights on how to optimize dissemination by analyzing data available exclusively to content creators as well as via sentiment analysis of comments.

[9]  arXiv:2404.18947 [pdf, other]
Title: Multimodal Fusion on Low-quality Data: A Comprehensive Survey
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Multimodal fusion focuses on integrating information from multiple modalities with the goal of more accurate prediction, and has achieved remarkable progress in a wide range of scenarios, including autonomous driving and medical diagnosis. However, the reliability of multimodal fusion remains largely unexplored, especially under low-quality data settings. This paper surveys the common challenges and recent advances of multimodal fusion in the wild and presents them in a comprehensive taxonomy. From a data-centric view, we identify four main challenges that are faced by multimodal fusion on low-quality data, namely (1) noisy multimodal data that are contaminated with heterogeneous noises, (2) incomplete multimodal data in which some modalities are missing, (3) imbalanced multimodal data in which the qualities or properties of different modalities are significantly different and (4) quality-varying multimodal data in which the quality of each modality dynamically changes with respect to different samples. This new taxonomy will enable researchers to understand the state of the field and identify several potential directions. We also provide discussion for the open problems in this field together with interesting future research directions.

[10]  arXiv:2404.18948 [pdf, other]
Title: Sub-Adjacent Transformer: Improving Time Series Anomaly Detection with Reconstruction Error from Sub-Adjacent Neighborhoods
Comments: IJCAI 2024
Subjects: Machine Learning (cs.LG)

In this paper, we present the Sub-Adjacent Transformer with a novel attention mechanism for unsupervised time series anomaly detection. Unlike previous approaches that rely on all the points within some neighborhood for time point reconstruction, our method restricts the attention to regions not immediately adjacent to the target points, termed sub-adjacent neighborhoods. Our key observation is that owing to the rarity of anomalies, they typically exhibit more pronounced differences from their sub-adjacent neighborhoods than from their immediate vicinities. By focusing the attention on the sub-adjacent areas, we make the reconstruction of anomalies more challenging, thereby enhancing their detectability. Technically, our approach concentrates attention on the non-diagonal areas of the attention matrix by enlarging the corresponding elements in the training stage. To facilitate the implementation of the desired attention matrix pattern, we adopt linear attention because of its flexibility and adaptability. Moreover, a learnable mapping function is proposed to improve the performance of linear attention. Empirically, the Sub-Adjacent Transformer achieves state-of-the-art performance across six real-world anomaly detection benchmarks, covering diverse fields such as server monitoring, space exploration, and water treatment.
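To make the notion of a sub-adjacent neighborhood concrete, the sketch below builds a binary mask over an attention matrix that excludes the diagonal and its immediate neighbours. It is an illustrative reading of the abstract, not the paper's implementation: the band limits `k_min` and `k_max`, both function names, and the idea of measuring the share of attention inside the mask are assumptions introduced here.

```python
from typing import List


def sub_adjacent_mask(length: int, k_min: int, k_max: int) -> List[List[int]]:
    """Binary mask with 1 where k_min <= |i - j| <= k_max, else 0.

    Position i may attend to position j only if j lies in a band that
    skips the immediate vicinity of i (hypothetical band limits).
    """
    if not 0 < k_min <= k_max:
        raise ValueError("require 0 < k_min <= k_max")
    return [
        [1 if k_min <= abs(i - j) <= k_max else 0 for j in range(length)]
        for i in range(length)
    ]


def masked_attention_share(attn: List[List[float]], mask: List[List[int]]) -> float:
    """Fraction of the total attention weight that falls inside the mask."""
    total = sum(sum(row) for row in attn)
    inside = sum(a * m for ra, rm in zip(attn, mask) for a, m in zip(ra, rm))
    return inside / total


mask = sub_adjacent_mask(length=6, k_min=2, k_max=3)
for row in mask:
    print(row)
```

A quantity like `masked_attention_share` is one conceivable way to express "enlarging the corresponding elements" during training; the abstract does not specify the actual objective.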

[11]  arXiv:2404.18949 [pdf, other]
Title: The Simpler The Better: An Entropy-Based Importance Metric To Reduce Neural Networks' Depth
Comments: arXiv admin note: text overlap with arXiv:2404.16890
Subjects: Machine Learning (cs.LG)

While deep neural networks are highly effective at solving complex tasks, large pre-trained models are commonly employed even to solve consistently simpler downstream tasks, which do not necessarily require a large model's complexity. Motivated by the awareness of the ever-growing AI environmental impact, we propose an efficiency strategy that leverages prior knowledge transferred by large models. Simple but effective, we propose a method relying on an Entropy-bASed Importance mEtRic (EASIER) to reduce the depth of over-parametrized deep neural networks, which alleviates their computational burden. We assess the effectiveness of our method on traditional image classification setups. The source code will be publicly released upon acceptance of the article.

[12]  arXiv:2404.18952 [pdf, other]
Title: CUE-Net: Violence Detection Video Analytics with Spatial Cropping, Enhanced UniformerV2 and Modified Efficient Additive Attention
Comments: To be published in the proceedings of 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

In this paper we introduce CUE-Net, a novel architecture designed for automated violence detection in video surveillance. As surveillance systems become more prevalent due to technological advances and decreasing costs, the challenge of efficiently monitoring vast amounts of video data has intensified. CUE-Net addresses this challenge by combining spatial Cropping with an enhanced version of the UniformerV2 architecture, integrating convolutional and self-attention mechanisms alongside a novel Modified Efficient Additive Attention mechanism (which reduces the quadratic time complexity of self-attention) to effectively and efficiently identify violent activities. This approach aims to overcome traditional challenges such as capturing distant or partially obscured subjects within video frames. By focusing on both local and global spatiotemporal features, CUE-Net achieves state-of-the-art performance on the RWF-2000 and RLVS datasets, surpassing existing methods.

[13]  arXiv:2404.18955 [pdf, other]
Title: GARA: A novel approach to Improve Genetic Algorithms' Accuracy and Efficiency by Utilizing Relationships among Genes
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)

Genetic algorithms have played an important role in engineering optimization. Traditional GAs treat each gene separately. However, biophysical studies of gene regulatory networks revealed direct associations between different genes. This inspires us to propose an improvement to GA in this paper, the Gene Regulatory Genetic Algorithm (GRGA), which, to the best of our knowledge, is the first to utilize relationships among genes to improve GA's accuracy and efficiency. We design a directed multipartite graph encapsulating the solution space, called RGGR, where each node corresponds to a gene in the solution and the edge represents the relationship between adjacent nodes. The edge's weight reflects the relationship degree and is updated based on the idea that the weights of the edges in a complete chain, taken as a candidate solution with acceptable or unacceptable performance, should be strengthened or reduced, respectively. The obtained RGGR is then employed to determine appropriate loci of crossover and mutation operators, thereby directing the evolutionary process toward faster and better convergence. We analyze and validate our proposed GRGA approach in a single-objective multimodal optimization problem, and further test it on three types of applications, including feature selection, text summarization, and dimensionality reduction. Results illustrate that our GARA is effective and promising.

[14]  arXiv:2404.18961 [pdf, other]
Title: Unleashing the Power of Multi-Task Learning: A Comprehensive Survey Spanning Traditional, Deep, and Pretrained Foundation Model Eras
Comments: 60 figures, 116 pages, 500+ references
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

MTL is a learning paradigm that effectively leverages both task-specific and shared information to address multiple related tasks simultaneously. In contrast to STL, MTL offers a suite of benefits that enhance both the training process and the inference efficiency. MTL's key advantages encompass streamlined model architecture, performance enhancement, and cross-domain generalizability. Over the past twenty years, MTL has become widely recognized as a flexible and effective approach in various fields, including CV, NLP, recommendation systems, disease prognosis and diagnosis, and robotics. This survey provides a comprehensive overview of the evolution of MTL, encompassing the technical aspects of cutting-edge methods from traditional approaches to deep learning and the latest trend of pretrained foundation models. Our survey methodically categorizes MTL techniques into five key areas: regularization, relationship learning, feature propagation, optimization, and pre-training. This categorization not only chronologically outlines the development of MTL but also dives into various specialized strategies within each category. Furthermore, the survey reveals how the MTL evolves from handling a fixed set of tasks to embracing a more flexible approach free from task or modality constraints. It explores the concepts of task-promptable and -agnostic training, along with the capacity for ZSL, which unleashes the untapped potential of this historically coveted learning paradigm. Overall, we hope this survey provides the research community with a comprehensive overview of the advancements in MTL from its inception in 1997 to the present in 2023. We address present challenges and look ahead to future possibilities, shedding light on the opportunities and potential avenues for MTL research in a broad manner. This project is publicly available at https://github.com/junfish/Awesome-Multitask-Learning.

[15]  arXiv:2404.18962 [pdf, other]
Title: An Aggregation-Free Federated Learning for Tackling Data Heterogeneity
Comments: Accepted to CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

The performance of Federated Learning (FL) hinges on the effectiveness of utilizing knowledge from distributed datasets. Traditional FL methods adopt an aggregate-then-adapt framework, where clients update local models based on a global model aggregated by the server from the previous training round. This process can cause client drift, especially with significant cross-client data heterogeneity, impacting model performance and convergence of the FL algorithm. To address these challenges, we introduce FedAF, a novel aggregation-free FL algorithm. In this framework, clients collaboratively learn condensed data by leveraging peer knowledge; the server subsequently trains the global model using the condensed data and soft labels received from the clients. FedAF inherently avoids the issue of client drift, enhances the quality of condensed data amid notable data heterogeneity, and improves the global model performance. Extensive numerical studies on several popular benchmark datasets show FedAF surpasses various state-of-the-art FL algorithms in handling label-skew and feature-skew data heterogeneity, leading to superior global model accuracy and faster convergence.

[16]  arXiv:2404.18963 [pdf, other]
Title: RE-GrievanceAssist: Enhancing Customer Experience through ML-Powered Complaint Management
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)

In recent years, digital platform companies have faced increasing challenges in managing customer complaints, driven by widespread consumer adoption. This paper introduces an end-to-end pipeline, named RE-GrievanceAssist, designed specifically for real estate customer complaint management. The pipeline consists of three key components: i) a response/no-response ML model using TF-IDF vectorization and an XGBoost classifier; ii) a user type classifier using a fastText classifier; iii) an issue/sub-issue classifier using TF-IDF vectorization and an XGBoost classifier. Finally, it has been deployed as a batch job in Databricks, resulting in a remarkable 40% reduction in overall manual effort with monthly cost reduction of Rs 1,50,000 since August 2023.

[17]  arXiv:2404.18968 [pdf, other]
Title: Equitable Connected Partition and Structural Parameters Revisited: N-fold Beats Lenstra
Subjects: Data Structures and Algorithms (cs.DS)

We study the Equitable Connected Partition (ECP for short) problem, where we are given a graph G=(V,E) together with an integer p, and our goal is to find a partition of V into p parts such that each part induces a connected sub-graph of G and the sizes of any two parts differ by at most 1. On the one hand, the problem is known to be NP-hard in general and W[1]-hard with respect to the path-width, the feedback-vertex set, and the number of parts p combined. On the other hand, fixed-parameter algorithms are known for the parameters vertex-integrity and max leaf number.
As our main contribution, we resolve a long-standing open question [Enciso et al.; IWPEC '09] regarding the parameterisation by the tree-depth of the underlying graph. In particular, we show that ECP is W[1]-hard with respect to the 4-path vertex cover number, which is an even more restrictive structural parameter than the tree-depth. In addition to that, we show W[1]-hardness of the problem with respect to the feedback-edge set, the distance to disjoint paths, and NP-hardness with respect to the shrub-depth and the clique-width. On a positive note, we propose several novel fixed-parameter algorithms for various parameters that are bounded for dense graphs.
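The ECP definition given in this abstract can be illustrated with a small checker that verifies whether a proposed partition is valid. This is only a sketch of the problem statement, not an algorithm from the paper: it verifies a given solution rather than finding one, and the graph representation (an adjacency dict) and function names are choices made here for illustration.

```python
from collections import deque
from typing import Dict, Hashable, List, Set

Graph = Dict[Hashable, Set[Hashable]]


def is_connected(graph: Graph, part: Set[Hashable]) -> bool:
    """Check that `part` induces a connected subgraph, via BFS inside `part`."""
    if not part:
        return False
    start = next(iter(part))
    seen = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w in part and w not in seen:
                seen.add(w)
                queue.append(w)
    return seen == part


def is_equitable_connected_partition(
    graph: Graph, parts: List[Set[Hashable]], p: int
) -> bool:
    """Verify the three ECP conditions for a candidate partition."""
    if p < 1 or len(parts) != p:
        return False
    # The parts must be pairwise disjoint and together cover V.
    union = set().union(*parts)
    if union != set(graph) or sum(len(s) for s in parts) != len(graph):
        return False
    # Equitable: sizes of any two parts differ by at most 1.
    sizes = [len(s) for s in parts]
    if max(sizes) - min(sizes) > 1:
        return False
    # Connected: every part induces a connected subgraph.
    return all(is_connected(graph, s) for s in parts)


# Path graph 1 - 2 - 3 - 4 - 5 as an undirected adjacency dict.
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
print(is_equitable_connected_partition(path, [{1, 2, 3}, {4, 5}], 2))
print(is_equitable_connected_partition(path, [{1, 3, 5}, {2, 4}], 2))
```

Checking a candidate is easy (polynomial time); the hardness results in the abstract concern finding such a partition.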

[18]  arXiv:2404.18971 [pdf, other]
Title: Credible, Unreliable or Leaked?: Evidence Verification for Enhanced Automated Fact-checking
Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY); Information Retrieval (cs.IR); Social and Information Networks (cs.SI)

Automated fact-checking (AFC) is garnering increasing attention from researchers aiming to help fact-checkers combat the increasing spread of misinformation online. While many existing AFC methods incorporate external information from the Web to help examine the veracity of claims, they often overlook the importance of verifying the source and quality of collected "evidence". One overlooked challenge involves the reliance on "leaked evidence", information gathered directly from fact-checking websites and used to train AFC systems, resulting in an unrealistic setting for early misinformation detection. Similarly, the inclusion of information from unreliable sources can undermine the effectiveness of AFC systems. To address these challenges, we present a comprehensive approach to evidence verification and filtering. We create the "CREDible, Unreliable or LEaked" (CREDULE) dataset, which consists of 91,632 articles classified as Credible, Unreliable and Fact checked (Leaked). Additionally, we introduce the EVidence VERification Network (EVVER-Net), trained on CREDULE to detect leaked and unreliable evidence in both short and long texts. EVVER-Net can be used to filter evidence collected from the Web, thus enhancing the robustness of end-to-end AFC systems. We experiment with various language models and show that EVVER-Net can demonstrate impressive performance of up to 91.5% and 94.4% accuracy, while leveraging domain credibility scores along with short or long texts, respectively. Finally, we assess the evidence provided by widely-used fact-checking datasets including LIAR-PLUS, MOCHEG, FACTIFY, NewsCLIPpings+ and VERITE, some of which exhibit concerning rates of leaked and unreliable evidence.

[19]  arXiv:2404.18972 [pdf, other]
Title: Impact of whole-body vibrations on electrovibration perception varies with target stimulus duration
Comments: 28 pages; 7 figures, journal
Subjects: Human-Computer Interaction (cs.HC); Robotics (cs.RO); Systems and Control (eess.SY)

This study explores the impact of whole-body vibrations induced by external vehicle perturbations, such as aircraft turbulence, on the perception of electrovibration displayed on touchscreens. Electrovibration holds promise as a technology for providing tactile feedback on future touchscreens, addressing usability challenges in vehicle cockpits. However, its performance under dynamic conditions, such as during whole-body vibrations induced by turbulence, still needs to be explored. We measured the absolute detection thresholds of 15 human participants for short- and long-duration electrovibration stimuli displayed on a touchscreen, both in the absence and presence of two types of turbulence motion generated by a motion simulator. Concurrently, we measured participants' applied contact force and finger scan speeds. Significantly higher (38%) absolute detection thresholds were observed for short electrovibration stimuli than for long stimuli. Finger scan speeds in the direction of turbulence, applied forces, and force fluctuation rates increased during whole-body vibrations due to biodynamic feedthrough. As a result, turbulence also significantly increased the perception thresholds, but only for short-duration electrovibration stimuli. The results reveal that whole-body vibrations can impede the perception of short-duration electrovibration stimuli, due to involuntary finger movements and increased normal force fluctuations. Our findings offer valuable insights for the future design of touchscreens with tactile feedback in vehicle cockpits.

[20]  arXiv:2404.18975 [pdf, ps, other]
Title: M3H: Multimodal Multitask Machine Learning for Healthcare
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Recent breakthroughs in AI are poised to fundamentally enhance our study and understanding of healthcare. The development of an integrated many-to-many framework that leverages multiple data modality inputs for the analytical modeling of multiple medical tasks is critical for a unified understanding of modern medicine. In this work, we introduce M3H, an explainable Multimodal Multitask Machine Learning for Healthcare framework that consolidates learning from diverse multimodal inputs across a broad spectrum of medical task categories and machine learning problem classes. The modular design of the framework ensures its generalizable data processing, task definition, and rapid model prototyping, applicable to both clinical and operational healthcare settings. We evaluate the M3H framework by validating models trained from four modalities (tabular, time-series, language, and vision) on 41 medical tasks across 4 machine learning problem classes. Our results demonstrate that M3H consistently produces multitask models that outperform canonical single-task models (by 1.1-37.2%) across 37 disease diagnoses from 16 medical departments, three hospital operation forecasts, and one patient phenotyping task, spanning ML problem classes of supervised binary classification, multiclass classification, regression, and clustering. Additionally, the framework introduces a novel attention mechanism to balance self-exploitation (focus on learning the source task) and cross-exploration (encourage learning from other tasks). Furthermore, M3H provides explainability insights on how joint learning of additional tasks impacts the learning of the source task using a proposed TIM score, shedding light on the dynamics of task interdependencies. Its adaptable architecture facilitates customization and integration, establishing it as a robust and scalable candidate solution for future AI-driven healthcare systems.

[21]  arXiv:2404.18976 [pdf, other]
Title: Foundations of Multisensory Artificial Intelligence
Authors: Paul Pu Liang
Comments: CMU Machine Learning Department PhD Thesis
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)

Building multisensory AI systems that learn from multiple sensory inputs such as text, speech, video, real-world sensors, wearable devices, and medical data holds great promise for impact in many scientific areas with practical benefits, such as in supporting human health and well-being, enabling multimedia content processing, and enhancing real-world autonomous agents. By synthesizing a range of theoretical frameworks and application domains, this thesis aims to advance the machine learning foundations of multisensory AI. In the first part, we present a theoretical framework formalizing how modalities interact with each other to give rise to new information for a task. These interactions are the basic building blocks in all multimodal problems, and their quantification enables users to understand their multimodal datasets, design principled approaches to learn these interactions, and analyze whether their model has succeeded in learning. In the second part, we study the design of practical multimodal foundation models that generalize over many modalities and tasks, which presents a step toward grounding large language models to real-world sensory modalities. We introduce MultiBench, a unified large-scale benchmark across a wide range of modalities, tasks, and research areas, followed by the cross-modal attention and multimodal transformer architectures that now underpin many of today's multimodal foundation models. Scaling these architectures on MultiBench enables the creation of general-purpose multisensory AI systems, and we discuss our collaborative efforts in applying these models for real-world impact in affective computing, mental health, cancer prognosis, and robotics. Finally, we conclude this thesis by discussing how future work can leverage these ideas toward more general, interactive, and safe multisensory AI.

[22]  arXiv:2404.18977 [pdf, other]
Title: Computational Job Market Analysis with Natural Language Processing
Authors: Mike Zhang
Comments: Ph.D. Thesis (315 total pages, 52 figures). The thesis slightly modified with this https URL ISBN (electronic): 978-87-7949-414-5
Subjects: Computation and Language (cs.CL)

[Abridged Abstract]
Recent technological advances underscore labor market dynamics, yielding significant consequences for employment prospects and increasing job vacancy data across platforms and languages. Aggregating such data holds potential for valuable insights into labor market demands, new skills emergence, and facilitating job matching for various stakeholders. However, despite prevalent insights in the private sector, transparent language technology systems and data for this domain are lacking. This thesis investigates Natural Language Processing (NLP) technology for extracting relevant information from job descriptions, identifying challenges including scarcity of training data, lack of standardized annotation guidelines, and shortage of effective extraction methods from job ads. We frame the problem, obtain annotated data, and introduce extraction methodologies. Our contributions include job description datasets, a de-identification dataset, and a novel active learning algorithm for efficient model training. We propose skill extraction using weak supervision, a taxonomy-aware pre-training methodology adapting multilingual language models to the job market domain, and a retrieval-augmented model leveraging multiple skill extraction datasets to enhance overall performance. Finally, we ground extracted information within a designated taxonomy.

[23]  arXiv:2404.18978 [pdf, other]
Title: Towards Generalizable Agents in Text-Based Educational Environments: A Study of Integrating RL with LLMs
Comments: Accepted as a full paper at EDM 2024: The 17th International Conference on Educational Data Mining, 14-17 of July 2024, Atlanta
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)

There has been a growing interest in developing learner models to enhance learning and teaching experiences in educational environments. However, existing works have primarily focused on structured environments relying on meticulously crafted representations of tasks, thereby limiting the agent's ability to generalize skills across tasks. In this paper, we aim to enhance the generalization capabilities of agents in open-ended text-based learning environments by integrating Reinforcement Learning (RL) with Large Language Models (LLMs). We investigate three types of agents: (i) RL-based agents that utilize natural language for state and action representations to find the best interaction strategy, (ii) LLM-based agents that leverage the model's general knowledge and reasoning through prompting, and (iii) hybrid LLM-assisted RL agents that combine these two strategies to improve agents' performance and generalization. To support the development and evaluation of these agents, we introduce PharmaSimText, a novel benchmark derived from the PharmaSim virtual pharmacy environment designed for practicing diagnostic conversations. Our results show that RL-based agents excel in task completion but lack in asking quality diagnostic questions. In contrast, LLM-based agents perform better in asking diagnostic questions but fall short of completing the task. Finally, hybrid LLM-assisted RL agents enable us to overcome these limitations, highlighting the potential of combining RL and LLMs to develop high-performing agents for open-ended learning environments.

[24]  arXiv:2404.18982 [pdf, ps, other]
Title: Can ChatGPT Make Explanatory Inferences? Benchmarks for Abductive Reasoning
Authors: Paul Thagard
Subjects: Artificial Intelligence (cs.AI)

Explanatory inference is the creation and evaluation of hypotheses that provide explanations, and is sometimes known as abduction or abductive inference. Generative AI is a new set of artificial intelligence models based on novel algorithms for generating text, images, and sounds. This paper proposes a set of benchmarks for assessing the ability of AI programs to perform explanatory inference, and uses them to determine the extent to which ChatGPT, a leading generative AI model, is capable of making explanatory inferences. Tests on the benchmarks reveal that ChatGPT performs creative and evaluative inferences in many domains, although it is limited to verbal and visual modalities. Claims that ChatGPT and similar models are incapable of explanation, understanding, causal reasoning, meaning, and creativity are rebutted.

[25]  arXiv:2404.18984 [pdf, other]
Title: "I'm in the Bluesky Tonight": Insights from a Year Worth of Social Data
Comments: Submitted To Scientific Data
Subjects: Social and Information Networks (cs.SI); Computers and Society (cs.CY)

Pollution of online social spaces caused by rampaging d/misinformation is a growing societal concern. However, recent decisions to reduce access to social media APIs are causing a shortage of publicly available, recent, social media data, thus hindering the advancement of computational social science as a whole. We present a large, high-coverage dataset of social interactions and user-generated content from Bluesky Social to address this pressing issue. The dataset contains the complete post history of over 4M users (81% of all registered accounts), totalling 235M posts. We also make available social data covering follow, comment, repost, and quote interactions. Since Bluesky allows users to create and bookmark feed generators (i.e., content recommendation algorithms), we also release the full output of several popular algorithms available on the platform, along with their timestamped "like" interactions and time of bookmarking. This dataset allows unprecedented analysis of online behavior and human-machine engagement patterns. Notably, it provides ground-truth data for studying the effects of content exposure and self-selection and performing content virality and diffusion analysis.

[26]  arXiv:2404.18988 [pdf, other]
Title: Markovian Agents for Truthful Language Modeling
Comments: 21 pages, 6 figures
Subjects: Computation and Language (cs.CL)

Chain-of-Thought (CoT) reasoning could in principle enable a deeper understanding of a language model's (LM) internal reasoning. However, prior work suggests that some LMs answer questions similarly despite changes in their CoT, suggesting that those models are not truly using the CoT. We propose a training method to produce CoTs that are sufficient alone for predicting future text, independent of other context. This methodology gives a guarantee that if the LM can predict future tokens, then it must have used the CoT to understand its context. We formalize the idea that the truthfulness of a sender to a receiver LM is the degree to which the sender helps the receiver predict their future observations. Then we define a "Markovian" LM as one which predicts future text given only a CoT as context. We derive a "Markovian training" procedure by applying our definition of truthfulness to a Markovian LM and optimizing via policy gradient and Proximal Policy Optimization (PPO). We demonstrate the effectiveness of our training algorithm on long-context arithmetic problems, show that the model utilizes the CoT, and validate that the generated CoT is meaningful and usable by other models.

[27]  arXiv:2404.18989 [pdf, ps, other]
Title: Cyberbully and Online Harassment: Issues Associated with Digital Wellbeing
Authors: Manasi Kulkarni (1), Siddhi Durve (1), Bochen Jia (1) ((1) Department of Industrial & Systems Engineering, University of Michigan-Dearborn, MI, USA)
Comments: 35 pages, 7 figures
Subjects: Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)

As digital technology becomes increasingly embedded in daily life, its impact on social interactions has become a critical area of study, particularly concerning cyberbullying. This meta-analysis investigates the dual role of technology in cyberbullying both as a catalyst that can exacerbate the issue and as a potential solution. Cyberbullying, characterized by the use of digital platforms to harass, threaten, or humiliate individuals, poses significant challenges to mental and social wellbeing. This research synthesizes empirical findings from diverse studies to evaluate how innovative technological interventions, such as content monitoring algorithms, anonymous reporting systems, and educational initiatives integrated within digital platforms, contribute to reducing the prevalence of cyberbullying. The study focuses on the effectiveness of these interventions in various settings, highlighting the need for adaptive strategies that respond to the dynamic digital landscape. By offering a comprehensive overview of the current state of cyberbullying and the efficacy of technology based solutions, this analysis provides valuable insights for stakeholders, including educators, policymakers, and technology developers, aiming to enhance digital wellbeing and create safer online environments. The findings underscore the importance of leveraging technology not only as a medium of communication but also as a strategic tool to combat the negative impacts of cyberbullying, thus promoting a more inclusive and respectful digital world.

[28]  arXiv:2404.18990 [pdf, ps, other]
Title: Timely Status Updates in Slotted ALOHA Network With Energy Harvesting
Comments: Submitted to IEEE Transactions on Communications. A short version (arXiv:2310.00348) was presented at GLOBECOM 2023. Simulation code: this https URL
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

We investigate the age of information (AoI) in a scenario where energy-harvesting devices send status updates to a gateway following the slotted ALOHA protocol and receive no feedback. We let the devices adjust the transmission probabilities based on their current battery level. Using a Markovian analysis, we derive analytically the average AoI. We further provide an approximate analysis for accurate and easy-to-compute approximations of both the average AoI and the age-violation probability (AVP), i.e., the probability that the AoI exceeds a given threshold. We also analyze the average throughput. Via numerical results, we investigate two baseline strategies: transmit a new update whenever possible to exploit every opportunity to reduce the AoI, and transmit only when sufficient energy is available to increase the chance of successful decoding. The two strategies are beneficial for low and high update-generation rates, respectively. We show that an optimized policy that balances the two strategies outperforms them significantly in terms of both AoI metrics and throughput. Finally, we show the benefit of decoding multiple packets in a slot using successive interference cancellation and adapting the transmission probability based on both the current battery level and the time elapsed since the last transmission.
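The setting in this abstract can be explored with a small Monte Carlo sketch. The model below is a deliberately simplified assumption, not the paper's: each transmission costs one energy unit, a slot succeeds only if exactly one device transmits (no capture or interference cancellation), every transmission carries a fresh update, and the function name `simulate_average_aoi` and all parameters are hypothetical.

```python
import random


def simulate_average_aoi(num_devices, slots, harvest_prob, tx_prob, capacity, seed=0):
    """Estimate the average AoI under a toy slotted-ALOHA energy-harvesting model.

    tx_prob[b] is the transmission probability at battery level b (0..capacity).
    Assumptions (not from the paper): unit energy per transmission, collision
    channel, and a fresh update available at every transmission.
    """
    rng = random.Random(seed)
    battery = [0] * num_devices
    age = [1] * num_devices
    total_age = 0
    for _ in range(slots):
        # Each device with stored energy transmits with a battery-dependent probability.
        senders = [d for d in range(num_devices)
                   if battery[d] > 0 and rng.random() < tx_prob[battery[d]]]
        for d in senders:
            battery[d] -= 1
        # Every device's age grows by one slot ...
        for d in range(num_devices):
            age[d] += 1
        # ... and resets only if its packet was the sole transmission in the slot.
        if len(senders) == 1:
            age[senders[0]] = 1
        # Energy arrives independently at each device, up to the battery capacity.
        for d in range(num_devices):
            if battery[d] < capacity and rng.random() < harvest_prob:
                battery[d] += 1
        total_age += sum(age)
    return total_age / (slots * num_devices)
```

With two devices that always harvest and always transmit, every slot after the first collides and the age grows without bound, which illustrates why a battery-dependent transmission probability is worth tuning.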

[29]  arXiv:2404.19007 [pdf, other]
Title: How Did We Get Here? Summarizing Conversation Dynamics
Comments: To appear in the Proceedings of NAACL 2024. Data available in ConvoKit this https URL
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)

Throughout a conversation, the way participants interact with each other is in constant flux: their tones may change, they may resort to different strategies to convey their points, or they might alter their interaction patterns. An understanding of these dynamics can complement that of the actual facts and opinions discussed, offering a more holistic view of the trajectory of the conversation: how it arrived at its current state and where it is likely heading.
In this work, we introduce the task of summarizing the dynamics of conversations, by constructing a dataset of human-written summaries, and exploring several automated baselines. We evaluate whether such summaries can capture the trajectory of conversations via an established downstream task: forecasting whether an ongoing conversation will eventually derail into toxic behavior. We show that they help both humans and automated systems with this forecasting task. Humans make predictions three times faster, and with greater confidence, when reading the summaries than when reading the transcripts. Furthermore, automated forecasting systems are more accurate when constructing, and then predicting based on, summaries of conversation dynamics, compared to directly predicting on the transcripts.

[30]  arXiv:2404.19015 [pdf, other]
Title: Simple-RF: Regularizing Sparse Input Radiance Fields with Simpler Solutions
Comments: The source code for our model can be found on our project page: this https URL arXiv admin note: substantial text overlap with arXiv:2309.03955
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Neural Radiance Fields (NeRF) show impressive performance in photo-realistic free-view rendering of scenes. Recent improvements on the NeRF such as TensoRF and ZipNeRF employ explicit models for faster optimization and rendering, as compared to the NeRF that employs an implicit representation. However, both implicit and explicit radiance fields require dense sampling of images in the given scene. Their performance degrades significantly when only a sparse set of views is available. Researchers find that supervising the depth estimated by a radiance field helps train it effectively with fewer views. The depth supervision is obtained either using classical approaches or neural networks pre-trained on a large dataset. While the former may provide only sparse supervision, the latter may suffer from generalization issues. As opposed to the earlier approaches, we seek to learn the depth supervision by designing augmented models and training them along with the main radiance field. Further, we aim to design a framework of regularizations that can work across different implicit and explicit radiance fields. We observe that certain features of these radiance field models overfit to the observed images in the sparse-input scenario. Our key finding is that reducing the capability of the radiance fields with respect to positional encoding, the number of decomposed tensor components or the size of the hash table, constrains the model to learn simpler solutions, which estimate better depth in certain regions. By designing augmented models based on such reduced capabilities, we obtain better depth supervision for the main radiance field. We achieve state-of-the-art view-synthesis performance with sparse input views on popular datasets containing forward-facing and 360$^\circ$ scenes by employing the above regularizations.

[31]  arXiv:2404.19019 [pdf, other]
Title: Optimal Parallel Algorithms for Dendrogram Computation and Single-Linkage Clustering
Comments: To appear at SPAA 2024
Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC)

Computing a Single-Linkage Dendrogram (SLD) is a key step in the classic single-linkage hierarchical clustering algorithm. Given an input edge-weighted tree $T$, the SLD of $T$ is a binary dendrogram that summarizes the $n-1$ clusterings obtained by contracting the edges of $T$ in order of weight. Existing algorithms for computing the SLD all require $\Omega(n\log n)$ work where $n = |T|$. Furthermore, to the best of our knowledge no prior work provides a parallel algorithm obtaining non-trivial speedup for this problem.
In this paper, we design faster parallel algorithms for computing SLDs both in theory and in practice based on new structural results about SLDs. In particular, we obtain a deterministic output-sensitive parallel algorithm based on parallel tree contraction that requires $O(n \log h)$ work and $O(\log^2 n \log^2 h)$ depth, where $h$ is the height of the output SLD. We also give a deterministic bottom-up algorithm for the problem inspired by the nearest neighbor chain algorithm for hierarchical agglomerative clustering, and show that it achieves $O(n\log h)$ work and $O(h \log n)$ depth. Our results are based on a novel divide-and-conquer framework for building SLDs, inspired by divide-and-conquer algorithms for Cartesian trees. Our new algorithms can quickly compute the SLD on billion-scale trees, and obtain up to 150x speedup over the highly-efficient Union-Find algorithm typically used to compute SLDs in practice.
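For orientation, the sequential sort-then-Union-Find baseline mentioned in the abstract can be sketched as follows. This is a minimal illustration under assumed conventions (the function name, the `(u, v, weight)` edge format, and the parent-array output are all hypothetical), not the paper's parallel algorithms.

```python
def single_linkage_dendrogram(n, edges):
    """Build the single-linkage dendrogram of a weighted tree on vertices 0..n-1.

    edges: list of (u, v, weight) tuples forming a tree.
    Returns a parent array of length 2n-1: nodes 0..n-1 are leaves, node n+i is
    the internal node created by the i-th merge in weight order, and the root
    has parent -1.
    """
    uf = list(range(n))   # union-find parent pointers over the tree vertices
    top = list(range(n))  # top[r]: dendrogram node representing the cluster rooted at r

    def find(x):
        while uf[x] != x:
            uf[x] = uf[uf[x]]  # path halving
            x = uf[x]
        return x

    parent = [-1] * (2 * n - 1)
    # Contract edges in increasing weight order; each contraction is one merge.
    for i, (u, v, _w) in enumerate(sorted(edges, key=lambda e: e[2])):
        ru, rv = find(u), find(v)
        node = n + i
        parent[top[ru]] = node
        parent[top[rv]] = node
        uf[ru] = rv
        top[rv] = node
    return parent
```

The sort is what costs $\Omega(n \log n)$ work in this baseline; the output-sensitive bounds quoted above come from the paper's own structural results, which this sketch does not reproduce.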

[32]  arXiv:2404.19020 [pdf, ps, other]
Title: Information literacy development and assessment at school level: a systematic review of the literature
Subjects: Information Retrieval (cs.IR)

Information literacy (IL) involves a group of competences and fundamental skills in the 21st century. Today, society operates around information, which is challenging considering the vast amount of content available online. People must be capable of searching, critically assessing, making sense of, and communicating information. This set of competences must be properly developed since childhood, especially if considering early age access to online resources. To better understand the evolution and current status of IL development and assessment at school (K-12) level, we conducted a systematic literature review based on the guidelines established by the PRISMA statement. Our review led us to an initial set of 1,234 articles, from which 53 passed the inclusion criteria. These articles were used to address six research questions focused on IL definitions, skills, standards, and assessment tools. Our review shows IL evolution over the years and how it has been formalised through definitions and standards. These findings reveal key gaps that must be addressed in order to advance the field further. Keywords: Elementary education, Information literacy, Secondary education, 21st Century abilities.

[33]  arXiv:2404.19021 [pdf, ps, other]
Title: Enhancing Autonomous Vehicle Design and Testing: A Comprehensive Review of AR and VR Integration
Subjects: Human-Computer Interaction (cs.HC)

This comprehensive literature review explores the potential of Augmented Reality and Virtual Reality technologies to enhance the design and testing of autonomous vehicles. By analyzing existing research, the review aims to identify how AR and VR can be leveraged to improve various aspects of autonomous vehicle development, including: creating more realistic and comprehensive testing environments, facilitating the design of user centered interfaces, and safely evaluating driver behavior in complex scenarios. Ultimately, the review highlights AR and VR utilization as a key driver in the development of adaptable testing environments, fostering more dependable autonomous vehicle technology, and ultimately propelling significant advancements within the field.

[34]  arXiv:2404.19024 [pdf, other]
Title: Multi-Page Document Visual Question Answering using Self-Attention Scoring Mechanism
Comments: Accepted to ICDAR2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Documents are 2-dimensional carriers of written communication, and as such their interpretation requires a multi-modal approach where textual and visual information are efficiently combined. Document Visual Question Answering (Document VQA), due to this multi-modal nature, has garnered significant interest from both the document understanding and natural language processing communities. The state-of-the-art single-page Document VQA methods show impressive performance, yet in multi-page scenarios, these methods struggle. They have to concatenate all pages into one large page for processing, demanding substantial GPU resources, even for evaluation. In this work, we propose a novel method and efficient training strategy for multi-page Document VQA tasks. In particular, we employ a visual-only document representation, leveraging the encoder from a document understanding model, Pix2Struct. Our approach utilizes a self-attention scoring mechanism to generate relevance scores for each document page, enabling the retrieval of pertinent pages. This adaptation allows us to extend single-page Document VQA models to multi-page scenarios without constraints on the number of pages during evaluation, all with minimal demand for GPU resources. Our extensive experiments demonstrate not only achieving state-of-the-art performance without the need for Optical Character Recognition (OCR), but also sustained performance in scenarios extending to documents of nearly 800 pages compared to a maximum of 20 pages in the MP-DocVQA dataset. Our code is publicly available at https://github.com/leitro/SelfAttnScoring-MPDocVQA.

[35]  arXiv:2404.19025 [pdf, ps, other]
Title: Unsupervised Binary Code Translation with Application to Code Similarity Detection and Vulnerability Discovery
Comments: conference
Journal-ref: The 2023 Conference on Empirical Methods in Natural Language Processing. 2023
Subjects: Software Engineering (cs.SE); Cryptography and Security (cs.CR)

Binary code analysis has immense importance in the research domain of software security. Today, software is very often compiled for various Instruction Set Architectures (ISAs). As a result, cross-architecture binary code analysis has become an emerging problem. Recently, deep learning-based binary analysis has shown promising success. It is widely known that training a deep learning model requires a massive amount of data. However, for some low-resource ISAs, an adequate amount of data is hard to find, preventing deep learning from being widely adopted for binary analysis. To overcome the data scarcity problem and facilitate cross-architecture binary code analysis, we propose to apply the ideas and techniques in Neural Machine Translation (NMT) to binary code analysis. Our insight is that a binary, after disassembly, is represented in some assembly language. Given a binary in a low-resource ISA, we translate it to a binary in a high-resource ISA (e.g., x86). Then we can use a model that has been trained on the high-resource ISA to test the translated binary. We have implemented the model called UNSUPERBINTRANS, and conducted experiments to evaluate its performance. Specifically, we conducted two downstream tasks, including code similarity detection and vulnerability discovery. In both tasks, we achieved high accuracies.

[36]  arXiv:2404.19026 [pdf, other]
Title: MeGA: Hybrid Mesh-Gaussian Head Avatar for High-Fidelity Rendering and Head Editing
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Creating high-fidelity head avatars from multi-view videos is a core issue for many AR/VR applications. However, existing methods usually struggle to obtain high-quality renderings for all different head components simultaneously since they use one single representation to model components with drastically different characteristics (e.g., skin vs. hair). In this paper, we propose a Hybrid Mesh-Gaussian Head Avatar (MeGA) that models different head components with more suitable representations. Specifically, we select an enhanced FLAME mesh as our facial representation and predict a UV displacement map to provide per-vertex offsets for improved personalized geometric details. To achieve photorealistic renderings, we obtain facial colors using deferred neural rendering and disentangle neural textures into three meaningful parts. For hair modeling, we first build a static canonical hair using 3D Gaussian Splatting. A rigid transformation and an MLP-based deformation field are further applied to handle complex dynamic expressions. Combined with our occlusion-aware blending, MeGA generates higher-fidelity renderings for the whole head and naturally supports more downstream tasks. Experiments on the NeRSemble dataset demonstrate the effectiveness of our designs, outperforming previous state-of-the-art methods and supporting various editing functionalities, including hairstyle alteration and texture editing.

[37]  arXiv:2404.19028 [pdf, other]
Title: Adaptive Regulated Sparsity Promoting Approach for Data-Driven Modeling and Control of Grid-Connected Solar Photovoltaic Generation
Subjects: Systems and Control (eess.SY)

This paper aims to introduce a new statistical learning technique based on sparsity promoting for data-driven modeling and control of solar photovoltaic (PV) systems. Compared with conventional sparse regression techniques that might introduce computational complexities when the number of candidate functions increases, an innovative algorithm, named adaptive regulated sparse regression (ARSR) is proposed that adaptively regulates the hyperparameter weights of candidate functions to best represent the dynamics of PV systems. Utilizing this algorithm, open-loop and closed-loop models of single-stage and two-stage PV systems are obtained from measurements and are utilized for control design purposes. Moreover, it is demonstrated that the proposed data-driven approach can successfully be employed for fault analysis studies, which distinguishes its capabilities compared with other data-driven techniques. Finally, the proposed approach is validated through real-time simulations.

[38]  arXiv:2404.19031 [pdf, other]
Title: Machine Unlearning for Document Classification
Comments: Accepted to ICDAR2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Document understanding models have recently demonstrated remarkable performance by leveraging extensive collections of user documents. However, since documents often contain large amounts of personal data, their usage can pose a threat to user privacy and weaken the bonds of trust between humans and AI services. In response to these concerns, legislation advocating "the right to be forgotten" has recently been proposed, allowing users to request the removal of private information from computer systems and neural network models. A novel approach, known as machine unlearning, has emerged to make AI models forget about a particular class of data. In our research, we explore machine unlearning for document classification problems, representing, to the best of our knowledge, the first investigation into this area. Specifically, we consider a realistic scenario where a remote server houses a well-trained model and possesses only a small portion of training data. This setup is designed for efficient forgetting manipulation. This work represents a pioneering step towards the development of machine unlearning methods aimed at addressing privacy concerns in document analysis applications. Our code is publicly available at https://github.com/leitro/MachineUnlearning-DocClassification.

[39]  arXiv:2404.19038 [pdf, other]
Title: Embedded Representation Learning Network for Animating Styled Video Portrait
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Talking head generation has recently attracted considerable attention due to its widespread application prospects, especially for digital avatars and 3D animation design. Inspired by this practical demand, several works explored Neural Radiance Fields (NeRF) to synthesize talking heads. However, these NeRF-based methods face two challenges: (1) difficulty in generating style-controllable talking heads, and (2) displacement artifacts around the neck in rendered images. To overcome these two challenges, we propose a novel generative paradigm, Embedded Representation Learning Network (ERLNet), with two learning stages. First, the audio-driven FLAME (ADF) module is constructed to produce facial expression and head pose sequences synchronized with content audio and style video. Second, given the sequence deduced by the ADF, a novel dual-branch fusion NeRF (DBF-NeRF) explores these contents to render the final images. Extensive empirical studies demonstrate that the collaboration of these two stages enables our method to render a more realistic talking head than the existing algorithms.

[40]  arXiv:2404.19040 [pdf, other]
Title: GSTalker: Real-time Audio-Driven Talking Face Generation via Deformable Gaussian Splatting
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We present GSTalker, a 3D audio-driven talking face generation model with Gaussian Splatting for both fast training (40 minutes) and real-time rendering (125 FPS) with a 3$\sim$5 minute video for training material, in comparison with previous 2D and 3D NeRF-based modeling frameworks which require hours of training and seconds of rendering per frame. Specifically, GSTalker learns an audio-driven Gaussian deformation field to translate and transform 3D Gaussians to synchronize with audio information, in which multi-resolution hashing grid-based tri-plane and temporal smooth module are incorporated to learn accurate deformation for fine-grained facial details. In addition, a pose-conditioned deformation field is designed to model the stabilized torso. To enable efficient optimization of the condition Gaussian deformation field, we initialize 3D Gaussians by learning a coarse static Gaussian representation. Extensive experiments in person-specific videos with audio tracks validate that GSTalker can generate high-fidelity and audio-lips synchronized results with fast training and real-time rendering speed.

[41]  arXiv:2404.19043 [pdf, other]
Title: Improving Interpretability of Deep Active Learning for Flood Inundation Mapping Through Class Ambiguity Indices Using Multi-spectral Satellite Imagery
Authors: Hyunho Lee, Wenwen Li
Comments: 46 pages, 11 figures, 5 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Flood inundation mapping is a critical task for responding to the increasing risk of flooding linked to global warming. Significant advancements of deep learning in recent years have triggered its extensive applications, including flood inundation mapping. To cope with the time-consuming and labor-intensive data labeling process in supervised learning, deep active learning strategies are one of the feasible approaches. However, there remains limited exploration into the interpretability of how deep active learning strategies operate, with a specific focus on flood inundation mapping in the field of remote sensing. In this study, we introduce a novel framework of Interpretable Deep Active Learning for Flood inundation Mapping (IDAL-FIM), specifically in terms of class ambiguity of multi-spectral satellite images. In the experiments, we utilize Sen1Floods11 dataset, and adopt U-Net with MC-dropout. In addition, we employ five acquisition functions, which are the random, K-means, BALD, entropy, and margin acquisition functions. Based on the experimental results, we demonstrate that two proposed class ambiguity indices are effective variables to interpret the deep active learning by establishing statistically significant correlation with the predictive uncertainty of the deep learning model at the tile level. Then, we illustrate the behaviors of deep active learning through visualizing two-dimensional density plots and providing interpretations regarding the operation of deep active learning, in flood inundation mapping.
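Three of the acquisition functions named above have standard textbook definitions, sketched here for a single prediction. The function names and the plain-list input format are assumptions for illustration; how the paper aggregates scores to the tile level is not stated in the abstract and is not reproduced here.

```python
import math


def entropy(probs):
    """Predictive entropy of one class-probability vector (higher = more uncertain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def margin(probs):
    """Gap between the two most likely classes (smaller = more uncertain)."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1]


def bald(mc_samples):
    """BALD score from T stochastic forward passes (e.g. MC-dropout).

    mc_samples: list of T class-probability vectors for the same input.
    Computes entropy of the mean prediction minus the mean per-pass entropy.
    """
    num_passes = len(mc_samples)
    num_classes = len(mc_samples[0])
    mean = [sum(s[c] for s in mc_samples) / num_passes for c in range(num_classes)]
    return entropy(mean) - sum(entropy(s) for s in mc_samples) / num_passes
```

BALD is high only when the individual passes are confident but disagree with each other, which separates model uncertainty from inputs that are ambiguous on every pass.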

[42]  arXiv:2404.19045 [pdf, other]
Title: Maritime Vessel Tank Inspection using Aerial Robots: Experience from the field and dataset release
Comments: Accepted to the IEEE ICRA Workshop on Field Robotics 2024
Subjects: Robotics (cs.RO)

This paper presents field results and lessons learned from the deployment of aerial robots inside ship ballast tanks. Vessel tanks including ballast tanks and cargo holds present dark, dusty environments having simultaneously very narrow openings and wide open spaces that create several challenges for autonomous navigation and inspection operations. We present a system for vessel tank inspection using an aerial robot along with its autonomy modules. We show the results of autonomous exploration and visual inspection in 3 ships spanning across 7 distinct types of sections of the ballast tanks. Additionally, we comment on the lessons learned from the field and possible directions for future work. Finally, we release a dataset consisting of the data from these missions along with data collected with a handheld sensor stick.

[43]  arXiv:2404.19048 [pdf, other]
Title: A Framework for Real-time Safeguarding the Text Generation of Large Language
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Large Language Models (LLMs) have significantly advanced natural language processing (NLP) tasks but also pose ethical and societal risks due to their propensity to generate harmful content. To address this, various approaches have been developed to safeguard LLMs from producing unsafe content. However, existing methods have limitations, including the need for training specific control models and proactive intervention during text generation, that lead to quality degradation and increased computational overhead. To mitigate those limitations, we propose LLMSafeGuard, a lightweight framework to safeguard LLM text generation in real-time. LLMSafeGuard integrates an external validator into the beam search algorithm during decoding, rejecting candidates that violate safety constraints while allowing valid ones to proceed. We introduce a similarity-based validation approach, simplifying constraint introduction and eliminating the need for control model training. Additionally, LLMSafeGuard employs a context-wise timing selection strategy, intervening in LLM generation only when necessary. We evaluate LLMSafeGuard on two tasks, detoxification and copyright safeguarding, and demonstrate its superior performance over SOTA baselines. For instance, LLMSafeGuard reduces the average toxic score of LLM output by 29.7% compared to the best baseline while preserving linguistic quality similar to natural output in the detoxification task. Similarly, in the copyright task, LLMSafeGuard decreases the Longest Common Subsequence (LCS) by 56.2% compared to baselines. Moreover, our context-wise timing selection strategy reduces inference time by at least 24% while maintaining effectiveness comparable to validating at each time step. LLMSafeGuard also offers tunable parameters to balance its effectiveness and efficiency.
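
The idea of placing an external validator inside beam search can be illustrated with a minimal sketch. This is not the LLMSafeGuard implementation: the toy next-token table `toy_model`, the Jaccard token overlap used as the similarity measure, the `threshold` value, and all function names are assumptions made only for illustration.

```python
import math
from typing import Callable, Dict, List, Tuple

Sequence = Tuple[str, ...]
EOS = "<eos>"


def jaccard(a: Sequence, b: Sequence) -> float:
    """Token-set overlap, used here as a stand-in similarity measure."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def make_validator(unsafe_examples: List[Sequence],
                   threshold: float) -> Callable[[Sequence], bool]:
    """Build a validator that rejects candidates too similar to any unsafe example."""
    def is_valid(candidate: Sequence) -> bool:
        return all(jaccard(candidate, ex) < threshold for ex in unsafe_examples)
    return is_valid


def validated_beam_search(next_token_logprobs: Callable[[Sequence], Dict[str, float]],
                          is_valid: Callable[[Sequence], bool],
                          beam_width: int,
                          max_len: int) -> List[Tuple[Sequence, float]]:
    """Beam search that drops every extension the validator rejects."""
    beams: List[Tuple[Sequence, float]] = [((), 0.0)]
    for _ in range(max_len):
        candidates: List[Tuple[Sequence, float]] = []
        for seq, score in beams:
            if seq and seq[-1] == EOS:
                candidates.append((seq, score))  # finished beams carry over unchanged
                continue
            for token, logprob in next_token_logprobs(seq).items():
                extended = seq + (token,)
                if is_valid(extended):  # rejected candidates never enter the beam
                    candidates.append((extended, score + logprob))
        if not candidates:
            break
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams


def toy_model(prefix: Sequence) -> Dict[str, float]:
    """Hypothetical next-token distribution; a real system would query an LLM."""
    table = {
        (): {"you": math.log(0.6), "have": math.log(0.4)},
        ("you",): {"are": math.log(1.0)},
        ("you", "are"): {"awful": math.log(0.7), "kind": math.log(0.3)},
        ("have",): {"fun": math.log(1.0)},
    }
    return table.get(prefix, {EOS: 0.0})


if __name__ == "__main__":
    validator = make_validator([("you", "are", "awful")], threshold=0.9)
    for seq, score in validated_beam_search(toy_model, validator, beam_width=2, max_len=4):
        print(" ".join(seq), round(score, 3))
```

In this toy setup the unconstrained search prefers the continuation ending in "awful", while the validated search never admits it. The sketch validates at every step; the context-wise timing selection described in the abstract is not modeled here.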

[44]  arXiv:2404.19051 [pdf, ps, other]
Title: Assembling Modular, Hierarchical Cognitive Map Learners with Hyperdimensional Computing
Comments: Accepted for the World Congress on Computational Intelligence (WCCI), 30 Jun - 5 Jul 2024
Subjects: Neural and Evolutionary Computing (cs.NE)

Cognitive map learners (CML) are a collection of separate yet collaboratively trained single-layer artificial neural networks (matrices), which navigate an abstract graph by learning internal representations of the node states, edge actions, and edge action availabilities. A consequence of this atypical segregation of information is that the CML performs near-optimal path planning between any two graph node states. However, the CML does not learn when or why to transition from one node to another. This work created CMLs with node states expressed as high dimensional vectors consistent with hyperdimensional computing (HDC), a form of symbolic machine learning (ML). This work evaluated HDC-based CMLs as ML modules, capable of receiving external inputs and computing output responses which are semantically meaningful for other HDC-based modules. Several CMLs were prepared independently then repurposed to solve the Tower of Hanoi puzzle without retraining these CMLs and without explicit reference to their respective graph topologies. This work suggests a template for building levels of biologically plausible cognitive abstraction and orchestration.

[45]  arXiv:2404.19052 [pdf, other]
Title: Exploring Weighted Property Approaches for RDF Graph Similarity Measure
Subjects: Databases (cs.DB); Information Retrieval (cs.IR)

Measuring similarity between RDF graphs is essential for various applications, including knowledge discovery, semantic web analysis, and recommender systems. However, traditional similarity measures often treat all properties equally, potentially overlooking the varying importance of different properties in different contexts. Consequently, exploring weighted property approaches for RDF graph similarity measurement presents an intriguing avenue for investigation. Therefore, in this paper, we propose a weighted property approach for RDF graph similarity measurement to address this limitation. Our approach incorporates the relative importance of properties into the similarity calculation, enabling a more nuanced and context-aware measure of similarity. We evaluate our approach through a comprehensive experimental study on an RDF graph dataset in the vehicle domain. Our results demonstrate that the proposed approach achieves promising accuracy and effectively reflects the perceived similarity between RDF graphs.
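
As a rough illustration of what weighting properties can mean, the sketch below compares two resource descriptions property by property and combines the per-property overlaps with user-supplied weights. The formula (a weighted mean of per-property Jaccard overlaps), the `ex:` property names, and the weights are assumptions for illustration and are not taken from the paper.

```python
from typing import Dict, Set

# A resource description: property IRI -> set of object values.
Description = Dict[str, Set[str]]


def weighted_similarity(a: Description, b: Description,
                        weights: Dict[str, float]) -> float:
    """Weighted mean of per-property Jaccard overlaps (unlisted properties weigh 1.0)."""
    properties = set(a) | set(b)
    total_weight = sum(weights.get(p, 1.0) for p in properties)
    if total_weight == 0:
        return 0.0
    score = 0.0
    for p in properties:
        values_a, values_b = a.get(p, set()), b.get(p, set())
        union = values_a | values_b
        overlap = len(values_a & values_b) / len(union) if union else 0.0
        score += weights.get(p, 1.0) * overlap
    return score / total_weight


# Hypothetical vehicle descriptions: same body type, different colour.
car_1: Description = {"ex:bodyType": {"SUV"}, "ex:color": {"red"}}
car_2: Description = {"ex:bodyType": {"SUV"}, "ex:color": {"blue"}}

uniform = weighted_similarity(car_1, car_2, {})                      # 0.5
body_first = weighted_similarity(car_1, car_2, {"ex:bodyType": 3.0})  # 0.75
```

With uniform weights the two vehicles score 0.5; giving the body type three times the weight of the colour raises the score to 0.75, which is the kind of context-dependent shift a weighted measure is meant to capture.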

[46]  arXiv:2404.19055 [pdf, other]
Title: Plan of Thoughts: Heuristic-Guided Problem Solving with Large Language Models
Authors: Houjun Liu
Comments: 7 pages, 2 figures
Subjects: Computation and Language (cs.CL)

While language models (LMs) offer significant capability in zero-shot reasoning tasks across a wide range of domains, they do not perform satisfactorily in problems which require multi-step reasoning. Previous approaches to mitigate this involve breaking a larger, multi-step task into sub-tasks, asking the language model to generate proposals ("thoughts") for each sub-task, and using exhaustive planning approaches such as DFS to compose a solution. In this work, we leverage this idea to introduce two new contributions: first, we formalize a planning-based approach to perform multi-step problem solving with LMs via Partially Observable Markov Decision Processes (POMDPs), with the LM's own reflections about the value of a state used as a search heuristic; second, leveraging the online POMDP solver POMCP, we demonstrate a superior success rate of 89.4% on the Game of 24 task as compared to existing approaches while also offering better anytime performance characteristics than the fixed tree-search used previously. Taken together, these contributions allow modern LMs to decompose and solve larger-scale reasoning tasks more effectively.

[47]  arXiv:2404.19059 [pdf, other]
Title: Convergence and stability of randomized implicit two-stage Runge-Kutta schemes
Subjects: Numerical Analysis (math.NA)

We randomize the implicit two-stage Runge-Kutta scheme in order to improve the rate of convergence (with respect to a deterministic scheme) and stability of the approximate solution (with respect to the solution generated by the explicit scheme). For stability analysis, we use Dahlquist's concept of A-stability, adapted to randomized schemes by considering three notions of stability: asymptotic, mean-square, and in probability. The randomized implicit RK2 scheme proves to be A-stable asymptotically and in probability but not in the mean-square sense.

[48]  arXiv:2404.19060 [pdf, ps, other]
Title: Modular, Hierarchical Machine Learning for Sequential Goal Completion
Authors: Nathan McDonald
Comments: Accepted at SPIE Defense + Commercial Sensing, 21 - 25 Apr 2024
Subjects: Neural and Evolutionary Computing (cs.NE)

Given a maze populated with different objects, one may task a robot with a sequential goal completion task, e.g. 1) pick up a key then 2) unlock the door then 3) unlock the treasure chest. A typical machine learning (ML) solution would involve a monolithically trained artificial neural network (ANN). However, if the sequence of goals or the goals themselves change, then the ANN must be significantly (or, at worst, completely) retrained. Instead of a monolithic ANN, a modular ML component would be 1) independently optimizable (task-agnostic) and 2) arbitrarily reconfigurable with other ML modules. This work describes a modular, hierarchical ML framework by integrating two emerging ML techniques: 1) cognitive map learners (CML) and 2) hyperdimensional computing (HDC). A CML is a collection of three single layer ANNs (matrices) collaboratively trained to learn the topology of an abstract graph. Here, two CMLs were constructed, one describing locations in 2D physical space and the other the relative distribution of objects found in this space. Each CML node state was encoded as a high-dimensional vector to utilize HDC, an ML algebra, for symbolic reasoning over these high-dimensional symbol vectors. In this way, each sub-goal above was described by algebraic equations of CML node states. Multiple, independently trained CMLs were subsequently assembled together to navigate a maze to solve a sequential goal task. Critically, changes to these goals required only localized changes in the CML-HDC architecture, as opposed to a global ANN retraining scheme. This framework therefore enabled a more traditional engineering approach to ML, akin to digital logic design.

[49]  arXiv:2404.19063 [pdf, other]
Title: SuperCLUE-Fin: Graded Fine-Grained Analysis of Chinese LLMs on Diverse Financial Tasks and Applications
Comments: 11 pages, 19 figures, and tables
Subjects: Computation and Language (cs.CL)

The SuperCLUE-Fin (SC-Fin) benchmark is a pioneering evaluation framework tailored for Chinese-native financial large language models (FLMs). It assesses FLMs across six financial application domains and twenty-five specialized tasks, encompassing theoretical knowledge and practical applications such as compliance, risk management, and investment analysis. Using multi-turn, open-ended conversations that mimic real-life scenarios, SC-Fin measures models on a range of criteria, including accurate financial understanding, logical reasoning, clarity, computational efficiency, business acumen, risk perception, and compliance with Chinese regulations.
In a rigorous evaluation involving over a thousand questions, SC-Fin identifies a performance hierarchy where domestic models like GLM-4 and MoonShot-v1-128k outperform others with an A-grade, highlighting the potential for further development in transforming theoretical knowledge into pragmatic financial solutions. This benchmark serves as a critical tool for refining FLMs in the Chinese context, directing improvements in financial knowledge databases, standardizing financial interpretations, and promoting models that prioritize compliance, risk management, and secure practices.
We create a contextually relevant and comprehensive benchmark that drives the development of AI in the Chinese financial sector. SC-Fin facilitates the advancement and responsible deployment of FLMs, offering valuable insights for enhancing model performance and usability for both individual and institutional users in the Chinese market. Our benchmark can be found at https://www.CLUEbenchmarks.com.

[50]  arXiv:2404.19064 [pdf, other]
Title: Zero Knowledge Proof for Multiple Sequence Alignment
Subjects: Cryptography and Security (cs.CR)

Multiple sequence alignment (MSA) is a fundamental algorithm in bioinformatics. When the alignment needs to be protected while other information, such as the input sequences and the alignment score, is revealed, a zero-knowledge proof can be used. In this paper, a validator checks the consistency between the input sequences and the alignment, and between the alignment and the alignment score. The validator is written in the Circom language, which is compiled into a circuit. Using a zero-knowledge proof system called zkSNARK, a cryptographic proof is generated for the circuit and its input. This proof demonstrates that all inputs are consistent without revealing the actual alignment.
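
The two consistency checks described above can be sketched in ordinary Python, outside any proof system. This is only an illustration of the relation being checked, not the Circom circuit from the paper; the sum-of-pairs scoring scheme, the score values, and the function names are assumptions.

```python
from itertools import combinations
from typing import List


def sum_of_pairs_score(alignment: List[str], match: int = 1,
                       mismatch: int = -1, gap: int = -2) -> int:
    """Score an alignment column by column over all pairs of rows (assumed scheme)."""
    score = 0
    for column in zip(*alignment):
        for x, y in combinations(column, 2):
            if x == "-" and y == "-":
                continue  # a gap aligned with a gap contributes nothing
            if x == "-" or y == "-":
                score += gap
            elif x == y:
                score += match
            else:
                score += mismatch
    return score


def alignment_is_consistent(sequences: List[str], alignment: List[str],
                            claimed_score: int) -> bool:
    """Check sequences against the alignment, and the alignment against the score."""
    if len(sequences) != len(alignment):
        return False
    if len({len(row) for row in alignment}) > 1:
        return False  # every aligned row must have the same length
    for sequence, row in zip(sequences, alignment):
        if row.replace("-", "") != sequence:
            return False  # removing gaps must give back the input sequence
    return sum_of_pairs_score(alignment) == claimed_score


sequences = ["ACGT", "AGT", "ACT"]
alignment = ["ACGT", "A-GT", "AC-T"]
```

In the zero-knowledge setting, `sequences` and `claimed_score` would be public inputs and `alignment` the private witness; the proof attests that a check of this kind passes without disclosing `alignment`.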

[51]  arXiv:2404.19065 [pdf, other]
Title: HELPER-X: A Unified Instructable Embodied Agent to Tackle Four Interactive Vision-Language Domains with Memory-Augmented Language Models
Comments: Videos and code this https URL
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Recent research on instructable agents has used memory-augmented Large Language Models (LLMs) as task planners, a technique that retrieves language-program examples relevant to the input instruction and uses them as in-context examples in the LLM prompt to improve the performance of the LLM in inferring the correct action and task plans. In this technical report, we extend the capabilities of HELPER, by expanding its memory with a wider array of examples and prompts, and by integrating additional APIs for asking questions. This simple expansion of HELPER into a shared memory enables the agent to work across the domains of executing plans from dialogue, natural language instruction following, active question asking, and commonsense room reorganization. We evaluate the agent on four diverse interactive visual-language embodied agent benchmarks: ALFRED, TEACh, DialFRED, and the Tidy Task. HELPER-X achieves few-shot, state-of-the-art performance across these benchmarks using a single agent, without requiring in-domain training, and remains competitive with agents that have undergone in-domain training.

[52]  arXiv:2404.19066 [pdf, other]
Title: Revolutionizing Traffic Sign Recognition: Unveiling the Potential of Vision Transformers
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This research introduces an innovative method for Traffic Sign Recognition (TSR) by leveraging deep learning techniques, with a particular emphasis on Vision Transformers. TSR holds a vital role in advancing driver assistance systems and autonomous vehicles. Traditional TSR approaches, reliant on manual feature extraction, have proven to be labor-intensive and costly. Moreover, methods based on shape and color have inherent limitations, including susceptibility to various factors and changes in lighting conditions. This study explores three variants of Vision Transformers (PVT, TNT, LNL) and six convolutional neural networks (AlexNet, ResNet, VGG16, MobileNet, EfficientNet, GoogleNet) as baseline models. To address the shortcomings of traditional methods, a novel pyramid EATFormer backbone is proposed, amalgamating Evolutionary Algorithms (EAs) with the Transformer architecture. The introduced EA-based Transformer block captures multi-scale, interactive, and individual information through its components: Feed-Forward Network, Global and Local Interaction, and Multi-Scale Region Aggregation modules. Furthermore, a Modulated Deformable MSA module is introduced to dynamically model irregular locations. Experimental evaluations on the GTSRB and BelgiumTS datasets demonstrate the efficacy of the proposed approach in enhancing both prediction speed and accuracy. This study concludes that Vision Transformers hold significant promise in traffic sign classification and contributes a fresh algorithmic framework for TSR. These findings set the stage for the development of precise and dependable TSR algorithms, benefiting driver assistance systems and autonomous vehicles.

[53]  arXiv:2404.19070 [pdf, other]
Title: Reinforcement Learning Driven Cooperative Ball Balance in Rigidly Coupled Drones
Subjects: Robotics (cs.RO)

The multi-drone cooperative transport (CT) problem has been widely studied in the literature. However, limited work exists on control of such systems in the presence of time-varying uncertainties, such as a time-varying center of gravity (CG). This paper presents a leader-follower approach for the control of a multi-drone CT system with time-varying CG. The leader uses a traditional Proportional-Integral-Derivative (PID) controller, whereas the follower uses a deep reinforcement learning (RL) controller that relies only on local information and minimal leader information. Extensive simulation results are presented, showing the effectiveness of the proposed method over a previously developed adaptive controller and for variations in the mass of the objects being transported and CG speeds. Preliminary experimental work also demonstrates ball balance (depicting moving CG) on a stick/rod lifted by two Crazyflie drones cooperatively.

[54]  arXiv:2404.19071 [pdf, other]
Title: Blind Spots and Biases: Exploring the Role of Annotator Cognitive Biases in NLP
Subjects: Human-Computer Interaction (cs.HC); Computation and Language (cs.CL)

With the rapid proliferation of artificial intelligence, there is growing concern over its potential to exacerbate existing biases and societal disparities and introduce novel ones. This issue has prompted widespread attention from academia, policymakers, industry, and civil society. While evidence suggests that integrating human perspectives can mitigate bias-related issues in AI systems, it also introduces challenges associated with cognitive biases inherent in human decision-making. Our research focuses on reviewing existing methodologies and ongoing investigations aimed at understanding annotation attributes that contribute to bias.

[55]  arXiv:2404.19076 [pdf, ps, other]
Title: Who Followed the Blueprint? Analyzing the Responses of U.S. Federal Agencies to the Blueprint for an AI Bill of Rights
Comments: 8 pages
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)

This study examines the extent to which U.S. federal agencies responded to and implemented the principles outlined in the White House's October 2022 "Blueprint for an AI Bill of Rights." The Blueprint provided a framework for the ethical governance of artificial intelligence systems, organized around five core principles: safety and effectiveness, protection against algorithmic discrimination, data privacy, notice and explanation about AI systems, and human alternatives and fallback.
Through an analysis of publicly available records across 15 federal departments, the authors found limited evidence that the Blueprint directly influenced agency actions after its release. Only five departments explicitly mentioned the Blueprint, while 12 took steps aligned with one or more of its principles. However, much of this work appeared to have precedents predating the Blueprint or motivations disconnected from it, such as compliance with prior executive orders on trustworthy AI. Departments' activities often emphasized priorities like safety, accountability and transparency that overlapped with Blueprint principles, but did not necessarily stem from it.
The authors conclude that the non-binding Blueprint seems to have had minimal impact on shaping the U.S. government's approach to ethical AI governance in its first year. Factors like public concerns after high-profile AI releases and obligations to follow direct executive orders likely carried more influence over federal agencies. More rigorous study would be needed to definitively assess the Blueprint's effects within the federal bureaucracy and broader society.

[56]  arXiv:2404.19077 [pdf, other]
Title: Replicating Human Anatomy with Vision Controlled Jetting -- A Pneumatic Musculoskeletal Hand and Forearm
Authors: Thomas Buchner (1), Stefan Weirich (1), Alexander M. Kübler (1), Wojciech Matusik (2,3), Robert K. Katzschmann (1) ((1) ETH Zurich, Switzerland, (2) Inkbit, USA, (3) CSAIL, MIT, USA)
Subjects: Robotics (cs.RO)

The functional replication and actuation of complex structures inspired by nature is a longstanding goal for humanity. Creating such complex structures combining soft and rigid features and actuating them with artificial muscles would further our understanding of natural kinematic structures. We printed a biomimetic hand in a single print process comprised of a rigid skeleton, soft joint capsules, tendons, and printed touch sensors. We showed its actuation using electric motors. In this work, we expand on that work by adding a forearm that is also closely modeled after the human anatomy and replacing the hand's motors with 22 independently controlled pneumatic artificial muscles (PAMs). Our thin, high-strain (up to 30.1%) PAMs match the performance of state-of-the-art artificial muscles at a lower cost. The system showcases human-like dexterity with independent finger movements, demonstrating successful grasping of various objects, ranging from a small, lightweight coin to a large can of 272g in weight. The performance evaluation, based on fingertip and grasping forces along with finger joint range of motion, highlights the system's potential.

[57]  arXiv:2404.19081 [pdf, ps, other]
Title: $(Δ+ 1)$ Vertex Coloring in $O(n)$ Communication
Comments: 16 pages, 1 figure; full version of paper accepted to PODC '24
Subjects: Data Structures and Algorithms (cs.DS)

We study the communication complexity of $(\Delta + 1)$ vertex coloring, where the edges of an $n$-vertex graph of maximum degree $\Delta$ are partitioned between two players. We provide a randomized protocol which uses $O(n)$ bits of communication and ends with both players knowing the coloring. Combining this with a folklore $\Omega(n)$ lower bound, this settles the randomized communication complexity of $(\Delta + 1)$-coloring up to constant factors.
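
For readers unfamiliar with the problem, the sketch below shows why $\Delta + 1$ colors always suffice in the ordinary single-machine setting: a vertex has at most $\Delta$ neighbors, so at least one of the $\Delta + 1$ colors is always free. This is the textbook sequential greedy algorithm, not the two-party protocol of the paper; the adjacency-dictionary representation and the function name are assumptions.

```python
from typing import Dict, Hashable, List


def greedy_coloring(adj: Dict[Hashable, List[Hashable]]) -> Dict[Hashable, int]:
    """Color an undirected graph with at most (max degree + 1) colors.

    `adj` must be symmetric: if u lists v as a neighbor, v lists u.
    """
    max_degree = max((len(neighbors) for neighbors in adj.values()), default=0)
    palette = range(max_degree + 1)
    color: Dict[Hashable, int] = {}
    for v in adj:
        used = {color[u] for u in adj[v] if u in color}
        # At most max_degree colors are blocked, so a free one always exists.
        color[v] = next(c for c in palette if c not in used)
    return color


# A 5-cycle: maximum degree 2, so 3 colors are available.
cycle = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
coloring = greedy_coloring(cycle)
```

The difficulty addressed in the abstract is different: there the edges are split between two players, and the question is how few bits they must exchange to agree on such a coloring.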

[58]  arXiv:2404.19087 [pdf, other]
Title: Deep Reinforcement Learning for Advanced Longitudinal Control and Collision Avoidance in High-Risk Driving Scenarios
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Systems and Control (eess.SY)

Existing Advanced Driver Assistance Systems primarily focus on the vehicle directly ahead, often overlooking potential risks from following vehicles. This oversight can lead to ineffective handling of high-risk situations, such as high-speed, closely spaced, multi-vehicle scenarios where emergency braking by one vehicle might trigger a pile-up collision. To overcome these limitations, this study introduces a novel deep reinforcement learning based algorithm for longitudinal control and collision avoidance. The proposed algorithm considers the behavior of both leading and following vehicles. Its implementation in simulated high-risk scenarios, which involve emergency braking in dense traffic where traditional systems typically fail, has demonstrated the algorithm's ability to prevent potential pile-up collisions, including those involving heavy-duty vehicles.

[59]  arXiv:2404.19090 [pdf, ps, other]
Title: Transmit Power Optimization for Integrated Sensing and Backscatter Communication
Comments: Submitted to an IEEE Transactions Journal
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

Ambient Internet of Things networks use low-cost, low-power backscatter tags in various industry applications. By exploiting those tags, we introduce the integrated sensing and backscatter communication (ISABC) system, featuring multiple backscatter tags, a user (reader), and a full-duplex base station (BS) that integrates sensing and (backscatter) communications. The BS undertakes dual roles of detecting backscatter tags and communicating with the user, leveraging the same temporal and frequency resources. The tag-reflected BS signals offer data to the user and enable the BS to sense the environment simultaneously. We derive both user and tag communication rates and the sensing rate of the BS. We jointly optimize the transmit/received beamformers and tag reflection coefficients to minimize the total BS power. To solve this problem, we employ the alternating optimization technique. We offer a closed-form solution for the received beamformers while utilizing semi-definite relaxation and slack-optimization for transmit beamformers and power reflection coefficients, respectively. For example, with ten transmit/reception antennas at the BS, ISABC delivers a 75% sum communication and sensing rates gain over a traditional backscatter while requiring a 3.4% increase in transmit power. Furthermore, ISABC with active tags only requires a 0.24% increase in transmit power over conventional integrated sensing and communication.

[60]  arXiv:2404.19093 [pdf, other]
Title: Large Language Models as Conversational Movie Recommenders: A User Study
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)

This paper explores the effectiveness of using large language models (LLMs) for personalized movie recommendations from users' perspectives in an online field experiment. Our study involves a combination of between-subject prompt and historic consumption assessments, along with within-subject recommendation scenario evaluations. By examining conversation and survey response data from 160 active users, we find that LLMs offer strong recommendation explainability but lack overall personalization, diversity, and user trust. Our results also indicate that different personalized prompting techniques do not significantly affect user-perceived recommendation quality, but the number of movies a user has watched plays a more significant role. Furthermore, LLMs show a greater ability to recommend lesser-known or niche movies. Through qualitative analysis, we identify key conversational patterns linked to positive and negative user interaction experiences and conclude that providing personal context and examples is crucial for obtaining high-quality recommendations from LLMs.

[61]  arXiv:2404.19094 [pdf, other]
Title: In-Context Symbolic Regression: Leveraging Language Models for Function Discovery
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Symbolic Regression (SR) is a task which aims to extract the mathematical expression underlying a set of empirical observations. Transformer-based methods trained on SR datasets hold the current state-of-the-art in this task, while the application of Large Language Models (LLMs) to SR remains unexplored. This work investigates the integration of pre-trained LLMs into the SR pipeline, utilizing an approach that iteratively refines a functional form based on the prediction error it achieves on the observation set, until it reaches convergence. Our method leverages LLMs to propose an initial set of possible functions based on the observations, exploiting their strong pre-training prior. These functions are then iteratively refined by the model itself and by an external optimizer for their coefficients. The process is repeated until the results are satisfactory. We then analyze Vision-Language Models in this context, exploring the inclusion of plots as visual inputs to aid the optimization process. Our findings reveal that LLMs are able to successfully recover good symbolic equations that fit the given data, outperforming SR baselines based on Genetic Programming, with the addition of images in the input showing promising results for the most complex benchmarks.

[62]  arXiv:2404.19095 [pdf, ps, other]
Title: Catalyzing Social Interactions in Mixed Reality using ML Recommendation Systems
Subjects: Human-Computer Interaction (cs.HC); Information Retrieval (cs.IR); Machine Learning (cs.LG); Social and Information Networks (cs.SI)

We create an innovative mixed reality-first social recommendation model, utilizing features uniquely collected through mixed reality (MR) systems to promote social interaction, such as gaze recognition, proximity, noise level, congestion level, and conversational intensity. We further extend these models to include right-time features to deliver timely notifications. We measure performance metrics across various models by creating a new intersection of user features, MR features, and right-time features. We create four model types trained on different combinations of the feature classes, where we compare the baseline model trained on the class of user features against the models trained on MR features, right-time features, and a combination of all of the feature classes. Due to limitations in data collection and cost, we observe performance degradation in the right-time, mixed reality, and combination models. Despite these challenges, we introduce optimizations to improve accuracy across all models by over 14 percentage points, where the best performing model achieved 24% greater accuracy.

[63]  arXiv:2404.19096 [pdf, other]
Title: Data-Driven Min-Max MPC for Linear Systems: Robustness and Adaptation
Comments: arXiv admin note: text overlap with arXiv:2309.17307
Subjects: Systems and Control (eess.SY)

Data-driven controller design is an important research problem, in particular when data is corrupted by noise. In this paper, we propose a data-driven min-max model predictive control (MPC) scheme using noisy input-state data for unknown linear time-invariant (LTI) systems. The unknown system matrices are characterized by a set-membership representation using the noisy input-state data. Leveraging this representation, we derive an upper bound on the worst-case cost and determine the corresponding optimal state-feedback control law through a semidefinite program (SDP). We prove that the resulting closed-loop system is robustly stabilized and satisfies the input and state constraints. Further, we propose an adaptive data-driven min-max MPC scheme which exploits additional online input-state data to improve closed-loop performance. Numerical examples show the effectiveness of the proposed methods.

[64]  arXiv:2404.19097 [pdf, other]
Title: Exploring the Capability of LLMs in Performing Low-Level Visual Analytic Tasks on SVG Data Visualizations
Subjects: Human-Computer Interaction (cs.HC)

Data visualizations help extract insights from datasets, but reaching these insights requires decomposing high-level goals into low-level analytic tasks that can be complex due to varying data literacy and experience. Recent advancements in large language models (LLMs) have shown promise for lowering barriers for users to achieve tasks such as writing code. Scalable Vector Graphics (SVG), a text-based image format common in data visualizations, matches well with the text sequence processing of transformer-based LLMs. In this paper, we explore the capability of LLMs to perform low-level visual analytic tasks defined by Amar, Eagan, and Stasko directly on SVG-based visualizations. Using zero-shot prompts, we instruct the models to provide responses or modify the SVG code based on given visualizations. Our findings demonstrate that LLMs can effectively modify existing SVG visualizations for specific tasks like Cluster but perform poorly on tasks requiring a sequence of math operations. We also discovered that LLM performance varies based on factors such as the number of data points, the presence of value labels, and the chart type. Our findings contribute to gauging the general capabilities of LLMs and highlight the need for further exploration and development to fully harness their potential in supporting visual analytic tasks.

[65]  arXiv:2404.19100 [pdf, other]
Title: Predicting Fairness of ML Software Configuration
Comments: To Appear in the 20th International Conference on Predictive Models and Data Analytics in Software Engineering (PROMISE'24)
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)

This paper investigates the relationships between hyperparameters of machine learning and fairness. Data-driven solutions are increasingly used in critical socio-technical applications where ensuring fairness is important. Rather than explicitly encoding decision logic via control and data structures, the ML developers provide input data, perform some pre-processing, choose ML algorithms, and tune hyperparameters (HPs) to infer a program that encodes the decision logic. Prior works report that the selection of HPs can significantly influence fairness. However, tuning HPs to find an ideal trade-off between accuracy, precision, and fairness has remained an expensive and tedious task. Can we predict fairness of HP configuration for a given dataset? Are the predictions robust to distribution shifts?
We focus on group fairness notions and investigate the HP space of 5 training algorithms. We first find that tree regressors and XGBoost significantly outperformed deep neural networks and support vector machines in accurately predicting the fairness of HPs. When predicting the fairness of ML hyperparameters under temporal distribution shift, the tree regressors outperform the other algorithms with reasonable accuracy. However, the precision depends on the ML training algorithm, dataset, and protected attributes. For example, the tree regressor model was robust for training data shift from 2014 to 2018 on logistic regression and discriminant analysis HPs with sex as the protected attribute, but not for race and other training algorithms. Our method provides a sound framework to efficiently perform fine-tuning of ML training algorithms and understand the relationships between HPs and fairness.

[66]  arXiv:2404.19104 [pdf, other]
Title: An Oracle with no $\mathrm{UP}$-Complete Sets, but $\mathrm{NP}=\mathrm{PSPACE}$
Subjects: Computational Complexity (cs.CC)

We construct an oracle relative to which $\mathrm{NP} = \mathrm{PSPACE}$, but $\mathrm{UP}$ has no many-one complete sets. This combines the properties of an oracle by Hartmanis and Hemachandra [HH88] and one by Ogiwara and Hemachandra [OH93].
The oracle provides new separations of classical conjectures on optimal proof systems and complete sets in promise classes. This answers several questions by Pudlák [Pud17], e.g., the implications $\mathsf{UP} \Longrightarrow \mathsf{CON}^{\mathsf{N}}$ and $\mathsf{SAT} \Longrightarrow \mathsf{TFNP}$ are false relative to our oracle.
Moreover, the oracle demonstrates that, in principle, it is possible that $\mathrm{TFNP}$-complete problems exist, while at the same time $\mathrm{SAT}$ has no p-optimal proof systems.

[67]  arXiv:2404.19108 [pdf, other]
Title: Real-Time Convolutional Neural Network-Based Star Detection and Centroiding Method for CubeSat Star Tracker
Subjects: Computer Vision and Pattern Recognition (cs.CV); Instrumentation and Methods for Astrophysics (astro-ph.IM); Image and Video Processing (eess.IV)

Star trackers are one of the most accurate celestial sensors used for absolute attitude determination. The devices detect stars in captured images and accurately compute their projected centroids on an imaging focal plane with subpixel precision. Traditional algorithms for star detection and centroiding often rely on threshold adjustments for star pixel detection and pixel brightness weighting for centroid computation. However, challenges like high sensor noise and stray light can compromise algorithm performance. This article introduces a Convolutional Neural Network (CNN)-based approach for star detection and centroiding, tailored to address the issues posed by noisy star tracker images in the presence of stray light and other artifacts. Trained using simulated star images overlaid with real sensor noise and stray light, the CNN produces both a binary segmentation map distinguishing star pixels from the background and a distance map indicating each pixel's proximity to the nearest star centroid. Leveraging this distance information alongside pixel coordinates transforms centroid calculations into a set of trilateration problems solvable via the least squares method. Our method employs efficient UNet variants for the underlying CNN architectures, and the variants' performances are evaluated. Comprehensive testing has been undertaken with synthetic image evaluations, hardware-in-the-loop assessments, and night sky tests. The tests consistently demonstrated that our method outperforms several existing algorithms in centroiding accuracy and exhibits superior resilience to high sensor noise and stray light interference. An additional benefit of our algorithms is that they can be executed in real-time on low-power edge AI processors.
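
The trilateration step mentioned in this abstract can be illustrated with a minimal sketch. It assumes each star pixel (x_i, y_i) comes with a predicted distance d_i to the centroid (cx, cy), so that (x_i - cx)^2 + (y_i - cy)^2 = d_i^2; subtracting the first pixel's equation from the others gives a linear system that is solved by least squares. The function name `trilaterate` and the input layout are hypothetical and are not taken from the paper.

```python
import math


def trilaterate(points, distances):
    """Least-squares estimate of a centroid (cx, cy).

    points:    list of (x, y) pixel coordinates (hypothetical input layout)
    distances: list of distances from each pixel to the unknown centroid
    """
    (x0, y0), d0 = points[0], distances[0]
    # Accumulate the 2x2 normal equations (A^T A) c = A^T r, where each row of A
    # comes from subtracting the reference pixel's circle equation.
    s_aa = s_ab = s_bb = s_ar = s_br = 0.0
    for (x, y), d in zip(points[1:], distances[1:]):
        a = 2.0 * (x - x0)
        b = 2.0 * (y - y0)
        r = (x * x + y * y - d * d) - (x0 * x0 + y0 * y0 - d0 * d0)
        s_aa += a * a
        s_ab += a * b
        s_bb += b * b
        s_ar += a * r
        s_br += b * r
    det = s_aa * s_bb - s_ab * s_ab
    if abs(det) < 1e-12:
        raise ValueError("pixels are collinear; the centroid is not uniquely determined")
    # Solve the 2x2 system with Cramer's rule.
    cx = (s_ar * s_bb - s_ab * s_br) / det
    cy = (s_aa * s_br - s_ab * s_ar) / det
    return cx, cy


# Usage: four pixels around a synthetic sub-pixel centroid with exact distances.
true_centroid = (10.3, 7.6)
pixels = [(10, 7), (11, 7), (10, 8), (11, 8)]
dists = [math.hypot(x - true_centroid[0], y - true_centroid[1]) for x, y in pixels]
estimate = trilaterate(pixels, dists)
```

With noisy distance predictions the same normal equations return the least-squares compromise rather than an exact intersection, which is the reason for using more than three pixels per star.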

[68]  arXiv:2404.19109 [pdf, other]
Title: The Shape of Money Laundering: Subgraph Representation Learning on the Blockchain with the Elliptic2 Dataset
Subjects: Machine Learning (cs.LG); General Finance (q-fin.GN)

Subgraph representation learning is a technique for analyzing local structures (or shapes) within complex networks. Enabled by recent developments in scalable Graph Neural Networks (GNNs), this approach encodes relational information at a subgroup level (multiple connected nodes) rather than at a node level of abstraction. We posit that certain domain applications, such as anti-money laundering (AML), are inherently subgraph problems and mainstream graph techniques have been operating at a suboptimal level of abstraction. This is due in part to the scarcity of annotated datasets of real-world size and complexity, as well as the lack of software tools for managing subgraph GNN workflows at scale. To enable work in fundamental algorithms as well as domain applications in AML and beyond, we introduce Elliptic2, a large graph dataset containing 122K labeled subgraphs of Bitcoin clusters within a background graph consisting of 49M node clusters and 196M edge transactions. The dataset provides subgraphs known to be linked to illicit activity for learning the set of "shapes" that money laundering exhibits in cryptocurrency and accurately classifying new criminal activity. Along with the dataset we share our graph techniques, software tooling, promising early experimental results, and new domain insights already gleaned from this approach. Taken together, we find immediate practical value in this approach and the potential for a new standard in anti-money laundering and forensic analytics in cryptocurrencies and other financial networks.

[69]  arXiv:2404.19110 [pdf, other]
Title: EMOPortraits: Emotion-enhanced Multimodal One-shot Head Avatars
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Head avatars animated by visual signals have gained popularity, particularly in cross-driving synthesis where the driver differs from the animated character, a challenging but highly practical approach. The recently presented MegaPortraits model has demonstrated state-of-the-art results in this domain. We conduct a deep examination and evaluation of this model, with a particular focus on its latent space for facial expression descriptors, and uncover several limitations with its ability to express intense face motions. To address these limitations, we propose substantial changes in both training pipeline and model architecture, to introduce our EMOPortraits model, where we:
Enhance the model's capability to faithfully support intense, asymmetric face expressions, setting a new state-of-the-art result in the emotion transfer task, surpassing previous methods in both metrics and quality.
Incorporate a speech-driven mode into our model, achieving top-tier performance in audio-driven facial animation, making it possible to drive source identity through diverse modalities, including visual signal, audio, or a blend of both.
We propose a novel multi-view video dataset featuring a wide range of intense and asymmetric facial expressions, filling the gap left by the absence of such data in existing datasets.

[70]  arXiv:2404.19112 [pdf, other]
Title: Hidden Synergy: $L_1$ Weight Normalization and 1-Path-Norm Regularization
Authors: Aditya Biswas
Comments: 8 pages body, 2 tables, 1 figure, 3 appendices
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We present PSiLON Net, an MLP architecture that uses $L_1$ weight normalization for each weight vector and shares the length parameter across the layer. The 1-path-norm provides a bound for the Lipschitz constant of a neural network and reflects on its generalizability, and we show how PSiLON Net's design drastically simplifies the 1-path-norm, while providing an inductive bias towards efficient learning and near-sparse parameters. We propose a pruning method to achieve exact sparsity in the final stages of training, if desired. To exploit the inductive bias of residual networks, we present a simplified residual block, leveraging concatenated ReLU activations. For networks constructed with such blocks, we prove that considering only a subset of possible paths in the 1-path-norm is sufficient to bound the Lipschitz constant. Using the 1-path-norm and this improved bound as regularizers, we conduct experiments in the small data regime using overparameterized PSiLON Nets and PSiLON ResNets, demonstrating reliable optimization and strong performance.
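
For orientation, the 1-path-norm referred to in this abstract can be computed for a plain bias-free MLP as the sum of all entries of the product of the element-wise absolute weight matrices. The sketch below shows only that standard definition; it does not reproduce PSiLON Net's simplified form or the paper's improved bound, and the helper names are hypothetical.

```python
def abs_matmul(a, b):
    """Multiply |a| @ |b| for matrices stored as lists of rows."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [sum(abs(a[i][k]) * abs(b[k][j]) for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


def one_path_norm(weights):
    """1-path-norm of a bias-free MLP.

    weights: list of layer matrices [W_1, ..., W_L], each of shape (out, in),
             ordered from the input layer to the output layer.
    Returns the sum over all input-output paths of the product of |weights|.
    """
    product = [[abs(w) for w in row] for row in weights[0]]
    for w in weights[1:]:
        product = abs_matmul(w, product)  # |W_l| @ (|W_{l-1}| ... |W_1|)
    return sum(sum(row) for row in product)


# Usage: a 2-2-1 network.
w1 = [[1.0, -2.0], [3.0, 4.0]]
w2 = [[-1.0, 1.0]]
norm = one_path_norm([w1, w2])
```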

[71]  arXiv:2404.19113 [pdf, other]
Title: Source-Free Domain Adaptation of Weakly-Supervised Object Localization Models for Histology
Comments: 16 pages, 21 figures, 5 tables, CVPRw 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Given the emergence of deep learning, digital pathology has gained popularity for cancer diagnosis based on histology images. Deep weakly supervised object localization (WSOL) models can be trained to classify histology images according to cancer grade and identify regions of interest (ROIs) for interpretation, using inexpensive global image-class annotations. A WSOL model initially trained on some labeled source image data can be adapted using unlabeled target data in cases of significant domain shifts caused by variations in staining, scanners, and cancer type. In this paper, we focus on source-free (unsupervised) domain adaptation (SFDA), a challenging problem where a pre-trained source model is adapted to a new target domain without using any source domain data for privacy and efficiency reasons. SFDA of WSOL models raises several challenges in histology, most notably because they are not intended to adapt for both classification and localization tasks. In this paper, 4 state-of-the-art SFDA methods, each one representative of a main SFDA family, are compared for WSOL in terms of classification and localization accuracy. They are the SFDA-Distribution Estimation, Source HypOthesis Transfer, Cross-Domain Contrastive Learning, and Adaptively Domain Statistics Alignment. Experimental results on the challenging GlaS (smaller, colon cancer) and Camelyon16 (larger, breast cancer) histology datasets indicate that these SFDA methods typically perform poorly for localization after adaptation when optimized for classification.

[72]  arXiv:2404.19114 [pdf, other]
Title: Enhancing IoT Security: A Novel Feature Engineering Approach for ML-Based Intrusion Detection Systems
Comments: This paper has been accepted by DCOSS-IoT 2024
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

The integration of Internet of Things (IoT) applications in our daily lives has led to a surge in data traffic, posing significant security challenges. IoT applications using cloud and edge computing are at higher risk of cyberattacks because of the expanded attack surface from distributed edge and cloud services, the vulnerability of IoT devices, and challenges in managing security across interconnected systems leading to oversights. This led to the rise of ML-based solutions for intrusion detection systems (IDSs), which have proven effective in enhancing network security and defending against diverse threats. However, ML-based IDS in IoT systems encounters challenges, particularly from noisy, redundant, and irrelevant features in varied IoT datasets, potentially impacting its performance. Therefore, reducing such features becomes crucial to enhance system performance and minimize computational costs. This paper focuses on improving the effectiveness of ML-based IDS at the edge level by introducing a novel method to find a balanced trade-off between cost and accuracy through the creation of informative features in a two-tier edge-user IoT environment. A hybrid Binary Quantum-inspired Artificial Bee Colony and Genetic Programming algorithm is utilized for this purpose. Three IoT intrusion detection datasets, namely NSL-KDD, UNSW-NB15, and BoT-IoT, are used for the evaluation of the proposed approach.

[73]  arXiv:2404.19115 [pdf, other]
Title: Sparsity-promoting hierarchical Bayesian model for EIT with a blocky target
Subjects: Numerical Analysis (math.NA)

The electrical impedance tomography (EIT) problem of estimating the unknown conductivity distribution inside a domain from boundary current or voltage measurements requires the solution of a nonlinear inverse problem. Sparsity promoting hierarchical Bayesian models have been shown to be very effective in the recovery of almost piecewise constant solutions in linear inverse problems. We demonstrate that by exploiting linear algebraic considerations it is possible to organize the calculation for the Bayesian solution of the nonlinear EIT inverse problem via finite element methods with sparsity promoting priors in a computationally efficient manner. The proposed approach uses the Iterative Alternating Sequential (IAS) algorithm for the solution of the linearized problems. Within the IAS algorithm, a substantial reduction in computational complexity is attained by exploiting the low dimensionality of the data space and an adjoint formulation of the Tikhonov regularized solution that constitutes part of the iterative updating scheme. Numerical tests illustrate the computational efficiency of the proposed algorithm. The paper sheds light also on the convexity properties of the objective function of the maximum a posteriori (MAP) estimation problem.

[74]  arXiv:2404.19117 [pdf, other]
Title: Coexistence of eMBB+ and mMTC+ in Uplink Cell-Free Massive MIMO Networks
Comments: This work has been submitted to IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

This paper tackles the problem of designing proper uplink multiple access (MA) schemes for coexistence between enhanced mobile broadband+ (eMBB+) users and massive machine-type communications+ (mMTC+) devices in a terminal-centric cell-free massive MIMO system. Specifically, the use of a time-frequency spreading technique for the mMTC+ devices has been proposed. Coupled with the assumption of imperfect channel knowledge, closed-form bounds of the achievable (ergodic) rate for the two types of data services are derived. Using suitable power control mechanisms, we show it is possible to efficiently multiplex eMBB+ and mMTC+ traffic in the same time-frequency resource grid. Numerical experiments reveal interesting trade-offs in the selection of the spreading gain and the number of serving access points within the system. Results also demonstrate that the performance of the mMTC+ devices is slightly affected by the presence of the eMBB+ users. Overall, our approach can endow good quality of service to both 6G cornerstones at once.

[75]  arXiv:2404.19119 [pdf, ps, other]
Title: Effects of Added Emphasis and Pause in Audio Delivery of Health Information
Authors: Arif Ahmed (1), Gondy Leroy (1), Stephen A. Rains (1), Philip Harber (1), David Kauchak (2), Prosanta Barai (1) ((1) The University of Arizona, (2) Pomona College)
Comments: This manuscript is accepted to American Medical Informatics Association summit, 2024
Subjects: Computation and Language (cs.CL)

Health literacy is crucial to supporting good health and is a major national goal. Audio delivery of information is becoming more popular for informing oneself. In this study, we evaluate the effect of audio enhancements in the form of information emphasis and pauses with health texts of varying difficulty and we measure health information comprehension and retention. We produced audio snippets from difficult and easy text and conducted the study on Amazon Mechanical Turk (AMT). Our findings suggest that emphasis matters for both information comprehension and retention. When there is no added pause, emphasizing significant information can lower the perceived difficulty for difficult and easy texts. Comprehension is higher (54%) with correctly placed emphasis for the difficult texts compared to not adding emphasis (50%). Adding a pause lowers perceived difficulty and can improve retention but adversely affects information comprehension.

[76]  arXiv:2404.19121 [pdf, ps, other]
Title: Characterising Payload Entropy in Packet Flows
Comments: 14 pages, 8 figures
Subjects: Cryptography and Security (cs.CR)

Accurate and timely detection of cyber threats is critical to keeping our online economy and data safe. A key technique in early detection is the classification of unusual patterns of network behaviour, often hidden as low-frequency events within complex time-series packet flows. One of the ways in which such anomalies can be detected is to analyse the information entropy of the payload within individual packets, since changes in entropy can often indicate suspicious activity - such as whether session encryption has been compromised, or whether a plaintext channel has been co-opted as a covert channel. To decide whether activity is anomalous we need to compare real-time entropy values with baseline values, and while the analysis of entropy in packet data is not particularly new, to the best of our knowledge there are no published baselines for payload entropy across common network services. We offer two contributions: 1) We analyse several large packet datasets to establish baseline payload information entropy values for common network services, 2) We describe an efficient method for engineering entropy metrics when performing flow recovery from live or offline packet data, which can be expressed within feature subsets for subsequent analysis and machine learning applications.
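
As a point of reference for the payload entropy discussed in this abstract, a minimal sketch of Shannon entropy over the byte values of a single payload is given below. It returns bits per byte, in the range 0 to 8. The function name `payload_entropy` is hypothetical, and the paper's flow-level feature engineering and baselines are not reproduced here.

```python
import math
from collections import Counter


def payload_entropy(payload: bytes) -> float:
    """Shannon entropy of a payload in bits per byte (0.0 for an empty payload)."""
    if not payload:
        return 0.0
    total = len(payload)
    counts = Counter(payload)  # frequency of each byte value 0..255
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Usage: repetitive data scores low, uniformly distributed bytes score high.
low = payload_entropy(b"aaaaaaaa")
mid = payload_entropy(b"abababab")
high = payload_entropy(bytes(range(256)))
```

Encrypted or compressed payloads tend towards the upper end of this range, which is why a drop on a nominally encrypted service, or a rise on a plaintext one, can be worth flagging against a baseline.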

[77]  arXiv:2404.19124 [pdf, other]
Title: Accelerating Production LLMs with Combined Token/Embedding Speculators
Subjects: Computation and Language (cs.CL)

This technical report describes the design and training of novel speculative decoding draft models, for accelerating the inference speeds of large language models in a production environment. By conditioning draft predictions on both context vectors and sampled tokens, we can train our speculators to efficiently predict high-quality n-grams, which the base model then accepts or rejects. This allows us to effectively predict multiple tokens per inference forward pass, accelerating wall-clock inference speeds of highly optimized base model implementations by a factor of 2-3x. We explore these initial results and describe next steps for further improvements.
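
The accept-or-reject loop described in this abstract can be sketched in a simplified greedy form. The sketch assumes that both models are plain Python callables mapping a token sequence to the next token, and that a draft token is accepted only if it equals the base model's own greedy choice. It does not model the report's conditioning on context vectors, and a production system would score all draft positions in a single base-model forward pass rather than one call per token. All names are hypothetical.

```python
from typing import Callable, List, Sequence

NextToken = Callable[[Sequence[int]], int]


def speculative_step(
    prefix: List[int], draft_next: NextToken, base_next: NextToken, n: int
) -> List[int]:
    """Return the tokens emitted by one speculative decoding step."""
    # 1. The draft model proposes an n-gram autoregressively.
    draft: List[int] = []
    for _ in range(n):
        draft.append(draft_next(prefix + draft))

    # 2. The base model checks the proposal token by token.
    accepted: List[int] = []
    for token in draft:
        expected = base_next(prefix + accepted)
        if token != expected:
            accepted.append(expected)  # reject: keep the base model's token instead
            return accepted
        accepted.append(token)

    # 3. Every draft token was accepted; the base model adds one more token.
    accepted.append(base_next(prefix + accepted))
    return accepted


# Usage with toy models: the base model counts upward, the draft errs after a 3.
def base(seq: Sequence[int]) -> int:
    return seq[-1] + 1


def flawed_draft(seq: Sequence[int]) -> int:
    return 9 if seq[-1] == 3 else seq[-1] + 1


partial = speculative_step([1], flawed_draft, base, n=3)
full = speculative_step([1], base, base, n=3)
```

Under this greedy rule the output always matches what the base model alone would have produced; the speed-up comes from emitting several tokens per verification step when the draft is accurate.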

[78]  arXiv:2404.19126 [pdf, other]
Title: Compositional Factorization of Visual Scenes with Convolutional Sparse Coding and Resonator Networks
Comments: 9 pages, 5 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

We propose a system for visual scene analysis and recognition based on encoding the sparse, latent feature-representation of an image into a high-dimensional vector that is subsequently factorized to parse scene content. The sparse feature representation is learned from image statistics via convolutional sparse coding, while scene parsing is performed by a resonator network. The integration of sparse coding with the resonator network increases the capacity of distributed representations and reduces collisions in the combinatorial search space during factorization. We find that for this problem the resonator network is capable of fast and accurate vector factorization, and we develop a confidence-based metric that assists in tracking the convergence of the resonator network.

[79]  arXiv:2404.19128 [pdf, other]
Title: Q-GroundCAM: Quantifying Grounding in Vision Language Models via GradCAM
Comments: Accepted to CVPR 2024, Second Workshop on Foundation Models (WFM)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)

Vision and Language Models (VLMs) continue to demonstrate remarkable zero-shot (ZS) performance across various tasks. However, many probing studies have revealed that even the best-performing VLMs struggle to capture aspects of compositional scene understanding, lacking the ability to properly ground and localize linguistic phrases in images. Recent VLM advancements include scaling up both model and dataset sizes, additional training objectives and levels of supervision, and variations in the model architectures. To characterize the grounding ability of VLMs, such as phrase grounding, referring expressions comprehension, and relationship understanding, Pointing Game has been used as an evaluation metric for datasets with bounding box annotations. In this paper, we introduce a novel suite of quantitative metrics that utilize GradCAM activations to rigorously evaluate the grounding capabilities of pre-trained VLMs like CLIP, BLIP, and ALBEF. These metrics offer an explainable and quantifiable approach for a more detailed comparison of the zero-shot capabilities of VLMs and enable measuring models' grounding uncertainty. This characterization reveals interesting tradeoffs between the size of the model, the dataset size, and their performance.

[80]  arXiv:2404.19130 [pdf, other]
Title: SpherE: Expressive and Interpretable Knowledge Graph Embedding for Set Retrieval
Comments: Accepted by SIGIR 2024, Camera Ready Version
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Knowledge graphs (KGs), which store an extensive number of relational facts (head, relation, tail), serve various applications. While many downstream tasks highly rely on the expressive modeling and predictive embedding of KGs, most of the current KG representation learning methods, where each entity is embedded as a vector in the Euclidean space and each relation is embedded as a transformation, follow an entity ranking protocol. On one hand, such an embedding design cannot capture many-to-many relations. On the other hand, in many retrieval cases, the users wish to get an exact set of answers without any ranking, especially when the results are expected to be precise, e.g., which genes cause an illness. Such scenarios are commonly referred to as "set retrieval". This work presents a pioneering study on the KG set retrieval problem. We show that the set retrieval highly depends on expressive modeling of many-to-many relations, and propose a new KG embedding model SpherE to address this problem. SpherE is based on rotational embedding methods, but each entity is embedded as a sphere instead of a vector. While inheriting the high interpretability of rotational-based models, our SpherE can more expressively model one-to-many, many-to-one, and many-to-many relations. Through extensive experiments, we show that our SpherE can well address the set retrieval problem while still having a good predictive ability to infer missing facts. The code is available at https://github.com/Violet24K/SpherE.

[81]  arXiv:2404.19132 [pdf, other]
Title: Integrating Present and Past in Unsupervised Continual Learning
Comments: CoLLAs 2024
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

We formulate a unifying framework for unsupervised continual learning (UCL), which disentangles learning objectives that are specific to the present and the past data, encompassing stability, plasticity, and cross-task consolidation. The framework reveals that many existing UCL approaches overlook cross-task consolidation and try to balance plasticity and stability in a shared embedding space. This results in worse performance due to a lack of within-task data diversity and reduced effectiveness in learning the current task. Our method, Osiris, which explicitly optimizes all three objectives on separate embedding spaces, achieves state-of-the-art performance on all benchmarks, including two novel benchmarks proposed in this paper featuring semantically structured task sequences. Compared to standard benchmarks, these two structured benchmarks more closely resemble visual signals received by humans and animals when navigating real-world environments. Finally, we show some preliminary evidence that continual models can benefit from such realistic learning scenarios.

[82]  arXiv:2404.19133 [pdf, other]
Title: Parameterized Wasserstein Gradient Flow
Subjects: Numerical Analysis (math.NA)

We develop a fast and scalable numerical approach to solve Wasserstein gradient flows (WGFs), particularly suitable for high-dimensional cases. Our approach is to use general reduced-order models, like deep neural networks, to parameterize the push-forward maps such that they can push a simple reference density to the one solving the given WGF. The new dynamical system is called parameterized WGF (PWGF), and it is defined on the finite-dimensional parameter space equipped with a pullback Wasserstein metric. Our numerical scheme can approximate the solutions of WGFs for general energy functionals effectively, without requiring spatial discretization or nonconvex optimization procedures, thus avoiding some limitations of classical numerical methods and more recent deep-learning-based approaches. A comprehensive analysis of the approximation errors measured by Wasserstein distance is also provided in this work. Numerical experiments show promising computational efficiency and verified accuracy on various WGF examples using our approach.

[83]  arXiv:2404.19134 [pdf, other]
Title: Evaluating Deep Clustering Algorithms on Non-Categorical 3D CAD Models
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We introduce the first work on benchmarking and evaluating deep clustering algorithms on large-scale non-categorical 3D CAD models. We first propose a workflow to allow expert mechanical engineers to efficiently annotate 252,648 carefully sampled pairwise CAD model similarities, from a subset of the ABC dataset with 22,968 shapes. Using seven baseline deep clustering methods, we then investigate the fundamental challenges of evaluating clustering methods for non-categorical data. Based on these challenges, we propose a novel and viable ensemble-based clustering comparison approach. This work is the first to directly target the underexplored area of deep clustering algorithms for 3D shapes, and we believe it will be an important building block to analyze and utilize the massive 3D shape collections that are starting to appear in deep geometric computing.

[84]  arXiv:2404.19136 [pdf, ps, other]
Title: On Rational Recursion for Holonomic Sequences
Comments: 9 pages
Subjects: Symbolic Computation (cs.SC); Formal Languages and Automata Theory (cs.FL); Commutative Algebra (math.AC); Dynamical Systems (math.DS)

It was recently conjectured that every component of a discrete rational dynamical system is a solution to an algebraic difference equation that is linear in its highest-shift term (a quasi-linear equation). Holonomic sequences are trivially seen as solutions to such dynamical systems. We prove that the conjecture holds for holonomic sequences and propose two algorithms for converting holonomic recurrence equations into such quasi-linear equations. The two algorithms differ in their efficiency and the minimality of orders in their outputs.

[85]  arXiv:2404.19138 [pdf, other]
Title: Multi-Source Encapsulation With Guaranteed Convergence Using Minimalist Robots
Subjects: Robotics (cs.RO)

We present a decentralized control algorithm for a minimalist robotic swarm lacking memory, explicit communication, or relative position information, to encapsulate multiple diffusive target sources in a bounded environment. The state-of-the-art approaches generally require either local communication or relative localization to provide guarantees of convergence and safety. We quantify trade-offs between task, control, and robot parameters for guaranteed safe convergence to all the sources. Furthermore, our algorithm is robust to occlusions and noise in the sensor measurements as we demonstrate in simulation.

[86]  arXiv:2404.19139 [pdf, other]
Title: HMTRace: Hardware-Assisted Memory-Tagging based Dynamic Data Race Detection
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Data race, a category of insidious software concurrency bugs, is often challenging and resource-intensive to detect and debug. Existing dynamic race detection tools incur significant execution time and memory overhead while exhibiting high false-positive rates. This paper proposes HMTRace, a novel Armv8.5-A memory tag extension (MTE) based dynamic data race detection framework, emphasizing low compute and memory requirements while maintaining high accuracy and precision. HMTRace supports race detection in userspace OpenMP- and Pthread-based multi-threaded C applications. HMTRace showcases a combined F1-score of 0.86 while incurring a mean execution time overhead of 4.01% and peak memory (RSS) overhead of 54.31%. HMTRace also does not report false positives, asserting all reported races.

[87]  arXiv:2404.19141 [pdf, other]
Title: Micro-Macro Spatial-Temporal Graph-based Encoder-Decoder for Map-Constrained Trajectory Recovery
Comments: This paper has been accepted as a regular paper at IEEE TKDE
Subjects: Machine Learning (cs.LG)

Recovering intermediate missing GPS points in a sparse trajectory, while adhering to the constraints of the road network, could offer deep insights into users' moving behaviors in intelligent transportation systems. Although recent studies have demonstrated the advantages of achieving map-constrained trajectory recovery in an end-to-end manner, they still face two significant challenges. Firstly, existing methods are mostly sequence-based models. It is extremely hard for them to comprehensively capture the micro-semantics of an individual trajectory, including the information of each GPS point and the movement between two GPS points. Secondly, existing approaches ignore the impact of the macro-semantics, i.e., the road conditions and people's shared travel preferences reflected by a group of trajectories. To address the above challenges, we propose a Micro-Macro Spatial-Temporal Graph-based Encoder-Decoder (MM-STGED). Specifically, we model each trajectory as a graph to efficiently describe the micro-semantics of the trajectory and design a novel message-passing mechanism to learn trajectory representations. Additionally, we extract the macro-semantics of trajectories and further incorporate them into a well-designed graph-based decoder to guide trajectory recovery. Extensive experiments conducted on sparse trajectories with three different sampling intervals, constructed from two real-world trajectory datasets, demonstrate the superiority of our proposed model.

[88]  arXiv:2404.19143 [pdf, other]
Title: Workload Intelligence: Punching Holes Through the Cloud Abstraction
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Today, cloud workloads are essentially opaque to the cloud platform. Typically, the only information the platform receives is the virtual machine (VM) type and possibly a decoration to the type (e.g., the VM is evictable). Similarly, workloads receive little to no information from the platform; generally, workloads might receive telemetry from their VMs or exceptional signals (e.g., shortly before a VM is evicted). The narrow interface between workloads and platforms has several drawbacks: (1) a surge in VM types and decorations in public cloud platforms complicates customer selection; (2) essential workload characteristics (e.g., low availability requirements, high latency tolerance) are often unspecified, hindering platform customization for optimized resource usage and cost savings; and (3) workloads may be unaware of potential optimizations or lack sufficient time to react to platform events.
In this paper, we propose a framework, called Workload Intelligence (WI), for dynamic bi-directional communication between cloud workloads and the cloud platform. Via WI, workloads can programmatically adjust their key characteristics, requirements, and even dynamically adapt behaviors like VM priorities. In the other direction, WI allows the platform to programmatically inform workloads about upcoming events and opportunities for optimization, among other scenarios. Because of WI, the cloud platform can drastically simplify its offerings, reduce its costs without fear of violating any workload requirements, and reduce prices to its customers on average by 48.8%.

[89]  arXiv:2404.19146 [pdf, other]
Title: Automated Construction of Theme-specific Knowledge Graphs
Subjects: Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)

Despite widespread applications of knowledge graphs (KGs) in various tasks such as question answering and intelligent conversational systems, existing KGs face two major challenges: information granularity and deficiency in timeliness. These hinder considerably the retrieval and analysis of in-context, fine-grained, and up-to-date knowledge from KGs, particularly in highly specialized themes (e.g., specialized scientific research) and rapidly evolving contexts (e.g., breaking news or disaster tracking). To tackle such challenges, we propose a theme-specific knowledge graph (i.e., ThemeKG), a KG constructed from a theme-specific corpus, and design an unsupervised framework for ThemeKG construction (named TKGCon). The framework takes raw theme-specific corpus and generates a high-quality KG that includes salient entities and relations under the theme. Specifically, we start with an entity ontology of the theme from Wikipedia, based on which we then generate candidate relations by Large Language Models (LLMs) to construct a relation ontology. To parse the documents from the theme corpus, we first map the extracted entity pairs to the ontology and retrieve the candidate relations. Finally, we incorporate the context and ontology to consolidate the relations for entity pairs. We observe that directly prompting GPT-4 for theme-specific KG leads to inaccurate entities (such as "two main types" as one entity in the query result) and unclear (such as "is", "has") or wrong relations (such as "have due to", "to start"). In contrast, by constructing the theme-specific KG step by step, our model outperforms GPT-4 and could consistently identify accurate entities and relations. Experimental results also show that our framework excels in evaluations compared with various KG construction baselines.

[90]  arXiv:2404.19148 [pdf, other]
Title: Enhancing Brazilian Sign Language Recognition through Skeleton Image Representation
Comments: 12 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Effective communication is paramount for the inclusion of deaf individuals in society. However, persistent communication barriers due to limited Sign Language (SL) knowledge hinder their full participation. In this context, Sign Language Recognition (SLR) systems have been developed to improve communication between signing and non-signing individuals. In particular, the problem of recognizing isolated signs (Isolated Sign Language Recognition, ISLR) is of great relevance to the development of vision-based SL search engines, learning tools, and translation systems. This work proposes an ISLR approach where body, hands, and facial landmarks are extracted throughout time and encoded as 2-D images. These images are processed by a convolutional neural network, which maps the visual-temporal information into a sign label. Experimental results demonstrate that our method surpasses the state of the art in terms of performance metrics on two widely recognized datasets in Brazilian Sign Language (LIBRAS), the primary focus of this study. In addition to being more accurate, our method is more time-efficient and easier to train due to its reliance on a simpler network architecture and solely RGB data as input.
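
As an illustration of the encoding step described above, the following minimal Python sketch turns a sequence of 2-D landmarks into an image-like grid. The layout (one row per landmark, one column per frame, normalized x and y as two channels) and the function name `landmarks_to_image` are assumptions made for illustration, not the representation used in the paper.

```python
def landmarks_to_image(frames):
    """Encode landmark trajectories as a 2-D grid of pixel values.

    frames: list of T frames, each a list of J (x, y) landmark coordinates.
    Returns a J x T grid whose cells are (x, y) pairs scaled to 0..255.
    Hypothetical layout: rows are landmarks, columns are time steps.
    """
    xs = [x for frame in frames for x, _ in frame]
    ys = [y for frame in frames for _, y in frame]
    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)

    def scale(value, lo, hi):
        # Map [lo, hi] to [0, 255]; a constant coordinate maps to 0.
        return 0 if hi == lo else round(255 * (value - lo) / (hi - lo))

    num_landmarks = len(frames[0])
    return [
        [
            (scale(frame[j][0], x_lo, x_hi), scale(frame[j][1], y_lo, y_hi))
            for frame in frames
        ]
        for j in range(num_landmarks)
    ]
```

A grid of this kind could then be fed to a standard image classifier; how the landmarks are ordered and coloured in practice is a design choice this sketch does not attempt to reproduce.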

[91]  arXiv:2404.19149 [pdf, other]
Title: SAGS: Structure-Aware 3D Gaussian Splatting
Comments: 15 pages, 8 figures, 3 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Following the advent of NeRFs, 3D Gaussian Splatting (3D-GS) has paved the way to real-time neural rendering overcoming the computational burden of volumetric methods. Following the pioneering work of 3D-GS, several methods have attempted to achieve compressible and high-fidelity performance alternatives. However, by employing a geometry-agnostic optimization scheme, these methods neglect the inherent 3D structure of the scene, thereby restricting the expressivity and the quality of the representation, resulting in various floating points and artifacts. In this work, we propose a structure-aware Gaussian Splatting method (SAGS) that implicitly encodes the geometry of the scene, which translates into state-of-the-art rendering performance and reduced storage requirements on benchmark novel-view synthesis datasets. SAGS is founded on a local-global graph representation that facilitates the learning of complex scenes and enforces meaningful point displacements that preserve the scene's geometry. Additionally, we introduce a lightweight version of SAGS, using a simple yet effective mid-point interpolation scheme, which showcases a compact representation of the scene with up to 24$\times$ size reduction without the reliance on any compression strategies. Extensive experiments across multiple benchmark datasets demonstrate the superiority of SAGS compared to state-of-the-art 3D-GS methods under both rendering quality and model size. Besides, we demonstrate that our structure-aware method can effectively mitigate floating artifacts and irregular distortions of previous methods while obtaining precise depth maps. Project page: https://eververas.github.io/SAGS/.

[92]  arXiv:2404.19154 [pdf, other]
Title: RTF: Region-based Table Filling Method for Relational Triple Extraction
Comments: Rejected by EMNLP 2023
Subjects: Computation and Language (cs.CL)

Relational triple extraction is crucial work for the automatic construction of knowledge graphs. Existing methods only construct shallow representations at the token or token-pair level. However, previous works ignore local spatial dependencies of relational triples, resulting in weak entity pair boundary detection. To tackle this problem, we propose a novel Region-based Table Filling method (RTF). We devise a novel region-based tagging scheme and bi-directional decoding strategy, which regard each relational triple as a region on the relation-specific table, and identify triples by determining two endpoints of each region. We also introduce convolution to construct region-level table representations from a spatial perspective, which makes triples easier to capture. In addition, we share partial tagging scores among different relations to improve the learning efficiency of the relation classifier. Experimental results show that our method achieves state-of-the-art performance with better generalization capability on three variants of two widely used benchmark datasets.
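
The idea of treating a triple as a region fixed by two endpoints can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's tagging scheme: it assumes rows of the relation-specific table index subject tokens, columns index object tokens, and spans are inclusive token ranges; the function names and the `located_in` relation are hypothetical.

```python
def triple_to_region(subject_span, object_span):
    """Return the (upper_left, bottom_right) cells of the table region.

    Spans are inclusive (start, end) token indices. Rows index subject
    tokens and columns index object tokens (an assumed convention).
    """
    (s_start, s_end), (o_start, o_end) = subject_span, object_span
    return (s_start, o_start), (s_end, o_end)


def region_to_triple(upper_left, bottom_right, tokens, relation):
    """Recover a (subject, relation, object) triple from two endpoints."""
    (s_start, o_start), (s_end, o_end) = upper_left, bottom_right
    subject = " ".join(tokens[s_start:s_end + 1])
    obj = " ".join(tokens[o_start:o_end + 1])
    return subject, relation, obj


# Usage: the subject occupies tokens 0..2 and the object tokens 7..8.
tokens = "New York City is located in the United States".split()
region = triple_to_region((0, 2), (7, 8))
triple = region_to_triple(*region, tokens, "located_in")
```

Under these assumptions, predicting the two corner cells is enough to recover both entity boundaries at once.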

[93]  arXiv:2404.19156 [pdf, other]
Title: Parameter Selection by GCV and a $χ^2$ test within Iterative Methods for $\ell_1$-regularized Inverse Problems
Comments: 23 pages, 12 figures
Subjects: Numerical Analysis (math.NA)

$\ell_1$ regularization is used to preserve edges or enforce sparsity in a solution to an inverse problem. We investigate the Split Bregman and the Majorization-Minimization iterative methods that turn this non-smooth minimization problem into a sequence of steps that include solving an $\ell_2$-regularized minimization problem. We consider selecting the regularization parameter in the inner generalized Tikhonov regularization problems that occur at each iteration in these $\ell_1$ iterative methods. The generalized cross validation and $\chi^2$ degrees of freedom methods are extended to these inner problems. In particular, for the $\chi^2$ method this includes extending the $\chi^2$ result for problems in which the regularization operator has more rows than columns, and showing how to use the $A$-weighted generalized inverse to estimate prior information at each inner iteration. Numerical experiments for image deblurring problems demonstrate that it is more effective to select the regularization parameter automatically within the iterative schemes than to keep it fixed for all iterations. Moreover, an appropriate regularization parameter can be estimated in the early iterations and then held fixed until convergence.
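
For reference, the standard GCV function for a generalized Tikhonov problem can be written as below. This is the textbook form, not the extended version developed in the paper; the symbols $A$, $b$, $L$, and $\lambda$ are notation assumed here, the matrix $A^\top A + \lambda^2 L^\top L$ is assumed invertible, and the normalization constant varies between references.

```latex
x_\lambda = \arg\min_x \left\{ \|Ax - b\|_2^2 + \lambda^2 \|Lx\|_2^2 \right\}
          = (A^\top A + \lambda^2 L^\top L)^{-1} A^\top b,
\qquad
G(\lambda) = \frac{\|A x_\lambda - b\|_2^2}
                  {\left[\operatorname{trace}\!\left(I - A (A^\top A + \lambda^2 L^\top L)^{-1} A^\top\right)\right]^2}.
```

The regularization parameter is then taken as a minimizer of $G(\lambda)$.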

[94]  arXiv:2404.19159 [pdf, other]
Title: What Drives Performance in Multilingual Language Models?
Comments: Accepted at VarDial @ NAACL 2024
Subjects: Computation and Language (cs.CL)

This study investigates the factors influencing the performance of multilingual large language models (MLLMs) across diverse languages. We study 6 MLLMs, including masked language models, autoregressive models, and instruction-tuned LLMs, on the SIB-200 dataset, a topic classification dataset encompassing 204 languages. Our analysis considers three scenarios: ALL languages, SEEN languages (present in the model's pretraining data), and UNSEEN languages (not present or documented in the model's pretraining data in any meaningful way). We examine the impact of factors such as pretraining data size, general resource availability, language family, and script type on model performance. Decision tree analysis reveals that pretraining data size is the most influential factor for SEEN languages. However, interestingly, script type and language family are crucial for UNSEEN languages, highlighting the importance of cross-lingual transfer learning. Notably, model size and architecture do not significantly alter the most important features identified. Our findings provide valuable insights into the strengths and limitations of current MLLMs, and we hope they will guide the development of more effective and equitable multilingual NLP systems.

[95]  arXiv:2404.19164 [pdf, ps, other]
Title: Optimal Bridge, Twin Bridges and Beyond: Inserting Edges into a Road Network to Minimize the Constrained Diameters
Comments: 18 pages, 5 figures
Subjects: Computational Geometry (cs.CG)

Given a road network modelled as a planar straight-line graph $G=(V,E)$ with $|V|=n$, the shortest-path distance between a pair $(u,v)\in V\times V$ is denoted by $\delta_G(u,v)$. The diameter of $G$ is defined as $\delta(G)=\max_{(u,v)\in V\times V}\delta_G(u,v)$. Given a disconnected road network modelled as two disjoint trees $T_1$ and $T_2$, this paper first aims at inserting one and two edges (bridges) between them to minimize the (constrained) diameter $\delta(T_1\cup T_2\cup I_j)$ going through the inserted edges, where $I_j, j=1,2$, is the set of inserted edges with $|I_1|=1$ and $|I_2|=2$. The corresponding problems are called the {\em optimal bridge} and {\em twin bridges} problems. Since the resulting graph becomes more complex when more than one edge is inserted between two trees, for the general network $G$ we consider the problem of inserting a minimum of $k$ edges such that the shortest distances $\delta_G(u_i,v_i)$ between a set of $m$ pairs $P=\{(u_i,v_i)\mid u_i,v_i\in V, i\in [m]\}$ are all decreased.
The main results of this paper are summarized as follows:
(1) We show that the optimal bridge problem can be solved in $O(n^2)$ time and that a variation of it has a near-quadratic lower bound unless SETH fails. The proof also implies that the famous 3-SUM problem does have a near-quadratic lower bound for large integers, e.g., each of the $n$ input integers has $\Omega(\log n)$ decimal digits. We then give a simple factor-2 $O(n\log n)$ time approximation algorithm for the optimal bridge problem.
(2) We present an $O(n^4)$ time algorithm to solve the twin bridges problem, exploiting some new property not in the optimal bridge problem.
(3) For the general problem of inserting $k$ edges to reduce the (graph) distances between $m$ given pairs, we show that the problem is NP-complete.
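
To make the optimal bridge problem concrete, here is a brute-force Python sketch, not the paper's algorithm. It assumes that bridge endpoints are restricted to existing vertices, that edge weights are Euclidean lengths, and that the constrained diameter through a single bridge $(u,v)$ equals the farthest tree distance from $u$ in $T_1$, plus the bridge length, plus the farthest tree distance from $v$ in $T_2$. The function names and input layout are hypothetical.

```python
import math
from itertools import product


def eccentricities(points, edges):
    """Farthest tree distance from every vertex (one traversal per vertex)."""
    adj = {v: [] for v in points}
    for u, v in edges:
        w = math.dist(points[u], points[v])  # Euclidean edge length
        adj[u].append((v, w))
        adj[v].append((u, w))
    ecc = {}
    for src in points:
        dist = {src: 0.0}
        stack = [src]
        while stack:  # paths in a tree are unique, so a DFS gives distances
            u = stack.pop()
            for v, w in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + w
                    stack.append(v)
        ecc[src] = max(dist.values())
    return ecc


def optimal_bridge(points1, edges1, points2, edges2):
    """Try every vertex pair (u, v) and return the best (cost, u, v)."""
    ecc1 = eccentricities(points1, edges1)
    ecc2 = eccentricities(points2, edges2)
    best = None
    for u, v in product(points1, points2):
        cost = ecc1[u] + math.dist(points1[u], points2[v]) + ecc2[v]
        if best is None or cost < best[0]:
            best = (cost, u, v)
    return best


# Usage: two parallel three-vertex paths, three units apart.
p1 = {"a": (0, 0), "b": (1, 0), "c": (2, 0)}
p2 = {"d": (0, 3), "e": (1, 3), "f": (2, 3)}
cost, u, v = optimal_bridge(p1, [("a", "b"), ("b", "c")],
                            p2, [("d", "e"), ("e", "f")])
```

In this example the best bridge joins the two middle vertices. Both the eccentricity computation and the pair enumeration take quadratic time in this sketch.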

[96]  arXiv:2404.19165 [pdf, other]
Title: DelGrad: Exact gradients in spiking networks for learning transmission delays and weights
Comments: 15 pages, 7 figures
Subjects: Neural and Evolutionary Computing (cs.NE); Emerging Technologies (cs.ET); Machine Learning (cs.LG)

Spiking neural networks (SNNs) inherently rely on the timing of signals for representing and processing information. Transmission delays play an important role in shaping these temporal characteristics. Recent work has demonstrated the substantial advantages of learning these delays along with synaptic weights, both in terms of accuracy and memory efficiency. However, these approaches suffer from drawbacks in terms of precision and efficiency, as they operate in discrete time and with approximate gradients, while also requiring membrane potential recordings for calculating parameter updates. To alleviate these issues, we propose an analytical approach for calculating exact loss gradients with respect to both synaptic weights and delays in an event-based fashion. The inclusion of delays emerges naturally within our proposed formalism, enriching the model's search space with a temporal dimension. Our algorithm is purely based on the timing of individual spikes and does not require access to other variables such as membrane potentials. We explicitly compare the impact on accuracy and parameter efficiency of different types of delays - axonal, dendritic and synaptic. Furthermore, while previous work on learnable delays in SNNs has been mostly confined to software simulations, we demonstrate the functionality and benefits of our approach on the BrainScaleS-2 neuromorphic platform.

[97]  arXiv:2404.19168 [pdf, other]
Title: PEVA-Net: Prompt-Enhanced View Aggregation Network for Zero/Few-Shot Multi-View 3D Shape Recognition
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Large vision-language models have impressively promoted the performance of 2D visual recognition under zero/few-shot scenarios. In this paper, we focus on exploiting the large vision-language model, i.e., CLIP, to address zero/few-shot 3D shape recognition based on multi-view representations. The key challenge for both tasks is to generate a discriminative descriptor of the 3D shape represented by multiple view images, either without explicit training (zero-shot 3D shape recognition) or with training on a limited amount of data (few-shot 3D shape recognition). We observe that the two tasks are related and can be considered simultaneously. Specifically, leveraging the descriptor which is effective for zero-shot inference to guide the tuning of the aggregated descriptor under the few-shot training can significantly improve the few-shot learning efficacy. Hence, we propose Prompt-Enhanced View Aggregation Network (PEVA-Net) to simultaneously address zero/few-shot 3D shape recognition. Under the zero-shot scenario, we propose to leverage the prompts built up from candidate categories to enhance the aggregation process of multiple view-associated visual features. The resulting aggregated feature serves for effective zero-shot recognition of the 3D shapes. Under the few-shot scenario, we first exploit a transformer encoder to aggregate the view-associated visual features into a global descriptor. To tune the encoder, together with the main classification loss, we propose a self-distillation scheme via a feature distillation loss by treating the zero-shot descriptor as the guidance signal for the few-shot descriptor. This scheme can significantly enhance the few-shot learning efficacy.

[98]  arXiv:2404.19170 [pdf, other]
Title: Asymptotically Compatible Fractional Grönwall Inequality and its Applications
Comments: 22 pages, 4 figures
Subjects: Numerical Analysis (math.NA)

In this work, we give proper estimates for the discrete convolution complementary (DCC) kernels, which lead to the asymptotically compatible fractional Grönwall inequality. The result can be applied in the analysis of the stability and pointwise-in-time error of difference-type schemes on a non-uniform mesh. When the non-uniform time grid is given by a specific scale function, e.g., a graded mesh, an explicit bound on the pointwise error can be given directly. Numerical experiments towards the conclusion of this work validate the error analysis.

[99]  arXiv:2404.19171 [pdf, other]
Title: Explicit Correlation Learning for Generalizable Cross-Modal Deepfake Detection
Comments: accepted by ICME 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

With the rising prevalence of deepfakes, there is a growing interest in developing generalizable detection methods for various types of deepfakes. While effective in their specific modalities, traditional detection methods fall short in addressing the generalizability of detection across diverse cross-modal deepfakes. This paper aims to explicitly learn potential cross-modal correlation to enhance deepfake detection across various generation scenarios. Our approach introduces a correlation distillation task, which models the inherent cross-modal correlation based on content information. This strategy helps to prevent the model from overfitting merely to audio-visual synchronization. Additionally, we present the Cross-Modal Deepfake Dataset (CMDFD), a comprehensive dataset with four generation methods to evaluate the detection of diverse cross-modal deepfakes. The experimental results on CMDFD and FakeAVCeleb datasets demonstrate the superior generalizability of our method over existing state-of-the-art methods. Our code and data can be found at https://github.com/ljj898/CMDFD-Dataset-and-Deepfake-Detection.

[100]  arXiv:2404.19173 [pdf, other]
Title: Revisiting Reward Design and Evaluation for Robust Humanoid Standing and Walking
Comments: 8 pages, 5 figs
Subjects: Robotics (cs.RO)

A necessary capability for humanoid robots is the ability to stand and walk while rejecting natural disturbances. Recent progress has been made using sim-to-real reinforcement learning (RL) to train such locomotion controllers, with approaches differing mainly in their reward functions. However, prior works lack a clear method to systematically test new reward functions and compare controller performance through repeatable experiments. This limits our understanding of the trade-offs between approaches and hinders progress. To address this, we propose a low-cost, quantitative benchmarking method to evaluate and compare the real-world performance of standing and walking (SaW) controllers on metrics like command following, disturbance recovery, and energy efficiency. We also revisit reward function design and construct a minimally constraining reward function to train SaW controllers. We experimentally verify that our benchmarking framework can identify areas for improvement, which can be systematically addressed to enhance the policies. We also compare our new controller to state-of-the-art controllers on the Digit humanoid robot. The results provide clear quantitative trade-offs among the controllers and suggest directions for future improvements to the reward functions and expansion of the benchmarks.

[101]  arXiv:2404.19174 [pdf, other]
Title: XFeat: Accelerated Features for Lightweight Image Matching
Comments: CVPR 2024; Source code available at www.verlab.dcc.ufmg.br/descriptors/xfeat_cvpr24
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We introduce a lightweight and accurate architecture for resource-efficient visual correspondence. Our method, dubbed XFeat (Accelerated Features), revisits fundamental design choices in convolutional neural networks for detecting, extracting, and matching local features. Our new model satisfies a critical need for fast and robust algorithms suitable for resource-limited devices. In particular, accurate image matching requires sufficiently large image resolutions - for this reason, we keep the resolution as large as possible while limiting the number of channels in the network. Besides, our model is designed to offer the choice of matching at the sparse or semi-dense levels, each of which may be more suitable for different downstream applications, such as visual navigation and augmented reality. Our model is the first to offer semi-dense matching efficiently, leveraging a novel match refinement module that relies on coarse local descriptors. XFeat is versatile and hardware-independent, surpassing current deep learning-based local features in speed (up to 5x faster) with comparable or better accuracy, proven in pose estimation and visual localization. We showcase it running in real-time on an inexpensive laptop CPU without specialized hardware optimizations. Code and weights are available at www.verlab.dcc.ufmg.br/descriptors/xfeat_cvpr24.

[102]  arXiv:2404.19175 [pdf, other]
Title: Game-MUG: Multimodal Oriented Game Situation Understanding and Commentary Generation Dataset
Subjects: Computation and Language (cs.CL)

The dynamic nature of esports makes the situation relatively complicated for average viewers. Esports broadcasting involves game expert casters, but the caster-dependent game commentary is not enough to fully understand the game situation. Commentary can be enriched by including diverse multimodal esports information, such as audiences' talks/emotions, game audio, and game match event information. This paper introduces GAME-MUG, a new multimodal game situation understanding and audience-engaged commentary generation dataset and its strong baseline. Our dataset is collected from 2020-2022 LOL game live streams from YouTube and Twitch, and includes multimodal esports game information, including text, audio, and time-series event logs, for detecting the game situation. In addition, we propose a new audience conversation augmented commentary dataset that covers game situation and audience conversation understanding, and introduce a robust joint multimodal dual learning model as a baseline. We examine the model's game situation/event understanding ability and commentary generation capability to show the effectiveness of the multimodal aspects coverage and the joint integration learning approach.

[103]  arXiv:2404.19178 [pdf, other]
Title: Revenge of the Fallen? Recurrent Models Match Transformers at Predicting Human Language Comprehension Metrics
Subjects: Computation and Language (cs.CL)

Transformers have supplanted Recurrent Neural Networks as the dominant architecture for both natural language processing tasks and, despite criticisms of cognitive implausibility, for modelling the effect of predictability on online human language comprehension. However, two recently developed recurrent neural network architectures, RWKV and Mamba, appear to perform natural language tasks comparably to or better than transformers of equivalent scale. In this paper, we show that contemporary recurrent models are now also able to match - and in some cases, exceed - performance of comparably sized transformers at modeling online human language comprehension. This suggests that transformer language models are not uniquely suited to this task, and opens up new directions for debates about the extent to which architectural features of language models make them better or worse models of human language comprehension.

[104]  arXiv:2404.19180 [pdf, other]
Title: MACO: Exploring GEMM Acceleration on a Loosely-Coupled Multi-core Processor
Subjects: Hardware Architecture (cs.AR)

General-purpose processor vendors have integrated customized accelerators in their products due to the widespread use of General Matrix-Matrix Multiplication (GEMM) kernels. However, it remains a challenge to further improve the flexibility and scalability of these GEMM-enhanced processors to cater to the emerging large-scale GEMM workloads. In this paper we propose MACO, a novel loosely-coupled multi-core general-purpose architecture optimized for GEMM-related applications. To enhance the programmability and flexibility of MACO, the paper introduces a tile-based instruction set architecture. Additionally, the paper presents techniques such as hardware-assisted data prefetching and locking, and predictive address translation to further enhance the computational efficiency of MACO for GEMM workloads. The experimental results demonstrate that MACO exhibits good scalability, achieving an average computational efficiency of 90% across multiple cores. Furthermore, evaluations on state-of-the-art deep neural networks show that MACO can achieve up to 1.1 TFLOPS with 88% computational efficiency, indicating its adaptivity to deep learning workloads.

[105]  arXiv:2404.19186 [pdf, ps, other]
Title: The Mathematical Foundation of Post-Quantum Cryptography
Authors: Chuanming Zong
Comments: 23 pages
Subjects: Information Theory (cs.IT); Cryptography and Security (cs.CR); Metric Geometry (math.MG); Number Theory (math.NT)

On July 5, 2022, the National Institute of Standards and Technology announced four possible post-quantum cryptography standards, three of which are based on lattice theory and the other on hash functions. It is well-known that the security of lattice cryptography relies on the hardness of the shortest vector problem (SVP) and the closest vector problem (CVP). In fact, the SVP is a sphere packing problem and the CVP is a sphere covering problem. Furthermore, both SVP and CVP are equivalent to arithmetic problems of positive definite quadratic forms. This paper will briefly introduce post-quantum cryptography and show its connections with sphere packing, sphere covering, and positive definite quadratic forms.
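
For readers unfamiliar with the two problems, their standard definitions can be stated as follows. The nonsingular basis matrix $B\in\mathbb{R}^{n\times n}$, the lattice $\mathcal{L}(B)$, and the target vector $t$ are notation introduced here for illustration rather than taken from the paper.

```latex
\mathcal{L}(B) = \{ Bz : z \in \mathbb{Z}^n \}, \qquad
\text{SVP: } \min_{v \in \mathcal{L}(B) \setminus \{0\}} \|v\|_2, \qquad
\text{CVP: } \min_{v \in \mathcal{L}(B)} \|v - t\|_2 \ \text{ for a given } t \in \mathbb{R}^n.
```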

[106]  arXiv:2404.19187 [pdf, other]
Title: CONTUNER: Singing Voice Beautifying with Pitch and Expressiveness Condition
Comments: Accepted by the 2024 International Joint Conference on Neural Networks (IJCNN 2024)
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

Singing voice beautifying is a novel task that has application value in people's daily life, aiming to correct the pitch of the singing voice and improve the expressiveness without changing the original timbre and content. Existing methods rely on paired data or only concentrate on the correction of pitch. However, professional songs and amateur songs from the same person are hard to obtain, and singing voice beautifying involves not only pitch correction but also other aspects such as emotion and rhythm. We therefore propose a fast and high-fidelity singing voice beautifying system called ConTuner, a diffusion model combined with the modified condition to generate the beautified Mel-spectrogram, where the modified condition is composed of optimized pitch and expressiveness. For pitch correction, we establish a mapping relationship from MIDI and spectrum envelope to pitch. To make amateur singing more expressive, we propose the expressiveness enhancer in the latent space to convert amateur vocal tone to professional. ConTuner achieves a satisfactory beautification effect on both Mandarin and English songs. An ablation study demonstrates that the expressiveness enhancer and the generator-based acceleration method in ConTuner are effective.

[107]  arXiv:2404.19188 [pdf, other]
Title: Maximum bound principle and original energy dissipation of arbitrarily high-order rescaled exponential time differencing Runge-Kutta schemes for Allen--Cahn equations
Subjects: Numerical Analysis (math.NA)

The energy dissipation law and the maximum bound principle are two critical physical properties of the Allen--Cahn equations. While many existing time-stepping methods are known to preserve the energy dissipation law, most apply to a modified form of energy. In this work, we demonstrate that, when the nonlinear term of the Allen--Cahn equation is Lipschitz continuous, a class of arbitrarily high-order exponential time differencing Runge--Kutta (ETDRK) schemes preserve the original energy dissipation property, under a mild step-size constraint. Additionally, we guarantee the Lipschitz condition on the nonlinear term by applying a rescaling post-processing technique, which ensures that the numerical solution unconditionally satisfies the maximum bound principle. Consequently, our proposed schemes maintain both the original energy dissipation law and the maximum bound principle and can achieve arbitrarily high-order accuracy. We also establish an optimal error estimate for the proposed schemes. Some numerical experiments are carried out to verify our theoretical results.

[108]  arXiv:2404.19189 [pdf, other]
Title: Assessing the safety benefits of CACC+ based coordination of connected and autonomous vehicle platoons in emergency braking scenarios
Subjects: Systems and Control (eess.SY)

Ensuring safety is the most important factor in connected and autonomous vehicles, especially in emergency braking situations. As such, assessing the safety benefits of one information topology over another is a necessary step towards evaluating and ensuring safety. In this paper, we compare the safety benefits of a cooperative adaptive cruise control which utilizes information from one predecessor vehicle (CACC) with one that utilizes information from multiple predecessors (CACC+) for the maintenance of spacing under an emergency braking scenario. A constant time headway policy is employed for maintenance of spacing (that includes a desired standstill spacing distance and a velocity dependent spacing distance) between the vehicles in the platoon. The considered emergency braking scenario consists of braking of the leader vehicle of the platoon at its maximum deceleration and that of the following vehicles to maintain the spacing as per CACC or CACC+. By focusing on the standstill spacing distance and utilizing Monte Carlo simulations, we assess the safety benefits of CACC+ over CACC by utilizing the following safety metrics: (1) probability of collision, (2) expected number of collisions, and (3) severity of collision (defined as the relative velocity of the two vehicles at impact). We present and discuss these results.
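
The spacing policy and the severity metric named above can be written down in a few lines. The sketch below is a minimal illustration, assuming SI units (metres, seconds, metres per second) and the common form of the constant time headway policy, a standstill gap plus a headway time multiplied by speed; the function and parameter names are hypothetical and the controllers themselves are not modelled.

```python
def desired_gap(speed, standstill_gap, time_headway):
    """Constant time headway policy: standstill gap plus headway * speed."""
    return standstill_gap + time_headway * speed


def impact_severity(follower_speed, leader_speed):
    """Collision severity, taken as the relative velocity at impact."""
    return follower_speed - leader_speed


# Usage: at 20 m/s with a 5 m standstill gap and a 0.8 s headway.
gap = desired_gap(20.0, 5.0, 0.8)
```

With these example values the desired gap is about 21 m, of which 16 m is the velocity-dependent part.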

[109]  arXiv:2404.19192 [pdf, other]
Title: Mix of Experts Language Model for Named Entity Recognition
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Named Entity Recognition (NER) is an essential stepping stone in the field of natural language processing. Although promising performance has been achieved by various distantly supervised models, we argue that distant supervision inevitably introduces incomplete and noisy annotations, which may mislead the model training process. To address this issue, we propose a robust NER model named BOND-MoE based on Mixture of Experts (MoE). Instead of relying on a single model for NER prediction, multiple models are trained and ensembled under the Expectation-Maximization (EM) framework, so that the effect of noisy supervision can be dramatically alleviated. In addition, we introduce a fair assignment module to balance the document-model assignment process. Extensive experiments on real-world datasets show that the proposed method achieves state-of-the-art performance compared with other distantly supervised NER methods.

[110]  arXiv:2404.19195 [pdf, ps, other]
Title: Evaluation of Thermal Performance of a Wick-free Vapor Chamber in Power Electronics Cooling
Authors: Arani Mukhopadhyay (1), Anish Pal (1), Congbo Bao (2), Mohamad Jafari Gukeh (1), Sudip K. Mazumder (2), Constantine M. Megaridis (1) ((1) Mechanical and Industrial Engineering, University of Illinois Chicago, IL, US. (2) Electrical and Computer Engineering, University of Illinois Chicago, IL, US.)
Comments: Presented at IEEE ITherm (Intersociety Conference on Thermal and Thermomechanical Phenomena in Electronic Systems) 2023, Orlando FL. Corresponding author: cmm@uic.edu
Subjects: Systems and Control (eess.SY); Hardware Architecture (cs.AR); Applied Physics (physics.app-ph)

Efficient thermal management in high-power electronics cooling can be achieved using phase-change heat transfer devices, such as vapor chambers. Traditional vapor chambers use wicks to transport condensate for efficient thermal exchange and to prevent "dry-out" of the evaporator. However, wicks in vapor chambers present significant design challenges arising out of large pressure drops across the wicking material, which slow down condensate transport rates and increase the chances for dry-out. Thicker wicks add to overall thermal resistance, while deterring the development of thinner devices by limiting the total thickness of the vapor chamber. Wickless vapor chambers eliminate the use of metal wicks entirely, by incorporating complementary wettability-patterned flat plates on both the evaporator and the condenser side. Such surface modifications enhance fluid transport on the evaporator side, while allowing the chambers to be virtually as thin as imaginable, thereby permitting design of thermally efficient thin electronic cooling devices. While wick-free vapor chambers have been studied and efficient design strategies have been suggested, we delve into real-life applications of wick-free vapor chambers in forced air cooling of high-power electronics. An experimental setup is developed wherein two Si-based MOSFETs in TO-247-3 packaging, with high conduction resistance, are connected in parallel and switched at 100 kHz to emulate high-frequency power electronics operations. A rectangular copper wick-free vapor chamber spreads heat laterally over a surface 13 times larger than the heating area. This chamber is cooled externally by a fan that circulates air at room temperature. The present experimental setup extends our previous work on wick-free vapor chambers, while demonstrating the effectiveness of low-cost air cooling in vapor-chamber enhanced high-power electronics applications.

[111]  arXiv:2404.19204 [pdf, other]
Title: NeRF-Insert: 3D Local Editing with Multimodal Control Signals
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)

We propose NeRF-Insert, a NeRF editing framework that allows users to make high-quality local edits with a flexible level of control. Unlike previous work that relied on image-to-image models, we cast scene editing as an in-painting problem, which encourages the global structure of the scene to be preserved. Moreover, while most existing methods use only textual prompts to condition edits, our framework accepts a combination of inputs of different modalities as reference. More precisely, a user may provide a combination of textual and visual inputs including images, CAD models, and binary image masks for specifying a 3D region. We use generic image generation models to in-paint the scene from multiple viewpoints, and lift the local edits to a 3D-consistent NeRF edit. Compared to previous methods, our results show better visual quality and also maintain stronger consistency with the original NeRF.

[112]  arXiv:2404.19205 [pdf, other]
Title: TableVQA-Bench: A Visual Question Answering Benchmark on Multiple Table Domains
Comments: Technical Report
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

In this paper, we establish a benchmark for table visual question answering, referred to as the TableVQA-Bench, derived from pre-existing table question-answering (QA) and table structure recognition datasets. It is important to note that existing datasets have not incorporated images or QA pairs, which are two crucial components of TableVQA. As such, the primary objective of this paper is to obtain these necessary components. Specifically, images are sourced either through the application of a stylesheet or by employing the proposed table rendering system. QA pairs are generated by exploiting the large language model (LLM) where the input is a text-formatted table. Ultimately, the completed TableVQA-Bench comprises 1,500 QA pairs. We comprehensively compare the performance of various multi-modal large language models (MLLMs) on TableVQA-Bench. GPT-4V achieves the highest accuracy among commercial and open-sourced MLLMs from our experiments. Moreover, we discover that the number of vision queries plays a significant role in TableVQA performance. To further analyze the capabilities of MLLMs in comparison to their LLM backbones, we investigate by presenting image-formatted tables to MLLMs and text-formatted tables to LLMs, respectively. Our findings suggest that processing visual inputs is more challenging than text inputs, as evidenced by the lower performance of MLLMs, despite generally requiring higher computational costs than LLMs. The proposed TableVQA-Bench and evaluation codes are available at https://github.com/naver-ai/tablevqabench.

[113]  arXiv:2404.19209 [pdf, other]
Title: AdaOper: Energy-efficient and Responsive Concurrent DNN Inference on Mobile Devices
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Deep neural networks (DNNs) have driven extensive applications in mobile technology. However, for long-running mobile apps like voice assistants or video applications on smartphones, energy efficiency is critical for battery-powered devices. The rise of heterogeneous processors in mobile devices today has introduced new challenges for optimizing energy efficiency. Our key insight is that partitioning computations across different processors for parallelism and speedup does not necessarily reduce energy consumption and may even increase it. To address this, we present AdaOper, an energy-efficient concurrent DNN inference system. It optimizes energy efficiency on mobile heterogeneous processors while maintaining responsiveness. AdaOper includes a runtime energy profiler that dynamically adjusts operator partitioning to optimize energy efficiency based on dynamic device conditions. We conduct preliminary experiments, which show that AdaOper reduces energy consumption by 16.88% compared to the existing concurrent method while ensuring real-time performance.

[114]  arXiv:2404.19212 [pdf, other]
Title: EAD-VC: Enhancing Speech Auto-Disentanglement for Voice Conversion with IFUB Estimator and Joint Text-Guided Consistent Learning
Comments: Accepted by the 2024 International Joint Conference on Neural Networks (IJCNN 2024)
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

Using unsupervised learning to disentangle speech into content, rhythm, pitch, and timbre for voice conversion has become a hot research topic. Existing works generally disentangle speech components through human-crafted bottleneck features, which cannot achieve sufficient information disentanglement, so pitch and rhythm may still be mixed together. There is a risk of information overlap in the disentangling process, which reduces speech naturalness. To overcome such limits, we propose a two-stage model to disentangle speech representations in a self-supervised manner without a human-crafted bottleneck design, which uses the Mutual Information (MI) with the designed upper bound estimator (IFUB) to separate overlapping information between speech components. Moreover, we design a Joint Text-Guided Consistent (TGC) module to guide the extraction of speech content and eliminate timbre leakage issues. Experiments show that our model achieves better performance than the baseline in terms of disentanglement effectiveness, speech naturalness, and similarity. Audio samples can be found at https://largeaudiomodel.com/eadvc.

[115]  arXiv:2404.19214 [pdf, other]
Title: EfficientASR: Speech Recognition Network Compression via Attention Redundancy and Chunk-Level FFN Optimization
Comments: Accepted by the 2024 International Joint Conference on Neural Networks (IJCNN 2024)
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

In recent years, Transformer networks have shown remarkable performance in speech recognition tasks. However, their deployment poses challenges due to high computational and storage resource requirements. To address this issue, a lightweight model called EfficientASR is proposed in this paper, aiming to enhance the versatility of Transformer models. EfficientASR employs two primary modules: Shared Residual Multi-Head Attention (SRMHA) and Chunk-Level Feedforward Networks (CFFN). The SRMHA module effectively reduces redundant computations in the network, while the CFFN module captures spatial knowledge and reduces the number of parameters. The effectiveness of the EfficientASR model is validated on two public datasets, namely Aishell-1 and HKUST. Experimental results demonstrate a 36% reduction in parameters compared to the baseline Transformer network, along with improvements of 0.3% and 0.2% in Character Error Rate (CER) on the Aishell-1 and HKUST datasets, respectively.

[116]  arXiv:2404.19217 [pdf, other]
Title: FOTS: A Fast Optical Tactile Simulator for Sim2Real Learning of Tactile-motor Robot Manipulation Skills
Subjects: Robotics (cs.RO)

Simulation is a widely used tool in robotics to reduce hardware consumption and gather large-scale data. Despite previous efforts to simulate optical tactile sensors, there remain challenges in efficiently synthesizing images and replicating marker motion under different contact loads. In this work, we propose a fast optical tactile simulator, named FOTS, for simulating optical tactile sensors. We utilize multi-layer perceptron mapping and planar shadow generation to simulate the optical response, while employing marker distribution approximation to simulate the motion of surface markers caused by the elastomer deformation. Experimental results demonstrate that FOTS outperforms other methods in terms of image generation quality and rendering speed, achieving 28.6 fps for optical simulation and 326.1 fps for marker motion simulation on a single CPU without GPU acceleration. In addition, we integrate the FOTS simulation model with physical engines like MuJoCo, and the peg-in-hole task demonstrates the effectiveness of our method in achieving zero-shot Sim2Real learning of tactile-motor robot manipulation skills. Our code is available at https://github.com/Rancho-zhao/FOTS.

[117]  arXiv:2404.19218 [pdf, ps, other]
Title: Flight Trajectory Prediction Using an Enhanced CNN-LSTM Network
Subjects: Machine Learning (cs.LG)

To address the low accuracy of flight trajectory prediction caused by the high speed of fighters, the diversity of tactical maneuvers, and the transient nature of situational change in close-range air combat, this paper proposes an enhanced CNN-LSTM network as a fighter flight trajectory prediction method. First, we extract spatial features from fighter trajectory data using CNN, aggregate spatial features of multiple fighters using the social-pooling module to capture geographic information and positional relationships in the trajectories, and use the attention mechanism to capture mutated trajectory features in air combat; subsequently, we extract temporal features by using the memory nature of LSTM to capture long-term temporal dependence in the trajectories; and finally, we merge the temporal and spatial features to predict the flight trajectories of enemy fighters. Extensive simulation experiments verify that the proposed method improves the trajectory prediction accuracy compared to the original CNN-LSTM method, with improvements of 32% and 34% in the ADE and FDE indicators, respectively.
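The abstract does not define ADE and FDE. Assuming they denote the conventional average displacement error and final displacement error used in trajectory prediction, a minimal sketch of how the two indicators are computed might look as follows. The function name and the toy 3D trajectories are hypothetical and are not taken from the paper.

```python
import math

def displacement_errors(predicted, actual):
    """Return (ADE, FDE) for two equal-length trajectories of coordinate tuples."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("trajectories must be non-empty and of equal length")
    # Euclidean distance between prediction and ground truth at each time step.
    distances = [math.dist(p, a) for p, a in zip(predicted, actual)]
    ade = sum(distances) / len(distances)  # mean error over all time steps
    fde = distances[-1]                    # error at the final time step only
    return ade, fde

# Hypothetical toy data: three time steps of (x, y, z) positions.
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
actual = [(0.0, 0.0, 0.0), (1.0, 3.0, 4.0), (2.0, 0.0, 5.0)]
ade, fde = displacement_errors(predicted, actual)
```

Under this reading, a lower value is better for both indicators, so the reported 32% and 34% improvements would correspond to reductions in these errors.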

[118]  arXiv:2404.19221 [pdf, other]
Title: Transcrib3D: 3D Referring Expression Resolution through Large Language Models
Comments: CORLW 2023
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)

If robots are to work effectively alongside people, they must be able to interpret natural language references to objects in their 3D environment. Understanding 3D referring expressions is challenging -- it requires the ability to both parse the 3D structure of the scene and correctly ground free-form language in the presence of distraction and clutter. We introduce Transcrib3D, an approach that brings together 3D detection methods and the emergent reasoning capabilities of large language models (LLMs). Transcrib3D uses text as the unifying medium, which allows us to sidestep the need to learn shared representations connecting multi-modal inputs, which would require massive amounts of annotated 3D data. As a demonstration of its effectiveness, Transcrib3D achieves state-of-the-art results on 3D reference resolution benchmarks, with a great leap in performance from previous multi-modality baselines. To improve upon zero-shot performance and facilitate local deployment on edge computers and robots, we propose self-correction for fine-tuning that trains smaller models, resulting in performance close to that of large models. We show that our method enables a real robot to perform pick-and-place tasks given queries that contain challenging referring expressions. Project site is at https://ripl.github.io/Transcrib3D.

[119]  arXiv:2404.19222 [pdf, other]
Title: Cycles of Well-Linked Sets and an Elementary Bound for the Directed Grid Theorem
Subjects: Discrete Mathematics (cs.DM); Combinatorics (math.CO)

In 2015, Kawarabayashi and Kreutzer proved the directed grid theorem confirming a conjecture by Reed, Johnson, Robertson, Seymour, and Thomas from the mid-nineties. The theorem states the existence of a function $f$ such that every digraph of directed tree-width $f(k)$ contains a cylindrical grid of order $k$ as a butterfly minor, but the given function grows non-elementarily with the size of the grid minor.
In this paper we present an alternative proof of the directed grid theorem which is conceptually much simpler, more modular in its composition and also improves the upper bound for the function $f$ to a power tower of height 22.
Our proof is inspired by the breakthrough result of Chekuri and Chuzhoy, who proved a polynomial bound for the excluded grid theorem for undirected graphs. We translate a key concept of their proof to directed graphs by introducing cycles of well-linked sets (CWS), and show that any digraph of high directed tree-width contains a large CWS, which in turn contains a large cylindrical grid, improving the result due to Kawarabayashi and Kreutzer from a non-elementary to an elementary function.
An immediate application of our result is an improvement of the bound for Younger's conjecture proved by Reed, Robertson, Seymour and Thomas (1996) from a non-elementary to an elementary function. The same improvement applies to other types of Erdős-Pósa style problems on directed graphs. To the best of our knowledge this is the first significant improvement on the bound for Younger's conjecture since it was proved in 1996.
We believe that the theoretical tools we developed may find applications beyond the directed grid theorem, in a similar way as the path-of-sets-system framework due to Chekuri and Chuzhoy (2016) did (see for example Hatzel, Komosa, Pilipczuk and Sorge (2022); Chekuri and Chuzhoy (2015); Chuzhoy and Nimavat (2019)).

[120]  arXiv:2404.19223 [pdf, ps, other]
Title: Temporal Logic Resilience for Dynamical Systems
Subjects: Systems and Control (eess.SY)

We consider the notion of resilience for cyber-physical systems, that is, the ability of the system to withstand adverse events while maintaining acceptable functionality. We use finite temporal logic to express the requirements on the acceptable functionality and define the resilience metric as the maximum disturbance under which the system satisfies the temporal requirements. We fix a parameterized template for the set of disturbances and form a robust optimization problem under the system dynamics and the temporal specifications to find the maximum value of the parameter. Additionally, we introduce two novel classes of specifications: closed and convex finite temporal logics specifications, offering a comprehensive analysis of the resilience metric within these specific frameworks. From a computational standpoint, we present an exact solution for linear systems and exact-time reachability and finite-horizon safety, complemented by an approximate solution for finite-horizon reachability. Extending our findings to nonlinear systems, we leverage linear approximations and SMT-based approaches to offer viable computational methodologies. The theoretical results are demonstrated on the temperature regulation of buildings, adaptive cruise control and DC motors.

[121]  arXiv:2404.19226 [pdf, ps, other]
Title: A Survey of Deep Learning Based Software Refactoring
Comments: 45 pages, 8 figures
Subjects: Software Engineering (cs.SE)

Refactoring is one of the most important activities in software engineering which is used to improve the quality of a software system. With the advancement of deep learning techniques, researchers are attempting to apply deep learning techniques to software refactoring. Consequently, dozens of deep learning-based refactoring approaches have been proposed. However, there is a lack of comprehensive reviews on such works as well as a taxonomy for deep learning-based refactoring. To this end, in this paper, we present a survey on deep learning-based software refactoring. We classify related works into five categories according to the major tasks they cover. Among these categories, we further present key aspects (i.e., code smell types, refactoring types, training strategies, and evaluation) to give insight into the details of the technologies that have supported refactoring through deep learning. The classification indicates that there is an imbalance in the adoption of deep learning techniques for the process of refactoring. Most of the deep learning techniques have been used for the detection of code smells and the recommendation of refactoring solutions as found in 56.25% and 33.33% of the literature respectively. In contrast, only 6.25% and 4.17% were towards the end-to-end code transformation as refactoring and the mining of refactorings, respectively. Notably, we found no literature representation for the quality assurance for refactoring. We also observe that most of the deep learning techniques have been used to support refactoring processes occurring at the method level whereas classes and variables attracted minimal attention. Finally, we discuss the challenges and limitations associated with the employment of deep learning-based refactorings and present some potential research opportunities for future work.

[122]  arXiv:2404.19227 [pdf, other]
Title: Espresso: Robust Concept Filtering in Text-to-Image Models
Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR)

Diffusion-based text-to-image (T2I) models generate high-fidelity images for given textual prompts. They are trained on large datasets scraped from the Internet, potentially containing unacceptable concepts (e.g., copyright infringing or unsafe). Retraining T2I models after filtering out unacceptable concepts in the training data is inefficient and degrades utility. Hence, there is a need for concept removal techniques (CRTs) which are effective in removing unacceptable concepts, utility-preserving on acceptable concepts, and robust against evasion with adversarial prompts. None of the prior filtering and fine-tuning CRTs satisfy all these requirements simultaneously.
We introduce Espresso, the first robust concept filter based on Contrastive Language-Image Pre-Training (CLIP). It identifies unacceptable concepts by projecting the generated image's embedding onto the vector connecting unacceptable and acceptable concepts in the joint text-image embedding space. This ensures robustness by restricting the adversary to adding noise only along this vector, in the direction of the acceptable concept. Further fine-tuning Espresso to separate embeddings of acceptable and unacceptable concepts, while preserving their pairing with image embeddings, ensures both effectiveness and utility. We evaluate Espresso on eleven concepts to show that it is effective (~5% CLIP accuracy on unacceptable concepts), utility-preserving (~93% normalized CLIP score on acceptable concepts), and robust (~4% CLIP accuracy on adversarial prompts for unacceptable concepts). Finally, we present theoretical bounds for the certified robustness of Espresso against adversarial prompts, and an empirical analysis.

[123]  arXiv:2404.19228 [pdf, other]
Title: Understanding Multimodal Contrastive Learning Through Pointwise Mutual Information
Subjects: Machine Learning (cs.LG)

Multimodal representation learning to integrate different modalities, such as text, vision, and audio is important for real-world applications. The symmetric InfoNCE loss proposed in CLIP is a key concept in multimodal representation learning. In this work, we provide a theoretical understanding of the symmetric InfoNCE loss through the lens of the pointwise mutual information and show that encoders that achieve the optimal similarity in the pretraining provide a good representation for downstream classification tasks under mild assumptions. Based on our theoretical results, we also propose a new similarity metric for multimodal contrastive learning by utilizing a nonlinear kernel to enrich the capability. To verify the effectiveness of the proposed method, we demonstrate pretraining of multimodal representation models on the Conceptual Caption datasets and evaluate zero-shot classification and linear classification on common benchmark datasets.
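As an illustration of the symmetric InfoNCE loss discussed above, the following is a minimal sketch in plain Python that operates on a precomputed similarity matrix, where entry (i, j) is the similarity between the i-th sample of one modality and the j-th sample of the other, and matching pairs lie on the diagonal. The function names and the temperature value of 0.07 are assumptions made for this example. The sketch does not implement the nonlinear-kernel similarity proposed in the paper.

```python
import math

def _cross_entropy_diag(scores):
    """Mean of -log softmax(row_i)[i] over the rows of a square matrix."""
    total = 0.0
    for i, row in enumerate(scores):
        m = max(row)  # subtract the row maximum for numerical stability
        log_norm = m + math.log(sum(math.exp(s - m) for s in row))
        total += log_norm - row[i]
    return total / len(scores)

def symmetric_info_nce(similarity, temperature=0.07):
    """Average of the two directional InfoNCE terms (rows, then columns)."""
    scaled = [[s / temperature for s in row] for row in similarity]
    transposed = [list(col) for col in zip(*scaled)]
    return 0.5 * (_cross_entropy_diag(scaled) + _cross_entropy_diag(transposed))

# Hypothetical 2x2 similarity matrices: matched pairs on the diagonal
# versus matched pairs scoring lowest.
aligned_loss = symmetric_info_nce([[1.0, 0.0], [0.0, 1.0]])
misaligned_loss = symmetric_info_nce([[0.0, 1.0], [1.0, 0.0]])
```

The loss is small when each sample is most similar to its own pair in both directions, which is the behavior the pretraining objective rewards.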

[124]  arXiv:2404.19232 [pdf, other]
Title: GRAMMAR: Grounded and Modular Evaluation of Domain-Specific Retrieval-Augmented Language Models
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Retrieval-augmented Generation (RAG) systems have been actively studied and deployed across various industries to query domain-specific knowledge bases. However, evaluating these systems presents unique challenges due to the scarcity of domain-specific queries and corresponding ground truths, as well as a lack of systematic approaches to diagnosing the cause of failure cases -- whether they stem from knowledge deficits or issues related to system robustness. To address these challenges, we introduce GRAMMAR (GRounded And Modular Methodology for Assessment of RAG), an evaluation framework comprising two key elements: 1) a data generation process that leverages relational databases and LLMs to efficiently produce scalable query-answer pairs, which facilitates the separation of query logic from linguistic variations for enhanced debugging capabilities; and 2) an evaluation framework that differentiates knowledge gaps from robustness and enables the identification of defective modules. Our empirical results underscore the limitations of current reference-free evaluation approaches and the reliability of GRAMMAR in accurately identifying model vulnerabilities.

[125]  arXiv:2404.19234 [pdf, other]
Title: Multi-hop Question Answering over Knowledge Graphs using Large Language Models
Authors: Abir Chakraborty
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Databases (cs.DB)

Knowledge graphs (KGs) are large datasets with specific structures representing large knowledge bases (KBs), where each node represents a key entity and relations amongst them are typed edges. Natural language queries formed to extract information from a KB entail starting from specific nodes and reasoning over multiple edges of the corresponding KG to arrive at the correct set of answer nodes. Traditional approaches to question answering on KGs are based on (a) semantic parsing (SP), where a logical form (e.g., S-expression, SPARQL query, etc.) is generated using node and edge embeddings, followed by reasoning over these representations or tuning language models to generate the final answer directly, or (b) information retrieval (IR), which works by extracting entities and relations sequentially. In this work, we evaluate the capability of large language models (LLMs) to answer questions over KGs that involve multiple hops. We show that, depending upon the size and nature of the KG, we need different approaches to extract and feed the relevant information to an LLM, since every LLM comes with a fixed context window. We evaluate our approach on six KGs with and without the availability of example-specific sub-graphs and show that both the IR and SP-based methods can be adopted by LLMs, resulting in extremely competitive performance.

[126]  arXiv:2404.19236 [pdf, other]
Title: On the Effect of Bounded Rationality in Electricity Markets
Authors: Lihui Yi, Ermin Wei
Subjects: Computer Science and Game Theory (cs.GT); Systems and Control (eess.SY)

Nash equilibrium is a common solution concept that captures the strategic interaction in electricity market analysis. However, it requires a fundamental but impractical assumption that all market participants are fully rational, which implies unlimited computational resources and cognitive abilities. To tackle the limitation, level-k reasoning is proposed and studied to model the bounded rational behaviors. In this paper, we consider a Cournot competition in electricity markets with two suppliers both following level-k reasoning. One is a self-interested firm and the other serves as a benevolent social planner. First, we observe that the optimal strategy of the social planner is to be of a particular rationality level. Being less or more rational may both result in reduced social welfare. Then, we investigate the effect of bounded rationality on social welfare performance and find that it could largely deviate from that at the Nash equilibrium point. Finally, we characterize optimal, mean maximizing and max-min strategies for the benevolent social planner, when having access to different information. The numerical experiments further demonstrate and validate our findings.

[127]  arXiv:2404.19238 [pdf, other]
Title: Pilot Contamination in Massive MIMO Systems: Challenges and Future Prospects
Comments: Accepted At IWCMC 2024 Comm & SP Symposium
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI)

Massive multiple input multiple output (M-MIMO) technology plays a pivotal role in fifth-generation (5G) and beyond communication systems, offering a wide range of benefits, from increased spectral efficiency (SE) to enhanced energy efficiency and higher reliability. However, these advantages are contingent upon precise channel state information (CSI) availability at the base station (BS). Ensuring precise CSI is challenging due to the constrained size of the coherence interval and the resulting limitations on pilot sequence length. Therefore, reusing pilot sequences in adjacent cells introduces pilot contamination, hindering SE enhancement. This paper reviews recent advancements and addresses research challenges in mitigating pilot contamination and improving channel estimation, categorizing the existing research into three broader categories: pilot assignment schemes, advanced signal processing methods, and advanced channel estimation techniques. Salient representative pilot mitigation/assignment techniques are analyzed and compared in each category. Lastly, possible future research directions are discussed.

[128]  arXiv:2404.19242 [pdf, other]
Title: A Minimal Set of Parameters Based Depth-Dependent Distortion Model and Its Calibration Method for Stereo Vision Systems
Comments: This paper has been accepted for publication in IEEE Transactions on Instrumentation and Measurement
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV); Methodology (stat.ME)

Depth position highly affects lens distortion, especially in close-range photography, which limits the measurement accuracy of existing stereo vision systems. Moreover, traditional depth-dependent distortion models and their calibration methods have remained complicated. In this work, we propose a minimal set of parameters based depth-dependent distortion model (MDM), which considers the radial and decentering distortions of the lens to improve the accuracy of stereo vision systems and simplify their calibration process. In addition, we present an easy and flexible calibration method for the MDM of stereo vision systems with a commonly used planar pattern, which requires cameras to observe the planar pattern in different orientations. The proposed technique is easy to use and flexible compared with classical calibration techniques for depth-dependent distortion models, in which the lens must be perpendicular to the planar pattern. The experimental validation of the MDM and its calibration method showed that the MDM improved the calibration accuracy by 56.55% and 74.15% compared with Li's distortion model and the traditional Brown's distortion model, respectively. In addition, an iteration-based reconstruction method is proposed to iteratively estimate the depth information in the MDM during three-dimensional reconstruction. The results showed that the accuracy of the iteration-based reconstruction method was improved by 9.08% compared with that of the non-iteration reconstruction method.

[129]  arXiv:2404.19243 [pdf, other]
Title: Co-occurrence order-preserving pattern mining
Subjects: Databases (cs.DB)

Recently, order-preserving pattern (OPP) mining has been proposed to discover some patterns, which can be seen as trend changes in time series. Although existing OPP mining algorithms have achieved satisfactory performance, they discover all frequent patterns. However, in some cases, users focus on a particular trend and its associated trends. To efficiently discover trend information related to a specific prefix pattern, this paper addresses the issue of co-occurrence OPP mining (COP) and proposes an algorithm named COP-Miner to discover COPs from historical time series. COP-Miner consists of three parts: extracting keypoints, preparation stage, and iteratively calculating supports and mining frequent COPs. Extracting keypoints is used to obtain local extreme points of patterns and time series. The preparation stage is designed to prepare for the first round of mining, which contains four steps: obtaining the suffix OPP of the keypoint sub-time series, calculating the occurrences of the suffix OPP, verifying the occurrences of the keypoint sub-time series, and calculating the occurrences of all fusion patterns of the keypoint sub-time series. To further improve the efficiency of support calculation, we propose a support calculation method with an ending strategy that uses the occurrences of prefix and suffix patterns to calculate the occurrences of superpatterns. Experimental results indicate that COP-Miner outperforms the other competing algorithms in running time and scalability. Moreover, COPs with keypoint alignment yield better prediction performance.

[130]  arXiv:2404.19244 [pdf, other]
Title: A University Framework for the Responsible use of Generative AI in Research
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)

Generative Artificial Intelligence (generative AI) poses both opportunities and risks for the integrity of research. Universities must guide researchers in using generative AI responsibly, and in navigating a complex regulatory landscape subject to rapid change. By drawing on the experiences of two Australian universities, we propose a framework to help institutions promote and facilitate the responsible use of generative AI. We provide guidance to help distil the diverse regulatory environment into a principles-based position statement. Further, we explain how a position statement can then serve as a foundation for initiatives in training, communications, infrastructure, and process change. Despite the growing body of literature about AI's impact on academic integrity for undergraduate students, there has been comparatively little attention on the impacts of generative AI for research integrity, and the vital role of institutions in helping to address those challenges. This paper underscores the urgency for research institutions to take action in this area and suggests a practical and adaptable framework for so doing.

[131]  arXiv:2404.19245 [pdf, other]
Title: HydraLoRA: An Asymmetric LoRA Architecture for Efficient Fine-Tuning
Comments: 19 pages, 7 figures
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Adapting Large Language Models (LLMs) to new tasks through fine-tuning has been made more efficient by the introduction of Parameter-Efficient Fine-Tuning (PEFT) techniques, such as LoRA. However, these methods often underperform compared to full fine-tuning, particularly in scenarios involving complex datasets. This issue becomes even more pronounced in complex domains, highlighting the need for improved PEFT approaches that can achieve better performance. Through a series of experiments, we have uncovered two critical insights that shed light on the training and parameter inefficiency of LoRA. Building on these insights, we have developed HydraLoRA, a LoRA framework with an asymmetric structure that eliminates the need for domain expertise. Our experiments demonstrate that HydraLoRA outperforms other PEFT approaches, even those that rely on domain knowledge during the training and inference phases. Code: https://github.com/Clin0212/HydraLoRA

[132]  arXiv:2404.19246 [pdf, ps, other]
Title: Logistic Map Pseudo Random Number Generator in FPGA
Comments: 10 pages, 6 figures
Subjects: Cryptography and Security (cs.CR); Hardware Architecture (cs.AR)

This project develops a pseudo-random number generator (PRNG) using the logistic map, implemented in Verilog HDL on an FPGA, and processes its output through a Central Limit Theorem (CLT) function to achieve a Gaussian distribution. The system integrates additional FPGA modules for real-time interaction and visualisation, including a clock generator, UART interface, XADC, and a 7-segment display driver. These components facilitate the direct display of PRNG values on the FPGA and the transmission of data to a laptop for histogram analysis, verifying the Gaussian nature of the output. This approach demonstrates the practical application of chaotic systems for generating Gaussian-distributed pseudo-random numbers in digital hardware, highlighting the logistic map's potential in PRNG design.
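The pipeline described above can be sketched in software as follows. This is only an illustrative floating-point model in Python, not the authors' Verilog design: the map parameter r = 3.99, the seed, and the choice of averaging 12 consecutive values are assumptions made for the example, and the function names are hypothetical. The sketch iterates the logistic map x_{n+1} = r x_n (1 - x_n) and averages blocks of consecutive outputs, which is one simple way to apply a CLT-style transformation.

```python
import itertools

def logistic_stream(seed=0.123456789, r=3.99):
    """Yield successive logistic-map iterates x_{n+1} = r * x_n * (1 - x_n)."""
    x = seed
    while True:
        x = r * x * (1.0 - x)
        yield x

def clt_samples(count, terms=12, seed=0.123456789, r=3.99):
    """Average blocks of `terms` consecutive iterates to get bell-shaped samples."""
    stream = logistic_stream(seed, r)
    samples = []
    for _ in range(count):
        block = list(itertools.islice(stream, terms))
        samples.append(sum(block) / terms)
    return samples

samples = clt_samples(1000)
```

Because consecutive iterates of the map are not independent, the averaged output is only approximately Gaussian, so a histogram check like the one the project performs on the laptop side remains a sensible validation step.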

[133]  arXiv:2404.19247 [pdf, ps, other]
Title: Improved AutoEncoder with LSTM module and KL divergence
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

The task of anomaly detection is to separate anomalous data from normal data in the dataset. Models such as the deep convolutional autoencoder (CAE) network and the deep support vector data description (SVDD) model have been widely employed and have demonstrated significant success in detecting anomalies. However, the CAE network's ability to over-reconstruct anomalous data can easily lead to a high false negative rate in detecting anomalous data. On the other hand, the deep SVDD model has the drawback of feature collapse, which leads to a decrease in detection accuracy for anomalies. To address these problems, we propose the Improved AutoEncoder with LSTM module and Kullback-Leibler divergence (IAE-LSTM-KL) model in this paper. An LSTM network is added after the encoder to memorize feature representations of normal data. Meanwhile, feature collapse can also be mitigated by penalizing the feature input to the SVDD module via KL divergence. The efficacy of the IAE-LSTM-KL model is validated through experiments on both synthetic and real-world datasets. Experimental results show that the IAE-LSTM-KL model yields higher detection accuracy for anomalies. In addition, the IAE-LSTM-KL model demonstrates enhanced robustness to contaminated outliers in the dataset.

[134]  arXiv:2404.19248 [pdf, other]
Title: Transition Rate Scheduling for Quantization-Aware Training
Comments: Submitted to IEEE TPAMI on Apr. 03, 2023
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Quantization-aware training (QAT) simulates a quantization process during training to lower bit-precision of weights/activations. It learns quantized weights indirectly by updating latent weights, i.e., full-precision inputs to a quantizer, using gradient-based optimizers. We claim that coupling a user-defined learning rate (LR) with these optimizers is sub-optimal for QAT. Quantized weights transit discrete levels of a quantizer only if the corresponding latent weights pass transition points, where the quantizer changes discrete states. This suggests that the changes of quantized weights are affected by both the LR for latent weights and their distributions. It is thus difficult to control the degree of changes for quantized weights by scheduling the LR manually. We conjecture that the degree of parameter changes in QAT is related to the number of quantized weights transiting discrete levels. Based on this, we introduce a transition rate (TR) scheduling technique that controls the number of transitions of quantized weights explicitly. Instead of scheduling an LR for latent weights, we schedule a target TR of quantized weights, and update the latent weights with a novel transition-adaptive LR (TALR), which makes it possible to account for the degree of changes in the quantized weights during QAT. Experimental results demonstrate the effectiveness of our approach on standard benchmarks.
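
To make the notion of a transition concrete, the sketch below measures the fraction of weights whose quantized level changes after an update of the latent weights. The uniform round-to-nearest quantizer, the step size of 0.1, and the function names are assumptions for illustration; the abstract does not specify the quantizer, and the transition-adaptive LR itself is not reproduced here.

```python
def quantize_level(weight, step=0.1):
    """Map a latent (full-precision) weight to an integer level index.

    A uniform round-to-nearest quantizer is assumed for illustration.
    """
    return round(weight / step)


def transition_rate(latent_before, latent_after, step=0.1):
    """Fraction of weights whose quantized level changed after an update."""
    changed = sum(
        quantize_level(b, step) != quantize_level(a, step)
        for b, a in zip(latent_before, latent_after)
    )
    return changed / len(latent_before)


before = [0.04, 0.12, 0.26, -0.31]
after = [0.06, 0.13, 0.24, -0.36]
rate = transition_rate(before, after)  # 3 of 4 weights cross a transition point
```

Note that the second weight moves by 0.01 without changing level, while the first moves by 0.02 and does change level: the same update size can produce different numbers of transitions depending on where the latent weights sit relative to the transition points.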

[135]  arXiv:2404.19249 [pdf, other]
Title: A Nonnested Augmented Subspace Method for Kohn-Sham Equation
Subjects: Numerical Analysis (math.NA)

In this paper, a novel adaptive finite element method is proposed to solve the Kohn-Sham equation based on the moving mesh (nonnested mesh) adaptive technique and the augmented subspace method. Unlike the classical self-consistent field iterative algorithm, which requires solving the Kohn-Sham equation directly in each adaptive finite element space, our algorithm transforms the Kohn-Sham equation into some linear boundary value problems of the same scale in each adaptive finite element space, and then the wavefunctions derived from the linear boundary value problems are corrected by solving a small-scale Kohn-Sham equation defined in a low-dimensional augmented subspace. Since the new algorithm avoids solving a large-scale Kohn-Sham equation directly, a significant improvement in solving efficiency can be obtained. In addition, the adaptive moving mesh technique is used to generate the nonnested adaptive mesh for the nonnested augmented subspace method according to the singularity of the approximate wavefunctions. The modified Hessian matrix of the approximate wavefunctions is used as the metric matrix to redistribute the mesh. Through the moving mesh adaptive technique, the redistributed mesh is almost optimal. A number of numerical experiments are carried out to verify the efficiency and the accuracy of the proposed algorithm.

[136]  arXiv:2404.19250 [pdf, other]
Title: Enhancing Intrinsic Features for Debiasing via Investigating Class-Discerning Common Attributes in Bias-Contrastive Pair
Comments: Accepted to CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In the image classification task, deep neural networks frequently rely on bias attributes that are spuriously correlated with a target class in the presence of dataset bias, resulting in degraded performance when applied to data without bias attributes. The task of debiasing aims to compel classifiers to learn intrinsic attributes that inherently define a target class rather than focusing on bias attributes. While recent approaches mainly focus on emphasizing the learning of data samples without bias attributes (i.e., bias-conflicting samples) compared to samples with bias attributes (i.e., bias-aligned samples), they fall short of directly guiding models where to focus for learning intrinsic features. To address this limitation, this paper proposes a method that provides the model with explicit spatial guidance that indicates the region of intrinsic features. We first identify the intrinsic features by investigating the class-discerning common features between a bias-aligned (BA) sample and a bias-conflicting (BC) sample (i.e., bias-contrastive pair). Next, we enhance the intrinsic features in the BA sample that are relatively under-exploited for prediction compared to the BC sample. To construct the bias-contrastive pair without using bias information, we introduce a bias-negative score that distinguishes BC samples from BA samples employing a biased model. The experiments demonstrate that our method achieves state-of-the-art performance on synthetic and real-world datasets with various levels of bias severity.

[137]  arXiv:2404.19252 [pdf, other]
Title: Exploiting Hatred by Targets for Hate Speech Detection on Vietnamese Social Media Texts
Subjects: Computation and Language (cs.CL)

The growth of social networks makes toxic content spread rapidly. Hate speech detection is a task to help decrease the number of harmful comments. Given the diversity of hate speech created by users, it is necessary to interpret hate speech in addition to detecting it. Hence, we propose a methodology to construct a system for targeted hate speech detection from online streaming texts from social media. We first introduce ViTHSD, a targeted hate speech detection dataset for Vietnamese social media texts. The dataset contains 10K comments; each comment is labeled for specific targets with three levels: clean, offensive, and hate. There are 5 targets in the dataset, and each target is labeled with the corresponding level manually by humans with strict annotation guidelines. The inter-annotator agreement obtained from the dataset is 0.45 by Cohen's Kappa index, which indicates a moderate level of agreement. Then, we construct a baseline for this task by combining the Bi-GRU-LSTM-CNN with the pre-trained language model to leverage the power of text representation of BERTology. Finally, we suggest a methodology to integrate the baseline model for targeted hate speech detection into the online streaming system for practical application in preventing hateful and offensive content on social media.
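
For readers unfamiliar with the agreement measure quoted above, the following minimal sketch computes Cohen's Kappa for two annotators as (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The toy labels are invented for illustration and are not taken from ViTHSD.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa for two annotators labelling the same items.

    Undefined (division by zero) when chance agreement equals 1,
    i.e. both annotators always use the same single label.
    """
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(
        counts_a[label] * counts_b[label]
        for label in set(labels_a) | set(labels_b)
    ) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical annotations using the three levels named in the abstract.
annotator_1 = ["clean", "clean", "hate", "offensive"]
annotator_2 = ["clean", "hate", "hate", "offensive"]
kappa = cohens_kappa(annotator_1, annotator_2)
```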

[138]  arXiv:2404.19253 [pdf, other]
Title: Learning to Communicate Functional States with Nonverbal Expressions for Improved Human-Robot Collaboration
Comments: 8 Pages, Accepted to RA-L March 2024
Journal-ref: LRA.2024.3384037
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)

Collaborative robots must effectively communicate their internal state to humans to enable a smooth interaction. Nonverbal communication is widely used to communicate information during human-robot interaction; however, such methods may also be misunderstood, leading to communication errors. In this work, we explore modulating the acoustic parameter values (pitch bend, beats per minute, beats per loop) of nonverbal auditory expressions to convey functional robot states (accomplished, progressing, stuck). We propose a reinforcement learning (RL) algorithm based on noisy human feedback to produce accurately interpreted nonverbal auditory expressions. The proposed approach was evaluated through a user study with 24 participants. The results demonstrate that: 1. Our proposed RL-based approach is able to learn suitable acoustic parameter values which improve the users' ability to correctly identify the state of the robot. 2. Algorithm initialization informed by previous user data can be used to significantly speed up the learning process. 3. The method used for algorithm initialization strongly influences whether participants converge to similar sounds for each robot state. 4. Modulation of pitch bend has the largest influence on user association between sounds and robotic states.

[139]  arXiv:2404.19254 [pdf, other]
Title: Suvach -- Generated Hindi QA benchmark
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Current evaluation benchmarks for question answering (QA) in Indic languages often rely on machine translation of existing English datasets. This approach suffers from bias and inaccuracies inherent in machine translation, leading to datasets that may not reflect the true capabilities of EQA models for Indic languages. This paper proposes a new benchmark specifically designed for evaluating Hindi EQA models and discusses the methodology to do the same for any task. This method leverages large language models (LLMs) to generate a high-quality dataset in an extractive setting, ensuring its relevance for the target language. We believe this new resource will foster advancements in Hindi NLP research by providing a more accurate and reliable evaluation tool.

[140]  arXiv:2404.19256 [pdf, other]
Title: Bias Mitigation via Compensation: A Reinforcement Learning Perspective
Comments: 8 pages, 5 diagrams
Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Computer Science and Game Theory (cs.GT); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Multiagent Systems (cs.MA)

As AI increasingly integrates with human decision-making, we must carefully consider interactions between the two. In particular, current approaches focus on optimizing individual agent actions but often overlook the nuances of collective intelligence. Group dynamics might require that one agent (e.g., the AI system) compensate for biases and errors in another agent (e.g., the human), but this compensation should be carefully developed. We provide a theoretical framework for algorithmic compensation that synthesizes game theory and reinforcement learning principles to demonstrate the natural emergence of deceptive outcomes from the continuous learning dynamics of agents. We provide simulation results involving Markov Decision Processes (MDP) learning to interact. This work then underpins our ethical analysis of the conditions in which AI agents should adapt to biases and behaviors of other agents in dynamic and complex decision-making environments. Overall, our approach addresses the nuanced role of strategic deception of humans, challenging previous assumptions about its detrimental effects. We assert that compensation for others' biases can enhance coordination and ethical alignment: strategic deception, when ethically managed, can positively shape human-AI interactions.

[141]  arXiv:2404.19257 [pdf, ps, other]
Title: Persistent Homology generalizations for Social Media Network Analysis
Authors: Isabela Rocha
Comments: 52 pages, 20 figures
Subjects: Computers and Society (cs.CY); Computational Geometry (cs.CG); Data Structures and Algorithms (cs.DS); Social and Information Networks (cs.SI)

This study details an approach for the analysis of political data collected from social media through the lens of Topological Data Analysis, with a specific focus on Persistent Homology and the political processes it represents. It proposes a set of mathematical generalizations using Gaussian functions to define and analyze these Persistent Homology categories. Three distinct types of Persistent Homologies were recurrent across datasets that had been plotted through retweeting patterns and analyzed through the k-Nearest-Neighbor filtrations. As these Persistent Homologies continued to appear, they were then categorized and dubbed Nuclear, Bipolar, and Multipolar Constellations. Upon investigating the content of these plotted tweets, specific patterns of interaction and political information dissemination were identified, namely Political Personalism and Political Polarization. Through clustering and application of Gaussian density functions, I have mathematically characterized each category, encapsulating their distinctive topological features. The mathematical generalizations of Bipolar, Nuclear, and Multipolar Constellations developed in this study are designed to inspire other political science digital media researchers to utilize these categories to identify Persistent Homology in datasets derived from various social media platforms, suggesting the broader hypothesis that such structures are bound to be present in scraped political data regardless of the social media platform it is derived from. This method aims to offer a new perspective in Network Analysis as it allows for an exploration of the underlying shape of the networks formed by retweeting patterns, enhancing the understanding of digital interactions within the sphere of Computational Social Sciences.

[142]  arXiv:2404.19259 [pdf, other]
Title: DELINE8K: A Synthetic Data Pipeline for the Semantic Segmentation of Historical Documents
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Document semantic segmentation is a promising avenue that can facilitate document analysis tasks, including optical character recognition (OCR), form classification, and document editing. Although several synthetic datasets have been developed to distinguish handwriting from printed text, they fall short in class variety and document diversity. We demonstrate the limitations of training on existing datasets when solving the National Archives Form Semantic Segmentation dataset (NAFSS), a dataset which we introduce. To address these limitations, we propose the most comprehensive document semantic segmentation synthesis pipeline to date, incorporating preprinted text, handwriting, and document backgrounds from over 10 sources to create the Document Element Layer INtegration Ensemble 8K, or DELINE8K dataset. Our customized dataset exhibits superior performance on the NAFSS benchmark, demonstrating it as a promising tool in further research. The DELINE8K dataset is available at https://github.com/Tahlor/deline8k.

[143]  arXiv:2404.19260 [pdf, ps, other]
Title: Aspect and Opinion Term Extraction Using Graph Attention Network
Authors: Abir Chakraborty
Subjects: Computation and Language (cs.CL)

In this work we investigate the capability of Graph Attention Network for extracting aspect and opinion terms. Aspect and opinion term extraction is posed as a token-level classification task akin to named entity recognition. We use the dependency tree of the input query as an additional feature in a Graph Attention Network along with the token and part-of-speech features. We show that the dependency structure is a powerful feature that, in the presence of a CRF layer, substantially improves the performance and generates the best result on the commonly used datasets from SemEval 2014, 2015 and 2016. We experiment with additional layers like BiLSTM and Transformer in addition to the CRF layer. We also show that our approach works well in the presence of multiple aspects or sentiments in the same query and that it is not necessary to modify the dependency tree based on a single aspect, as was the case in the original application for sentiment classification.

[144]  arXiv:2404.19261 [pdf, other]
Title: High dimensional analysis reveals conservative sharpening and a stochastic edge of stability
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Statistics Theory (math.ST); Data Analysis, Statistics and Probability (physics.data-an)

Recent empirical and theoretical work has shown that the dynamics of the large eigenvalues of the training loss Hessian have some remarkably robust features across models and datasets in the full batch regime. There is often an early period of progressive sharpening where the large eigenvalues increase, followed by stabilization at a predictable value known as the edge of stability. Previous work showed that in the stochastic setting, the eigenvalues increase more slowly - a phenomenon we call conservative sharpening. We provide a theoretical analysis of a simple high-dimensional model which shows the origin of this slowdown. We also show that there is an alternative stochastic edge of stability which arises at small batch size that is sensitive to the trace of the Neural Tangent Kernel rather than the large Hessian eigenvalues. We conduct an experimental study which highlights the qualitative differences from the full batch phenomenology, and suggests that controlling the stochastic edge of stability can help optimization.

[145]  arXiv:2404.19263 [pdf, other]
Title: mm-Wave and sub-THz Chip-to-Package Transitions for Communications Systems
Comments: 8 pages, 16 figures, to be submitted to an IEEE Journal. The first two authors are co-first authors
Subjects: Systems and Control (eess.SY)

This work presents mm-Wave and sub-THz chip-to-package transitions for communications systems. To date, reported transitions either have high loss, typically 3 to 4 dB, or require high cost packages to support very fine bump pitches and low loss materials. We analyze the impact of transitions on a high frequency, wide bandwidth communication system and present the design of a chip-to-package transition in two different commercial packaging technologies. The proposed transitions achieve <1 dB loss in both technologies, validating the design methodology.

[146]  arXiv:2404.19264 [pdf, other]
Title: DiffuseLoco: Real-Time Legged Locomotion Control with Diffusion from Offline Datasets
Subjects: Robotics (cs.RO)

This work introduces DiffuseLoco, a framework for training multi-skill diffusion-based policies for dynamic legged locomotion from offline datasets, enabling real-time control of diverse skills on robots in the real world. Offline learning at scale has led to breakthroughs in computer vision, natural language processing, and robotic manipulation domains. However, scaling up learning for legged robot locomotion, especially with multiple skills in a single policy, presents significant challenges for prior online reinforcement learning methods. To address this challenge, we propose a novel, scalable framework that leverages diffusion models to directly learn from offline multimodal datasets with a diverse set of locomotion skills. With design choices tailored for real-time control in dynamical systems, including receding horizon control and delayed inputs, DiffuseLoco is capable of reproducing multimodality in performing various locomotion skills, zero-shot transfer to real quadrupedal robots, and it can be deployed on edge computing devices. Furthermore, DiffuseLoco demonstrates free transitions between skills and robustness against environmental variations. Through extensive benchmarking in real-world experiments, DiffuseLoco exhibits better stability and velocity tracking performance compared to prior reinforcement learning and non-diffusion-based behavior cloning baselines. The design choices are validated via comprehensive ablation studies. This work opens new possibilities for scaling up learning-based legged locomotion controllers through the scaling of large, expressive models and diverse offline datasets.

[147]  arXiv:2404.19265 [pdf, other]
Title: Mapping New Realities: Ground Truth Image Creation with Pix2Pix Image-to-Image Translation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Generative Adversarial Networks (GANs) have significantly advanced image processing, with Pix2Pix being a notable framework for image-to-image translation. This paper explores a novel application of Pix2Pix to transform abstract map images into realistic ground truth images, addressing the scarcity of such images crucial for domains like urban planning and autonomous vehicle training. We detail the Pix2Pix model's utilization for generating high-fidelity datasets, supported by a dataset of paired map and aerial images, and enhanced by a tailored training regimen. The results demonstrate the model's capability to accurately render complex urban features, establishing its efficacy and potential for broad real-world applications.

[148]  arXiv:2404.19267 [pdf, ps, other]
Title: Study on the Temporal Evolution of Literature Bradford Curves in the Context of Library Specialization
Authors: Haobai Xue, Xian Liu
Subjects: Digital Libraries (cs.DL); Numerical Analysis (math.NA)

Bradford's law of bibliographic scattering is a fundamental law in bibliometrics and can provide valuable guidance to academic libraries in literature search and procurement. However, Bradford curves can take various shapes at different time points and there is still a lack of causal explanation for this, so the prediction of their shape is still an open question. This paper attributes the deviation of the Bradford curve from the theoretical J-shape to the integer constraints of the journal number and article number, and extends Leimkuhler and Egghe's formula to cover the core region of very productive journals, where the theoretical journal number falls below one. The key parameters of the extended formula are identified and studied by using the Simon-Yule model. The reasons for the Groos Droop are explained and the critical point for the shape change is studied. Finally, the proposed formulae are validated with the empirical data found in the literature. It is found that the proposed method can be used to predict the evolution of Bradford curves and thus guide academic libraries in scientific literature procurement and utilization.

[149]  arXiv:2404.19275 [pdf, other]
Title: AdapTics: A Toolkit for Creative Design and Integration of Real-Time Adaptive Mid-Air Ultrasound Tactons
Comments: Source code available: this https URL . To be published in Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI '24)
Subjects: Human-Computer Interaction (cs.HC)

Mid-air ultrasound haptic technology can enhance user interaction and immersion in extended reality (XR) applications through contactless touch feedback. Yet, existing design tools for mid-air haptics primarily support creating tactile sensations (i.e., tactons) which cannot change at runtime. These tactons lack expressiveness in interactive scenarios where a continuous closed-loop response to user movement or environmental states is desirable. This paper introduces AdapTics, a toolkit featuring a graphical interface for rapid prototyping of adaptive tactons: dynamic sensations that can adjust at runtime based on user interactions, environmental changes, or other inputs. A software library and a Unity package accompany the graphical interface to enable integration of adaptive tactons in existing applications. We present the design space offered by AdapTics for creating adaptive mid-air ultrasound tactons and show the design tool can improve Creativity Support Index ratings for Exploration and Expressiveness in a user study with 12 XR and haptic designers.

[150]  arXiv:2404.19276 [pdf, other]
Title: C2FDrone: Coarse-to-Fine Drone-to-Drone Detection using Vision Transformer Networks
Comments: Accepted at ICRA 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)

A vision-based drone-to-drone detection system is crucial for various applications like collision avoidance, countering hostile drones, and search-and-rescue operations. However, detecting drones presents unique challenges, including small object sizes, distortion, occlusion, and real-time processing requirements. Current methods integrating multi-scale feature fusion and temporal information have limitations in handling extreme blur and minuscule objects. To address this, we propose a novel coarse-to-fine detection strategy based on vision transformers. We evaluate our approach on three challenging drone-to-drone detection datasets, achieving F1 score enhancements of 7%, 3%, and 1% on the FL-Drones, AOT, and NPS-Drones datasets, respectively. Additionally, we demonstrate real-time processing capabilities by deploying our model on an edge-computing device. Our code will be made publicly available.

[151]  arXiv:2404.19277 [pdf, other]
Title: Bridge to Non-Barrier Communication: Gloss-Prompted Fine-grained Cued Speech Gesture Generation with Diffusion Model
Journal-ref: IJCAI 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Cued Speech (CS) is an advanced visual phonetic encoding system that integrates lip reading with hand codings, enabling people with hearing impairments to communicate efficiently. CS video generation aims to produce specific lip and gesture movements of CS from audio or text inputs. The main challenge is that, given limited CS data, we strive to simultaneously generate fine-grained hand and finger movements as well as lip movements, while the two kinds of movements need to be asynchronously aligned. Existing CS generation methods are fragile and prone to poor performance due to template-based statistical models and careful hand-crafted pre-processing to fit the models. Therefore, we propose a novel Gloss-prompted Diffusion-based CS Gesture generation framework (called GlossDiff). Specifically, to integrate additional linguistic rules knowledge into the model, we first introduce a bridging instruction called Gloss, which is an automatically generated descriptive text to establish a direct and more delicate semantic connection between spoken language and CS gestures. Moreover, we are the first to suggest that rhythm is an important paralinguistic feature for CS to improve the communication efficacy. Therefore, we propose a novel Audio-driven Rhythmic Module (ARM) to learn rhythm that matches audio speech. Moreover, in this work, we design, record, and publish the first Chinese CS dataset with four CS cuers. Extensive experiments demonstrate that our method quantitatively and qualitatively outperforms current state-of-the-art (SOTA) methods. We release the code and data at https://glossdiff.github.io/.

[152]  arXiv:2404.19279 [pdf, other]
Title: Quater-GCN: Enhancing 3D Human Pose Estimation with Orientation and Semi-supervised Training
Subjects: Computer Vision and Pattern Recognition (cs.CV)

3D human pose estimation is a vital task in computer vision, involving the prediction of human joint positions from images or videos to reconstruct a skeleton of a human in three-dimensional space. This technology is pivotal in various fields, including animation, security, human-computer interaction, and automotive safety, where it promotes both technological progress and enhanced human well-being. The advent of deep learning significantly advances the performance of 3D pose estimation by incorporating temporal information for predicting the spatial positions of human joints. However, traditional methods often fall short as they primarily focus on the spatial coordinates of joints and overlook the orientation and rotation of the connecting bones, which are crucial for a comprehensive understanding of human pose in 3D space. To address these limitations, we introduce Quater-GCN (Q-GCN), a directed graph convolutional network tailored to enhance pose estimation by orientation. Q-GCN excels by not only capturing the spatial dependencies among node joints through their coordinates but also integrating the dynamic context of bone rotations in 2D space. This approach enables a more sophisticated representation of human poses by also regressing the orientation of each bone in 3D space, moving beyond mere coordinate prediction. Furthermore, we complement our model with a semi-supervised training strategy that leverages unlabeled data, addressing the challenge of limited orientation ground truth data. Through comprehensive evaluations, Q-GCN has demonstrated outstanding performance against current state-of-the-art methods.

[153]  arXiv:2404.19281 [pdf, other]
Title: Audio-Visual Traffic Light State Detection for Urban Robots
Comments: Submitted to IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2024
Subjects: Robotics (cs.RO)

We present a multimodal traffic light state detection method using vision and sound, from the viewpoint of a quadruped robot navigating in urban settings. This is a challenging problem because of the visual occlusions and noise from robot locomotion. Our method combines features from raw audio with the ratios of red and green pixels within bounding boxes, identified by established vision-based detectors. The fusion method aggregates features across multiple frames in a given timeframe, increasing robustness and adaptability. Results show that our approach effectively addresses the challenge of visual occlusion and surpasses the performance of single-modality solutions when the robot is in motion. This study serves as a proof of concept, highlighting the significant, yet often overlooked, potential of multi-modal perception in robotics.
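
The visual feature mentioned above, the ratios of red and green pixels within a bounding box, can be sketched as follows. The rule used to decide that a pixel counts as red or green (a channel exceeding the other two by a fixed margin) and the margin value are assumptions for illustration; the abstract does not say how pixels are classified, and the audio features and multi-frame fusion are not covered here.

```python
def red_green_ratios(image, box, margin=50):
    """Return (red_ratio, green_ratio) for pixels inside a bounding box.

    `image` is a list of rows of (r, g, b) tuples and `box` is
    (x0, y0, x1, y1) with exclusive upper bounds. The dominance rule
    and `margin` are hypothetical choices, not taken from the paper.
    """
    x0, y0, x1, y1 = box
    red = green = total = 0
    for y in range(y0, y1):
        for x in range(x0, x1):
            r, g, b = image[y][x]
            total += 1
            if r > g + margin and r > b + margin:
                red += 1
            elif g > r + margin and g > b + margin:
                green += 1
    if total == 0:
        return 0.0, 0.0
    return red / total, green / total


# Tiny 2x2 example: two red pixels, one green pixel, one dark grey pixel.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(255, 10, 10), (20, 20, 20)],
]
ratios = red_green_ratios(image, (0, 0, 2, 2))
```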

[154]  arXiv:2404.19282 [pdf, other]
Title: Dual Dynamic Threshold Adjustment Strategy for Deep Metric Learning
Comments: accepted by ACM Transactions on Multimedia Computing, Communications, and Applications
Subjects: Multimedia (cs.MM)

Loss functions and sample mining strategies are essential components in deep metric learning algorithms. However, existing loss functions and mining strategies often necessitate the incorporation of additional hyperparameters, notably the threshold, which defines whether a sample pair is informative. The threshold provides a stable numerical standard for determining whether to retain the pairs. It is a vital parameter for reducing the number of redundant sample pairs participating in training. Nonetheless, finding the optimal threshold can be a time-consuming endeavor, often requiring extensive grid searches. Because the threshold cannot be dynamically adjusted in the training stage, many repeated experiments must be conducted to determine it. Therefore, we introduce a novel approach for adjusting the thresholds associated with both the loss function and the sample mining strategy. We design a static Asymmetric Sample Mining Strategy (ASMS) and its dynamic version Adaptive Tolerance ASMS (AT-ASMS), tailored for sample mining methods. ASMS utilizes differentiated thresholds to address the problems (too few positive pairs and too many redundant negative pairs) caused by only applying a single threshold to filter samples. AT-ASMS can adaptively regulate the ratio of positive and negative pairs during training according to the ratio of the currently mined positive and negative pairs. This meta-learning-based threshold generation algorithm utilizes a single-step gradient descent to obtain new thresholds. We combine these two threshold adjustment algorithms to form the Dual Dynamic Threshold Adjustment Strategy (DDTAS). Experimental results show that our algorithm achieves competitive performance on CUB200, Cars196, and SOP datasets.

[155]  arXiv:2404.19283 [pdf, other]
Title: MAP-Former: Multi-Agent-Pair Gaussian Joint Prediction
Comments: Accepted for publication in Proceedings of the IEEE Intelligent Vehicles Symposium (IV), Jeju Island - Korea, 2-5 June 2024
Subjects: Machine Learning (cs.LG)

There is a gap in risk assessment of trajectories between the trajectory information coming from a traffic motion prediction module and what is actually needed. Closing this gap necessitates advancements in prediction beyond current practices. Existing prediction models yield joint predictions of agents' future trajectories with uncertainty weights or marginal Gaussian probability density functions (PDFs) for single agents. Although these methods achieve highly accurate trajectory predictions, they provide little or no information about the dependencies of interacting agents. Since traffic is a process of highly interdependent agents, whose actions directly influence their mutual behavior, the existing methods are not sufficient to reliably assess the risk of future trajectories. This paper addresses that gap by introducing a novel approach to motion prediction, focusing on predicting agent-pair covariance matrices in a "scene-centric" manner, which can then be used to model Gaussian joint PDFs for all agent-pairs in a scene. We propose a model capable of predicting those agent-pair covariance matrices, leveraging an enhanced awareness of interactions. Utilizing the prediction results of our model, this work forms the foundation for comprehensive risk assessment with statistically based methods for analyzing agents' relations by their joint PDFs.

[156]  arXiv:2404.19284 [pdf, other]
Title: Approximate Nearest Neighbour Search on Dynamic Datasets: An Investigation
Subjects: Machine Learning (cs.LG)

Approximate k-Nearest Neighbour (ANN) methods are often used for mining information and aiding machine learning on large-scale high-dimensional datasets. ANN methods typically differ in the index structure used for accelerating searches, resulting in various recall/runtime trade-off points. For applications with static datasets, runtime constraints and dataset properties can be used to empirically select an ANN method with suitable operating characteristics. However, for applications with dynamic datasets, which are subject to frequent online changes (like addition of new samples), there is currently no consensus as to which ANN methods are most suitable. Traditional evaluation approaches consider neither the computational costs of updating the index structure nor the frequency and size of index updates. To address this, we empirically evaluate 5 popular ANN methods on two main applications (online data collection and online feature learning) while taking into account these considerations. Two dynamic datasets are used, derived from the SIFT1M dataset with 1 million samples and the DEEP1B dataset with 1 billion samples. The results indicate that the often-used k-d trees method is not suitable on dynamic datasets as it is slower than a straightforward baseline exhaustive search method. For online data collection, the Hierarchical Navigable Small World Graphs method achieves a consistent speedup over baseline across a wide range of recall rates. For online feature learning, the Scalable Nearest Neighbours method is faster than baseline for recall rates below 75%.
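
To make the two quantities in this abstract concrete, the following is a minimal sketch of an exhaustive-search baseline and of the recall of an approximate result against it. It is not the paper's benchmark code: the toy data, the "search only a random half" stand-in for an ANN method, and all function names are assumptions made for illustration.

```python
import math
import random


def exhaustive_knn(points, query, k):
    """Return indices of the k points closest to `query` (Euclidean), by brute force."""
    order = sorted(range(len(points)), key=lambda i: math.dist(points[i], query))
    return order[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true nearest neighbours that the approximate result contains."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


# Toy dataset (assumption): 200 random 8-dimensional points and one query.
rng = random.Random(0)
points = [[rng.random() for _ in range(8)] for _ in range(200)]
query = [rng.random() for _ in range(8)]

exact = exhaustive_knn(points, query, k=10)

# Hypothetical stand-in for an approximate method: search only a random half of the data.
subset = rng.sample(range(len(points)), 100)
approx_local = exhaustive_knn([points[i] for i in subset], query, k=10)
approx = [subset[i] for i in approx_local]  # map back to indices in `points`

# Online update: the exhaustive baseline has no index, so new samples are just appended.
points.extend([rng.random() for _ in range(8)] for _ in range(50))
exact_after_update = exhaustive_knn(points, query, k=10)

print("recall of subset search:", recall_at_k(approx, exact))
```

The baseline pays no update cost because it has no index structure, which is why it is a meaningful reference point for dynamic datasets.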

[157]  arXiv:2404.19286 [pdf, other]
Title: Soft Prompt Generation for Domain Generalization
Comments: 23 pages, 4 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Large pre-trained vision language models (VLMs) have shown impressive zero-shot ability on downstream tasks with manually designed prompts, which are not optimal for specific domains. To further adapt VLMs to downstream tasks, soft prompts have been proposed to replace manually designed prompts; a soft prompt acts as a learnable vector that is fine-tuned on domain-specific data. Prior prompt learning methods primarily learn a fixed prompt and a residual prompt from training samples. However, the learned prompts lack diversity and ignore information about unseen domains, potentially compromising the transferability of the prompts. In this paper, we reframe the prompt learning framework from a generative perspective and propose a simple yet efficient method for the Domain Generalization (DG) task, namely Soft Prompt Generation (SPG). To the best of our knowledge, we are the first to introduce the generative model into prompt learning in VLMs and explore its potential for producing soft prompts by relying solely on the generative model, ensuring the diversity of prompts. Specifically, SPG consists of a two-stage training phase and an inference phase. During the training phase, we introduce soft prompt labels for each domain, aiming to incorporate domain knowledge into the generative model. During the inference phase, the generator of the generative model is employed to obtain instance-specific soft prompts for the unseen target domain. Extensive experiments on five domain generalization benchmarks of three DG tasks demonstrate that our proposed SPG achieves state-of-the-art performance. The code will be available soon.

[158]  arXiv:2404.19287 [pdf, other]
Title: Revisiting the Adversarial Robustness of Vision Language Models: a Multimodal Perspective
Comments: 16 pages, 14 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Pretrained vision-language models (VLMs) like CLIP have shown impressive generalization performance across various downstream tasks, yet they remain vulnerable to adversarial attacks. While prior research has primarily concentrated on improving the adversarial robustness of image encoders to guard against attacks on images, the exploration of text-based and multimodal attacks has largely been overlooked. In this work, we initiate the first known and comprehensive effort to study adapting vision-language models for adversarial robustness under the multimodal attack. Firstly, we introduce a multimodal attack strategy and investigate the impact of different attacks. We then propose a multimodal contrastive adversarial training loss, aligning the clean and adversarial text embeddings with the adversarial and clean visual features, to enhance the adversarial robustness of both image and text encoders of CLIP. Extensive experiments on 15 datasets across two tasks demonstrate that our method significantly improves the adversarial robustness of CLIP. Interestingly, we find that the model fine-tuned against multimodal adversarial attacks exhibits greater robustness than its counterpart fine-tuned solely against image-based attacks, even in the context of image attacks, which may open up new possibilities for enhancing the security of VLMs.

[159]  arXiv:2404.19288 [pdf, other]
Title: Training-free Graph Neural Networks and the Power of Labels as Features
Authors: Ryoma Sato
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

We propose training-free graph neural networks (TFGNNs), which can be used without training and can also be improved with optional training, for transductive node classification. We first advocate labels as features (LaF), which is an admissible but unexplored technique. We show that LaF provably enhances the expressive power of graph neural networks. We design TFGNNs based on this analysis. In the experiments, we confirm that TFGNNs outperform existing GNNs in the training-free setting and converge with far fewer training iterations than traditional GNNs.

[160]  arXiv:2404.19289 [pdf, other]
Title: On Improving the Algorithm-, Model-, and Data- Efficiency of Self-Supervised Learning
Comments: 13 pages, 7 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Self-supervised learning (SSL) has developed rapidly in recent years. However, most of the mainstream methods are computationally expensive and rely on two (or more) augmentations for each image to construct positive pairs. Moreover, they mainly focus on large models and large-scale datasets, which lack flexibility and feasibility in many practical applications. In this paper, we propose an efficient single-branch SSL method based on non-parametric instance discrimination, aiming to improve the algorithm, model, and data efficiency of SSL. By analyzing the gradient formula, we correct the update rule of the memory bank with improved performance. We further propose a novel self-distillation loss that minimizes the KL divergence between the probability distribution and its square root version. We show that this alleviates the infrequent updating problem in instance discrimination and greatly accelerates convergence. We systematically compare the training overhead and performance of different methods in different scales of data, and under different backbones. Experimental results show that our method outperforms various baselines with significantly less overhead, and is especially effective for limited amounts of data and small models.
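
As a worked illustration of the self-distillation term mentioned above, the sketch below computes a KL divergence between a probability distribution and a "square root version" of it. The abstract does not specify how the square-root version is normalised or which direction the KL is taken in, so both choices here (renormalise to sum to 1, and KL(p || q)) are assumptions, as are the function names.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sqrt_version(p):
    """Element-wise square root of p, renormalised to sum to 1 (assumption)."""
    roots = [math.sqrt(x) for x in p]
    total = sum(roots)
    return [r / total for r in roots]


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


logits = [2.0, 1.0, 0.1]
p = softmax(logits)
q = sqrt_version(p)

print("p           :", p)
print("sqrt version:", q)
print("KL(p || q)  :", kl_divergence(p, q))
```

Under this renormalisation, the square-root version of a softmax output equals the softmax of the same logits at temperature 2, so it is a flatter copy of the original distribution and the divergence is zero only when the distribution is uniform.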

[161]  arXiv:2404.19290 [pdf, other]
Title: Efficient inverse $Z$-transform and Wiener-Hopf factorization
Subjects: Numerical Analysis (math.NA); Computational Finance (q-fin.CP)

We suggest new closely related methods for numerical inversion of the $Z$-transform and Wiener-Hopf factorization of functions on the unit circle, based on sinh-deformations of the contours of integration, corresponding changes of variables and the simplified trapezoid rule. As applications, we consider evaluation of high moments of probability distributions and construction of causal filters. Programs in Matlab running on a Mac with moderate characteristics achieve the precision E-14 in several dozen microseconds and E-11 in several milliseconds, respectively.
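
For orientation, the sketch below shows the plain (undeformed) version of the idea: inverting a $Z$-transform by applying the trapezoid rule to the contour integral over a circle. It does not implement the sinh-deformation proposed in the paper; the convention $X(z) = \sum_k x_k z^{-k}$, the test function, and the parameter names are assumptions chosen for illustration.

```python
import cmath
import math


def inverse_z_trapezoid(X, n, radius=1.0, num_points=64):
    """Approximate x_n from X(z) = sum_k x_k z^{-k} using the trapezoid rule on |z| = radius.

    On the circle, x_n = (1 / 2 pi) * integral of X(z) z^n d(theta), and the
    trapezoid rule with equally spaced nodes reduces to a plain average.
    """
    total = 0j
    for k in range(num_points):
        z = radius * cmath.exp(2j * math.pi * k / num_points)
        total += X(z) * z ** n
    return total / num_points


a = 0.5


def X(z):
    """Z-transform of x_n = a^n (n >= 0), valid for |z| > |a|."""
    return z / (z - a)


for n in range(5):
    approx = inverse_z_trapezoid(X, n).real
    print(n, approx, a ** n)
```

For a periodic analytic integrand the trapezoid rule converges geometrically in the number of nodes, which is the property that contour-deformation methods are designed to retain when the integrand is less well behaved near the unit circle.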

[162]  arXiv:2404.19291 [pdf, other]
Title: Dynamic Human Trust Modeling of Autonomous Agents With Varying Capability and Strategy
Authors: Jason Dekarske (1), Zhaodan Kong (1), Sanjay Joshi (1) ((1) University of California, Davis)
Subjects: Human-Computer Interaction (cs.HC)

Objective: We model the dynamic trust of human subjects in a human-autonomy-teaming screen-based task.
Background: Trust is an emerging area of study in human-robot collaboration. Many studies have looked at the issue of robot performance as a sole predictor of human trust, but this could underestimate the complexity of the interaction.
Method: Subjects were paired with autonomous agents to search an on-screen grid to determine the number of outlier objects. In each trial, a different autonomous agent with a preassigned capability used one of three search strategies and then reported the number of outliers it found as a fraction of its capability. Then, the subject reported their total outlier estimate. Human subjects then evaluated statements about the agent's behavior, reliability, and their trust in the agent.
Results: 80 subjects were recruited. Self-reported trust was modeled using Ordinary Least Squares, but the group that interacted with varying capability agents on a short time order produced a better performing ARIMAX model. Cross-validating the models between groups showed a moderate improvement in next-trial trust prediction.
Conclusion: A time series modeling approach reveals the effects of temporal ordering of agent performance on estimated trust. Recency bias may affect how subjects weigh the contribution of strategy or capability to trust. Understanding the connections between agent behavior, agent performance, and human trust is crucial to improving human-robot collaborative tasks.
Application: The modeling approach in this study demonstrates the need to represent autonomous agent characteristics over time to capture changes in human trust.

[163]  arXiv:2404.19292 [pdf, other]
Title: Provably Efficient Information-Directed Sampling Algorithms for Multi-Agent Reinforcement Learning
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG); Multiagent Systems (cs.MA); Machine Learning (stat.ML)

This work designs and analyzes a novel set of algorithms for multi-agent reinforcement learning (MARL) based on the principle of information-directed sampling (IDS). These algorithms draw inspiration from foundational concepts in information theory, and are proven to be sample efficient in MARL settings such as two-player zero-sum Markov games (MGs) and multi-player general-sum MGs. For episodic two-player zero-sum MGs, we present three sample-efficient algorithms for learning Nash equilibrium. The basic algorithm, referred to as MAIDS, employs an asymmetric learning structure where the max-player first solves a minimax optimization problem based on the joint information ratio of the joint policy, and the min-player then minimizes the marginal information ratio with the max-player's policy fixed. Theoretical analyses show that it achieves a Bayesian regret of $\tilde{O}(\sqrt{K})$ for $K$ episodes. To reduce the computational load of MAIDS, we develop an improved algorithm called Reg-MAIDS, which has the same Bayesian regret bound while enjoying lower computational complexity. Moreover, by leveraging the flexibility of the IDS principle in choosing the learning target, we propose two methods for constructing compressed environments based on rate-distortion theory, upon which we develop an algorithm Compressed-MAIDS wherein the learning target is a compressed environment. Finally, we extend Reg-MAIDS to multi-player general-sum MGs and prove that it can learn either the Nash equilibrium or coarse correlated equilibrium in a sample efficient manner.

[164]  arXiv:2404.19294 [pdf, other]
Title: Masked Spatial Propagation Network for Sparsity-Adaptive Depth Refinement
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The main function of depth completion is to compensate for an insufficient and unpredictable number of sparse depth measurements of hardware sensors. However, existing research on depth completion assumes that the sparsity -- the number of points or LiDAR lines -- is fixed for training and testing. Hence, the completion performance drops severely when the number of sparse depths changes significantly. To address this issue, we propose the sparsity-adaptive depth refinement (SDR) framework, which refines monocular depth estimates using sparse depth points. For SDR, we propose the masked spatial propagation network (MSPN) to perform SDR with a varying number of sparse depths effectively by gradually propagating sparse depth information throughout the entire depth map. Experimental results demonstrate that MSPN achieves state-of-the-art performance on both SDR and conventional depth completion scenarios.

[165]  arXiv:2404.19296 [pdf, other]
Title: Octopus v4: Graph of language models
Authors: Wei Chen, Zhiyuan Li
Subjects: Computation and Language (cs.CL)

Language models have been effective in a wide range of applications, yet the most sophisticated models are often proprietary. For example, GPT-4 by OpenAI and various models by Anthropic are expensive and consume substantial energy. In contrast, the open-source community has produced competitive models, like Llama3. Furthermore, niche-specific smaller language models, such as those tailored for legal, medical or financial tasks, have outperformed their proprietary counterparts. This paper introduces a novel approach that employs functional tokens to integrate multiple open-source models, each optimized for particular tasks. Our newly developed Octopus v4 model leverages functional tokens to intelligently direct user queries to the most appropriate vertical model and reformat the query to achieve the best performance. Octopus v4, an evolution of the Octopus v1, v2, and v3 models, excels in selection and parameter understanding and reformatting. Additionally, we explore the use of graph as a versatile data structure that effectively coordinates multiple open-source models by harnessing the capabilities of the Octopus model and functional tokens. Use our open-sourced GitHub (https://www.nexa4ai.com/) to try Octopus v4 models (https://huggingface.co/NexaAIDev/Octopus-v4), and contribute to a larger graph of language models. By activating models with fewer than 10B parameters, we achieved a SOTA MMLU score of 74.8 among models of the same level.

[166]  arXiv:2404.19299 [pdf, other]
Title: Robust Pedestrian Detection via Constructing Versatile Pedestrian Knowledge Bank
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Pedestrian detection is a crucial field of computer vision research which can be adopted in various real-world applications (e.g., self-driving systems). However, despite the noticeable evolution of pedestrian detection, pedestrian representations learned within a detection framework are usually limited to the particular scene data on which they were trained. Therefore, in this paper, we propose a novel approach to construct a versatile pedestrian knowledge bank containing representative pedestrian knowledge that is applicable to various detection frameworks and can be adopted in diverse scenes. We extract generalized pedestrian knowledge from a large-scale pretrained model, and we curate it by quantizing the most representative features and guiding them to be distinguishable from background scenes. Finally, we construct a versatile pedestrian knowledge bank composed of such representations, and then we leverage it to complement and enhance pedestrian features within a pedestrian detection framework. Through comprehensive experiments, we validate the effectiveness of our method, demonstrating its versatility and outperforming state-of-the-art detection performance.

[167]  arXiv:2404.19303 [pdf, ps, other]
Title: Data Set Terminology of Artificial Intelligence in Medicine: A Historical Review and Recommendation
Comments: Totally 20 pages, 3 figures, 3 tables
Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Medicine and artificial intelligence (AI) engineering represent two distinct fields each with decades of published history. With such history comes a set of terminology that has a specific way in which it is applied. However, when two distinct fields with overlapping terminology start to collaborate, miscommunication and misunderstandings can occur. This narrative review aims to give historical context for these terms, accentuate the importance of clarity when these terms are used in medical AI contexts, and offer solutions to mitigate misunderstandings by readers from either field. Through an examination of historical documents, including articles, writing guidelines, and textbooks, this review traces the divergent evolution of terms for data sets and their impact. Initially, the discordant interpretations of the word 'validation' in medical and AI contexts are explored. Then the data sets used for AI evaluation are classified, namely random splitting, cross-validation, temporal, geographic, internal, and external sets. The accurate and standardized description of these data sets is crucial for demonstrating the robustness and generalizability of AI applications in medicine. This review clarifies existing literature to provide a comprehensive understanding of these classifications and their implications in AI evaluation. This review then identifies often misunderstood terms and proposes pragmatic solutions to mitigate terminological confusion. Among these solutions are the use of standardized terminology such as 'training set,' 'validation (or tuning) set,' and 'test set,' and explicit definition of data set splitting terminologies in each medical AI research publication. This review aspires to enhance the precision of communication in medical AI, thereby fostering more effective and transparent research methodologies in this interdisciplinary field.

[168]  arXiv:2404.19306 [pdf, ps, other]
Title: Comprehensive Forecasting-Based Analysis of Hybrid and Stacked Stateful/ Stateless Models
Authors: Swayamjit Saha
Comments: 8 pages, 14 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Wind speed is a powerful source of renewable energy, which can be used as an alternative to non-renewable resources for the production of electricity. Renewable sources are clean, infinite and do not impact the environment negatively during production of electrical energy. However, eliciting electrical energy from renewable resources, viz. solar irradiance, wind speed, and hydro, requires special planning, failing which there may be a huge loss of labour and money in setting up the system. In this paper, we discuss four deep recurrent neural networks, viz. Stacked Stateless LSTM, Stacked Stateless GRU, Stacked Stateful LSTM and Stacked Stateful GRU, which are used to predict wind speed on a short-term basis for the airport sites beside two campuses of Mississippi State University. The paper presents a comprehensive analysis of the performance of the models, describing their architectures and how efficiently they produce results, as measured by RMSE values. A detailed description of the time and space complexities of the above models is also provided.

[169]  arXiv:2404.19307 [pdf, other]
Title: Enhancing GUI Exploration Coverage of Android Apps with Deep Link-Integrated Monkey
Subjects: Software Engineering (cs.SE); Cryptography and Security (cs.CR)

Mobile apps are ubiquitous in our daily lives for supporting different tasks such as reading and chatting. Despite the availability of many GUI testing tools, app testers still struggle with low testing code coverage due to tools frequently getting stuck in loops or overlooking activities with concealed entries. This results in a significant amount of testing time being spent on redundant and repetitive exploration of a few GUI pages. To address this, we utilize Android's deep links, which assist in triggering Android intents to lead users to specific pages and introduce a deep link-enhanced exploration method. This approach, integrated into the testing tool Monkey, gives rise to Delm (Deep Link-enhanced Monkey). Delm oversees the dynamic exploration process, guiding the tool out of meaningless testing loops to unexplored GUI pages. We provide a rigorous activity context mock-up approach for triggering existing Android intents to discover more activities with hidden entrances. We conduct experiments to evaluate Delm's effectiveness on activity context mock-up, activity coverage, method coverage, and crash detection. The findings reveal that Delm can mock up more complex activity contexts and significantly outperform state-of-the-art baselines with 27.2% activity coverage, 21.13% method coverage, and 23.81% crash detection.

[170]  arXiv:2404.19310 [pdf, other]
Title: Does Whisper understand Swiss German? An automatic, qualitative, and human evaluation
Comments: Accepted to VarDial 2024 (the eleventh Workshop on NLP for Similar Languages, Varieties and Dialects 2024), Mexico City
Subjects: Computation and Language (cs.CL)

Whisper is a state-of-the-art automatic speech recognition (ASR) model (Radford et al., 2022). Although Swiss German dialects are allegedly not part of Whisper's training data, preliminary experiments showed that Whisper can transcribe Swiss German quite well, with the output being a speech translation into Standard German. To gain a better understanding of Whisper's performance on Swiss German, we systematically evaluate it using automatic, qualitative, and human evaluation. We test its performance on three existing test sets: SwissDial (Dogan-Schönberger et al., 2021), STT4SG-350 (Plüss et al., 2023), and Swiss Parliaments Corpus (Plüss et al., 2021). In addition, we create a new test set for this work, based on short mock clinical interviews.
For automatic evaluation, we used word error rate (WER) and BLEU. In the qualitative analysis, we discuss Whisper's strengths and weaknesses and analyze some output examples. For the human evaluation, we conducted a survey with 28 participants who were asked to evaluate Whisper's performance.
All of our evaluations suggest that Whisper is a viable ASR system for Swiss German, as long as Standard German output is desired.
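
Word error rate, one of the automatic metrics named above, can be sketched as a word-level edit distance divided by the reference length. This is the standard textbook definition, not the authors' evaluation script; the whitespace tokenisation, the function name, and the example sentences are assumptions for illustration.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one deleted word ("ein") and one substitution ("test" -> "text").
reference = "das ist ein kleiner test"
hypothesis = "das ist kleiner text"
print(word_error_rate(reference, hypothesis))
```

Because the score is computed against a Standard German reference, a system that outputs a faithful Swiss German transcription would be penalised, which is consistent with the caveat in the abstract's final sentence.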

[171]  arXiv:2404.19311 [pdf, other]
Title: A Light-weight Transformer-based Self-supervised Matching Network for Heterogeneous Images
Comments: accepted by Information Fusion
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)

Matching visible and near-infrared (NIR) images remains a significant challenge in remote sensing image fusion. The nonlinear radiometric differences between heterogeneous remote sensing images make the image matching task even more difficult. Deep learning has gained substantial attention in computer vision tasks in recent years. However, many methods rely on supervised learning and require large amounts of annotated data, which is frequently limited in the field of remote sensing image matching. To address this challenge, this paper proposes a novel keypoint descriptor approach that obtains robust feature descriptors via a self-supervised matching network. A light-weight transformer network, termed LTFormer, is designed to generate deep-level feature descriptors. Furthermore, we implement an innovative triplet loss function, LT Loss, to enhance the matching performance further. Our approach outperforms conventional hand-crafted local feature descriptors and is competitive with state-of-the-art deep learning-based methods, even amidst the shortage of annotated data.

[172]  arXiv:2404.19314 [pdf, other]
Title: Alternative paths computation for congestion mitigation in segment-routing networks
Comments: 6 pages
Subjects: Discrete Mathematics (cs.DM); Networking and Internet Architecture (cs.NI)

In backbone networks, it is fundamental to quickly protect traffic against any unexpected event, such as failures or congestion, which may impact Quality of Service (QoS). Standard solutions based on Segment Routing (SR), such as Topology-Independent Loop-Free Alternate (TI-LFA), are used in practice to handle failures, but no solutions exist for distributed and tactical congestion mitigation. A promising approach leveraging SR has been recently proposed to quickly steer traffic away from congested links over alternative paths. As the pre-computation of alternative paths plays a paramount role in efficiently mitigating congestion, we investigate the associated path computation problem, aiming at maximizing the amount of traffic that can be rerouted as well as the resilience against any 1-link failure. In particular, we focus on two variants of this problem. First, we maximize the residual flow after all possible failures. We show that the problem is NP-Hard, and we solve it via a Benders decomposition algorithm. Then, to provide a practical and scalable solution, we solve a relaxed variant of the problem that maximizes, instead of flow, the number of surviving alternative paths after all possible failures. We provide a polynomial-time algorithm. Through numerical experiments, we compare the two variants and show that they increase the amount of rerouted traffic and the resiliency of the network after any 1-link failure.

[173]  arXiv:2404.19315 [pdf, other]
Title: Modeling Orthographic Variation in Occitan's Dialects
Authors: Zachary William Hopton (Language and Space Lab, University of Zurich), Noëmi Aepli (Department of Computational Linguistics, University of Zurich)
Comments: Accepted at VarDial 2024: The Eleventh Workshop on NLP for Similar Languages, Varieties and Dialects
Subjects: Computation and Language (cs.CL)

Effectively normalizing textual data poses a considerable challenge, especially for low-resource languages lacking standardized writing systems. In this study, we fine-tuned a multilingual model with data from several Occitan dialects and conducted a series of experiments to assess the model's representations of these dialects. For evaluation purposes, we compiled a parallel lexicon encompassing four Occitan dialects. Intrinsic evaluations of the model's embeddings revealed that surface similarity between the dialects strengthened representations. When the model was further fine-tuned for part-of-speech tagging and Universal Dependency parsing, its performance was robust to dialectal variation, even when trained solely on part-of-speech data from a single dialect. Our findings suggest that large multilingual models minimize the need for spelling normalization during pre-processing.

[174]  arXiv:2404.19316 [pdf, other]
Title: QLSC: A Query Latent Semantic Calibrator for Robust Extractive Question Answering
Comments: Accepted by the 2024 International Joint Conference on Neural Networks (IJCNN 2024)
Subjects: Computation and Language (cs.CL)

Extractive Question Answering (EQA) in Machine Reading Comprehension (MRC) often faces the challenge of dealing with semantically identical but format-variant inputs. Our work introduces a novel approach, called the "Query Latent Semantic Calibrator (QLSC)", designed as an auxiliary module for existing MRC models. We propose a unique scaling strategy to capture latent semantic center features of queries. These features are then seamlessly integrated into traditional query and passage embeddings using an attention mechanism. By deepening the comprehension of the semantic queries-passage relationship, our approach diminishes sensitivity to variations in text format and boosts the model's capability in pinpointing accurate answers. Experimental results on robust Question-Answer datasets confirm that our approach effectively handles format-variant but semantically identical queries, highlighting the effectiveness and adaptability of our proposed method.

[175]  arXiv:2404.19317 [pdf, other]
Title: Revisiting N-Gram Models: Their Impact in Modern Neural Networks for Handwritten Text Recognition
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)

In recent advances in automatic text recognition (ATR), deep neural networks have demonstrated the ability to implicitly capture language statistics, potentially reducing the need for traditional language models. This study directly addresses whether explicit language models, specifically n-gram models, still contribute to the performance of state-of-the-art deep learning architectures in the field of handwriting recognition. We evaluate two prominent neural network architectures, PyLaia and DAN, with and without the integration of explicit n-gram language models. Our experiments on three datasets - IAM, RIMES, and NorHand v2 - at both line and page level, investigate optimal parameters for n-gram models, including their order, weight, smoothing methods and tokenization level. The results show that incorporating character or subword n-gram models significantly improves the performance of ATR models on all datasets, challenging the notion that deep learning models alone are sufficient for optimal performance. In particular, the combination of DAN with a character language model outperforms current benchmarks, confirming the value of hybrid approaches in modern document analysis systems.

[176]  arXiv:2404.19318 [pdf, other]
Title: Enhancing Trust in LLM-Generated Code Summaries with Calibrated Confidence Scores
Subjects: Software Engineering (cs.SE); Computation and Language (cs.CL)

A good summary can often be very useful during program comprehension. While a brief, fluent, and relevant summary can be helpful, it does require significant human effort to produce. Often, good summaries are unavailable in software projects, thus making maintenance more difficult. There has been a considerable body of research into automated AI-based methods, using large language models (LLMs), to generate summaries of code; there has also been quite a bit of work on ways to measure the performance of such summarization methods, with special attention paid to how closely these AI-generated summaries resemble a summary a human might have produced. Measures such as BERTScore and BLEU have been suggested and evaluated with human-subject studies.
However, LLMs often err and generate something quite unlike what a human might say. Given an LLM-produced code summary, is there a way to gauge whether it's likely to be sufficiently similar to a human-produced summary, or not? In this paper, we study this question as a calibration problem: given a summary from an LLM, can we compute a confidence measure which is a good indication of whether the summary is sufficiently similar to what a human would have produced in this situation? We examine this question using several LLMs, for several languages, and in several different settings. We suggest an approach which provides well-calibrated predictions of likelihood of similarity to human summaries.
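
One common way to quantify how well-calibrated such confidence scores are is the expected calibration error (ECE), sketched below. The abstract does not say which calibration metric the authors use, so ECE, the equal-width binning, and the toy data are assumptions; here an outcome of 1 is taken to mean that a summary was judged sufficiently similar to a human-written one.

```python
def expected_calibration_error(confidences, outcomes, num_bins=5):
    """Weighted average gap between mean confidence and observed success rate per bin."""
    bins = [[] for _ in range(num_bins)]
    for conf, outcome in zip(confidences, outcomes):
        # Equal-width bins over [0, 1]; a confidence of exactly 1.0 goes in the last bin.
        index = min(int(conf * num_bins), num_bins - 1)
        bins[index].append((conf, outcome))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_confidence = sum(c for c, _ in members) / len(members)
        success_rate = sum(y for _, y in members) / len(members)
        ece += len(members) / total * abs(mean_confidence - success_rate)
    return ece


# Toy data (made up): predicted confidence that each summary is "good enough",
# and whether it actually was (1) or was not (0).
confidences = [0.9, 0.9, 0.8, 0.3, 0.2, 0.1]
outcomes = [1, 1, 0, 0, 0, 1]
print(expected_calibration_error(confidences, outcomes))
```

A score of 0 means that, within every bin, the stated confidence matches the observed frequency of acceptable summaries; larger values indicate over- or under-confidence.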

[177]  arXiv:2404.19319 [pdf, other]
Title: Knowledge Distillation vs. Pretraining from Scratch under a Fixed (Computation) Budget
Comments: Accepted to the 5th Workshop on Insights from Negative Results in NLP at NAACL 2024
Subjects: Computation and Language (cs.CL)

Compared to standard language model (LM) pretraining (i.e., from scratch), Knowledge Distillation (KD) entails an additional forward pass through a teacher model that is typically substantially larger than the target student model. As such, KD in LM pretraining materially slows down throughput of pretraining instances vis-a-vis pretraining from scratch. Scaling laws of LM pretraining suggest that smaller models can close the gap to larger counterparts if trained on more data (i.e., processing more tokens), and under a fixed computation budget, smaller models are able to process more data than larger models. We thus hypothesize that KD might, in fact, be suboptimal to pretraining from scratch for obtaining smaller LMs, when appropriately accounting for the compute budget. To test this, we compare pretraining from scratch against several KD strategies for masked language modeling (MLM) in a fair experimental setup, with respect to amount of computation as well as pretraining data. Downstream results on GLUE, however, do not confirm our hypothesis: while pretraining from scratch performs comparably to ordinary KD under a fixed computation budget, more sophisticated KD strategies, namely TinyBERT (Jiao et al., 2020) and MiniLM (Wang et al., 2023), outperform it by a notable margin. We further find that KD yields larger gains over pretraining from scratch when the data must be repeated under the fixed computation budget.

[178]  arXiv:2404.19326 [pdf, other]
Title: LVOS: A Benchmark for Large-scale Long-term Video Object Segmentation
Comments: LVOS V2
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Video object segmentation (VOS) aims to distinguish and track target objects in a video. Despite the excellent performance achieved by off-the-shelf VOS models, existing VOS benchmarks mainly focus on short-term videos lasting about 5 seconds, where objects remain visible most of the time. However, these benchmarks poorly represent practical applications, and the absence of long-term datasets restricts further investigation of VOS in realistic scenarios. Thus, we propose a novel benchmark named LVOS, comprising 720 videos with 296,401 frames and 407,945 high-quality annotations. Videos in LVOS last 1.14 minutes on average, approximately 5 times longer than videos in existing datasets. Each video includes various attributes, especially challenges deriving from the wild, such as long-term reappearing and cross-temporal similar objects. Compared to previous benchmarks, our LVOS better reflects VOS models' performance in real scenarios. Based on LVOS, we evaluate 20 existing VOS models under 4 different settings and conduct a comprehensive analysis. On LVOS, these models suffer a large performance drop, highlighting the challenge of achieving precise tracking and segmentation in real-world scenarios. Attribute-based analysis indicates that the key factor in the accuracy decline is the increased video length, emphasizing LVOS's crucial role. We hope our LVOS can advance development of VOS in real scenes. Data and code are available at https://lingyihongfd.github.io/lvos.github.io/.

[179]  arXiv:2404.19328 [pdf, other]
Title: Computational Approaches for Integrating out Subjectivity in Cognate Synonym Selection
Comments: Experiments available on GitHub (this https URL, this https URL)
Subjects: Computation and Language (cs.CL); Populations and Evolution (q-bio.PE)

Working with cognate data involves handling synonyms, that is, multiple words that describe the same concept in a language. In the early days of language phylogenetics it was recommended to select one synonym only. However, as we show here, binary character matrices, which are used as input for computational methods, do allow for representing the entire dataset including all synonyms. Here we address the question how one can and if one should include all synonyms or whether it is preferable to select synonyms a priori. To this end, we perform maximum likelihood tree inferences with the widely used RAxML-NG tool and show that it yields plausible trees when all synonyms are used as input. Furthermore, we show that a priori synonym selection can yield topologically substantially different trees and we therefore advise against doing so. To represent cognate data including all synonyms, we introduce two types of character matrices beyond the standard binary ones: probabilistic binary and probabilistic multi-valued character matrices. We further show that it is dataset-dependent for which character matrix type the inferred RAxML-NG tree is topologically closest to the gold standard. We also make available a Python interface for generating all of the above character matrix types for cognate data provided in CLDF format.
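As a minimal sketch of how a standard binary character matrix can represent all synonyms at once, the snippet below creates one column per (concept, cognate class) pair and marks a language with 1 if any of its synonyms for that concept belongs to the class. The function name, the toy data, and the input layout are hypothetical; this is not the Python interface mentioned in the abstract, and it does not cover the probabilistic matrix types or missing-data handling.

```python
def binary_character_matrix(cognates):
    """Build a binary character matrix that keeps every synonym.

    cognates: {language: {concept: set of cognate-class labels}}
    Returns (columns, matrix), where columns is a sorted list of
    (concept, cognate_class) pairs and matrix maps each language
    to a list of 0/1 values aligned with columns.
    """
    columns = sorted({
        (concept, cls)
        for concepts in cognates.values()
        for concept, classes in concepts.items()
        for cls in classes
    })
    matrix = {
        lang: [1 if cls in concepts.get(concept, ()) else 0
               for concept, cls in columns]
        for lang, concepts in cognates.items()
    }
    return columns, matrix


# Toy data: language A has two synonyms for "big" (classes b1 and b2).
toy = {
    "A": {"hand": {"h1"}, "big": {"b1", "b2"}},
    "B": {"hand": {"h1"}, "big": {"b2"}},
    "C": {"hand": {"h2"}, "big": {"b1"}},
}
columns, matrix = binary_character_matrix(toy)
for lang, row in matrix.items():
    print(lang, "".join(map(str, row)))
```

Because each cognate class gets its own column, a language with several synonyms simply has several 1s within one concept, so no a priori selection is needed.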

[180]  arXiv:2404.19329 [pdf, other]
Title: End-to-end information extraction in handwritten documents: Understanding Paris marriage records from 1880 to 1940
Comments: To be published in: International Conference on Document Analysis and Recognition - ICDAR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The EXO-POPP project aims to establish a comprehensive database comprising 300,000 marriage records from Paris and its suburbs, spanning the years 1880 to 1940, which are preserved in over 130,000 scans of double pages. Each marriage record may encompass up to 118 distinct types of information that require extraction from plain text. In this paper, we introduce the M-POPP dataset, a subset of the M-POPP database with annotations for full-page text recognition and information extraction in both handwritten and printed documents, and which is now publicly available. We present a fully end-to-end architecture adapted from the DAN, designed to perform both handwritten text recognition and information extraction directly from page images without the need for explicit segmentation. We showcase the information extraction capabilities of this architecture by achieving a new state of the art for full-page Information Extraction on Esposalles and we use this architecture as a baseline for the M-POPP dataset. We also assess and compare how different encoding strategies for named entities in the text affect the performance of jointly recognizing handwritten text and extracting information, from full pages.

[181]  arXiv:2404.19330 [pdf, other]
Title: G2LTraj: A Global-to-Local Generation Approach for Trajectory Prediction
Comments: Accepted by IJCAI 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Predicting future trajectories of traffic agents accurately holds substantial importance in various applications such as autonomous driving. Previous methods commonly infer all future steps of an agent either recursively or simultaneously. However, the recursive strategy suffers from the accumulated error, while the simultaneous strategy overlooks the constraints among future steps, resulting in kinematically infeasible predictions. To address these issues, in this paper, we propose G2LTraj, a plug-and-play global-to-local generation approach for trajectory prediction. Specifically, we generate a series of global key steps that uniformly cover the entire future time range. Subsequently, the local intermediate steps between the adjacent key steps are recursively filled in. In this way, we prevent the accumulated error from propagating beyond the adjacent key steps. Moreover, to boost the kinematical feasibility, we not only introduce the spatial constraints among key steps but also strengthen the temporal constraints among the intermediate steps. Finally, to ensure the optimal granularity of key steps, we design a selectable granularity strategy that caters to each predicted trajectory. Our G2LTraj significantly improves the performance of seven existing trajectory predictors across the ETH, UCY and nuScenes datasets. Experimental results demonstrate its effectiveness. Code will be available at https://github.com/Zhanwei-Z/G2LTraj.

[182]  arXiv:2404.19331 [pdf, other]
Title: Fusing Depthwise and Pointwise Convolutions for Efficient Inference on GPUs
Subjects: Performance (cs.PF); Hardware Architecture (cs.AR); Distributed, Parallel, and Cluster Computing (cs.DC)

Depthwise and pointwise convolutions have fewer parameters and perform fewer operations than standard convolutions. As a result, they have become increasingly used in various compact DNNs, including convolutional neural networks (CNNs) and vision transformers (ViTs). However, they have a lower compute-to-memory-access ratio than standard convolutions, often making their memory accesses the performance bottleneck. This paper explores fusing depthwise and pointwise convolutions to overcome the memory access bottleneck. The focus is on fusing these operators on GPUs. The prior art on GPU-based fusion suffers from one or more of the following: (1) fusing either a convolution with an element-wise operator or multiple non-convolutional operators, (2) not explicitly optimizing for memory accesses, (3) not supporting depthwise convolutions. This paper proposes Fused Convolutional Modules (FCMs), a set of novel fused depthwise and pointwise GPU kernels. FCMs significantly reduce the memory accesses of pointwise and depthwise convolutions, improving execution time and energy efficiency. To evaluate the trade-offs associated with fusion and determine which convolutions are beneficial to fuse and the optimal FCM parameters, we propose FusePlanner. FusePlanner consists of cost models to estimate the memory accesses of depthwise, pointwise, and FCM kernels given GPU characteristics. Our experiments on three GPUs using representative CNNs and ViTs demonstrate that FCMs save up to 83% of the memory accesses and achieve speedups of up to 3.7x compared to cuDNN. Complete model implementations of various CNNs using our modules outperform TVM's, achieving speedups of up to 1.8x and saving up to two-thirds of the energy.
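The opening claim about parameter and operation counts can be checked with a small calculation. The sketch below uses the standard formulas for a K x K convolution (bias terms ignored, stride 1 with unchanged spatial size assumed); the layer sizes are arbitrary examples and are not taken from the paper.

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_params(k, c_in):
    """Weights in a depthwise k x k convolution: one filter per input channel."""
    return k * k * c_in


def pointwise_params(c_in, c_out):
    """Weights in a pointwise (1 x 1) convolution."""
    return c_in * c_out


def macs(params, height, width):
    """Multiply-accumulates, assuming each weight is applied once per output pixel."""
    return params * height * width


# Example layer: 3x3 kernel, 64 -> 128 channels, 56x56 feature map.
k, c_in, c_out, h, w = 3, 64, 128, 56, 56
standard = conv_params(k, c_in, c_out)
separable = depthwise_params(k, c_in) + pointwise_params(c_in, c_out)
print("standard params:", standard, "MACs:", macs(standard, h, w))
print("depthwise + pointwise params:", separable, "MACs:", macs(separable, h, w))
print("reduction factor: %.1fx" % (standard / separable))
```

The arithmetic drops by roughly the same factor as the parameter count, while the feature maps that must be read and written stay the same size, which is consistent with the lower compute-to-memory-access ratio described above.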

[183]  arXiv:2404.19334 [pdf, other]
Title: Multi-Scale Heterogeneity-Aware Hypergraph Representation for Histopathology Whole Slide Images
Comments: 9 pages, 6 figures, accepted by ICME2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Survival prediction is a complex ordinal regression task that aims to predict the survival coefficient ranking among a cohort of patients, typically achieved by analyzing patients' whole slide images. Existing deep learning approaches mainly adopt multiple instance learning or graph neural networks under weak supervision. Most of them are unable to uncover the diverse interactions between different types of biological entities (e.g., cell cluster and tissue block) across multiple scales, while such interactions are crucial for patient survival prediction. In light of this, we propose a novel multi-scale heterogeneity-aware hypergraph representation framework. Specifically, our framework first constructs a multi-scale heterogeneity-aware hypergraph and assigns each node with its biological entity type. It then mines diverse interactions between nodes on the graph structure to obtain a global representation. Experimental results demonstrate that our method outperforms state-of-the-art approaches on three benchmark datasets. Code is publicly available at https://github.com/Hanminghao/H2GT.

[184]  arXiv:2404.19335 [pdf, other]
Title: StablePT: Towards Stable Prompting for Few-shot Learning via Input Separation
Comments: Submitted to ACL 2024
Subjects: Computation and Language (cs.CL)

Large language models have shown their ability to become effective few-shot learners with prompting, revolutionizing the paradigm of learning with data scarcity. However, this approach largely depends on the quality of prompt initialization, and always exhibits large variability among different runs. This property makes prompt tuning highly unreliable and vulnerable to poorly constructed prompts, which limits its extension to more real-world applications. To tackle this issue, we propose to treat the hard prompt and soft prompt as separate inputs to mitigate noise brought by the prompt initialization. Furthermore, we optimize soft prompts with contrastive learning for utilizing class-aware information in the training process to maintain model performance. Experimental results demonstrate that StablePT outperforms state-of-the-art methods by 7.20% in accuracy and reduces the standard deviation by 2.02 on average. Furthermore, extensive experiments underscore its robustness and stability across 7 datasets covering various tasks.

[185]  arXiv:2404.19336 [pdf, ps, other]
Title: Improving LLM Classification of Logical Errors by Integrating Error Relationship into Prompts
Comments: 12 pages, 5 figures
Subjects: Artificial Intelligence (cs.AI); Programming Languages (cs.PL)

LLMs trained in the understanding of programming syntax are now providing effective assistance to developers and are being used in programming education, such as in the generation of coding problem examples or in providing code explanations. A key aspect of programming education is understanding and dealing with error messages. However, 'logical errors', in which the program operates against the programmer's intentions, do not receive error messages from the compiler. In this study, building on existing research on programming errors, we first define the types of logical errors that can occur in programming in general. Based on the definition, we propose an effective approach for detecting logical errors with LLMs that makes use of relations among error types in the Chain-of-Thought and Tree-of-Thought prompts. The experimental results indicate that when such logical error descriptions are used in the prompt, the average classification performance is about 21% higher than without them. We also conducted an experiment on exploiting the relations among errors to generate a new logical error dataset using LLMs. As datasets for logical errors are very limited, such a benchmark dataset can be very useful for various programming-related applications. We expect that our work can assist novice programmers in identifying the causes of code errors and correcting them more effectively.

[186]  arXiv:2404.19337 [pdf, other]
Title: Design of a Representation Information Repository for the Long-Term Usability of Digital Building Documents
Comments: 14 pages, 4 figures
Subjects: Digital Libraries (cs.DL); Computers and Society (cs.CY)

The long-term usability of digital building documents is essential for the maintenance and optimization of infrastructure portfolios. It supports the preservation of building-specific knowledge and the cultural heritage hidden within. However, ensuring this usability throughout the lifecycle of a building, or even indefinitely, remains a major challenge. This is especially true for organizations responsible for large collections of digital building documents, such as public administrations or archives. In this article, we first describe the challenges and requirements associated with preservation tasks, and then introduce the concept of so-called representation information within BIM (Building Information Modeling). This type of information is important to give meaning to the stored bit sequences for a particular community. Then, we design a repository for representation information and introduce 23 so-called BIMcore content elements. Finally, we focus on BIM and the construction sector and explain how the proposed repository can be used to implement the two concepts introduced in the ISO reference model OAIS (Open Archival Information System), namely the representation information and the context information, as well as the concept of significant properties, which has not yet been explicitly modeled in OAIS.

[187]  arXiv:2404.19341 [pdf, other]
Title: Reliable or Deceptive? Investigating Gated Features for Smooth Visual Explanations in CNNs
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Deep learning models have achieved remarkable success across diverse domains. However, the intricate nature of these models often impedes a clear understanding of their decision-making processes. This is where Explainable AI (XAI) becomes indispensable, offering intuitive explanations for model decisions. In this work, we propose a simple yet highly effective approach, ScoreCAM++, which introduces modifications to enhance the promising ScoreCAM method for visual explainability. Our proposed approach involves altering the normalization function within the activation layer utilized in ScoreCAM, resulting in significantly improved results compared to previous efforts. Additionally, we apply an activation function to the upsampled activation layers to enhance interpretability. This improvement is achieved by selectively gating lower-priority values within the activation layer. Through extensive experiments and qualitative comparisons, we demonstrate that ScoreCAM++ consistently achieves notably superior performance and fairness in interpreting the decision-making process compared to both ScoreCAM and previous methods.

[188]  arXiv:2404.19346 [pdf, other]
Title: Pessimistic Value Iteration for Multi-Task Data Sharing in Offline Reinforcement Learning
Comments: Accepted by Artificial Intelligence (AIJ)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Offline Reinforcement Learning (RL) has shown promising results in learning a task-specific policy from a fixed dataset. However, successful offline RL often relies heavily on the coverage and quality of the given dataset. In scenarios where the dataset for a specific task is limited, a natural approach is to improve offline RL with datasets from other tasks, namely, to conduct Multi-Task Data Sharing (MTDS). Nevertheless, directly sharing datasets from other tasks exacerbates the distribution shift in offline RL. In this paper, we propose an uncertainty-based MTDS approach that shares the entire dataset without data selection. Given ensemble-based uncertainty quantification, we perform pessimistic value iteration on the shared offline dataset, which provides a unified framework for single- and multi-task offline RL. We further provide theoretical analysis, which shows that the optimality gap of our method is only related to the expected data coverage of the shared dataset, thus resolving the distribution shift issue in data sharing. Empirically, we release an MTDS benchmark and collect datasets from three challenging domains. The experimental results show our algorithm outperforms the previous state-of-the-art methods in challenging MTDS problems. See https://github.com/Baichenjia/UTDS for the datasets and code.
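A minimal sketch of the general idea behind ensemble-based pessimism follows: the value used in the Bellman target is the ensemble mean minus a multiple of the ensemble standard deviation, so state-action pairs that the shared data covers poorly (where ensemble members disagree) are valued conservatively. The exact penalty form, the `beta` coefficient, and the function names are assumptions for illustration; the abstract does not specify them, and the linked repository should be consulted for the authors' actual method.

```python
from statistics import mean, pstdev


def pessimistic_value(q_ensemble, beta=1.0):
    """Lower-confidence value: ensemble mean minus beta times ensemble std.

    q_ensemble: Q-value estimates for one (state, action) pair, one per
    ensemble member. Larger disagreement means a larger penalty.
    """
    return mean(q_ensemble) - beta * pstdev(q_ensemble)


def pessimistic_target(reward, next_q_ensemble, gamma=0.99, beta=1.0):
    """One-step Bellman target that bootstraps from the pessimistic value."""
    return reward + gamma * pessimistic_value(next_q_ensemble, beta)


# Well-covered transition: members agree, so almost no penalty is applied.
print(pessimistic_target(1.0, [2.0, 2.1, 1.9]))
# Poorly covered transition (e.g. data shared from another task): members disagree.
print(pessimistic_target(1.0, [0.0, 4.0, 2.0]))
```

Both calls share the same ensemble mean, yet the second target is lower because of the disagreement penalty.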

[189]  arXiv:2404.19349 [pdf, other]
Title: Human-AI Interaction in Industrial Robotics: Design and Empirical Evaluation of a User Interface for Explainable AI-Based Robot Program Optimization
Comments: 6 pages, 4 figures, accepted at the 2024 CIRP International Conference on Manufacturing Systems (CMS)
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)

While recent advances in deep learning have demonstrated its transformative potential, its adoption for real-world manufacturing applications remains limited. We present an Explanation User Interface (XUI) for a state-of-the-art deep learning-based robot program optimizer which provides both naive and expert users with different user experiences depending on their skill level, as well as Explainable AI (XAI) features to facilitate the application of deep learning methods in real-world applications. To evaluate the impact of the XUI on task performance, user satisfaction and cognitive load, we present the results of a preliminary user survey and propose a study design for a large-scale follow-up study.

[190]  arXiv:2404.19350 [pdf, ps, other]
Title: Transform Dialect Tutorial
Subjects: Programming Languages (cs.PL)

Transform Dialect in MLIR provides operations that can be used to control transformation of the Intermediate Representation (IR) using a different portion of the IR. It refers to the IR being transformed as payload IR, and to the IR guiding the transformation as transform IR.
The main use case for this dialect is orchestrating fine-grain transformations on individual IR objects (operations or values) or sets thereof. For example, it may involve finding loop-like operations with specific properties (e.g., large size) in the payload IR, applying loop tiling to those and only those operations, and then applying loop unrolling to the inner loops produced by the previous transformations. As such, it is not intended as a replacement for the pass infrastructure, nor for the pattern rewriting infrastructure. In the most common case, the transform IR will be processed and applied to the payload IR by a pass. Transformations expressed by the Transform dialect may be implemented using the pattern infrastructure or any other relevant MLIR component.
The rest of this document explains the main concepts and usage scenario of the MLIR Transform Dialect combined with structured operations.

[191]  arXiv:2404.19354 [pdf, other]
Title: PEFSL: A deployment Pipeline for Embedded Few-Shot Learning on a FPGA SoC
Authors: Lucas Grativol Ribeiro (IMT Atlantique - MEE, Lab_STICC_BRAIn, Lab-STICC_2AI, LHC), Lubin Gauthier (Lab_STICC_BRAIn, IMT Atlantique - MEE), Mathieu Leonardon (IMT Atlantique - MEE, Lab_STICC_BRAIn), Jérémy Morlier (IMT Atlantique - MEE, Lab_STICC_BRAIn), Antoine Lavrard-Meyer (IMT Atlantique), Guillaume Muller (Mines Saint-Étienne MSE, FAYOL-ENSMSE, FAYOL-ENSMSE), Virginie Fresse (LHC, TSE), Matthieu Arzel (IMT Atlantique - MEE, Lab-STICC_2AI)
Journal-ref: ISCAS 2024 : IEEE International Symposium on Circuits and Systems, May 2024, Singapore, Singapore
Subjects: Hardware Architecture (cs.AR); Machine Learning (cs.LG)

This paper tackles the challenges of implementing few-shot learning on embedded systems, specifically FPGA SoCs, a vital approach for adapting to diverse classification tasks, especially when the costs of data acquisition or labeling prove to be prohibitively high. Our contributions encompass the development of an end-to-end open-source pipeline for a few-shot learning platform for object classification on FPGA SoCs. The pipeline is built on top of the Tensil open-source framework, facilitating the design, training, evaluation, and deployment of DNN backbones tailored for few-shot learning. Additionally, we showcase our work's potential by building and deploying a low-power, low-latency demonstrator trained on the MiniImageNet dataset with a dataflow architecture. The proposed system has a latency of 30 ms while consuming 6.2 W on the PYNQ-Z1 board.

[192]  arXiv:2404.19356 [pdf, ps, other]
Title: A Concept for Semi-Automatic Configuration of Sufficiently Valid Simulation Setups for Automated Driving Systems
Comments: 8 pages, 3 figures. Submitted for publication
Subjects: Systems and Control (eess.SY)

As simulation is increasingly used in scenario-based approaches to test Automated Driving Systems, the credibility of simulation results is a major concern. Arguably, credibility depends on the validity of the simulation setup and simulation models. When selecting appropriate simulation models, a trade-off must be made between validity, often connected to the model's fidelity, and cost of computation. However, due to the large number of test cases, expert-based methods to create sufficiently valid simulation setups seem infeasible. We propose using design contracts in order to semi-automatically compose simulation setups for given test cases from simulation models and to derive requirements for the simulation models, supporting separation of concerns between simulation model developers and users. Simulation model contracts represent their validity domains by capturing a validity guarantee and the associated operating conditions in an assumption. We then require the composition of the simulation model contracts to refine a test case contract. The latter contract captures the operating conditions of the test case in its assumption and validity requirements in its guarantee. Based on this idea, we present a framework that supports the compositional configuration of simulation setups based on the contracts and a method to derive runtime monitors for these simulation setups.

[193]  arXiv:2404.19357 [pdf, other]
Title: Interest Clock: Time Perception in Real-Time Streaming Recommendation System
Comments: Accepted by SIGIR 2024
Subjects: Information Retrieval (cs.IR)

User preferences follow a dynamic pattern over a day, e.g., at 8 am, a user might prefer to read news, while at 8 pm, they might prefer to watch movies. Time modeling aims to enable recommendation systems to perceive time changes to capture users' dynamic preferences over time, which is an important and challenging problem in recommendation systems. In particular, industrial streaming recommendation systems, which only have samples from the current moment available, pose greater challenges for time modeling. There is still a lack of effective time modeling methods for streaming recommendation systems. In this paper, we propose an effective and universal method, Interest Clock, to perceive time information in recommendation systems. Interest Clock first encodes users' time-aware preferences into a clock (hour-level personalized features) and then uses a Gaussian distribution to smooth and aggregate them into the final interest clock embedding according to the current time for the final prediction. By arming base models with Interest Clock, we conduct online A/B tests, obtaining +0.509% and +0.758% improvements on user active days and app duration respectively. Besides, the extended offline experiments show improvements as well. Interest Clock has been deployed on the Douyin Music App.
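The Gaussian smoothing and aggregation step can be sketched as a weighted average of 24 hour-level feature vectors, with weights given by a Gaussian of the distance between each hour and the current time. The circular (wrap-around) distance, the `sigma` value, the weight normalization, and the function name are assumptions made for illustration; the abstract does not give these details.

```python
import math


def interest_clock_embedding(hour_features, current_hour, sigma=1.5):
    """Aggregate 24 hour-level feature vectors into one embedding.

    hour_features: list of 24 equal-length vectors (hour 0..23).
    current_hour: request time in hours, may be fractional (e.g. 8.5).
    Weights follow a Gaussian over the circular distance on the clock,
    so hour 23 counts as a neighbour of hour 0 (an assumption).
    """
    weights = []
    for hour in range(24):
        diff = abs(hour - current_hour)
        dist = min(diff, 24 - diff)  # wrap around midnight
        weights.append(math.exp(-dist ** 2 / (2 * sigma ** 2)))
    total = sum(weights)
    dim = len(hour_features[0])
    return [
        sum(w * hour_features[h][d] for h, w in enumerate(weights)) / total
        for d in range(dim)
    ]


# One-hot hour features make the smoothing weights directly visible.
one_hot = [[1.0 if i == h else 0.0 for i in range(24)] for h in range(24)]
emb = interest_clock_embedding(one_hot, current_hour=8)
print([round(x, 3) for x in emb[6:11]])
```

With one-hot inputs the output is simply the normalized weight profile, peaking at the current hour and decaying symmetrically on either side.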

[194]  arXiv:2404.19358 [pdf, other]
Title: QML-IB: Quantized Collaborative Intelligence between Multiple Devices and the Mobile Network
Subjects: Information Theory (cs.IT)

The integration of artificial intelligence (AI) and mobile networks is regarded as one of the most important scenarios for 6G. In 6G, a major objective is to realize the efficient transmission of task-relevant data. A key problem then arises: how to design collaborative AI models for the device side and the network side so that the data transmitted between the device and the network is efficient enough, meaning that the transmission overhead is low while the AI task result is accurate. In this paper, we propose the multi-link information bottleneck (ML-IB) scheme for such collaborative model design. We formulate our problem based on a novel performance metric, which can evaluate both task accuracy and transmission overhead. Then we introduce a quantizer that is adjustable in the quantization bit depth, amplitudes, and breakpoints. Given the infeasibility of calculating our proposed metric on high-dimensional data, we establish a variational upper bound for this metric. However, due to the incorporation of quantization, the closed form of the variational upper bound remains uncomputable. Hence, we employ the Log-Sum Inequality to derive an approximation and provide a theoretical guarantee. Based on this, we devise the quantized multi-link information bottleneck (QML-IB) algorithm for collaborative AI model generation. Finally, numerical experiments demonstrate the superior performance of our QML-IB algorithm compared to the state-of-the-art algorithm.

[195]  arXiv:2404.19359 [pdf, other]
Title: Evaluating Lexicon Incorporation for Depression Symptom Estimation
Comments: Accepted to Clinical NLP workshop at NAACL 2024
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

This paper explores the impact of incorporating sentiment, emotion, and domain-specific lexicons into a transformer-based model for depression symptom estimation. Lexicon information is added by marking the words in the input transcripts of patient-therapist conversations as well as in social media posts. Overall results show that the introduction of external knowledge within pre-trained language models can be beneficial for prediction performance, while different lexicons show distinct behaviours depending on the targeted task. Additionally, new state-of-the-art results are obtained for the estimation of depression level over patient-therapist interviews.

[196]  arXiv:2404.19360 [pdf, other]
Title: Large Language Model Informed Patent Image Retrieval
Comments: 8 pages. Under review
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Information Retrieval (cs.IR)

In patent prosecution, image-based retrieval systems for identifying similarities between current patent images and prior art are pivotal to ensure the novelty and non-obviousness of patent applications. Despite their growing popularity in recent years, existing attempts, while effective at recognizing images within the same patent, fail to deliver practical value due to their limited generalizability in retrieving relevant prior art. Moreover, this task inherently involves the challenges posed by the abstract visual features of patent images, the skewed distribution of image classifications, and the semantic information of image descriptions. Therefore, we propose a language-informed, distribution-aware multimodal approach to patent image feature learning, which enriches the semantic understanding of patent images by integrating Large Language Models and improves the performance of underrepresented classes with our proposed distribution-aware contrastive losses. Extensive experiments on the DeepPatent2 dataset show that our proposed method achieves state-of-the-art or comparable performance in image-based patent retrieval with mAP +53.3%, Recall@10 +41.8%, and MRR@10 +51.9%. Furthermore, through an in-depth user analysis, we explore our model in aiding patent professionals in their image retrieval efforts, highlighting the model's real-world applicability and effectiveness.

[197]  arXiv:2404.19361 [pdf, ps, other]
Title: A Negotiator's Backup Plan: Optimal Concessions with a Reservation Value
Comments: Accepted at AAMAS 2024
Subjects: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI)

Automated negotiation is a well-known mechanism for autonomous agents to reach agreements. To realize beneficial agreements quickly, it is key to employ a good bidding strategy. When a negotiating agent has a good back-up plan, i.e., a high reservation value, failing to reach an agreement is not necessarily disadvantageous. Thus, the agent can adopt a risk-seeking strategy, aiming for outcomes with higher utilities.
Accordingly, this paper develops an optimal bidding strategy called MIA-RVelous for bilateral negotiations with private reservation values. The proposed greedy algorithm finds the optimal bid sequence given the agent's beliefs about the opponent in $O(n^2D)$ time, with $D$ the maximum number of rounds and $n$ the number of outcomes. The results obtained here can pave the way to realizing effective concurrent negotiations, given that concurrent negotiations can serve as a (probabilistic) backup plan.

[198]  arXiv:2404.19363 [pdf, other]
Title: Expressivity and Speech Synthesis
Comments: Invited contribution. Under review
Subjects: Computation and Language (cs.CL)

Imbuing machines with the ability to talk has been a longtime pursuit of artificial intelligence (AI) research. From the very beginning, the community has not only aimed to synthesise high-fidelity speech that accurately conveys the semantic meaning of an utterance, but also to colour it with inflections that cover the same range of affective expressions that humans are capable of. After many years of research, it appears that we are on the cusp of achieving this when it comes to single, isolated utterances. This unveils an abundance of potential avenues to explore when it comes to combining these single utterances with the aim of synthesising more complex, longer-term behaviours. In the present chapter, we outline the methodological advances that brought us so far and sketch out the ongoing efforts to reach that coveted next level of artificial expressivity. We also discuss the societal implications coupled with rapidly advancing expressive speech synthesis (ESS) technology and highlight ways to mitigate those risks and ensure the alignment of ESS capabilities with ethical norms.

[199]  arXiv:2404.19364 [pdf, other]
Title: Navigating Brain Language Representations: A Comparative Analysis of Neural Language Models and Psychologically Plausible Models
Subjects: Computation and Language (cs.CL)

Neural language models, particularly large-scale ones, have been consistently proven to be most effective in predicting brain neural activity across a range of studies. However, previous research overlooked the comparison of these models with psychologically plausible ones. Moreover, evaluations were reliant on limited, single-modality, and English cognitive datasets. To address these questions, we conducted an analysis comparing encoding performance of various neural language models and psychologically plausible models. Our study utilized extensive multi-modal cognitive datasets, examining bilingual word and discourse levels. Surprisingly, our findings revealed that psychologically plausible models outperformed neural language models across diverse contexts, encompassing different modalities such as fMRI and eye-tracking, and spanning languages from English to Chinese. Among psychologically plausible models, the one incorporating embodied information emerged as particularly exceptional. This model demonstrated superior performance at both word and discourse levels, exhibiting robust prediction of brain activation across numerous regions in both English and Chinese.

[200]  arXiv:2404.19368 [pdf, other]
Title: Exploring Multi-Lingual Bias of Large Code Models in Code Generation
Comments: 12 pages
Subjects: Software Engineering (cs.SE)

Code generation aims to synthesize code and fulfill functional requirements based on natural language (NL) specifications, which can greatly improve development efficiency. In the era of large language models (LLMs), large code models (LCMs) have been recently proposed to generate source code. LCMs can generate highly feasible solutions for programming problems described in natural language. Despite this effectiveness, we observe a noticeable multi-lingual bias in the generation performance of LCMs. Specifically, LCMs demonstrate proficiency in generating solutions when provided with instructions in English, yet may falter when faced with semantically equivalent instructions in other NLs such as Chinese. Moreover, the ability of LCMs to generate code varies across different programming languages (PLs), such as Python and C++. The observed phenomenon indicates the presence of multi-lingual bias within the generative capabilities of LCMs, which has remained unexplored.
In this paper, we investigate the multi-lingual bias that exists in current LCMs. First, we construct the first multi-lingual evaluation benchmark, X-HumanEval-X, enabling us to systematically evaluate the extent of multi-lingual bias in current LCMs. In our large-scale experiments on nine popular LCMs, we observe a pronounced multi-lingual bias of LCMs in code generation, including multi-NL and multi-PL bias. Specifically, when using Chinese instructions, the code generation capabilities of LCMs decrease by at least 13% in terms of the Pass@1 metric. Furthermore, LCMs perform unevenly across different programming languages, e.g., the performance gap between Python and C++ reaches as high as 20.9%. ...
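For readers unfamiliar with the Pass@1 metric mentioned above, the following is a minimal sketch of the commonly used unbiased pass@k estimator, assuming n sampled solutions per problem of which c pass all unit tests. The function name and the sample counts in the usage note are illustrative assumptions and are not taken from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n samples of which c are correct.

    Computes 1 - C(n - c, k) / C(n, k): the probability that at least
    one of k samples drawn without replacement is correct.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: every draw of k contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

With k = 1 the estimator reduces to the fraction of correct samples, so a hypothetical problem with 3 passing samples out of 10 gives pass_at_k(10, 3, 1) of about 0.3.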

[201]  arXiv:2404.19369 [pdf, ps, other]
Title: Evaluating Telugu Proficiency in Large Language Models: A Comparative Analysis of ChatGPT and Gemini
Subjects: Computation and Language (cs.CL)

The growing prominence of large language models (LLMs) necessitates the exploration of their capabilities beyond English. This research investigates the Telugu language proficiency of ChatGPT and Gemini, two leading LLMs. Through a designed set of 20 questions encompassing greetings, grammar, vocabulary, common phrases, task completion, and situational reasoning, the study delves into their strengths and weaknesses in handling Telugu. The analysis aims to identify the LLM that demonstrates a deeper understanding of Telugu grammatical structures, possesses a broader vocabulary, and exhibits superior performance in tasks like writing and reasoning. By comparing their ability to comprehend and use everyday Telugu expressions, the research sheds light on their suitability for real-world language interaction. Furthermore, the evaluation of adaptability and reasoning capabilities provides insights into how each LLM leverages Telugu to respond to dynamic situations. This comparative analysis contributes to the ongoing discussion on multilingual capabilities in AI and paves the way for future research in developing LLMs that can seamlessly integrate with Telugu-speaking communities.

[202]  arXiv:2404.19370 [pdf, other]
Title: Numeric Reward Machines
Comments: ICAPS 2024; Workshop on Bridging the Gap Between AI Planning and Reinforcement Learning
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Reward machines inform reinforcement learning agents about the reward structure of the environment and often drastically speed up the learning process. However, reward machines only accept Boolean features such as robot-reached-gold. Consequently, many inherently numeric tasks cannot profit from the guidance offered by reward machines. To address this gap, we aim to extend reward machines with numeric features such as distance-to-gold. For this, we present two types of reward machines: numeric-Boolean and numeric. In a numeric-Boolean reward machine, distance-to-gold is emulated by two Boolean features distance-to-gold-decreased and robot-reached-gold. In a numeric reward machine, distance-to-gold is used directly alongside the Boolean feature robot-reached-gold. We compare our new approaches to a baseline reward machine in the Craft domain, where the numeric feature is the agent-to-target distance. We use cross-product Q-learning, Q-learning with counter-factual experiences, and the options framework for learning. Our experimental results show that our new approaches significantly outperform the baseline approach. Extending reward machines with numeric features opens up new possibilities of using reward machines in inherently numeric tasks.
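The distinction between the two proposed machine types can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the two-state layout, the class names, and the reward values (1.0 for reaching the gold, 0.1 for getting closer, negative distance as a shaping signal) are placeholders and are not the paper's definitions.

```python
from dataclasses import dataclass


@dataclass
class NumericBooleanRM:
    """Toy reward machine that sees only Boolean features.

    distance-to-gold is emulated by the Boolean feature
    distance-to-gold-decreased, alongside robot-reached-gold.
    """
    state: str = "u0"  # u0: still searching, u1: gold reached (terminal)

    def step(self, distance_decreased: bool, reached_gold: bool) -> float:
        if self.state == "u1":
            return 0.0
        if reached_gold:
            self.state = "u1"
            return 1.0  # placeholder goal reward
        return 0.1 if distance_decreased else 0.0  # placeholder progress reward


@dataclass
class NumericRM:
    """Toy reward machine that reads the numeric feature directly."""
    state: str = "u0"

    def step(self, distance: float, reached_gold: bool) -> float:
        if self.state == "u1":
            return 0.0
        if reached_gold:
            self.state = "u1"
            return 1.0  # placeholder goal reward
        return -distance  # placeholder: farther from the gold means lower reward
```

In this sketch the numeric-Boolean machine only learns whether the agent moved closer, while the numeric machine can grade how far away the agent still is.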

[203]  arXiv:2404.19371 [pdf, ps, other]
Title: Fairness in AI: challenges in bridging the gap between algorithms and law
Comments: Preprint. Accepted in Fairness in AI Workshop @ ICDE 2024
Subjects: Computers and Society (cs.CY)

In this paper, we examine algorithmic fairness from the perspective of law, aiming to identify best practices and strategies for the specification and adoption of fairness definitions and algorithms in real-world systems and use cases. We start by providing a brief introduction to current anti-discrimination law in the European Union and the United States and discussing the concepts of bias and fairness from a legal and ethical viewpoint. We then present a set of algorithmic fairness definitions by example, aiming to communicate their objectives to non-technical audiences. Next, we introduce a set of core criteria that need to be taken into account when selecting a specific fairness definition for real-world use cases. Finally, we enumerate a set of key considerations and best practices for the design and employment of fairness methods in real-world AI applications.

[204]  arXiv:2404.19372 [pdf, other]
Title: AutoNet: Automatic Reachability Policy Management in Public Cloud Networks
Subjects: Networking and Internet Architecture (cs.NI)

Virtual Private Cloud (VPC) is the main network abstraction technology used in public cloud systems. VPCs are composed of a set of network services that permit the definition of complex network reachability properties among internal and external cloud entities such as tenants' VMs or some generic internet nodes. Although hiding the underlying complexity through a comprehensible abstraction layer, manually enforcing particular reachability intents in VPC networks is still notably error-prone and complex. In this paper, we propose AutoNet, a new model for assisting cloud tenants in managing reachability-based policies in VPC networks. AutoNet is capable of safely generating incremental VPC configurations while satisfying some metric-based high-level intent defined by the tenants. To achieve this goal, we leverage a MaxSAT-based encoding of the network configuration combined with several optimizations to scale to topologies with thousands of nodes. Our results show that the developed system is capable of achieving a sub-second response time for production VPC deployments while still providing fine-grained control over the generated configurations.

[205]  arXiv:2404.19379 [pdf, other]
Title: SemanticFormer: Holistic and Semantic Traffic Scene Representation for Trajectory Prediction using Knowledge Graphs
Comments: 8 pages, 6 figures, submitted to RA-L
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

Trajectory prediction in autonomous driving relies on accurate representation of all relevant contexts of the driving scene including traffic participants, road topology, traffic signs as well as their semantic relations to each other. Despite increased attention to this issue, most approaches in trajectory prediction do not consider all of these factors sufficiently. This paper describes a method SemanticFormer to predict multimodal trajectories by reasoning over a semantic traffic scene graph using a hybrid approach. We extract high-level information in the form of semantic meta-paths from a knowledge graph which is then processed by a novel pipeline based on multiple attention mechanisms to predict accurate trajectories. The proposed architecture comprises a hierarchical heterogeneous graph encoder, which can capture spatio-temporal and relational information across agents and between agents and road elements, and a predictor that fuses the different encodings and decodes trajectories with probabilities. Finally, a refinement module evaluates permitted meta-paths of trajectories and speed profiles to obtain final predicted trajectories. Evaluation of the nuScenes benchmark demonstrates improved performance compared to the state-of-the-art methods.

[206]  arXiv:2404.19381 [pdf, other]
Title: Low-overhead General-purpose Near-Data Processing in CXL Memory Expanders
Subjects: Hardware Architecture (cs.AR)

To overcome the memory capacity wall of large-scale AI and big data applications, Compute Express Link (CXL) enables cost-efficient memory expansion beyond the local DRAM of processors. While its CXL.mem protocol stack minimizes interconnect latency, CXL memory accesses can still result in significant slowdowns for memory-bound applications. While near-data processing (NDP) in CXL memory can overcome such limitations, prior works propose application-specific HW units that are not suitable for practical CXL memory-based systems that should support various applications. On the other hand, existing CPU or GPU cores are not cost-effective for NDP because they are not optimized for memory-bound applications. In addition, the communication between the host processor and CXL controller for NDP offloading should achieve low latency, but the CXL.io (or PCIe) protocol incurs $\mu$s-scale latency and is not suitable for fine-grain NDP.
To achieve high-performance NDP end-to-end, we propose a low-overhead general-purpose NDP architecture for CXL memory referred to as Memory-Mapped NDP (M$^2$NDP), which comprises memory-mapped functions (M$^2$func) and memory-mapped $\mu$threading (M$^2\mu$thr). The M$^2$func is a CXL.mem-compatible low-overhead communication mechanism between the host processor and NDP controller in the CXL memory. The M$^2\mu$thr enables low-cost, general-purpose NDP unit design by introducing lightweight $\mu$threads that support highly concurrent execution of NDP kernels with minimal resource wastage. By combining them, our M$^2$NDP achieves significant speedups for various applications, including in-memory OLAP, key-value store, large language model, recommendation model, and graph analytics by up to 128$\times$ (11.5$\times$ overall) and reduces energy by up to 87.9\% (80.1\% overall) compared to a baseline CPU or GPU host with passive CXL memory.

[207]  arXiv:2404.19382 [pdf, other]
Title: Probing Unlearned Diffusion Models: A Transferable Adversarial Attack Perspective
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Advanced text-to-image diffusion models raise safety concerns regarding identity privacy violation, copyright infringement, and Not Safe For Work content generation. To address this, unlearning methods have been developed to erase the concepts involved from diffusion models. However, these unlearning methods only shift the text-to-image mapping and preserve the visual content within the generative space of diffusion models, leaving a fatal flaw that allows the erased concepts to be restored. This erasure trustworthiness problem needs probing, but previous methods are sub-optimal from two perspectives: (1) Lack of transferability: some methods operate within a white-box setting, requiring access to the unlearned model, and the learned adversarial input often fails to transfer to other unlearned models for concept restoration; (2) Limited attack: prompt-level methods struggle to restore narrow concepts, such as celebrity identity, from unlearned models. Therefore, this paper aims to leverage the transferability of the adversarial attack to probe the unlearning robustness under a black-box setting. This challenging scenario assumes that the unlearning method is unknown and the unlearned model is inaccessible for optimization, requiring the attack to be capable of transferring across different unlearned models. Specifically, we employ an adversarial search strategy to search for an adversarial embedding that can transfer across different unlearned models. This strategy adopts the original Stable Diffusion model as a surrogate model to iteratively erase and search for embeddings, enabling it to find the embedding that can restore the target concept for different unlearning methods. Extensive experiments demonstrate the transferability of the searched adversarial embedding across several state-of-the-art unlearning methods and its effectiveness for different levels of concepts.

[208]  arXiv:2404.19383 [pdf, other]
Title: Cross-Block Fine-Grained Semantic Cascade for Skeleton-Based Sports Action Recognition
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Human action video recognition has recently attracted more attention in applications such as video security and sports posture correction. Popular solutions, including graph convolutional networks (GCNs) that model the human skeleton as a spatiotemporal graph, have proven very effective. GCN-based methods with stacked blocks usually utilize top-layer semantics for classification/annotation purposes. Although the global features learned through the procedure are suitable for general classification, they have difficulty capturing fine-grained action changes across adjacent frames -- decisive factors in sports actions. In this paper, we propose a novel ``Cross-block Fine-grained Semantic Cascade (CFSC)'' module to overcome this challenge. In summary, the proposed CFSC progressively integrates shallow visual knowledge into high-level blocks to allow networks to focus on action details. In particular, the CFSC module utilizes the GCN feature maps produced at different levels, as well as aggregated features from preceding levels, to consolidate fine-grained features. In addition, a dedicated temporal convolution is applied at each level to learn short-term temporal features, which are carried over from shallow to deep layers to maximize the leverage of low-level details. This cross-block feature aggregation methodology, capable of mitigating the loss of fine-grained information, has resulted in improved performance. Lastly, FD-7, a new action recognition dataset for fencing sports, was collected and will be made publicly available. Experimental results and empirical analysis on a public benchmark (FSD-10) and the self-collected dataset (FD-7) demonstrate the advantage of our CFSC module over other methods in learning discriminative patterns for action classification.

[209]  arXiv:2404.19384 [pdf, other]
Title: Pseudo Label Refinery for Unsupervised Domain Adaptation on Cross-dataset 3D Object Detection
Comments: Accepted by CVPR2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Recent self-training techniques have shown notable improvements in unsupervised domain adaptation for 3D object detection (3D UDA). These techniques typically select pseudo labels, i.e., 3D boxes, to supervise models for the target domain. However, this selection process inevitably introduces unreliable 3D boxes, in which 3D points cannot be definitively assigned as foreground or background. Previous techniques mitigate this by reweighting these boxes as pseudo labels, but these boxes can still poison the training process. To resolve this problem, in this paper, we propose a novel pseudo label refinery framework. Specifically, in the selection process, to improve the reliability of pseudo boxes, we propose a complementary augmentation strategy. This strategy involves either removing all points within an unreliable box or replacing it with a high-confidence box. Moreover, the point numbers of instances in high-beam datasets are considerably higher than those in low-beam datasets, also degrading the quality of pseudo labels during the training process. We alleviate this issue by generating additional proposals and aligning RoI features across different domains. Experimental results demonstrate that our method effectively enhances the quality of pseudo labels and consistently surpasses the state-of-the-art methods on six autonomous driving benchmarks. Code will be available at https://github.com/Zhanwei-Z/PERE.

[210]  arXiv:2404.19387 [pdf, other]
Title: Online Electricity Purchase for Data Center with Dynamic Virtual Battery from Flexibility Aggregation
Subjects: Systems and Control (eess.SY)

As a critical component of modern infrastructure, data centers account for a huge amount of power consumption and greenhouse gas emission. This paper studies the electricity purchase strategy for a data center to lower its energy cost while integrating local renewable generation under uncertainty. To facilitate efficient and scalable decision-making, we propose a two-layer hierarchy where the lower layer consists of the operation of all electrical equipment in the data center and the upper layer determines the procurement and dispatch of electricity. At the lower layer, instead of device-level scheduling in real time, we propose to exploit the inherent flexibility in demand, such as thermostatically controlled loads and flexible computing tasks, and aggregate them into virtual batteries. By this means, the upper-layer decision only needs to take into account these virtual batteries, the size of which is generally small and independent of the data center scale. We further propose an online algorithm based on Lyapunov optimization to purchase electricity from the grid with a manageable energy cost, even though the prices, renewable availability, and battery specifications are uncertain and dynamic. In particular, we show that, under mild conditions, our algorithm can achieve bounded loss compared with the offline optimal cost, while strictly respecting battery operational constraints. Extensive simulation studies validate the theoretical analysis and illustrate the tradeoff between optimality and conservativeness.

[211]  arXiv:2404.19391 [pdf, other]
Title: ZSMILES: an approach for efficient SMILES storage for random access in Virtual Screening
Subjects: Computational Engineering, Finance, and Science (cs.CE)

Virtual screening is a technique used in drug discovery to select the most promising molecules to test in a lab. To perform virtual screening, we need a large set of molecules as input, and storing these molecules can become an issue. In fact, extreme-scale high-throughput virtual screening applications require a big dataset of input molecules and produce an even bigger dataset as output. These molecule databases occupy tens of TB of storage space, and domain experts frequently sample a small portion of this data. In this context, SMILES is a popular data format for storing large sets of molecules since it requires significantly less space to represent molecules than other formats (e.g., MOL2, SDF). This paper proposes an efficient dictionary-based approach to compress SMILES-based datasets. This approach takes advantage of domain knowledge to provide a readable output with separable SMILES, enabling random access. We examine the benefits of storing these datasets using ZSMILES to reduce the cold storage footprint in HPC systems. The main contributions concern a custom dictionary-based approach and a data pre-processing step. The experimental results show that ZSMILES leverages domain knowledge to compress 1.13x more than the state of the art in similar scenarios, reaching a compression ratio of up to $0.29$. We tested a CUDA version of ZSMILES targeting NVIDIA GPUs, showing a potential speedup of 7x.

[212]  arXiv:2404.19394 [pdf, other]
Title: CLIP-Mamba: CLIP Pretrained Mamba Models with OOD and Hessian Evaluation
Subjects: Computer Vision and Pattern Recognition (cs.CV)

State space models and Mamba-based models have been increasingly applied across various domains, achieving state-of-the-art performance. This technical report introduces the first attempt to train a transferable Mamba model utilizing contrastive language-image pretraining (CLIP). We have trained Mamba models of varying sizes and undertaken comprehensive evaluations of these models on 26 zero-shot classification datasets and 16 out-of-distribution (OOD) datasets. Our findings reveal that a Mamba model with 67 million parameters is on par with a 307 million-parameter Vision Transformer (ViT) model in zero-shot classification tasks, highlighting the parameter efficiency of Mamba models. In tests of OOD generalization, Mamba-based models exhibit exceptional performance in conditions of OOD image contrast or when subjected to high-pass filtering. However, a Hessian analysis indicates that Mamba models feature a sharper and more non-convex landscape compared to ViT-based models, making them more challenging to train. The source code is available at https://github.com/raytrun/mamba-clip.

[213]  arXiv:2404.19397 [pdf, other]
Title: Can humans teach machines to code?
Subjects: Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)

The goal of inductive program synthesis is for a machine to automatically generate a program from user-supplied examples of the desired behaviour of the program. A key underlying assumption is that humans can provide examples of sufficient quality to teach a concept to a machine. However, as far as we are aware, this assumption lacks both empirical and theoretical support. To address this limitation, we explore the question `Can humans teach machines to code?'. To answer this question, we conduct a study where we ask humans to generate examples for six programming tasks, such as finding the maximum element of a list. We compare the performance of a program synthesis system trained on (i) human-provided examples, (ii) randomly sampled examples, and (iii) expert-provided examples. Our results show that, on most of the tasks, non-expert participants did not provide sufficient examples for a program synthesis system to learn an accurate program. Our results also show that non-experts need to provide more examples than both randomly sampled and expert-provided examples.

[214]  arXiv:2404.19398 [pdf, other]
Title: 3D Gaussian Blendshapes for Head Avatar Animation
Comments: ACM SIGGRAPH Conference Proceedings 2024
Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)

We introduce 3D Gaussian blendshapes for modeling photorealistic head avatars. Taking a monocular video as input, we learn a base head model of neutral expression, along with a group of expression blendshapes, each of which corresponds to a basis expression in classical parametric face models. Both the neutral model and expression blendshapes are represented as 3D Gaussians, which contain a few properties to depict the avatar appearance. The avatar model of an arbitrary expression can be effectively generated by combining the neutral model and expression blendshapes through linear blending of Gaussians with the expression coefficients. High-fidelity head avatar animations can be synthesized in real time using Gaussian splatting. Compared to state-of-the-art methods, our Gaussian blendshape representation better captures high-frequency details exhibited in input video, and achieves superior rendering performance.
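The linear blending step described above can be sketched as follows. This is a minimal illustration, assuming the delta form used in classical blendshape models (neutral attributes plus coefficient-weighted offsets) and a flat list of per-Gaussian attribute values; the function name and data layout are hypothetical, and the paper's treatment of individual Gaussian properties may differ.

```python
from typing import List, Sequence


def blend_gaussian_attributes(
    neutral: Sequence[float],
    blendshapes: Sequence[Sequence[float]],
    coefficients: Sequence[float],
) -> List[float]:
    """Linearly blend attributes: neutral + sum_k w_k * (blendshape_k - neutral)."""
    if len(blendshapes) != len(coefficients):
        raise ValueError("one coefficient is required per blendshape")
    blended = list(neutral)
    for shape, weight in zip(blendshapes, coefficients):
        if len(shape) != len(neutral):
            raise ValueError("blendshape and neutral model sizes differ")
        for i, (value, base) in enumerate(zip(shape, neutral)):
            # Add the weighted offset of this blendshape from the neutral model.
            blended[i] += weight * (value - base)
    return blended
```

With all coefficients at zero the result is the neutral model, and with a single coefficient at one it is exactly the corresponding blendshape.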

[215]  arXiv:2404.19401 [pdf, other]
Title: UniFS: Universal Few-shot Instance Perception with Point Representations
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Instance perception tasks (object detection, instance segmentation, pose estimation, counting) play a key role in industrial applications of visual models. As supervised learning methods suffer from high labeling cost, few-shot learning methods which effectively learn from a limited number of labeled examples are desired. Existing few-shot learning methods primarily focus on a restricted set of tasks, presumably due to the challenges involved in designing a generic model capable of representing diverse tasks in a unified manner. In this paper, we propose UniFS, a universal few-shot instance perception model that unifies a wide range of instance perception tasks by reformulating them into a dynamic point representation learning framework. Additionally, we propose Structure-Aware Point Learning (SAPL) to exploit the higher-order structural relationship among points to further enhance representation learning. Our approach makes minimal assumptions about the tasks, yet it achieves competitive results compared to highly specialized and well optimized specialist models. Codes will be released soon.

[216]  arXiv:2404.19402 [pdf, ps, other]
Title: Complexity of Round-Robin Allocation with Potentially Noisy Queries
Subjects: Computer Science and Game Theory (cs.GT); Computational Complexity (cs.CC); Information Theory (cs.IT)

We study the complexity of a fundamental algorithm for fairly allocating indivisible items, the round-robin algorithm. For $n$ agents and $m$ items, we show that the algorithm can be implemented in time $O(nm\log(m/n))$ in the worst case. If the agents' preferences are uniformly random, we establish an improved (expected) running time of $O(nm + m\log m)$. On the other hand, assuming comparison queries between items, we prove that $\Omega(nm + m\log m)$ queries are necessary to implement the algorithm, even when randomization is allowed. We also derive bounds in noise models where the answers to queries are incorrect with some probability. Our proofs involve novel applications of tools from multi-armed bandits and information theory, as well as posets and linear extensions.
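For context, the round-robin algorithm itself lets agents take turns picking their favorite remaining item. The sketch below is a straightforward baseline that pre-sorts each agent's preferences, costing $O(nm\log m)$ time; it is not the faster implementation analyzed in the paper. The function name, the value-matrix input format, and the tie-breaking rule (lowest item index first) are assumptions made for illustration.

```python
from typing import List, Sequence


def round_robin(values: Sequence[Sequence[float]]) -> List[List[int]]:
    """Allocate m items to n agents by round-robin picking.

    values[i][j] is agent i's value for item j. Agents pick in the
    order 0, 1, ..., n-1, 0, 1, ... until no items remain.
    """
    if not values:
        return []
    n, m = len(values), len(values[0])
    # Each agent's items from most to least valued (stable sort breaks ties by index).
    order = [sorted(range(m), key=lambda j: -values[i][j]) for i in range(n)]
    pointer = [0] * n          # next candidate position in each agent's order
    taken = [False] * m
    bundles: List[List[int]] = [[] for _ in range(n)]
    for turn in range(m):
        i = turn % n
        # Skip items that other agents have already taken.
        while taken[order[i][pointer[i]]]:
            pointer[i] += 1
        item = order[i][pointer[i]]
        taken[item] = True
        bundles[i].append(item)
    return bundles
```

For example, with values [[5, 3, 1, 4], [5, 4, 3, 1]], agent 0 takes item 0, agent 1 takes item 1, agent 0 takes item 3, and agent 1 takes item 2.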

[217]  arXiv:2404.19403 [pdf, other]
Title: Transformer-Enhanced Motion Planner: Attention-Guided Sampling for State-Specific Decision Making
Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

Sampling-based motion planning (SBMP) algorithms are renowned for their robust global search capabilities. However, the inherent randomness in their sampling mechanisms often results in inconsistent path quality and limited search efficiency. In response to these challenges, this work proposes a novel deep learning-based motion planning framework, named Transformer-Enhanced Motion Planner (TEMP), which synergizes an Environmental Information Semantic Encoder (EISE) with a Motion Planning Transformer (MPT). EISE converts environmental data into semantic environmental information (SEI), providing MPT with an enriched environmental comprehension. MPT leverages an attention mechanism to dynamically recalibrate its focus on SEI, task objectives, and historical planning data, refining the sampling node generation. To demonstrate the capabilities of TEMP, we train our model using a dataset comprised of planning results produced by RRT*. EISE and MPT are collaboratively trained, enabling EISE to autonomously learn and extract patterns from environmental data, thereby forming semantic representations that MPT can more effectively interpret and utilize for motion planning. Subsequently, we conduct a systematic evaluation of TEMP's efficacy across diverse task dimensions, which demonstrates that TEMP achieves exceptional performance metrics and a heightened degree of generalizability compared to state-of-the-art SBMPs.

[218]  arXiv:2404.19409 [pdf, other]
Title: Countering Reward Over-optimization in LLM with Demonstration-Guided Reinforcement Learning
Subjects: Computation and Language (cs.CL)

While Reinforcement Learning (RL) has been proven essential for tuning large language models (LLMs), it can lead to reward over-optimization (ROO). Existing approaches address ROO by adding KL regularization, requiring computationally expensive hyperparameter tuning. Additionally, KL regularization focuses solely on regularizing the language policy, neglecting a potential source of regularization: the reward function itself. Inspired by demonstration-guided RL, we here introduce the Reward Calibration from Demonstration (RCfD), which leverages human demonstrations and a reward model to recalibrate the reward objective. Formally, given a prompt, the RCfD objective minimizes the distance between the demonstrations' and LLM's rewards rather than directly maximizing the reward function. This objective shift avoids incentivizing the LLM to exploit the reward model and promotes more natural and diverse language generation. We show the effectiveness of RCfD on three language tasks, which achieves comparable performance to carefully tuned baselines while mitigating ROO.
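The objective shift described above can be sketched as follows: instead of maximizing the reward model's score, the policy is rewarded for matching the score of a human demonstration. This is a minimal illustration only; the absolute difference as the distance, the function name, and the toy length-based reward model are assumptions and not the paper's exact formulation.

```python
from typing import Callable

RewardModel = Callable[[str, str], float]  # (prompt, text) -> scalar reward


def rcfd_objective(
    reward_model: RewardModel,
    prompt: str,
    generation: str,
    demonstration: str,
) -> float:
    """Return a calibrated reward: highest (zero) when the generation's
    reward equals the demonstration's reward for the same prompt."""
    generated_reward = reward_model(prompt, generation)
    demonstration_reward = reward_model(prompt, demonstration)
    # Assumed distance: absolute difference between the two rewards.
    return -abs(generated_reward - demonstration_reward)


def toy_reward_model(prompt: str, text: str) -> float:
    """Hypothetical stand-in reward model that simply favors longer text."""
    return float(len(text))
```

Under the toy reward model, a generation that inflates its score far beyond the demonstration's is penalized rather than rewarded, which is the intuition behind avoiding reward over-optimization.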

[219]  arXiv:2404.19412 [pdf, ps, other]
Title: Enhancing Robotic Adaptability: Integrating Unsupervised Trajectory Segmentation and Conditional ProMPs for Dynamic Learning Environments
Authors: Tianci Gao
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

We propose a novel framework for enhancing robotic adaptability and learning efficiency, which integrates unsupervised trajectory segmentation with adaptive probabilistic movement primitives (ProMPs). By employing a cutting-edge deep learning architecture that combines autoencoders and Recurrent Neural Networks (RNNs), our approach autonomously pinpoints critical transitional points in continuous, unlabeled motion data, thus significantly reducing dependence on extensively labeled datasets. This innovative method dynamically adjusts motion trajectories using conditional variables, significantly enhancing the flexibility and accuracy of robotic actions under dynamic conditions while also reducing the computational overhead associated with traditional robotic programming methods. Our experimental validation demonstrates superior learning efficiency and adaptability compared to existing techniques, paving the way for advanced applications in industrial and service robotics.

[220]  arXiv:2404.19415 [pdf, other]
Title: Two-Stage Robust Planning Model for Park-Level Integrated Energy System Considering Uncertain Equipment Contingency
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)

In this paper, we propose a two-stage robust planning model for an Integrated Energy System (IES) that serves an industrial park. The term 'Park-level IES' is used to refer to an IES of smaller scale that nonetheless has high demands for various forms of energy. The proposed planning model considers uncertainties like load demand fluctuations and equipment contingencies, and provides a reliable scheme of equipment selection and sizing for IES investors. Inspired by the unit commitment problem, we formulate an equipment contingency uncertainty set to accurately describe the potential equipment contingencies that occur and can be repaired within a day. Then, a novel and modified nested column-and-constraint generation algorithm is applied to solve this two-stage robust planning model with integer recourse efficiently. In the case study, the role of the energy storage system in enhancing IES reliability is analyzed in detail. Computational results demonstrate the advantage of the proposed models over the deterministic planning model in terms of improving reliability.

[221]  arXiv:2404.19417 [pdf, other]
Title: Physical Backdoor: Towards Temperature-based Backdoor Attacks in the Physical World
Comments: To appear in CVPR 2024. 11 pages, 8 figures and 4 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Backdoor attacks have been well studied in visible light object detection (VLOD) in recent years. However, VLOD cannot work effectively in dark and temperature-sensitive scenarios. Instead, thermal infrared object detection (TIOD) is the most accessible and practical in such environments. In this paper, our team is the first to investigate the security vulnerabilities associated with TIOD in the context of backdoor attacks, spanning both the digital and physical realms. We introduce two novel types of backdoor attacks on TIOD, each offering unique capabilities: Object-affecting Attack and Range-affecting Attack. We conduct a comprehensive analysis of key factors influencing trigger design, which include temperature, size, material, and concealment. These factors, especially temperature, significantly impact the efficacy of backdoor attacks on TIOD. A thorough understanding of these factors will serve as a foundation for designing physical triggers and temperature-controlling experiments. Our study includes extensive experiments conducted in both digital and physical environments. In the digital realm, we evaluate our approach using benchmark datasets for TIOD, achieving an Attack Success Rate (ASR) of up to 98.21%. In the physical realm, we test our approach in two real-world settings: a traffic intersection and a parking lot, using a thermal infrared camera. Here, we attain an ASR of up to 98.38%.

[222]  arXiv:2404.19418 [pdf, other]
Title: Energy Cyber Attacks to Smart Healthcare Devices: A Testbed
Journal-ref: Bio-inspired Information and Communications Technologies, 2023
Subjects: Cryptography and Security (cs.CR)

The Internet of Things (IoT) has garnered significant interest in both research and industry due to its profound impact on human life. The rapid expansion of IoT technology has ushered in smart healthcare, smart devices, smart cities, and smart grids. However, the security of IoT devices, particularly in healthcare, has become a major concern, with recent attacks revealing serious vulnerabilities. In IoT networks, where connected devices are susceptible to resource-constraint attacks, such as energy consumption attacks, security is paramount.
This paper explores the impact of Distributed Denial of Service (DDoS) and Fake Access Point (F-AP) attacks on WiFi-enabled smart healthcare devices. Specifically, it investigates how these attacks can disrupt service on victim devices and Access Points (APs), focusing on device connectivity and energy consumption during attacks. Key findings include identifying the attack rates of DDoS attacks that disrupt services and quantifying the energy consumption impact of Energy Consumption Distributed Denial of Service (EC-DDoS) and F-AP attacks on smart healthcare devices.
The study highlights communication protocols, attack rates, payload sizes, and port states of victim devices as critical factors influencing energy consumption. These insights provide a comprehensive understanding of IoT device vulnerabilities in smart healthcare environments and lay the groundwork for future defense strategies.

[223]  arXiv:2404.19419 [pdf, other]
Title: Active Dendrites Enable Efficient Continual Learning in Time-To-First-Spike Neural Networks
Comments: This work was accepted and presented at AICAS 2024
Subjects: Neural and Evolutionary Computing (cs.NE)

While the human brain efficiently adapts to new tasks from a continuous stream of information, neural network models struggle to learn from sequential information without catastrophically forgetting previously learned tasks. This limitation presents a significant hurdle in deploying edge devices in real-world scenarios where information is presented in an inherently sequential manner. Active dendrites of pyramidal neurons play an important role in the brain's ability to learn new tasks incrementally. By exploiting key properties of time-to-first-spike encoding and leveraging its high sparsity, we present a novel spiking neural network model enhanced with active dendrites. Our model can efficiently mitigate catastrophic forgetting in temporally-encoded SNNs, which we demonstrate with an end-of-training accuracy across tasks of 88.3% on the test set using the Split MNIST dataset. Furthermore, we provide a novel digital hardware architecture that paves the way for real-world deployment in edge devices. Using a Xilinx Zynq-7020 SoC FPGA, we demonstrate a 100% match with our quantized software model, achieving an average inference time of 37.3 ms and an 80.0% accuracy.

[224]  arXiv:2404.19420 [pdf, other]
Title: Let's Focus: Focused Backdoor Attack against Federated Transfer Learning
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)

Federated Transfer Learning (FTL) is the most general variation of Federated Learning. According to this distributed paradigm, a feature learning pre-step is commonly carried out by only one party, typically the server, on publicly shared data. After that, the Federated Learning phase takes place to train a classifier collaboratively using the learned feature extractor. Each involved client contributes by locally training only the classification layers on a private training set. The peculiarity of an FTL scenario makes it hard to understand whether poisoning attacks can be developed to craft an effective backdoor. State-of-the-art attack strategies assume the possibility of shifting the model's attention toward relevant features introduced by a forged trigger injected in the input data by some untrusted clients. Of course, this is not feasible in FTL, as the learned features are fixed once the server performs the pre-training step. Consequently, in this paper, we investigate this intriguing Federated Learning scenario to identify and exploit a vulnerability obtained by combining eXplainable AI (XAI) and dataset distillation. In particular, the proposed attack can be carried out by one of the clients during the Federated Learning phase of FTL by identifying the optimal location for the trigger through XAI and encapsulating compressed information of the backdoor class. Due to its behavior, we refer to our approach as a focused backdoor approach (FB-FTL for short) and test its performance by explicitly referencing an image classification scenario. With an average 80% attack success rate, the obtained results show that our attack is effective even against existing defenses for Federated Learning.

[225]  arXiv:2404.19422 [pdf, other]
Title: Efficient Algorithms for Earliest and Fastest Paths in Public Transport Networks
Subjects: Data Structures and Algorithms (cs.DS)

Public transport administrators rely on efficient algorithms for various problems that arise in public transport networks. In particular, our study focuses on designing linear-time algorithms for two fundamental path problems: the earliest arrival time (EAT) and the fastest path duration (FPD) on public transportation data. We conduct a comparative analysis with state-of-the-art algorithms. The results are quite promising, indicating substantial efficiency improvements. Specifically, the fastest path problem shows a remarkable 34-fold speedup, while the earliest arrival time problem exhibits an even more impressive 183-fold speedup. These findings highlight the effectiveness of our algorithms in solving EAT and FPD problems in public transport, which can eventually help public administrators to enrich the urban transport experience.
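As a point of reference for the earliest arrival time problem, the sketch below uses a single scan over connections sorted by departure time. This is a standard connection-scan style approach and not necessarily the authors' algorithm; the `Connection` layout and the toy timetable are assumptions made for illustration, and every connection is assumed to have a strictly positive travel time.

```python
from typing import Dict, List, NamedTuple

class Connection(NamedTuple):
    # Hypothetical layout: a vehicle leaves `src` at `dep` and reaches `dst` at `arr`.
    src: str
    dst: str
    dep: int
    arr: int

def earliest_arrival(connections: List[Connection], source: str, start: int) -> Dict[str, int]:
    """Earliest arrival time at every reachable stop when leaving `source` at `start`."""
    best = {source: start}
    # After sorting, one linear scan suffices because arr > dep for every connection.
    for c in sorted(connections, key=lambda c: c.dep):
        # The connection is usable only if we are already at c.src when it departs.
        if c.src in best and best[c.src] <= c.dep:
            if c.arr < best.get(c.dst, float("inf")):
                best[c.dst] = c.arr
    return best

# Toy timetable (assumed data).
timetable = [
    Connection("A", "B", 5, 10),
    Connection("B", "C", 12, 20),
    Connection("A", "C", 6, 25),
    Connection("B", "C", 8, 9),   # departs before we can reach B, so it is skipped
]
arrivals = earliest_arrival(timetable, "A", 0)
```

If the connections are already stored in departure order, the sort can be dropped and the whole computation is a single linear pass.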

[226]  arXiv:2404.19427 [pdf, ps, other]
Title: InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In the field of personalized image generation, the ability to create images preserving concepts has significantly improved. However, creating an image that naturally integrates multiple concepts in a cohesive and visually appealing composition remains challenging. This paper introduces "InstantFamily," an approach that employs a novel masked cross-attention mechanism and a multimodal embedding stack to achieve zero-shot multi-ID image generation. Our method effectively preserves ID as it utilizes global and local features from a pre-trained face recognition model integrated with text conditions. Additionally, our masked cross-attention mechanism enables the precise control of multi-ID and composition in the generated images. We demonstrate the effectiveness of InstantFamily through experiments showing its dominance in generating images with multi-ID, while resolving well-known multi-ID generation problems. Additionally, our model achieves state-of-the-art performance in both single-ID and multi-ID preservation. Furthermore, our model exhibits remarkable scalability, preserving a greater number of IDs than it was originally trained with.

[227]  arXiv:2404.19429 [pdf, other]
Title: Lancet: Accelerating Mixture-of-Experts Training via Whole Graph Computation-Communication Overlapping
Comments: 11 pages, 16 figures. Published in MLSys'24
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)

The Mixture-of-Experts (MoE) technique plays a crucial role in expanding the size of DNN model parameters. However, it faces the challenge of extended all-to-all communication latency during the training process. Existing methods attempt to mitigate this issue by overlapping all-to-all with expert computation. Yet, these methods frequently fall short of achieving sufficient overlap, consequently restricting the potential for performance enhancements. In our study, we extend the scope of this challenge by considering overlap at the broader training graph level. During the forward pass, we enable non-MoE computations to overlap with all-to-all through careful partitioning and pipelining. In the backward pass, we achieve overlap with all-to-all by scheduling gradient weight computations. We implement these techniques in Lancet, a system using compiler-based optimization to automatically enhance MoE model training. Our extensive evaluation reveals that Lancet significantly reduces the time devoted to non-overlapping communication, by as much as 77%. Moreover, it achieves a notable end-to-end speedup of up to 1.3 times when compared to the state-of-the-art solutions.

[228]  arXiv:2404.19430 [pdf, other]
Title: Sõnajaht: Definition Embeddings and Semantic Search for Reverse Dictionary Creation
Comments: Accepted to *SEM 2024
Subjects: Computation and Language (cs.CL)

We present an information retrieval based reverse dictionary system using modern pre-trained language models and approximate nearest neighbors search algorithms. The proposed approach is applied to an existing Estonian language lexicon resource, Sõnaveeb (word web), with the purpose of enhancing and enriching it by introducing cross-lingual reverse dictionary functionality powered by semantic search.
The performance of the system is evaluated using both an existing labeled English dataset of words and definitions that is extended to contain also Estonian and Russian translations, and a novel unlabeled evaluation approach that extracts the evaluation data from the lexicon resource itself using synonymy relations.
Evaluation results indicate that the information retrieval based semantic search approach without any model training is feasible, producing median rank of 1 in the monolingual setting and median rank of 2 in the cross-lingual setting using the unlabeled evaluation approach, with models trained for cross-lingual retrieval and including Estonian in their training data showing superior performance in our particular task.

[229]  arXiv:2404.19431 [pdf, ps, other]
Title: Integrated Sensing and Communications for Unsourced Random Access: Fundamental Limits
Subjects: Information Theory (cs.IT)

This work considers the problem of integrated sensing and communication (ISAC) with a massive number of unsourced and uncoordinated users. In the proposed model, known as the unsourced ISAC system (UNISAC), all active communication and sensing users share a short frame to transmit their signals, without requiring scheduling with the base station (BS). Hence, the signal received from each user is affected by significant interference from numerous interfering users, making it challenging to extract the transmitted signals. UNISAC aims to decode the transmitted message sequences from communication users while simultaneously detecting active sensing users, regardless of the identity of the decoded and detected users. In this paper, we derive an achievable performance limit for UNISAC and demonstrate its superiority over conventional approaches such as ALOHA, time-division multiple access, treating interference as noise, and multiple signal classification. Through numerical simulations, we validate UNISAC's effectiveness in detecting and decoding a large number of users.

[230]  arXiv:2404.19432 [pdf, other]
Title: Can Large Language Models put 2 and 2 together? Probing for Entailed Arithmetical Relationships
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Two major areas of interest in the era of Large Language Models concern the questions of what LLMs know, and if and how they may be able to reason (or rather, approximately reason). Since to date these lines of work have progressed largely in parallel (with notable exceptions), we are interested in investigating the intersection: probing for reasoning about the implicitly-held knowledge. Suspecting the performance to be lacking in this area, we use a very simple set-up of comparisons between cardinalities associated with elements of various subjects (e.g. the number of legs a bird has versus the number of wheels on a tricycle). We empirically demonstrate that although LLMs make steady progress in knowledge acquisition and (pseudo)reasoning with each new GPT release, their capabilities are limited to statistical inference only. It is difficult to argue that pure statistical learning can cope with the combinatorial explosion inherent in many commonsense reasoning tasks, especially once arithmetical notions are involved. Further, we argue that bigger is not always better and chasing purely statistical improvements is flawed at the core, since it only exacerbates the dangerous conflation of the production of correct answers with genuine reasoning ability.

[231]  arXiv:2404.19434 [pdf, other]
Title: Detection of Energy Consumption Cyber Attacks on Smart Devices
Journal-ref: Springer Nature Switzerland, 2023
Subjects: Cryptography and Security (cs.CR)

With the rapid development of Internet of Things (IoT) technology, intelligent systems are increasingly integrating into everyday life and people's homes. However, the proliferation of these technologies raises concerns about the security of smart home devices. These devices often face resource constraints and may connect to unreliable networks, posing risks to the data they handle. Securing IoT technology is crucial due to the sensitive data involved.
Preventing energy attacks and ensuring the security of IoT infrastructure are key challenges in modern smart homes. Monitoring energy consumption can be an effective approach to detecting abnormal behavior and IoT cyberattacks. Lightweight algorithms are necessary to accommodate the resource limitations of IoT devices.
This paper presents a lightweight technique for detecting energy consumption attacks on smart home devices by analyzing received packets. The proposed algorithm considers TCP, UDP, and MQTT protocols, as well as device statuses (idle, active, under attack). It accounts for resource constraints and promptly alerts administrators upon detecting an attack. The proposed approach effectively identifies energy consumption attacks by measuring packet reception rates for different protocols.
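To make the idea of rate-based detection concrete, here is a minimal sketch that compares per-protocol packet reception rates against limits that depend on the device status. It is not the paper's algorithm: the threshold values, the `(timestamp, protocol)` packet format, and the function names are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical per-protocol limits in packets per second for each device status.
# The abstract gives no numbers; these are placeholders for illustration only.
THRESHOLDS: Dict[str, Dict[str, float]] = {
    "idle": {"TCP": 5.0, "UDP": 5.0, "MQTT": 2.0},
    "active": {"TCP": 50.0, "UDP": 50.0, "MQTT": 20.0},
}

def reception_rates(packets: Iterable[Tuple[float, str]], window: float) -> Dict[str, float]:
    """Packets per second for each protocol, given (timestamp, protocol) pairs."""
    counts = Counter(proto for _, proto in packets)
    return {proto: n / window for proto, n in counts.items()}

def detect_attack(packets: Iterable[Tuple[float, str]], window: float, status: str) -> List[str]:
    """Return the protocols whose reception rate exceeds the limit for `status`."""
    limits = THRESHOLDS[status]
    rates = reception_rates(packets, window)
    # Protocols without a configured limit are never flagged.
    return sorted(p for p, r in rates.items() if r > limits.get(p, float("inf")))

# One-second window with a UDP flood and a single MQTT message (assumed data).
flood = [(i * 0.01, "UDP") for i in range(100)] + [(0.5, "MQTT")]
flagged = detect_attack(flood, window=1.0, status="idle")
```

Counting packets per window needs only a handful of integers per protocol, which is in keeping with the resource limits of the devices described above.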

[232]  arXiv:2404.19438 [pdf, other]
Title: Neuro-Vision to Language: Image Reconstruction and Interaction via Non-invasive Brain Recordings
Subjects: Neural and Evolutionary Computing (cs.NE)

Decoding non-invasive brain recordings is crucial for advancing our understanding of human cognition, yet faces challenges from individual differences and complex neural signal representations. Traditional methods require custom models and extensive trials, and lack interpretability in visual reconstruction tasks. Our framework integrates 3D brain structures with visual semantics using a Vision Transformer 3D. The unified feature extractor aligns fMRI features with multiple levels of visual embeddings efficiently, removing the need for individual-specific models and allowing extraction from single-trial data. This extractor consolidates multi-level visual features into one network, simplifying integration with Large Language Models (LLMs). Additionally, we have enhanced the fMRI dataset with various fMRI-image related textual data to support multimodal large model development. The integration with LLMs enhances decoding capabilities, enabling tasks like brain captioning, question-answering, detailed descriptions, complex reasoning, and visual reconstruction. Our approach not only shows superior performance across these tasks but also precisely identifies and manipulates language-based concepts within brain signals, enhancing interpretability and providing deeper neural process insights. These advances significantly broaden non-invasive brain decoding applicability in neuroscience and human-computer interaction, setting the stage for advanced brain-computer interfaces and cognitive models.

[233]  arXiv:2404.19441 [pdf, other]
Title: ESC: Efficient Speech Coding with Cross-Scale Residual Vector Quantized Transformers
Authors: Yuzhe Gu, Enmao Diao
Comments: Preprint
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

Existing neural audio codecs usually sacrifice computational complexity for audio quality. They build the feature transformation layers mainly on convolutional blocks, which are not inherently appropriate for capturing local redundancies of audio signals. As compensation, either adversarial losses from a discriminator or a large number of model parameters are required to improve the codec. To that end, we propose Efficient Speech Codec (ESC), a lightweight parameter-efficient codec built on cross-scale residual vector quantization and transformers. Our model leverages mirrored hierarchical window-attention transformer blocks and performs step-wise decoding from coarse-to-fine feature representations. To enhance codebook utilization, we design a learning paradigm that involves a pre-training stage to assist with codec training. Extensive results show that ESC can achieve high audio quality with much lower complexity, making it a prospective alternative to existing codecs.

[234]  arXiv:2404.19442 [pdf, other]
Title: Which Nigerian-Pidgin does Generative AI speak?: Issues about Representativeness and Bias for Multilingual and Low Resource Languages
Comments: Working paper
Subjects: Computation and Language (cs.CL)

Naija is the Nigerian-Pidgin spoken by approx. 120M speakers in Nigeria and it is a mixed language (drawing on, e.g., English, Portuguese and Indigenous languages). Although it has mainly been a spoken language until recently, there are currently two written genres (BBC and Wikipedia) in Naija. Through statistical analyses and Machine Translation experiments, we prove that these two genres do not represent each other (i.e., there are linguistic differences in word order and vocabulary) and Generative AI operates only based on Naija written in the BBC genre. In other words, Naija written in the Wikipedia genre is not represented in Generative AI.

[235]  arXiv:2404.19444 [pdf, other]
Title: AnomalyXFusion: Multi-modal Anomaly Synthesis with Diffusion
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Anomaly synthesis is one of the effective methods to augment abnormal samples for training. However, current anomaly synthesis methods predominantly rely on texture information as input, which limits the fidelity of synthesized abnormal samples, because texture information is insufficient to correctly depict the pattern of anomalies, especially for logical anomalies. To surmount this obstacle, we present the AnomalyXFusion framework, designed to harness multi-modality information to enhance the quality of synthesized abnormal samples. The AnomalyXFusion framework comprises two distinct yet synergistic modules: the Multi-modal In-Fusion (MIF) module and the Dynamic Dif-Fusion (DDF) module. The MIF module refines modality alignment by aggregating and integrating various modality features into a unified embedding space, termed X-embedding, which includes image, text, and mask features. Concurrently, the DDF module facilitates controlled generation through an adaptive adjustment of X-embedding conditioned on the diffusion steps. In addition, to reveal the multi-modality representational power of AnomalyXFusion, we propose a new dataset, called MVTec Caption. More precisely, MVTec Caption extends 2.2k accurate image-mask-text annotations for the MVTec AD and LOCO datasets. Comprehensive evaluations demonstrate the effectiveness of AnomalyXFusion, especially regarding the fidelity and diversity for logical anomalies. Project page: http://github.com/hujiecpp/MVTec-Caption

[236]  arXiv:2404.19448 [pdf, other]
Title: Sensorized Soft Skin for Dexterous Robotic Hands
Authors: Jana Egli (1), Benedek Forrai (1), Thomas Buchner (1), Jiangtao Su (2), Xiaodong Chen (2), Robert K. Katzschmann (1) ((1) ETH Zurich, (2) Nanyang Technological University Singapore)
Comments: 6 pages, 9 figures, ICRA 2024
Subjects: Robotics (cs.RO); Hardware Architecture (cs.AR)

Conventional industrial robots often use two-fingered grippers or suction cups to manipulate objects or interact with the world. Because of their simplified design, they are unable to reproduce the dexterity of human hands when manipulating a wide range of objects. While the control of humanoid hands evolved greatly, hardware platforms still lack capabilities, particularly in tactile sensing and providing soft contact surfaces. In this work, we present a method that equips the skeleton of a tendon-driven humanoid hand with a soft and sensorized tactile skin. Multi-material 3D printing allows us to iteratively approach a cast skin design which preserves the robot's dexterity in terms of range of motion and speed. We demonstrate that a soft skin enables firmer grasps and piezoresistive sensor integration enhances the hand's tactile sensing capabilities.

[237]  arXiv:2404.19449 [pdf, other]
Title: AoI-aware Sensing Scheduling and Trajectory Optimization for Multi-UAV-assisted Wireless Backscatter Networks
Comments: This paper has been accepted by IEEE TVT
Subjects: Information Theory (cs.IT)

This paper considers multiple unmanned aerial vehicles (UAVs) that assist sensing data transmissions from the ground users (GUs) to a remote base station (BS). Each UAV collects sensing data from the GUs and then forwards the sensing data to the remote BS. The GUs first backscatter their data to the UAVs and then all UAVs forward data to the BS via non-orthogonal multiple access (NOMA) transmissions. We formulate a multi-stage stochastic optimization problem to minimize the long-term time-averaged age-of-information (AoI) by jointly optimizing the GUs' access control, the UAVs' beamforming, and trajectory planning strategies. To solve this problem, we first model the dynamics of the GUs' AoI statuses by virtual queueing systems, and then propose the AoI-aware sensing scheduling and trajectory optimization (AoI-STO) algorithm. This allows us to transform the multi-stage AoI minimization problem into a series of per-slot control problems by using the Lyapunov optimization framework. In each time slot, the GUs' access control, the UAVs' beamforming, and mobility control strategies are updated by using the block coordinate descent (BCD) method according to the GUs' instantaneous AoI statuses. Simulation results reveal that the proposed AoI-STO algorithm can reduce the overall AoI by more than 50%. The GUs' scheduling fairness is also improved greatly by adapting the GUs' access control compared with typical baseline schemes.
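The quantity being minimized can be illustrated with a common discrete-time AoI model, in which a user's age resets when its data is delivered and grows by one slot otherwise. The sketch below assumes that simplified model (reset to 1, unit slots) and hypothetical function names; the paper's queueing model and its Lyapunov-based controller are not reproduced here.

```python
from typing import List, Sequence

def step_aoi(aoi: Sequence[int], scheduled: Sequence[bool]) -> List[int]:
    """Advance one slot: a scheduled user's age resets to 1, all others age by 1."""
    return [1 if s else a + 1 for a, s in zip(aoi, scheduled)]

def time_averaged_aoi(schedule: Sequence[Sequence[bool]], num_users: int) -> float:
    """Average AoI over all users and slots for a fixed sequence of scheduling decisions."""
    aoi = [1] * num_users
    total = 0
    for scheduled in schedule:
        aoi = step_aoi(aoi, scheduled)
        total += sum(aoi)
    return total / (len(schedule) * num_users)

# Two users over four slots: alternate between them, or always serve user 0.
round_robin = [[True, False], [False, True], [True, False], [False, True]]
always_first = [[True, False]] * 4
fair = time_averaged_aoi(round_robin, num_users=2)
unfair = time_averaged_aoi(always_first, num_users=2)
```

Even in this toy setting the alternating schedule yields a lower average age than always serving the same user, which is the kind of fairness effect that access control is meant to capture.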

[238]  arXiv:2404.19452 [pdf, other]
Title: How to Sustainably Monitor ML-Enabled Systems? Accuracy and Energy Efficiency Tradeoffs in Concept Drift Detection
Comments: Accepted for publication at the International Conference on Information and Communications Technology for Sustainability 2024 (ICT4S'24)
Subjects: Machine Learning (cs.LG); Software Engineering (cs.SE)

ML-enabled systems that are deployed in a production environment typically suffer from decaying model prediction quality through concept drift, i.e., a gradual change in the statistical characteristics of a certain real-world domain. To combat this, a simple solution is to periodically retrain ML models, which unfortunately can consume a lot of energy. One recommended tactic to improve energy efficiency is therefore to systematically monitor the level of concept drift and only retrain when it becomes unavoidable. Different methods are available to do this, but we know very little about their concrete impact on the tradeoff between accuracy and energy efficiency, as these methods also consume energy themselves.
To address this, we therefore conducted a controlled experiment to study the accuracy vs. energy efficiency tradeoff of seven common methods for concept drift detection. We used five synthetic datasets, each in a version with abrupt and one with gradual drift, and trained six different ML models as base classifiers. Based on a full factorial design, we tested 420 combinations (7 drift detectors * 5 datasets * 2 types of drift * 6 base classifiers) and compared energy consumption and drift detection accuracy.
Our results indicate that there are three types of detectors: a) detectors that sacrifice energy efficiency for detection accuracy (KSWIN), b) balanced detectors that consume low to medium energy with good accuracy (HDDM_W, ADWIN), and c) detectors that consume very little energy but are unusable in practice due to very poor accuracy (HDDM_A, PageHinkley, DDM, EDDM). By providing rich evidence for this energy efficiency tactic, our findings support ML practitioners in choosing the best suited method of concept drift detection for their ML-enabled systems.
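The size of the full factorial design can be checked by enumerating every combination of the four factors. In the sketch below the detector names come from the abstract, while the dataset and classifier names are hypothetical placeholders, since the abstract does not list them.

```python
from itertools import product

# Detector names are taken from the abstract; the dataset and classifier
# names are hypothetical placeholders because the abstract does not list them.
detectors = ["KSWIN", "HDDM_W", "ADWIN", "HDDM_A", "PageHinkley", "DDM", "EDDM"]
datasets = [f"dataset_{i}" for i in range(1, 6)]
drift_types = ["abrupt", "gradual"]
classifiers = [f"classifier_{i}" for i in range(1, 7)]

# Every combination of the four factors is one experimental run.
runs = list(product(detectors, datasets, drift_types, classifiers))
print(len(runs))  # 7 * 5 * 2 * 6 = 420
```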

[239]  arXiv:2404.19453 [pdf, other]
Title: Structural Parameters for Dense Temporal Graphs
Comments: 27 pages, 2 figures
Subjects: Discrete Mathematics (cs.DM); Combinatorics (math.CO)

Temporal graphs provide a useful model for many real-world networks. Unfortunately the majority of algorithmic problems we might consider on such graphs are intractable. There has been recent progress in defining structural parameters which describe tractable cases by simultaneously restricting the underlying structure and the times at which edges appear in the graph. These all rely on the temporal graph being sparse in some sense. We introduce temporal analogues of three increasingly restrictive static graph parameters -- cliquewidth, modular-width and neighbourhood diversity -- which take small values for highly structured temporal graphs, even if a large number of edges are active at each timestep. The computational problems solvable efficiently when the temporal cliquewidth of the input graph is bounded form a subset of those solvable efficiently when the temporal modular-width is bounded, which is in turn a subset of problems efficiently solvable when the temporal neighbourhood diversity is bounded. By considering specific temporal graph problems, we demonstrate that (up to standard complexity theoretic assumptions) these inclusions are strict.

[240]  arXiv:2404.19454 [pdf, other]
Title: Optimized neural forms for solving ordinary differential equations
Subjects: Artificial Intelligence (cs.AI)

A critical issue in approximating solutions of ordinary differential equations using neural networks is the exact satisfaction of the boundary or initial conditions. For this purpose, neural forms have been introduced, i.e., functional expressions that depend on neural networks which, by design, satisfy the prescribed conditions exactly. Expanding upon prior progress, the present work contributes in three distinct aspects. First, it presents a novel formalism for crafting optimized neural forms. Second, it outlines a method for establishing an upper bound on the absolute deviation from the exact solution. Third, it introduces a technique for converting problems with Neumann or Robin conditions into equivalent problems with parametric Dirichlet conditions. The proposed optimized neural forms were numerically tested on a set of diverse problems, encompassing first-order and second-order ordinary differential equations, as well as first-order systems. Stiff and delay differential equations were also considered. The obtained solutions were compared against solutions obtained via Runge-Kutta methods and exact solutions wherever available. The reported results and analysis verify that in addition to the exact satisfaction of the boundary or initial conditions, optimized neural forms provide closed-form solutions of superior interpolation capability and controllable overall accuracy.
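For readers unfamiliar with neural forms, the following sketch shows the classic construction for a first-order initial value problem with y(x0) = y0: the trial solution y0 + (x - x0) * N(x) meets the condition exactly for any network N. This is the well-known basic form, not the optimized forms proposed in the paper, and `toy_net` is a hypothetical stand-in for a trained network.

```python
import math
from typing import Callable

def neural_form(net: Callable[[float], float], x0: float, y0: float) -> Callable[[float], float]:
    """Wrap `net` so that the result equals y0 at x0, whatever `net` returns."""
    # The factor (x - x0) vanishes at x0, so the network cannot violate the condition.
    return lambda x: y0 + (x - x0) * net(x)

def toy_net(x: float) -> float:
    # Hypothetical stand-in for a neural network; any function of x works here.
    return math.tanh(3.0 * x) - 0.7

y = neural_form(toy_net, x0=0.0, y0=2.0)
```

Training then only has to reduce the residual of the differential equation, because the initial condition holds by construction.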

[241]  arXiv:2404.19456 [pdf, other]
Title: Imitation Learning: A Survey of Learning Methods, Environments and Metrics
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Imitation learning is an approach in which an agent learns how to execute a task by trying to mimic how one or more teachers perform it. This learning approach offers a compromise between the time it takes to learn a new task and the effort needed to collect teacher samples for the agent. It achieves this by balancing learning from the teacher, who has some information on how to perform the task, and deviating from their examples when necessary, such as states not present in the teacher samples. Consequently, the field of imitation learning has received much attention from researchers in recent years, resulting in many new methods and applications. However, with this increase in published work and past surveys focusing mainly on methodology, a lack of standardisation became more prominent in the field. This non-standardisation is evident in the use of environments, which appear in no more than two works, and evaluation processes, such as qualitative analysis, that have become rare in current literature. In this survey, we systematically review current imitation learning literature and present our findings by (i) classifying imitation learning techniques, environments and metrics by introducing novel taxonomies; (ii) reflecting on main problems from the literature; and (iii) presenting challenges and future directions for researchers.

[242]  arXiv:2404.19459 [pdf, other]
Title: Adaptive Gaussian Process Regression for Bayesian inverse problems
Comments: 12 pages, 4 figures, presented at ALGORITMY 2024
Subjects: Numerical Analysis (math.NA)

We introduce a novel adaptive Gaussian Process Regression (GPR) methodology for efficient construction of surrogate models for Bayesian inverse problems with expensive forward model evaluations. An adaptive design strategy focuses on optimizing both the positioning and simulation accuracy of training data in order to reduce the computational cost of simulating training data without compromising the fidelity of the posterior distributions of parameters. The method interleaves a goal-oriented active learning algorithm selecting evaluation points and tolerances based on the expected impact on the Kullback-Leibler divergence of surrogated and true posterior with a Markov Chain Monte Carlo sampling of the posterior. The performance benefit of the adaptive approach is demonstrated for two simple test problems.

[243]  arXiv:2404.19460 [pdf, other]
Title: AttackBench: Evaluating Gradient-based Attacks for Adversarial Examples
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)

Adversarial examples are typically optimized with gradient-based attacks. While novel attacks are continuously proposed, each is shown to outperform its predecessors using different experimental setups, hyperparameter settings, and number of forward and backward calls to the target models. This provides overly-optimistic and even biased evaluations that may unfairly favor one particular attack over the others. In this work, we aim to overcome these limitations by proposing AttackBench, i.e., the first evaluation framework that enables a fair comparison among different attacks. To this end, we first propose a categorization of gradient-based attacks, identifying their main components and differences. We then introduce our framework, which evaluates their effectiveness and efficiency. We measure these characteristics by (i) defining an optimality metric that quantifies how close an attack is to the optimal solution, and (ii) limiting the number of forward and backward queries to the model, such that all attacks are compared within a given maximum query budget. Our extensive experimental analysis compares more than 100 attack implementations with a total of over 800 different configurations against CIFAR-10 and ImageNet models, highlighting that only very few attacks outperform all the competing approaches. Within this analysis, we shed light on several implementation issues that prevent many attacks from finding better solutions or running at all. We release AttackBench as a publicly available benchmark, aiming to continuously update it to include and evaluate novel gradient-based attacks for optimizing adversarial examples.

[244]  arXiv:2404.19462 [pdf, other]
Title: Continual Model-based Reinforcement Learning for Data Efficient Wireless Network Optimisation
Comments: Published at ECML 2023
Subjects: Machine Learning (cs.LG)

We present a method that addresses the pain point of the long lead time required to deploy cell-level parameter optimisation policies to new wireless network sites. Given a sequence of action spaces represented by overlapping subsets of cell-level configuration parameters provided by domain experts, we formulate throughput optimisation as Continual Reinforcement Learning of control policies. Simulation results suggest that the proposed system is able to shorten the end-to-end deployment lead time two-fold compared to a reinitialise-and-retrain baseline without any drop in optimisation gain.

[245]  arXiv:2404.19467 [pdf, ps, other]
Title: Bayesian Functional Connectivity and Graph Convolutional Network for Working Memory Load Classification
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Neurons and Cognition (q-bio.NC)

Brain responses related to working memory originate from distinct brain areas and oscillate at different frequencies. EEG signals with high temporal correlation can effectively capture these responses. Therefore, estimating the functional connectivity of EEG for working memory protocols in different frequency bands plays a significant role in analyzing the brain dynamics with increasing memory and cognitive loads, which remains largely unexplored. The present study introduces a Bayesian structure learning algorithm to learn the functional connectivity of EEG in sensor space. Next, the functional connectivity graphs are taken as input to the graph convolutional network to classify the working memory loads. The intrasubject (subject-specific) classification performed on 154 subjects for six different verbal working memory loads produced the highest classification accuracy of 96% and average classification accuracy of 89%, outperforming state-of-the-art classification models proposed in the literature. Furthermore, the proposed Bayesian structure learning algorithm is compared with state-of-the-art functional connectivity estimation methods through intersubject and intrasubject statistical analysis of variance. The results also show that the alpha and theta bands have better classification accuracy than the beta band.

[246]  arXiv:2404.19468 [pdf, ps, other]
Title: Compute-Forward Multiple Access for Gaussian Fast Fading Channels
Comments: ISIT'2024
Subjects: Information Theory (cs.IT)

Compute-forward multiple access (CFMA) is a transmission strategy which allows the receiver in a multiple access channel (MAC) to first decode linear combinations of the transmitted signals and then solve for individual messages. Compared to existing MAC strategies such as joint decoding or successive interference cancellation (SIC), CFMA was shown to achieve the MAC capacity region for fixed channels under certain signal-to-noise (SNR) conditions without time-sharing using only single-user decoders. This paper studies the CFMA scheme for a two-user Gaussian fast fading MAC with channel state information only available at the receiver (CSIR). We develop appropriate lattice decoding schemes for the fading MAC and derive the achievable rate pairs for decoding linear combinations of codewords with any integer coefficients. We give a sufficient and necessary condition under which the proposed scheme can achieve the ergodic sum capacity. Furthermore, we investigate the impact of channel statistics on the capacity achievability of the CFMA scheme. In general, the sum capacity is achievable if the channel variance is small compared to the mean value of the channel strengths. Various numerical results are presented to illustrate the theoretical findings.

[247]  arXiv:2404.19475 [pdf, other]
Title: TwinDiffusion: Enhancing Coherence and Efficiency in Panoramic Image Generation with Diffusion Models
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Diffusion models have emerged as effective tools for generating diverse and high-quality content. However, their capability in high-resolution image generation, particularly for panoramic images, still faces challenges such as visible seams and incoherent transitions. In this paper, we propose TwinDiffusion, an optimized framework designed to address these challenges through two key innovations: Crop Fusion for quality enhancement and Cross Sampling for efficiency optimization. We introduce a training-free optimizing stage to refine the similarity of the adjacent image areas, as well as an interleaving sampling strategy to yield dynamic patches during the cropping process. A comprehensive evaluation is conducted to compare TwinDiffusion with the existing methods, considering factors including coherence, fidelity, compatibility, and efficiency. The results demonstrate the superior performance of our approach in generating seamless and coherent panoramas, setting a new standard in quality and efficiency for panoramic image generation.

[248]  arXiv:2404.19479 [pdf, other]
Title: Reachability in temporal graphs under perturbation
Comments: 36 pages, 3 figures
Subjects: Discrete Mathematics (cs.DM); Combinatorics (math.CO)

Reachability and other path-based measures on temporal graphs can be used to understand spread of infection, information, and people in modelled systems. Due to delays and errors in reporting, temporal graphs derived from data are unlikely to perfectly reflect reality, especially with respect to the precise times at which edges appear. To reflect this uncertainty, we consider a model in which some number $\zeta$ of edge appearances may have their timestamps perturbed by $\pm\delta$ for some $\delta$. Within this model, we investigate temporal reachability and consider the problem of determining the maximum number of vertices any vertex can reach under these perturbations. We show that this problem is intractable in general but is efficiently solvable when $\zeta$ is sufficiently large. We also give algorithms which solve this problem in several restricted settings. We complement this with some contrasting results concerning the complexity of related temporal eccentricity problems under perturbation.
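As a rough illustration of the problem statement above, the following Python sketch computes temporal reachability and then brute-forces the perturbed maximum on a tiny instance. Several details are assumptions made for the example and are not taken from the paper: edges are undirected, paths must use strictly increasing timestamps, a perturbed appearance is shifted by exactly +δ or −δ, and a vertex counts itself as reached. The function names `reachable` and `max_reach_under_perturbation` are hypothetical, and the exhaustive search is exponential, so it only shows what is being asked, not how the paper solves it.

```python
from itertools import combinations, product


def reachable(edges, source):
    """Vertices reachable from `source` along strictly time-increasing paths.

    `edges` is an iterable of (u, v, t) undirected edge appearances.
    """
    arrival = {source: float("-inf")}  # earliest arrival time per vertex
    for u, v, t in sorted(edges, key=lambda e: e[2]):
        for x, y in ((u, v), (v, u)):
            # Traverse x -> y only if x was reached strictly before time t.
            if x in arrival and arrival[x] < t and y not in arrival:
                arrival[y] = t
    return set(arrival)


def max_reach_under_perturbation(edges, zeta, delta):
    """Brute force: shift up to `zeta` appearances by +/- delta and return
    the largest reachability set size over all vertices and perturbations."""
    edges = list(edges)
    vertices = {x for u, v, _ in edges for x in (u, v)}
    best = 0
    for k in range(zeta + 1):
        for idxs in combinations(range(len(edges)), k):
            for shifts in product((-delta, delta), repeat=k):
                shifted = list(edges)
                for i, s in zip(idxs, shifts):
                    u, v, t = shifted[i]
                    shifted[i] = (u, v, t + s)
                best = max(best, max(len(reachable(shifted, s0)) for s0 in vertices))
    return best


# A path a-b-c-d whose edges all appear at time 2.
path = [("a", "b", 2), ("b", "c", 2), ("c", "d", 2)]
unperturbed = max_reach_under_perturbation(path, zeta=0, delta=1)
perturbed = max_reach_under_perturbation(path, zeta=1, delta=1)
```

On this path, no vertex reaches all four vertices without perturbation because every edge appears at the same time; delaying a single appearance by one time step lets vertex `b` reach everything.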

[249]  arXiv:2404.19480 [pdf, other]
Title: Mitigating and Analysis of Memory Usage Attack in IoE System
Journal-ref: Industrial Networks and Intelligent Systems,2023
Subjects: Cryptography and Security (cs.CR)

Internet of Everything (IoE) is a newly emerging trend, especially in homes. Marketing forces toward smart homes are also accelerating the spread of IoE devices in households. An obvious risk of the rapid adoption of these smart devices is that many lack controls for protecting the privacy and security of end users from attacks designed to disrupt lives and incur financial losses. Today the smart home is a system for managing the basic life support processes of both small systems, e.g., commercial, office premises, apartments, cottages, and largely automated complexes, e.g., commercial and industrial complexes. One of the critical tasks to be solved by the concept of a modern smart home is the problem of preventing the usage of IoE resources. Recently, there has been a rapid increase in attacks on consumer IoE devices.
Memory corruption vulnerabilities constitute a significant class of vulnerabilities in software security through which attackers can gain control of an entire system. Numerous memory corruption vulnerabilities have been found in IoE firmware already deployed in the consumer market. This paper aims to analyze and explain the resource usage attack and create a low-cost simulation environment to aid in the dynamic analysis of the attack. Further, we perform controlled resource usage attacks on resource-constrained victim IoE devices while measuring their resource consumption, such as CPU and memory utilization. We also build a lightweight algorithm to detect memory usage attacks in the IoE environment. The results show high efficiency in detecting and mitigating memory usage attacks by detecting when the intruder starts and stops the attack.

[250]  arXiv:2404.19482 [pdf, other]
Title: FactCheck Editor: Multilingual Text Editor with End-to-End fact-checking
Authors: Vinay Setty
Comments: Accepted in SIGIR 2024 (demo track)
Subjects: Computation and Language (cs.CL)

We introduce 'FactCheck Editor', an advanced text editor designed to automate fact-checking and correct factual inaccuracies. Given the widespread issue of misinformation, often a result of unintentional mistakes by content creators, our tool aims to address this challenge. It supports over 90 languages and utilizes transformer models to assist humans in the labor-intensive process of fact verification. This demonstration showcases a complete workflow that detects text claims in need of verification, generates relevant search engine queries, and retrieves appropriate documents from the web. It employs Natural Language Inference (NLI) to predict the veracity of claims and uses LLMs to summarize the evidence and suggest textual revisions to correct any errors in the text. Additionally, the effectiveness of models used in claim detection and veracity assessment is evaluated across multiple languages.

[251]  arXiv:2404.19484 [pdf, other]
Title: More Compute Is What You Need
Authors: Zhen Guo
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Large language model pre-training has become increasingly expensive, with most practitioners relying on scaling laws to allocate compute budgets for model size and training tokens, commonly referred to as Compute-Optimal or Chinchilla Optimal. In this paper, we hypothesize a new scaling law that suggests model performance depends mostly on the amount of compute spent for transformer-based models, independent of the specific allocation to model size and dataset size. Using this unified scaling law, we predict that (a) for inference efficiency, training should prioritize smaller model sizes and larger training datasets, and (b) assuming the exhaustion of available web datasets, scaling the model size might be the only way to further improve model performance.
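To make prediction (a) concrete, here is a small Python sketch of the trade-off between model size and training tokens at a fixed compute budget. It relies on two widely used rules of thumb that are assumptions here and do not come from this abstract: training cost of roughly 6·N·D FLOPs and inference cost of roughly 2·N FLOPs per token, for N parameters and D training tokens. The function names and the example model sizes are hypothetical.

```python
import math


def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute, assuming the common C ~ 6 * N * D rule."""
    return 6.0 * n_params * n_tokens


def tokens_for_budget(budget_flops: float, n_params: float) -> float:
    """Training tokens affordable for a given model size under the same rule."""
    return budget_flops / (6.0 * n_params)


def inference_flops_per_token(n_params: float) -> float:
    """Approximate forward-pass cost per generated token (~2 * N, assumed)."""
    return 2.0 * n_params


# Illustrative numbers only: a 70B-parameter model trained on 1.4T tokens.
budget = training_flops(70e9, 1.4e12)

# Spending the same budget on a 10x smaller model affords 10x more tokens...
small_model_tokens = tokens_for_budget(budget, 7e9)

# ...and that smaller model is 10x cheaper per token at inference time.
inference_ratio = inference_flops_per_token(70e9) / inference_flops_per_token(7e9)
```

If performance really depended mostly on total compute, as the abstract hypothesizes, the two configurations would perform similarly while the smaller one would be far cheaper to serve.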

[252]  arXiv:2404.19485 [pdf, other]
Title: IID Relaxation by Logical Expressivity: A Research Agenda for Fitting Logics to Neurosymbolic Requirements
Comments: 12 pages, 2 figures, submitted to NeSy 2024
Subjects: Artificial Intelligence (cs.AI)

Neurosymbolic background knowledge and the expressivity required of its logic can break Machine Learning assumptions about data Independence and Identical Distribution. In this position paper we propose to analyze IID relaxation in a hierarchy of logics that fit different use case requirements. We discuss the benefits of exploiting known data dependencies and distribution constraints for Neurosymbolic use cases and argue that the expressivity required for this knowledge has implications for the design of underlying ML routines. This opens a new research agenda with general questions about Neurosymbolic background knowledge and the expressivity required of its logic.

[253]  arXiv:2404.19486 [pdf, other]
Title: Safe Training with Sensitive In-domain Data: Leveraging Data Fragmentation To Mitigate Linkage Attacks
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Current text generation models are trained using real data which can potentially contain sensitive information, such as confidential patient information. Under certain conditions, output of the training data which they have memorised can be triggered, exposing sensitive data. To mitigate this risk, we propose a safer alternative in which fragmented data, in the form of randomly grouped domain-specific short phrases, is shared instead of full texts. Thus, text fragments that could re-identify an individual cannot be reproduced by the model in one sequence, giving significant protection against linkage attacks. We fine-tune several state-of-the-art LLMs using meaningful syntactic chunks to explore their utility. In particular, we fine-tune BERT-based models to predict two cardiovascular diagnoses. Our results demonstrate the capacity of LLMs to benefit from the pre-trained knowledge and, when fine-tuned with fragmented data, to deliver classification results comparable to fine-tuning with full training data.

[254]  arXiv:2404.19487 [pdf, ps, other]
Title: Finetuning greedy kernel models by exchange algorithms
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA)

Kernel-based approximation offers versatile tools for high-dimensional approximation, which can especially be leveraged for surrogate modeling. For this purpose, both "knot insertion" and "knot removal" approaches aim at choosing a suitable subset of the data, in order to obtain a sparse but nevertheless accurate kernel model. In the present work, focussing on kernel-based interpolation, we aim at combining these two approaches to further improve the accuracy of kernel models, without increasing the computational complexity of the final kernel model. For this, we introduce a class of kernel exchange algorithms (KEA). The resulting KEA algorithm can be used for finetuning greedy kernel surrogate models, allowing for a reduction of the error of up to 86.4% (17.2% on average) in our experiments.

[255]  arXiv:2404.19489 [pdf, other]
Title: EvGNN: An Event-driven Graph Neural Network Accelerator for Edge Vision
Comments: 12 pages, 14 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Hardware Architecture (cs.AR); Emerging Technologies (cs.ET); Neural and Evolutionary Computing (cs.NE)

Edge vision systems combining sensing and embedded processing promise low-latency, decentralized, and energy-efficient solutions that forgo reliance on the cloud. As opposed to conventional frame-based vision sensors, event-based cameras deliver a microsecond-scale temporal resolution with sparse information encoding, thereby outlining new opportunities for edge vision systems. However, mainstream algorithms for frame-based vision, which mostly rely on convolutional neural networks (CNNs), can hardly exploit the advantages of event-based vision as they are typically optimized for dense matrix-vector multiplications. While event-driven graph neural networks (GNNs) have recently emerged as a promising solution for sparse event-based vision, their irregular structure is a challenge that currently hinders the design of efficient hardware accelerators. In this paper, we propose EvGNN, the first event-driven GNN accelerator for low-footprint, ultra-low-latency, and high-accuracy edge vision with event-based cameras. It relies on three central ideas: (i) directed dynamic graphs exploiting single-hop nodes with edge-free storage, (ii) event queues for the efficient identification of local neighbors within a spatiotemporally decoupled search range, and (iii) a novel layer-parallel processing scheme enabling the low-latency execution of multi-layer GNNs. We deployed EvGNN on a Xilinx KV260 Ultrascale+ MPSoC platform and benchmarked it on the N-CARS dataset for car recognition, demonstrating a classification accuracy of 87.8% and an average latency per event of 16$\mu$s, thereby enabling real-time, microsecond-resolution event-based vision at the edge.

[256]  arXiv:2404.19491 [pdf, ps, other]
Title: Construction of 2D explicit cubic quasi-interpolating splines in Bernstein-Bézier form
Subjects: Numerical Analysis (math.NA)

In this paper, the construction of $C^{1}$ cubic quasi-interpolants on a three-direction mesh of $\mathbb{R}^{2}$ is addressed. The quasi-interpolating splines are defined by directly setting their Bernstein-Bézier coefficients relative to each triangle from point and gradient values in order to reproduce the polynomials of the highest possible degree. Moreover, additional global properties are required. Finally, we provide some numerical tests confirming the approximation properties.

[257]  arXiv:2404.19492 [pdf, other]
Title: Reducing Communication Overhead in the IoT-Edge-Cloud Continuum: A Survey on Protocols and Data Reduction Strategies
Subjects: Networking and Internet Architecture (cs.NI)

The adoption of Internet of Things (IoT) deployments has led to a sharp increase in network traffic as a vast number of IoT devices communicate with each other and IoT services through the IoT-edge-cloud continuum. This network traffic increase poses a major challenge to the global communications infrastructure since it hinders communication performance and also puts significant strain on the energy consumption of IoT devices. To address these issues, efficient and collaborative IoT solutions which enable information exchange while reducing the transmitted data and associated network traffic are crucial. This survey provides a comprehensive overview of the communication technologies and protocols as well as data reduction strategies that contribute to this goal. First, we present a comparative analysis of prevalent communication technologies in the IoT domain, highlighting their unique characteristics and exploring the potential for protocol composition and joint usage to enhance overall communication efficiency within the IoT-edge-cloud continuum. Next, we investigate various data traffic reduction techniques tailored to the IoT-edge-cloud context and evaluate their applicability and effectiveness on resource-constrained devices. Finally, we investigate the emerging concepts that have the potential to further reduce the communication overhead in the IoT-edge-cloud continuum, including cross-layer optimization strategies and Edge AI techniques for IoT data reduction. The paper offers a comprehensive roadmap for developing efficient and scalable solutions across the layers of the IoT-edge-cloud continuum that are beneficial for real-time processing to alleviate network congestion in complex IoT environments.

[258]  arXiv:2404.19500 [pdf, other]
Title: Towards Real-world Video Face Restoration: A New Benchmark
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Image and Video Processing (eess.IV)

Blind face restoration (BFR) on images has significantly progressed over the last several years, while real-world video face restoration (VFR), which is more challenging because of the more complex face motions involved, such as moving gaze directions and facial orientations, remains unsolved. Typical BFR methods are evaluated on privately synthesized datasets or self-collected real-world low-quality face images, which are limited in their coverage of real-world video frames. In this work, we introduced new real-world datasets named FOS with a taxonomy of "Full, Occluded, and Side" faces from mainly video frames to study the applicability of current methods on videos. Compared with existing test datasets, FOS datasets cover more diverse degradations and involve face samples from more complex scenarios, which helps to revisit current face restoration approaches more comprehensively. Given the established datasets, we benchmarked both the state-of-the-art BFR methods and the video super resolution (VSR) methods to comprehensively study current approaches, identifying their potential and limitations in VFR tasks. In addition, we studied the effectiveness of the commonly used image quality assessment (IQA) metrics and face IQA (FIQA) metrics by leveraging a subjective user study. With extensive experimental results and detailed analysis provided, we gained insights from the successes and failures of both current BFR and VSR methods. These results also pose challenges to current face restoration approaches, which we hope will stimulate future advances in VFR research.

[259]  arXiv:2404.19501 [pdf, other]
Title: A Unified Theory of Exact Inference and Learning in Exponential Family Latent Variable Models
Authors: Sacha Sokoloski
Subjects: Machine Learning (cs.LG)

Bayes' rule describes how to infer posterior beliefs about latent variables given observations, and inference is a critical step in learning algorithms for latent variable models (LVMs). Although there are exact algorithms for inference and learning for certain LVMs such as linear Gaussian models and mixture models, researchers must typically develop approximate inference and learning algorithms when applying novel LVMs. In this paper we study the line that separates LVMs that rely on approximation schemes from those that do not, and develop a general theory of exponential family, latent variable models for which inference and learning may be implemented exactly. Firstly, under mild assumptions about the exponential family form of a given LVM, we derive necessary and sufficient conditions under which the LVM prior is in the same exponential family as its posterior, such that the prior is conjugate to the posterior. We show that all models that satisfy these conditions are constrained forms of a particular class of exponential family graphical model. We then derive general inference and learning algorithms, and demonstrate them on a variety of example models. Finally, we show how to compose our models into graphical models that retain tractable inference and learning. In addition to our theoretical work, we have implemented our algorithms in a collection of libraries with which we provide numerous demonstrations of our theory, and with which researchers may apply our theory in novel statistical settings.

[260]  arXiv:2404.19503 [pdf, ps, other]
Title: Kuroda's Translation for Higher-Order Logic
Authors: Thomas Traversié (MICS, DEDUCTEAM)
Subjects: Logic in Computer Science (cs.LO)

In 1951, Kuroda defined an embedding of classical first-order logic into intuitionistic logic, such that a formula and its translation are equivalent in classical logic. Recently, Brown and Rizkallah extended this translation to higher-order logic, but did not prove the classical equivalence, and showed that the embedding fails in the presence of functional extensionality. We prove that functional extensionality and propositional extensionality are sufficient to derive the classical equivalence between a higher-order formula and its translation. We emphasize a condition under which Kuroda's translation works with functional extensionality.
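For readers unfamiliar with the first-order translation mentioned above, it is usually presented as follows. This is the standard textbook formulation, written in notation chosen here ($A_{Ku}$ for the inner translation, $A^{Ku}$ for the full one), and it is not taken from the paper itself: a double negation is inserted after every universal quantifier and once at the front of the formula.

```latex
\begin{align*}
  P_{Ku}                   &= P \quad (P \text{ atomic}), &
  \bot_{Ku}                &= \bot, \\
  (A \land B)_{Ku}         &= A_{Ku} \land B_{Ku}, &
  (A \lor B)_{Ku}          &= A_{Ku} \lor B_{Ku}, \\
  (A \Rightarrow B)_{Ku}   &= A_{Ku} \Rightarrow B_{Ku}, &
  (\exists x.\, A)_{Ku}    &= \exists x.\, A_{Ku}, \\
  (\forall x.\, A)_{Ku}    &= \forall x.\, \neg\neg A_{Ku}, &
  A^{Ku}                   &= \neg\neg A_{Ku}.
\end{align*}
```

Since $\neg\neg B$ and $B$ are classically equivalent, each inserted double negation can be removed in classical logic, which is why a first-order formula and its translation are classically equivalent.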

[261]  arXiv:2404.19505 [pdf, other]
Title: Context-Aware Machine Translation with Source Coreference Explanation
Comments: Accepted to TACL. This is a pre-MIT Press publication version
Subjects: Computation and Language (cs.CL)

Despite significant improvements in translation quality, context-aware machine translation (MT) models underperform in many cases. One of the main reasons is that they fail to utilize the correct features from context when the context is too long or their models are overly complex. This can lead to the explain-away effect, wherein the models only consider features that more easily explain the predictions, resulting in inaccurate translations. To address this issue, we propose a model that explains the decisions made for translation by predicting coreference features in the input. We construct a model for input coreference by exploiting contextual features from both the input and translation output representations on top of an existing MT model. We evaluate and analyze our method on the WMT document-level translation task with the English-German dataset, the English-Russian dataset, and the multilingual TED talk dataset, demonstrating an improvement of over 1.0 BLEU score when compared with other context-aware models.

[262]  arXiv:2404.19507 [pdf, other]
Title: Choosing a consultant in a dynamic investment problem
Subjects: Information Theory (cs.IT)

Consider a dynamic decision-making scenario where at every stage the investor has to choose between investing in one of two projects or gathering more information. At each stage, the investor may seek counsel from one of several consultants, who, for a fixed cost, provide partial information about the realized state. We explore the optimal strategy and its dependence on the belief and the consultation cost. Our analysis reveals that if one of the consultants discloses the state with a nonzero probability, this consultant will be used in any optimal strategy, provided the consultation cost is sufficiently small.

[263]  arXiv:2404.19508 [pdf, other]
Title: Temporal Graph ODEs for Irregularly-Sampled Time Series
Comments: Preprint. Accepted at IJCAI 2024
Subjects: Machine Learning (cs.LG)

Modern graph representation learning works mostly under the assumption of dealing with regularly sampled temporal graph snapshots, which is far from realistic, e.g., social networks and physical systems are characterized by continuous dynamics and sporadic observations. To address this limitation, we introduce the Temporal Graph Ordinary Differential Equation (TG-ODE) framework, which learns both the temporal and spatial dynamics from graph streams where the intervals between observations are not regularly spaced. We empirically validate the proposed approach on several graph benchmarks, showing that TG-ODE can achieve state-of-the-art performance in irregular graph stream tasks.

[264]  arXiv:2404.19509 [pdf, other]
Title: Do Large Language Models Understand Conversational Implicature -- A case study with a chinese sitcom
Comments: 14 pages, 8 tables and 5 figures
Subjects: Computation and Language (cs.CL)

Understanding the non-literal meaning of an utterance is critical for large language models (LLMs) to become human-like social communicators. In this work, we introduce SwordsmanImp, the first Chinese multi-turn-dialogue-based dataset aimed at conversational implicature, sourced from dialogues in the Chinese sitcom $\textit{My Own Swordsman}$. It includes 200 carefully handcrafted questions, all annotated with the Gricean maxims that have been violated. We test eight closed-source and open-source LLMs under two tasks: a multiple-choice question task and an implicature explanation task. Our results show that GPT-4 attains human-level accuracy (94%) on multiple-choice questions. CausalLM follows GPT-4 with an accuracy of 78.5%. Other models, including GPT-3.5 and several open-source models, demonstrate a lower accuracy ranging from 20% to 60% on multiple-choice questions. Human raters were asked to rate the explanations of the implicatures generated by LLMs on their reasonability, logic and fluency. While all models generate largely fluent and self-consistent text, their explanations score low on reasonability except for GPT-4, suggesting that most LLMs cannot produce satisfactory explanations of the implicatures in the conversation. Moreover, we find LLMs' performance does not vary significantly by Gricean maxims, suggesting that LLMs do not seem to process implicatures derived from different maxims differently. Our data and code are available at https://github.com/sjtu-compling/llm-pragmatics.

[265]  arXiv:2404.19512 [pdf, other]
Title: Comparison of the high-order Runge-Kutta discontinuous Galerkin method and gas-kinetic scheme for inviscid compressible flow simulations
Subjects: Numerical Analysis (math.NA)

The Runge--Kutta discontinuous Galerkin (RKDG) method is a high-order technique for addressing hyperbolic conservation laws, which has been refined over recent decades and is effective in handling shock discontinuities. Despite its advancements, the RKDG method faces challenges, such as stringent constraints on the explicit time-step size and reduced robustness when dealing with strong discontinuities. On the other hand, the Gas-Kinetic Scheme (GKS) based on a high-order gas evolution model also delivers significant accuracy and stability in solving hyperbolic conservation laws through refined spatial and temporal discretizations. Unlike RKDG, GKS allows for more flexible CFL number constraints and features an advanced flow evolution mechanism at cell interfaces. Additionally, GKS' compact spatial reconstruction enhances the accuracy of the method and its ability to capture stable strong discontinuities effectively. In this study, we conduct a thorough examination of the RKDG method using various numerical fluxes and the GKS method employing both compact and non-compact spatial reconstructions. Both methods are applied under the framework of explicit time discretization and are tested solely in inviscid scenarios. We will present numerous numerical tests and provide a comparative analysis of the outcomes derived from these two computational approaches.

[266]  arXiv:2404.19513 [pdf, ps, other]
Title: A Smartphone-Based Method for Assessing Tomato Nutrient Status through Trichome Density Measurement
Authors: Sho Ueda, Xujun Ye
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Accurately assessing tomato plant nutrient status is crucial for maintaining high yields. Consequently, accurately identifying fertilizer-induced stress through the morphological traits of tomato plants has become a critical agricultural challenge. Research and development efforts have focused on developing noninvasive diagnostic tools for nutrition that leverage a combination of morphological traits and advanced sensor technologies. Given these advancements, detecting fertilizer stress by observing morphological traits near the growth points of tomatoes is still a significant challenge. To address this challenge, we developed a simple and cost-effective smartphone-based method for measuring trichome density. This method involves transferring trichomes from the surface of a leaf onto cellophane tape and capturing images using a smartphone. The images are processed using computer vision techniques to calculate the trichome density. To assess the efficacy of this method, we performed experiments on hydroponically grown tomato plants subjected to varying fertilizer concentrations. Our results indicate that our novel method for measuring trichome density accurately reflects fertilizer stress in tomato plants. The predictive performance of our model, as evaluated by the mean area under the precision recall curve, was 0.824, despite variations in the measurement data caused by differences in optical conditions. This study introduces an innovative approach for designing diagnostic devices for detecting fertilizer stress in plants by considering the surface structures of plants. Our proposed method represents a straightforward, efficient, and economical approach for evaluating the nutrient status of tomato plants and has the potential to overcome the limitations of conventional noncontact optical methods.

[267]  arXiv:2404.19518 [pdf, other]
Title: MGCBS: An Optimal and Efficient Algorithm for Solving Multi-Goal Multi-Agent Path Finding Problem
Comments: to be published in IJCAI2024
Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI); Robotics (cs.RO)

With the expansion of the scale of robotics applications, the multi-goal multi-agent pathfinding (MG-MAPF) problem began to gain widespread attention. This problem requires each agent to visit pre-assigned multiple goal points at least once without conflict. Some previous methods have been proposed to solve the MG-MAPF problem based on Decoupling the goal Vertex visiting order search and the Single-agent pathfinding (DVS). However, this paper demonstrates that the methods based on DVS cannot always obtain the optimal solution. To obtain the optimal result, we propose the Multi-Goal Conflict-Based Search (MGCBS), which is based on Decoupling the goal Safe interval visiting order search and the Single-agent pathfinding (DSS). Additionally, we present the Time-Interval-Space Forest (TIS Forest) to enhance the efficiency of MGCBS by maintaining the shortest paths from any start point at any start time step to each safe interval at the goal points. The experiment demonstrates that our method can consistently obtain optimal results and execute up to 7 times faster than the state-of-the-art method in our evaluation.

[268]  arXiv:2404.19519 [pdf, ps, other]
Title: Generating Robust Counterfactual Witnesses for Graph Neural Networks
Comments: This paper has been accepted by ICDE 2024
Subjects: Machine Learning (cs.LG); Databases (cs.DB)

This paper introduces a new class of explanation structures, called robust counterfactual witnesses (RCWs), to provide robust explanations, both counterfactual and factual, for graph neural networks. Given a graph neural network M, a robust counterfactual witness refers to the fraction of a graph G that is both a counterfactual and a factual explanation of the results of M over G, and that remains so for any "disturbed" G obtained by flipping up to k of its node pairs. We establish the hardness results, from tractable results to co-NP-hardness, for verifying and generating robust counterfactual witnesses. We study such structures for GNN-based node classification, and present efficient algorithms to verify and generate RCWs. We also provide a parallel algorithm to verify and generate RCWs for large graphs with scalability guarantees. We experimentally verify our explanation generation process for benchmark datasets, and showcase their applications.
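
To make the notion of a "disturbed" graph concrete, the following is a minimal Python sketch that enumerates every graph obtained by flipping up to k node pairs (adding the edge if absent, removing it if present). The function name `disturbed_graphs` and the edge-set representation are illustrative assumptions, not taken from the paper, and the sketch ignores any restriction the paper may place on which pairs can be flipped.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Iterator, Set, Tuple


def disturbed_graphs(
    nodes: Iterable[str],
    edges: Iterable[Tuple[str, str]],
    k: int,
) -> Iterator[Set[FrozenSet[str]]]:
    """Yield every undirected edge set reachable by flipping at most k node pairs.

    Hypothetical helper: flipping a pair adds the edge if it is missing
    and removes it if it is present.
    """
    base = {frozenset(edge) for edge in edges}
    pairs = [frozenset(pair) for pair in combinations(list(nodes), 2)]
    for flips in range(k + 1):
        for flipped in combinations(pairs, flips):
            # Symmetric difference toggles exactly the chosen pairs.
            yield base.symmetric_difference(flipped)


# Toy graph: a path a - b - c, disturbed by at most one flip.
variants = list(disturbed_graphs(["a", "b", "c"], [("a", "b"), ("b", "c")], k=1))
print(len(variants))
```

A robustness check in this spirit would test the explanation against each yielded variant; the number of variants grows combinatorially in k, which is consistent with the abstract's interest in hardness results and efficient algorithms.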

[269]  arXiv:2404.19520 [pdf, ps, other]
Title: Passivation of Clustered DC Microgrids with Non-Monotone Loads
Subjects: Systems and Control (eess.SY)

In this paper, we consider the problem of voltage stability in DC networks containing uncertain loads with non-monotone incremental impedances and where the steady-state power availability is restricted to a subset of the buses in the network. We propose controllers for powered buses that guarantee voltage regulation and output strictly equilibrium independent passivity (OS-EIP) of the controlled buses, while buses without power are equipped with controllers that dampen their transient behaviour. The OS-EIP of a cluster containing both bus types is verified through a linear matrix inequality (LMI) condition, and the asymptotic stability of the overall microgrid with uncertain, non-monotone loads is ensured by interconnecting the OS-EIP clusters. By further employing singular perturbation theory, we show that the OS-EIP property of the clusters is robust against certain network parameter and topology changes.

[270]  arXiv:2404.19523 [pdf, other]
Title: TRAC: a tool for data-aware coordination (with an application to smart contracts)
Subjects: Logic in Computer Science (cs.LO)

We propose TRAC, a tool for the specification and verification of coordinated multiparty distributed systems. Relying on finite-state machines (FSMs) where transition labels look like Hoare triples, TRAC can specify the coordination of the participants of a distributed protocol, for instance an execution model akin to blockchain smart contracts (SCs). In fact, the transitions of our FSMs feature guards and assignments over data variables, along with participant binders. The latter allow us to model scenarios with an unbounded number of participants which can vary at run-time. We introduce a notion of well-formedness to rule out meaningless or problematic specifications. This notion is verified with TRAC and demonstrated on several case studies borrowed from the smart contracts domain. Then, we evaluate the performance of TRAC using a set of randomised examples, studying the correlations between the features supported and the time taken to decide well-formedness.

[271]  arXiv:2404.19525 [pdf, other]
Title: MicroDreamer: Zero-shot 3D Generation in $\sim$20 Seconds by Score-based Iterative Reconstruction
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Optimization-based approaches, such as score distillation sampling (SDS), show promise in zero-shot 3D generation but suffer from low efficiency, primarily due to the high number of function evaluations (NFEs) required for each sample. In this paper, we introduce score-based iterative reconstruction (SIR), an efficient and general algorithm for 3D generation with a multi-view score-based diffusion model. Given the images produced by the diffusion model, SIR reduces NFEs by repeatedly optimizing 3D parameters, unlike the single optimization in SDS, mimicking the 3D reconstruction process. With other improvements including optimization in the pixel space, we present an efficient approach called MicroDreamer that generally applies to various 3D representations and 3D generation tasks. In particular, while retaining comparable performance, MicroDreamer is 5-20 times faster than SDS in generating a neural radiance field and takes about 20 seconds to generate meshes from 3D Gaussian splatting on a single A100 GPU, halving the time of the fastest zero-shot baseline, DreamGaussian. Our code is available at https://github.com/ML-GSAI/MicroDreamer.

[272]  arXiv:2404.19527 [pdf, other]
Title: Revealing the Two Sides of Data Augmentation: An Asymmetric Distillation-based Win-Win Solution for Open-Set Recognition
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper, we reveal the two sides of data augmentation: enhancements in closed-set recognition correlate with a significant decrease in open-set recognition. Through empirical investigation, we find that multi-sample-based augmentations would contribute to reducing feature discrimination, thereby diminishing the open-set criteria. Although knowledge distillation could impair the feature via imitation, the mixed feature with ambiguous semantics hinders the distillation. To this end, we propose an asymmetric distillation framework that feeds the teacher model extra raw data to enlarge the benefit of the teacher. Moreover, a joint mutual information loss and a selective relabel strategy are utilized to alleviate the influence of hard mixed samples. Our method successfully mitigates the decline in open-set recognition and outperforms SOTAs by 2%~3% AUROC on the Tiny-ImageNet dataset. Experiments on the large-scale dataset ImageNet-21K demonstrate the generalization of our method.

[273]  arXiv:2404.19531 [pdf, other]
Title: MoST: Multi-modality Scene Tokenization for Motion Prediction
Comments: CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Many existing motion prediction approaches rely on symbolic perception outputs to generate agent trajectories, such as bounding boxes, road graph information and traffic lights. This symbolic representation is a high-level abstraction of the real world, which may render the motion prediction model vulnerable to perception errors (e.g., failures in detecting open-vocabulary obstacles) while missing salient information from the scene context (e.g., poor road conditions). An alternative paradigm is end-to-end learning from raw sensors. However, this approach suffers from the lack of interpretability and requires significantly more training resources. In this work, we propose tokenizing the visual world into a compact set of scene elements and then leveraging pre-trained image foundation models and LiDAR neural networks to encode all the scene elements in an open-vocabulary manner. The image foundation model enables our scene tokens to encode the general knowledge of the open world while the LiDAR neural network encodes geometry information. Our proposed representation can efficiently encode the multi-frame multi-modality observations with a few hundred tokens and is compatible with most transformer-based architectures. To evaluate our method, we have augmented Waymo Open Motion Dataset with camera embeddings. Experiments over Waymo Open Motion Dataset show that our approach leads to significant performance improvements over the state-of-the-art.

[274]  arXiv:2404.19532 [pdf, other]
Title: Optimized Soft-Aided Decoding of OFEC and Staircase Codes
Comments: submitted to ECOC 2024
Subjects: Information Theory (cs.IT)

We propose a novel soft-aided hard-decision decoding algorithm for general product-like codes. It achieves error correcting performance similar to that of a soft-decision turbo decoder for staircase and OFEC codes, while maintaining a low complexity.

[275]  arXiv:2404.19534 [pdf, other]
Title: MIPI 2024 Challenge on Nighttime Flare Removal: Methods and Results
Comments: CVPR 2024 Mobile Intelligent Photography and Imaging (MIPI) Workshop--Nighttime Flare Removal Challenge Report. Website: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photography and imaging (MIPI). Building on the achievements of the previous MIPI Workshops held at ECCV 2022 and CVPR 2023, we introduce our third MIPI challenge including three tracks focusing on novel image sensors and imaging algorithms. In this paper, we summarize and review the Nighttime Flare Removal track on MIPI 2024. In total, 170 participants were successfully registered, and 14 teams submitted results in the final testing phase. The developed solutions in this challenge achieved state-of-the-art performance on Nighttime Flare Removal. More details of this challenge and the link to the dataset can be found at https://mipi-challenge.org/MIPI2024/.

[276]  arXiv:2404.19536 [pdf, other]
Title: Physics-Informed Machine Learning On Polar Ice: A Survey
Subjects: Machine Learning (cs.LG)

The mass loss of the polar ice sheets contributes considerably to ongoing sea-level rise and changing ocean circulation, leading to coastal flooding and risking the homes and livelihoods of tens of millions of people globally. To address the complex problem of ice behavior, physical models and data-driven models have been proposed in the literature. Although traditional physical models can guarantee physically meaningful results, they have limitations in producing high-resolution results. On the other hand, data-driven approaches require large amounts of high-quality and labeled data, which is rarely available in the polar regions. Hence, as a promising framework that leverages the advantages of physical models and data-driven methods, physics-informed machine learning (PIML) has been widely studied in recent years. In this paper, we review the existing algorithms of PIML, provide our own taxonomy based on the methods of combining physics and data-driven approaches, and analyze the advantages of PIML in the aspects of accuracy and efficiency. Further, our survey discusses some current challenges and highlights future opportunities, including PIML on sea ice studies, PIML with different combination methods and backbone networks, and neural operator methods.

[277]  arXiv:2404.19541 [pdf, other]
Title: Ultra Inertial Poser: Scalable Motion Capture and Tracking from Sparse Inertial Sensors and Ultra-Wideband Ranging
Comments: Accepted by SIGGRAPH 2024, Code: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Signal Processing (eess.SP)

While camera-based capture systems remain the gold standard for recording human motion, learning-based tracking systems based on sparse wearable sensors are gaining popularity. Most commonly, they use inertial sensors, whose propensity for drift and jitter has so far limited tracking accuracy. In this paper, we propose Ultra Inertial Poser, a novel 3D full body pose estimation method that constrains drift and jitter in inertial tracking via inter-sensor distances. We estimate these distances across sparse sensor setups using a lightweight embedded tracker that augments inexpensive off-the-shelf 6D inertial measurement units with ultra-wideband radio-based ranging, dynamically and without the need for stationary reference anchors. Our method then fuses these inter-sensor distances with the 3D states estimated from each sensor. Our graph-based machine learning model processes the 3D states and distances to estimate a person's 3D full body pose and translation. To train our model, we synthesize inertial measurements and distance estimates from the motion capture database AMASS. For evaluation, we contribute a novel motion dataset of 10 participants who performed 25 motion types, captured by 6 wearable IMU+UWB trackers and an optical motion capture system, totaling 200 minutes of synchronized sensor data (UIP-DB). Our extensive experiments show state-of-the-art performance for our method over PIP and TIP, reducing position error from $13.62$ to $10.65cm$ ($22\%$ better) and lowering jitter from $1.56$ to $0.055km/s^3$ (a reduction of $97\%$).

[278]  arXiv:2404.19542 [pdf, other]
Title: One-Stage Open-Vocabulary Temporal Action Detection Leveraging Temporal Multi-scale and Action Label Features
Comments: The 18th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2024)
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Open-vocabulary Temporal Action Detection (Open-vocab TAD) is an advanced video analysis approach that expands Closed-vocabulary Temporal Action Detection (Closed-vocab TAD) capabilities. Closed-vocab TAD is typically confined to localizing and classifying actions based on a predefined set of categories. In contrast, Open-vocab TAD goes further and is not limited to these predefined categories. This is particularly useful in real-world scenarios where the variety of actions in videos can be vast and not always predictable. The prevalent methods in Open-vocab TAD typically employ a 2-stage approach, which involves generating action proposals and then identifying those actions. However, errors made during the first stage can adversely affect the subsequent action identification accuracy. Additionally, existing studies face challenges in handling actions of different durations owing to the use of fixed temporal processing methods. Therefore, we propose a 1-stage approach consisting of two primary modules: Multi-scale Video Analysis (MVA) and Video-Text Alignment (VTA). The MVA module captures actions at varying temporal resolutions, overcoming the challenge of detecting actions with diverse durations. The VTA module leverages the synergy between visual and textual modalities to precisely align video segments with corresponding action labels, a critical step for accurate action identification in Open-vocab scenarios. Evaluations on the widely recognized datasets THUMOS14 and ActivityNet-1.3 showed that the proposed method achieved superior results compared to the other methods in both Open-vocab and Closed-vocab settings. This serves as a strong demonstration of the effectiveness of the proposed method in the TAD task.

[279]  arXiv:2404.19543 [pdf, other]
Title: RAG and RAU: A Survey on Retrieval-Augmented Language Model in Natural Language Processing
Authors: Yucheng Hu, Yuxing Lu
Comments: 30 pages, 7 figures. Draft version 1
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Large Language Models (LLMs) have catalyzed significant advancements in Natural Language Processing (NLP), yet they encounter challenges such as hallucination and the need for domain-specific knowledge. To mitigate these, recent methodologies have integrated information retrieved from external resources with LLMs, substantially enhancing their performance across NLP tasks. This survey paper addresses the absence of a comprehensive overview on Retrieval-Augmented Language Models (RALMs), both Retrieval-Augmented Generation (RAG) and Retrieval-Augmented Understanding (RAU), providing an in-depth examination of their paradigm, evolution, taxonomy, and applications. The paper discusses the essential components of RALMs, including Retrievers, Language Models, and Augmentations, and how their interactions lead to diverse model structures and applications. RALMs demonstrate utility in a spectrum of tasks, from translation and dialogue systems to knowledge-intensive applications. The survey includes several evaluation methods of RALMs, emphasizing the importance of robustness, accuracy, and relevance in their assessment. It also acknowledges the limitations of RALMs, particularly in retrieval quality and computational efficiency, offering directions for future research. In conclusion, this survey aims to offer a structured insight into RALMs, their potential, and the avenues for their future development in NLP. The paper is supplemented with a Github Repository containing the surveyed works and resources for further study: https://github.com/2471023025/RALM_Survey.

[280]  arXiv:2404.19545 [pdf, ps, other]
Title: Discrete de-Rham complex involving a discontinuous finite element space for velocities: the case of periodic straight triangular and Cartesian meshes
Authors: Vincent Perrier (CAGIRE, LMAP)
Subjects: Numerical Analysis (math.NA)

The aim of this article is to derive discontinuous finite element vector spaces which can be put in a discrete de-Rham complex for which a harmonic gap property may be proven. First, discontinuous finite element spaces inspired by classical Nédélec or Raviart-Thomas conforming spaces are considered, and we prove that by relaxing the normal or tangential constraint, discontinuous spaces ensuring the harmonic gap property can be built. Then the triangular case is addressed, for which we prove that such a property holds for the classical discontinuous finite element space for vectors. On Cartesian meshes, this result does not hold for the classical discontinuous finite element space for vectors. We then show how to use the de-Rham complex found for triangular meshes for enriching the finite element space on Cartesian meshes in order to recover a de-Rham complex, on which the same harmonic gap property is proven.

[281]  arXiv:2404.19546 [pdf, ps, other]
Title: Designing Technology for Positive Solitude
Comments: Published in Modern Behavioral Science Journal, 1(1), 2013
Subjects: Human-Computer Interaction (cs.HC)

This paper discusses Life-Based Design (LBD) methodology within the context of designing technologies for reaching a state of solitude, the state where a person wishes to minimize her social contacts to get space or freedom.

[282]  arXiv:2404.19547 [pdf, other]
Title: Distributed Traffic Signal Control via Coordinated Maximum Pressure-plus-Penalty
Subjects: Systems and Control (eess.SY); Multiagent Systems (cs.MA); Optimization and Control (math.OC)

This paper develops an adaptive traffic control policy inspired by Maximum Pressure (MP) while imposing coordination across intersections. The proposed Coordinated Maximum Pressure-plus-Penalty (CMPP) control policy features a local objective for each intersection that consists of the total pressure within the neighborhood and a penalty accounting for the queue capacities and continuous green time for certain movements. The corresponding control task is reformulated as a distributed optimization problem and solved via two customized algorithms: one based on the alternating direction method of multipliers (ADMM) and the other following a greedy heuristic augmented with a majority vote. CMPP not only provides a theoretical guarantee of queuing network stability but also outperforms several benchmark controllers in simulations on a large-scale real traffic network, with lower average travel and waiting time per vehicle as well as less network congestion. Furthermore, CMPP with the greedy algorithm enjoys computational efficiency comparable to that of fully decentralized controllers without significantly compromising the control performance, which highlights its great potential for real-world deployment.
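
For readers unfamiliar with the Maximum Pressure idea that CMPP builds on, here is a minimal Python sketch of a basic, fully decentralized MP phase selection at a single intersection. It is a simplification under stated assumptions: unit saturation flows, no turning ratios, and none of the neighborhood coordination or penalty terms described above. All names (`movement_pressure`, `max_pressure_phase`, the link labels) are hypothetical.

```python
from typing import Dict, List, Tuple

# A movement is a pair (upstream link, downstream link).
Movement = Tuple[str, str]


def movement_pressure(queues: Dict[str, float], movement: Movement) -> float:
    """Pressure of one movement: upstream queue minus downstream queue.

    Assumes unit saturation flow; missing links count as empty.
    """
    upstream, downstream = movement
    return queues.get(upstream, 0.0) - queues.get(downstream, 0.0)


def max_pressure_phase(
    queues: Dict[str, float],
    phases: Dict[str, List[Movement]],
) -> str:
    """Return the phase whose served movements have the largest total pressure."""
    return max(
        phases,
        key=lambda name: sum(movement_pressure(queues, m) for m in phases[name]),
    )


# Illustrative queue lengths (vehicles) on incoming and outgoing links.
queues = {
    "north_in": 12, "south_in": 9, "east_in": 4, "west_in": 3,
    "north_out": 2, "south_out": 1, "east_out": 0, "west_out": 5,
}
phases = {
    "north_south": [("north_in", "south_out"), ("south_in", "north_out")],
    "east_west": [("east_in", "west_out"), ("west_in", "east_out")],
}
print(max_pressure_phase(queues, phases))  # north_south (pressure 18 vs 2)
```

The sketch decides using only local queue lengths. According to the abstract, CMPP instead optimizes an objective over each intersection's neighborhood and adds a penalty for queue capacities and continuous green time.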

[283]  arXiv:2404.19548 [pdf, ps, other]
Title: An Extensive Survey of Digital Image Steganography: State of the Art
Subjects: Emerging Technologies (cs.ET); Cryptography and Security (cs.CR)

The need to protect the privacy of sensitive information during information exchange over the internet/intranet has led to wider adoption of cryptography and steganography. Cryptographic approaches convert the information into an unreadable format; however, this draws the attention of cryptanalysts owing to the unusual, random-looking flow of bytes compared with the structured bytes normally seen on a computer. Steganography, in contrast, conceals the very existence of covert communication using digital media. Although any digital media (text, image, video, audio) can convey sensitive information, media with more redundant bits are more favorable for embedding sensitive information without distorting the media. Digital images are used more often than other media to convey sensitive information owing to their higher tolerance of distortion, high availability, and smaller sizes with many redundant bits. However, maximizing the use of redundant bits for optimum embedding of secret information has been a paramount issue because of the imperceptibility prerequisite, which deteriorates as payload increases, resulting in a tradeoff. This has limited steganography to applications with lower payload requirements, thus limiting its adoption for wider deployment. This paper critically analyzes the current steganographic techniques, recent trends, and challenges.

[284]  arXiv:2404.19552 [pdf, ps, other]
Title: Type-Based Unsourced Multiple Access
Comments: submitted to the 25th IEEE International Workshop on Signal Processing Advances in Wireless Communications (SPAWC)
Subjects: Information Theory (cs.IT)

We generalize the type-based multiple access framework proposed by Mergen and Tong (2006) to the case of unsourced multiple access. In the proposed framework, each device tracks the state of a physical/digital process, quantizes this state, and communicates it to a common receiver through a shared channel in an uncoordinated manner. The receiver aims to estimate the type of the states, i.e., the set of states and their multiplicity in the sequence of states reported by all devices. We measure the type estimation error using the Wasserstein distance. Considering an example of multi-target position tracking, we show that type estimation can be performed effectively via approximate message passing. Furthermore, we determine the quantization resolution that minimizes the type estimation error by balancing quantization distortion and communication error.
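
As a small illustration of the quantities involved, the following Python sketch computes an empirical type from a list of quantized states and the 1-Wasserstein distance between two such types on a one-dimensional grid. It is an assumption-laden toy: states are integer quantization indices, types are normalized to fractions so both have unit mass, and the function names `empirical_type` and `wasserstein_1d` are hypothetical. It does not model the channel or the approximate message passing estimator.

```python
from collections import Counter
from typing import Dict, Sequence


def empirical_type(states: Sequence[int]) -> Dict[int, float]:
    """Map each distinct quantized state to the fraction of devices reporting it."""
    counts = Counter(states)
    total = len(states)
    return {state: count / total for state, count in counts.items()}


def wasserstein_1d(p: Dict[int, float], q: Dict[int, float], step: float = 1.0) -> float:
    """1-Wasserstein distance between two unit-mass distributions on a 1-D grid.

    In one dimension this equals the integral of the absolute difference
    between the two cumulative distribution functions.
    """
    support = sorted(set(p) | set(q))
    distance = 0.0
    cdf_gap = 0.0
    for left, right in zip(support, support[1:]):
        cdf_gap += p.get(left, 0.0) - q.get(left, 0.0)
        distance += abs(cdf_gap) * (right - left) * step
    return distance


true_type = empirical_type([0, 0, 1, 3])       # states actually held by devices
estimated_type = empirical_type([0, 1, 1, 3])  # one device's state decoded wrongly
print(wasserstein_1d(true_type, estimated_type))  # 0.25
```

Here a quarter of the mass is moved by one quantization cell, so the distance is 0.25. A finer quantization grid shrinks the cell width `step` but, as the abstract notes, must be balanced against a higher chance of communication errors.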

[285]  arXiv:2404.19553 [pdf, other]
Title: Extending Llama-3's Context Ten-Fold Overnight
Subjects: Computation and Language (cs.CL)

We extend the context length of Llama-3-8B-Instruct from 8K to 80K via QLoRA fine-tuning. The entire training cycle is highly efficient, taking 8 hours on one 8xA800 (80G) GPU machine. The resulting model exhibits superior performance across a broad range of evaluation tasks, such as NIHS, topic retrieval, and long-context language understanding; meanwhile, it also preserves well the original capability over short contexts. The dramatic context extension is mainly attributed to merely 3.5K synthetic training samples generated by GPT-4, which indicates the LLMs' inherent (yet largely underestimated) potential to extend their original context length. In fact, the context length could be extended far beyond 80K with more computation resources. Therefore, the team will publicly release the entire resources (including data, model, data generation pipeline, training code) so as to facilitate future research from the community: https://github.com/FlagOpen/FlagEmbedding.

[286]  arXiv:2404.19555 [pdf, other]
Title: Transforming Credit Guarantee Schemes with Distributed Ledger Technology
Subjects: Computational Engineering, Finance, and Science (cs.CE); General Economics (econ.GN)

Credit Guarantee Schemes (CGSs) are crucial in mitigating SMEs' financial constraints. However, they are known to suffer from critical shortcomings, such as a lack of financial sustainability and operational efficiency. Distributed Ledger Technologies (DLTs) have shown significant revolutionary influence in several sectors, including finance and banking, thanks to the full operational traceability they bring alongside verifiable computation. Nevertheless, the potential synergy between DLTs and CGSs has not been thoroughly investigated yet. This paper proposes a comprehensive framework to utilise DLTs, particularly blockchain technologies, in CGS processes to improve operational efficiency and effectiveness. To this end, we compare key architectural characteristics considering access level, governance structure, and consensus method, to examine their fit with CGS processes. We believe this study can guide policymakers and stakeholders, thereby stimulating further innovation in this promising field.

[287]  arXiv:2404.19559 [pdf, other]
Title: Computational study of numerical flux schemes for mesoscale atmospheric flows in a Finite Volume framework
Comments: 22 pages
Subjects: Numerical Analysis (math.NA); Atmospheric and Oceanic Physics (physics.ao-ph)

We develop, and implement in a Finite Volume environment, a density-based approach for the Euler equations written in conservative form using density, momentum, and total energy as variables. Under simplifying assumptions, these equations are used to describe non-hydrostatic atmospheric flow. The well-balancing of the approach is ensured by a local hydrostatic reconstruction updated at runtime during the simulation to keep the numerical error under control. To approximate the solution of the Riemann problem, we consider four methods: Roe-Pike, HLLC, AUSM+-up and HLLC-AUSM. We assess our density-based approach and compare the accuracy of these four approximate Riemann solvers using two classical benchmarks, namely the smooth rising thermal bubble and the density current.

[288]  arXiv:2404.19563 [pdf, other]
Title: RepEval: Effective Text Evaluation with LLM Representation
Subjects: Computation and Language (cs.CL)

Automatic evaluation metrics for generated texts play an important role in the NLG field, especially with the rapid growth of LLMs. However, existing metrics are often limited to specific scenarios, making it challenging to meet the evaluation requirements of expanding LLM applications. Therefore, there is a demand for new, flexible, and effective metrics. In this study, we introduce RepEval, the first metric leveraging the projection of LLM representations for evaluation. RepEval requires minimal sample pairs for training, and through simple prompt modifications, it can easily transition to various tasks. Results on ten datasets from three tasks demonstrate the high effectiveness of our method, which exhibits stronger correlations with human judgments compared to previous metrics, even outperforming GPT-4. Our work underscores the richness of information regarding text quality embedded within LLM representations, offering insights for the development of new metrics.

[289]  arXiv:2404.19564 [pdf, other]
Title: Time, Travel, and Energy in the Uniform Dispersion Problem
Comments: Includes and expands results from "Minimizing Travel in the Uniform Dispersal Problem for Robotic Sensors" (AAMAS 2019, arXiv:1903.03259)
Subjects: Robotics (cs.RO); Discrete Mathematics (cs.DM); Multiagent Systems (cs.MA)

We investigate the algorithmic problem of uniformly dispersing a swarm of robots in an unknown, gridlike environment. In this setting, our goal is to comprehensively study the relationships between performance metrics and robot capabilities. We introduce a formal model comparing dispersion algorithms based on makespan, traveled distance, energy consumption, sensing, communication, and memory. Using this framework, we classify several uniform dispersion algorithms according to their capability requirements and performance. We prove that while makespan and travel can be minimized in all environments, energy cannot, as long as the swarm's sensing range is bounded. In contrast, we show that energy can be minimized even by simple, "ant-like" robots in synchronous settings and asymptotically minimized in asynchronous settings, provided the environment is topologically simply connected. Our findings offer insights into fundamental limitations that arise when designing swarm robotics systems for exploring unknown environments, highlighting the impact of the environment's topology on the feasibility of energy-efficient dispersion.

[290]  arXiv:2404.19567 [pdf, other]
Title: Causal Perception Inspired Representation Learning for Trustworthy Image Quality Assessment
Authors: Lei Wang, Desen Yuan
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Despite great success in modeling visual perception, deep neural network based image quality assessment (IQA) still remains unreliable in real-world applications due to its vulnerability to adversarial perturbations and the inexplicit black-box structure. In this paper, we propose to build a trustworthy IQA model via Causal Perception inspired Representation Learning (CPRL), and a score reflection attack method for IQA models. More specifically, we assume that each image is composed of Causal Perception Representation (CPR) and non-causal perception representation (N-CPR). CPR serves as the causation of the subjective quality label, which is invariant to the imperceptible adversarial perturbations. Conversely, N-CPR presents spurious associations with the subjective quality label, which may significantly change with the adversarial perturbations. To extract the CPR from each input image, we develop a soft ranking based channel-wise activation function to mediate the causally sufficient (beneficial for high prediction accuracy) and necessary (beneficial for high robustness) deep features, and employ an intervention-based minimax game for optimization. Experiments on four benchmark databases show that the proposed CPRL method outperforms many state-of-the-art adversarial defense methods and provides explicit model interpretation.

[291]  arXiv:2404.19569 [pdf, other]
Title: Consensus + Innovations Approach for Online Distributed Multi-Area Inertia Estimation
Subjects: Systems and Control (eess.SY)

The reduction of overall system inertia in modern power systems due to the increasing deployment of distributed energy resources is generally recognized as a major issue for system stability. Consequently, real-time monitoring of system inertia is critical to ensure a reliable and cost-effective system operation. Large-scale power systems are typically managed by multiple transmission system operators, making it difficult to have a central entity with access to global measurement data, which is usually required for estimating the overall system inertia. We address this problem by proposing a fully distributed inertia estimation algorithm with rigorous analytical convergence guarantees. This method requires only peer-to-peer sharing of local parameter estimates between neighboring control areas, eliminating the need for a centralized collection of real-time measurements. We robustify the algorithm in the presence of typical power system disturbances and demonstrate its performance in simulations based on the well-known New England IEEE-39 bus system.
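
The flavor of a consensus + innovations scheme can be sketched in a few lines of Python. This is a generic textbook-style scalar update under simplifying assumptions (noise-free local measurements of one shared parameter, constant gains `alpha` and `beta`, a fixed three-area line graph); it is not the paper's inertia estimator, and every name and number in it is hypothetical.

```python
from typing import Dict, List


def consensus_innovations_step(
    estimates: List[float],
    neighbors: Dict[int, List[int]],
    measurements: List[float],
    alpha: float,
    beta: float,
) -> List[float]:
    """One synchronous update of every agent's local estimate.

    Consensus term: pulls agent i toward its neighbors' estimates.
    Innovation term: pulls agent i toward its own local measurement.
    """
    updated = []
    for i, x_i in enumerate(estimates):
        consensus = sum(x_i - estimates[j] for j in neighbors[i])
        innovation = measurements[i] - x_i
        updated.append(x_i - beta * consensus + alpha * innovation)
    return updated


# Three control areas on a line graph: 0 - 1 - 2 (peer-to-peer links only).
neighbors = {0: [1], 1: [0, 2], 2: [1]}
measurements = [5.0, 5.0, 5.0]  # assumed noise-free local observations
estimates = [0.0, 10.0, 2.0]    # arbitrary initial guesses

for _ in range(200):
    estimates = consensus_innovations_step(
        estimates, neighbors, measurements, alpha=0.1, beta=0.2
    )
print([round(x, 6) for x in estimates])
```

Each agent exchanges only its current estimate with direct neighbors, which mirrors the peer-to-peer sharing of local parameter estimates described above. With noisy measurements, schemes of this kind typically use decaying gains rather than the constant ones assumed here.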

[292]  arXiv:2404.19573 [pdf, other]
Title: War Elephants: Rethinking Combat AI and Human Oversight
Comments: 15 pages, 2 figures
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)

This paper explores the changes that pervasive AI is bringing to the nature of combat. We look beyond the substitution of AI for experts to an approach where complementary human and machine abilities are blended. Using historical and modern examples, we show how autonomous weapons systems can be effectively managed by teams of human "AI Operators" combined with AI/ML "Proxy Operators." By basing our approach on the principles of complementation, we provide for a flexible and dynamic approach to managing lethal autonomous systems. We conclude by presenting a path to achieving an integrated vision of machine-speed combat where the battlefield AI is operated by AI Operators that watch for patterns of behavior within the battlefield to assess the performance of lethal autonomous systems. This approach enables the development of combat systems that are likely to be more ethical, operate at machine speed, and are capable of responding to a broader range of dynamic battlefield conditions than any purely autonomous AI system could support.

[293]  arXiv:2404.19574 [pdf, ps, other]
Title: A Spatio-Temporal based Frame Indexing Algorithm for QoS Improvement in Live Low-Motion Video Streaming
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Real-time live video streaming of events over a network continues to gain popularity. However, there is a need to ensure the judicious utilization of allocated bandwidth without compromising the Quality of Service (QoS) of the system. In this regard, this paper presents an approach based on spatio-temporal frame indexing that detects and eliminates redundancy within and across captured frames prior to transmission from the server to clients. Standard and local low-motion videos were the two scenarios considered in evaluating the performance of the proposed algorithm. The results show that the proposed approach achieved improvements of 5.13%, 15.8% and 5%, 15.6% in terms of the buffer size and compression ratio. This comes with a tradeoff in frame-built time, where both the standard and local frame indexing outperform the proposed scheme, by 10.8% and 8.71% respectively.

[294]  arXiv:2404.19578 [pdf, ps, other]
Title: New EVENODD+ Codes with More Flexible Parameters and Lower Complexity
Authors: Panyu Zhu
Subjects: Information Theory (cs.IT)

EVENODD+ codes are binary maximum distance separable (MDS) array codes for correcting double disk failures in RAID-6 with asymptotically optimal encoding/decoding/update complexities. However, the number of bits stored in each disk of EVENODD+ codes must be an odd number minus one. In this paper, we present a new construction of EVENODD+ codes that have more flexible parameters. The number of bits stored in each disk of our codes is an odd number minus one, multiplied by any positive integer. Moreover, our codes not only have asymptotically optimal encoding/decoding/update complexities but also have lower encoding/decoding/update complexities than the existing EVENODD+ codes.

[295]  arXiv:2404.19582 [pdf, other]
Title: Leveraging Label Information for Stealthy Data Stealing in Vertical Federated Learning
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)

We develop DMAVFL, a novel attack strategy that evades current detection mechanisms. The key idea is to integrate a discriminator with an auxiliary classifier that takes full advantage of the label information (which was completely ignored in previous attacks): on one hand, label information helps to better characterize embeddings of samples from distinct classes, yielding improved reconstruction performance; on the other hand, computing malicious gradients with label information better mimics honest training, making the malicious gradients indistinguishable from the honest ones and the attack much more stealthy. Our comprehensive experiments demonstrate that DMAVFL significantly outperforms existing attacks and successfully circumvents SOTA defenses for malicious attacks. Additional ablation studies and evaluations on other defenses further underscore the robustness and effectiveness of DMAVFL.

[296]  arXiv:2404.19585 [pdf, other]
Title: Integrating Visuo-tactile Sensing with Haptic Feedback for Teleoperated Robot Manipulation
Subjects: Robotics (cs.RO)

Telerobotics enables humans to overcome spatial constraints and allows them to physically interact with the environment in remote locations. However, the sensory feedback provided by the system to the operator is often purely visual, limiting the operator's dexterity in manipulation tasks. In this work, we address this issue by equipping the robot's end-effector with high-resolution visuotactile GelSight sensors. Using low-cost MANUS-Gloves, we provide the operator with haptic feedback about forces acting at the points of contact in the form of vibration signals. We propose two different methods for estimating these forces: one based on estimating the movement of markers on the sensor surface and one based on deep learning. Additionally, we integrate our system into a virtual-reality teleoperation pipeline in which a human operator controls both arms of a Tiago robot while receiving visual and haptic feedback. We believe that integrating haptic feedback is a crucial step for dexterous manipulation in teleoperated robotic systems.

[297]  arXiv:2404.19586 [pdf, other]
Title: AI techniques for near real-time monitoring of contaminants in coastal waters on board future Phisat-2 mission
Comments: 11 pages, 9 figures, submitted to IEEE JSTARS
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Unlike conventional procedures, the proposed solution advocates for a groundbreaking paradigm in water quality monitoring through the integration of satellite Remote Sensing (RS) data, Artificial Intelligence (AI) techniques, and onboard processing. The objective is to offer nearly real-time detection of contaminants in coastal waters, addressing a significant gap in the existing literature. Moreover, the expected outcomes include substantial advancements in environmental monitoring, public health protection, and resource conservation. The specific focus of our study is on the estimation of Turbidity and pH parameters, because of their implications for human and aquatic health. Nevertheless, the designed framework can be extended to include other parameters of interest in the water environment and beyond. Originating from our participation in the European Space Agency (ESA) OrbitalAI Challenge, this article describes the distinctive opportunities and issues for contaminant monitoring on the Phisat-2 mission. The specific characteristics of this mission, and the tools made available, are presented together with the methodology proposed by the authors for the onboard monitoring of water contaminants in near real-time. Promising preliminary results are discussed, and in-progress and future work is introduced.

[298]  arXiv:2404.19591 [pdf, other]
Title: Towards Interactively Improving ML Data Preparation Code via "Shadow Pipelines"
Subjects: Databases (cs.DB); Machine Learning (cs.LG); Software Engineering (cs.SE)

Data scientists develop ML pipelines in an iterative manner: they repeatedly screen a pipeline for potential issues, debug it, and then revise and improve its code according to their findings. However, this manual process is tedious and error-prone. Therefore, we propose to support data scientists during this development cycle with automatically derived interactive suggestions for pipeline improvements. We discuss our vision to generate these suggestions with so-called shadow pipelines, hidden variants of the original pipeline that modify it to auto-detect potential issues, try out modifications for improvements, and suggest and explain these modifications to the user. We envision applying incremental view maintenance-based optimisations to ensure low-latency computation and maintenance of the shadow pipelines. We conduct preliminary experiments to showcase the feasibility of our envisioned approach and the potential benefits of our proposed optimisations.

[299]  arXiv:2404.19594 [pdf, other]
Title: Reactive Temporal Logic-based Planning and Control for Interactive Robotic Tasks
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

Robots interacting with humans must be safe and reactive, and must adapt online to unforeseen environmental and task changes. Achieving these requirements concurrently is a challenge as interactive planners lack formal safety guarantees, while safe motion planners lack flexibility to adapt. To tackle this, we propose a modular control architecture that generates both safe and reactive motion plans for human-robot interaction by integrating temporal logic-based discrete task level plans with continuous Dynamical System (DS)-based motion plans. We formulate a reactive temporal logic formula that enables users to define task specifications through structured language, and propose a planning algorithm at the task level that generates a sequence of desired robot behaviors while being adaptive to environmental changes. At the motion level, we incorporate control Lyapunov functions and control barrier functions to compute stable and safe continuous motion plans for two types of robot behaviors: (i) complex, possibly periodic motions given by autonomous DS and (ii) time-critical tasks specified by Signal Temporal Logic (STL). Our methodology is demonstrated on the Franka robot arm performing wiping tasks on a whiteboard and a mannequin that is compliant to human interactions and adaptive to environmental changes.

[300]  arXiv:2404.19595 [pdf, other]
Title: Perceptual Constancy Constrained Single Opinion Score Calibration for Image Quality Assessment
Authors: Lei Wang, Desen Yuan
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

In this paper, we propose a highly efficient method to estimate an image's mean opinion score (MOS) from a single opinion score (SOS). Assuming that each SOS is the observed sample of a normal distribution and the MOS is its unknown expectation, the MOS inference is formulated as a maximum likelihood estimation problem, where the perceptual correlation of pairwise images is considered in modeling the likelihood of SOS. More specifically, by means of the quality-aware representations learned from the self-supervised backbone, we introduce a learnable relative quality measure to predict the MOS difference between two images. Then, the current image's maximum likelihood estimation towards MOS is represented by the sum of another reference image's estimated MOS and their relative quality. Ideally, no matter which image is selected as the reference, the MOS of the current image should remain unchanged, which is termed perceptual constancy constrained calibration (PC3). Finally, we alternately optimize the relative quality measure's parameter and the current image's estimated MOS via backpropagation and Newton's method respectively. Experiments show that the proposed method is efficient in calibrating the biased SOS and significantly improves IQA model learning when only SOSs are available.

[301]  arXiv:2404.19596 [pdf, other]
Title: Debiased Collaborative Filtering with Kernel-Based Causal Balancing
Comments: ICLR 24 Spotlight
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)

Debiased collaborative filtering aims to learn an unbiased prediction model by removing different biases in observational datasets. To solve this problem, one of the simple and effective methods is based on the propensity score, which adjusts the observational sample distribution to the target one by reweighting observed instances. Ideally, propensity scores should be learned with causal balancing constraints. However, existing methods usually ignore such constraints or implement them with unreasonable approximations, which may affect the accuracy of the learned propensity scores. To bridge this gap, in this paper, we first analyze the gaps between the causal balancing requirements and existing methods such as learning the propensity with cross-entropy loss or manually selecting functions to balance. Inspired by these gaps, we propose to approximate the balancing functions in reproducing kernel Hilbert space and demonstrate that, based on the universal property and representer theorem of kernel functions, the causal balancing constraints can be better satisfied. Meanwhile, we propose an algorithm that adaptively balances the kernel function and theoretically analyze the generalization error bound of our methods. We conduct extensive experiments to demonstrate the effectiveness of our methods, and to promote this research direction, we have released our project at https://github.com/haoxuanli-pku/ICLR24-Kernel-Balancing.
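
The propensity-score reweighting mentioned in the abstract above can be illustrated with a minimal sketch of the standard inverse propensity scoring (IPS) estimator. This is not the paper's kernel-balancing method; the function name `ips_loss` and its arguments are hypothetical, and the propensities are assumed to be given rather than learned.

```python
def ips_loss(errors, observed, propensities):
    """Inverse-propensity-weighted average of per-pair prediction errors.

    errors[k]       -- prediction error for user-item pair k (only used if observed)
    observed[k]     -- 1 if the pair's rating was observed, else 0
    propensities[k] -- assumed probability that pair k is observed (must be > 0)
    """
    if not (len(errors) == len(observed) == len(propensities)):
        raise ValueError("all inputs must have the same length")
    total = 0.0
    for error, is_observed, propensity in zip(errors, observed, propensities):
        if propensity <= 0:
            raise ValueError("propensities must be strictly positive")
        if is_observed:
            # Rarely observed pairs get a larger weight, so the observed
            # sample is reweighted toward the full user-item population.
            total += error / propensity
    # Normalise by the number of all pairs, observed or not.
    return total / len(errors)


# Hypothetical toy data: four user-item pairs, three of them observed.
toy_errors = [1.0, 2.0, 3.0, 4.0]
toy_observed = [1, 1, 1, 0]
toy_propensities = [0.5, 0.5, 0.25, 0.25]
toy_loss = ips_loss(toy_errors, toy_observed, toy_propensities)
```

The estimate is only as good as the propensities fed into it, which is the motivation the abstract gives for learning them under causal balancing constraints.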

[302]  arXiv:2404.19597 [pdf, other]
Title: Transferring Troubles: Cross-Lingual Transferability of Backdoor Attacks in LLMs with Instruction Tuning
Comments: work in progress
Subjects: Computation and Language (cs.CL); Cryptography and Security (cs.CR)

The implications of backdoor attacks on English-centric large language models (LLMs) have been widely examined - such attacks can be achieved by embedding malicious behaviors during training and activated under specific conditions that trigger malicious outputs. However, the impact of backdoor attacks on multilingual models remains under-explored. Our research focuses on cross-lingual backdoor attacks against multilingual LLMs, particularly investigating how poisoning the instruction-tuning data in one or two languages can affect the outputs in languages whose instruction-tuning data was not poisoned. Despite its simplicity, our empirical analysis reveals that our method exhibits remarkable efficacy in models like mT5, BLOOM, and GPT-3.5-turbo, with high attack success rates, surpassing 95% in several languages across various scenarios. Alarmingly, our findings also indicate that larger models show increased susceptibility to transferable cross-lingual backdoor attacks, which also applies to LLMs predominantly pre-trained on English data, such as Llama2, Llama3, and Gemma. Moreover, our experiments show that triggers can still work even after paraphrasing, and the backdoor mechanism proves highly effective in cross-lingual response settings across 25 languages, achieving an average attack success rate of 50%. Our study aims to highlight the vulnerabilities and significant security risks present in current multilingual LLMs, underscoring the emergent need for targeted security measures.

[303]  arXiv:2404.19605 [pdf, other]
Title: Data-Driven Invertible Neural Surrogates of Atmospheric Transmission
Comments: Manuscript accepted for presentation and publication at the 2024 IEEE International Geoscience and Remote Sensing Symposium (IGARSS)
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Atmospheric and Oceanic Physics (physics.ao-ph)

We present a framework for inferring an atmospheric transmission profile from a spectral scene. This framework leverages a lightweight, physics-based simulator that is automatically tuned - by virtue of autodifferentiation and differentiable programming - to construct a surrogate atmospheric profile to model the observed data. We demonstrate the utility of the methodology by (i) performing atmospheric correction, (ii) recasting spectral data between various modalities (e.g. radiance and reflectance at the surface and at the sensor), and (iii) inferring atmospheric transmission profiles, such as absorbing bands and their relative magnitudes.

[304]  arXiv:2404.19609 [pdf, other]
Title: Seeing Through the Clouds: Cloud Gap Imputation with Prithvi Foundation Model
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Filling cloudy pixels in multispectral satellite imagery is essential for accurate data analysis and downstream applications, especially for tasks which require time series data. To address this issue, we compare the performance of a foundational Vision Transformer (ViT) model with a baseline Conditional Generative Adversarial Network (CGAN) model for missing value imputation in time series of multispectral satellite imagery. We randomly mask time series of satellite images using real-world cloud masks and train each model to reconstruct the missing pixels. The ViT model is fine-tuned from a pretrained model, while the CGAN is trained from scratch. Using quantitative evaluation metrics such as structural similarity index and mean absolute error as well as qualitative visual analysis, we assess imputation accuracy and contextual preservation.

[305]  arXiv:2404.19612 [pdf, other]
Title: Quantum Cloud Computing: Trends and Challenges
Comments: 9 pages, 6 figures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Emerging Technologies (cs.ET)

Quantum computing (QC) is a new paradigm that will revolutionize various areas of computing, especially cloud computing. QC, still in its infancy, is a costly technology that can operate only in highly isolated environments because of its sensitivity to environmental factors. For this reason, it remains difficult for researchers to access. Integrating QC into an isolated remote server, like a cloud, and making it available to users can overcome these problems. Furthermore, experts predict that QC, with its ability to swiftly resolve complex and computationally intensive operations, will offer significant benefits in systems that process large amounts of data, like cloud computing. This article presents the vision and challenges for the quantum cloud computing (QCC) paradigm that will emerge with the integration of quantum and cloud computing. Next, we present the advantages of QC over classical computing applications. We analyze the effects of QC on cloud systems, such as cost, security, and scalability. Besides all of these advantages, we highlight research gaps in QCC, such as qubit stability and efficient resource allocation. This article thus identifies QCC's advantages and challenges for future research.

[306]  arXiv:2404.19614 [pdf, ps, other]
Title: COTS: Connected OpenAPI Test Synthesis for RESTful Applications
Subjects: Software Engineering (cs.SE); Logic in Computer Science (cs.LO)

We present a novel model-driven approach for testing RESTful applications. We introduce (i) a domain-specific language for OpenAPI specifications and (ii) a tool to support our methodology. Our DSL is inspired by session types and enables the modelling of communication protocols between a REST client and server. Our tool, dubbed COTS, generates (randomised) model-based test executions and reports software defects. We evaluate the effectiveness of our approach by applying it to test several open source applications. Our findings indicate that our methodology can identify nuanced defects in REST APIs and achieve comparable or superior code coverage when compared to much larger handcrafted test suites.

[307]  arXiv:2404.19615 [pdf, other]
Title: SemiPL: A Semi-supervised Method for Event Sound Source Localization
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)

In recent years, Event Sound Source Localization has been widely applied in various fields. Recent works, typically relying on the contrastive learning framework, show impressive performance. However, all of this work is based on large, relatively simple datasets. In many applications, e.g., crowd management and emergency response services, it is also crucial to understand and analyze human behaviors (actions and interactions of people), voices, and sounds in chaotic events. In this paper, we apply the existing model to a more complex dataset, explore the influence of parameters on the model, and propose a semi-supervised improvement method, SemiPL. With the increase in data quantity and the influence of label quality, self-supervised learning will be an unstoppable trend. The experiments show that parameter adjustment positively affects the existing model. In particular, SSPL achieved an improvement of 12.2% cIoU and 0.56% AUC on Chaotic World compared to the results provided. The code is available at: https://github.com/ly245422/SSPL

[308]  arXiv:2404.19618 [pdf, other]
Title: Novel Round Trip Time Estimation in 5G NR
Subjects: Information Theory (cs.IT)

The fifth generation new radio (5G NR) technology is expected to fulfill reliable and accurate positioning requirements of industry use cases, such as autonomous robots, connected vehicles, and future factories. Starting from Third Generation Partnership Project (3GPP) Release-16, several enhanced positioning solutions are featured in the 5G standards, including the multi-cell round trip time (multi-RTT) method. This work presents a novel framework to estimate the round-trip time (RTT) between a user equipment (UE) and a base station (gNB) in 5G NR. Unlike the existing scheme in the standards, RTT can be estimated without the need to send timing measurements from both the gNB and UE to a central node. The proposed method relies on obtaining multiple coherent uplink wide-band channel measurements at the gNB by circumventing the timing advance control loops and the clock drift. The performance is evaluated through experiments leveraging a real-world 5G testbed based on OpenAirInterface (OAI). Under a moderate system bandwidth of 40 MHz, the experimental results show meter-level range accuracy even in low signal-to-noise ratio (SNR) conditions.

[309]  arXiv:2404.19619 [pdf, other]
Title: Physical Non-inertial Poser (PNP): Modeling Non-inertial Effects in Sparse-inertial Human Motion Capture
Comments: Accepted by SIGGRAPH 2024
Subjects: Graphics (cs.GR)

Existing inertial motion capture techniques use the human root coordinate frame to estimate local poses and treat it as an inertial frame by default. We argue that when the root has linear acceleration or rotation, the root frame should, in theory, be considered non-inertial. In this paper, we model the fictitious forces that are non-negligible in a non-inertial frame with an auto-regressive estimator carefully designed to follow physics. With the fictitious forces, the force-related IMU measurements (accelerations) can be correctly compensated in the non-inertial frame, and thus Newton's laws of motion are satisfied. In this case, the relationship between the accelerations and body motions is deterministic and learnable, and we train a neural network to model it for better motion capture. Furthermore, to train the neural network with synthetic data, we develop an IMU synthesis-by-simulation strategy to better model the noise of IMU hardware and allow parameter tuning to fit different hardware. This strategy not only enables network training with synthetic data but also enables calibration error modeling to handle bad motion capture calibration, increasing the robustness of the system. Code is available at https://xinyu-yi.github.io/PNP/.

[310]  arXiv:2404.19620 [pdf, other]
Title: Be Aware of the Neighborhood Effect: Modeling Selection Bias under Interference
Comments: ICLR 24
Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR); Machine Learning (stat.ML)

Selection bias in recommender systems arises from the recommendation process of system filtering and the interactive process of user selection. Many previous studies have focused on addressing selection bias to achieve unbiased learning of the prediction model, but ignore the fact that potential outcomes for a given user-item pair may vary with the treatments assigned to other user-item pairs, a phenomenon termed the neighborhood effect. To fill the gap, this paper formulates the neighborhood effect as an interference problem from the perspective of causal inference and introduces a treatment representation to capture the neighborhood effect. On this basis, we propose a novel ideal loss that can be used to deal with selection bias in the presence of the neighborhood effect. We further develop two new estimators for estimating the proposed ideal loss. We theoretically establish the connection between the proposed methods and previous debiasing methods that ignore the neighborhood effect, showing that the proposed methods can achieve unbiased learning when both selection bias and the neighborhood effect are present, while the existing methods are biased. Extensive semi-synthetic and real-world experiments are conducted to demonstrate the effectiveness of the proposed methods.

[311]  arXiv:2404.19622 [pdf, other]
Title: Fake it to make it: Using synthetic data to remedy the data shortage in joint multimodal speech-and-gesture synthesis
Comments: 13+1 pages, 2 figures, accepted at the Human Motion Generation workshop (HuMoGen) at CVPR 2024
Subjects: Human-Computer Interaction (cs.HC); Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Although humans engaged in face-to-face conversation simultaneously communicate both verbally and non-verbally, methods for joint and unified synthesis of speech audio and co-speech 3D gesture motion from text are a new and emerging field. These technologies hold great promise for more human-like, efficient, expressive, and robust synthetic communication, but are currently held back by the lack of suitably large datasets, as existing methods are trained on parallel data from all constituent modalities. Inspired by student-teacher methods, we propose a straightforward solution to the data shortage, by simply synthesising additional training material. Specifically, we use unimodal synthesis models trained on large datasets to create multimodal (but synthetic) parallel training data, and then pre-train a joint synthesis model on that material. In addition, we propose a new synthesis architecture that adds better and more controllable prosody modelling to the state-of-the-art method in the field. Our results confirm that pre-training on large amounts of synthetic data improves the quality of both the speech and the motion synthesised by the multimodal model, with the proposed architecture yielding further benefits when pre-trained on the synthetic data. See https://shivammehta25.github.io/MAGI/ for example output.

[312]  arXiv:2404.19625 [pdf, other]
Title: Dual-Port Grid-Forming Interconnecting Power Converters in Hybrid AC/DC Grids
Comments: 10 pages, 17 figures, submitted to IEEE Journal of Emerging and Selected Topics in Power Electronics
Subjects: Systems and Control (eess.SY)

Interconnecting power converters (IPC) are the main elements enabling the interconnection of multiple high-voltage alternating current (HVAC) and high-voltage direct current (HVDC) subgrids. These converters can be classified either as grid-forming or grid-following. These roles can be assigned to both ac and dc terminals. This work compares state-of-the-art single-port grid-forming and grid-following control schemes with a dual-port grid-forming control scheme, which can simultaneously form a stable voltage on the ac and the dc sides. The dual-port grid-forming small-signal stability and dynamic behavior under fluctuations in the power flow are studied and compared against state-of-the-art control architectures. Moreover, the dual-port control scheme is validated and tested on a down-scaled laboratory platform with several transient events.

[313]  arXiv:2404.19626 [pdf, other]
Title: Machine learning of continuous and discrete variational ODEs with convergence guarantee and uncertainty quantification
Authors: Christian Offen
Subjects: Numerical Analysis (math.NA); Dynamical Systems (math.DS)

The article introduces a method to learn dynamical systems that are governed by Euler-Lagrange equations from data. The method is based on Gaussian process regression and identifies continuous or discrete Lagrangians and is, therefore, structure preserving by design. A rigorous proof of convergence as the distance between observation data points converges to zero is provided. In addition to convergence guarantees, the method allows for quantification of model uncertainty, which can provide a basis for adaptive sampling techniques. We provide efficient uncertainty quantification of any observable that is linear in the Lagrangian, including Hamiltonian functions (energy) and symplectic structures, which is of interest in the context of system identification. The article overcomes major practical and theoretical difficulties related to the ill-posedness of the identification task of (discrete) Lagrangians through a careful design of geometric regularisation strategies and by exploiting a relation to convex minimisation problems in reproducing kernel Hilbert spaces.

[314]  arXiv:2404.19627 [pdf, ps, other]
Title: Acceso abierto en Argentina: una propuesta para el monitoreo de las publicaciones científicas con OpenAlex
Comments: in Spanish (en Español)
Subjects: Digital Libraries (cs.DL)

This study proposes a methodology using OpenAlex (OA) for tracking Open Access publications in the case of Argentina, a country where a self-archiving mandate has been in effect since 2013 (Law 26.899, 2013). A sample of 167,240 papers by researchers from the National Council for Scientific and Technical Research (CONICET) was created and analyzed using statistical techniques. We estimate that OA is able to capture between 85% and 93% of authors for all disciplines, with the exception of Social Sciences and Humanities, where it only reaches an estimated 47%. The availability of papers in Open Access was calculated to be 41% for the period 1953-2021 and 46% when considering exclusively the post-law period (2014-2021). In both periods, gold Open Access was the most common route. When comparing equal periods post- and pre-law, we observed that the upward trend of gold Open Access predated the legislation and that the availability of closed articles in repositories increased by 5% relative to what is estimated based on existing trends. However, while the green route has had a positive evolution, it has been publication in gold journals that has boosted access to Argentine production more rapidly. We conclude that the OA-based methodology, piloted here for the first time, is viable for tracking Open Access in Argentina since it yields percentages similar to other national and international studies.
En este estudio se propone una metodología utilizando OpenAlex (OA) para monitorear el acceso abierto (AA) a las publicaciones científicas para el caso de Argentina, país donde rige el mandato de autoarchivo -Ley 26.899 (2013)-. Se conformó una muestra con 167.240 artículos de investigadores del Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET) que se analizaron con técnicas estadísticas. Se estimó que OA puede representar entre 85-93% de los autores para todas las disciplinas, excepto Ciencias Sociales y Humanidades, donde solo alcanza al 47%. Se calculó que 41% de los artículos publicados entre 1953-2021 incluidos en la fuente están en AA, porcentaje que sube a 46% al considerar exclusivamente el periodo post ley (2014-2021). En ambos periodos es la vía dorada la que representa mayor proporción. Al comparar periodos iguales post y pre ley, se observó que la tendencia en alza de la vía dorada era preexistente a la legislación y la disponibilidad de artículos cerrados en repositorios aumentó un 5% a lo que se estima en base a tendencias existentes. Se concluye que si bien la vía verde ha tenido una evolución positiva, ha sido la publicación en revistas doradas lo que ha impulsado más rápidamente el acceso a la producción argentina. Asimismo, que la metodología basada en OA, piloteada aquí por primera vez, es viable para monitorear el AA en Argentina ya que arroja porcentajes similares a otros estudios nacionales e internacionales.

[315]  arXiv:2404.19629 [pdf, other]
Title: The Drawback of Insight: Detailed Explanations Can Reduce Agreement with XAI
Comments: ACM CHI 2024 Workshop on Human-Centered Explainable AI (HCXAI), 5 pages
Subjects: Human-Computer Interaction (cs.HC)

With the emergence of Artificial Intelligence (AI)-based decision-making, explanations help increase new technology adoption through enhanced trust and reliability. However, our experimental study challenges the notion that every user universally values explanations. We argue that the agreement with AI suggestions, whether accompanied by explanations or not, is influenced by individual differences in personality traits and the users' comfort with technology. We found that people with higher neuroticism and lower technological comfort showed more agreement with the recommendations without explanations. As more users become exposed to eXplainable AI (XAI) and AI-based systems, we argue that the XAI design should not provide explanations for users with high neuroticism and low technology comfort. Prioritizing user personalities in XAI systems will help users become better collaborators of AI systems.

[316]  arXiv:2404.19630 [pdf, other]
Title: Analyzing and Exploring Training Recipes for Large-Scale Transformer-Based Weather Prediction
Comments: 9 pages, 6 figures
Journal-ref: 23rd Conference on Artificial Intelligence for Environmental Science. Jan 2024. Abstract #437874
Subjects: Machine Learning (cs.LG)

The rapid rise of deep learning (DL) in numerical weather prediction (NWP) has led to a proliferation of models which forecast atmospheric variables with skill comparable or superior to that of traditional physics-based NWP. However, among these leading DL models, there is a wide variance in both the training settings and architecture used. Further, the lack of thorough ablation studies makes it hard to discern which components are most critical to success. In this work, we show that it is possible to attain high forecast skill even with relatively off-the-shelf architectures, simple training procedures, and moderate compute budgets. Specifically, we train a minimally modified SwinV2 transformer on ERA5 data, and find that it attains superior forecast skill when compared against IFS. We present some ablations on key aspects of the training pipeline, exploring different loss functions, model sizes and depths, and multi-step fine-tuning to investigate their effect. We also examine the model performance with metrics beyond the typical ACC and RMSE, and investigate how the performance scales with model size.

[317]  arXiv:2404.19631 [pdf, other]
Title: On Training a Neural Network to Explain Binaries
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Software Engineering (cs.SE)

In this work, we begin to investigate the possibility of training a deep neural network on the task of binary code understanding. Specifically, the network would take, as input, features derived directly from binaries and output English descriptions of functionality to aid a reverse engineer in investigating the capabilities of a piece of closed-source software, be it malicious or benign. Given recent success in applying large language models (generative AI) to the task of source code summarization, this seems a promising direction. However, in our initial survey of the available datasets, we found nothing of sufficiently high quality and volume to train these complex models. Instead, we build our own dataset derived from a capture of Stack Overflow containing 1.1M entries. A major result of our work is a novel dataset evaluation method using the correlation between two distances on sample pairs: one distance in the embedding space of inputs and the other in the embedding space of outputs. Intuitively, if two samples have inputs close in the input embedding space, their outputs should also be close in the output embedding space. We found this Embedding Distance Correlation (EDC) test to be highly diagnostic, indicating that our collected dataset and several existing open-source datasets are of low quality as the distances are not well correlated. We proceed to explore the general applicability of EDC, applying it to a number of qualitatively known good datasets and a number of synthetically known bad ones and found it to be a reliable indicator of dataset value.
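
The Embedding Distance Correlation idea described above can be illustrated with a minimal sketch. It assumes Euclidean distance and Pearson correlation over all sample pairs; the abstract does not state which distance, correlation statistic, or embedding models are used, so those choices and all function names here are assumptions rather than the authors' implementation.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def embedding_distance_correlation(input_embs, output_embs):
    """Correlate pairwise input distances with pairwise output distances."""
    pairs = list(combinations(range(len(input_embs)), 2))
    d_in = [euclidean(input_embs[i], input_embs[j]) for i, j in pairs]
    d_out = [euclidean(output_embs[i], output_embs[j]) for i, j in pairs]
    return pearson(d_in, d_out)
```

Under these assumptions, a dataset whose outputs preserve the geometry of its inputs scores close to 1, while a dataset with mismatched input/output pairs scores much lower.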

[318]  arXiv:2404.19632 [pdf, other]
Title: Behavioural Metrics: Compositionality of the Kantorovich Lifting and an Application to Up-To Techniques
Subjects: Logic in Computer Science (cs.LO)

Behavioural distances of transition systems modelled as coalgebras for endofunctors generalize the traditional notions of behavioural equivalence to a quantitative setting, in which states are equipped with a measure of how (dis)similar they are. Endowing transition systems with such distances essentially relies on the ability to lift functors describing the one-step behavior of the transition systems to the category of pseudometric spaces. We consider the Kantorovich lifting of a functor on quantale-valued relations, which subsumes equivalences, preorders and (directed) metrics. We use tools from fibred category theory, which allow one to see the Kantorovich lifting as arising from an appropriate fibred adjunction. Our main contributions are compositionality results for the Kantorovich lifting, where we show that the lifting of a composed functor coincides with the composition of the liftings. In addition we describe how to lift distributive laws in the case where one of the two functors is polynomial. These results are essential ingredients for adapting up-to techniques to the case of quantale-valued behavioural distances. Up-to techniques are well-known coinductive techniques for efficiently showing lower bounds for behavioural distances. We conclude by illustrating the results of our paper in two case studies.

[319]  arXiv:2404.19633 [pdf, other]
Title: SEArch: an execution infrastructure for service-based software systems
Subjects: Software Engineering (cs.SE)

The shift from monolithic applications to the composition of distributed software, initiated in the early twenty-first century, is based on the vision of software-as-service. This vision, found in many technologies such as RESTful APIs, advocates globally available services cooperating through an infrastructure providing (access to) distributed computational resources. Choreographies can support this vision by abstracting away local computation and rendering interoperability with message-passing: cooperation is achieved by sending and receiving messages. Following this choreographic paradigm, we develop SEArch, after Service Execution Architecture, a language-independent execution infrastructure capable of performing transparent dynamic reconfiguration of software artefacts. Choreographic mechanisms are used in SEArch to specify interoperability contracts, thus providing the support needed for automatic discovery and binding of services at runtime.

[320]  arXiv:2404.19634 [pdf, other]
Title: DF Louvain: Fast Incrementally Expanding Approach for Community Detection on Dynamic Graphs
Authors: Subhajit Sahu
Comments: 22 pages, 15 figures, 3 tables
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Social and Information Networks (cs.SI)

Community detection is the problem of recognizing natural divisions in networks. A relevant challenge in this problem is to find communities on rapidly evolving graphs. In this report we present our Parallel Dynamic Frontier (DF) Louvain algorithm, which given a batch update of edge deletions and insertions, incrementally identifies and processes an approximate set of affected vertices in the graph with minimal overhead, while using a novel approach of incrementally updating weighted-degrees of vertices and total edge weights of communities. We also present our parallel implementations of Naive-dynamic (ND) and Delta-screening (DS) Louvain. On a server with a 64-core AMD EPYC-7742 processor, our experiments show that DF Louvain obtains speedups of 179x, 7.2x, and 5.3x on real-world dynamic graphs, compared to Static, ND, and DS Louvain, respectively, and is 183x, 13.8x, and 8.7x faster, respectively, on large graphs with random batch updates. Moreover, DF Louvain improves its performance by 1.6x for every doubling of threads.
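
The incremental bookkeeping mentioned above (updating weighted degrees of vertices and total edge weights of communities from a batch of edge deletions and insertions) can be sketched as follows. This is only an illustration under stated assumptions: the graph is undirected, each batch edge appears once as `(u, v, w)`, and a community's total weight is taken to be the sum of its members' weighted degrees, as in the standard Louvain modularity formulation. The function and argument names are hypothetical; the sketch covers neither the frontier marking nor the Louvain optimization itself.

```python
def update_weights(degree, community_weight, community, deletions, insertions):
    """Incrementally adjust vertex weighted degrees and community totals.

    degree:           dict vertex -> weighted degree
    community_weight: dict community id -> sum of weighted degrees of members
    community:        dict vertex -> community id from the previous snapshot
    deletions/insertions: iterables of (u, v, w) undirected edges
    """
    for sign, batch in ((-1.0, deletions), (+1.0, insertions)):
        for u, v, w in batch:
            for x in (u, v):  # an undirected edge touches both endpoints
                degree[x] = degree.get(x, 0.0) + sign * w
                c = community[x]
                community_weight[c] = community_weight.get(c, 0.0) + sign * w
    return degree, community_weight
```

The cost of this update is proportional to the batch size rather than to the size of the graph, which is the reason for preferring it over recomputing both quantities from scratch.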

[321]  arXiv:2404.19638 [pdf, other]
Title: SpComm3D: A Framework for Enabling Sparse Communication in 3D Sparse Kernels
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Existing 3D algorithms for distributed-memory sparse kernels suffer from limited scalability due to reliance on bulk sparsity-agnostic communication. While easier to use, sparsity-agnostic communication leads to unnecessary bandwidth and memory consumption. We present SpComm3D, a framework for enabling sparsity-aware communication and minimal memory footprint such that no unnecessary data is communicated or stored in memory. SpComm3D performs sparse communication efficiently with minimal or no communication buffers to further reduce memory consumption. SpComm3D detaches the local computation at each processor from the communication, allowing flexibility in choosing the best accelerated version for computation. We build 3D algorithms with SpComm3D for the two important sparse ML kernels: Sampled Dense-Dense Matrix Multiplication (SDDMM) and Sparse matrix-matrix multiplication (SpMM). Experimental evaluations on up to 1800 processors demonstrate that SpComm3D has superior scalability and outperforms state-of-the-art sparsity-agnostic methods with up to 20x improvement in terms of communication, memory, and runtime of SDDMM and SpMM. The code is available at: https://github.com/nfabubaker/SpComm3D
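
For readers unfamiliar with the two kernels named above, here is a minimal single-process sketch of their standard definitions: SDDMM computes C = S ⊙ (A Bᵀ) only at the nonzeros of S, and SpMM computes C = S B. The dictionary-of-coordinates sparse format and all function names are assumptions made for illustration; this is not the SpComm3D API. The `needed_rows` helper hints at why sparsity-aware communication can help: a processor only needs the dense rows that its local nonzeros touch.

```python
def sddmm(S, A, B):
    """Sampled dense-dense product: C[i, j] = S[i, j] * dot(A[i], B[j]),
    evaluated only where S has a stored nonzero."""
    return {(i, j): s * sum(a * b for a, b in zip(A[i], B[j]))
            for (i, j), s in S.items()}


def spmm(S, B, n_rows):
    """Sparse-times-dense product: C = S @ B, with C returned as dense rows."""
    n_cols = len(B[0])
    C = [[0.0] * n_cols for _ in range(n_rows)]
    for (i, j), s in S.items():
        for k in range(n_cols):
            C[i][k] += s * B[j][k]
    return C


def needed_rows(S):
    """Rows of A and of B that the nonzeros of S actually touch."""
    return sorted({i for i, _ in S}), sorted({j for _, j in S})
```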

[322]  arXiv:2404.19639 [pdf, other]
Title: ESP-Zero: Unsupervised enhancement of zero-shot classification for Extremely Sparse Point cloud
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In recent years, zero-shot learning has attracted the attention of many researchers, due to its flexibility and generality. Many approaches have been proposed to achieve zero-shot classification of point clouds for 3D object understanding, following the schema of CLIP. However, in the real world, point clouds can be extremely sparse, dramatically limiting the effectiveness of 3D point cloud encoders and resulting in the misalignment of point cloud features and text embeddings. To adapt point cloud encoders to extremely sparse point clouds without re-running the pre-training procedure, which can be time-consuming and expensive, we propose in this work an unsupervised model adaptation approach that enhances the point cloud encoder for extremely sparse point clouds. We propose a novel fused-cross attention layer that expands the pre-trained self-attention layer with additional learnable tokens and attention blocks, which effectively modifies the point cloud features while maintaining the alignment between point cloud features and text embeddings. We also propose a complementary learning-based self-distillation schema that encourages the modified features to be pulled apart from the irrelevant text embeddings without overfitting the feature space to the observed text embeddings. Extensive experiments demonstrate that the proposed approach effectively increases the zero-shot capability on extremely sparse point clouds, and outperforms other state-of-the-art model adaptation approaches.

[323]  arXiv:2404.19640 [pdf, other]
Title: Attacking Bayes: On the Adversarial Robustness of Bayesian Neural Networks
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Methodology (stat.ME); Machine Learning (stat.ML)

Adversarial examples have been shown to cause neural networks to fail on a wide range of vision and language tasks, but recent work has claimed that Bayesian neural networks (BNNs) are inherently robust to adversarial perturbations. In this work, we examine this claim. To study the adversarial robustness of BNNs, we investigate whether it is possible to successfully break state-of-the-art BNN inference methods and prediction pipelines using even relatively unsophisticated attacks for three tasks: (1) label prediction under the posterior predictive mean, (2) adversarial example detection with Bayesian predictive uncertainty, and (3) semantic shift detection. We find that BNNs trained with state-of-the-art approximate inference methods, and even BNNs trained with Hamiltonian Monte Carlo, are highly susceptible to adversarial attacks. We also identify various conceptual and experimental errors in previous works that claimed inherent adversarial robustness of BNNs and conclusively demonstrate that BNNs and uncertainty-aware Bayesian prediction pipelines are not inherently robust against adversarial attacks.

[324]  arXiv:2404.19643 [pdf, other]
Title: Cybersecurity Pathways Towards CE-Certified Autonomous Forestry Machines
Subjects: Software Engineering (cs.SE)

The increased importance of cybersecurity in autonomous machinery is becoming evident in the forestry domain. Forestry worksites are becoming more complex with the involvement of multiple systems and systems of systems. Hence, there is a need to investigate how to address cybersecurity challenges for autonomous systems of systems in the forestry domain. Using a literature review and adapting standards from similar domains, as well as collaborative sessions with domain experts, we identify challenges towards CE-certified autonomous forestry machines focusing on cybersecurity and safety. Furthermore, we discuss the relationship between safety and cybersecurity risk assessment and their relation to AI, highlighting the need for a holistic methodology for their assurance.

[325]  arXiv:2404.19644 [pdf, other]
Title: MetaCoCo: A New Few-Shot Classification Benchmark with Spurious Correlation
Comments: ICLR 24
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Out-of-distribution (OOD) problems in few-shot classification (FSC) occur when novel classes sampled from testing distributions differ from base classes drawn from training distributions, which considerably degrades the performance of deep learning models deployed in real-world applications. Recent studies suggest that the OOD problems in FSC mainly include: (a) cross-domain few-shot classification (CD-FSC) and (b) spurious-correlation few-shot classification (SC-FSC). Specifically, CD-FSC occurs when a classifier learns transferable knowledge from base classes drawn from seen training distributions but must recognize novel classes sampled from unseen testing distributions. In contrast, SC-FSC arises when a classifier relies on non-causal features (or contexts) that happen to be correlated with the labels (or concepts) in base classes, but such relationships no longer hold during model deployment. Although CD-FSC has been extensively studied, SC-FSC remains understudied due to the lack of corresponding evaluation benchmarks. To this end, we present Meta Concept Context (MetaCoCo), a benchmark with spurious-correlation shifts collected from real-world scenarios. Moreover, to quantify the extent of spurious-correlation shifts in MetaCoCo, we further propose a metric that uses CLIP as a pre-trained vision-language model. Extensive experiments on the proposed benchmark are performed to evaluate the state-of-the-art methods in FSC, cross-domain shifts, and self-supervised learning. The experimental results show that the performance of the existing methods degrades significantly in the presence of spurious-correlation shifts. We open-source all code for our benchmark and hope that the proposed MetaCoCo can facilitate future research on spurious-correlation shift problems in FSC. The code is available at: https://github.com/remiMZ/MetaCoCo-ICLR24.

[326]  arXiv:2404.19649 [pdf, other]
Title: Landmark Alternating Diffusion
Subjects: Machine Learning (cs.LG); Statistics Theory (math.ST); Data Analysis, Statistics and Probability (physics.data-an); Machine Learning (stat.ML)

Alternating Diffusion (AD) is a commonly applied diffusion-based sensor fusion algorithm. While it has been successfully applied to various problems, its computational burden remains a limitation. Inspired by the landmark diffusion idea considered in the Robust and Scalable Embedding via Landmark Diffusion (ROSELAND), we propose a variation of AD, called Landmark AD (LAD), which captures the essence of AD while offering superior computational efficiency. We provide a series of theoretical analyses of LAD under the manifold setup and apply it to the automatic sleep stage annotation problem with two electroencephalogram channels to demonstrate its application.

[327]  arXiv:2404.19651 [pdf, other]
Title: Provably Robust Conformal Prediction with Improved Efficiency
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Conformal prediction is a powerful tool to generate uncertainty sets with guaranteed coverage using any predictive model, under the assumption that the training and test data are i.i.d. Recently, it has been shown that adversarial examples are able to manipulate conformal methods to construct prediction sets with invalid coverage rates, as the i.i.d. assumption is violated. To address this issue, a recent work, Randomized Smoothed Conformal Prediction (RSCP), was first proposed to certify the robustness of conformal prediction methods to adversarial noise. However, RSCP has two major limitations: (i) its robustness guarantee is flawed when used in practice and (ii) it tends to produce large uncertainty sets. To address these limitations, we first propose a novel framework called RSCP+ to provide a provable robustness guarantee in evaluation, which fixes the issues in the original RSCP method. Next, we propose two novel methods, Post-Training Transformation (PTT) and Robust Conformal Training (RCT), to effectively reduce prediction set size with little computation overhead. Experimental results on CIFAR10, CIFAR100, and ImageNet suggest that the baseline method only yields trivial predictions that include the full label set, while our methods could boost the efficiency by up to $4.36\times$, $5.46\times$, and $16.9\times$ respectively and provide a practical robustness guarantee. Our code is available at https://github.com/Trustworthy-ML-Lab/Provably-Robust-Conformal-Prediction.
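
As background for the coverage guarantee mentioned above, the following is a minimal sketch of standard split conformal prediction for classification. It is not RSCP, RSCP+, PTT, or RCT. It assumes the non-conformity score is one minus the predicted probability of a label, which is a common choice but is not taken from the abstract; the function names are hypothetical.

```python
import math


def conformal_threshold(cal_scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(cal_scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-based order statistic
    if rank > n:
        return float("inf")  # too few calibration points: include every label
    return sorted(cal_scores)[rank - 1]


def prediction_set(class_probs, threshold):
    """Labels whose non-conformity score 1 - p does not exceed the threshold."""
    return [label for label, p in enumerate(class_probs) if 1.0 - p <= threshold]
```

When calibration and test points are exchangeable, sets built this way contain the true label with probability at least 1 - alpha. An adversarial perturbation of the test input breaks that exchangeability, which is the failure mode the abstract refers to.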

[328]  arXiv:2404.19652 [pdf, other]
Title: VimTS: A Unified Video and Image Text Spotter for Enhancing the Cross-domain Generalization
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Text spotting, a task involving the extraction of textual information from image or video sequences, faces challenges in cross-domain adaption, such as image-to-image and image-to-video generalization. In this paper, we introduce a new method, termed VimTS, which enhances the generalization ability of the model by achieving better synergy among different tasks. Specifically, we propose a Prompt Queries Generation Module and a Tasks-aware Adapter to effectively convert the original single-task model into a multi-task model suitable for both image and video scenarios with minimal additional parameters. The Prompt Queries Generation Module facilitates explicit interaction between different tasks, while the Tasks-aware Adapter helps the model dynamically learn suitable features for each task. Additionally, to further enable the model to learn temporal information at a lower cost, we propose a synthetic video text dataset (VTD-368k) by leveraging the Content Deformation Fields (CoDeF) algorithm. Notably, our method outperforms the state-of-the-art method by an average of 2.6% in six cross-domain benchmarks such as TT-to-IC15, CTW1500-to-TT, and TT-to-CTW1500. For video-level cross-domain adaption, our method even surpasses the previous end-to-end video spotting method in ICDAR2015 video and DSText v2 by an average of 5.5% on the MOTA metric, using only image-level data. We further demonstrate that existing Large Multimodal Models exhibit limitations in cross-domain scene text spotting, in contrast to our VimTS model, which requires significantly fewer parameters and data. The code and datasets will be made available at https://VimTextSpotter.github.io.

[329]  arXiv:2404.19654 [pdf, other]
Title: Masked Multi-Query Slot Attention for Unsupervised Object Discovery
Comments: Paper accepted for presentation at IJCNN 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Unsupervised object discovery is becoming an essential line of research for tackling recognition problems that require decomposing an image into entities, such as semantic segmentation and object detection. Recently, object-centric methods that leverage self-supervision have gained popularity, due to their simplicity and adaptability to different settings and conditions. However, those methods do not exploit effective techniques already employed in modern self-supervised approaches. In this work, we consider an object-centric approach in which DINO ViT features are reconstructed via a set of queried representations called slots. Based on that, we propose a masking scheme on input features that selectively disregards the background regions, inducing our model to focus more on salient objects during the reconstruction phase. Moreover, we extend the slot attention to a multi-query approach, allowing the model to learn multiple sets of slots, producing more stable masks. During training, these multiple sets of slots are learned independently while, at test time, these sets are merged through Hungarian matching to obtain the final slots. Our experimental results and ablations on the PASCAL-VOC 2012 dataset show the importance of each component and highlight how their combination consistently improves object localization. Our source code is available at: https://github.com/rishavpramanik/maskedmultiqueryslot

[330]  arXiv:2404.19656 [pdf, other]
Title: Towards Scenario- and Capability-Driven Dataset Development and Evaluation: An Approach in the Context of Mapless Automated Driving
Comments: Accepted to be published at the 2024 35th IEEE Intelligent Vehicles Symposium (IV), Jeju Island, Korea, June 2 - 5, 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The foundational role of datasets in defining the capabilities of deep learning models has led to their rapid proliferation. At the same time, published research focusing on the process of dataset development for environment perception in automated driving has been scarce, thereby reducing the applicability of openly available datasets and impeding the development of effective environment perception systems. Sensor-based, mapless automated driving is one of the contexts where this limitation is evident. While leveraging real-time sensor data instead of pre-defined HD maps promises enhanced adaptability and safety by effectively navigating unexpected environmental changes, it also increases the demands on the scope and complexity of the information provided by the perception system.
To address these challenges, we propose a scenario- and capability-based approach to dataset development. Grounded in the principles of ISO 21448 (safety of the intended functionality, SOTIF), extended by ISO/TR 4804, our approach facilitates the structured derivation of dataset requirements. This not only aids in the development of meaningful new datasets but also enables the effective comparison of existing ones. Applying this methodology to a broad range of existing lane detection datasets, we identify significant limitations in current datasets, particularly in terms of real-world applicability, a lack of labeling of critical features, and an absence of comprehensive information for complex driving maneuvers.

[331]  arXiv:2404.19660 [pdf, other]
Title: Decoder Decomposition for the Analysis of the Latent Space of Nonlinear Autoencoders With Wind-Tunnel Experimental Data
Subjects: Machine Learning (cs.LG); Fluid Dynamics (physics.flu-dyn)

Turbulent flows are chaotic and multi-scale dynamical systems, which have large numbers of degrees of freedom. Turbulent flows, however, can be modelled with a smaller number of degrees of freedom when using the appropriate coordinate system, which is the goal of dimensionality reduction via nonlinear autoencoders. Autoencoders are expressive tools, but they are difficult to interpret. The goal of this paper is to propose a method to aid the interpretability of autoencoders. This is the decoder decomposition. First, we propose the decoder decomposition, which is a post-processing method to connect the latent variables to the coherent structures of flows. Second, we apply the decoder decomposition to analyse the latent space of synthetic data of a two-dimensional unsteady wake past a cylinder. We find that the dimension of latent space has a significant impact on the interpretability of autoencoders. We identify the physical and spurious latent variables. Third, we apply the decoder decomposition to the latent space of wind-tunnel experimental data of a three-dimensional turbulent wake past a bluff body. We show that the reconstruction error is a function of both the latent space dimension and the decoder size, which are correlated. Finally, we apply the decoder decomposition to rank and select latent variables based on the coherent structures that they represent. This is useful to filter unwanted or spurious latent variables, or to pinpoint specific coherent structures of interest. The ability to rank and select latent variables will help users design and interpret nonlinear autoencoders.

[332]  arXiv:2404.19664 [pdf, other]
Title: Towards Generalist Robot Learning from Internet Video: A Survey
Subjects: Robotics (cs.RO); Machine Learning (cs.LG)

This survey presents an overview of methods for learning from video (LfV) in the context of reinforcement learning (RL) and robotics. We focus on methods capable of scaling to large internet video datasets and, in the process, extracting foundational knowledge about the world's dynamics and physical human behaviour. Such methods hold great promise for developing general-purpose robots.
We open with an overview of fundamental concepts relevant to the LfV-for-robotics setting. This includes a discussion of the exciting benefits LfV methods can offer (e.g., improved generalization beyond the available robot data) and commentary on key LfV challenges (e.g., challenges related to missing information in video and LfV distribution shifts). Our literature review begins with an analysis of video foundation model techniques that can extract knowledge from large, heterogeneous video datasets. Next, we review methods that specifically leverage video data for robot learning. Here, we categorise work according to which RL knowledge modality benefits from the use of video data. We additionally highlight techniques for mitigating LfV challenges, including reviewing action representations that address the issue of missing action labels in video.
Finally, we examine LfV datasets and benchmarks, before concluding the survey by discussing challenges and opportunities in LfV. Here, we advocate for scalable approaches that can leverage the full range of available data and that target the key benefits of LfV. Overall, we hope this survey will serve as a comprehensive reference for the emerging field of LfV, catalysing further research in the area, and ultimately facilitating progress towards obtaining general-purpose robots.

[333]  arXiv:2404.19666 [pdf, other]
Title: Beyond MOS: Subjective Image Quality Score Preprocessing Method Based on Perceptual Similarity
Authors: Lei Wang, Desen Yuan
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Image quality assessment often relies on raw opinion scores provided by subjects in subjective experiments, which can be noisy and unreliable. To address this issue, postprocessing procedures such as ITU-R BT.500, ITU-T P.910, and ITU-T P.913 have been standardized to clean up the original opinion scores. These methods use annotator-based statistical priors, but they do not take into account extensive information about the image itself, which limits their performance in less annotated scenarios. Image quality datasets usually contain similar scenes or distortions, and subjects inevitably compare images with one another to arrive at a reasonable score. Therefore, in this paper, we propose a subjective image quality score preprocessing method, Perceptual Similarity Subjective Preprocessing (PSP), which exploits the perceptual similarity between images to alleviate subjective bias in less annotated scenarios. Specifically, we model subjective scoring as a conditional probability model based on perceptual similarity with previously scored images, called subconscious reference scoring. The reference images are stored in a neighbor dictionary, which is obtained by a nearest neighbor search, based on the normalized vector dot product, over the images' perceptual depth features. The preprocessed score is then updated by the exponential moving average (EMA) of the subconscious reference scoring, called similarity regularized EMA. Our experiments on multiple datasets (LIVE, TID2013, CID2013) show that this method can effectively remove the bias of the subjective scores. Additionally, experiments show that the preprocessed dataset can improve the performance of downstream IQA tasks.

[334]  arXiv:2404.19668 [pdf, other]
Title: SQUAT: Stateful Quantization-Aware Training in Recurrent Spiking Neural Networks
Comments: 10 pages, 4 figures, accepted at NICE 2024
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG)

Weight quantization is used to deploy high-performance deep learning models on resource-limited hardware, enabling the use of low-precision integers for storage and computation. Spiking neural networks (SNNs) share the goal of enhancing efficiency, but adopt an 'event-driven' approach to reduce the power consumption of neural network inference. While extensive research has focused on weight quantization, quantization-aware training (QAT), and their application to SNNs, the precision reduction of state variables during training has been largely overlooked, potentially diminishing inference performance. This paper introduces two QAT schemes for stateful neurons: (i) a uniform quantization strategy, an established method for weight quantization, and (ii) threshold-centered quantization, which allocates exponentially more quantization levels near the firing threshold. Our results show that increasing the density of quantization levels around the firing threshold improves accuracy across several benchmark datasets. We provide an ablation analysis of the effects of weight and state quantization, both individually and combined, and how they impact models. Our comprehensive empirical evaluation includes full precision, 8-bit, 4-bit, and 2-bit quantized SNNs, using QAT, stateful QAT (SQUAT), and post-training quantization methods. The findings indicate that the combination of QAT and SQUAT enhances performance the most, but given the choice of one or the other, QAT improves performance by the larger degree. These trends are consistent across all datasets. Our methods have been made available in our Python library snnTorch: https://github.com/jeshraghian/snntorch.
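
The difference between the two state-quantization schemes can be illustrated with a small sketch. The uniform grid is standard; the threshold-centered grid below uses geometrically shrinking offsets around the firing threshold, which is only one plausible reading of "exponentially more quantization levels near the firing threshold". The construction, the `base` parameter, and the function names are assumptions and do not reflect the snnTorch API.

```python
def uniform_levels(lo, hi, n_levels):
    """Evenly spaced quantization levels on [lo, hi]."""
    step = (hi - lo) / (n_levels - 1)
    return [lo + k * step for k in range(n_levels)]


def threshold_centered_levels(lo, hi, threshold, n_levels, base=2.0):
    """Levels whose spacing shrinks geometrically towards the threshold.

    Assumed construction (n_levels taken to be even): half the levels lie
    below the threshold and half above, at offsets proportional to base**-k
    of the distance from the threshold to each bound.
    """
    half = n_levels // 2
    below = [threshold - (threshold - lo) * base ** -k for k in range(half)]
    above = [threshold + (hi - threshold) * base ** -k for k in range(half)]
    return sorted(below + above)


def quantize(x, levels):
    """Round a state value (e.g. membrane potential) to the nearest level."""
    return min(levels, key=lambda q: abs(q - x))
```

With eight levels on [0, 2] and a threshold of 1, the threshold-centered grid places four levels within 0.3 of the threshold, whereas the uniform grid places two. States close to firing are therefore represented with smaller rounding error.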

[335]  arXiv:2404.19669 [pdf, other]
Title: Enhancing Predictive Accuracy in Pharmaceutical Sales Through An Ensemble Kernel Gaussian Process Regression Approach
Comments: 6 pages, 5 figures
Subjects: Machine Learning (cs.LG)

This research employs Gaussian Process Regression (GPR) with an ensemble kernel, integrating Exponential Squared, Revised Matérn, and Rational Quadratic kernels to analyze pharmaceutical sales data. Bayesian optimization was used to identify optimal kernel weights: 0.76 for Exponential Squared, 0.21 for Revised Matérn, and 0.13 for Rational Quadratic. The ensemble kernel demonstrated superior performance in predictive accuracy, achieving an \( R^2 \) score near 1.0, and significantly lower values in Mean Squared Error (MSE), Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE). These findings highlight the efficacy of ensemble kernels in GPR for predictive analytics in complex pharmaceutical sales datasets.
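
A weighted-sum ensemble kernel of the kind described above can be sketched as follows, using the weights reported in the abstract. Several details are assumptions: "Exponential Squared" is taken to be the usual squared-exponential kernel, a standard Matérn kernel with ν = 3/2 stands in for "Revised Matérn" (which the abstract does not define), inputs are scalars, and the length scale and `alpha` values are placeholders.

```python
import math

# Kernel weights reported in the abstract.
WEIGHTS = {"exp_squared": 0.76, "matern": 0.21, "rational_quadratic": 0.13}


def exp_squared(r, length=1.0):
    """Squared-exponential kernel as a function of distance r."""
    return math.exp(-r ** 2 / (2 * length ** 2))


def matern32(r, length=1.0):
    """Standard Matern kernel with nu = 3/2 (stand-in for "Revised Matern")."""
    z = math.sqrt(3) * r / length
    return (1 + z) * math.exp(-z)


def rational_quadratic(r, length=1.0, alpha=1.0):
    """Rational quadratic kernel as a function of distance r."""
    return (1 + r ** 2 / (2 * alpha * length ** 2)) ** -alpha


def ensemble_kernel(x1, x2):
    """Weighted sum of the three base kernels for scalar inputs."""
    r = abs(x1 - x2)
    return (WEIGHTS["exp_squared"] * exp_squared(r)
            + WEIGHTS["matern"] * matern32(r)
            + WEIGHTS["rational_quadratic"] * rational_quadratic(r))


def gram_matrix(xs):
    """Covariance matrix K[i][j] = k(xs[i], xs[j])."""
    return [[ensemble_kernel(a, b) for b in xs] for a in xs]
```

A sum of valid kernels with non-negative weights is itself a valid covariance function, so the resulting Gram matrix can be used directly in the usual GPR posterior formulas.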

[336]  arXiv:2404.19671 [pdf, other]
Title: ML-based handover prediction over a real O-RAN deployment using RAN Intelligent controller
Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG)

O-RAN introduces intelligent and flexible network control in all parts of the network. The use of controllers with open interfaces allows us to gather real-time network measurements and make intelligent, informed decisions. The work in this paper focuses on developing a use case for open and reconfigurable networks to investigate the possibility of predicting handover events and to understand the value of such predictions for all stakeholders that rely on the communication network to conduct their business. We propose a Long Short-Term Memory Machine Learning approach that takes standard Radio Access Network measurements to predict handover events. The models were trained on real network data collected from a commercial O-RAN setup deployed in our OpenIreland testbed. Our results show that the proposed approach can be optimized for either recall or precision, depending on the defined application-level objective. We also link the performance of the Machine Learning (ML) algorithm to the network operation cost. Our results show that ML-based matching between the required and available resources can reduce operational cost by more than 80%, compared to long-term resource purchases.

[337]  arXiv:2404.19673 [pdf, ps, other]
Title: Neural Controlled Differential Equations with Quantum Hidden Evolutions
Comments: Code available at: this https URL
Subjects: Machine Learning (cs.LG)

We introduce a class of neural controlled differential equations inspired by quantum mechanics. Neural quantum controlled differential equations (NQDEs) model the dynamics by analogy with the Schrödinger equation. Specifically, the hidden state represents the wave function, and its collapse leads to an interpretation of the classification probability. We implement and compare the results of four variants of NQDEs on a toy spiral classification problem.

[338]  arXiv:2404.19675 [pdf, other]
Title: Deep Learning for Educational Data Science
Comments: 18 pages. To be published in Trust and Inclusion in AI-Mediated Education: Where Human Learning Meets Learning Machines by Springer International
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

With the ever-growing presence of deep artificial neural networks in every facet of modern life, a growing body of researchers in educational data science -- a field consisting of various interrelated research communities -- have turned their attention to leveraging these powerful algorithms within the domain of education. Use cases range from advanced knowledge tracing models that can leverage open-ended student essays or snippets of code to automatic affect and behavior detectors that can identify when a student is frustrated or aimlessly trying to solve problems unproductively -- and much more. This chapter provides a brief introduction to deep learning, describes some of its advantages and limitations, presents a survey of its many uses in education, and discusses how it may further come to shape the field of educational data science.

[339]  arXiv:2404.19677 [pdf, ps, other]
Title: A Comprehensive Analysis of Pegasus Spyware and Its Implications for Digital Privacy and Security
Authors: Karwan Kareem
Comments: 13 Pages
Subjects: Cryptography and Security (cs.CR)

This paper comprehensively analyzes the Pegasus spyware and its implications for digital privacy and security. The Israeli cyber intelligence company NSO Group's Pegasus has gained recognition as a potent surveillance tool capable of hacking into smartphones and extracting data without the user's knowledge [49], [50]. The research emphasizes the technical aspects of this spyware, its deployment methods, and the controversies surrounding its use. The research also emphasizes the growing worries surrounding digital privacy and security as a result of the prevalent use of advanced spyware. By delving into legal, ethical, and policy issues, the objective of this study is to deliver a holistic understanding of the challenges posed by Pegasus and similar spyware tools. Through a comprehensive examination of the subject, the paper presents potential solutions to mitigate the threats and protect users from invasive surveillance techniques.

[340]  arXiv:2404.19683 [pdf, other]
Title: Collaborative Control Method of Transit Signal Priority Based on Cooperative Game and Reinforcement Learning
Authors: Hao Qin, Weishi Zhang
Subjects: Computer Science and Game Theory (cs.GT)

To address the low efficiency in priority signal control within intelligent transportation systems, this study introduces a novel eight-phase priority signal control method, CBQL-TSP, leveraging a hybrid decision-making framework that integrates cooperative game theory and reinforcement learning. This approach conceptualizes the allocation of bus signal priorities as a multi-objective decision-making problem across an eight-phase signal sequence, differentiating between priority and non-priority phases. It employs a cooperative game model to facilitate this differentiation. The developed hybrid decision-making algorithm, CBQL, effectively tackles the multi-objective decision-making challenges inherent in the eight-phase signal sequence. By computing the Shapley value function, it quantifies the marginal contributions of each participant, which in turn inform the construction of a state transition probability equation based on Shapley value ratios. Compared to conventional control methods, the CBQL-TSP method not only upholds the fairness principles of cooperative game theory but also harnesses the adaptive learning capabilities of Q-Learning. This enables dynamic adjustments to signal timing in response to real-time traffic conditions, significantly enhancing the flexibility and efficiency of priority signal control.
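
To make the Shapley-value step concrete, the following minimal sketch computes exact Shapley values by averaging marginal contributions over all player orderings, then normalizes them into ratios. The two-player game, the player names, and the payoff numbers are hypothetical; the abstract does not give the paper's characteristic function or the exact form of its state transition equation.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / len(orderings) for p, t in totals.items()}


def toy_value(coalition):
    """Hypothetical characteristic function for a two-phase signal game."""
    base = {"priority_phase": 3.0, "non_priority_phase": 1.0}
    v = sum(base[p] for p in coalition)
    if len(coalition) == 2:
        v += 2.0  # assumed extra payoff when both phases cooperate
    return v


phi = shapley_values(["priority_phase", "non_priority_phase"], toy_value)
total = sum(phi.values())
# Shapley value ratios, which could serve as weights in a transition rule.
ratios = {p: v / total for p, v in phi.items()}
```

Enumerating all orderings costs factorial time in the number of players, so this exact form is only practical for small games such as a handful of signal phases.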

[341]  arXiv:2404.19686 [pdf, other]
Title: ColosSUMO: Evaluating Cooperative Driving Applications with Colosseum
Subjects: Networking and Internet Architecture (cs.NI)

The quest for safer and more efficient transportation through cooperative, connected and automated mobility (CCAM) calls for realistic performance analysis tools, especially with respect to wireless communications. While the simulation of existing and emerging communication technologies is an option, the most realistic results can be obtained by employing real hardware, as done for example in field operational tests (FOTs). For CCAM, however, performing FOTs requires vehicles, which are generally expensive, and performing such tests can be very demanding in terms of manpower, let alone considering safety issues. Mobility simulation with hardware-in-the-loop (HIL) serves as a middle ground, but current solutions lack flexibility and reconfigurability. This work thus proposes ColosSUMO as a way to couple Colosseum, the world's largest wireless network emulator, with the SUMO mobility simulator, showing its design concept, how it can be exploited to simulate realistic vehicular environments, and its flexibility in terms of communication technologies.

[342]  arXiv:2404.19693 [pdf, other]
Title: SwipeGANSpace: Swipe-to-Compare Image Generation via Efficient Latent Space Exploration
Comments: 11 pages, 13 figures
Subjects: Human-Computer Interaction (cs.HC); Computer Vision and Pattern Recognition (cs.CV)

Generating preferred images using generative adversarial networks (GANs) is challenging owing to the high-dimensional nature of latent space. In this study, we propose a novel approach that uses simple user-swipe interactions to generate preferred images for users. To effectively explore the latent space with only swipe interactions, we apply principal component analysis to the latent space of the StyleGAN, creating meaningful subspaces. We use a multi-armed bandit algorithm to decide the dimensions to explore, focusing on the preferences of the user. Experiments show that our method is more efficient in generating preferred images than the baseline methods. Furthermore, changes in preferred images during image generation or the display of entirely different image styles were observed to provide new inspirations, subsequently altering user preferences. This highlights the dynamic nature of user preferences, which our proposed approach recognizes and enhances.

[343]  arXiv:2404.19696 [pdf, other]
Title: Naturally Supervised 3D Visual Grounding with Language-Regularized Concept Learners
Comments: CVPR 2024. The first two authors contributed equally
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

3D visual grounding is a challenging task that often requires direct and dense supervision, notably the semantic label for each object in the scene. In this paper, we instead study the naturally supervised setting that learns from only 3D scenes and QA pairs, where prior works underperform. We propose the Language-Regularized Concept Learner (LARC), which uses constraints from language as regularization to significantly improve the accuracy of neuro-symbolic concept learners in the naturally supervised setting. Our approach is based on two core insights: the first is that language constraints (e.g., a word's relation to another) can serve as effective regularization for structured representations in neuro-symbolic models; the second is that we can query large language models to distill such constraints from language properties. We show that LARC improves performance of prior works in naturally supervised 3D visual grounding, and demonstrates a wide range of 3D visual reasoning capabilities, from zero-shot composition to data efficiency and transferability. Our method represents a promising step towards regularizing structured visual reasoning frameworks with language-based priors, for learning in settings without dense supervision.

[344]  arXiv:2404.19702 [pdf, other]
Title: GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting
Comments: Project webpage: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We propose GS-LRM, a scalable large reconstruction model that can predict high-quality 3D Gaussian primitives from 2-4 posed sparse images in 0.23 seconds on a single A100 GPU. Our model features a very simple transformer-based architecture; we patchify input posed images, pass the concatenated multi-view image tokens through a sequence of transformer blocks, and decode final per-pixel Gaussian parameters directly from these tokens for differentiable rendering. In contrast to previous LRMs that can only reconstruct objects, by predicting per-pixel Gaussians, GS-LRM naturally handles scenes with large variations in scale and complexity. We show that our model can work on both object and scene captures by training it on Objaverse and RealEstate10K respectively. In both scenarios, the models outperform state-of-the-art baselines by a wide margin. We also demonstrate applications of our model in downstream 3D generation tasks. Our project webpage is available at: https://sai-bi.github.io/project/gs-lrm/ .

[345]  arXiv:2404.19705 [pdf, other]
Title: When to Retrieve: Teaching LLMs to Utilize Information Retrieval Effectively
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

In this paper, we demonstrate how Large Language Models (LLMs) can effectively learn to use an off-the-shelf information retrieval (IR) system specifically when additional context is required to answer a given question. Given the performance of IR systems, the optimal strategy for question answering does not always entail external information retrieval; rather, it often involves leveraging the parametric memory of the LLM itself. Prior research has identified this phenomenon in the PopQA dataset, wherein the most popular questions are effectively addressed using the LLM's parametric memory, while less popular ones require IR system usage. Following this, we propose a tailored training approach for LLMs, leveraging existing open-domain question answering datasets. Here, LLMs are trained to generate a special token, <RET>, when they do not know the answer to a question. Our evaluation of the Adaptive Retrieval LLM (Adapt-LLM) on the PopQA dataset showcases improvements over the same LLM under three configurations: (i) retrieving information for all the questions, (ii) always using the parametric memory of the LLM, and (iii) using a popularity threshold to decide when to use a retriever. Through our analysis, we demonstrate that Adapt-LLM is able to generate the <RET> token when it determines that it does not know how to answer a question, indicating the need for IR, while it achieves notably high accuracy levels when it chooses to rely only on its parametric memory.
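
The inference-time behaviour described above can be sketched as a two-pass loop: ask the model directly, and only call the retriever if the output contains the <RET> token. The function name, the prompt templates, and the `llm` and `retrieve` callables below are hypothetical placeholders for illustration, not the authors' implementation.

```python
from typing import Callable

RET_TOKEN = "<RET>"


def answer_with_adaptive_retrieval(
    question: str,
    llm: Callable[[str], str],
    retrieve: Callable[[str], str],
) -> str:
    """Answer from parametric memory unless the model asks for retrieval."""
    # First pass: no context, the model may answer or emit <RET>.
    first_pass = llm(f"Question: {question}\nAnswer:")
    if RET_TOKEN not in first_pass:
        return first_pass.strip()
    # Second pass: the model signalled it does not know, so fetch context.
    context = retrieve(question)
    second_pass = llm(f"Context: {context}\nQuestion: {question}\nAnswer:")
    return second_pass.strip()
```

With this structure the retriever is invoked only for questions on which the model emits <RET>, which is the source of the savings compared with retrieving for every question.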

[346]  arXiv:2404.19706 [pdf, other]
Title: RTG-SLAM: Real-time 3D Reconstruction at Scale using Gaussian Splatting
Comments: To be published in ACM SIGGRAPH 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We propose RTG-SLAM, a real-time 3D reconstruction system with an RGBD camera for large-scale environments using Gaussian splatting. RTG-SLAM features a compact Gaussian representation and a highly efficient on-the-fly Gaussian optimization scheme. We force each Gaussian to be either opaque or nearly transparent, with the opaque ones fitting the surface and dominant colors, and transparent ones fitting residual colors. By rendering depth in a different way from color rendering, we let a single opaque Gaussian fit a local surface region well without the need for multiple overlapping Gaussians, hence largely reducing the memory and computation cost. For on-the-fly Gaussian optimization, we explicitly add Gaussians for three types of pixels per frame: newly observed pixels, pixels with large color errors, and pixels with large depth errors. We also categorize all Gaussians into stable and unstable ones, where stable Gaussians are expected to fit previously observed RGBD images well and all others are treated as unstable. We only optimize the unstable Gaussians and only render the pixels occupied by unstable Gaussians. In this way, both the number of Gaussians to be optimized and pixels to be rendered are largely reduced, and the optimization can be done in real time. We show real-time reconstructions of a variety of real large scenes. Compared with the state-of-the-art NeRF-based RGBD SLAM, our system achieves comparable high-quality reconstruction but with around twice the speed and half the memory cost, and shows superior performance in the realism of novel view synthesis and camera tracking accuracy.

[347]  arXiv:2404.19708 [pdf, other]
Title: Harmonic LLMs are Trustworthy
Comments: 15 pages, 4 figures, 14 tables
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)

We introduce an intuitive method to test the robustness (stability and explainability) of any black-box LLM in real-time, based upon the local deviation from harmonicity, denoted as $\gamma$. To the best of our knowledge this is the first completely model-agnostic and unsupervised method of measuring the robustness of any given response from an LLM, based upon the model itself conforming to a purely mathematical standard. We conduct human annotation experiments to show the positive correlation of $\gamma$ with false or misleading answers, and demonstrate that following the gradient of $\gamma$ in stochastic gradient ascent efficiently exposes adversarial prompts. Measuring $\gamma$ across thousands of queries in popular LLMs (GPT-4, ChatGPT, Claude-2.1, Mixtral-8x7B, Smaug-72B, Llama2-7B, and MPT-7B) allows us to estimate the likelihood of wrong or hallucinatory answers automatically and quantitatively rank the reliability of these models in various objective domains (Web QA, TruthfulQA, and Programming QA). Across all models and domains tested, human ratings confirm that $\gamma \to 0$ indicates trustworthiness, and the low-$\gamma$ leaders among these models are GPT-4, ChatGPT, and Smaug-72B.

[348]  arXiv:2404.19710 [pdf, other]
Title: A rank decomposition for the topological classification of neural representations
Subjects: Machine Learning (cs.LG); Algebraic Topology (math.AT); Neurons and Cognition (q-bio.NC)

Neural networks can be thought of as applying a transformation to an input dataset. The way in which they change the topology of such a dataset often holds practical significance for many tasks, particularly those demanding non-homeomorphic mappings for optimal solutions, such as classification problems. In this work, we leverage the fact that neural networks are equivalent to continuous piecewise-affine maps, whose rank can be used to pinpoint regions in the input space that undergo non-homeomorphic transformations, leading to alterations in the topological structure of the input dataset. Our approach enables us to make use of the relative homology sequence, with which one can study the homology groups of the quotient of a manifold $\mathcal{M}$ and a subset $A$, assuming some minimal properties on these spaces.
As a proof of principle, we empirically investigate the presence of low-rank (topology-changing) affine maps as a function of network width and mean weight. We show that in randomly initialized narrow networks, there will be regions in which the (co)homology groups of a data manifold can change. As the width increases, the homology groups of the input manifold become more likely to be preserved. We end this part of our work by constructing highly non-random wide networks that do not have this property and relating this non-random regime to Dale's principle, which is a defining characteristic of biological neural networks.
Finally, we study simple feedforward networks trained on MNIST, as well as on toy classification and regression tasks, and show that networks manipulate the topology of data differently depending on the continuity of the task they are trained on.

[349]  arXiv:2404.19713 [pdf, ps, other]
Title: Automated Generation of High-Quality Medical Simulation Scenarios Through Integration of Semi-Structured Data and Large Language Models
Authors: Scott Sumpter
Comments: 22 pages but 12 are appendices which are examples of the main text. 3 figures, 4 tables
Subjects: Computation and Language (cs.CL)

This study introduces a transformative framework for medical education by integrating semi-structured data with Large Language Models (LLMs), primarily OpenAI's ChatGPT3.5, to automate the creation of medical simulation scenarios. Traditionally, developing these scenarios was a time-intensive process with limited flexibility to meet diverse educational needs. The proposed approach utilizes AI to efficiently generate detailed, clinically relevant scenarios that are tailored to specific educational objectives. This innovation has significantly reduced the time and resources required for scenario development, allowing for a broader variety of simulations. Preliminary feedback from educators and learners has shown enhanced engagement and improved knowledge acquisition, confirming the effectiveness of this AI-enhanced methodology in simulation-based learning. The integration of structured data with LLMs not only streamlines the creation process but also offers a scalable, dynamic solution that could revolutionize medical training, highlighting the critical role of AI in advancing educational outcomes and patient care standards.

[350]  arXiv:2404.19714 [pdf, other]
Title: ThangDLU at #SMM4H 2024: Encoder-decoder models for classifying text data on social disorders in children and adolescents
Comments: 4 pages
Subjects: Computation and Language (cs.CL)

This paper describes our participation in Task 3 and Task 5 of the #SMM4H (Social Media Mining for Health) 2024 Workshop, explicitly targeting the classification challenges within tweet data. Task 3 is a multi-class classification task centered on tweets discussing the impact of outdoor environments on symptoms of social anxiety. Task 5 involves a binary classification task focusing on tweets reporting medical disorders in children. We applied transfer learning from pre-trained encoder-decoder models such as BART-base and T5-small to identify the labels of a set of given tweets. We also presented some data augmentation methods to see their impact on the model performance. Finally, the systems obtained the best F1 score of 0.627 in Task 3 and the best F1 score of 0.841 in Task 5.

[351]  arXiv:2404.19715 [pdf, other]
Title: Assessing LLMs in Malicious Code Deobfuscation of Real-world Malware Campaigns
Subjects: Cryptography and Security (cs.CR)

The integration of large language models (LLMs) into various pipelines is increasingly widespread, effectively automating many manual tasks and often surpassing human capabilities. Cybersecurity researchers and practitioners have recognised this potential. Thus, they are actively exploring its applications, given the vast volume of heterogeneous data that requires processing to identify anomalies, potential bypasses, attacks, and fraudulent incidents. On top of this, LLMs' advanced capabilities in generating functional code, comprehending code context, and summarising its operations can also be leveraged for reverse engineering and malware deobfuscation. To this end, we delve into the deobfuscation capabilities of state-of-the-art LLMs. Beyond merely discussing a hypothetical scenario, we evaluate four LLMs with real-world malicious scripts used in the notorious Emotet malware campaign. Our results indicate that while not absolutely accurate yet, some LLMs can efficiently deobfuscate such payloads. Thus, fine-tuning LLMs for this task may be a viable option for future AI-powered threat intelligence pipelines in the fight against obfuscated malware.

[352]  arXiv:2404.19717 [pdf, other]
Title: Automated, Reliable, and Efficient Continental-Scale Replication of 7.3 Petabytes of Climate Simulation Data: A Case Study
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

We report on our experiences replicating 7.3 petabytes (PB) of Earth System Grid Federation (ESGF) climate simulation data from Lawrence Livermore National Laboratory (LLNL) in California to Argonne National Laboratory (ANL) in Illinois and Oak Ridge National Laboratory (ORNL) in Tennessee. This movement of some 29 million files, twice, undertaken in order to establish new ESGF nodes at ANL and ORNL, was performed largely automatically by a simple replication tool, a script that invoked Globus to transfer large bundles of files while tracking progress in a database. Under the covers, Globus organized transfers to make efficient use of the high-speed Energy Sciences network (ESnet) and the data transfer nodes deployed at participating sites, and also addressed security, integrity checking, and recovery from a variety of transient failures. This success demonstrates the considerable benefits that can accrue from the adoption of performant data replication infrastructure.

[353]  arXiv:2404.19719 [pdf, other]
Title: The lazy (NTK) and rich ($μ$P) regimes: a gentle tutorial
Authors: Dhruva Karkada
Comments: 22 pages, 7 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

A central theme of the modern machine learning paradigm is that larger neural networks achieve better performance on a variety of metrics. Theoretical analyses of these overparameterized models have recently centered around studying very wide neural networks. In this tutorial, we provide a nonrigorous but illustrative derivation of the following fact: in order to train wide networks effectively, there is only one degree of freedom in choosing hyperparameters such as the learning rate and the size of the initial weights. This degree of freedom controls the richness of training behavior: at minimum, the wide network trains lazily like a kernel machine, and at maximum, it exhibits feature learning in the so-called $\mu$P regime. In this paper, we explain this richness scale, synthesize recent research results into a coherent whole, offer new perspectives and intuitions, and provide empirical evidence supporting our claims. In doing so, we hope to encourage further study of the richness scale, as it may be key to developing a scientific theory of feature learning in practical deep neural networks.

[354]  arXiv:2404.19721 [pdf, ps, other]
Title: PANGeA: Procedural Artificial Narrative using Generative AI for Turn-Based Video Games
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

This research introduces Procedural Artificial Narrative using Generative AI (PANGeA), a structured approach for leveraging large language models (LLMs), guided by a game designer's high-level criteria, to generate narrative content for turn-based role-playing video games (RPGs). Distinct from prior applications of LLMs used for video game design, PANGeA innovates by not only generating game level data (which includes, but is not limited to, setting, key items, and non-playable characters (NPCs)), but also fostering dynamic, free-form interactions between the player and the environment that align with the procedural game narrative. The NPCs generated by PANGeA are personality-biased and express traits from the Big 5 Personality Model in their generated responses. PANGeA addresses challenges behind ingesting free-form text input, which can prompt LLM responses beyond the scope of the game narrative. A novel validation system that uses the LLM's intelligence evaluates text input and aligns generated responses with the unfolding narrative. To make these interactions possible, PANGeA is supported by a server that hosts a custom memory system that supplies context for augmenting generated responses, thus aligning them with the procedural narrative. For broad applicability, the server has a REST interface enabling any game engine to integrate directly with PANGeA, as well as an LLM interface adaptable to local or private LLMs. PANGeA's ability to foster dynamic narrative generation by aligning responses with the procedural narrative is demonstrated through an empirical study and ablation test of two versions of a demo game: a custom, browser-based GPT and a Unity demo. As the results show, PANGeA holds potential to assist game designers in using LLMs to generate narrative-consistent content even when provided varied and unpredictable, free-form text input.

[355]  arXiv:2404.19722 [pdf, other]
Title: PACER+: On-Demand Pedestrian Animation Controller in Driving Scenarios
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We address the challenge of content diversity and controllability in pedestrian simulation for driving scenarios. Recent pedestrian animation frameworks have a significant limitation wherein they primarily focus on either following trajectory [46] or the content of the reference video [57], consequently overlooking the potential diversity of human motion within such scenarios. This limitation restricts the ability to generate pedestrian behaviors that exhibit a wider range of variations and realistic motions and therefore restricts its usage to provide rich motion content for other components in the driving simulation system, e.g., suddenly changed motion to which the autonomous vehicle should respond. In our approach, we strive to surpass the limitation by showcasing diverse human motions obtained from various sources, such as generated human motions, in addition to following the given trajectory. The fundamental contribution of our framework lies in combining the motion tracking task with trajectory following, which enables the tracking of specific motion parts (e.g., upper body) while simultaneously following the given trajectory by a single policy. This way, we significantly enhance both the diversity of simulated human motion within the given scenario and the controllability of the content, including language-based control. Our framework facilitates the generation of a wide range of human motions, contributing to greater realism and adaptability in pedestrian simulations for driving scenarios. More information is on our project page https://wangjingbo1219.github.io/papers/CVPR2024_PACER_PLUS/PACERPLUSPage.html .

[356]  arXiv:2404.19724 [pdf, ps, other]
Title: Sound and Complete Proof Rules for Probabilistic Termination
Subjects: Logic in Computer Science (cs.LO)

Termination is a fundamental question in the analysis of probabilistic imperative programs. We consider the qualitative and quantitative probabilistic termination problems for an imperative programming model with discrete probabilistic choice and demonic bounded nondeterminism. The qualitative question asks if the program terminates almost surely, no matter how nondeterminism is resolved; the quantitative question asks for a bound on the probability of termination. Despite a long and rich literature on the topic, no sound and relatively complete proof systems were known for this problem. We provide the first sound and relatively complete proof rules for proving qualitative and quantitative termination in the assertion language of arithmetic. Our proof rules use supermartingales as estimates of the likelihood of the program's evolution; the key insight is to use appropriately defined finite-state sub-instances. Our completeness result shows how to construct suitable supermartingales from an almost-surely terminating program. We also show that proofs of termination in many existing proof systems can be transformed to proofs in our system, pointing to its applicability in practice. As an application of our proof rule, we show a proof of almost sure termination for the two-dimensional random walker.

[357]  arXiv:2404.19725 [pdf, other]
Title: Fairness Without Demographics in Human-Centered Federated Learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)

Federated learning (FL) enables collaborative model training while preserving data privacy, making it suitable for decentralized human-centered AI applications. However, a significant research gap remains in ensuring fairness in these systems. Current fairness strategies in FL require knowledge of bias-creating/sensitive attributes, clashing with FL's privacy principles. Moreover, in human-centered datasets, sensitive attributes may remain latent. To tackle these challenges, we present a novel bias mitigation approach inspired by "Fairness without Demographics" in machine learning. The presented approach achieves fairness without needing knowledge of sensitive attributes by minimizing the top eigenvalue of the Hessian matrix during training, ensuring equitable loss landscapes across FL participants. Notably, we introduce a novel FL aggregation scheme that promotes participating models based on error rates and loss landscape curvature attributes, fostering fairness across the FL system. This work represents the first approach to attaining "Fairness without Demographics" in human-centered FL. Through comprehensive evaluation, our approach demonstrates effectiveness in balancing fairness and efficacy across various real-world applications, FL setups, and scenarios involving single and multiple bias-inducing factors, representing a significant advancement in human-centered FL.
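
The quantity minimized above, the top eigenvalue of the Hessian, can be estimated by power iteration. The sketch below applies it to a small explicit symmetric matrix; the function name and the example Hessian are hypothetical, and practical training code would normally use Hessian-vector products rather than forming the matrix. How the paper folds this estimate into its training objective is not stated in the abstract.

```python
def top_eigenvalue(matrix, iterations=200):
    """Estimate the largest-magnitude eigenvalue of a symmetric matrix."""
    n = len(matrix)

    def matvec(v):
        return [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]

    v = [1.0 / n ** 0.5] * n  # normalized starting vector
    for _ in range(iterations):
        w = matvec(v)
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
    # Rayleigh quotient of the converged unit vector.
    hv = matvec(v)
    return sum(v[i] * hv[i] for i in range(n))


# Hypothetical Hessian of the quadratic loss f(x, y) = 2x^2 + xy + 0.5y^2.
example_hessian = [[4.0, 1.0], [1.0, 1.0]]
sharpness = top_eigenvalue(example_hessian)
```

For a positive semi-definite Hessian the largest-magnitude eigenvalue coincides with the top eigenvalue, so a smaller value corresponds to a flatter loss landscape around the current parameters.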

[358]  arXiv:2404.19729 [pdf, ps, other]
Title: A Framework for Leveraging Human Computation Gaming to Enhance Knowledge Graphs for Accuracy Critical Generative AI Applications
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)

External knowledge graphs (KGs) can be used to augment large language models (LLMs), while simultaneously providing an explainable knowledge base of facts that can be inspected by a human. This approach may be particularly valuable in domains where explainability is critical, like human trafficking data analysis. However, creating KGs can pose challenges. KGs parsed from documents may comprise explicit connections (those directly stated by a document) but miss implicit connections (those obvious to a human although not directly stated). To address these challenges, this preliminary research introduces the GAME-KG framework, standing for "Gaming for Augmenting Metadata and Enhancing Knowledge Graphs." GAME-KG is a federated approach to modifying explicit as well as implicit connections in KGs by using crowdsourced feedback collected through video games. GAME-KG is shown through two demonstrations: a Unity test scenario from Dark Shadows, a video game that collects feedback on KGs parsed from US Department of Justice (DOJ) Press Releases on human trafficking, and a following experiment where OpenAI's GPT-4 is prompted to answer questions based on a modified and unmodified KG. Initial results suggest that GAME-KG can be an effective framework for enhancing KGs, while simultaneously providing an explainable set of structured facts verified by humans.

[359]  arXiv:2404.19733 [pdf, other]
Title: Iterative Reasoning Preference Optimization
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Iterative preference optimization methods have recently been shown to perform well for general instruction tuning tasks, but typically make little improvement on reasoning tasks (Yuan et al., 2024, Chen et al., 2024). In this work we develop an iterative approach that optimizes the preference between competing generated Chain-of-Thought (CoT) candidates by optimizing for winning vs. losing reasoning steps that lead to the correct answer. We train using a modified DPO loss (Rafailov et al., 2023) with an additional negative log-likelihood term, which we find to be crucial. We show reasoning improves across repeated iterations of this scheme. While only relying on examples in the training set, our approach results in increasing accuracy for Llama-2-70B-Chat from 55.6% to 81.6% on GSM8K (and 88.7% with majority voting out of 32 samples), from 12.5% to 20.8% on MATH, and from 77.8% to 86.7% on ARC-Challenge, which outperforms other Llama-2-based models not relying on additionally sourced datasets.
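A minimal sketch of a DPO loss with an added negative log-likelihood term, for one preference pair. The exact form is an assumption: the abstract only states that an NLL term is added, so the length normalization, the weight `alpha`, and all argument names below are hypothetical.

```python
import math

def dpo_nll_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, len_w,
                 beta=0.1, alpha=1.0):
    """DPO loss plus a length-normalized NLL term on the winning sequence.

    logp_* are summed token log-probabilities under the policy, ref_logp_*
    under the frozen reference model; len_w is the winner's token count.
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)) computed stably as softplus(-margin)
    dpo = max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    nll = -logp_w / len_w  # assumed normalization by sequence length
    return dpo + alpha * nll

# When the policy equals the reference, the DPO term is log(2).
loss_at_init = dpo_nll_loss(-10.0, -20.0, -10.0, -20.0, len_w=5)
```

The NLL term keeps pushing up the likelihood of the winning chain of thought even when the preference margin is already large, which is one plausible reading of why the authors find it crucial.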

[360]  arXiv:2404.19735 [pdf, ps, other]
Title: Selective Parallel Loading of Large-Scale Compressed Graphs with ParaGrapher
Subjects: Hardware Architecture (cs.AR); Performance (cs.PF); Software Engineering (cs.SE)

Comprehensive evaluation is one of the bases of experimental science. In High-Performance Graph Processing, a thorough evaluation of contributions becomes more achievable by supporting common input formats over different frameworks. However, each framework creates its specific format, which may not support reading large-scale real-world graph datasets. This shows a demand for high-performance libraries capable of loading graphs to (i) accelerate designing new graph algorithms, (ii) evaluate the contributions on a wide range of graph algorithms, and (iii) facilitate easy and fast comparison over different graph frameworks.
To that end, we present ParaGrapher, a high-performance API and library for loading large-scale and compressed graphs. ParaGrapher supports different types of requests for accessing graphs in shared- and distributed-memory and out-of-core graph processing. We explain the design of ParaGrapher and present a performance model of graph decompression, which is used for evaluation of ParaGrapher over three storage types. Our evaluation shows that by decompressing compressed graphs in WebGraph format, ParaGrapher delivers up to 3.2 times speedup in loading and up to 5.2 times speedup in end-to-end execution in comparison to the binary and textual formats.
ParaGrapher is available online at https://blogs.qub.ac.uk/DIPSA/ParaGrapher/.

[361]  arXiv:2404.19737 [pdf, other]
Title: Better & Faster Large Language Models via Multi-token Prediction
Subjects: Computation and Language (cs.CL)

Large language models such as GPT and Llama are trained with a next-token prediction loss. In this work, we suggest that training language models to predict multiple future tokens at once results in higher sample efficiency. More specifically, at each position in the training corpus, we ask the model to predict the following n tokens using n independent output heads, operating on top of a shared model trunk. Considering multi-token prediction as an auxiliary training task, we measure improved downstream capabilities with no overhead in training time for both code and natural language models. The method is increasingly useful for larger model sizes, and keeps its appeal when training for multiple epochs. Gains are especially pronounced on generative benchmarks like coding, where our models consistently outperform strong baselines by several percentage points. Our 13B parameter models solve 12% more problems on HumanEval and 17% more on MBPP than comparable next-token models. Experiments on small algorithmic tasks demonstrate that multi-token prediction is favorable for the development of induction heads and algorithmic reasoning capabilities. As an additional benefit, models trained with 4-token prediction are up to 3 times faster at inference, even with large batch sizes.
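The training signal described above, n future tokens per position with one output head each, can be sketched as follows. This is a toy illustration under assumptions, not the paper's implementation: the helper names are hypothetical, heads are represented as plain probability tables, and summing the per-head cross-entropies is an assumed way of combining them.

```python
import math

def multi_token_targets(tokens, n):
    """Pair each position's token with the n tokens that follow it."""
    return [(tokens[t], tokens[t + 1 : t + 1 + n])
            for t in range(len(tokens) - n)]

def multi_token_loss(head_probs, targets):
    """Sum of per-head cross-entropies (assumed combination rule).

    head_probs[k] maps token -> probability predicted by head k, which is
    responsible for the token k + 1 steps ahead.
    """
    return -sum(math.log(head_probs[k][tok]) for k, tok in enumerate(targets))

examples = multi_token_targets([1, 2, 3, 4, 5], n=2)
# Two heads that each assign probability 0.5 to the correct token.
loss = multi_token_loss([{2: 0.5, 9: 0.5}, {3: 0.5, 9: 0.5}], examples[0][1])
```

With n = 1 this reduces to ordinary next-token prediction, which is why the extra heads can be treated as an auxiliary task on top of the shared trunk.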

[362]  arXiv:2404.19738 [pdf, other]
Title: DiaryHelper: Exploring the Use of an Automatic Contextual Information Recording Agent for Elicitation Diary Study
Comments: CHI 2024
Subjects: Human-Computer Interaction (cs.HC)

Elicitation diary studies, a type of qualitative, longitudinal research method, ask participants to self-report aspects of events of interest as they occur, as memory cues for providing details and insights during post-study interviews. However, due to time constraints and lack of motivation, participants' diary entries may be vague or incomplete, impairing their later recall. To address this challenge, we designed an automatic contextual information recording agent, DiaryHelper, based on the theory of episodic memory. DiaryHelper can predict five dimensions of contextual information and confirm with participants. We evaluated the use of DiaryHelper in both the recording period and the elicitation interview through a within-subject study (N=12) over a period of two weeks. Our results demonstrated that DiaryHelper can assist participants in capturing abundant and accurate contextual information without significant burden, leading to a more detailed recall of recorded events and providing greater insights.

[363]  arXiv:2404.19740 [pdf, ps, other]
Title: Almost Envy-Freeness under Weakly Lexicographic Preferences
Subjects: Computer Science and Game Theory (cs.GT); Theoretical Economics (econ.TH)

In fair division of indivisible items, domain restriction has played a key role in escaping from negative results and providing structural insights into the computational and axiomatic boundaries of fairness. One notable subdomain of additive preferences, the lexicographic domain, has yielded several positive results in dealing with goods, chores, and mixtures thereof. However, the majority of work within this domain primarily considers strict linear orders over items, which do not allow the modeling of more expressive preferences that contain indifferences (ties). We investigate the most prominent fairness notions of envy-freeness up to any (EFX) or some (EF1) item under weakly lexicographic preferences. For the goods-only setting, we develop an algorithm that can be customized to guarantee EF1, EFX, maximin share (MMS), or a combination thereof, along with the efficiency notion of Pareto optimality (PO). From the conceptual perspective, we propose techniques such as preference graphs and potential envy that are of independent interest when dealing with ties. Finally, we demonstrate challenges in dealing with chores and highlight key algorithmic and axiomatic differences between finding EFX solutions for chores and for the goods-only setting. Nevertheless, we show that there is an algorithm that always returns an EF1 and PO allocation for the chores-only instances.
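For readers unfamiliar with EF1, the definition can be made concrete with a small checker. This sketch assumes goods with additive, non-negative numeric valuations, a more general setting than the weakly lexicographic preferences studied in the paper; the function name and data layout are hypothetical.

```python
def is_ef1(valuations, allocation):
    """Check envy-freeness up to one good.

    valuations[i][g] is agent i's non-negative value for good g (additive);
    allocation[i] is the set of goods given to agent i.
    """
    n = len(allocation)
    for i in range(n):
        own = sum(valuations[i][g] for g in allocation[i])
        for j in range(n):
            if i == j or not allocation[j]:
                continue  # nobody envies an empty bundle
            other = sum(valuations[i][g] for g in allocation[j])
            # Removing the good agent i values most is the best case for EF1.
            best_removed = max(valuations[i][g] for g in allocation[j])
            if own < other - best_removed:
                return False
    return True

vals = [[5, 3, 1], [5, 3, 1]]
balanced = is_ef1(vals, [{0}, {1, 2}])
lopsided = is_ef1(vals, [{0, 1, 2}, set()])
```

EFX strengthens this by requiring the inequality after removing any single good from the envied bundle, not just the most valuable one.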

[364]  arXiv:2404.19744 [pdf, other]
Title: PrivComp-KG : Leveraging Knowledge Graph and Large Language Models for Privacy Policy Compliance Verification
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

Data protection and privacy are becoming increasingly crucial in the digital era. Numerous companies depend on third-party vendors and service providers to carry out critical functions within their operations, encompassing tasks such as data handling and storage. However, this reliance introduces potential vulnerabilities, as these vendors' security measures and practices may not always align with the standards expected by regulatory bodies. Businesses are required, often under the penalty of law, to ensure compliance with the evolving regulatory rules. Interpreting and implementing these regulations pose challenges due to their complexity. Regulatory documents are extensive, demanding significant effort for interpretation, while vendor-drafted privacy policies often lack the detail required for full legal compliance, leading to ambiguity. To ensure a concise interpretation of the regulatory requirements and compliance of organizational privacy policy with said regulations, we propose a Large Language Model (LLM) and Semantic Web based approach for privacy compliance. In this paper, we develop the novel Privacy Policy Compliance Verification Knowledge Graph, PrivComp-KG. It is designed to efficiently store and retrieve comprehensive information concerning privacy policies, regulatory frameworks, and domain-specific knowledge pertaining to the legal landscape of privacy. Using Retrieval Augmented Generation, we identify the relevant sections in a privacy policy with corresponding regulatory rules. This information about individual privacy policies is populated into the PrivComp-KG. Combining this with the domain context and rules, the PrivComp-KG can be queried to check the compliance of each vendor's privacy policy against relevant policy regulations. We demonstrate the relevance of the PrivComp-KG by verifying compliance of privacy policy documents for various organizations.

[365]  arXiv:2404.19745 [pdf, other]
Title: Analyzing Transport Policies in Developing Countries with ABM
Comments: 7 pages, 2 figures, Annual Simulation Conference ANNSIM 2024
Subjects: Multiagent Systems (cs.MA)

Deciphering travel behavior and mode choices is a critical aspect of effective urban transportation system management, particularly in developing countries where unique socio-economic and cultural conditions complicate decision-making. Agent-based simulations offer a valuable tool for modeling transportation systems, enabling a nuanced understanding and policy impact evaluation. This work aims to shed light on the effects of transport policies and analyzes travel behavior by simulating agents making mode choices for their daily commutes. Agents gather information from the environment and their social network to assess the optimal transport option based on personal satisfaction criteria. Our findings, stemming from simulating a free-fare policy for public transit in a developing-country city, reveal a significant influence on decision-making, fostering public service use while positively influencing pollution levels, accident rates, and travel speed.

[366]  arXiv:2404.19748 [pdf, other]
Title: Quantifying Nematodes through Images: Datasets, Models, and Baselines of Deep Learning
Comments: The 26th IEEE International Conference on Computational Science and Engineering (CSE-2023)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Every year, plant parasitic nematodes, one of the major groups of plant pathogens, cause a significant loss of crops worldwide. To mitigate crop yield losses caused by nematodes, an efficient nematode monitoring method is essential for plant and crop disease management. In addition, efficient nematode detection contributes to medical research and drug discovery, as nematodes are model organisms. With the rapid development of computer technology, computer vision techniques provide a feasible solution for quantifying nematodes or nematode infections. In this paper, we survey and categorise the studies and available datasets on nematode detection through deep-learning models. To stimulate progress in related research, this survey presents the potential state-of-the-art object detection models, training techniques, optimisation techniques, and evaluation metrics for deep learning beginners. Moreover, seven state-of-the-art object detection models are validated on three public datasets and the AgriNema dataset for plant parasitic nematodes to construct a baseline for nematode detection.

[367]  arXiv:2404.19749 [pdf, other]
Title: Scale-Robust Timely Asynchronous Decentralized Learning
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG); Multiagent Systems (cs.MA); Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)

We consider an asynchronous decentralized learning system, which consists of a network of connected devices trying to learn a machine learning model without any centralized parameter server. The users in the network have their own local training data, which is used for learning across all the nodes in the network. The learning method consists of two processes, evolving simultaneously without any necessary synchronization. The first process is the model update, where the users update their local model via a fixed number of stochastic gradient descent steps. The second process is model mixing, where the users communicate with each other via randomized gossiping to exchange their models and average them to reach consensus. In this work, we investigate the staleness criteria for such a system, which is a sufficient condition for convergence of individual user models. We show that for network scaling, i.e., when the number of user devices $n$ is very large, if the gossip capacity of individual users scales as $\Omega(\log n)$, we can guarantee the convergence of user models in finite time. Furthermore, we show that bounded staleness can be guaranteed by any distributed opportunistic scheme only with $\Omega(n)$ scaling.
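The two processes in the abstract, local model updates and gossip-based mixing, can be illustrated with a toy simulation. This is a simplified sketch under assumptions, not the paper's timeliness-aware scheme: pairs are drawn uniformly at random, updates are plain gradient steps, and every name below is hypothetical.

```python
import numpy as np

def local_update(model, grad_fn, steps=5, lr=0.1):
    """Model update: a fixed number of gradient steps on local data."""
    for _ in range(steps):
        model = model - lr * grad_fn(model)
    return model

def gossip_round(models, rng):
    """Model mixing: two distinct random users average their models in place."""
    i, j = rng.choice(len(models), size=2, replace=False)
    avg = 0.5 * (models[i] + models[j])
    models[i] = avg
    models[j] = avg

rng = np.random.default_rng(0)
models = rng.standard_normal((8, 3))  # 8 users, 3 parameters each
mean_before = models.mean(axis=0)
spread_before = np.var(models, axis=0).sum()
for _ in range(200):
    gossip_round(models, rng)
mean_after = models.mean(axis=0)
spread_after = np.var(models, axis=0).sum()

# A local update on the toy loss 0.5 * ||w - target||^2 moves toward target.
target = np.ones(3)
updated = local_update(np.zeros(3), lambda w: w - target)
```

Pairwise averaging preserves the network-wide mean while shrinking disagreement between users, which is the consensus effect that gossip capacity controls in the scaling results above.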

[368]  arXiv:2404.19750 [pdf, other]
Title: A Joint Communication and Computation Design for Distributed RISs Assisted Probabilistic Semantic Communication in IIoT
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

In this paper, the problem of spectral-efficient communication and computation resource allocation for distributed reconfigurable intelligent surfaces (RISs) assisted probabilistic semantic communication (PSC) in industrial Internet-of-Things (IIoT) is investigated. In the considered model, multiple RISs are deployed to serve multiple users, while PSC adopts a compute-then-transmit protocol to reduce the transmission data size. To support high-rate transmission, the semantic compression ratio, transmit power allocation, and distributed RISs deployment must be jointly considered. This joint communication and computation problem is formulated as an optimization problem whose goal is to maximize the sum semantic-aware transmission rate of the system under total transmit power, phase shift, RIS-user association, and semantic compression ratio constraints. To solve this problem, a many-to-many matching scheme is proposed for the RIS-user association subproblem, the semantic compression ratio subproblem is addressed following a greedy policy, and the phase shift of the RISs is optimized using tensor-based beamforming. Numerical results verify the superiority of the proposed algorithm.

[369]  arXiv:2404.19752 [pdf, other]
Title: Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation
Comments: CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Existing automatic captioning methods for visual content face challenges such as lack of detail, content hallucination, and poor instruction following. In this work, we propose VisualFactChecker (VFC), a flexible training-free pipeline that generates high-fidelity and detailed captions for both 2D images and 3D objects. VFC consists of three steps: 1) proposal, where image-to-text captioning models propose multiple initial captions; 2) verification, where a large language model (LLM) utilizes tools such as object detection and VQA models to fact-check proposed captions; 3) captioning, where an LLM generates the final caption by summarizing caption proposals and the fact-check verification results. In this step, VFC can flexibly generate captions in various styles following complex instructions. We conduct comprehensive captioning evaluations using four metrics: 1) CLIP-Score for image-text similarity; 2) CLIP-Image-Score for measuring the image-image similarity between the original and the reconstructed image generated by a text-to-image model using the caption; 3) a human study on Amazon Mechanical Turk; 4) GPT-4V for fine-grained evaluation. Evaluation results show that VFC outperforms state-of-the-art open-sourced captioning methods for 2D images on the COCO dataset and 3D assets on the Objaverse dataset. Our study demonstrates that by combining open-source models into a pipeline, we can attain captioning capability comparable to proprietary models such as GPT-4V, despite being over 10x smaller in model size.

[370]  arXiv:2404.19753 [pdf, other]
Title: DOCCI: Descriptions of Connected and Contrasting Images
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

Vision-language datasets are vital for both text-to-image (T2I) and image-to-text (I2T) research. However, current datasets lack descriptions with fine-grained detail that would allow for richer associations to be learned by models. To fill the gap, we introduce Descriptions of Connected and Contrasting Images (DOCCI), a dataset with long, human-annotated English descriptions for 15k images that were taken, curated and donated by a single researcher intent on capturing key challenges such as spatial relations, counting, text rendering, world knowledge, and more. We instruct human annotators to create comprehensive descriptions for each image; these average 136 words in length and are crafted to clearly distinguish each image from those that are related or similar. Each description is highly compositional and typically encompasses multiple challenges. Through both quantitative and qualitative analyses, we demonstrate that DOCCI serves as an effective training resource for image-to-text generation -- a PaLI 5B model finetuned on DOCCI shows equal or superior results compared to highly-performant larger models like LLaVA-1.5 7B and InstructBLIP 7B. Furthermore, we show that DOCCI is a useful testbed for text-to-image generation, highlighting the limitations of current text-to-image models in capturing long descriptions and fine details.

[371]  arXiv:2404.19755 [pdf, ps, other]
Title: Analysis and Enhancement of Lossless Image Compression in JPEG-XL
Authors: Rustam Mamedov
Subjects: Information Theory (cs.IT)

As the demand for digital information grows in fields like medicine, remote sensing, and archival, efficient image compression becomes crucial. This paper focuses on lossless image compression, vital for managing the increasing volume of image data without quality loss. Current research emphasizes techniques such as predictive coding, transform coding, and context modeling to improve compression ratios. This study evaluates lossless compression in JPEG XL, the latest standard in the JPEG family, and aims to enhance its compression ratio by modifying the codebase. Results show that while overall compression levels are below the original codec, one prediction method improves compression for specific image types. This study offers insights into enhancing lossless compression performance and suggests possibilities for future advancements in this area.

[372]  arXiv:2404.19756 [pdf, other]
Title: KAN: Kolmogorov-Arnold Networks
Comments: 48 pages, 20 figures. Codes are available at this https URL
Subjects: Machine Learning (cs.LG); Disordered Systems and Neural Networks (cond-mat.dis-nn); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Inspired by the Kolmogorov-Arnold representation theorem, we propose Kolmogorov-Arnold Networks (KANs) as promising alternatives to Multi-Layer Perceptrons (MLPs). While MLPs have fixed activation functions on nodes ("neurons"), KANs have learnable activation functions on edges ("weights"). KANs have no linear weights at all -- every weight parameter is replaced by a univariate function parametrized as a spline. We show that this seemingly simple change makes KANs outperform MLPs in terms of accuracy and interpretability. For accuracy, much smaller KANs can achieve comparable or better accuracy than much larger MLPs in data fitting and PDE solving. Theoretically and empirically, KANs possess faster neural scaling laws than MLPs. For interpretability, KANs can be intuitively visualized and can easily interact with human users. Through two examples in mathematics and physics, KANs are shown to be useful collaborators helping scientists (re)discover mathematical and physical laws. In summary, KANs are promising alternatives for MLPs, opening opportunities for further improving today's deep learning models which rely heavily on MLPs.
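The core idea, a learnable univariate function on every edge with plain summation at the nodes, can be sketched in a few lines. This toy layer is an assumption-laden illustration, not the authors' code: it uses piecewise-linear interpolation (a degree-1 spline) where the paper uses splines of unspecified form in the abstract, and the function name and array layout are hypothetical.

```python
import numpy as np

def kan_layer(x, grid, values):
    """Forward pass of a toy KAN-style layer.

    x:      (n_in,) input vector
    grid:   (G,) increasing knot positions shared by all edges
    values: (n_out, n_in, G) learnable knot values; values[j, i] defines the
            univariate edge function phi_{j,i} by linear interpolation.
    Output j is sum_i phi_{j,i}(x_i); there are no linear weight matrices.
    """
    n_out, n_in, _ = values.shape
    out = np.zeros(n_out)
    for j in range(n_out):
        for i in range(n_in):
            out[j] += np.interp(x[i], grid, values[j, i])
    return out

grid = np.linspace(-1.0, 1.0, 5)
# Setting every edge function to the identity makes each output sum the inputs.
identity_values = np.tile(grid, (2, 3, 1))
y = kan_layer(np.array([0.2, -0.4, 0.7]), grid, identity_values)
```

Training would adjust the knot values by gradient descent; stacking such layers gives the composition of sums of univariate functions that the Kolmogorov-Arnold representation theorem motivates.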

[373]  arXiv:2404.19758 [pdf, other]
Title: Invisible Stitch: Generating Smooth 3D Scenes with Depth Inpainting
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

3D scene generation has quickly become a challenging new research direction, fueled by consistent improvements of 2D generative diffusion models. Most prior work in this area generates scenes by iteratively stitching newly generated frames with existing geometry. These works often depend on pre-trained monocular depth estimators to lift the generated images into 3D, fusing them with the existing scene representation. These approaches are then often evaluated via a text metric, measuring the similarity between the generated images and a given text prompt. In this work, we make two fundamental contributions to the field of 3D scene generation. First, we note that lifting images to 3D with a monocular depth estimation model is suboptimal as it ignores the geometry of the existing scene. We thus introduce a novel depth completion model, trained via teacher distillation and self-training to learn the 3D fusion process, resulting in improved geometric coherence of the scene. Second, we introduce a new benchmarking scheme for scene generation methods that is based on ground truth geometry, and thus measures the quality of the structure of the scene.

[374]  arXiv:2404.19759 [pdf, other]
Title: MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model
Comments: MotionLCM project version 1.0
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This work introduces MotionLCM, extending controllable motion generation to a real-time level. Existing methods for spatial control in text-conditioned motion generation suffer from significant runtime inefficiency. To address this issue, we first propose the motion latent consistency model (MotionLCM) for motion generation, building upon the latent diffusion model (MLD). By employing one-step (or few-step) inference, we further improve the runtime efficiency of the motion latent diffusion model for motion generation. To ensure effective controllability, we incorporate a motion ControlNet within the latent space of MotionLCM and enable explicit control signals (e.g., pelvis trajectory) in the vanilla motion space to control the generation process directly, similar to controlling other latent-free diffusion models for motion generation. By employing these techniques, our approach can generate human motions with text and control signals in real-time. Experimental results demonstrate the remarkable generation and controlling capabilities of MotionLCM while maintaining real-time runtime efficiency.

[375]  arXiv:2404.19760 [pdf, other]
Title: Lightplane: Highly-Scalable Components for Neural 3D Fields
Comments: Project Page: this https URL Code: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

Contemporary 3D research, particularly in reconstruction and generation, heavily relies on 2D images for inputs or supervision. However, current designs for these 2D-3D mappings are memory-intensive, posing a significant bottleneck for existing methods and hindering new applications. In response, we propose a pair of highly scalable components for 3D neural fields: Lightplane Render and Splatter, which significantly reduce memory usage in 2D-3D mapping. These innovations enable the processing of vastly more and higher resolution images with small memory and computational costs. We demonstrate their utility in various applications, from benefiting single-scene optimization with image-level losses to realizing a versatile pipeline for dramatically scaling 3D reconstruction and generation. Code: https://github.com/facebookresearch/lightplane.

Cross-lists for Wed, 1 May 24

[376]  arXiv:2404.18247 (cross-list from hep-th) [pdf, other]
Title: Classical integrability in the presence of a cosmological constant: analytic and machine learning results
Comments: 32 pages, 7 figures
Subjects: High Energy Physics - Theory (hep-th); Machine Learning (cs.LG); Mathematical Physics (math-ph)

We study the integrability of two-dimensional theories that are obtained by a dimensional reduction of certain four-dimensional gravitational theories describing the coupling of Maxwell fields and neutral scalar fields to gravity in the presence of a potential for the neutral scalar fields. By focusing on a certain solution subspace, we show that a subset of the equations of motion in two dimensions are the compatibility conditions for a modified version of the Breitenlohner-Maison linear system. Subsequently, we study the Liouville integrability of the 2D models encoding the chosen 4D solution subspace from a one-dimensional point of view by constructing Lax pair matrices. In this endeavour, we successfully employ a linear neural network to search for Lax pair matrices for these models, thereby illustrating how machine learning approaches can be effectively implemented to augment the identification of integrable structures in classical systems.

[377]  arXiv:2404.18946 (cross-list from physics.optics) [pdf, other]
Title: Align-Free Multi-Plane Phase Retrieval
Subjects: Optics (physics.optics); Information Retrieval (cs.IR); Image and Video Processing (eess.IV)

The multi-plane phase retrieval method provides a budget-friendly and effective way to perform phase imaging, yet it often encounters alignment challenges due to shifts along the optical axis in experiments. Traditional methods, such as employing beamsplitters instead of mechanical stage movements or adjusting focus using tunable light sources, add complexity to the setup required for multi-plane phase retrieval. Attempts to address these issues computationally face difficulties due to the variable impact of diffraction, which renders conventional homography techniques inadequate. In our research, we introduce a novel Adaptive Cascade Calibrated (ACC) strategy for multi-plane phase retrieval that overcomes misalignment issues. This technique detects feature points within the refocused sample space and calculates the transformation matrix for neighboring planes on-the-fly to digitally adjust measurements, facilitating alignment-free multi-plane phase retrieval. This approach not only avoids the need for complex and expensive optical hardware but also simplifies the imaging setup, reducing overall costs. The effectiveness of our method is validated through simulations and real-world optical experiments.

[378]  arXiv:2404.18953 (cross-list from math.OC) [pdf, other]
Title: A Knowledge-driven Memetic Algorithm for the Energy-efficient Distributed Homogeneous Flow Shop Scheduling Problem
Comments: 14 pages
Subjects: Optimization and Control (math.OC); Neural and Evolutionary Computing (cs.NE)

The reduction of carbon emissions in the manufacturing industry holds significant importance in achieving the national "double carbon" target. Ensuring energy efficiency is a crucial factor to be incorporated into future generation manufacturing systems. In this study, energy consumption is considered in the distributed homogeneous flow shop scheduling problem (DHFSSP). A knowledge-driven memetic algorithm (KDMA) is proposed to address the energy-efficient DHFSSP (EEDHFSSP). KDMA incorporates a collaborative initialization strategy to generate high-quality initial populations. Furthermore, several algorithmic improvements including update strategy, local search strategy, and carbon reduction strategy are employed to improve the search performance of the algorithm. The effectiveness of KDMA in solving EEDHFSSP is verified through extensive simulation experiments. It is evident that KDMA outperforms many state-of-the-art algorithms across various evaluation aspects.

[379]  arXiv:2404.18960 (cross-list from q-bio.QM) [pdf, ps, other]
Title: Leak Proof CMap; a framework for training and evaluation of cell line agnostic L1000 similarity methods
Subjects: Quantitative Methods (q-bio.QM); Machine Learning (cs.LG)

The Connectivity Map (CMap) is a large publicly available database of cellular transcriptomic responses to chemical and genetic perturbations built using a standardized acquisition protocol known as the L1000 technique. Databases such as CMap provide an exciting opportunity to enrich drug discovery efforts, providing a 'known' phenotypic landscape to explore and enabling the development of state of the art techniques for enhanced information extraction and better informed decisions. Whilst multiple methods for measuring phenotypic similarity and interrogating profiles have been developed, the field is severely lacking standardized benchmarks using appropriate data splitting for training and unbiased evaluation of machine learning methods. To address this, we have developed 'Leak Proof CMap' and exemplified its application to a set of common transcriptomic and generic phenotypic similarity methods along with an exemplar triplet loss-based method. Benchmarking in three critical performance areas (compactness, distinctness, and uniqueness) is conducted using carefully crafted data splits ensuring no similar cell lines or treatments with shared or closely matching responses or mechanisms of action are present in training, validation, or test sets. This enables testing of models with unseen samples akin to exploring treatments with novel modes of action in novel patient derived cell lines. With a carefully crafted benchmark and data splitting regime in place, the tooling now exists to create performant phenotypic similarity methods for use in personalized medicine (novel cell lines) and to better augment high throughput phenotypic screening technologies with the L1000 transcriptomic technology.

[380]  arXiv:2404.18981 (cross-list from eess.IV) [pdf, other]
Title: Decoding Radiologists' Intentions: A Novel System for Accurate Region Identification in Chest X-ray Image Analysis
Comments: Accepted in ISBI 2024
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI)

In the realm of chest X-ray (CXR) image analysis, radiologists meticulously examine various regions, documenting their observations in reports. The prevalence of errors in CXR diagnoses, particularly among inexperienced radiologists and hospital residents, underscores the importance of understanding radiologists' intentions and the corresponding regions of interest. This understanding is crucial for correcting mistakes by guiding radiologists to the accurate regions of interest, especially in the diagnosis of chest radiograph abnormalities. In response to this imperative, we propose a novel system designed to identify the primary intentions articulated by radiologists in their reports and the corresponding regions of interest in CXR images. This system seeks to elucidate the visual context underlying radiologists' textual findings, with the potential to rectify errors made by less experienced practitioners and direct them to precise regions of interest. Importantly, the proposed system can be instrumental in providing constructive feedback to inexperienced radiologists or junior residents in the hospital, bridging the gap in face-to-face communication. The system represents a valuable tool for enhancing diagnostic accuracy and fostering continuous learning within the medical community.

[381]  arXiv:2404.19005 (cross-list from quant-ph) [pdf, other]
Title: Fault-tolerant compiling of classically hard IQP circuits on hypercubes
Comments: 27 + 20 pages, 13 Figures
Subjects: Quantum Physics (quant-ph); Quantum Gases (cond-mat.quant-gas); Statistical Mechanics (cond-mat.stat-mech); Computational Complexity (cs.CC); Atomic Physics (physics.atom-ph)

Realizing computationally complex quantum circuits in the presence of noise and imperfections is a challenging task. While fault-tolerant quantum computing provides a route to reducing noise, it requires a large overhead for generic algorithms. Here, we develop and analyze a hardware-efficient, fault-tolerant approach to realizing complex sampling circuits. We co-design the circuits with the appropriate quantum error correcting codes for efficient implementation in a reconfigurable neutral atom array architecture, constituting what we call a fault-tolerant compilation of the sampling algorithm. Specifically, we consider a family of $[[2^D , D, 2]]$ quantum error detecting codes whose transversal and permutation gate set can realize arbitrary degree-$D$ instantaneous quantum polynomial (IQP) circuits. Using native operations of the code and the atom array hardware, we compile a fault-tolerant and fast-scrambling family of such IQP circuits in a hypercube geometry, realized recently in the experiments by Bluvstein et al. [Nature 626, 7997 (2024)]. We develop a theory of second-moment properties of degree-$D$ IQP circuits for analyzing hardness and verification of random sampling by mapping to a statistical mechanics model. We provide evidence that sampling from hypercube IQP circuits is classically hard to simulate and analyze the linear cross-entropy benchmark (XEB) in comparison to the average fidelity. To realize a fully scalable approach, we first show that Bell sampling from degree-$4$ IQP circuits is classically intractable and can be efficiently validated. We further devise new families of $[[O(d^D),D,d]]$ color codes of increasing distance $d$, permitting exponential error suppression for transversal IQP sampling. Our results highlight fault-tolerant compiling as a powerful tool in co-designing algorithms with specific error-correcting codes and realistic hardware.

[382]  arXiv:2404.19053 (cross-list from stat.CO) [pdf, other]
Title: Fast Adaptive Fourier Integration for Spectral Densities of Gaussian Processes
Subjects: Computation (stat.CO); Numerical Analysis (math.NA)

The specification of a covariance function is of paramount importance when employing Gaussian process models, but the requirement of positive definiteness severely limits those used in practice. Designing flexible stationary covariance functions is, however, straightforward in the spectral domain, where one needs only to supply a positive and symmetric spectral density. In this work, we introduce an adaptive integration framework for efficiently and accurately evaluating covariance functions and their derivatives at irregular locations directly from \textit{any} continuous, integrable spectral density. In order to make this approach computationally tractable, we employ high-order panel quadrature, the nonuniform fast Fourier transform, and a Nyquist-informed panel selection heuristic, and derive novel algebraic truncation error bounds which are used to monitor convergence. As a result, we demonstrate several orders of magnitude speedup compared to naive uniform quadrature approaches, allowing us to evaluate covariance functions from slowly decaying, singular spectral densities at millions of locations to a user-specified tolerance in seconds on a laptop. We then apply our methodology to perform gradient-based maximum likelihood estimation using a previously numerically infeasible long-memory spectral model for wind velocities below the atmospheric boundary layer.
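
As a point of reference for the abstract above, the following is a minimal sketch of the naive uniform-quadrature baseline it mentions, not the adaptive framework the paper introduces. It assumes the convention $K(r) = \int S(\omega) e^{i\omega r}\, d\omega$ and a Gaussian spectral density chosen only because its covariance is known in closed form; the function names, grid width, and node count are illustrative assumptions.

```python
import numpy as np

def covariance_from_spectral_density(spectral_density, r, omega_max=10.0, n_nodes=2001):
    """Approximate K(r) = integral of S(w) * exp(i*w*r) dw on a uniform grid."""
    omega = np.linspace(-omega_max, omega_max, n_nodes)
    h = omega[1] - omega[0]
    r = np.atleast_1d(np.asarray(r, dtype=float))
    # S is symmetric, so the imaginary part cancels and only cos(w*r) remains.
    integrand = spectral_density(omega)[None, :] * np.cos(np.outer(r, omega))
    # Plain Riemann sum; the density is negligible at the grid end points.
    return h * integrand.sum(axis=1)

def gaussian_spectral_density(omega):
    """Standard normal density; its covariance is exp(-r**2 / 2)."""
    return np.exp(-0.5 * omega**2) / np.sqrt(2.0 * np.pi)

lags = np.array([0.0, 0.5, 1.0, 2.0])
approx = covariance_from_spectral_density(gaussian_spectral_density, lags)
exact = np.exp(-0.5 * lags**2)
```

A fixed grid works here only because the Gaussian density decays very quickly; for slowly decaying or singular densities, the case the abstract targets, a uniform rule of this kind would need far more nodes.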

[383]  arXiv:2404.19073 (cross-list from stat.ML) [pdf, other]
Title: Learning Sparse High-Dimensional Matrix-Valued Graphical Models From Dependent Data
Comments: 16 pages, 2 figures, 1 table
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Signal Processing (eess.SP)

We consider the problem of inferring the conditional independence graph (CIG) of a sparse, high-dimensional, stationary matrix-variate Gaussian time series. All past work on high-dimensional matrix graphical models assumes that independent and identically distributed (i.i.d.) observations of the matrix-variate are available. Here we allow dependent observations. We consider a sparse-group lasso-based frequency-domain formulation of the problem with a Kronecker-decomposable power spectral density (PSD), and solve it via an alternating direction method of multipliers (ADMM) approach. The problem is bi-convex and is solved via flip-flop optimization. We provide sufficient conditions for local convergence in the Frobenius norm of the inverse PSD estimators to the true value. This result also yields a rate of convergence. We illustrate our approach using numerical examples utilizing both synthetic and real data.

[384]  arXiv:2404.19075 (cross-list from eess.IV) [pdf, other]
Title: Distributed Stochastic Optimization of a Neural Representation Network for Time-Space Tomography Reconstruction
Comments: submitted to Nature Machine Intelligence
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Numerical Analysis (math.NA)

4D time-space reconstruction of dynamic events or deforming objects using X-ray computed tomography (CT) is an extremely ill-posed inverse problem. Existing approaches assume that the object remains static for the duration of several tens or hundreds of X-ray projection measurement images (reconstruction of consecutive limited-angle CT scans). However, this is an unrealistic assumption for many in-situ experiments that causes spurious artifacts and inaccurate morphological reconstructions of the object. To solve this problem, we propose to perform a 4D time-space reconstruction using a distributed implicit neural representation (DINR) network that is trained using a novel distributed stochastic training algorithm. Our DINR network learns to reconstruct the object at its output by iterative optimization of its network parameters such that the measured projection images best match the output of the CT forward measurement model. We use a continuous time and space forward measurement model that is a function of the DINR outputs at a sparsely sampled set of continuous valued object coordinates. Unlike existing state-of-the-art neural representation architectures that forward and back propagate through dense voxel grids that sample the object's entire time-space coordinates, we only propagate through the DINR at a small subset of object coordinates in each iteration resulting in an order-of-magnitude reduction in memory and compute for training. DINR leverages distributed computation across several compute nodes and GPUs to produce high-fidelity 4D time-space reconstructions even for extremely large CT data sizes. We use both simulated parallel-beam and experimental cone-beam X-ray CT datasets to demonstrate the superior performance of our approach.

[385]  arXiv:2404.19083 (cross-list from eess.IV) [pdf, other]
Title: Longitudinal Mammogram Risk Prediction
Comments: Submitted to MICCAI 2024
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Breast cancer is one of the leading causes of mortality among women worldwide. Early detection and risk assessment play a crucial role in improving survival rates. Therefore, annual or biennial mammograms are often recommended for screening in high-risk groups. Mammograms are typically interpreted by expert radiologists based on the Breast Imaging Reporting and Data System (BI-RADS), which provides a uniform way to describe findings and categorizes them to indicate the level of concern for breast cancer. Recently, machine learning (ML) and computational approaches have been developed to automate and improve the interpretation of mammograms. However, both BI-RADS and the ML-based methods focus on the analysis of data from the present and sometimes the most recent prior visit. While it is clear that temporal changes in image features of the longitudinal scans should carry value for quantifying breast cancer risk, no prior work has conducted a systematic study of this. In this paper, we extend a state-of-the-art ML model to ingest an arbitrary number of longitudinal mammograms and predict future breast cancer risk. On a large-scale dataset, we demonstrate that our model, LoMaR, achieves state-of-the-art performance when presented with only the present mammogram. Furthermore, we use LoMaR to characterize the predictive value of prior visits. Our results show that longer histories (e.g., up to four prior annual mammograms) can significantly boost the accuracy of predicting future breast cancer risk, particularly beyond the short-term. Our code and model weights are available at https://github.com/batuhankmkaraman/LoMaR.

[386]  arXiv:2404.19105 (cross-list from quant-ph) [pdf, other]
Title: Optimal tradeoffs for estimating Pauli observables
Comments: 59 pages, 1 figure
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)

We revisit the problem of Pauli shadow tomography: given copies of an unknown $n$-qubit quantum state $\rho$, estimate $\text{tr}(P\rho)$ for some set of Pauli operators $P$ to within additive error $\epsilon$. This has been a popular testbed for exploring the advantage of protocols with quantum memory over those without: with enough memory to measure two copies at a time, one can use Bell sampling to estimate $|\text{tr}(P\rho)|$ for all $P$ using $O(n/\epsilon^4)$ copies, but with $k\le n$ qubits of memory, $\Omega(2^{(n-k)/3})$ copies are needed.
These results leave open several natural questions. How does this picture change in the physically relevant setting where one only needs to estimate a certain subset of Paulis? What is the optimal dependence on $\epsilon$? What is the optimal tradeoff between quantum memory and sample complexity?
We answer all of these questions. For any subset $A$ of Paulis and any family of measurement strategies, we completely characterize the optimal sample complexity, up to $\log |A|$ factors. We show any protocol that makes $\text{poly}(n)$-copy measurements must make $\Omega(1/\epsilon^4)$ measurements. For any protocol that makes $\text{poly}(n)$-copy measurements and only has $k < n$ qubits of memory, we show that $\widetilde{\Theta}(\min\{2^n/\epsilon^2, 2^{n-k}/\epsilon^4\})$ copies are necessary and sufficient.
The protocols we propose can also estimate the actual values $\text{tr}(P\rho)$, rather than just their absolute values as in prior work. Additionally, as a byproduct of our techniques, we establish tight bounds for the task of purity testing and show that it exhibits an intriguing phase transition not present in the memory-sample tradeoff for Pauli shadow tomography.
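
As a quick worked reading of the memory-sample bound quoted above, ignoring the logarithmic factors hidden in $\widetilde{\Theta}$, one can ask for which memory sizes $k$ each term of the minimum dominates. This is only arithmetic on the stated bound, not a result taken from the paper:

```latex
\[
\frac{2^{n-k}}{\epsilon^{4}} \le \frac{2^{n}}{\epsilon^{2}}
\iff 2^{-k} \le \epsilon^{2}
\iff k \ge 2\log_2(1/\epsilon).
\]
```

Read this way, with fewer than about $2\log_2(1/\epsilon)$ qubits of memory the bound is $2^n/\epsilon^2$, and beyond that point each additional qubit of memory halves the $2^{n-k}/\epsilon^4$ term.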

[387]  arXiv:2404.19116 (cross-list from econ.TH) [pdf, other]
Title: Disentangling Exploration from Exploitation
Subjects: Theoretical Economics (econ.TH); Computer Science and Game Theory (cs.GT)

Starting from Robbins (1952), the literature on experimentation via multi-armed bandits has wed exploration and exploitation. Nonetheless, in many applications, agents' exploration and exploitation need not be intertwined: a policymaker may assess new policies different from the status quo; an investor may evaluate projects outside her portfolio. We characterize the optimal experimentation policy when exploration and exploitation are disentangled in the case of Poisson bandits, allowing for general news structures. The optimal policy features complete learning asymptotically, exhibits lots of persistence, but cannot be identified by an index à la Gittins. Disentanglement is particularly valuable for intermediate parameter values.

[388]  arXiv:2404.19145 (cross-list from stat.ME) [pdf, other]
Title: Orthogonal Bootstrap: Efficient Simulation of Input Uncertainty
Subjects: Methodology (stat.ME); Machine Learning (cs.LG); Econometrics (econ.EM); Statistics Theory (math.ST); Machine Learning (stat.ML)

Bootstrap is a popular methodology for simulating input uncertainty. However, it can be computationally expensive when the number of samples is large. We propose a new approach called \textbf{Orthogonal Bootstrap} that reduces the number of required Monte Carlo replications. We decompose the target being simulated into two parts: the \textit{non-orthogonal part}, which has a closed-form result known as the Infinitesimal Jackknife, and the \textit{orthogonal part}, which is easier to simulate. We theoretically and numerically show that Orthogonal Bootstrap significantly reduces the computational cost of Bootstrap while improving empirical accuracy and maintaining the same width of the constructed interval.
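
The two ingredients named in the abstract above can be illustrated on the simplest possible statistic. The sketch below is not the Orthogonal Bootstrap itself: it only compares a Monte Carlo bootstrap variance estimate with the closed-form infinitesimal jackknife variance for the sample mean. The data distribution, sample size, and number of replications are arbitrary assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=200)  # hypothetical input data
n = x.size

# Monte Carlo bootstrap: resample with replacement and recompute the mean.
n_replications = 2000
idx = rng.integers(0, n, size=(n_replications, n))
boot_means = x[idx].mean(axis=1)
var_bootstrap = boot_means.var()

# Infinitesimal jackknife: closed form built from the empirical influence
# values, which for the sample mean are simply x_i - mean(x).
influence = x - x.mean()
var_ij = np.sum(influence**2) / n**2
```

For a linear statistic such as the mean, the closed form already matches what the resampling loop estimates, so the simulation adds only Monte Carlo noise. For nonlinear statistics the two differ, and that remainder is the kind of quantity that still has to be simulated.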

[389]  arXiv:2404.19157 (cross-list from stat.ML) [pdf, other]
Title: Scalable Bayesian Inference in the Era of Deep Learning: From Gaussian Processes to Deep Neural Networks
Authors: Javier Antoran
Comments: PhD Thesis, University of Cambridge
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Large neural networks trained on large datasets have become the dominant paradigm in machine learning. These systems rely on maximum likelihood point estimates of their parameters, precluding them from expressing model uncertainty. This may result in overconfident predictions and it prevents the use of deep learning models for sequential decision making. This thesis develops scalable methods to equip neural networks with model uncertainty. In particular, we leverage the linearised Laplace approximation to equip pre-trained neural networks with the uncertainty estimates provided by their tangent linear models. This turns the problem of Bayesian inference in neural networks into one of Bayesian inference in conjugate Gaussian-linear models. Alas, the cost of this remains cubic in either the number of network parameters or in the number of observations times output dimensions. By assumption, neither is tractable. We address this intractability by using stochastic gradient descent (SGD) -- the workhorse algorithm of deep learning -- to perform posterior sampling in linear models and their convex duals: Gaussian processes. With this, we turn back to linearised neural networks, finding the linearised Laplace approximation to present a number of incompatibilities with modern deep learning practices -- namely, stochastic optimisation, early stopping and normalisation layers -- when used for hyperparameter learning. We resolve these and construct a sample-based EM algorithm for scalable hyperparameter learning with linearised neural networks. We apply the above methods to perform linearised neural network inference with ResNet-50 (25M parameters) trained on ImageNet (1.2M observations and 1000 output dimensions). Additionally, we apply our methods to estimate uncertainty for 3D tomographic reconstructions obtained with the deep image prior network.

[390]  arXiv:2404.19201 (cross-list from eess.IV) [pdf, other]
Title: Global Search Optics: Automatically Exploring Optimal Solutions to Compact Computational Imaging Systems
Comments: The source code will be made publicly available at https://github.com/wumengshenyou/GSO
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO); Optics (physics.optics)

The popularity of mobile vision creates a demand for advanced compact computational imaging systems, which call for the development of both a lightweight optical system and an effective image reconstruction model. Recently, joint design pipelines have come to the research forefront, where the two significant components are simultaneously optimized via data-driven learning to realize the optimal system design. However, the effectiveness of these designs largely depends on the initial setup of the optical system, complicated by a non-convex solution space that impedes reaching a globally optimal solution. In this work, we present Global Search Optics (GSO) to automatically design compact computational imaging systems through two parts: (i) Fused Optimization Method for Automatic Optical Design (OptiFusion), which searches for diverse initial optical systems under certain design specifications; and (ii) Efficient Physic-aware Joint Optimization (EPJO), which conducts parallel joint optimization of initial optical systems and image reconstruction networks with the consideration of physical constraints, culminating in the selection of the optimal solution. Extensive experimental results on the design of three-piece (3P) sphere computational imaging systems illustrate that GSO serves as a transformative end-to-end lens design paradigm with superior global optimal structure searching ability, which provides compact computational imaging systems with higher imaging quality compared to traditional methods. The source code will be made publicly available at https://github.com/wumengshenyou/GSO.

[391]  arXiv:2404.19203 (cross-list from physics.app-ph) [pdf, ps, other]
Title: Thermal Performance of a Liquid-cooling Assisted Thin Wickless Vapor Chamber
Authors: Arani Mukhopadhyay, Anish Pal, Mohamad Jafari Gukeh, Constantine M. Megaridis (Mechanical and Industrial Engineering, University of Illinois Chicago, IL, US.)
Comments: Presented at IEEE ITherm (Intersociety Conference on Thermal and Thermomechanical Phenomena in Electronic Systems) 2023. Orlando, FL, US. Corresponding: cmm@uic.edu
Subjects: Applied Physics (physics.app-ph); Hardware Architecture (cs.AR); Systems and Control (eess.SY)

The ever-increasing power consumption of electronic devices, coupled with the requirement for reduced thickness, calls for the development of efficient heat spreading components. Vapor chambers (VCs), because of their ability to effectively spread heat over a large area by two-phase heat transfer, seem ideal for such applications. However, creating thin and efficient vapor chambers that work over a wide range of power inputs is a persistent challenge. VCs that use wicks to circulate the phase-changing medium suffer from capillary restrictions, dry-out, clogging, and increased size and weight, and can often be costly. Recent developments in wick-free wettability patterned vapor chambers replace traditional wicks with laser-fabricated wickless components. An experimental setup allows for fast testing and experimental evaluation of water-charged VCs with liquid-assisted cooling. The sealed chamber can maintain vacuum for long durations, and can be used for testing of very thin wick-free VCs. This work extends our previous study by decreasing the overall thickness of the wick-free VC down to 3 mm and evaluates its performance. Furthermore, the impact of wettability patterns on VC performance is investigated by carrying out experiments in both non-patterned and patterned VCs. Experiments are first carried out on a wick-free VC with no wettability patterns, comprising an entirely superhydrophilic evaporator coupled with a hydrophobic condenser. Thereafter, wettability patterns that aid the rapid return of water to the heated site on the evaporator and improve condensation on the condenser of the vapor chamber are implemented. The thermal characteristics show that the patterned VCs outperform the non-patterned VCs under all scenarios. The patterned VCs exhibit low thermal resistance independent of fluid charging ratio, and withstand higher power inputs without thermal dry-out.

[392]  arXiv:2404.19206 (cross-list from math.OC) [pdf, ps, other]
Title: Periodic Event-Triggered Boundary Control of Neuron Growth with Actuation at Soma
Comments: Submitted to 2024 Conference on Decision and Control
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)

Exploring novel strategies for the regulation of axon growth, we introduce a periodic event-triggered control (PETC) to enhance the practical implementation of the associated PDE backstepping control law. Neurological injuries may impair neuronal function, but therapies like Chondroitinase ABC (ChABC) have shown promise in improving axon elongation by influencing the extracellular matrix. This matrix, composed of extracellular macromolecules and minerals, regulates tubulin protein concentration, potentially aiding in neuronal recovery. The concentration and spatial distribution of tubulin influence axon elongation dynamics. Recent research explores feedback control strategies for this model, leading to the development of an event-triggering control (CETC) approach. In this approach, the control law updates when the monitored triggering condition is met, reducing actuation resource consumption. Through the meticulous redesign of the triggering mechanism, we introduce a periodic event-triggering control (PETC), updating control inputs at specific intervals, but evaluating the event-trigger only periodically, an ideal tool for standard time-sliced actuators like ChABC. PETC is a step toward the design of practically feasible feedback laws for the neuron growth process. The PETC strategy establishes an upper bound on event triggers between periodic examinations, ensuring convergence and preventing Zeno behavior. Through Lyapunov analysis, we demonstrate the local exponential convergence of the system with the periodic event-triggering mechanism in the $L^2$-norm sense. Numerical examples are presented to confirm the theoretical findings.

[393]  arXiv:2404.19220 (cross-list from stat.ML) [pdf, other]
Title: Regression for matrix-valued data via Kronecker products factorization
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

We study the matrix-variate regression problem $Y_i = \sum_{k} \beta_{1k} X_i \beta_{2k}^{\top} + E_i$ for $i=1,2,\dots,n$ in the high dimensional regime wherein the responses $Y_i$ are matrices whose dimensions $p_{1}\times p_{2}$ outgrow both the sample size $n$ and the dimensions $q_{1}\times q_{2}$ of the predictor variables $X_i$, i.e., $q_{1},q_{2} \ll n \ll p_{1},p_{2}$. We propose an estimation algorithm, termed KRO-PRO-FAC, for estimating the parameters $\{\beta_{1k}\} \subset \Re^{p_1 \times q_1}$ and $\{\beta_{2k}\} \subset \Re^{p_2 \times q_2}$ that utilizes the Kronecker product factorization and rearrangement operations from Van Loan and Pitsianis (1993). The KRO-PRO-FAC algorithm is computationally efficient as it does not require estimating the covariance between the entries of the $\{Y_i\}$. We establish perturbation bounds on $\hat{\beta}_{1k} -\beta_{1k}$ and $\hat{\beta}_{2k} - \beta_{2k}$ in spectral norm for the setting where either the rows of $E_i$ or the columns of $E_i$ are independent sub-Gaussian random vectors. Numerical studies on simulated and real data indicate that our procedure is competitive, in terms of both estimation error and predictive accuracy, compared to other existing methods.
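
The rearrangement idea cited in the abstract above can be sketched in a few lines. This is a generic illustration of the Van Loan and Pitsianis construction for a single Kronecker term, not the KRO-PRO-FAC estimator: rearranging $A = B \otimes C$ turns it into the rank-one matrix $\mathrm{vec}(B)\,\mathrm{vec}(C)^{\top}$, so the leading singular pair recovers the factors up to a shared scalar. The function names, the row-major vectorization, and the test dimensions are assumptions made for the example.

```python
import numpy as np

def rearrange(a, shape_b, shape_c):
    """Rearrange a (m1*m2) x (n1*n2) matrix so that kron(B, C) becomes
    the outer product of B.ravel() and C.ravel() (row-major order)."""
    m1, n1 = shape_b
    m2, n2 = shape_c
    a4 = a.reshape(m1, m2, n1, n2)  # axes: (i1, i2, j1, j2)
    return a4.transpose(0, 2, 1, 3).reshape(m1 * n1, m2 * n2)

def nearest_kronecker_factors(a, shape_b, shape_c):
    """Best single-term approximation a ~ kron(B, C) in Frobenius norm."""
    u, s, vt = np.linalg.svd(rearrange(a, shape_b, shape_c), full_matrices=False)
    scale = np.sqrt(s[0])
    return scale * u[:, 0].reshape(shape_b), scale * vt[0].reshape(shape_c)

rng = np.random.default_rng(0)
b = rng.standard_normal((3, 2))
c = rng.standard_normal((4, 5))
a = np.kron(b, c)
b_hat, c_hat = nearest_kronecker_factors(a, b.shape, c.shape)
```

The individual factors are identified only up to a scalar (for instance a simultaneous sign flip), so the meaningful check is that `np.kron(b_hat, c_hat)` reproduces `a`. A sum of several Kronecker terms would use the corresponding number of leading singular triples.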

[394]  arXiv:2404.19230 (cross-list from q-bio.BM) [pdf, ps, other]
Title: Deep Lead Optimization: Leveraging Generative AI for Structural Modification
Subjects: Biomolecules (q-bio.BM); Artificial Intelligence (cs.AI)

The idea of using deep-learning-based molecular generation to accelerate discovery of drug candidates has attracted extraordinary attention, and many deep generative models have been developed for automated drug design, termed molecular generation. In general, molecular generation encompasses two main strategies: de novo design, which generates novel molecular structures from scratch, and lead optimization, which refines existing molecules into drug candidates. Among them, lead optimization plays an important role in real-world drug design. For example, it can enable the development of me-better drugs that are chemically distinct yet more effective than the original drugs. It can also facilitate fragment-based drug design, transforming virtual-screened small ligands with low affinity into first-in-class medicines. Despite its importance, automated lead optimization remains underexplored compared to the well-established de novo generative models, due to its reliance on complex biological and chemical knowledge. To bridge this gap, we conduct a systematic review of traditional computational methods for lead optimization, organizing these strategies into four principal sub-tasks with defined inputs and outputs. This review delves into the basic concepts, goals, conventional CADD techniques, and recent advancements in AIDD. Additionally, we introduce a unified perspective based on constrained subgraph generation to harmonize the methodologies of de novo design and lead optimization. Through this lens, de novo design can incorporate strategies from lead optimization to address the challenge of generating hard-to-synthesize molecules; inversely, lead optimization can benefit from the innovations in de novo design by approaching it as a task of generating molecules conditioned on certain substructures.

[395]  arXiv:2404.19251 (cross-list from quant-ph) [pdf, other]
Title: Quantum control in the presence of strongly coupled non-Markovian noise
Subjects: Quantum Physics (quant-ph); Systems and Control (eess.SY)

Controlling quantum systems under correlated non-Markovian noise, particularly when strongly coupled, poses significant challenges in the development of quantum technologies. Traditional quantum control strategies, heavily reliant on precise models, often fail under these conditions. Here, we address the problem by utilizing a data-driven graybox model, which integrates machine learning structures with physics-based elements. We demonstrate single-qubit control, implementing a universal gate set as well as a random gate set, achieving high fidelity under unknown, strongly-coupled non-Markovian non-Gaussian noise, significantly outperforming traditional methods. Our method is applicable to all open finite-dimensional quantum systems, regardless of the type of noise or the strength of the coupling.

[396]  arXiv:2404.19301 (cross-list from stat.ML) [pdf, ps, other]
Title: Statistics and explainability: a fruitful alliance
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

In this paper, we propose standard statistical tools as a solution to commonly highlighted problems in the explainability literature. Indeed, leveraging statistical estimators allows for a proper definition of explanations, enabling theoretical guarantees and the formulation of evaluation metrics to quantitatively assess the quality of explanations. This approach circumvents, among other things, the subjective human assessment currently prevalent in the literature. Moreover, we argue that uncertainty quantification is essential for providing robust and trustworthy explanations, and it can be achieved in this framework through classical statistical procedures such as the bootstrap. However, it is crucial to note that while Statistics offers valuable contributions, it is not a panacea for resolving all the challenges. Future research avenues could focus on open problems, such as defining a purpose for the explanations or establishing a statistical framework for counterfactual or adversarial scenarios.

[397]  arXiv:2404.19345 (cross-list from cond-mat.mes-hall) [pdf, other]
Title: Connecting physics to systems with modular spin-circuits
Subjects: Mesoscale and Nanoscale Physics (cond-mat.mes-hall); Emerging Technologies (cs.ET)

An emerging paradigm in modern electronics is that of CMOS + $\sf X$ requiring the integration of standard CMOS technology with novel materials and technologies denoted by $\sf X$. In this context, a crucial challenge is to develop accurate circuit models for $\sf X$ that are compatible with standard models for CMOS-based circuits and systems. In this perspective we present physics-based, experimentally benchmarked modular circuit models that can be used to evaluate a class of CMOS + $\sf X$ systems, where $\sf X$ denotes magnetic and spintronic materials and phenomena. This class of materials is particularly challenging because they go beyond conventional charge-based phenomena and involve the spin degree of freedom which involves non-trivial quantum effects. Starting from density matrices $-$ the central quantity in quantum transport $-$ using well-defined approximations, it is possible to obtain spin-circuits that generalize ordinary circuit theory to 4-component currents and voltages (1 for charge and 3 for spin). With step-by-step examples that progressively go higher in the computing stack, we illustrate how the spin-circuit approach can be used to start from the physics of magnetism and spintronics to enable accurate system-level evaluations. We believe the core approach can be extended to include other quantum degrees of freedom like valley and pseudospins starting from corresponding density matrices.

[398]  arXiv:2404.19351 (cross-list from physics.geo-ph) [pdf, other]
Title: Deep Learning Forecasts Caldera Collapse Events at Kīlauea Volcano
Subjects: Geophysics (physics.geo-ph); Machine Learning (cs.LG)

During the three-month-long eruption of Kīlauea volcano, Hawaii in 2018, the pre-existing summit caldera collapsed in over 60 quasi-periodic failure events. The last 40 of these events, which generated Mw >5 very long period (VLP) earthquakes, had inter-event times between 0.8 and 2.2 days. These failure events offer a unique dataset for testing methods for predicting earthquake recurrence based on locally recorded GPS, tilt, and seismicity data. In this work, we train a deep learning graph neural network (GNN) to predict the time-to-failure of the caldera collapse events using only a fraction of the data recorded at the start of each cycle. We find that the GNN generalizes to unseen data and can predict the time-to-failure to within a few hours using only 0.5 days of data, substantially improving upon a null model based only on inter-event statistics. Predictions improve with increasing input data length, and are most accurate when using high-SNR tilt-meter data. Applying the trained GNN to synthetic data with different magma pressure decay times predicts failure at a nearly constant stress threshold, revealing that the GNN is sensing the underlying physics of caldera collapse. These findings demonstrate the predictability of caldera collapse sequences under well monitored conditions, and highlight the potential of machine learning methods for forecasting real world catastrophic events with limited training data.

[399]  arXiv:2404.19375 (cross-list from eess.AS) [pdf, ps, other]
Title: Deep low-latency joint speech transmission and enhancement over a gaussian channel
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

Ensuring intelligible speech communication for hearing assistive devices in low-latency scenarios presents significant challenges in terms of speech enhancement, coding and transmission. In this paper, we propose novel solutions for low-latency joint speech transmission and enhancement, leveraging deep neural networks (DNNs). Our approach integrates two state-of-the-art DNN architectures for low-latency speech enhancement and low-latency analog joint source-channel-based transmission, creating a combined low-latency system and jointly training both systems in an end-to-end approach. Due to the computational demands of the enhancement system, this order is suitable when high computational power is unavailable in the decoder, as in hearing assistive devices. The proposed system enables the configuration of total latency, achieving high performance even at latencies as low as 3 ms, which is typically challenging to attain. The simulation results provide compelling evidence that a joint enhancement and transmission system is superior to a simple concatenation system in diverse settings, encompassing various wireless channel conditions, latencies, and background noise scenarios.

[400]  arXiv:2404.19392 (cross-list from math.OC) [pdf, other]
Title: Convergence analysis of the transformed gradient projection algorithms on compact matrix manifolds
Comments: 45 pages, 5 figures, 4 tables
Subjects: Optimization and Control (math.OC); Numerical Analysis (math.NA)

In this paper, to address the optimization problem on a compact matrix manifold, we introduce a novel algorithmic framework called the Transformed Gradient Projection (TGP) algorithm, using the projection onto this compact matrix manifold. Compared with the existing algorithms, the key innovation in our approach lies in the utilization of a new class of search directions and various stepsizes, including the Armijo, nonmonotone Armijo, and fixed stepsizes, to guide the selection of the next iterate. Our framework offers flexibility by encompassing the classical gradient projection algorithms as special cases, and intersecting the retraction-based line-search algorithms. Notably, our focus is on the Stiefel or Grassmann manifold, revealing that many existing algorithms in the literature can be seen as specific instances within our proposed framework, and this algorithmic framework also induces several new special cases. Then, we conduct a thorough exploration of the convergence properties of these algorithms, considering various search directions and stepsizes. To achieve this, we extensively analyze the geometric properties of the projection onto compact matrix manifolds, allowing us to extend classical inequalities related to retractions from the literature. Building upon these insights, we establish the weak convergence, convergence rate, and global convergence of TGP algorithms under three distinct stepsizes. In cases where the compact matrix manifold is the Stiefel or Grassmann manifold, our convergence results either encompass or surpass those found in the literature. Finally, through a series of numerical experiments, we observe that the TGP algorithms, owing to their increased flexibility in choosing search directions, outperform classical gradient projection and retraction-based line-search algorithms in several scenarios.
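The classical gradient projection iteration that the abstract names as a special case can be sketched on the simplest compact matrix manifold, the unit sphere (the Stiefel manifold with one column). This is only an illustrative sketch of the classical scheme with a fixed stepsize, not the TGP algorithm itself. The function names, the test objective $f(x) = x^\top A x$ with a diagonal $A$, and the stepsize value are all assumptions made for the example.

```python
import math
from typing import Callable, List

Vector = List[float]


def project_to_sphere(y: Vector) -> Vector:
    """Project a nonzero vector onto the unit sphere (nearest point in the 2-norm)."""
    norm = math.sqrt(sum(c * c for c in y))
    if norm == 0.0:
        raise ValueError("the projection of the zero vector is not unique")
    return [c / norm for c in y]


def gradient_projection(
    grad: Callable[[Vector], Vector],
    x0: Vector,
    step: float,
    iterations: int,
) -> Vector:
    """Classical gradient projection with a fixed stepsize:
    x_{k+1} = P(x_k - step * grad f(x_k)), where P projects onto the sphere."""
    x = project_to_sphere(x0)
    for _ in range(iterations):
        g = grad(x)
        x = project_to_sphere([xi - step * gi for xi, gi in zip(x, g)])
    return x


# Hypothetical test problem: minimise f(x) = sum_i d_i * x_i^2 on the sphere.
# The minimum value is the smallest diagonal entry (here 1.0).
DIAG = [1.0, 2.0, 3.0]


def objective(x: Vector) -> float:
    return sum(d * xi * xi for d, xi in zip(DIAG, x))


def objective_grad(x: Vector) -> Vector:
    # Euclidean gradient of the objective above.
    return [2.0 * d * xi for d, xi in zip(DIAG, x)]


x_star = gradient_projection(objective_grad, [1.0, 1.0, 1.0], step=0.1, iterations=200)
```

For a Stiefel manifold with more than one column, the normalisation step would be replaced by a projection such as the orthogonal polar factor of the matrix. That generalisation is not shown here.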

[401]  arXiv:2404.19407 (cross-list from math.OC) [pdf, other]
Title: Comparison of two numerical methods for Riemannian cubic polynomials on Stiefel manifolds
Comments: 12 pages, 3 figures
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY); Numerical Analysis (math.NA)

In this paper we compare two numerical methods to integrate Riemannian cubic polynomials on the Stiefel manifold $\textbf{St}_{n,k}$. The first one is the adjusted de Casteljau algorithm, and the second one is a symplectic integrator constructed through discretization maps. In particular, we choose the cases of $n=3$ together with $k=1$ and $k=2$. The first case is diffeomorphic to the sphere and the quasi-geodesics appearing in the adjusted de Casteljau algorithm are actually geodesics. The second case is an example where we have a pure quasi-geodesic different from a geodesic. We provide a numerical comparison of both methods and discuss the obtained results to highlight the benefits of each method.

[402]  arXiv:2404.19428 (cross-list from quant-ph) [pdf, ps, other]
Title: From Quantum Mechanics to Quantum Software Engineering
Comments: 8 pages
Subjects: Quantum Physics (quant-ph); Software Engineering (cs.SE)

Victor Hugo's timeless observation, "Nothing is more powerful than an idea whose time has come", resonates today as Quantum Computing, once only a dream of a physicist, stands at the threshold of reality with the potential to revolutionise the world. To comprehend the surge of attention it commands today, one must delve into the motivations that birthed and nurtured Quantum Computing. While the past of Quantum Computing provides insights into the present, the future could unfold through the lens of Quantum Software Engineering. Quantum Software Engineering, guided by its principles and methodologies, investigates the most effective ways to interact with Quantum Computers to unlock their true potential and usher in a new era of possibilities. To gain insight into the present landscape and anticipate the trajectory of Quantum Computing and Quantum Software Engineering, this paper embarks on a journey through their evolution and outlines potential directions for future research.

[403]  arXiv:2404.19481 (cross-list from eess.IV) [pdf, other]
Title: SpecstatOR: Speckle statistics-based iOCT Segmentation Network for Ophthalmic Surgery
Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

This paper presents an innovative approach to intraoperative Optical Coherence Tomography (iOCT) image segmentation in ophthalmic surgery, leveraging statistical analysis of speckle patterns to incorporate statistical pathology-specific prior knowledge. Our findings indicate statistically different speckle patterns within the retina and between retinal layers and surgical tools, facilitating the segmentation of previously unseen data without the necessity for manual labeling. The research involves fitting various statistical distributions to iOCT data, enabling the differentiation of different ocular structures and surgical tools. The proposed segmentation model aims to refine the statistical findings based on prior tissue understanding to leverage statistical and biological knowledge. Incorporating statistical parameters, physical analysis of light-tissue interaction, and deep learning informed by biological structures enhance segmentation accuracy, offering potential benefits to real-time applications in ophthalmic surgical procedures. The study demonstrates the adaptability and precision of using Gamma distribution parameters and the derived binary maps as sole inputs for segmentation, notably enhancing the model's inference performance on unseen data.

[404]  arXiv:2404.19535 (cross-list from physics.app-ph) [pdf, other]
Title: Ferroelectrically-enhanced Schottky barrier transistors for Logic-in-Memory applications
Subjects: Applied Physics (physics.app-ph); Emerging Technologies (cs.ET)

Artificial neural networks (ANNs) have had an enormous impact on a multitude of sectors, from research to industry, generating an unprecedented demand for tailor-suited hardware platforms. Their training and execution are highly memory-intensive, clearly evidencing the limitations affecting the currently available hardware based on the von Neumann architecture, which requires frequent data shuttling due to the physical separation of logic and memory units. This not only limits the achievable performance but also greatly increases the energy consumption, hindering the integration of ANNs into low-power platforms. New Logic in Memory (LiM) architectures, able to unify memory and logic functionalities into a single component, are highly promising for overcoming these limitations, by drastically reducing the need for data transfers. Recently, it has been shown that a very flexible platform for logic applications can be realized by resorting to a multi-gated Schottky-Barrier Field Effect Transistor (SBFET). If equipped with memory capabilities, this architecture could represent an ideal building block for versatile LiM hardware. To reach this goal, here we investigate the integration of a ferroelectric Hf$_{0.5}$Zr$_{0.5}$O$_2$ (HZO) layer onto Dual Top Gated SBFETs. We demonstrate that HZO polarization charges can be successfully employed to tune the height of the two Schottky barriers, influencing the injection behavior, thus defining the transistor mode, switching it between n and p-type transport. The modulation strength is strongly dependent on the polarization pulse height, allowing for the selection of multiple current levels. All these achievable states can be well retained over time, thanks to the HZO stability. The presented results show how ferroelectric-enhanced SBFETs are promising for the realization of novel LiM hardware, enabling low-power circuits for ANN execution.

[405]  arXiv:2404.19556 (cross-list from math.CO) [pdf, ps, other]
Title: A logarithmic approximation of linearly-ordered colourings
Subjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS)

A linearly ordered (LO) $k$-colouring of a hypergraph assigns to each vertex a colour from the set $\{0,1,\ldots,k-1\}$ in such a way that each hyperedge has a unique maximum element. Barto, Batistelli, and Berg conjectured that it is NP-hard to find an LO $k$-colouring of an LO 2-colourable 3-uniform hypergraph for any constant $k\geq 2$ [STACS'21], but even the case $k=3$ is still open. Nakajima and \v{Z}ivn\'{y} gave polynomial-time algorithms for finding, given an LO 2-colourable 3-uniform hypergraph, an LO colouring with $O^*(\sqrt{n})$ colours [ICALP'22] and an LO colouring with $O^*(\sqrt[3]{n})$ colours [ACM ToCT'23]. We present a simple polynomial-time algorithm that finds an LO colouring with $\log_2(n)$ colours, which is an exponential improvement.
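The definition of an LO $k$-colouring given above can be made concrete with a small validity checker. This is a minimal sketch of the definition only, and it does not implement the paper's $\log_2(n)$-colour algorithm, which the abstract does not describe. The function name and the data representation (hyperedges as tuples of vertex labels, the colouring as a dictionary) are assumptions chosen for illustration.

```python
from typing import Dict, Hashable, Iterable, Sequence


def is_lo_colouring(
    hyperedges: Iterable[Sequence[Hashable]],
    colouring: Dict[Hashable, int],
    k: int,
) -> bool:
    """Return True if `colouring` is a linearly ordered k-colouring.

    Every vertex colour must lie in {0, ..., k-1}, and in every hyperedge
    the maximum colour must be attained by exactly one vertex.
    """
    for edge in hyperedges:
        colours = [colouring[v] for v in edge]
        if any(c < 0 or c >= k for c in colours):
            return False
        if colours.count(max(colours)) != 1:
            return False
    return True


# A small 3-uniform hypergraph on vertices 0..3 (hypothetical example data).
edges = [(0, 1, 2), (1, 2, 3)]
good = {0: 0, 1: 0, 2: 1, 3: 0}  # vertex 2 is the unique maximum in both edges
bad = {0: 1, 1: 1, 2: 0, 3: 0}   # edge (0, 1, 2) has two vertices with the maximum colour

valid_good = is_lo_colouring(edges, good, k=2)
valid_bad = is_lo_colouring(edges, bad, k=2)
```

A checker of this kind runs in time linear in the total size of the hyperedges, so verifying a proposed colouring is easy even where finding one with few colours is conjectured to be hard.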

[406]  arXiv:2404.19557 (cross-list from stat.ML) [pdf, other]
Title: Neural Dynamic Data Valuation
Comments: 43 pages, 19 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Data constitute the foundational component of the data economy and its marketplaces. Efficient and fair data valuation has emerged as a topic of significant interest. Many approaches based on marginal contribution have shown promising results in various downstream tasks. However, they are well known to be computationally expensive as they require training a large number of utility functions, which are used to evaluate the usefulness or value of a given dataset for a specific purpose. As a result, it has been recognized as infeasible to apply these methods to a data marketplace involving large-scale datasets. Consequently, a critical issue arises: how can the re-training of the utility function be avoided? To address this issue, we propose a novel data valuation method from the perspective of optimal control, named the neural dynamic data valuation (NDDV). Our method has solid theoretical interpretations to accurately identify the data valuation via the sensitivity of the data optimal control state. In addition, we implement a data re-weighting strategy to capture the unique features of data points, ensuring fairness through the interaction between data points and the mean-field states. Notably, our method requires only training once to estimate the value of all data points, significantly improving the computational efficiency. We conduct comprehensive experiments using different datasets and tasks. The results demonstrate that the proposed NDDV method outperforms the existing state-of-the-art data valuation methods in accurately identifying data points with either high or low values and is more computationally efficient.

[407]  arXiv:2404.19568 (cross-list from eess.IV) [pdf, other]
Title: Enhancing Deep Learning Model Explainability in Brain Tumor Datasets using Post-Heuristic Approaches
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

The application of deep learning models in medical diagnosis has showcased considerable efficacy in recent years. Nevertheless, a notable limitation involves the inherent lack of explainability during decision-making processes. This study addresses such a constraint by enhancing the robustness of interpretability. The primary focus is directed towards refining the explanations generated by the LIME Library and LIME image explainer. This is achieved through post-processing mechanisms based on scenario-specific rules. Multiple experiments have been conducted using publicly accessible datasets related to brain tumor detection. Our proposed post-heuristic approach demonstrates significant advancements, yielding more robust and concrete results in the context of medical diagnosis.

[408]  arXiv:2404.19579 (cross-list from eess.IV) [pdf, ps, other]
Title: Automatic Cardiac Pathology Recognition in Echocardiography Images Using Higher Order Dynamic Mode Decomposition and a Vision Transformer for Small Datasets
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Heart diseases are the leading cause of death worldwide. According to the WHO, nearly 18 million people die each year because of heart diseases. Also considering the increase of medical data, much pressure is put on the health industry to develop systems for early and accurate heart disease recognition. In this work, an automatic cardiac pathology recognition system based on a novel deep learning framework is proposed, which analyses echocardiography video sequences in real time. The system works in two stages. The first one transforms the data included in a database of echocardiography sequences into a machine-learning-compatible collection of annotated images which can be used in the training stage of any kind of machine learning-based framework, and more specifically with deep learning. This includes the use of the Higher Order Dynamic Mode Decomposition (HODMD) algorithm, for the first time to the authors' knowledge, for both data augmentation and feature extraction in the medical field. The second stage is focused on building and training a Vision Transformer (ViT), barely explored in the related literature. The ViT is adapted for effective training from scratch, even with small datasets. The designed neural network analyses images from an echocardiography sequence to predict the heart state. The results obtained show the superiority of the proposed system and the efficacy of the HODMD algorithm, even outperforming pretrained Convolutional Neural Networks (CNNs), which are so far the method of choice in the literature.

[409]  arXiv:2404.19598 (cross-list from eess.IV) [pdf, other]
Title: Artificial Intelligence in Bone Metastasis Analysis: Current Advancements, Opportunities and Challenges
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

In recent years, Artificial Intelligence (AI) has been widely used in medicine, particularly in the analysis of medical imaging, which has been driven by advances in computer vision and deep learning methods. This is particularly important in overcoming the challenges posed by diseases such as Bone Metastases (BM), a common and complex malignancy of the bones. Indeed, there has been an increasing interest in applying Machine Learning (ML) techniques to oncologic imaging for BM analysis. In order to provide a comprehensive overview of the current state-of-the-art and advancements for BM analysis using artificial intelligence, this review is conducted in accordance with PRISMA guidelines. Firstly, this review highlights the clinical and oncologic perspectives of BM and the medical imaging modalities used, discussing their advantages and limitations. Then the review focuses on modern approaches, considering the main BM analysis tasks, which include classification, detection and segmentation. The analysis of the results shows that ML technologies can achieve promising performance for BM analysis and have significant potential to improve clinician efficiency and cope with time and cost limitations. Furthermore, further research is required to validate the clinical performance of ML tools and facilitate their integration into routine clinical practice.

[410]  arXiv:2404.19600 (cross-list from physics.flu-dyn) [pdf, other]
Title: Stabilized POD Reduced Order Models for convection-dominated incompressible flows
Subjects: Fluid Dynamics (physics.flu-dyn); Numerical Analysis (math.NA)

We present a comparative computational study of two stabilized Reduced Order Models (ROMs) for the simulation of convection-dominated incompressible flow (Reynolds number of the order of a few thousand). Representative solutions in the parameter space, which includes either time only or time and Reynolds number, are computed with a Finite Volume method and used to generate a reduced basis via Proper Orthogonal Decomposition (POD). Galerkin projection of the Navier-Stokes equations onto the reduced space is used to compute the ROM solution. To ensure computational efficiency, the number of POD modes is truncated and ROM solution accuracy is recovered through two stabilization methods: i) adding a global constant artificial viscosity to the reduced dimensional model, and ii) adding a different value of artificial viscosity for the different POD modes. We test the stabilized ROMs for fluid flow in an idealized medical device consisting of a conical convergent, a narrow throat, and a sudden expansion. Both stabilization methods significantly improve the ROM solution accuracy over a standard (non-stabilized) POD-Galerkin model.

[411]  arXiv:2404.19602 (cross-list from physics.comp-ph) [pdf, other]
Title: Uncertainty quantification for charge transport in GNRs through particle Galerkin methods for the semiclassical Boltzmann equation
Comments: 26 pages, 6 Figures, 4 Tables
Subjects: Computational Physics (physics.comp-ph); Numerical Analysis (math.NA); Applied Physics (physics.app-ph)

In this article, we investigate some issues related to the quantification of uncertainties associated with the electrical properties of graphene nanoribbons. The approach is suited to understand the effects of missing information linked to the difficulty of fixing some material parameters, such as the band gap, and the strength of the applied electric field. In particular, we focus on the extension of particle Galerkin methods for kinetic equations in the case of the semiclassical Boltzmann equation for charge transport in graphene nanoribbons with uncertainties. To this end, we develop an efficient particle scheme which allows us to parallelize the computation and then, after a suitable generalization of the scheme to the case of random inputs, we present a Galerkin reformulation of the particle dynamics, obtained by means of a generalized polynomial chaos approach, which allows the reconstruction of the kinetic distribution. As a consequence, the proposed particle-based scheme preserves the physical properties and the positivity of the distribution function even in the presence of complex scattering in the transport equation of electrons. The impact of the uncertainty of the band gap and applied field on the electrical current is analyzed.

[412]  arXiv:2404.19604 (cross-list from eess.IV) [pdf, other]
Title: X-Diffusion: Generating Detailed 3D MRI Volumes From a Single Image Using Cross-Sectional Diffusion Models
Comments: preprint, project website: this https URL
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

In this work, we present X-Diffusion, a cross-sectional diffusion model tailored for Magnetic Resonance Imaging (MRI) data. X-Diffusion is capable of generating the entire MRI volume from just a single MRI slice or optionally from a few slices, setting new benchmarks in the precision of synthesized MRIs from extremely sparse observations. The uniqueness lies in the novel view-conditional training and inference of X-Diffusion on MRI volumes, allowing for generalized MRI learning. Our evaluations span both brain tumour MRIs from the BRATS dataset and full-body MRIs from the UK Biobank dataset. Utilizing the paired pre-registered Dual-energy X-ray Absorptiometry (DXA) and MRI modalities in the UK Biobank dataset, X-Diffusion is able to generate a detailed 3D MRI volume from a single full-body DXA. Remarkably, the resultant MRIs not only stand out in precision on unseen examples (surpassing state-of-the-art results by large margins) but also flawlessly retain essential features of the original MRI, including tumour profiles, spine curvature, brain volume, and beyond. Furthermore, the X-Diffusion model trained on the MRI datasets attains an out-of-domain generalization capacity (e.g. generating knee MRIs even though it is trained on brains). The code is available on the project website https://emmanuelleb985.github.io/XDiffusion/ .

[413]  arXiv:2404.19611 (cross-list from eess.SP) [pdf, other]
Title: Radio Resource Management Design for RSMA: Optimization of Beamforming, User Admission, and Discrete/Continuous Rates with Imperfect SIC
Subjects: Signal Processing (eess.SP); Emerging Technologies (cs.ET); Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)

This paper investigates the radio resource management (RRM) design for multiuser rate-splitting multiple access (RSMA), accounting for various characteristics of practical wireless systems, such as the use of discrete rates, the inability to serve all users, and the imperfect successive interference cancellation (SIC). Specifically, failure to consider these characteristics in RRM design may lead to inefficient use of radio resources. Therefore, we formulate the RRM of RSMA as optimization problems to maximize respectively the weighted sum rate (WSR) and weighted energy efficiency (WEE), and jointly optimize the beamforming, user admission, and discrete/continuous rates, accounting for imperfect SIC, which results in nonconvex mixed-integer nonlinear programs that are challenging to solve. Despite the difficulty of the optimization problems, we develop algorithms that can find high-quality solutions. We show via simulations that carefully accounting for the aforementioned characteristics can lead to significant gains. Precisely, by considering that transmission rates are discrete, the transmit power can be utilized more intelligently, allocating just enough power to guarantee a given discrete rate. Additionally, we reveal that user admission plays a crucial role in RSMA, enabling additional gains compared to random admission by facilitating the servicing of selected users with mutually beneficial channel characteristics. Furthermore, provisioning for possibly imperfect SIC makes RSMA more robust and reliable.

[414]  arXiv:2404.19621 (cross-list from math.CO) [pdf, other]
Title: Fibonacci and Lucas Sequences in Aperiodic Monotile Supertiles
Authors: Shiying Dong
Comments: 10 pages, 21 figures
Subjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM)

This paper first discusses the size and orientation of hat supertiles. Fibonacci and Lucas sequences, as well as a third integer sequence linearly related to the Lucas sequence, are involved. The result is then generalized to any aperiodic tile in the hat family.
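For readers unfamiliar with the two sequences named above, the following sketch generates them from their standard recurrences and checks the well-known identity $L_n = F_{n-1} + F_{n+1}$. It uses only the textbook definitions ($F_0 = 0$, $F_1 = 1$, $L_0 = 2$, $L_1 = 1$). How these sequences enter the supertile sizes, and what the third sequence is, are not stated in the abstract and are not modelled here. The function names are hypothetical.

```python
from typing import List


def linear_recurrence(first: int, second: int, count: int) -> List[int]:
    """Return the first `count` terms of a_n = a_{n-1} + a_{n-2}
    with a_0 = first and a_1 = second."""
    terms: List[int] = []
    a, b = first, second
    for _ in range(count):
        terms.append(a)
        a, b = b, a + b
    return terms


def fibonacci(count: int) -> List[int]:
    """Fibonacci numbers F_0, F_1, ... with F_0 = 0 and F_1 = 1."""
    return linear_recurrence(0, 1, count)


def lucas(count: int) -> List[int]:
    """Lucas numbers L_0, L_1, ... with L_0 = 2 and L_1 = 1."""
    return linear_recurrence(2, 1, count)


fib = fibonacci(12)
luc = lucas(12)

# Standard identity linking the two sequences: L_n = F_{n-1} + F_{n+1} for n >= 1.
identity_holds = all(luc[n] == fib[n - 1] + fib[n + 1] for n in range(1, 11))
```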

[415]  arXiv:2404.19645 (cross-list from math.CA) [pdf, ps, other]
Title: Best polynomial approximation for non-autonomous linear ODEs in the $\star$-product framework
Authors: Stefano Pozza
Subjects: Classical Analysis and ODEs (math.CA); Numerical Analysis (math.NA)

We present the first formulation of the optimal polynomial approximation of the solution of linear non-autonomous systems of ODEs in the framework of the so-called $\star$-product. This product is the basis of new approaches for the solution of such ODEs, both in the analytical and the numerical sense. The paper shows how to formally state the problem and derives upper bounds for its error.

[416]  arXiv:2404.19665 (cross-list from physics.med-ph) [pdf, other]
Title: ATOMMIC: An Advanced Toolbox for Multitask Medical Imaging Consistency to facilitate Artificial Intelligence applications from acquisition to analysis in Magnetic Resonance Imaging
Subjects: Medical Physics (physics.med-ph); Artificial Intelligence (cs.AI); Mathematical Software (cs.MS); Software Engineering (cs.SE); Mathematical Physics (math-ph)

AI is revolutionizing MRI along the acquisition and processing chain. Advanced AI frameworks have been developed to apply AI in various successive tasks, such as image reconstruction, quantitative parameter map estimation, and image segmentation. Existing frameworks are often designed to perform tasks independently or are focused on specific models or datasets, limiting generalization. We introduce ATOMMIC, an open-source toolbox that streamlines AI applications for accelerated MRI reconstruction and analysis. ATOMMIC implements several tasks using DL networks and enables MultiTask Learning (MTL) to perform related tasks in an integrated manner, targeting generalization in the MRI domain. We first review the current state of AI frameworks for MRI through a comprehensive literature search and by parsing 12,479 GitHub repositories. We benchmark 25 DL models on eight publicly available datasets to present distinct applications of ATOMMIC on accelerated MRI reconstruction, image segmentation, quantitative parameter map estimation, and joint accelerated MRI reconstruction and image segmentation utilizing MTL. Our findings demonstrate that ATOMMIC is the only MTL framework with harmonized complex-valued and real-valued data support. Evaluations on single tasks show that physics-based models, which enforce data consistency by leveraging the physical properties of MRI, outperform other models in reconstructing highly accelerated acquisitions. Physics-based models that produce high reconstruction quality can accurately estimate quantitative parameter maps. When high-performing reconstruction models are combined with robust segmentation networks utilizing MTL, performance is improved in both tasks. ATOMMIC facilitates MRI reconstruction and analysis by standardizing workflows, enhancing data interoperability, integrating unique features like MTL, and effectively benchmarking DL models.

[417]  arXiv:2404.19689 (cross-list from math.AP) [pdf, ps, other]
Title: Continuum limit of $p$-biharmonic equations on graphs
Comments: 20 pages
Subjects: Analysis of PDEs (math.AP); Machine Learning (cs.LG); Numerical Analysis (math.NA)

This paper studies the $p$-biharmonic equation on graphs, which arises in point cloud processing and can be interpreted as a natural extension of the graph $p$-Laplacian from the perspective of hypergraph. The asymptotic behavior of the solution is investigated when the random geometric graph is considered and the number of data points goes to infinity. We show that the continuum limit is an appropriately weighted $p$-biharmonic equation with homogeneous Neumann boundary conditions. The result relies on the uniform $L^p$ estimates for solutions and gradients of nonlocal and graph Poisson equations. The $L^\infty$ estimates of solutions are also obtained as a byproduct.

[418]  arXiv:2404.19723 (cross-list from eess.AS) [pdf, other]
Title: Attention-Constrained Inference for Robust Decoder-Only Text-to-Speech
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

Recent popular decoder-only text-to-speech models are known for their ability to generate natural-sounding speech. However, such models sometimes suffer from word skipping and repeating due to the lack of explicit monotonic alignment constraints. In this paper, we notice from the attention maps that some particular attention heads of the decoder-only model indicate the alignments between speech and text. We call the attention maps of those heads Alignment-Emerged Attention Maps (AEAMs). Based on this discovery, we propose a novel inference method without altering the training process, named Attention-Constrained Inference (ACI), to facilitate monotonic synthesis. It first identifies AEAMs using the Attention Sweeping algorithm and then applies constraining masks on AEAMs. Our experimental results on the decoder-only TTS model VALL-E show that ACI reduces the WER of synthesized speech by up to 20.5% (relative), while the naturalness and speaker similarity remain comparable.

[419]  arXiv:2404.19739 (cross-list from q-bio.BM) [pdf, other]
Title: Mixed Continuous and Categorical Flow Matching for 3D De Novo Molecule Generation
Subjects: Biomolecules (q-bio.BM); Machine Learning (cs.LG)

Deep generative models that produce novel molecular structures have the potential to facilitate chemical discovery. Diffusion models currently achieve state-of-the-art performance for 3D molecule generation. In this work, we explore the use of flow matching, a recently proposed generative modeling framework that generalizes diffusion models, for the task of de novo molecule generation. Flow matching provides flexibility in model design; however, the framework is predicated on the assumption of continuously-valued data. 3D de novo molecule generation requires jointly sampling continuous and categorical variables such as atom position and atom type. We extend the flow matching framework to categorical data by constructing flows that are constrained to exist on a continuous representation of categorical data known as the probability simplex. We call this extension SimplexFlow. We explore the use of SimplexFlow for de novo molecule generation. However, we find that, in practice, a simpler approach that makes no accommodations for the categorical nature of the data yields equivalent or superior performance. As a result of these experiments, we present FlowMol, a flow matching model for 3D de novo molecule generation that achieves improved performance over prior flow matching methods, and we raise important questions about the design of prior distributions for achieving strong performance in flow matching models. Code and trained models for reproducing this work are available at https://github.com/dunni3/FlowMol

[420]  arXiv:2404.19754 (cross-list from quant-ph) [pdf, other]
Title: Succinct arguments for QMA from standard assumptions via compiled nonlocal games
Comments: 57 pages
Subjects: Quantum Physics (quant-ph); Cryptography and Security (cs.CR)

We construct a succinct classical argument system for QMA, the quantum analogue of NP, from generic and standard cryptographic assumptions. Previously, building on the prior work of Mahadev (FOCS '18), Bartusek et al. (CRYPTO '22) also constructed a succinct classical argument system for QMA. However, their construction relied on post-quantumly secure indistinguishability obfuscation, a very strong primitive which is not known from standard cryptographic assumptions. In contrast, the primitives we use (namely, collapsing hash functions and a mild version of quantum homomorphic encryption) are much weaker and are implied by standard assumptions such as LWE. Our protocol is constructed using a general transformation which was designed by Kalai et al. (STOC '23) as a candidate method to compile any quantum nonlocal game into an argument system. Our main technical contribution is to analyze the soundness of this transformation when it is applied to a succinct self-test for Pauli measurements on maximally entangled states, the latter of which is a key component in the proof of MIP*=RE in quantum complexity.

Replacements for Wed, 1 May 24

[421]  arXiv:1203.0550 (replaced) [pdf, other]
Title: Algorithms for Learning Kernels Based on Centered Alignment
Journal-ref: Journal of Machine Learning Research 13 (2012) 795-828
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[422]  arXiv:1511.05240 (replaced) [pdf, ps, other]
Title: An extension of McDiarmid's inequality
Authors: Richard Combes
Comments: 14 pages
Subjects: Machine Learning (cs.LG); Probability (math.PR); Statistics Theory (math.ST)
[423]  arXiv:2007.08784 (replaced) [pdf, other]
Title: Optimal Algorithm for the Planar Two-Center Problem
Comments: To appear in SoCG 2024
Subjects: Computational Geometry (cs.CG)
[424]  arXiv:2102.04394 (replaced) [pdf, other]
Title: Learning with Density Matrices and Random Features
Comments: Final version published in Quantum Mach. Intell. 4, 23 (2022)
Journal-ref: Quantum Mach. Intell. 4, 23 (2022)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Quantum Physics (quant-ph)
[425]  arXiv:2105.02797 (replaced) [pdf, ps, other]
Title: The replica-symmetric free energy for Ising spin glasses with orthogonally invariant couplings
Authors: Zhou Fan, Yihong Wu
Subjects: Probability (math.PR); Disordered Systems and Neural Networks (cond-mat.dis-nn); Information Theory (cs.IT); Statistics Theory (math.ST)
[426]  arXiv:2105.08419 (replaced) [pdf, other]
Title: A sparse ADMM-based solver for linear MPC subject to terminal quadratic constraint
Comments: Accepted version of the article published in IEEE Transactions on Control Systems Technology (8 pages, 5 figures)
Journal-ref: IEEE Transactions on Control Systems Technology, 2024
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
[427]  arXiv:2105.13857 (replaced) [pdf, other]
Title: Learning Approximate and Exact Numeral Systems via Reinforcement Learning
Comments: CogSci 2021. Fixed typos
Journal-ref: Proceedings of the Annual Meeting of the Cognitive Science Society, Volume 43 (2021)
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[428]  arXiv:2106.00599 (replaced) [pdf, other]
Title: ClustML: A Measure of Cluster Pattern Complexity in Scatterplots Learnt from Human-labeled Groupings
Comments: Published in SAGE Information Visualization journal
Journal-ref: Information Visualization Journal 23(2) 105-122 (2024)
Subjects: Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
[429]  arXiv:2110.14865 (replaced) [pdf, other]
Title: Counterbalancing Learning and Strategic Incentives in Allocation Markets
Comments: A preliminary version appeared in the Thirty-fifth Annual Conference on Neural Information Processing Systems (NeurIPS 2021)
Subjects: Computer Science and Game Theory (cs.GT)
[430]  arXiv:2111.03683 (replaced) [pdf, ps, other]
Title: On Homomorphism Graphs
Subjects: Logic (math.LO); Distributed, Parallel, and Cluster Computing (cs.DC); Combinatorics (math.CO)
[431]  arXiv:2202.02002 (replaced) [pdf, other]
Title: Scaling up Multi-domain Semantic Segmentation with Sentence Embeddings
Comments: 14 pages. Accepted to Int. J. Comp. Vis. (IJCV)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[432]  arXiv:2203.01014 (replaced) [pdf, other]
Title: Cumulative Merging Percolation: A long-range percolation process in networks
Comments: 12 pages, 6 figures
Journal-ref: Phys. Rev. E 105, 054310 (2022)
Subjects: Statistical Mechanics (cond-mat.stat-mech); Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph)
[433]  arXiv:2205.08858 (replaced) [pdf, other]
Title: Tight Differential Privacy Guarantees for the Shuffle Model with $k$-Randomized Response
Journal-ref: LNCS 14551 (2024)
Subjects: Cryptography and Security (cs.CR)
[434]  arXiv:2206.08648 (replaced) [pdf, other]
Title: Orthonormal Expansions for Translation-Invariant Kernels
Comments: 23 pages, 8 figures
Subjects: Classical Analysis and ODEs (math.CA); Machine Learning (cs.LG); Numerical Analysis (math.NA); Machine Learning (stat.ML)
[435]  arXiv:2207.10170 (replaced) [pdf, other]
Title: Illusory Attacks: Detectability Matters in Adversarial Attacks on Sequential Decision-Makers
Subjects: Artificial Intelligence (cs.AI)
[436]  arXiv:2208.08690 (replaced) [pdf, other]
Title: Open Information Extraction from 2007 to 2022 -- A Survey
Comments: The first three authors contributed to this work equally. Names are ordered randomly
Subjects: Computation and Language (cs.CL)
[437]  arXiv:2209.05580 (replaced) [pdf, other]
Title: Risk-aware Meta-level Decision Making for Exploration Under Uncertainty
Comments: IEEE International Conference on Control, Decision and Information Technologies
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
[438]  arXiv:2209.13836 (replaced) [pdf, other]
Title: MUTE-Reco: MUTual Information Assisted Ensemble Feature RECOmmender System for Healthcare Prognosis
Subjects: Machine Learning (cs.LG)
[439]  arXiv:2210.13027 (replaced) [pdf, other]
Title: E-Valuating Classifier Two-Sample Tests
Subjects: Methodology (stat.ME); Machine Learning (cs.LG); Machine Learning (stat.ML)
[440]  arXiv:2211.09081 (replaced) [pdf, other]
Title: Secure SWIPT in the Multiuser STAR-RIS Aided MISO Rate Splitting Downlink
Comments: 13 pages, journal paper
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
[441]  arXiv:2211.17157 (replaced) [pdf, other]
Title: Swarm-Based Gradient Descent Method for Non-Convex Optimization
Subjects: Numerical Analysis (math.NA); Optimization and Control (math.OC)
[442]  arXiv:2212.01554 (replaced) [pdf, other]
Title: Distributionally Robust Lyapunov Function Search Under Uncertainty
Comments: 5th Annual Learning for Dynamics & Control Conference
Subjects: Optimization and Control (math.OC); Robotics (cs.RO)
[443]  arXiv:2212.06278 (replaced) [pdf, other]
Title: Efficient Bayesian Uncertainty Estimation for nnU-Net
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[444]  arXiv:2301.02368 (replaced) [pdf, other]
Title: Emergence of simple and complex contagion dynamics from weighted belief networks
Journal-ref: Science Advances 10, eadh4439 (2024)
Subjects: Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph)
[445]  arXiv:2302.00151 (replaced) [pdf, other]
Title: Formalising and Computing the Fourth Homotopy Group of the $3$-Sphere in Cubical Agda
Subjects: Algebraic Topology (math.AT); Logic in Computer Science (cs.LO)
[446]  arXiv:2302.08212 (replaced) [pdf, other]
Title: Visible-Infrared Person Re-Identification via Patch-Mixed Cross-Modality Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[447]  arXiv:2302.12172 (replaced) [pdf, other]
Title: Vision-Language Generative Model for View-Specific Chest X-ray Generation
Comments: Accepted at CHIL 2024
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[448]  arXiv:2302.12872 (replaced) [pdf, other]
Title: Comparisons of two-stage models for flood mitigation of electrical substations
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
[449]  arXiv:2303.00614 (replaced) [pdf, other]
Title: A Hybrid Genetic Algorithm with Type-Aware Chromosomes for Traveling Salesman Problems with Drone
Subjects: Neural and Evolutionary Computing (cs.NE)
[450]  arXiv:2303.02243 (replaced) [pdf, other]
Title: Neural Operator Learning for Long-Time Integration in Dynamical Systems with Recurrent Neural Networks
Comments: 8 pages, 5 figures
Subjects: Machine Learning (cs.LG)
[451]  arXiv:2303.02339 (replaced) [pdf, other]
Title: A Nyström Method for Scattering by a Two-layered Medium with a Rough Boundary
Comments: 58 pages with 6 figures
Subjects: Numerical Analysis (math.NA); Analysis of PDEs (math.AP)
[452]  arXiv:2303.07230 (replaced) [pdf, other]
Title: Systematic Evaluation of Deep Learning Models for Failure Prediction
Subjects: Software Engineering (cs.SE)
[453]  arXiv:2303.10802 (replaced) [pdf, other]
Title: PASS: Peer-Agreement based Sample Selection for training with Noisy Labels
Comments: In Submission
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[454]  arXiv:2303.11097 (replaced) [pdf, ps, other]
Title: Time- versus event-triggered consensus of a single-integrator multi-agent system
Comments: 19 pages, 3 figures. This article builds upon our preceding work: this https URL
Journal-ref: Nonlinear Analysis: Hybrid Systems, 53, 101494 (2024)
Subjects: Systems and Control (eess.SY)
[455]  arXiv:2303.14541 (replaced) [pdf, other]
Title: UnScene3D: Unsupervised 3D Instance Segmentation for Indoor Scenes
Comments: Project page: this https URL, paper updated according to CVPR24 camera ready version
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[456]  arXiv:2304.04076 (replaced) [pdf, other]
Title: On the Optimality of Procrastination Policy for EV charging under Net Energy Metering
Subjects: Systems and Control (eess.SY)
[457]  arXiv:2304.04664 (replaced) [pdf, other]
Title: Inductive biases in deep learning models for weather prediction
Authors: Jannik Thuemmel (1), Matthias Karlbauer (1), Sebastian Otte (1), Christiane Zarfl (1), Georg Martius (2), Nicole Ludwig (1), Thomas Scholten (1), Ulrich Friedrich (3), Volker Wulfmeyer (4), Bedartha Goswami (1), Martin V. Butz (1) ((1) University of Tübingen, (2) Max Planck Institute for Intelligent Systems, (3) Deutscher Wetterdienst, (4) University of Hohenheim)
Subjects: Atmospheric and Oceanic Physics (physics.ao-ph); Machine Learning (cs.LG)
[458]  arXiv:2304.06145 (replaced) [pdf, ps, other]
Title: R-Shiny Applications for Local Clustering to be Included in the growclusters for R Package
Comments: 17 pages, 10 figures, paper presented at 2023 Joint Statistical Meetings
Subjects: Mathematical Software (cs.MS); Machine Learning (cs.LG)
[459]  arXiv:2304.07769 (replaced) [pdf, other]
Title: Spot The Odd One Out: Regularized Complete Cycle Consistent Anomaly Detector GAN
Subjects: Machine Learning (cs.LG)
[460]  arXiv:2304.08235 (replaced) [pdf, other]
Title: A Platform-Agnostic Deep Reinforcement Learning Framework for Effective Sim2Real Transfer in Autonomous Driving
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)
[461]  arXiv:2305.01658 (replaced) [pdf, other]
Title: A Non-autoregressive Multi-Horizon Flight Trajectory Prediction Framework with Gray Code Representation
Comments: An extended version based on the AAAI version
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[462]  arXiv:2305.02145 (replaced) [pdf, other]
Title: ProgDTD: Progressive Learned Image Compression with Double-Tail-Drop Training
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[463]  arXiv:2305.06221 (replaced) [pdf, other]
Title: Multi-Prompt with Depth Partitioned Cross-Modal Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[464]  arXiv:2305.12106 (replaced) [pdf, ps, other]
Title: Human-annotated label noise and their impact on ConvNets for remote sensing image scene classification
Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[465]  arXiv:2305.18204 (replaced) [pdf, other]
Title: Kernel Density Matrices for Probabilistic Deep Learning
Subjects: Machine Learning (cs.LG); Quantum Physics (quant-ph); Machine Learning (stat.ML)
[466]  arXiv:2305.18723 (replaced) [pdf, other]
Title: Towards Accurate Post-training Quantization for Diffusion Models
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[467]  arXiv:2305.19486 (replaced) [pdf, other]
Title: Instance-dependent Noisy-label Learning with Graphical Model Based Noise-rate Estimation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[468]  arXiv:2306.02149 (replaced) [pdf, other]
Title: A General Framework for Interpretable Neural Learning based on Local Information-Theoretic Goal Functions
Comments: 26 pages, 12 figures
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
[469]  arXiv:2306.06327 (replaced) [pdf, other]
Title: Any-dimensional equivariant neural networks
Comments: 21 pages, 2 figures
Journal-ref: International Conference on Artificial Intelligence and Statistics. PMLR, 2024. Available from https://proceedings.mlr.press/v238/levin24a.html
Subjects: Machine Learning (cs.LG); Representation Theory (math.RT); Machine Learning (stat.ML)
[470]  arXiv:2306.06945 (replaced) [pdf, other]
Title: Underwater Acoustic Target Recognition based on Smoothness-inducing Regularization and Spectrogram-based Data Augmentation
Journal-ref: Ocean Engineering, 2023, 281: 114926
Subjects: Sound (cs.SD); Machine Learning (cs.LG)
[471]  arXiv:2306.08765 (replaced) [pdf, other]
Title: Causal Discovery from Time Series with Hybrids of Constraint-Based and Noise-Based Algorithms
Comments: Accepted in TMLR: this https URL
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[472]  arXiv:2306.11171 (replaced) [pdf, other]
Title: Sim-to-real transfer of active suspension control using deep reinforcement learning
Comments: 15 pages, 18 figures
Subjects: Robotics (cs.RO)
[473]  arXiv:2306.13105 (replaced) [pdf, other]
Title: Multi-task Learning for Radar Signal Characterisation
Comments: 5 pages, 3 figures
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)
[474]  arXiv:2306.13890 (replaced) [pdf, ps, other]
Title: Virtual element methods for Biot-Kirchhoff poroelasticity
Subjects: Numerical Analysis (math.NA)
[475]  arXiv:2306.15867 (replaced) [pdf, other]
Title: Convergence analysis of a weak Galerkin finite element method on a Shishkin mesh for a singularly perturbed fourth-order problem in 2D
Subjects: Numerical Analysis (math.NA)
[476]  arXiv:2306.16132 (replaced) [pdf, other]
Title: Fast and Accurate Unknown Object Instance Segmentation through Error-Informed Refinement
Comments: 8 pages, 5 figures, project website: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[477]  arXiv:2307.05590 (replaced) [pdf, other]
Title: Efficient Computation of Magnetic Polarizability Tensor Spectral Signatures for Object Characterisation in Metal Detection
Comments: 25 pages, 18 figures. Updated to include new results and demonstrations of computational performance
Subjects: Numerical Analysis (math.NA)
[478]  arXiv:2307.08149 (replaced) [pdf, other]
Title: Problems in NP can Admit Double-Exponential Lower Bounds when Parameterized by Treewidth or Vertex Cover
Comments: Shortened abstract to meet arXiv requirements. This version contains a subset of the results mentioned in the first version and will be presented at ICALP 2024
Subjects: Computational Complexity (cs.CC); Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS)
[479]  arXiv:2307.10690 (replaced) [pdf, other]
Title: Bridging Intelligence and Instinct: A New Control Paradigm for Autonomous Robots
Subjects: Robotics (cs.RO)
[480]  arXiv:2307.16670 (replaced) [pdf, other]
Title: Conditioning Generative Latent Optimization for Sparse-View CT Image Reconstruction
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[481]  arXiv:2308.00957 (replaced) [pdf, other]
Title: Causal Inference with Differentially Private (Clustered) Outcomes
Comments: 41 pages, 10 figures
Subjects: Machine Learning (stat.ML); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Methodology (stat.ME)
[482]  arXiv:2308.02887 (replaced) [pdf, other]
Title: The Impact of Group Membership Bias on the Quality and Fairness of Exposure in Ranking
Subjects: Information Retrieval (cs.IR)
[483]  arXiv:2308.14250 (replaced) [pdf, other]
Title: Rule-Based Error Detection and Correction to Operationalize Movement Trajectory Classification
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO)
[484]  arXiv:2309.02184 (replaced) [pdf, other]
Title: Integral equation methods for acoustic scattering by fractals
Subjects: Numerical Analysis (math.NA)
[485]  arXiv:2309.08313 (replaced) [pdf, other]
Title: Conditional validity of heteroskedastic conformal regression
Comments: 36 pages
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[486]  arXiv:2309.10924 (replaced) [pdf, other]
Title: Change of Scenery: Unsupervised LiDAR Change Detection for Mobile Robots
Comments: 8 pages (7 content, 1 references). 7 figures, Presented at 2024 Conference on Robots and Vision (CRV)
Subjects: Robotics (cs.RO)
[487]  arXiv:2309.11924 (replaced) [pdf, other]
Title: Generic Selfish Mining MDP for DAG Protocols
Authors: Patrik Keller
Comments: v2: update authors, fix my implementation of Roi's model
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)
[488]  arXiv:2309.14788 (replaced) [pdf, other]
Title: Small-Space Algorithms for the Online Language Distance Problem for Palindromes and Squares
Comments: Accepted to ISAAC'23
Subjects: Data Structures and Algorithms (cs.DS)
[489]  arXiv:2309.15432 (replaced) [pdf, other]
Title: ComPile: A Large IR Dataset from Production Sources
Subjects: Programming Languages (cs.PL)
[490]  arXiv:2309.15742 (replaced) [pdf, other]
Title: T5APR: Empowering Automated Program Repair across Languages through Checkpoint Ensemble
Comments: Accepted to the Journal of Systems and Software
Subjects: Software Engineering (cs.SE)
[491]  arXiv:2309.17288 (replaced) [pdf, other]
Title: AutoAgents: A Framework for Automatic Agent Generation
Comments: IJCAI 2024
Subjects: Artificial Intelligence (cs.AI)
[492]  arXiv:2310.00001 (replaced) [pdf, other]
Title: AsaPy: A Python Library for Aerospace Simulation Analysis
Subjects: Mathematical Software (cs.MS)
[493]  arXiv:2310.00273 (replaced) [pdf, other]
Title: Safe Stabilizing Control for Polygonal Robots in Dynamic Elliptical Environments
Comments: 2024 American Control Conference
Subjects: Robotics (cs.RO); Optimization and Control (math.OC)
[494]  arXiv:2310.02556 (replaced) [pdf, other]
Title: NOLA: Compressing LoRA using Linear Combination of Random Basis
Comments: ICLR 2024. Our code is available here: this https URL
Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
[495]  arXiv:2310.03718 (replaced) [pdf, other]
Title: Constraint-Conditioned Policy Optimization for Versatile Safe Reinforcement Learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[496]  arXiv:2310.05058 (replaced) [pdf, other]
Title: Learning Separable Hidden Unit Contributions for Speaker-Adaptive Lip-Reading
Comments: Accepted to BMVC 2023, 20 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[497]  arXiv:2310.07027 (replaced) [pdf, other]
Title: Utilizing Synthetic Data for Medical Vision-Language Pre-training: Bypassing the Need for Real Images
Comments: Accepted by CVPR 2024 Workshop Data Curation and Augmentation in Enhancing Medical Imaging Applications
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[498]  arXiv:2310.07240 (replaced) [pdf, other]
Title: CacheGen: KV Cache Compression and Streaming for Fast Language Model Serving
Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG)
[499]  arXiv:2310.07355 (replaced) [pdf, other]
Title: IMITATE: Clinical Prior Guided Hierarchical Vision-Language Pre-training
Comments: Under Review
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[500]  arXiv:2310.11865 (replaced) [pdf, other]
Title: Effective and Efficient Federated Tree Learning on Hybrid Data
Subjects: Machine Learning (cs.LG)
[501]  arXiv:2310.13462 (replaced) [pdf, other]
Title: Computing the matrix exponential and the Cholesky factor of a related finite horizon Gramian
Subjects: Numerical Analysis (math.NA); Optimization and Control (math.OC)
[502]  arXiv:2310.13639 (replaced) [pdf, other]
Title: Contrastive Preference Learning: Learning from Human Feedback without RL
Comments: ICLR 2024. Code released at this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[503]  arXiv:2310.14821 (replaced) [pdf, other]
Title: Mysticeti: Reaching the Limits of Latency with Uncertified DAGs
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Cryptography and Security (cs.CR)
[504]  arXiv:2310.16452 (replaced) [pdf, other]
Title: Faithful Path Language Modeling for Explainable Recommendation over Knowledge Graph
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[505]  arXiv:2310.19548 (replaced) [pdf, other]
Title: Approximation Theory, Computing, and Deep Learning on the Wasserstein Space
Comments: Typos fixed
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Functional Analysis (math.FA)
[506]  arXiv:2310.20636 (replaced) [pdf, other]
Title: Using Skew to Assess the Quality of GAN-generated Image Features
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[507]  arXiv:2311.01025 (replaced) [pdf, other]
Title: Integrating Language-Derived Appearance Elements with Visual Cues in Pedestrian Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[508]  arXiv:2311.04551 (replaced) [pdf, other]
Title: Earth Observation based multi-scale analysis of crop diversity in the European Union: first insights for agro-environmental policies
Subjects: Computational Engineering, Finance, and Science (cs.CE)
[509]  arXiv:2311.05524 (replaced) [src]
Title: SeaTurtleID2022: A long-span dataset for reliable sea turtle re-identification
Comments: This version is essentially an updated version of the initial SeaTurtleID paper (arXiv:2211.10307) and from now on it can be found as a replacement of the latter paper. You can also find the published version here: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[510]  arXiv:2311.07338 (replaced) [pdf, other]
Title: A mathematical model of the visual MacKay effect
Subjects: Optimization and Control (math.OC); Analysis of PDEs (math.AP); Numerical Analysis (math.NA); Neurons and Cognition (q-bio.NC)
[511]  arXiv:2311.07614 (replaced) [src]
Title: Application of a Dense Fusion Attention Network in Fault Diagnosis of Centrifugal Fan
Comments: The research direction between authors needs to be adjusted, so the preprint is requested to be revoked
Subjects: Machine Learning (cs.LG)
[512]  arXiv:2311.07978 (replaced) [pdf, other]
Title: How good are Large Language Models on African Languages?
Comments: Under review
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[513]  arXiv:2311.09175 (replaced) [pdf, other]
Title: Can Query Expansion Improve Generalization of Strong Cross-Encoder Rankers?
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
[514]  arXiv:2311.10292 (replaced) [pdf, other]
Title: Realization of a programmable multi-purpose photonic quantum memory with over-thousand qubit manipulations
Comments: 17 pages, 19 figures
Journal-ref: Phys. Rev. X 14, 021018 (2024)
Subjects: Quantum Physics (quant-ph); Emerging Technologies (cs.ET); Optics (physics.optics)
[515]  arXiv:2311.11645 (replaced) [pdf, other]
Title: Exploding AI Power Use: an Opportunity to Rethink Grid Planning and Management
Comments: Accepted by ACM e-Energy '24: the 15th ACM International Conference on Future and Sustainable Energy Systems
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Systems and Control (eess.SY)
[516]  arXiv:2311.11789 (replaced) [pdf, other]
Title: Approximate Linear Programming for Decentralized Policy Iteration in Cooperative Multi-agent Markov Decision Processes
Subjects: Machine Learning (cs.LG); Multiagent Systems (cs.MA); Optimization and Control (math.OC)
[517]  arXiv:2311.12410 (replaced) [pdf, other]
Title: nach0: Multimodal Natural and Chemical Languages Foundation Model
Comments: Accepted to Chemical Science Journal
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)
[518]  arXiv:2311.16300 (replaced) [pdf, ps, other]
Title: Towards Energysheds: A Technical Definition and Cooperative Framework for Future Power System Operations
Subjects: Systems and Control (eess.SY)
[519]  arXiv:2312.01047 (replaced) [pdf, other]
Title: A New Random Reshuffling Method for Nonsmooth Nonconvex Finite-sum Optimization
Comments: 43 pages, 4 figures
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG)
[520]  arXiv:2312.03001 (replaced) [pdf, ps, other]
Title: Computer Vision for Increased Operative Efficiency via Identification of Instruments in the Neurosurgical Operating Room: A Proof-of-Concept Study
Authors: Tanner J. Zachem (1,2), Sully F. Chen (1), Vishal Venkatraman (1), David AW Sykes (1), Ravi Prakash (2), Koumani W. Ntowe (1), Mikhail A. Bethell (1), Samantha Spellicy (1), Alexander D Suarez (1), Weston Ross (1), Patrick J. Codd (1,2) ((1) Department of Neurosurgery, Duke University School of Medicine, Durham, NC, USA, (2) Department of Mechanical Engineering and Materials Science, Duke University, Durham, NC, USA)
Comments: Data is openly available through The Open Science Framework: this https URL
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[521]  arXiv:2312.03273 (replaced) [pdf, other]
Title: Perfectly matched layers for the Boltzmann equation: stability and sensitivity analysis
Comments: 29 pages, 3 figures, 12 tables
Subjects: Numerical Analysis (math.NA); Computational Physics (physics.comp-ph)
[522]  arXiv:2312.03673 (replaced) [pdf, other]
Title: On the Role of the Action Space in Robot Manipulation Learning and Sim-to-Real Transfer
Subjects: Robotics (cs.RO); Machine Learning (cs.LG)
[523]  arXiv:2312.04191 (replaced) [pdf, ps, other]
Title: Subsets of groups with context-free preimages
Authors: Alex Levine
Comments: 23 pages, 3 figures. Various corrections and removal of (former) Section 3
Subjects: Group Theory (math.GR); Formal Languages and Automata Theory (cs.FL)
[524]  arXiv:2312.05706 (replaced) [pdf, other]
Title: Bit Blasting Probabilistic Programs
Subjects: Programming Languages (cs.PL)
[525]  arXiv:2312.07865 (replaced) [pdf, other]
Title: SimAC: A Simple Anti-Customization Method for Protecting Face Privacy against Text-to-Image Synthesis of Diffusion Models
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[526]  arXiv:2312.08528 (replaced) [pdf, other]
Title: auto-sktime: Automated Time Series Forecasting
Comments: Accepted at LION18
Subjects: Machine Learning (cs.LG)
[527]  arXiv:2312.09956 (replaced) [pdf, other]
Title: An artificial neural network approach to finding the key length of the Vigenère cipher
Subjects: Cryptography and Security (cs.CR)
[528]  arXiv:2312.10076 (replaced) [pdf, ps, other]
Title: A Framework for Exploring the Consequences of AI-Mediated Enterprise Knowledge Access and Identifying Risks to Workers
Comments: 19 pages, 1 table
Subjects: Computers and Society (cs.CY)
[529]  arXiv:2312.11793 (replaced) [pdf, other]
Title: An Effective Image Copy-Move Forgery Detection Using Entropy Information
Authors: Li Jiang, Zhaowei Lu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY); Multimedia (cs.MM)
[530]  arXiv:2312.12135 (replaced) [src]
Title: Object Detection for Automated Coronary Artery Using Deep Learning
Comments: The results in the article need fundamental corrections
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[531]  arXiv:2312.12255 (replaced) [pdf, other]
Title: A Dual Curriculum Learning Framework for Multi-UAV Pursuit-Evasion in Diverse Environments
Subjects: Machine Learning (cs.LG); Multiagent Systems (cs.MA)
[532]  arXiv:2312.12651 (replaced) [pdf, other]
Title: Toxic Bias: Perspective API Misreads German as More Toxic
Comments: 12 pages, 12 figures
Subjects: Social and Information Networks (cs.SI)
[533]  arXiv:2312.13912 (replaced) [pdf, other]
Title: Solving Long-run Average Reward Robust MDPs via Stochastic Games
Subjects: Artificial Intelligence (cs.AI)
[534]  arXiv:2312.14925 (replaced) [pdf, ps, other]
Title: A Survey of Reinforcement Learning from Human Feedback
Subjects: Machine Learning (cs.LG)
[535]  arXiv:2312.16043 (replaced) [pdf, other]
Title: An extended asymmetric sigmoid with Perceptron (SIGTRON) for imbalanced linear classification
Authors: Hyenkyun Woo
Comments: 26 pages, 9 figures, revised version
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
[536]  arXiv:2312.16729 (replaced) [pdf, ps, other]
Title: Behavioural pseudometrics for continuous-time diffusions
Subjects: Logic in Computer Science (cs.LO); Formal Languages and Automata Theory (cs.FL); Probability (math.PR)
[537]  arXiv:2401.05609 (replaced) [pdf, ps, other]
Title: A cable finite element formulation based on exact tension field for static nonlinear analysis of cable structures
Subjects: Computational Engineering, Finance, and Science (cs.CE)
[538]  arXiv:2401.07490 (replaced) [pdf, ps, other]
Title: Existence of MMS Allocations with Mixed Manna
Authors: Kevin Hsu
Comments: 11 pages. A mistake in the previous version has been fixed, and a new reference has been added
Subjects: Computer Science and Game Theory (cs.GT)
[539]  arXiv:2401.09332 (replaced) [pdf, other]
Title: Synergistic Reinforcement and Imitation Learning for Vision-driven Autonomous Flight of UAV Along River
Comments: Submitted to IROS2024
Subjects: Robotics (cs.RO)
[540]  arXiv:2401.10990 (replaced) [pdf, other]
Title: A Nonlinear Observer Design for the Discrete-time Systems: Exploiting Matrix-Multiplier-based LMI Approach
Authors: Shivaraj Mohite
Subjects: Systems and Control (eess.SY)
[541]  arXiv:2401.11325 (replaced) [pdf, other]
Title: Detecting Hidden Triggers: Mapping Non-Markov Reward Functions to Markov
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[542]  arXiv:2401.11567 (replaced) [pdf, other]
Title: Deterministic Multi-stage Constellation Reconfiguration Using Integer Linear Programming and Sequential Decision-Making Methods
Comments: 39 pages, 13 figures, submitted to the Journal of Spacecraft and Rockets
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
[543]  arXiv:2401.11647 (replaced) [pdf, other]
Title: LW-FedSSL: Resource-efficient Layer-wise Federated Self-supervised Learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[544]  arXiv:2401.11824 (replaced) [pdf, other]
Title: Rethinking Centered Kernel Alignment in Knowledge Distillation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[545]  arXiv:2401.13555 (replaced) [pdf, other]
Title: Benchmarking the Fairness of Image Upsampling Methods
Comments: This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published at the 2024 ACM Conference on Fairness, Accountability, and Transparency (FAccT '24)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[546]  arXiv:2401.13839 (replaced) [pdf, other]
Title: Edge-coloring sparse graphs with $Δ$ colors in quasilinear time
Authors: Lukasz Kowalik
Comments: This version differs significantly from the previous one. Here, a partitioning technique of Zhou et al. has been applied, which simplified the randomized algorithm considerably and allowed for a deterministic algorithm, though with worse dependency on mad(G) than the randomized one
Subjects: Data Structures and Algorithms (cs.DS)
[547]  arXiv:2401.14029 (replaced) [pdf, other]
Title: Towards a Systems Theory of Algorithms
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Systems and Control (eess.SY)
[548]  arXiv:2401.14533 (replaced) [pdf, other]
Title: My Future with My Chatbot: A Scenario-Driven, User-Centric Approach to Anticipating AI Impacts
Subjects: Computers and Society (cs.CY)
[549]  arXiv:2401.14831 (replaced) [pdf, other]
Title: The Machine Vision Iceberg Explained: Advancing Dynamic Testing by Considering Holistic Environmental Relations
Comments: Submitted at IEEE ITSC 2024
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Software Engineering (cs.SE); Image and Video Processing (eess.IV)
[550]  arXiv:2401.15583 (replaced) [pdf, other]
Title: SCTransNet: Spatial-channel Cross Transformer Network for Infrared Small Target Detection
Journal-ref: IEEE Transactions on Geoscience and Remote Sensing, vol. 62, pp. 1-15, 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[551]  arXiv:2401.17159 (replaced) [pdf, other]
Title: Layered and Staged Monte Carlo Tree Search for SMT Strategy Synthesis
Comments: Accepted at IJCAI 2024
Subjects: Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO); Software Engineering (cs.SE)
[552]  arXiv:2402.00160 (replaced) [pdf, other]
Title: Emergency Department Decision Support using Clinical Pseudo-notes
Subjects: Computation and Language (cs.CL)
[553]  arXiv:2402.01116 (replaced) [pdf, other]
Title: Scalable Multi-modal Model Predictive Control via Duality-based Interaction Predictions
Comments: Accepted at IEEE Intelligent Vehicles Symposium 2024
Subjects: Robotics (cs.RO); Machine Learning (cs.LG); Systems and Control (eess.SY)
[554]  arXiv:2402.03165 (replaced) [pdf, other]
Title: Risk-Aware MPC for Stochastic Systems with Runtime Temporal Logics
Comments: 7 pages, 4 figures, 1 table, Accepted for ADHS 2024
Subjects: Systems and Control (eess.SY); Logic in Computer Science (cs.LO)
[555]  arXiv:2402.03305 (replaced) [pdf, other]
Title: Do Diffusion Models Learn Semantically Meaningful and Efficient Representations?
Comments: 13 pages, 9 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[556]  arXiv:2402.04889 (replaced) [pdf, other]
Title: Detecting Generated Native Ads in Conversational Search
Comments: WWW'24 Short Papers Track; 4 pages
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)
[557]  arXiv:2402.05067 (replaced) [pdf, other]
Title: A Novel Paradigm in Solving Multiscale Problems
Subjects: Fluid Dynamics (physics.flu-dyn); Machine Learning (cs.LG); Computational Physics (physics.comp-ph)
[558]  arXiv:2402.06046 (replaced) [pdf, ps, other]
Title: Anatomy of a Robotaxi Crash: Lessons from the Cruise Pedestrian Dragging Mishap
Authors: Philip Koopman
Comments: 15 pages, 2 figures
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
[559]  arXiv:2402.06537 (replaced) [pdf, other]
Title: Feature Density Estimation for Out-of-Distribution Detection via Normalizing Flows
Comments: Accepted to CRV 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[560]  arXiv:2402.06665 (replaced) [pdf, other]
Title: The Essential Role of Causality in Foundation World Models for Embodied AI
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Robotics (cs.RO)
[561]  arXiv:2402.09185 (replaced) [pdf, other]
Title: Flattability of Priority Vector Addition Systems
Comments: 24 pages, 2 figures, full version of paper at ICALP 2024
Subjects: Formal Languages and Automata Theory (cs.FL)
[562]  arXiv:2402.10142 (replaced) [pdf, other]
Title: Tracking Changing Probabilities via Dynamic Learners
Authors: Omid Madani
Comments: 63 pages, 24 figures, 17 tables
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[563]  arXiv:2402.10321 (replaced) [pdf, other]
Title: LaserSAM: Zero-Shot Change Detection Using Visual Segmentation of Spinning LiDAR
Comments: 9 pages (8 content, 1 references). 9 figures, Presented at 2024 Conference on Robots and Vision (CRV)
Subjects: Robotics (cs.RO)
[564]  arXiv:2402.10741 (replaced) [pdf, other]
Title: Identifying heterogeneous micromechanical properties of biological tissues via physics-informed neural networks
Subjects: Numerical Analysis (math.NA); Biological Physics (physics.bio-ph)
[565]  arXiv:2402.11919 (replaced) [pdf, other]
Title: Unraveling Complex Data Diversity in Underwater Acoustic Target Recognition through Convolution-based Mixture of Experts
Journal-ref: Expert Systems with Applications (2024): 123431
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[566]  arXiv:2402.12147 (replaced) [pdf, other]
Title: Surprising Efficacy of Fine-Tuned Transformers for Fact-Checking over Larger Language Models
Authors: Vinay Setty
Comments: Accepted in SIGIR 2024 (industry track)
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[567]  arXiv:2402.12303 (replaced) [pdf, other]
Title: UncertaintyTrack: Exploiting Detection and Localization Uncertainty in Multi-Object Tracking
Comments: Accepted to ICRA 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[568]  arXiv:2402.12365 (replaced) [pdf, other]
Title: Universal Physics Transformers: A Framework For Efficiently Scaling Neural Operators
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Fluid Dynamics (physics.flu-dyn)
[569]  arXiv:2402.12652 (replaced) [pdf, other]
Title: PDEformer: Towards a Foundation Model for One-Dimensional Partial Differential Equations
Subjects: Numerical Analysis (math.NA)
[570]  arXiv:2402.12712 (replaced) [pdf, other]
Title: MVDiffusion++: A Dense High-resolution Multi-view Diffusion Model for Single or Sparse-view 3D Object Reconstruction
Comments: 3D generation, project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[571]  arXiv:2402.13250 (replaced) [pdf, other]
Title: Video ReCap: Recursive Captioning of Hour-Long Videos
Comments: Accepted by CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[572]  arXiv:2402.13523 (replaced) [pdf, other]
Title: Balancing Spectral, Temporal and Spatial Information for EEG-based Alzheimer's Disease Classification
Comments: 4 pages, 3 figures, conference paper
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC)
[573]  arXiv:2402.14846 (replaced) [pdf, other]
Title: Stick to Your Role! Context-dependence and Stability of Personal Value Expression in Large Language Models
Comments: The project website and code are available at this https URL
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[574]  arXiv:2402.15340 (replaced) [pdf, ps, other]
Title: MetaStates: An Approach for Representing Human Workers' Psychophysiological States in the Industrial Metaverse
Comments: 11 pages, 6 figures, 4 tables, journal
Subjects: Human-Computer Interaction (cs.HC); Graphics (cs.GR)
[575]  arXiv:2402.15604 (replaced) [pdf, other]
Title: Goal-Reaching Trajectory Design Near Danger with Piecewise Affine Reach-avoid Computation
Comments: The first two authors contributed equally to the work. This work has been submitted for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
[576]  arXiv:2402.16594 (replaced) [pdf, other]
Title: CURSOR: Scalable Mixed-Order Hypergraph Matching with CUR Decomposition
Comments: Accepted to CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[577]  arXiv:2402.17420 (replaced) [pdf, other]
Title: PANDAS: Prototype-based Novel Class Discovery and Detection
Comments: Accepted to the Conference on Lifelong Learning Agents (CoLLAs 2024)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[578]  arXiv:2402.17516 (replaced) [pdf, other]
Title: QUCE: The Minimisation and Quantification of Path-Based Uncertainty for Generative Counterfactual Explanations
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[579]  arXiv:2402.18002 (replaced) [pdf, other]
Title: Symmetry-aware Reinforcement Learning for Robotic Assembly under Partial Observability with a Soft Wrist
Comments: Accepted at ICRA-2024
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[580]  arXiv:2402.18012 (replaced) [pdf, other]
Title: Diffusion Models as Constrained Samplers for Optimization with Unknown Constraints
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[581]  arXiv:2402.18370 (replaced) [pdf, other]
Title: Adversarial Example Soups: Improving Transferability and Stealthiness for Free
Comments: Under review
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[582]  arXiv:2402.19404 (replaced) [pdf, other]
Title: EAMA: Entity-Aware Multimodal Alignment Based Approach for News Image Captioning
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
[583]  arXiv:2403.00372 (replaced) [pdf, other]
Title: HyperSDFusion: Bridging Hierarchical Structures in Language and Geometry for Enhanced 3D Text2Shape Generation
Journal-ref: IEEE/CVF conference on computer vision and pattern recognition 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[584]  arXiv:2403.00549 (replaced) [pdf, other]
Title: Relaxometry Guided Quantitative Cardiac Magnetic Resonance Image Reconstruction
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[585]  arXiv:2403.00594 (replaced) [pdf, other]
Title: Discrete minimizers of the interaction energy in collective behavior: a brief numerical and analytic review
Subjects: Numerical Analysis (math.NA); Mathematical Physics (math-ph)
[586]  arXiv:2403.01081 (replaced) [pdf, other]
Title: LAB: Large-Scale Alignment for ChatBots
Comments: Corresponding Author: Akash Srivastava. Equal Contribution: Shivchander Sudalairaj, Abhishek Bhandwaldar, Aldo Pareja, Akash Srivastava, Code: this https URL
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[587]  arXiv:2403.03239 (replaced) [pdf, ps, other]
Title: Note: Harnessing Tellurium Nanoparticles in the Digital Realm Plasmon Resonance, in the Context of Brewster's Angle and the Drude Model for Fake News Adsorption in Incomplete Information Games
Authors: Yasuko Kawahata
Comments: Tellurium Nanoparticles, Snell's Law, Soliton Solution, Anamorphic Surfaces, Nonlinear Dynamics, Fake News Adsorption, User Behavior Modeling, Health Improvement Strategies, Plasmonic Sensors. This paper is partially an attempt to utilize "Generative AI" and was written with educational intent. There are currently no plans for it to become a peer-reviewed paper
Subjects: Physics and Society (physics.soc-ph); Artificial Intelligence (cs.AI)
[588]  arXiv:2403.03367 (replaced) [pdf, ps, other]
Title: am-AMM: An Auction-Managed Automated Market Maker
Subjects: Trading and Market Microstructure (q-fin.TR); Computer Science and Game Theory (cs.GT); Optimization and Control (math.OC); Mathematical Finance (q-fin.MF)
[589]  arXiv:2403.04918 (replaced) [pdf, other]
Title: Secure Information Embedding and Extraction in Forensic 3D Fingerprinting
Subjects: Cryptography and Security (cs.CR)
[590]  arXiv:2403.05000 (replaced) [pdf, other]
Title: Medical Speech Symptoms Classification via Disentangled Representation
Comments: Accepted by the 27th International Conference on Computer Supported Cooperative Work in Design (CSCWD 2024)
Subjects: Artificial Intelligence (cs.AI)
[591]  arXiv:2403.05585 (replaced) [pdf, ps, other]
Title: Plasmon Resonance Model: Investigation of Analysis of Fake News Diffusion Model with Third Mover Intervention Using Soliton Solution in Non-Complete Information Game under Repeated Dilemma Condition
Authors: Yasuko Kawahata
Comments: Plasmon Resonance Model, Soliton Solution, Third Mover, Fake News, Non-Complete Information Game, Nonlinear Partial Differential Equations, First Mover, Second Mover, Third Mover, Diffusion Dynamics, Iteration Dilemma
Subjects: Physics and Society (physics.soc-ph); Artificial Intelligence (cs.AI)
[592]  arXiv:2403.05593 (replaced) [pdf, ps, other]
Title: Introducing First-Principles Calculations: New Approach to Group Dynamics and Bridging Social Phenomena in TeNP-Chain Based Social Dynamics Simulations
Authors: Yasuko Kawahata
Comments: TeNP Chains, First-principles calculations, Tellurium nanoparticles (TeNPs), Graphene, Fake news dissemination, Social cohesion, Information Flow Disruption, Quantum Mechanics, Interdisciplinary approach, Misinformation mitigation
Subjects: Physics and Society (physics.soc-ph); Artificial Intelligence (cs.AI); Physics Education (physics.ed-ph)
[593]  arXiv:2403.06119 (replaced) [pdf, other]
Title: CLEAR: Cross-Transformers with Pre-trained Language Model is All you need for Person Attribute Recognition and Retrieval
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[594]  arXiv:2403.08127 (replaced) [pdf, ps, other]
Title: Guidelines for the Creation of Analysis Ready Data
Comments: 49 pages, 3 figures, 3 tables, and 5 appendices
Subjects: Databases (cs.DB); Data Analysis, Statistics and Probability (physics.data-an); Other Statistics (stat.OT)
[595]  arXiv:2403.08828 (replaced) [pdf, other]
Title: People Attribute Purpose to Autonomous Vehicles When Explaining Their Behavior
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Robotics (cs.RO)
[596]  arXiv:2403.10731 (replaced) [pdf, other]
Title: Giving a Hand to Diffusion Models: a Two-Stage Approach to Improving Conditional Human Image Generation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[597]  arXiv:2403.10853 (replaced) [pdf, other]
Title: Just Say the Name: Online Continual Learning with Category Names Only via Data Generation
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[598]  arXiv:2403.11585 (replaced) [pdf, other]
Title: Linguacodus: A Synergistic Framework for Transformative Code Generation in Machine Learning Pipelines
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Programming Languages (cs.PL); Software Engineering (cs.SE)
[599]  arXiv:2403.13547 (replaced) [pdf, other]
Title: Enhancing Traffic Incident Management with Large Language Models: A Hybrid Machine Learning Approach for Severity Classification
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
[600]  arXiv:2403.14385 (replaced) [pdf, other]
Title: Estimating Causal Effects with Double Machine Learning -- A Method Evaluation
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Econometrics (econ.EM); Methodology (stat.ME)
[601]  arXiv:2403.16771 (replaced) [pdf, ps, other]
Title: Synthetic Data Generation and Joint Learning for Robust Code-Mixed Translation
Comments: 9 pages, 2 figures, to be published in LREC-COLING 2024
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[602]  arXiv:2403.17169 (replaced) [pdf, other]
Title: QuanTemp: A real-world open-domain benchmark for fact-checking numerical claims
Comments: 11 pages, 1 figure, Accepted for publication at the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2024)
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[603]  arXiv:2404.01276 (replaced) [pdf, other]
Title: Variable-Length Stop-Feedback Coding for Minimum Age of Incorrect Information
Comments: Minor clarifications
Subjects: Information Theory (cs.IT)
[604]  arXiv:2404.01413 (replaced) [pdf, other]
Title: Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Emerging Technologies (cs.ET); Machine Learning (stat.ML)
[605]  arXiv:2404.02113 (replaced) [pdf, other]
Title: Tuning for the Unknown: Revisiting Evaluation Strategies for Lifelong RL
Subjects: Machine Learning (cs.LG)
[606]  arXiv:2404.03147 (replaced) [pdf, other]
Title: Eigenpruning
Comments: Extended abstract accepted to LatinX at NAACL 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[607]  arXiv:2404.04496 (replaced) [pdf, other]
Title: Towards Better Graph Neural Network-based Fault Localization Through Enhanced Code Representation
Subjects: Software Engineering (cs.SE)
[608]  arXiv:2404.04515 (replaced) [pdf, other]
Title: Predictable Verification using Intrinsic Definitions
Comments: Published at PLDI 2024
Subjects: Programming Languages (cs.PL); Logic in Computer Science (cs.LO)
[609]  arXiv:2404.04814 (replaced) [pdf, other]
Title: Inference-Time Rule Eraser: Distilling and Removing Bias Rules to Mitigate Bias in Deployed Models
Authors: Yi Zhang, Jitao Sang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
[610]  arXiv:2404.05253 (replaced) [pdf, other]
Title: CodeEnhance: A Codebook-Driven Approach for Low-Light Image Enhancement
Comments: 10 pages, 13 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[611]  arXiv:2404.05466 (replaced) [pdf, other]
Title: Enhancing Lip Reading with Multi-Scale Video and Multi-Encoder
Comments: 6 pages, 3 figures, Accepted at ICMEW 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[612]  arXiv:2404.05576 (replaced) [pdf, other]
Title: Dynamic Backtracking in GFlowNets: Enhancing Decision Steps with Reward-Dependent Adjustment Mechanisms
Subjects: Machine Learning (cs.LG)
[613]  arXiv:2404.06362 (replaced) [pdf, other]
Title: Test-Time Adaptation with SaLIP: A Cascade of SAM and CLIP for Zero shot Medical Image Segmentation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[614]  arXiv:2404.06741 (replaced) [pdf, other]
Title: An Animation-based Augmentation Approach for Action Recognition from Discontinuous Video
Comments: arXiv admin note: text overlap with arXiv:2401.13414
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[615]  arXiv:2404.07017 (replaced) [pdf, other]
Title: Improving Language Model Reasoning with Self-motivated Learning
Comments: Accepted at LREC-COLING 2024
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[616]  arXiv:2404.07544 (replaced) [pdf, other]
Title: From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples
Comments: 50 pages, 48 figures, preprint; Fixed typos
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[617]  arXiv:2404.07768 (replaced) [pdf, ps, other]
Title: Using Letter Positional Probabilities to Assess Word Complexity
Authors: Michael Dalvean
Comments: 30 Pages, 15 Tables
Subjects: Computation and Language (cs.CL)
[618]  arXiv:2404.08423 (replaced) [pdf, other]
Title: SIR-RL: Reinforcement Learning for Optimized Policy Control during Epidemiological Outbreaks in Emerging Market and Developing Economies
Comments: 27 pages, 12 figures
Subjects: Machine Learning (cs.LG); Physics and Society (physics.soc-ph); Populations and Evolution (q-bio.PE)
[619]  arXiv:2404.08995 (replaced) [pdf, other]
Title: Beyond Known Clusters: Probe New Prototypes for Efficient Generalized Class Discovery
Comments: 9 pages, 7 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[620]  arXiv:2404.09005 (replaced) [pdf, other]
Title: Proof-of-Learning with Incentive Security
Comments: 17 pages, 5 figures
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Emerging Technologies (cs.ET); Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG)
[621]  arXiv:2404.09285 (replaced) [pdf, other]
Title: Egret: Reinforcement Mechanism for Sequential Computation Offloading in Edge Computing
Comments: Submitted to IEEE TSC
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
[622]  arXiv:2404.09709 (replaced) [pdf, other]
Title: Scenario-Adaptive Fine-Grained Personalization Network: Tailoring User Behavior Representation to the Scenario Context
Comments: Accepted by SIGIR 2024, 10 pages, 5 figures, 5 tables
Journal-ref: SIGIR 2024
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)
[623]  arXiv:2404.10584 (replaced) [pdf, other]
Title: ReWiTe: Realistic Wide-angle and Telephoto Dual Camera Fusion Dataset via Beam Splitter Camera Rig
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[624]  arXiv:2404.10610 (replaced) [pdf, other]
Title: Shining Light into the Tunnel: Understanding and Classifying Network Traffic of Residential Proxies
Subjects: Cryptography and Security (cs.CR)
[625]  arXiv:2404.10702 (replaced) [pdf, other]
Title: Retrieval Augmented Verification: Unveiling Disinformation with Structured Representations for Zero-Shot Real-Time Evidence-guided Fact-Checking of Multi-modal Social media posts
Subjects: Multimedia (cs.MM)
[626]  arXiv:2404.10756 (replaced) [pdf, other]
Title: A High-Order Conservative Cut Finite Element Method for Problems in Time-Dependent Domains
Comments: 27 pages, 20 figures
Subjects: Numerical Analysis (math.NA)
[627]  arXiv:2404.10921 (replaced) [pdf, other]
Title: Tao: Re-Thinking DL-based Microarchitecture Simulation
Comments: Published in POMACS and SIGMETRICS'24
Subjects: Hardware Architecture (cs.AR); Machine Learning (cs.LG)
[628]  arXiv:2404.12079 (replaced) [pdf, other]
Title: Trajectory Planning for Autonomous Vehicle Using Iterative Reward Prediction in Reinforcement Learning
Authors: Hyunwoo Park
Comments: 8 pages, 6 figures
Subjects: Robotics (cs.RO)
[629]  arXiv:2404.12083 (replaced) [pdf, other]
Title: MambaPupil: Bidirectional Selective Recurrent model for Event-based Eye tracking
Comments: Accepted by CVPR 2024 Workshop (AIS: Vision, Graphics and AI for Streaming), top solution of challenge Event-based Eye Tracking, see this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[630]  arXiv:2404.12538 (replaced) [pdf, other]
Title: TrACT: A Training Dynamics Aware Contrastive Learning Framework for Long-tail Trajectory Prediction
Comments: 2024 IEEE Intelligent Vehicles Symposium (IV)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[631]  arXiv:2404.12994 (replaced) [pdf, other]
Title: Rethinking the Evaluation of Dialogue Systems: Effects of User Feedback on Crowdworkers and LLMs
Comments: Accepted at SIGIR 2024 long paper track
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)
[632]  arXiv:2404.13070 (replaced) [pdf, other]
Title: Evidence from counterfactual tasks supports emergent analogical reasoning in large language models
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[633]  arXiv:2404.13680 (replaced) [pdf, other]
Title: PoseAnimate: Zero-shot high fidelity pose controllable character animation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[634]  arXiv:2404.14606 (replaced) [pdf, ps, other]
Title: Cross-Task Multi-Branch Vision Transformer for Facial Expression and Mask Wearing Classification
Journal-ref: Journal of Computer Technology and Applied Mathematics, vol. 1, no. 1, Apr. 2024, pp. 46-53
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[635]  arXiv:2404.15096 (replaced) [pdf, other]
Title: Impedance Matching: Enabling an RL-Based Running Jump in a Quadruped Robot
Comments: Accepted by Ubiquitous Robots 2024
Subjects: Robotics (cs.RO); Machine Learning (cs.LG)
[636]  arXiv:2404.15206 (replaced) [pdf, other]
Title: Does Instruction Tuning Make LLMs More Consistent?
Subjects: Computation and Language (cs.CL)
[637]  arXiv:2404.15369 (replaced) [pdf, other]
Title: Can a Machine be Conscious? Towards Universal Criteria for Machine Consciousness
Comments: This work was supported by the UKRI CDT in AI for Healthcare, this http URL (Grant No. EP/S023283/1)
Subjects: Neurons and Cognition (q-bio.NC); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
[638]  arXiv:2404.16014 (replaced) [pdf, other]
Title: Improving Dictionary Learning with Gated Sparse Autoencoders
Comments: 15 main text pages, 22 appendix pages
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[639]  arXiv:2404.16398 (replaced) [pdf, other]
Title: Revisiting Relevance Feedback for CLIP-based Interactive Image Retrieval
Comments: 20 pages, 8 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[640]  arXiv:2404.16726 (replaced) [pdf, other]
Title: History repeats Itself: A Baseline for Temporal Knowledge Graph Forecasting
Comments: Accepted at IJCAI 2024
Subjects: Machine Learning (cs.LG)
[641]  arXiv:2404.16821 (replaced) [pdf, other]
Title: How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites
Comments: Technical report
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[642]  arXiv:2404.16894 (replaced) [pdf, other]
Title: On TinyML and Cybersecurity: Electric Vehicle Charging Infrastructure Use Case
Comments: Submitted to IEEE Access; Code is available at GitHub link: this https URL
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[643]  arXiv:2404.17454 (replaced) [pdf, other]
Title: Domain Adaptive and Fine-grained Anomaly Detection for Single-cell Sequencing Data and Beyond
Comments: 17 pages, 2 figures. Accepted by IJCAI 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Quantitative Methods (q-bio.QM)
[644]  arXiv:2404.17489 (replaced) [pdf, other]
Title: Tabular Data Contrastive Learning via Class-Conditioned and Feature-Correlation Based Augmentation
Comments: 14 pages, 4 algorithms, 3 figures, 5 tables
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[645]  arXiv:2404.17771 (replaced) [pdf, ps, other]
Title: Characterization of dim light response in DVS pixel: Discontinuity of event triggering time
Authors: Xiao Jiang, Fei Zhou
Comments: 6 pages, 4 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[646]  arXiv:2404.17774 (replaced) [pdf, other]
Title: High-quality Surface Reconstruction using Gaussian Surfels
Comments: Results added and improved
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
[647]  arXiv:2404.17918 (replaced) [pdf, other]
Title: I Have an Attention Bridge to Sell You: Generalization Capabilities of Modular Translation Architectures
Subjects: Computation and Language (cs.CL)
[648]  arXiv:2404.18081 (replaced) [pdf, other]
Title: ComposerX: Multi-Agent Symbolic Music Composition with LLMs
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[649]  arXiv:2404.18151 (replaced) [pdf, other]
Title: Decidability of Graph Neural Networks via Logical Characterizations
Subjects: Logic in Computer Science (cs.LO)
[650]  arXiv:2404.18159 (replaced) [pdf, other]
Title: Evaluating ROCKET and Catch22 features for calf behaviour classification from accelerometer data using Machine Learning models
Comments: 45 pages, 8 figures, 11 tables (3 in the Appendix), Journal paper
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)
[651]  arXiv:2404.18225 (replaced) [pdf, other]
Title: Quadruped robot traversing 3D complex environments with limited perception
Comments: 10 pages, 8 figures, submitted to IROS 2024
Subjects: Robotics (cs.RO)
[652]  arXiv:2404.18255 (replaced) [pdf, other]
Title: PatentGPT: A Large Language Model for Intellectual Property
Comments: 19 pages, 9 figures
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[653]  arXiv:2404.18286 (replaced) [pdf, other]
Title: Comparing LLM prompting with Cross-lingual transfer performance on Indigenous and Low-resource Brazilian Languages
Comments: Accepted to the Americas NLP Workshop at NAACL 2024 (this https URL)
Subjects: Computation and Language (cs.CL)
[654]  arXiv:2404.18311 (replaced) [pdf, ps, other]
Title: Towards Real-time Learning in Large Language Models: A Critical Review
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[655]  arXiv:2404.18385 (replaced) [pdf, other]
Title: Equivalence: An analysis of artists' roles with Image Generative AI from Conceptual Art perspective through an interactive installation design practice
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)
[656]  arXiv:2404.18433 (replaced) [pdf, other]
Title: ShadowMaskFormer: Mask Augmented Patch Embeddings for Shadow Removal
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[657]  arXiv:2404.18533 (replaced) [pdf, other]
Title: Evaluating Concept-based Explanations of Language Models: A Study on Faithfulness and Readability
Subjects: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
[658]  arXiv:2404.18562 (replaced) [pdf, ps, other]
Title: Time Reversal for Near-Field Communications on Multi-chip Wireless Networks
Subjects: Hardware Architecture (cs.AR)
[659]  arXiv:2404.18736 (replaced) [pdf, other]
Title: Mapping the Potential of Explainable Artificial Intelligence (XAI) for Fairness Along the AI Lifecycle
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
[660]  arXiv:2404.18783 (replaced) [pdf, ps, other]
Title: Improved bounds for group testing in arbitrary hypergraphs
Comments: arXiv admin note: text overlap with arXiv:2307.09608
Subjects: Data Structures and Algorithms (cs.DS)
[661]  arXiv:2404.18800 (replaced) [pdf, other]
Title: Extending h adaptivity with refinement patterns
Comments: Preprint accepted for publication in the book series "Advances in Applied Mechanics", Vol. 59, by Elsevier
Subjects: Numerical Analysis (math.NA)
[662]  arXiv:2404.18821 (replaced) [pdf, other]
Title: Control Policy Correction Framework for Reinforcement Learning-based Energy Arbitrage Strategies
Comments: ACM e-Energy 2024
Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[663]  arXiv:2404.18826 (replaced) [pdf, other]
Title: Winning the Social Media Influence Battle: Uncertainty-Aware Opinions to Understand and Spread True Information via Competitive Influence Maximization
Comments: 8 pages, 3 figures, submitted to ASONAM 2024
Subjects: Social and Information Networks (cs.SI)
[664]  arXiv:2404.18848 (replaced) [pdf, other]
Title: FeDeRA: Efficient Fine-tuning of Language Models in Federated Learning Leveraging Weight Decomposition
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)