Neural and Evolutionary Computing
- [1] arXiv:2406.04607 [pdf, ps, html, other]
Title: MeGA: Merging Multiple Independently Trained Neural Networks Based on Genetic Algorithm
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
In this paper, we introduce MeGA, a novel method that merges the weights of multiple pre-trained neural networks using a genetic algorithm. Traditional techniques, such as weight averaging and ensemble methods, often fail to fully harness the capabilities of pre-trained networks. Our approach uses a genetic algorithm with tournament selection, crossover, and mutation to optimize weight combinations, creating a more effective fusion. This allows the merged model to inherit advantageous features from the parent models, resulting in improved accuracy and robustness. Through experiments on the CIFAR-10 dataset, we demonstrate that our genetic-algorithm-based weight merging improves test accuracy over the individual models and conventional merging methods. The approach provides a scalable solution for integrating multiple pre-trained networks across various deep learning applications. Code is available on GitHub at: this https URL
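Editor's note: for a concrete picture of the mechanism, here is a minimal sketch of genetic-algorithm weight merging in the spirit of the abstract. The binary-mask encoding, population size, and toy fitness function are assumptions standing in for the paper's actual setup; in practice the fitness would load the merged weights into the network and return validation accuracy on CIFAR-10.

```python
import numpy as np

rng = np.random.default_rng(0)
N_PARAMS, POP_SIZE, GENS = 1000, 20, 50

def merge(mask, w_a, w_b):
    # Per-parameter merge: take the weight from parent A where the mask
    # is True, otherwise from parent B.
    return np.where(mask, w_a, w_b)

def fitness(weights):
    # Stand-in for validation accuracy: in practice, load `weights` into
    # the model and evaluate on held-out data.
    return -np.sum((weights - 0.5) ** 2)

def tournament(pop, fits, k=3):
    # Tournament selection: best of k randomly chosen individuals.
    idx = rng.choice(len(pop), size=k, replace=False)
    return pop[idx[np.argmax(fits[idx])]]

def crossover(m1, m2):
    point = rng.integers(1, len(m1))  # one-point crossover
    return np.concatenate([m1[:point], m2[point:]])

def mutate(mask, rate=0.01):
    # Flip each mask bit independently with a small probability.
    return np.logical_xor(mask, rng.random(len(mask)) < rate)

# Two "pre-trained" parents, represented as flattened weight vectors.
w_a = rng.normal(0.0, 1.0, N_PARAMS)
w_b = rng.normal(1.0, 1.0, N_PARAMS)

pop = rng.random((POP_SIZE, N_PARAMS)) < 0.5  # population of binary masks
for _ in range(GENS):
    fits = np.array([fitness(merge(m, w_a, w_b)) for m in pop])
    pop = np.array([mutate(crossover(tournament(pop, fits),
                                     tournament(pop, fits)))
                    for _ in range(POP_SIZE)])

best_mask = max(pop, key=lambda m: fitness(merge(m, w_a, w_b)))
```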
- [2] arXiv:2406.04663 [pdf, ps, html, other]
Title: LLM-POET: Evolving Complex Environments using Large Language Models
Subjects: Neural and Evolutionary Computing (cs.NE)
Creating systems capable of generating virtually infinite variations of complex and novel behaviour, without predetermined goals or limits, is a major challenge in AI. This challenge has been addressed through several open-ended algorithms that continuously generate new and diverse behaviours, such as the POET and Enhanced-POET algorithms for co-evolving environments and agent behaviour. One challenge with existing methods, however, is that they struggle to continuously generate complex environments. In this work, we propose LLM-POET, a modification of the POET algorithm in which environments are both created and mutated by a Large Language Model (LLM). By fine-tuning an LLM on text representations of Evolution Gym environments paired with captions describing them, we were able to generate complex and diverse environments from natural language. We found that not only could the LLM produce a diverse range of environments, but, compared to the CPPNs used for environment generation in Enhanced-POET, the LLM allowed for a 34% increase in the performance gain of co-evolution. This increased performance suggests that the agents learned a more diverse set of skills by training on more complex environments.
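Editor's note: the following sketch illustrates what an LLM-driven environment-mutation step inside a POET-style loop could look like. Every interface here is a hypothetical stand-in: `llm_generate` for the fine-tuned LLM, `evaluate` for an Evolution Gym training/transfer run, and the text format for the paper's environment encoding.

```python
import random

def llm_generate(prompt: str) -> str:
    # Stub for the fine-tuned LLM; in the paper this would return the
    # text representation of a new Evolution Gym environment.
    return prompt.splitlines()[1] + " [mutated]"

def evaluate(agent, env_text: str) -> float:
    # Stub for the agent's score after training/transfer on the environment.
    return random.random()

def mutate_environment(env_text: str, caption: str) -> str:
    # Condition the LLM on the environment text and a natural-language caption.
    prompt = ("Mutate this environment to be more complex.\n"
              f"{env_text}\n"
              f"Caption: {caption}")
    return llm_generate(prompt)

def poet_step(pairs, min_score=0.2, max_score=0.8):
    """One open-ended iteration: mutate environments and keep those that are
    neither too easy nor too hard for the paired agent (minimal criterion)."""
    children = []
    for env_text, agent in pairs:
        child = mutate_environment(env_text, "auto-generated caption")
        if min_score < evaluate(agent, child) < max_score:
            children.append((child, agent))
    return children

print(poet_step([("flat terrain with small gaps", "agent-0")]))
```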
- [3] arXiv:2406.04733 [pdf, ps, other]
Title: Unsupervised representation learning with Hebbian synaptic and structural plasticity in brain-like feedforward neural networks
Subjects: Neural and Evolutionary Computing (cs.NE); Neurons and Cognition (q-bio.NC)
Neural networks that can capture key principles underlying brain computation offer exciting new opportunities for developing artificial intelligence and brain-like computing algorithms. Such networks remain biologically plausible while leveraging the localized synaptic learning rules and modular network architecture found in the neocortex. Compared to backprop-driven deep learning approaches, they provide more suitable models for deployment on neuromorphic hardware and have greater potential for scalability on large-scale computing clusters. The development of such brain-like neural networks depends on having a learning procedure that can build effective internal representations from data. In this work, we introduce and evaluate a brain-like neural network model capable of unsupervised representation learning. It builds on the Bayesian Confidence Propagation Neural Network (BCPNN), which has previously been implemented as both abstract and biophysically detailed recurrent attractor neural networks explaining various cortical associative memory phenomena. Here we developed a feedforward BCPNN model to perform representation learning by incorporating a range of brain-like attributes derived from neocortical circuits, such as cortical columns, divisive normalization, Hebbian synaptic plasticity, structural plasticity, sparse activity, and sparse patchy connectivity. The model was tested on a diverse set of popular machine learning benchmarks: grayscale images (MNIST, Fashion-MNIST), RGB natural images (SVHN, CIFAR-10), QSAR (MUV, HIV), and malware detection (EMBER). When a linear classifier was used to predict the class labels, the model performed competitively with conventional multi-layer perceptrons and other state-of-the-art brain-like neural networks.
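Editor's note: for readers unfamiliar with the BCPNN learning rule, here is a minimal batch-mode sketch of its Hebbian weight and bias estimates. The paper's trace-based online dynamics, structural plasticity, and normalization are omitted, and the smoothing constant `eps` is an assumption; the log-odds form of the weights is the standard BCPNN formulation.

```python
import numpy as np

def bcpnn_update(X, Y, eps=1e-4):
    """Estimate BCPNN weights from batches of pre (X) and post (Y) activity.

    X: (n_samples, n_pre), Y: (n_samples, n_post), activities in [0, 1].
    Returns weights w_ij = log(P_ij / (P_i * P_j)) and biases b_j = log P_j.
    """
    n = X.shape[0]
    p_i = X.mean(axis=0) + eps             # marginal pre-synaptic rates
    p_j = Y.mean(axis=0) + eps             # marginal post-synaptic rates
    p_ij = (X.T @ Y) / n + eps             # co-activation probabilities
    w = np.log(p_ij / np.outer(p_i, p_j))  # Hebbian: positive for correlated pairs
    b = np.log(p_j)
    return w, b

# Toy usage: 100 samples of 20 pre- and 10 post-synaptic binary units.
rng = np.random.default_rng(0)
X = (rng.random((100, 20)) < 0.3).astype(float)
Y = (rng.random((100, 10)) < 0.3).astype(float)
w, b = bcpnn_update(X, Y)
```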
- [4] arXiv:2406.04899 [pdf, ps, html, other]
Title: Sliding Window 3-Objective Pareto Optimization for Problems with Chance Constraints
Comments: To appear at PPSN 2024
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)
Constrained single-objective problems have frequently been tackled by evolutionary multi-objective algorithms in which the constraint is relaxed into an additional objective. Recently, it has been shown that Pareto optimization approaches using bi-objective models can be significantly sped up using sliding windows (Neumann and Witt, ECAI 2023). In this paper, we extend the sliding window approach to $3$-objective formulations for tackling chance constrained problems. On the theoretical side, we show that our new sliding window approach improves the previous runtime bounds obtained in (Neumann and Witt, GECCO 2023) while maintaining the same approximation guarantees. Our experimental investigations for the chance constrained dominating set problem show that the new sliding window approach solves much larger instances far more efficiently than the 3-objective approach presented in (Neumann and Witt, GECCO 2023).
- [5] arXiv:2406.04929 [pdf, ps, html, other]
Title: Protein pathways as a catalyst to directed evolution of the topology of artificial neural networks
Comments: 8 pages, 6 figures
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG)
In this article, we propose a paradigm shift in evolving Artificial Neural Networks (ANNs) towards a new bio-inspired design grounded in the structural properties, interactions, and dynamics of protein networks (PNs): the Artificial Protein Network (APN). This offers several advantages previously unrealized by state-of-the-art approaches in neuroevolution (NE): (1) We can draw inspiration from how nature, thanks to millions of years of evolution, efficiently encodes protein interactions in DNA, and translate our APN to a silicon DNA. This helps bridge the gap between syntax and semantics observed in current NE approaches. (2) We can learn from how nature builds networks in our genes, allowing us to design new and smarter networks through evolutionary algorithm (EA) search. (3) We can perform EA crossover and mutation operations directly on the genotype of networks, replicating the operations observed in nature, thereby exploring and exploiting the phenotypic space so that we avoid getting trapped in sub-optimal solutions. (4) Our novel definition of the APN opens new ways to leverage knowledge about different living things and processes from biology. (5) Using biologically inspired encodings, we can model more complex demographic and ecological relationships (e.g., virus-host or predator-prey interactions), allowing us to optimise for multiple, often conflicting objectives.
New submissions for Monday, 10 June 2024 (showing 5 of 5 entries)
- [6] arXiv:2406.04612 (cross-list from cs.LG) [pdf, ps, html, other]
Title: Revisiting Attention Weights as Interpretations of Message-Passing Neural Networks
Comments: 11 pages, 3 figures, 5 tables
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Neural and Evolutionary Computing (cs.NE); Social and Information Networks (cs.SI)
The self-attention mechanism has been adopted in several widely used message-passing neural networks (MPNNs), e.g., GATs, where it adaptively controls the amount of information that flows along the edges of the underlying graph. This use of attention has made such models a baseline for studies on explainable AI (XAI), since interpretations via attention have been popularized in various domains (e.g., natural language processing and computer vision). However, existing studies often use naive calculations to derive attribution scores from attention and do not consider the precise and careful calculation of edge attribution. In our study, we aim to close the gap between the widespread use of attention-enabled MPNNs and their largely under-explored potential for explainability, a topic that has been actively investigated in other areas. To this end, as a first attempt, we formalize the problem of edge attribution from attention weights in GNNs. We then propose GATT, an edge attribution calculation method built upon the computation tree. Through comprehensive experiments, we demonstrate the effectiveness of our proposed method when evaluating attributions from GATs. Conversely, we empirically validate that simply averaging attention weights over graph attention layers is insufficient to interpret the GAT model's behavior. Code is publicly available at this https URL.
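Editor's note: to make the critique concrete, here is a sketch of the naive baseline the abstract refers to, i.e., averaging attention weights over GAT layers to score edges. The data shapes are assumptions (a shared edge list across layers); the paper's GATT method instead derives attributions from the computation tree, which is not reproduced here.

```python
import numpy as np

def average_attention_attribution(attn_per_layer):
    """Naive edge attribution: mean attention weight across layers.

    attn_per_layer: list of (n_edges,) attention vectors, one per GAT layer,
    all indexed by the same edge list. Returns one score per edge.
    """
    return np.mean(np.stack(attn_per_layer, axis=0), axis=0)

# Toy usage: 3 attention layers over 5 edges (rows sum to 1 per layer).
rng = np.random.default_rng(0)
layers = [rng.dirichlet(np.ones(5)) for _ in range(3)]
edge_scores = average_attention_attribution(layers)
```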
- [7] arXiv:2406.04706 (cross-list from cs.LG) [pdf, ps, html, other]
Title: Winner-takes-all learners are geometry-aware conditional density estimators
Authors: Victor Letzelter (LTCI, S2A, IDS, IP Paris), David Perera (LTCI, S2A, IDS, IP Paris), Cédric Rommel, Mathieu Fontaine (LTCI, S2A, IDS, IP Paris), Slim Essid (IDS, S2A, LTCI, IP Paris), Gael Richard (S2A, IDS, LTCI, IP Paris), Patrick Pérez
Comments: International Conference on Machine Learning, Jul 2024, Vienna, Austria
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Signal Processing (eess.SP); Probability (math.PR); Machine Learning (stat.ML)
Winner-takes-all training is a simple learning paradigm that handles ambiguous tasks by predicting a set of plausible hypotheses. Recently, a connection was established between winner-takes-all training and centroidal Voronoi tessellations, showing that, once trained, the hypotheses optimally quantize the shape of the conditional distribution to be predicted. However, how best to use these hypotheses for uncertainty quantification remains an open question. In this work, we show how to leverage the appealing geometric properties of winner-takes-all learners for conditional density estimation, without modifying the original training scheme. We theoretically establish the advantages of our novel estimator in terms of both quantization and density estimation, and we demonstrate its competitiveness on synthetic and real-world datasets, including audio data.
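Editor's note: the winner-takes-all objective itself is standard and compact enough to sketch. In the minimal version below (shapes and the squared-error metric are assumptions), only the best hypothesis per sample receives gradient, which is what induces the Voronoi-like quantization the abstract mentions.

```python
import torch

def wta_loss(hypotheses, target):
    """Winner-takes-all loss for a multi-hypothesis network.

    hypotheses: (batch, K, dim) predictions; target: (batch, dim).
    Gradient flows only through each sample's closest hypothesis.
    """
    errors = ((hypotheses - target.unsqueeze(1)) ** 2).sum(dim=-1)  # (batch, K)
    return errors.min(dim=1).values.mean()  # keep only the winner's loss

# Toy usage: K = 4 hypotheses in 2-D.
preds = torch.randn(8, 4, 2, requires_grad=True)
y = torch.randn(8, 2)
loss = wta_loss(preds, y)
loss.backward()  # non-winning hypotheses get zero gradient for that sample
```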
Cross submissions for Monday, 10 June 2024 (showing 2 of 2 entries)
- [8] arXiv:2302.13696 (replaced) [pdf, ps, html, other]
Title: Moderate Adaptive Linear Units (MoLU)
Comments: 4 pages, 5 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT); Neural and Evolutionary Computing (cs.NE)
We propose a new high-performance activation function, Moderate Adaptive Linear Units (MoLU), for deep neural networks. MoLU is a simple, elegant, and powerful activation function that can serve as a good default among the hundreds of existing activation functions. Because MoLU is composed of elementary functions, it is not only an infinite diffeomorphism (i.e., smooth and infinitely differentiable over the whole domain) but also reduces training time.
- [9] arXiv:2401.10009 (replaced) [pdf, ps, html, other]
Title: An optimization-based equilibrium measure describes non-equilibrium steady state dynamics: application to edge of chaos
Comments: 21 pages, 9 figures, revised version 2
Subjects: Neurons and Cognition (q-bio.NC); Statistical Mechanics (cond-mat.stat-mech); Neural and Evolutionary Computing (cs.NE)
Understanding neural dynamics is a central topic in machine learning, non-linear physics, and neuroscience. However, the dynamics are non-linear, stochastic, and, in particular, non-gradient, i.e., the driving force cannot be written as the gradient of a potential. These features make analytic study very challenging. The common tools are the path integral approach and dynamical mean-field theory, but their drawback is that one has to solve integro-differential or dynamical mean-field equations, which is computationally expensive and has no closed-form solution in general. From the perspective of the associated Fokker-Planck equation, the steady-state solution is generally unknown. Here, we treat the search for steady states as an optimization problem: we construct an approximate potential related to the speed of the dynamics and show that searching for the ground state of this potential is equivalent to running approximate stochastic gradient or Langevin dynamics. Only in the zero-temperature limit is the distribution of the original steady states recovered. The resulting stationary state of the dynamics exactly follows the canonical Boltzmann measure. Within this framework, the quenched disorder intrinsic to neural networks can be averaged out by applying the replica method, which leads naturally to order parameters for the non-equilibrium steady states. Our theory reproduces the well-known edge-of-chaos result, derives the order parameters characterizing the continuous transition, and interprets these order parameters as fluctuations and responses of the steady states. Our method thus opens the door to analytically studying the steady-state landscape of deterministic or stochastic high-dimensional dynamics.
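Editor's note: the core construction is easy to illustrate. Below is a toy sketch, under assumptions: a 2-D non-gradient drift f (a rotational component makes it non-conservative), the "speed" potential V(x) = |f(x)|^2 whose ground states are exactly the steady states f(x) = 0, and overdamped Langevin dynamics on V. As the temperature goes to zero, samples concentrate on the steady states, mirroring the abstract's zero-temperature limit.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Toy non-gradient drift: linear decay plus a rotational part.
    return np.array([-x[0] + 2.0 * x[1], -x[1] - 2.0 * x[0]])

def grad_V(x, h=1e-5):
    """Numerical gradient of the speed potential V(x) = ||f(x)||^2."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (np.sum(f(x + e) ** 2) - np.sum(f(x - e) ** 2)) / (2 * h)
    return g

def langevin(x0, temp, steps=5000, dt=1e-3):
    # Overdamped Langevin dynamics: gradient descent on V plus thermal noise.
    x = x0.copy()
    for _ in range(steps):
        x += -dt * grad_V(x) + np.sqrt(2 * temp * dt) * rng.normal(size=x.shape)
    return x

# At low temperature the sampler settles near the steady state f(x) = 0.
print(langevin(np.ones(2), temp=1e-4))
```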
- [10] arXiv:2402.13852 (replaced) [pdf, ps, html, other]
Title: Neural Control System for Continuous Glucose Monitoring and Maintenance
Comments: 9 pages, 4 figures, ICLR 2024 Tiny Papers Track this https URL
Journal-ref: The Second Tiny Papers Track at ICLR 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Systems and Control (eess.SY); Machine Learning (stat.ML)
Precise glucose level monitoring is critical for people with diabetes to avoid serious complications. While there are several methods for continuous glucose monitoring, research on maintenance devices is limited. To close this gap, we present a novel neural control system for continuous glucose monitoring and management that uses differentiable predictive control. Our approach, driven by a sophisticated neural policy and differentiable modeling, continuously adjusts the insulin supply in real time, thereby improving glucose level regulation in the body. This end-to-end method maximizes efficiency, providing personalized care and improved health outcomes, as confirmed by empirical evidence. Code and data are available at: this https URL.
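Editor's note: the following sketch shows the general shape of differentiable predictive control: a neural policy is trained by backpropagating a tracking loss through a differentiable rollout of the plant model. Everything here is a stand-in: the one-state glucose model, its coefficients, the target value, and the policy size are illustrative assumptions, not the paper's model.

```python
import torch
import torch.nn as nn

# Neural policy: maps the current glucose level to a non-negative insulin dose.
policy = nn.Sequential(nn.Linear(1, 32), nn.Tanh(),
                       nn.Linear(32, 1), nn.Softplus())

def glucose_step(g, insulin, basal_drift=0.05, sensitivity=0.1):
    # Toy differentiable dynamics: glucose rises with a basal drift and
    # falls in proportion to the administered insulin dose.
    return g + basal_drift - sensitivity * insulin

target = torch.tensor([5.5])  # desired glucose level (mmol/L, illustrative)
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

for epoch in range(200):
    g = torch.tensor([8.0])   # initial glucose level for this rollout
    loss = 0.0
    for t in range(20):       # unroll the closed loop over the horizon
        dose = policy(g)
        g = glucose_step(g, dose)
        loss = loss + (g - target) ** 2  # penalize deviation at every step
    opt.zero_grad()
    loss.backward()           # gradients flow through the whole rollout
    opt.step()
```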