He Wang
2 records found
Actor-critic (AC) cooperative multiagent reinforcement learning (MARL) over directed graphs is studied in this article. The goal of the agents in MARL is to maximize the globally averaged return in a distributed way, i.e., each agent can only exchange information with its neighboring agents. AC methods proposed in the literature require the communication graphs to be undirected and the weight matrices to be doubly stochastic (more precisely, the weight matrices are row stochastic and their expectations are column stochastic). In contrast to these methods, we propose a distributed AC algorithm for MARL over directed graphs with fixed topology that only requires the weight matrix to be row stochastic. We then study MARL over directed graphs (possibly not connected) with changing topologies and propose a different distributed AC algorithm, based on the push-sum protocol, that only requires the weight matrices to be column stochastic. Convergence of the proposed algorithms is proven for linear function approximation of the action value function. Simulations are presented to demonstrate the effectiveness of the proposed algorithms.
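The push-sum protocol mentioned in the abstract can be illustrated with a minimal averaging sketch. This is not the article's AC algorithm: it only shows the underlying consensus primitive, on a fixed, strongly connected directed graph chosen for illustration (the article also covers changing topologies). The function name `push_sum`, the graph, and the initial values are all assumptions made for this example. Each node splits its state equally among its out-neighbors and itself, which corresponds to a column-stochastic weight matrix.

```python
def push_sum(values, out_neighbors, rounds=200):
    """Estimate the average of `values` on a directed graph via push-sum.

    out_neighbors[j] lists the nodes that node j sends to (illustrative input).
    Each node keeps a running sum x and a weight w; the ratio x / w is its
    estimate of the global average.
    """
    n = len(values)
    x = [float(v) for v in values]  # running sums
    w = [1.0] * n                   # push-sum weights
    for _ in range(rounds):
        new_x = [0.0] * n
        new_w = [0.0] * n
        for j in range(n):
            # Node j sends equal shares to its out-neighbors and to itself,
            # so column j of the implied weight matrix sums to one.
            targets = list(out_neighbors[j]) + [j]
            share = 1.0 / len(targets)
            for i in targets:
                new_x[i] += share * x[j]
                new_w[i] += share * w[j]
        x, w = new_x, new_w
    return [xi / wi for xi, wi in zip(x, w)]


# Hypothetical 4-node directed graph: a ring 0->1->2->3->0 plus the edge 0->2.
graph = {0: [1, 2], 1: [2], 2: [3], 3: [0]}
estimates = push_sum([1.0, 2.0, 3.0, 6.0], graph)
print(estimates)
```

Every node only needs to know its own out-degree, which is why column stochasticity is a natural requirement on directed graphs: a sender can normalize what it sends, whereas row stochasticity requires a receiver to normalize what it gets.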
To fully unleash the potential of graphene-based devices for neuromorphic computing, we propose a graphene synapse and a graphene neuron that together form a basic Spiking Neural Network (SNN) unit, which can potentially be utilized to implement complex SNNs. Specifically, the proposed synapse enables two fundamental synaptic functionalities, i.e., Spike-Timing-Dependent Plasticity (STDP) and Long-Term Plasticity, and both Long-Term Potentiation (LTP) and Long-Term Depression (LTD) can be emulated with the same structure by properly adjusting its bias. The proposed neuron captures the essential Leaky Integrate and Fire spiking neuron behavior with a post-firing refractory interval. We demonstrate the proper operation of the graphene SNN unit by relying on a mixed simulation approach that embeds the high accuracy of atomistic-level simulation of graphene structure conductance within the SPICE framework. Subsequently, we analyze the way graphene synaptic plasticity affects the behavior of a 2-layer SNN example consisting of 6 neurons and demonstrate that LTP significantly increases the number of firing events while LTD diminishes them, as expected. To assess the plausibility of the graphene SNN reaction to input stimuli, we simulate its behavior by means of both SPICE and NEST, a well-established SNN simulation framework, and demonstrate that the obtained reactions, characterized in terms of total number of firing events and mean Inter-Spike Interval (ISI) length, are in close agreement, which suggests that the proposed design behaves properly. Further, we demonstrate the unsupervised learning capabilities of the proposed design by considering a 2-layer SNN consisting of 30 neurons meant to recognize the characters 'A,' 'E,' 'I,' 'O,' and 'U,' represented with a 5 by 5 black and white pixel matrix.
The SPICE simulation results indicate that the graphene SNN is able to perform the unsupervised learning associated with character recognition and that its recognition ability is robust to input character variations. Finally, we note that our proposal results in a small real-estate footprint (at most 30 nm^2 is required by one graphene-based device) and operates at a 200 mV supply voltage, which suggests its suitability for the design of large-scale energy-efficient computing systems.
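The Leaky Integrate and Fire behavior with a post-firing refractory interval, and the two metrics used in the abstract to compare simulators (total number of firing events and mean ISI), can be sketched with a generic textbook model. This is not the graphene device model or the SPICE/NEST setup from the paper; the function names and every parameter value below (time step, time constant, threshold, refractory length) are illustrative assumptions.

```python
def simulate_lif(input_current, dt=1e-3, tau=20e-3, v_rest=0.0,
                 v_thresh=1.0, v_reset=0.0, refrac_steps=5, resistance=1.0):
    """Forward-Euler simulation of a textbook leaky integrate-and-fire neuron.

    Returns the list of spike times in seconds. All parameters are
    illustrative defaults, not values taken from the paper.
    """
    v = v_rest
    refrac_left = 0  # remaining refractory steps, counted as an integer
    spike_times = []
    for step, i_in in enumerate(input_current):
        if refrac_left > 0:
            # Inputs are ignored and the membrane is held at reset.
            refrac_left -= 1
            v = v_reset
            continue
        # Leak toward v_rest while integrating the input current.
        v += dt / tau * (-(v - v_rest) + resistance * i_in)
        if v >= v_thresh:
            spike_times.append(step * dt)
            v = v_reset
            refrac_left = refrac_steps
    return spike_times


def mean_isi(spike_times):
    """Mean inter-spike interval, or None if fewer than two spikes occurred."""
    if len(spike_times) < 2:
        return None
    gaps = [b - a for a, b in zip(spike_times, spike_times[1:])]
    return sum(gaps) / len(gaps)


# One second of constant drive above threshold (hypothetical stimulus).
spikes = simulate_lif([2.0] * 1000)
print(len(spikes), mean_isi(spikes))
```

In this model a stronger synaptic drive shortens the time needed to reach threshold, so it raises the firing count and lowers the mean ISI, which is the qualitative effect the abstract attributes to LTP, with LTD acting in the opposite direction.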