Congratulations to Li Lun on his paper being accepted by the international journal Neurocomputing
2024/03/31

Li L, Zhang X, Qian C, et al. Cross coordination of behavior clone and reinforcement learning for autonomous within-visual-range air combat. Neurocomputing, 2024, accepted.


Abstract


In this article, we propose a novel hierarchical framework to solve within-visual-range (WVR) air-to-air combat under the complex nonlinear six-degrees-of-freedom (6-DOF) dynamics of the aircraft and missile. The decision process is constructed as two layers, from top to bottom, and reinforcement learning is adopted to solve each layer separately. The top layer uses a new combat policy to decide the autopilot commands (such as the target heading, velocity, and altitude) and missile launch according to the current combat situation. The bottom layer then uses a control policy to track the autopilot commands by calculating the actual input signals (deflections of the rudder, elevator, and aileron, and the throttle setting) for the aircraft. For the combat policy, we present a new learning method, called "E2L", that mimics expert knowledge within the two-layer decision framework to bootstrap the agent's capability in the early stage of training. The method establishes a cross coordination of behavior cloning (BC) and proximal policy optimization (PPO): the agent is alternately updated around the latest strategy, using BC with gradient clipping and PPO with a Kullback–Leibler divergence loss and modified BC demonstration trajectories, which allows it to learn competitive combat strategies more stably and quickly. Extensive experimental results show that the proposed method achieves better combat performance than the baselines.
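The alternating BC/PPO scheme described in the abstract can be illustrated with a deliberately minimal toy sketch. Everything below is an illustrative assumption, not the paper's implementation: a one-dimensional Gaussian policy with fixed variance, a scalar "expert action" standing in for the demonstration trajectories, a hand-written BC gradient with clipping, and a simplified PPO-style surrogate with a KL penalty toward the previous policy.

```python
# Toy sketch of cross-coordinated BC and PPO updates (illustrative only).
# Policy: pi(a) = N(mu, sigma^2) with fixed sigma; only mu is learned.

def bc_step(mu, expert_action, lr=0.1, clip=0.5):
    """BC update with gradient clipping: pull mu toward the expert action."""
    grad = mu - expert_action              # gradient of 0.5 * (mu - a_expert)^2
    grad = max(-clip, min(clip, grad))     # clip the BC gradient
    return mu - lr * grad

def ppo_step(mu, mu_old, advantage, lr=0.1, kl_coef=0.5, sigma=1.0):
    """Simplified PPO-style update with a KL penalty toward the old policy.
    For fixed-sigma Gaussians, KL(old, new) = (mu - mu_old)^2 / (2 sigma^2)."""
    # Toy surrogate: the advantage pushes mu up, the KL term anchors it
    # to the most recent policy (the "around the latest strategy" idea).
    grad = -advantage + kl_coef * (mu - mu_old) / sigma ** 2
    return mu - lr * grad

def train(expert_action=2.0, steps=50):
    """Alternate one BC phase and one PPO phase per iteration."""
    mu = 0.0
    for _ in range(steps):
        mu_old = mu
        mu = bc_step(mu, expert_action)        # imitation phase
        advantage = expert_action - mu         # toy advantage signal
        mu = ppo_step(mu, mu_old, advantage)   # RL phase around latest policy
    return mu
```

Under these toy assumptions the policy mean converges to the expert action; in the paper the same alternation is applied to full combat policies trained on 6-DOF simulations, with the BC demonstrations themselves being modified during training.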