Congratulations to Li Lun on his paper being accepted by the international journal Neurocomputing
2024/03/31

Li L, Zhang X, Qian C, et al. Cross coordination of behavior clone and reinforcement learning for autonomous within-visual-range air combat. Neurocomputing, 2024, accepted.


Abstract


In this article, we propose a novel hierarchical framework to solve within-visual-range (WVR) air-to-air combat under the complex nonlinear six-degrees-of-freedom (6-DOF) dynamics of the aircraft and missile. The decision process is constructed as two layers, from top to bottom, and reinforcement learning is adopted to solve each layer separately. The top layer uses a new combat policy to decide the autopilot commands (such as the target heading, velocity, and altitude) and missile launch according to the current combat situation. The bottom layer then uses a control policy to track the autopilot commands by calculating the actual input signals (deflections of the rudder, elevator, and aileron, and the throttle setting) for the aircraft. For the combat policy, we present a new learning method, called "E2L", that mimics expert knowledge within the two-layer decision framework to bootstrap the agent's capability in the early stage of training. The method establishes a cross coordination of behavior cloning (BC) and proximal policy optimization (PPO): the agent is alternately updated around the latest strategy, using BC with gradient clipping and PPO with a Kullback–Leibler divergence loss and modified BC demonstration trajectories, which allows it to learn competitive combat strategies more stably and quickly. Extensive experimental results show that the proposed method achieves better combat performance than the baselines.
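The alternating BC/PPO scheme described in the abstract can be illustrated with a deliberately minimal toy sketch. Everything below is an illustrative assumption, not the paper's implementation: a one-dimensional Gaussian policy with fixed variance, a scalar "expert action" standing in for the demonstration trajectories, a hand-written BC gradient with clipping, and a simplified PPO-style surrogate with a KL penalty toward the previous policy.

```python
# Toy sketch of cross-coordinated BC and PPO updates (illustrative only).
# Policy: pi(a) = N(mu, sigma^2) with fixed sigma; only mu is learned.

def bc_step(mu, expert_action, lr=0.1, clip=0.5):
    """BC update with gradient clipping: pull mu toward the expert action."""
    grad = mu - expert_action              # gradient of 0.5 * (mu - a_expert)^2
    grad = max(-clip, min(clip, grad))     # clip the BC gradient
    return mu - lr * grad

def ppo_step(mu, mu_old, advantage, lr=0.1, kl_coef=0.5, sigma=1.0):
    """Simplified PPO-style update with a KL penalty toward the old policy.
    For fixed-sigma Gaussians, KL(old, new) = (mu - mu_old)^2 / (2 sigma^2)."""
    # Toy surrogate: the advantage pushes mu up, the KL term anchors it
    # to the most recent policy (the "around the latest strategy" idea).
    grad = -advantage + kl_coef * (mu - mu_old) / sigma ** 2
    return mu - lr * grad

def train(expert_action=2.0, steps=50):
    """Alternate one BC phase and one PPO phase per iteration."""
    mu = 0.0
    for _ in range(steps):
        mu_old = mu
        mu = bc_step(mu, expert_action)        # imitation phase
        advantage = expert_action - mu         # toy advantage signal
        mu = ppo_step(mu, mu_old, advantage)   # RL phase around latest policy
    return mu
```

Under these toy assumptions the policy mean converges to the expert action; in the paper the same alternation is applied to full combat policies trained on 6-DOF simulations, with the BC demonstrations themselves being modified during training.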