Development of a Flight Control Program using Reinforcement Learning

Create an agent that flies to the target while avoiding obstacles

Goal of the project

The goal of this project was to provide Maya, a small unmanned aerial vehicle for ground observation at the university, with an obstacle avoidance function. There are various methods to achieve obstacle avoidance, but our group decided to use reinforcement learning.

Result

Since the original control program of Maya was implemented in Simulink, I decided to use MATLAB/Simulink for reinforcement learning as well. There are two main approaches in MATLAB: one is to define the environment as a class in an m-file and train against it there; the other is to model the environment in Simulink.

As a first step, I created a vertical-plane-only environment class in an m-file, modeled after a small airplane (a Cessna). In this model, the engine throttle and the elevator are the actions. These control inputs are low-level and directly affect the behavior of the aircraft. The green ellipse shows the obstacle and the red dot shows the goal point.
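A minimal sketch of such a vertical-plane environment, written here in Python rather than MATLAB for illustration. All constants (mass, thrust, drag, step size) are assumptions for the sketch, not values from the actual model, and the elevator is crudely treated as a flight-path-rate command:

```python
import math

# Minimal point-mass longitudinal model (illustrative only; all
# constants below are assumed, not taken from the Maya project).
# State: x, z (position), v (airspeed), gamma (flight-path angle).
# Actions: throttle in [0, 1], elevator as a flight-path-rate command.
G = 9.81        # gravity [m/s^2]
MASS = 1000.0   # aircraft mass [kg] (assumed)
T_MAX = 4000.0  # maximum thrust [N] (assumed)
DRAG_K = 0.02   # crude quadratic drag coefficient (assumed)
TS = 0.1        # integration step [s]

def step(state, throttle, elevator):
    """One Euler step of the longitudinal point-mass dynamics."""
    x, z, v, gamma = state
    thrust = T_MAX * throttle
    drag = DRAG_K * v * v
    # Acceleration along the flight path: thrust minus drag minus
    # the gravity component opposing the climb.
    v_dot = (thrust - drag) / MASS - G * math.sin(gamma)
    x += v * math.cos(gamma) * TS
    z += v * math.sin(gamma) * TS
    v = max(v + v_dot * TS, 1.0)   # keep speed positive
    gamma += elevator * TS
    return (x, z, v, gamma)
```

With actions this low-level, the reward signal has to teach the agent basic airmanship before it can learn avoidance, which is exactly the instability described below.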

Because the actions are low-level control inputs, the agent first had to learn how to fly at all, which made the learning process very unstable. In the GIF above, the aircraft pitches up abruptly and the episode ends suddenly.

Here is the GitHub repository for this model.

I therefore switched to high-level commands, such as direction and velocity, as actions, which are fed to a control loop built on the existing transfer functions. The left image below shows the Simulink configuration and the right image shows the problem setup. The green circle is the region to pass through, and the black circle is the obstacle.
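The idea can be sketched as follows, in Python for illustration: the agent outputs a desired course angle, and an inner loop tracks it. A first-order lag stands in for Maya's actual transfer functions, which are not public; the time constant is an assumption:

```python
# High-level command interface: the agent commands a desired course
# angle chi_cmd, and an inner control loop (a first-order lag here,
# standing in for the aircraft's real transfer functions) tracks it.
TAU = 2.0   # assumed time constant of the inner loop [s]
TS = 0.1    # sample time [s]

def inner_loop(chi, chi_cmd):
    """One discrete step of a first-order lag tracking chi_cmd."""
    return chi + (chi_cmd - chi) * TS / TAU
```

The benefit is that the agent only chooses where to go; staying airborne is delegated to the controller, which makes the learning problem far better conditioned than with raw throttle and elevator actions.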

In the actual training, the flight simulation in Simulink was very time-consuming, so we could not run enough episodes. Therefore, we trained on the following greatly simplified motion model, where $V_K$ is the (constant) ground speed, $\chi$ the course angle, $\gamma$ the flight-path angle, and $T_s$ the sample time, and applied the learned policy to the actual Maya model afterwards.

\begin{align} \begin{pmatrix} X(n) \\ Y(n) \\ Z(n) \end{pmatrix} &= \begin{pmatrix} X(n-1) \\ Y(n-1) \\ Z(n-1) \end{pmatrix} + V_K \cdot \begin{pmatrix} \cos(\gamma(n)) \cos(\chi(n)) \\ \cos(\gamma(n)) \sin(\chi(n)) \\ -\sin(\gamma(n)) \end{pmatrix} \cdot T_s \\ \begin{pmatrix} \chi(n) \\ \gamma(n) \end{pmatrix} &= \begin{pmatrix} \chi(n-1) \\ \gamma(n-1) \end{pmatrix} + \begin{pmatrix} \dot{\chi}(n) \\ \dot{\gamma}(n) \end{pmatrix} \cdot T_s \end{align}
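This update is cheap enough to run many episodes per second. A direct transcription, in Python for illustration (the values of $V_K$ and $T_s$ are assumptions):

```python
import math

V_K = 20.0   # assumed constant ground speed [m/s]
TS = 0.1     # assumed sample time T_s [s]

def kinematic_step(pos, angles, rates):
    """One step of the simplified motion model: the course angle chi
    and flight-path angle gamma integrate their commanded rates, then
    the position integrates the resulting constant-speed velocity
    vector (z positive down, hence the minus sign)."""
    x, y, z = pos
    chi, gamma = angles
    chi_dot, gamma_dot = rates
    # Angles first, so the position update uses chi(n), gamma(n),
    # matching the discrete equations.
    chi += chi_dot * TS
    gamma += gamma_dot * TS
    x += V_K * math.cos(gamma) * math.cos(chi) * TS
    y += V_K * math.cos(gamma) * math.sin(chi) * TS
    z += -V_K * math.sin(gamma) * TS
    return (x, y, z), (chi, gamma)
```

Because the model is purely kinematic, the only things the policy can influence are the two angle rates, which keeps the action space identical to the high-level command interface used with the full Simulink model.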

After implementing the simplified model and tuning the parameters, the agent was able to pass the first passage point stably.

After further training the same model, it successfully passed the second passage point.

This learned model was then applied to the actual Maya control model. Because training used a simplified model that assumes constant speed, the actual control model could not reach the target. However, the trajectory clearly showed that the aircraft was steering toward the target, which suggests the approach could work with sufficient refinement.


The GitHub repository for this model is here. (The copyright of the actual Maya control model belongs to the university, so unfortunately I cannot publish it.)