Abstract
In the presence of detection by a Warden, the problem of maximizing transmission throughput from an unmanned aerial vehicle (UAV) to legitimate nodes is considered and solved via UAV trajectory design, subject to covertness, velocity, and mobility constraints. Under the building-distribution-based path-loss model and the Warden's uncertain location model, the formulated optimization problem is difficult to tackle with standard offline optimization methods. Instead, a twin delayed deep deterministic policy gradient (TD3) approach enhanced by multi-step learning and prioritized experience replay (PER), termed multi-step TD3-PER, is proposed to help the UAV adaptively select its velocity from a continuous action space. Numerical results demonstrate the effectiveness of the proposed multi-step TD3-PER solution and show that it outperforms the provided baselines.
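The abstract names two enhancements to TD3 without giving formulas, so the following is only a minimal sketch of how multi-step targets and proportional prioritized replay are commonly computed, not the paper's implementation. The function names `n_step_return` and `per_probabilities`, the exponent `alpha=0.6`, and the offset `eps` are illustrative assumptions. In a TD3 setting, `bootstrap_value` would typically be the smaller of the two target-critic estimates at the state reached after n steps.

```python
from typing import List, Sequence


def n_step_return(rewards: Sequence[float], bootstrap_value: float, gamma: float) -> float:
    """Discounted sum of n rewards plus the discounted bootstrap value.

    Computes r_0 + gamma*r_1 + ... + gamma^(n-1)*r_(n-1) + gamma^n * bootstrap_value.
    """
    target = bootstrap_value
    # Fold from the last reward back to the first so each step applies one discount.
    for reward in reversed(rewards):
        target = reward + gamma * target
    return target


def per_probabilities(td_errors: Sequence[float], alpha: float = 0.6, eps: float = 1e-6) -> List[float]:
    """Proportional prioritization: sampling probability grows with |TD error|.

    alpha and eps are assumed values for illustration, not taken from the paper.
    """
    priorities = [(abs(delta) + eps) ** alpha for delta in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


if __name__ == "__main__":
    # Three-step target with a hypothetical critic estimate of 10.0.
    print(n_step_return([1.0, 2.0, 3.0], bootstrap_value=10.0, gamma=0.5))
    # Transitions with larger TD error are sampled more often.
    print(per_probabilities([0.1, 1.0, 2.0]))
```

Setting `alpha` to 0 would recover uniform sampling, and passing a single reward to `n_step_return` would recover the usual one-step TD target.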
Original language | English |
---|---|
Title of host publication | IEEE International Conference on Communications (ICC 2022) |
Publication status | Accepted/In press - 18 Jan 2022 |