TY - CHAP
T1 - Deep Reinforcement Learning for Discrete and Continuous Massive Access Control Optimization
AU - Jiang, Nan
AU - Deng, Yansha
AU - Nallanathan, Arumugam
PY - 2020/6
Y1 - 2020/6
N2 - Cellular-based networks are expected to offer connectivity for massive Internet of Things (mIoT) systems; however, their Random Access CHannel (RACH) procedure suffers from unreliability due to collisions during simultaneous massive access attempts. Although this collision problem has been addressed in existing RACH schemes by organizing IoT devices' transmissions and retransmissions via central control at the Base Station (BS), these schemes are usually fixed over time and thus can hardly adapt to time-varying traffic patterns. To optimize the long-term objective of the number of successful access devices, this paper designs Deep Reinforcement Learning (DRL)-based optimizers using Deep Q-Network (DQN) and Deep Deterministic Policy Gradient (DDPG) for optimizing RACH schemes, including Access Class Barring (ACB), Back-Off (BO), and Distributed Queuing (DQ). Specifically, we apply DQN to handle discrete action selection for the BO and DQ schemes, and DDPG to handle continuous action selection for the ACB scheme. Both agents are integrated with a Gated Recurrent Unit (GRU) network to approximate their value function/policy, which improves optimization performance by capturing temporal traffic correlations. Numerical results show that our proposed DRL-based optimizers considerably outperform conventional heuristic solutions in terms of the number of successful access devices.
AB - Cellular-based networks are expected to offer connectivity for massive Internet of Things (mIoT) systems; however, their Random Access CHannel (RACH) procedure suffers from unreliability due to collisions during simultaneous massive access attempts. Although this collision problem has been addressed in existing RACH schemes by organizing IoT devices' transmissions and retransmissions via central control at the Base Station (BS), these schemes are usually fixed over time and thus can hardly adapt to time-varying traffic patterns. To optimize the long-term objective of the number of successful access devices, this paper designs Deep Reinforcement Learning (DRL)-based optimizers using Deep Q-Network (DQN) and Deep Deterministic Policy Gradient (DDPG) for optimizing RACH schemes, including Access Class Barring (ACB), Back-Off (BO), and Distributed Queuing (DQ). Specifically, we apply DQN to handle discrete action selection for the BO and DQ schemes, and DDPG to handle continuous action selection for the ACB scheme. Both agents are integrated with a Gated Recurrent Unit (GRU) network to approximate their value function/policy, which improves optimization performance by capturing temporal traffic correlations. Numerical results show that our proposed DRL-based optimizers considerably outperform conventional heuristic solutions in terms of the number of successful access devices.
KW - access control
KW - deep reinforcement learning
KW - dynamic optimization
KW - random access
UR - http://www.scopus.com/inward/record.url?scp=85089477390&partnerID=8YFLogxK
U2 - 10.1109/ICC40277.2020.9149055
DO - 10.1109/ICC40277.2020.9149055
M3 - Conference paper
AN - SCOPUS:85089477390
T3 - IEEE International Conference on Communications
BT - 2020 IEEE International Conference on Communications, ICC 2020 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2020 IEEE International Conference on Communications, ICC 2020
Y2 - 7 June 2020 through 11 June 2020
ER -