Reinforcement Learning for Real-Time Optimization in NB-IoT Networks

Nan Jiang; Yansha Deng; Arumugam Nallanathan; Jonathon A. Chambers

doi:10.1109/JSAC.2019.2904366

Research output: Contribution to journal › Article › peer-review

94 Citations (Scopus)

318 Downloads (Pure)

Abstract

NarrowBand Internet of Things (NB-IoT) is an emerging cellular-based technology that offers a range of flexible configurations for massive IoT radio access from groups of devices with heterogeneous requirements. A configuration specifies the amount of radio resource allocated to each group of devices for random access and for data transmission. Assuming no knowledge of the traffic statistics, there exists an important challenge in “how to determine the configuration that maximizes the long-term average number of served IoT devices at each Transmission Time Interval (TTI) in an online fashion”. Given the complexity of searching for the optimal configuration, we first develop real-time configuration selection based on the tabular Q-learning (tabular-Q), the Linear Approximation based Q-learning (LA-Q), and the Deep Neural Network based Q-learning (DQN) in the single-parameter single-group scenario. Our results show that the proposed reinforcement learning based approaches considerably outperform the conventional heuristic approaches based on load estimation (LE-URC) in terms of the number of served IoT devices. This result also indicates that LA-Q and DQN can be good alternatives for tabular-Q to achieve almost the same performance with much less training time. We further advance LA-Q and DQN via Actions Aggregation (AA-LA-Q and AA-DQN) and via Cooperative Multi-Agent learning (CMA-DQN) for the multi-parameter multi-group scenario, thereby solving the problem that Q-learning agents do not converge in high-dimensional configurations. In this scenario, the superiority of the proposed Q-learning approaches over the conventional LE-URC approach significantly improves with the increase of configuration dimensions, and the CMA-DQN approach outperforms the other approaches in both throughput and training efficiency.
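As a rough illustration of the tabular-Q baseline named in the abstract, the sketch below shows generic tabular Q-learning with epsilon-greedy selection over a small discrete set of configurations. It is not the authors' implementation: the state (a quantized load level), the action set (candidate configuration indices), the reward (a stand-in for devices served in one TTI), the hyperparameters, and the environment `toy_step` are all hypothetical choices made only for this example.

```python
import random


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.5):
    """Apply one tabular Q-learning update in place and return the new value."""
    best_next = max(q[next_state])
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
    return q[state][action]


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy choice among the configurations available in `state`."""
    if rng.random() < epsilon:
        return rng.randrange(len(q[state]))
    row = q[state]
    return max(range(len(row)), key=row.__getitem__)


def toy_step(state, action, rng):
    """Hypothetical environment (an assumption, not the NB-IoT model).

    The reward is largest when the configuration index matches the load
    level, and the next load level is drawn uniformly at random.
    """
    reward = 10 - 3 * abs(state - action)
    next_state = rng.randrange(3)
    return reward, next_state


def train(num_states=3, num_actions=3, steps=20000, epsilon=0.2, seed=0):
    """Run online Q-learning on the toy environment and return the Q-table."""
    rng = random.Random(seed)
    q = [[0.0] * num_actions for _ in range(num_states)]
    state = 0
    for _ in range(steps):
        action = choose_action(q, state, epsilon, rng)
        reward, next_state = toy_step(state, action, rng)
        q_update(q, state, action, reward, next_state)
        state = next_state
    return q


def greedy_policy(q):
    """Return the index of the highest-valued configuration for each state."""
    return [max(range(len(row)), key=row.__getitem__) for row in q]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

The table here has only nine entries. Its size grows with the number of states times the number of configurations, which is one way to read the abstract's motivation for the function-approximation variants (LA-Q and DQN) in higher-dimensional settings.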

Original language	English
Article number	8664581
Pages (from-to)	1424-1440
Number of pages	17
Journal	IEEE Journal on Selected Areas in Communications
Volume	37
Issue number	6
Early online date	11 Mar 2019
DOIs	https://doi.org/10.1109/JSAC.2019.2904366
Publication status	Published - Jun 2019

Keywords

Narrowband Internet of Things
cooperative learning
real-time optimization
reinforcement learning
resource configuration

Access to Document

10.1109/JSAC.2019.2904366

Reinforcement Learning for Real-Time_JIANG_Publishedonline11March2019_GREEN AAM
(©2019 IEEE)
Accepted author manuscript, 2.3 MB

Cite this

@article{414b77dbd1074ca99274bb77948aa8f1,
title = "Reinforcement Learning for Real-Time Optimization in NB-IoT Networks",
abstract = "NarrowBand Internet of Things (NB-IoT) is an emerging cellular-based technology that offers a range of flexible configurations for massive IoT radio access from groups of devices with heterogeneous requirements. A configuration specifies the amount of radio resource allocated to each group of devices for random access and for data transmission. Assuming no knowledge of the traffic statistics, there exists an important challenge in “how to determine the configuration that maximizes the long-term average number of served IoT devices at each Transmission Time Interval (TTI) in an online fashion”. Given the complexity of searching for the optimal configuration, we first develop real-time configuration selection based on the tabular Q-learning (tabular-Q), the Linear Approximation based Q-learning (LA-Q), and the Deep Neural Network based Q-learning (DQN) in the single-parameter single-group scenario. Our results show that the proposed reinforcement learning based approaches considerably outperform the conventional heuristic approaches based on load estimation (LE-URC) in terms of the number of served IoT devices. This result also indicates that LA-Q and DQN can be good alternatives for tabular-Q to achieve almost the same performance with much less training time. We further advance LA-Q and DQN via Actions Aggregation (AA-LA-Q and AA-DQN) and via Cooperative Multi-Agent learning (CMA-DQN) for the multi-parameter multi-group scenario, thereby solving the problem that Q-learning agents do not converge in high-dimensional configurations. In this scenario, the superiority of the proposed Q-learning approaches over the conventional LE-URC approach significantly improves with the increase of configuration dimensions, and the CMA-DQN approach outperforms the other approaches in both throughput and training efficiency.",
keywords = "Narrowband Internet of Things, cooperative learning, real-time optimization, reinforcement learning, resource configuration",
author = "Nan Jiang and Yansha Deng and Arumugam Nallanathan and Chambers, {Jonathon A.}",
year = "2019",
month = jun,
doi = "10.1109/JSAC.2019.2904366",
language = "English",
volume = "37",
pages = "1424--1440",
journal = "IEEE Journal on Selected Areas in Communications",
issn = "0733-8716",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "6",
}

TY  - JOUR
T1  - Reinforcement Learning for Real-Time Optimization in NB-IoT Networks
AU  - Jiang, Nan
AU  - Deng, Yansha
AU  - Nallanathan, Arumugam
AU  - Chambers, Jonathon A.
PY  - 2019/6
Y1  - 2019/6
N2  - NarrowBand Internet of Things (NB-IoT) is an emerging cellular-based technology that offers a range of flexible configurations for massive IoT radio access from groups of devices with heterogeneous requirements. A configuration specifies the amount of radio resource allocated to each group of devices for random access and for data transmission. Assuming no knowledge of the traffic statistics, there exists an important challenge in “how to determine the configuration that maximizes the long-term average number of served IoT devices at each Transmission Time Interval (TTI) in an online fashion”. Given the complexity of searching for the optimal configuration, we first develop real-time configuration selection based on the tabular Q-learning (tabular-Q), the Linear Approximation based Q-learning (LA-Q), and the Deep Neural Network based Q-learning (DQN) in the single-parameter single-group scenario. Our results show that the proposed reinforcement learning based approaches considerably outperform the conventional heuristic approaches based on load estimation (LE-URC) in terms of the number of served IoT devices. This result also indicates that LA-Q and DQN can be good alternatives for tabular-Q to achieve almost the same performance with much less training time. We further advance LA-Q and DQN via Actions Aggregation (AA-LA-Q and AA-DQN) and via Cooperative Multi-Agent learning (CMA-DQN) for the multi-parameter multi-group scenario, thereby solving the problem that Q-learning agents do not converge in high-dimensional configurations. In this scenario, the superiority of the proposed Q-learning approaches over the conventional LE-URC approach significantly improves with the increase of configuration dimensions, and the CMA-DQN approach outperforms the other approaches in both throughput and training efficiency.
AB  - NarrowBand Internet of Things (NB-IoT) is an emerging cellular-based technology that offers a range of flexible configurations for massive IoT radio access from groups of devices with heterogeneous requirements. A configuration specifies the amount of radio resource allocated to each group of devices for random access and for data transmission. Assuming no knowledge of the traffic statistics, there exists an important challenge in “how to determine the configuration that maximizes the long-term average number of served IoT devices at each Transmission Time Interval (TTI) in an online fashion”. Given the complexity of searching for the optimal configuration, we first develop real-time configuration selection based on the tabular Q-learning (tabular-Q), the Linear Approximation based Q-learning (LA-Q), and the Deep Neural Network based Q-learning (DQN) in the single-parameter single-group scenario. Our results show that the proposed reinforcement learning based approaches considerably outperform the conventional heuristic approaches based on load estimation (LE-URC) in terms of the number of served IoT devices. This result also indicates that LA-Q and DQN can be good alternatives for tabular-Q to achieve almost the same performance with much less training time. We further advance LA-Q and DQN via Actions Aggregation (AA-LA-Q and AA-DQN) and via Cooperative Multi-Agent learning (CMA-DQN) for the multi-parameter multi-group scenario, thereby solving the problem that Q-learning agents do not converge in high-dimensional configurations. In this scenario, the superiority of the proposed Q-learning approaches over the conventional LE-URC approach significantly improves with the increase of configuration dimensions, and the CMA-DQN approach outperforms the other approaches in both throughput and training efficiency.
KW  - Narrowband Internet of Things
KW  - cooperative learning
KW  - real-time optimization
KW  - reinforcement learning
KW  - resource configuration
UR  - http://www.scopus.com/inward/record.url?scp=85065891533&partnerID=8YFLogxK
U2  - 10.1109/JSAC.2019.2904366
DO  - 10.1109/JSAC.2019.2904366
M3  - Article
SN  - 0733-8716
VL  - 37
SP  - 1424
EP  - 1440
JO  - IEEE Journal on Selected Areas in Communications
JF  - IEEE Journal on Selected Areas in Communications
IS  - 6
M1  - 8664581
ER  - 
