In modern high-speed serial communications (PCIe 6.0, Terabit Ethernet, etc.), PAM-4 (four-level pulse amplitude modulation) signaling is frequently used. PAM-4 encodes two bits of data per symbol using four distinct voltage levels. Compared to conventional NRZ (non-return-to-zero) encoding, which uses two voltage levels to represent one bit per symbol, PAM-4 conveys twice as many bits at the same symbol rate. However, not every PAM-4 sequence is equally easy to reconstruct at the receiver, and some sequences are more error-prone than others. Traditionally, worst-case sequences are identified analytically or numerically in terms of their susceptibility to specific impairments, such as jitter or intersymbol interference, but such methods do not reliably predict the errors at the receiver. An alternative is the data-driven identification of error-prone sequences, and in this paper we use reinforcement learning (RL) for this task. We compare six RL algorithms: Q-Learning, MAB (Multi-Armed Bandit), MCTS (Monte Carlo Tree Search), DQN (Deep Q-Network), A2C (Advantage Actor-Critic), and PPO (Proximal Policy Optimization). Almost all of the algorithms are limited in the length of the sequences they can learn. However, our experiments show that higher scalability comes with high memory requirements, especially in the case of MCTS and MAB.
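To make the difference between the two encodings concrete, the following minimal sketch maps the same bit stream to NRZ and to PAM-4 symbols. The normalized levels (-3, -1, +1, +3), the Gray-coded bit-pair assignment, and the function names `pam4_encode` and `nrz_encode` are assumptions chosen for illustration; they are not taken from the paper or from a specific standard.

```python
# Illustrative sketch only: level values and bit-to-level mapping are assumed,
# not taken from the paper. Gray coding is used so that adjacent levels
# differ in exactly one bit.
PAM4_LEVELS = {(0, 0): -3, (0, 1): -1, (1, 1): 1, (1, 0): 3}


def pam4_encode(bits):
    """Map each pair of bits to one of four normalized amplitude levels."""
    if len(bits) % 2 != 0:
        raise ValueError("PAM-4 needs an even number of bits")
    return [PAM4_LEVELS[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]


def nrz_encode(bits):
    """Map each bit to one of two normalized amplitude levels."""
    return [1 if b else -1 for b in bits]


bits = [0, 0, 0, 1, 1, 1, 1, 0]
pam4_symbols = pam4_encode(bits)  # 4 symbols carry the 8 bits
nrz_symbols = nrz_encode(bits)    # 8 symbols carry the 8 bits
print(pam4_symbols, nrz_symbols)
```

Under these assumptions, the eight bits occupy four PAM-4 symbols but eight NRZ symbols. The closer spacing of the four levels is also why some PAM-4 sequences are harder to reconstruct at the receiver than others.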