This page provides code and data to help reproduce some of the results presented in the publications related to the project.
Unless otherwise stated, all data is made available under the Licence Ouverte / Open Licence 2.0.
BP-RNN diversity decoder for short LDPC codes
The parity-check matrices as well as the absorbing sets for the codes presented in the IEEE TCOM 2022 journal paper “Decoding Short LDPC Codes via BP-RNN Diversity and Reliability-Based Post-Processing”, by J. Rosseel, V. Mannoni, I. Fijalkow and V. Savin can be downloaded here (7zip archive).
Optimized datasets for syndrome-based neural decoders training
Here are some of the datasets used to produce the results presented in the ICMLCN 2025 paper “Doing More With Less: Towards More Data-Efficient Syndrome-Based Neural Decoders”. Owing to storage limitations, we can only provide selected datasets for the (31,21,5) and (63,45,7) BCH codes. Please contact us if you need other training data for your own experiments.
Dataset files
Datasets for the (31,21,5) BCH code (all generated at Eb/N0=3dB):
- 16M MLD error patterns collected from standard Monte-Carlo simulation (sampling method 2)
- 4M MLD error patterns obtained by importance sampling simulation (sampling method 3)
- 4M MLD error patterns collected from standard Monte-Carlo simulation, with the additional constraint that non-zero syndrome values are uniformly distributed in the dataset (sampling method 4)
These 3 datasets have been used to obtain the results shown in Fig. 5 of the paper. Only the first 4M samples of the first dataset were used to produce the “ML EP (method 1)” curve in this figure.
Datasets for the (63,45,7) BCH code (all generated at Eb/N0=2dB):
- 64M MLD error patterns collected from standard Monte-Carlo simulation (sampling method 1)
- 32M MLD error patterns obtained by importance sampling simulation (sampling method 3)
- 32M MLD error patterns collected from standard Monte-Carlo simulation, with the additional constraint that non-zero syndrome values are uniformly distributed in the dataset (sampling method 4)
These 3 datasets have been used to produce some of the best performance results reported in Fig. 8 of the paper. Note that, due to storage limitations, we provide only 64M training examples instead of 100M for the first dataset, but the resulting performance is nearly equivalent.
Dataset format
Each dataset is provided as a MATLAB MAT-file, with the following content:
Variable name | Shape | Type | Description |
---|---|---|---|
N | 1 | double | Code length |
K | 1 | double | Code dimension |
G | (K, N) | double | Generator matrix |
H | (N-K, N) | double | Parity-check matrix |
ebN0dB | 1 | double | Signal-to-noise ratio per information bit (dB) |
y | (L, N) | double | Noisy observations of the all-zero codeword |
e | (L, N) | double | Corresponding Maximum-Likelihood Decoder (MLD) decisions on the error pattern that occurred |
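Before training, it may be worth sanity-checking a downloaded file. Below is a minimal sketch (our own addition, with `dataset.mat` as a placeholder filename) that verifies the shapes and the duality of `G` and `H`; it assumes a pre-v7.3 MAT-file readable by `scipy.io.loadmat` (for v7.3/HDF5 files, see the `h5py` fallback in the loader further down):

```python
import numpy as np
import scipy.io

# Sanity-check sketch: placeholder filename, pre-v7.3 MAT-file assumed.
data = scipy.io.loadmat("dataset.mat", squeeze_me=True)
N, K = int(data["N"]), int(data["K"])
G, H = data["G"], data["H"]

assert G.shape == (K, N) and H.shape == (N - K, N)
# Every row of G is a codeword, so G @ H.T must vanish modulo 2.
assert np.all((G @ H.T) % 2 == 0)
```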
The noisy observations `y` have been generated by simulating BPSK transmission of the all-zero codeword (all-one BPSK symbol vector) over an AWGN channel with normalized signal-to-noise ratio per information bit `ebN0dB`.
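For intuition, such observations can be reproduced with a few lines of NumPy. The sketch below is our own illustration (not the authors' simulation code); the noise variance follows from Eb/N0 and the code rate R = K/N:

```python
import numpy as np

def simulate_awgn_observations(N: int, K: int, ebN0dB: float, L: int) -> np.ndarray:
    """Sketch: L noisy BPSK observations of the all-zero codeword."""
    # Noise standard deviation from Eb/N0, accounting for the code rate R = K/N.
    sigma = np.sqrt(1.0 / (2.0 * (K / N) * 10.0 ** (ebN0dB / 10.0)))
    # The all-zero codeword maps to the all-one BPSK symbol vector.
    return 1.0 + sigma * np.random.randn(L, N)
```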
The decoder decisions `e` on the most-likely error patterns have been obtained by passing the received words `y` through an Ordered-Statistics Decoder. Note that `e` is a binary error pattern.
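A useful consistency property follows from this construction: since the decoder's codeword decision has zero syndrome, `e` and the hard decision on `y` always share the same syndrome. The check below is our own sketch of this property, not code from the paper:

```python
import numpy as np

def check_syndrome_consistency(y: np.ndarray, e: np.ndarray, H: np.ndarray) -> bool:
    """Sketch: e and the hard decision on y must lie in the same coset."""
    H = H.astype(np.int64)
    # Hard decision: with all-one BPSK symbols, a bit is 1 iff the sample is negative.
    z = (y < 0).astype(np.int64)
    # e differs from z by the decoded codeword, whose syndrome is zero.
    return bool(np.array_equal((H @ z.T) % 2, (H @ e.astype(np.int64).T) % 2))
```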
A training example is a pair `[y(i,:), e(i,:)]`. The dataset provides `L` such examples. Please refer to the paper for more details about the various sampling methods used to generate the training data.
Here is a function to load one of the datasets into PyTorch. It takes as input the dataset filename `mat_file` and returns the training examples as a tuple of two `torch.float32` tensors `(y, e)`:
```python
import h5py
import numpy as np
import scipy.io
import torch
from torch import Tensor

def load_matlab_data(mat_file: str) -> tuple[Tensor, Tensor]:
    """Load received words y and target error patterns e from a MATLAB .mat file."""
    try:
        # v7 MAT-files or earlier are supported by scipy.io
        matlab_data = scipy.io.loadmat(mat_file, squeeze_me=True)
        y = torch.tensor(matlab_data["y"], dtype=torch.float32)
        e = torch.tensor(matlab_data["e"], dtype=torch.float32)
    except NotImplementedError:
        # but not v7.3 (= HDF5) MAT-files, for which we need h5py;
        # HDF5 stores MATLAB arrays transposed, hence the .transpose()
        with h5py.File(mat_file, "r") as f:
            y = torch.from_numpy(f["y"][:].astype(np.float32).transpose())
            e = torch.from_numpy(f["e"][:].astype(np.float32).transpose())
    return y, e
```
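For instance (our addition, with a placeholder filename), the returned tensors plug directly into the standard PyTorch data pipeline:

```python
from torch.utils.data import DataLoader, TensorDataset

y, e = load_matlab_data("bch_31_21.mat")  # placeholder filename
loader = DataLoader(TensorDataset(y, e), batch_size=256, shuffle=True)
for y_batch, e_batch in loader:
    ...  # feed the batch to your training step
```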
As a final note, we only provide datasets of MLD error patterns, but each of them can be turned into a dataset of true error patterns by simply taking the bit-by-bit hard decision on the received words `y` as new training targets `e`.
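Concretely, since the all-zero codeword maps to all-one BPSK symbols, this conversion is a one-liner on the tensors returned by the loader above (our sketch):

```python
# True error pattern: bit i is in error whenever the received sample is negative.
e_true = (y < 0).to(torch.float32)
```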