This page provides code and data to help reproduce some of the results presented in the publications related to the project.
Unless otherwise stated, all data is made available under the Licence Ouverte / Open Licence 2.0.
BP-RNN diversity decoder for short LDPC codes
The parity-check matrices as well as the absorbing sets for the codes presented in the IEEE TCOM 2022 journal paper “Decoding Short LDPC Codes via BP-RNN Diversity and Reliability-Based Post-Processing”, by J. Rosseel, V. Mannoni, I. Fijalkow and V. Savin can be downloaded here (7zip archive).
Optimized datasets for syndrome-based neural decoders training
Here are some of the datasets used to produce the results presented in the ICMLCN 2025 paper “Doing More With Less: Towards More Data-Efficient Syndrome-Based Neural Decoders”. Owing to storage limitations, we can only provide selected datasets for the (31,21,5) and (63,45,7) BCH codes. Please contact us if you need other training data for your own experiments.
Dataset files
Datasets for the (31,21,5) BCH code (all generated at Eb/N0=3dB):
- 16M MLD error patterns collected from standard Monte-Carlo simulation (sampling method 2)
- 4M MLD error patterns obtained by importance sampling simulation (sampling method 3)
- 4M MLD error patterns collected from standard Monte-Carlo simulation, with the additional constraint that non-zero syndrome values are uniformly distributed in the dataset (sampling method 4)
These 3 datasets have been used to obtain the results shown in Fig. 5 of the paper. Only the first 4M samples of the first dataset were used to produce the “ML EP (method 1)” curve in this figure.
Datasets for the (63,45,7) BCH code (all generated at Eb/N0=2dB):
- 64M MLD error patterns collected from standard Monte-Carlo simulation (sampling method 1)
- 32M MLD error patterns obtained by importance sampling simulation (sampling method 3)
- 32M MLD error patterns collected from standard Monte-Carlo simulation, with the additional constraint that non-zero syndrome values are uniformly distributed in the dataset (sampling method 4)
These 3 datasets have been used to produce some of the best performance results reported in Fig. 8 of the paper. Note that, due to storage limitations, we provide only 64M training examples instead of 100M for the first dataset, but the resulting performance is nearly equivalent.
Dataset format
Each dataset is provided as a MATLAB MAT-file, with the following content:
Variable name | Shape | Type | Description |
---|---|---|---|
N | 1 | double | Code length |
K | 1 | double | Code dimension |
G | (K, N) | double | Generator matrix |
H | (N-K, N) | double | Parity-check matrix |
ebN0dB | 1 | double | Signal-to-noise ratio per information bit (dB) |
y | (L, N) | double | Noisy observations of the all-zero codeword |
e | (L, N) | double | Corresponding Maximum-Likelihood Decoder (MLD) decisions on the error pattern that occurred |
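Before training, it may be worth sanity-checking a downloaded file. Below is a minimal sketch (our own addition, with `dataset.mat` as a placeholder filename) that verifies the shapes and the duality of `G` and `H`; it assumes a pre-v7.3 MAT-file readable by `scipy.io.loadmat` (for v7.3/HDF5 files, see the `h5py` fallback in the loader further down):

```python
import numpy as np
import scipy.io

# Sanity-check sketch: placeholder filename, pre-v7.3 MAT-file assumed.
data = scipy.io.loadmat("dataset.mat", squeeze_me=True)
N, K = int(data["N"]), int(data["K"])
G, H = data["G"], data["H"]

assert G.shape == (K, N) and H.shape == (N - K, N)
# Every row of G is a codeword, so G @ H.T must vanish modulo 2.
assert np.all((G @ H.T) % 2 == 0)
```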
The noisy observations `y` have been generated by simulating BPSK transmission of the all-zero codeword (all-one BPSK symbol vector) over an AWGN channel with normalized signal-to-noise ratio per information bit `ebN0dB`.
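For intuition, such observations can be reproduced with a few lines of NumPy. The sketch below is our own illustration (not the authors' simulation code); the noise variance follows from Eb/N0 and the code rate R = K/N:

```python
import numpy as np

def simulate_awgn_observations(N: int, K: int, ebN0dB: float, L: int) -> np.ndarray:
    """Sketch: L noisy BPSK observations of the all-zero codeword."""
    # Noise standard deviation from Eb/N0, accounting for the code rate R = K/N.
    sigma = np.sqrt(1.0 / (2.0 * (K / N) * 10.0 ** (ebN0dB / 10.0)))
    # The all-zero codeword maps to the all-one BPSK symbol vector.
    return 1.0 + sigma * np.random.randn(L, N)
```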
The decoder decisions `e` on the most-likely error patterns have been obtained by passing the received words `y` through an Ordered-Statistics Decoder. Note that `e` is a binary error pattern.
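A useful consistency property follows from this construction: since the decoder's codeword decision has zero syndrome, `e` and the hard decision on `y` always share the same syndrome. The check below is our own sketch of this property, not code from the paper:

```python
import numpy as np

def check_syndrome_consistency(y: np.ndarray, e: np.ndarray, H: np.ndarray) -> bool:
    """Sketch: e and the hard decision on y must lie in the same coset."""
    H = H.astype(np.int64)
    # Hard decision: with all-one BPSK symbols, a bit is 1 iff the sample is negative.
    z = (y < 0).astype(np.int64)
    # e differs from z by the decoded codeword, whose syndrome is zero.
    return bool(np.array_equal((H @ z.T) % 2, (H @ e.astype(np.int64).T) % 2))
```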
A training example is a pair `[y(i,:), e(i,:)]`. The dataset provides `L` such examples. Please refer to the paper for more details about the various sampling methods used to generate the training data.
Here is a function to load one of the datasets into PyTorch. It takes as input the dataset filename `mat_file` and returns the training examples as a tuple of two `torch.float32` tensors `(y, e)`:
```python
import h5py
import numpy as np
import scipy.io
import torch
from torch import Tensor

def load_matlab_data(mat_file: str) -> tuple[Tensor, Tensor]:
    """Load received words y and target error patterns e from a MATLAB .mat file."""
    try:
        # v7 MAT-files or earlier are supported by scipy.io
        matlab_data = scipy.io.loadmat(mat_file, squeeze_me=True)
        y = torch.tensor(matlab_data["y"], dtype=torch.float32)
        e = torch.tensor(matlab_data["e"], dtype=torch.float32)
    except NotImplementedError:
        # but not v7.3 (= HDF5) MAT-files, for which we need h5py;
        # HDF5 stores MATLAB arrays transposed, hence the .transpose()
        with h5py.File(mat_file, "r") as f:
            y = torch.from_numpy(f["y"][:].astype(np.float32).transpose())
            e = torch.from_numpy(f["e"][:].astype(np.float32).transpose())
    return y, e
```
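For instance (our addition, with a placeholder filename), the returned tensors plug directly into the standard PyTorch data pipeline:

```python
from torch.utils.data import DataLoader, TensorDataset

y, e = load_matlab_data("bch_31_21.mat")  # placeholder filename
loader = DataLoader(TensorDataset(y, e), batch_size=256, shuffle=True)
for y_batch, e_batch in loader:
    ...  # feed the batch to your training step
```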
As a final note, we only provide datasets of MLD error patterns, but each of them can be turned into a dataset of true error patterns by simply taking the bit-by-bit hard decision on the received words `y` as new training targets `e`.
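Concretely, since the all-zero codeword maps to all-one BPSK symbols, this conversion is a one-liner on the tensors returned by the loader above (our sketch):

```python
# True error pattern: bit i is in error whenever the received sample is negative.
e_true = (y < 0).to(torch.float32)
```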