Electronics Guide

Channel Coding

Channel coding, also known as forward error correction (FEC), represents one of the most important techniques in digital communication systems for protecting transmitted data against errors introduced by the communication channel. By systematically adding redundant information to the original data before transmission, channel coding enables receivers to detect and correct errors without requiring retransmission, making it indispensable for applications where retransmission is impractical or impossible.

The theoretical foundation for channel coding was established by Claude Shannon in his groundbreaking 1948 paper, which proved that reliable communication is possible over noisy channels provided the data rate remains below the channel capacity. Shannon's work defined the ultimate limits of what is achievable but did not specify how to approach these limits. The subsequent decades have witnessed remarkable progress in coding theory, with modern codes coming within a fraction of a decibel of Shannon's theoretical bounds.

Block Codes

Block codes operate by dividing the input data into fixed-size blocks and encoding each block independently into a longer codeword containing both information and parity symbols. The fundamental parameters of a block code are typically expressed as (n, k), where k represents the number of information symbols and n represents the total codeword length. The code rate R = k/n indicates what fraction of the transmitted symbols carry actual information.

Linear block codes form the most important class, characterized by the property that any linear combination of valid codewords produces another valid codeword. This algebraic structure enables efficient encoding using generator matrices and syndrome-based decoding using parity-check matrices. The minimum distance of a code, defined as the smallest Hamming distance between any two distinct codewords, determines its error-correcting capability: a code with minimum distance d can correct up to floor((d-1)/2) errors.
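
As a minimal sketch of these ideas, the Python fragment below enumerates every codeword of a small illustrative (6, 3) binary linear code from its generator matrix and computes the minimum distance, which for a linear code equals the minimum Hamming weight of any non-zero codeword. The particular generator matrix is chosen purely for illustration.

from itertools import product

G = [                    # generator matrix of an illustrative (6, 3) code, [I | P]
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 1],
    [0, 0, 1, 1, 0, 1],
]
k, n = len(G), len(G[0])

def encode(message):
    """Multiply the message row vector by G over GF(2)."""
    return [sum(message[i] * G[i][j] for i in range(k)) % 2 for j in range(n)]

# For a linear code, the minimum distance equals the minimum Hamming weight
# of any non-zero codeword, so enumerating all 2^k codewords suffices.
codewords = [encode(list(m)) for m in product([0, 1], repeat=k)]
d_min = min(sum(c) for c in codewords if any(c))
print("code rate R =", k / n)                      # 0.5
print("minimum distance d =", d_min)               # 3
print("correctable errors t =", (d_min - 1) // 2)  # 1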

Hamming Codes

Hamming codes represent the earliest practical error-correcting codes, developed by Richard Hamming at Bell Labs in 1950. These codes achieve single-error correction with minimal redundancy, making them particularly efficient for applications where errors are rare and independent. A (7,4) Hamming code, for example, encodes 4 data bits into 7 transmitted bits, providing the ability to correct any single-bit error.

The construction of Hamming codes places parity bits at positions that are powers of 2 (positions 1, 2, 4, 8, etc.), with each parity bit covering a specific pattern of data positions determined by the binary representation of position indices. This elegant structure enables simple syndrome calculation, where the syndrome directly indicates the position of any single error. Extended Hamming codes add an overall parity bit to provide single-error correction with double-error detection (SECDED), widely used in computer memory systems.
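
As a minimal sketch, the Python routines below implement one common bit-ordering convention for the (7,4) Hamming code: parity bits sit at positions 1, 2, and 4, each parity covers the positions whose binary index contains that power of two, and the syndrome computed at the decoder is the 1-based position of a single-bit error.

def hamming74_encode(data):
    """Encode 4 data bits into a 7-bit codeword, parity at positions 1, 2, 4."""
    d1, d2, d3, d4 = data
    # codeword positions: 1  2  3  4  5  6  7
    #                     p1 p2 d1 p4 d2 d3 d4
    p1 = d1 ^ d2 ^ d4          # covers positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4          # covers positions 3, 6, 7
    p4 = d2 ^ d3 ^ d4          # covers positions 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]

def hamming74_decode(cw):
    """Correct a single-bit error; the syndrome is its 1-based position."""
    c = list(cw)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]   # checks positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]   # checks positions 2, 3, 6, 7
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]   # checks positions 4, 5, 6, 7
    syndrome = s1 + 2 * s2 + 4 * s4
    if syndrome:                      # non-zero syndrome = error position
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]   # recover the data bits

cw = hamming74_encode([1, 0, 1, 1])
cw[5] ^= 1                            # flip one transmitted bit
print(hamming74_decode(cw))           # [1, 0, 1, 1]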

Reed-Solomon Codes

Reed-Solomon codes are powerful non-binary block codes that operate on symbols rather than individual bits, making them particularly effective against burst errors that corrupt multiple consecutive bits. An RS(n,k) code over a field with q elements can correct up to t = floor((n-k)/2) symbol errors, regardless of how many bits within each symbol are corrupted. This burst-error correction capability makes Reed-Solomon codes essential for storage media and communication systems.

The mathematical foundation of Reed-Solomon codes lies in polynomial algebra over finite fields (Galois fields). Codewords are constructed as evaluations of polynomials of degree less than k at n distinct points in the field. Decoding involves sophisticated algorithms such as the Berlekamp-Massey algorithm for finding the error locator polynomial, followed by Chien search for error location and Forney's algorithm for error value calculation. Despite their computational complexity, hardware implementations achieve the throughput required for applications including optical storage (CD, DVD, Blu-ray), deep-space communication, and QR codes.
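
The evaluation view of Reed-Solomon encoding can be sketched compactly. For readability, the example below works over the prime field GF(7) rather than the GF(2^m) fields used in practice, and it omits decoding entirely: it simply evaluates a degree-(k-1) message polynomial at n distinct field elements to form the codeword.

q = 7                 # field size (prime, so arithmetic is just mod q)
n, k = 6, 3           # codeword length and message length, with n <= q - 1
t = (n - k) // 2      # guaranteed symbol-error-correcting capability

def poly_eval(coeffs, x):
    """Evaluate a polynomial with the given coefficients at x, mod q."""
    result = 0
    for c in reversed(coeffs):        # Horner's rule
        result = (result * x + c) % q
    return result

def rs_encode(message):
    """Treat the k message symbols as polynomial coefficients and
    evaluate the polynomial at the n points 1, 2, ..., n."""
    assert len(message) == k
    return [poly_eval(message, x) for x in range(1, n + 1)]

print(rs_encode([2, 5, 1]))   # [1, 2, 5, 3, 3, 5]: one codeword of this (6, 3) code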

BCH Codes

Bose-Chaudhuri-Hocquenghem (BCH) codes form a large class of powerful cyclic codes with precisely designed error-correcting capabilities. Unlike Reed-Solomon codes that operate on multi-bit symbols, BCH codes work with binary data while still providing strong multiple-error correction. A t-error-correcting BCH code can guarantee correction of any pattern of t or fewer bit errors within a codeword.

BCH codes are constructed using roots from extension fields of the binary field, with the generator polynomial being the least common multiple of minimal polynomials of consecutive powers of a primitive element. This construction ensures the required minimum distance while keeping the encoder and decoder complexity manageable. BCH codes find application in flash memory error correction, magnetic recording, and various communication standards where moderate error correction with reasonable complexity is required.
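
As a concrete sketch, the double-error-correcting (15, 7) binary BCH code built with the conventional primitive polynomial x^4 + x + 1 has generator polynomial g(x) = x^8 + x^7 + x^6 + x^4 + 1, the product of the minimal polynomials of a primitive element and its cube. Systematic cyclic encoding appends the remainder of x^(n-k) m(x) divided by g(x), as below; decoding is omitted.

G = [1, 0, 0, 0, 1, 0, 1, 1, 1]   # g(x) = 1 + x^4 + x^6 + x^7 + x^8, by degree
n, k = 15, 7

def mod2_remainder(dividend, divisor):
    """Remainder of GF(2) polynomial division (bit lists indexed by degree)."""
    rem = list(dividend)
    for i in range(len(rem) - 1, len(divisor) - 2, -1):
        if rem[i]:
            for j, g in enumerate(divisor):
                rem[i - len(divisor) + 1 + j] ^= g
    return rem[:len(divisor) - 1]

def bch_encode(message):
    """Systematic encoding: c(x) = x^(n-k) m(x) + remainder."""
    assert len(message) == k
    shifted = [0] * (n - k) + list(message)      # x^(n-k) * m(x)
    parity = mod2_remainder(shifted, G)
    return parity + list(message)                 # parity bits, then message bits

cw = bch_encode([1, 0, 1, 1, 0, 0, 1])
print(cw)                                         # 15-bit codeword
print(any(mod2_remainder(cw, G)))                 # False: divisible by g(x)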

Convolutional Codes

Unlike block codes that process data in independent chunks, convolutional codes encode a continuous stream of data where each output symbol depends on both current and previous input symbols. The code is characterized by its rate (k/n, where k input bits produce n output bits), constraint length (the span of input bits, current plus stored, that influences each output), and the generator polynomials that define how input bits combine to form outputs.

The encoding process can be visualized as a linear shift register with multiple modulo-2 adders computing different linear combinations of the register contents. The resulting code structure creates complex dependencies between transmitted symbols, which the decoder exploits to identify and correct errors. A code with constraint length K has 2^(K-1) possible encoder states, with transitions between states determined by input bits.
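
A minimal sketch of a rate-1/2 encoder with constraint length K = 3 and the widely used generator polynomials (7, 5) in octal illustrates this shift-register view; the two memory bits give the four encoder states mentioned above.

K = 3
G0, G1 = 0b111, 0b101      # generator taps: 7 and 5 in octal

def parity(x):
    """XOR of all bits of an integer."""
    return bin(x).count("1") % 2

def conv_encode(bits):
    """Encode a bit stream at rate 1/2, appending K - 1 zero tail bits to
    return the encoder to the all-zeros state."""
    state = 0                                # the K - 1 memory bits
    out = []
    for b in list(bits) + [0] * (K - 1):
        window = (b << (K - 1)) | state      # current input followed by the memory
        out += [parity(window & G0), parity(window & G1)]
        state = window >> 1                  # shift: the oldest bit falls off
    return out

print(conv_encode([1, 0, 1, 1]))
# [1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1]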

Trellis Representation

The trellis diagram provides a powerful graphical representation of convolutional code behavior over time. Each column of the trellis represents the possible encoder states at a particular time instant, while branches connecting states represent transitions caused by input bits, labeled with the corresponding output symbols. This representation reveals the code structure and forms the basis for optimal decoding algorithms.

The trellis structure shows that valid code sequences correspond to paths through the trellis starting and ending at specified states. The number of paths grows exponentially with sequence length, but the regular structure of the trellis enables efficient algorithms that avoid exhaustive enumeration. The minimum free distance of the code, defined as the minimum Hamming weight of any non-zero path that diverges from and later remerges with the all-zeros path, determines asymptotic error-correcting performance.

Viterbi Decoding

The Viterbi algorithm, developed by Andrew Viterbi in 1967, provides the optimal maximum-likelihood decoding solution for convolutional codes with complexity that grows only linearly with sequence length. The algorithm works by maintaining the best (minimum metric) path to each trellis state at every time step, comparing candidate paths that merge at each state and retaining only the survivor.

At each trellis stage, the decoder computes branch metrics measuring the similarity between received symbols and the symbols expected for each possible transition. Path metrics accumulate branch metrics along paths, with the Viterbi algorithm comparing only the paths entering each state and discarding all but the best. After processing the entire received sequence, traceback through stored decisions recovers the most likely transmitted sequence. Hardware Viterbi decoders achieve high throughput through parallel processing of state metrics and pipelined architectures.
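
The sketch below decodes the rate-1/2, K = 3, (7, 5)-octal code from the earlier encoder example using hard-decision Hamming branch metrics. It keeps one survivor per state, assumes the encoder was flushed with zero tail bits so traceback starts from the all-zeros state, and is written for clarity rather than throughput.

K = 3
G0, G1 = 0b111, 0b101
NUM_STATES = 1 << (K - 1)

def parity(x):
    return bin(x).count("1") % 2

def branch(state, bit):
    """Expected output pair and next state for one encoder transition."""
    window = (bit << (K - 1)) | state
    return (parity(window & G0), parity(window & G1)), window >> 1

def viterbi_decode(received):
    """received: hard bits, two per trellis stage, tail bits included."""
    INF = float("inf")
    metrics = [0.0] + [INF] * (NUM_STATES - 1)     # encoder starts in state 0
    history = []                                   # survivor (prev state, input bit)
    for t in range(0, len(received), 2):
        r0, r1 = received[t], received[t + 1]
        new_metrics = [INF] * NUM_STATES
        decisions = [None] * NUM_STATES
        for state in range(NUM_STATES):
            if metrics[state] == INF:
                continue
            for bit in (0, 1):
                (o0, o1), nxt = branch(state, bit)
                m = metrics[state] + (o0 != r0) + (o1 != r1)   # Hamming metric
                if m < new_metrics[nxt]:                       # keep the survivor
                    new_metrics[nxt] = m
                    decisions[nxt] = (state, bit)
        metrics = new_metrics
        history.append(decisions)
    state, bits = 0, []            # tail bits drive the encoder back to state 0
    for decisions in reversed(history):
        prev, bit = decisions[state]
        bits.append(bit)
        state = prev
    return list(reversed(bits))[:-(K - 1)]         # strip the tail bits

coded = [1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1]       # encoding of [1, 0, 1, 1]
coded[3] ^= 1                                      # one channel bit error
print(viterbi_decode(coded))                       # [1, 0, 1, 1]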

Sequential Decoding

For convolutional codes with large constraint lengths where Viterbi decoding becomes impractical due to the exponential growth in states, sequential decoding algorithms offer an alternative that explores only promising paths through the code trellis. The Fano algorithm and stack algorithm are the primary sequential decoding approaches, each using different strategies to search for the maximum-likelihood path.

Sequential decoders work by extending a single path through the trellis, backing up and trying alternatives when the accumulated metric suggests the current path is unlikely to be correct. The variable computational effort depends on the received signal quality, with good signals requiring minimal backtracking while noisy signals may require extensive search. This characteristic makes sequential decoding attractive for channels where high-quality reception is common but occasional deep fades or interference bursts occur.

Turbo Codes

Turbo codes, introduced by Berrou, Glavieux, and Thitimajshima in 1993, represented a breakthrough in approaching Shannon's channel capacity limits. The key innovations were the parallel concatenation of simple convolutional codes through an interleaver and the iterative decoding algorithm that exchanges soft information between component decoders. Turbo codes achieve performance within 0.5 dB of the Shannon limit for long block lengths.

A classic turbo encoder consists of two identical recursive systematic convolutional (RSC) encoders connected in parallel, with the same information bits fed to both encoders but in different orders. The first encoder receives bits in their natural order, while the second encoder receives the same bits permuted by an interleaver. The transmitted codeword consists of the systematic (information) bits plus parity bits from both encoders, with the rate adjusted through puncturing.
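
A minimal sketch of this structure uses two copies of the classic (1, 5/7) RSC constituent code (feedback polynomial 1 + D + D^2, feedforward polynomial 1 + D^2) and a small fixed permutation as the interleaver; trellis termination and puncturing are omitted, so the result is a plain rate-1/3 encoder.

def rsc_encode(bits):
    """Recursive systematic convolutional encoder with feedback 1 + D + D^2
    and feedforward 1 + D^2; returns only the parity stream."""
    s1 = s2 = 0
    parity = []
    for u in bits:
        a = u ^ s1 ^ s2            # feedback node
        parity.append(a ^ s2)      # feedforward taps: 1 and D^2
        s1, s2 = a, s1             # shift the register contents
    return parity

def turbo_encode(bits, interleaver):
    """interleaver[i] is the input position read by the second encoder at step i."""
    permuted = [bits[j] for j in interleaver]
    return list(bits), rsc_encode(bits), rsc_encode(permuted)

info = [1, 0, 1, 1, 0, 0, 1, 0]
pi = [3, 6, 1, 4, 7, 2, 5, 0]          # small fixed permutation, for illustration
systematic, parity1, parity2 = turbo_encode(info, pi)
print(systematic, parity1, parity2)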

Iterative Decoding

Turbo decoding employs the iterative exchange of soft information between two SISO (Soft-Input Soft-Output) decoders, typically implemented using the BCJR algorithm or its logarithmic variants. Each component decoder processes the received systematic bits plus its corresponding parity bits, along with extrinsic information from the other decoder representing prior knowledge about the information bits.

The decoding process begins with the first decoder computing log-likelihood ratios (LLRs) for each information bit based on the channel observations. The extrinsic portion of these LLRs (the new information contributed by the code structure) is interleaved and passed to the second decoder as prior information. The second decoder combines this prior with its own channel observations to produce refined LLRs, whose extrinsic portion is de-interleaved and fed back to the first decoder. This iterative process typically converges within 6-10 iterations.

Interleaver Design

The interleaver plays a crucial role in turbo code performance by ensuring that bits that are close together (and thus have correlated reliability information) in one decoder's trellis are well separated in the other decoder's trellis. This spread of information enables the iterative process to effectively combine independent observations, breaking correlation between decoder outputs.

Various interleaver designs have been developed, from simple random interleavers to structured designs optimized for specific block lengths and code parameters. S-random interleavers ensure a minimum separation between any two bits that were originally within S positions of each other. Algebraic interleavers based on permutation polynomials offer both good performance and practical advantages for implementation. The interleaver design becomes particularly important for shorter block lengths where random interleavers may not provide sufficient spreading.
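
The S-random idea can be sketched as a rejection loop: candidate positions are drawn from a shuffled pool and accepted only if they differ by more than S from each of the last S accepted positions, with a restart if the construction dead-ends. The block length and S value below are illustrative.

import random

def s_random_permutation(n, s, seed=0, max_attempts=200):
    """Greedy S-random construction: each accepted index must differ by more
    than s from the previous s accepted indices; restart on a dead end."""
    rng = random.Random(seed)
    for _ in range(max_attempts):
        pool = list(range(n))
        rng.shuffle(pool)
        perm = []
        while pool:
            for i, candidate in enumerate(pool):
                if all(abs(candidate - p) > s for p in perm[-s:]):
                    perm.append(pool.pop(i))
                    break
            else:
                break                      # dead end: abandon this attempt
        if len(perm) == n:
            return perm
    raise RuntimeError("no S-random permutation found; reduce s")

pi = s_random_permutation(64, 3, seed=1)
print(pi[:16])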

LDPC Codes

Low-Density Parity-Check (LDPC) codes, originally invented by Robert Gallager in 1962 but largely forgotten until their rediscovery in the 1990s, now rival and in some cases surpass turbo codes in performance while offering implementation advantages for very high-speed applications. The defining characteristic is a sparse parity-check matrix where most entries are zero, enabling efficient iterative decoding based on message passing on a bipartite graph.

An LDPC code is defined by its parity-check matrix H, where each row specifies a parity constraint that valid codewords must satisfy. The sparsity requirement means each row contains only a small number of ones (typically 3-10), regardless of the code length. This sparsity directly enables the iterative decoding algorithm and determines both the code's error-correcting capability and the complexity of the decoder.

Tanner Graph Representation

The Tanner graph provides a graphical representation of LDPC codes as a bipartite graph with two types of nodes: variable nodes representing codeword bits and check nodes representing parity constraints. An edge connects variable node j to check node i if and only if the (i,j) entry of the parity-check matrix is 1. The sparsity of the parity-check matrix translates to a sparse Tanner graph where each node connects to only a few neighbors.

The graph structure reveals important code properties. The girth (length of the shortest cycle) affects decoding performance, with larger girth generally beneficial for iterative decoding. Degree distributions, specifying the fraction of nodes with each number of connections, strongly influence the threshold and error floor behavior. Irregular LDPC codes with carefully optimized non-uniform degree distributions achieve performance closer to the Shannon limit than regular codes with uniform degrees.

Belief Propagation Decoding

The standard decoding algorithm for LDPC codes is belief propagation (also called sum-product or message-passing), which operates by exchanging probability or log-likelihood ratio messages along the edges of the Tanner graph. Variable nodes combine messages from all connected check nodes to update their bit probability estimates, while check nodes combine messages from all connected variable nodes to update their constraint satisfaction estimates.

In each iteration, variable-to-check messages represent the probability that a bit has a particular value based on the channel observation and incoming check messages (excluding the destination check). Check-to-variable messages represent the probability that a parity check is satisfied given the incoming variable messages (excluding the destination variable). After sufficient iterations, final bit decisions are made based on the accumulated evidence at each variable node. The min-sum algorithm provides a simplified approximation suitable for hardware implementation with only minor performance degradation.
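
A compact sketch of min-sum decoding on a toy parity-check matrix (far too small and dense to be a real LDPC code) shows the message schedule: check-to-variable messages take the sign product and minimum magnitude of the other incoming messages, variable nodes add the channel LLR to the incoming check messages, and decoding stops once every parity check is satisfied. The matrix and LLR values are illustrative only.

H = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
]
checks = [[v for v, h in enumerate(row) if h] for row in H]   # neighbours of each check

def sign(x):
    return -1.0 if x < 0 else 1.0

def min_sum_decode(channel_llr, max_iters=20):
    """LLR convention: positive means the bit is more likely to be 0."""
    n = len(channel_llr)
    v2c = {(v, c): channel_llr[v] for c, vs in enumerate(checks) for v in vs}
    decision = [1 if l < 0 else 0 for l in channel_llr]
    for _ in range(max_iters):
        # check-to-variable: product of the other signs, minimum of the other magnitudes
        c2v = {}
        for c, vs in enumerate(checks):
            for v in vs:
                others = [v2c[(u, c)] for u in vs if u != v]
                s = 1.0
                for m in others:
                    s *= sign(m)
                c2v[(v, c)] = s * min(abs(m) for m in others)
        # variable update: channel LLR plus every incoming check message
        total = [channel_llr[v] + sum(c2v[(v, c)] for c, vs in enumerate(checks) if v in vs)
                 for v in range(n)]
        decision = [1 if t < 0 else 0 for t in total]
        if all(sum(decision[v] for v in vs) % 2 == 0 for vs in checks):
            break                       # every parity check is satisfied
        # outgoing messages exclude the destination check's own contribution
        for (v, c) in v2c:
            v2c[(v, c)] = total[v] - c2v[(v, c)]
    return decision

# Codeword [1, 1, 0, 0, 1, 1] with the second bit received unreliably (wrong sign).
llr = [-2.0, +0.5, +3.0, +2.5, -4.0, -2.5]
print(min_sum_decode(llr))              # [1, 1, 0, 0, 1, 1]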

Code Construction

The construction of good LDPC codes requires careful attention to both the degree distribution and the specific placement of ones in the parity-check matrix. Density evolution analysis predicts the theoretical threshold (minimum signal-to-noise ratio for successful decoding) for a given degree distribution, enabling optimization of degree profiles for specific channel conditions.

Practical code construction must also consider implementation constraints, encoding complexity, and error floor behavior. Quasi-cyclic LDPC codes construct the parity-check matrix from circulant submatrices, enabling efficient encoder and decoder implementations through simple shift registers. Protograph-based designs start from a small base graph that is replicated and permuted to create the full code, providing both good performance and structured implementations. Many modern standards including 5G NR, Wi-Fi 6, and DVB-S2 specify quasi-cyclic LDPC codes with carefully optimized structures.
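
The quasi-cyclic idea can be sketched by expanding a small base matrix of shift values into a full parity-check matrix: each entry becomes either a Z x Z all-zero block or a Z x Z identity matrix cyclically shifted by the stated amount. The base matrix and lifting factor below are illustrative and not taken from any standard.

Z = 4                              # lifting (expansion) factor
BASE = [                           # illustrative base matrix of circulant shifts
    [0, 2, -1, 1],                 # -1 marks an all-zero block
    [-1, 3, 0, 2],
]

def circulant(shift, z):
    """z x z identity matrix with each row's 1 moved `shift` columns to the right."""
    return [[1 if (c - r) % z == shift else 0 for c in range(z)] for r in range(z)]

def expand(base, z):
    """Replace every base entry by a z x z block and stitch the blocks together."""
    H = []
    for base_row in base:
        blocks = [[[0] * z for _ in range(z)] if s < 0 else circulant(s, z)
                  for s in base_row]
        for r in range(z):
            H.append([bit for block in blocks for bit in block[r]])
    return H

for row in expand(BASE, Z):
    print(row)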

Polar Codes

Polar codes, introduced by Erdal Arikan in 2009, represent the first provably capacity-achieving codes with explicit construction and polynomial complexity for encoding and decoding. The key insight is channel polarization: through a recursive transformation, a set of independent copies of a given channel can be converted into virtual channels that are either nearly perfect (capacity approaching 1) or nearly useless (capacity approaching 0). Information bits are transmitted on the good channels while the bad channels carry predetermined frozen bits.

The polarization transformation applies a simple 2x2 kernel matrix recursively to combine pairs of channels. After n stages of recursion (creating 2^n channels from the original), the resulting synthetic channels exhibit extreme polarization. The fraction of good channels approaches the original channel capacity as n increases, enabling capacity-achieving performance. The explicit and deterministic nature of polar code construction contrasts with the probabilistic designs required for good LDPC and turbo codes.
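
A minimal sketch of the encoder: the 2x2 kernel applied recursively reduces to the in-place butterfly loop below, which computes the transform for any power-of-two length in natural bit order. Frozen positions simply hold zeros in the input vector; the frozen set shown is an illustrative choice for length 8.

def polar_encode(u):
    """Apply the polarization transform over GF(2) with an in-place butterfly."""
    x = list(u)
    n = len(x)                         # must be a power of two
    step = 1
    while step < n:
        for start in range(0, n, 2 * step):
            for j in range(start, start + step):
                x[j] ^= x[j + step]    # combine the two half-blocks
        step *= 2
    return x

N = 8
frozen = {0, 1, 2, 4}                  # illustrative frozen set for N = 8
data = [1, 0, 1, 1]                    # K = 4 information bits
u = [0] * N
for pos, bit in zip(sorted(set(range(N)) - frozen), data):
    u[pos] = bit
print(u)                               # [0, 0, 0, 1, 0, 0, 1, 1]
print(polar_encode(u))                 # the transmitted codeword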

Successive Cancellation Decoding

The basic decoding algorithm for polar codes is successive cancellation (SC), which decodes bits one at a time in a specific order, using each decoded bit as prior information for subsequent decisions. The decoder computes the likelihood ratio for each bit given the channel observations and all previously decoded bits, making hard decisions as it progresses through the bit positions.

The SC decoder can be efficiently implemented with complexity O(N log N) using a butterfly structure similar to the FFT. However, finite-length performance of basic SC decoding falls short of other modern codes, motivating enhanced decoding algorithms. SC List (SCL) decoding maintains multiple candidate paths through the decoding tree, selecting the best at the end, significantly improving performance at a moderate increase in complexity. CRC-aided SCL decoding adds a short CRC to help select the correct path, achieving performance competitive with LDPC codes.
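
The basic SC decoder matched to the natural-order encoder sketched above can itself be written as a short recursion over soft channel LLRs, using the min-sum approximation for the upper-branch combination (often called the f function) and the sign-adjusted sum for the lower branch (the g function); frozen positions are forced to zero. The encoder is repeated so the sketch is self-contained, and it is written for clarity rather than speed.

def polar_encode(u):
    """Same butterfly transform as in the encoding sketch above."""
    x, n, step = list(u), len(u), 1
    while step < n:
        for start in range(0, n, 2 * step):
            for j in range(start, start + step):
                x[j] ^= x[j + step]
        step *= 2
    return x

def f(a, b):                         # min-sum approximation of the check combine
    s = -1.0 if (a < 0) != (b < 0) else 1.0
    return s * min(abs(a), abs(b))

def g(a, b, u_bit):                  # lower-branch combine, given the partial sum
    return b + (1 - 2 * u_bit) * a

def sc_decode(llr, frozen, offset=0):
    """Decode one sub-block; `offset` locates it within the full u vector.
    LLR convention: llr > 0 means the corresponding code bit is more likely 0."""
    n = len(llr)
    if n == 1:
        if offset in frozen:
            return [0]               # frozen positions are known to be zero
        return [0 if llr[0] >= 0 else 1]
    half = n // 2
    u1 = sc_decode([f(llr[i], llr[i + half]) for i in range(half)], frozen, offset)
    partial = polar_encode(u1)       # partial sums feed the g function
    u2 = sc_decode([g(llr[i], llr[i + half], partial[i]) for i in range(half)],
                   frozen, offset + half)
    return u1 + u2

N, frozen = 8, {0, 1, 2, 4}
u = [0, 0, 0, 1, 0, 0, 1, 1]         # data bits placed on the unfrozen positions
x = polar_encode(u)
llr = [5.0 if bit == 0 else -5.0 for bit in x]   # noiseless BPSK-style LLRs
print(sc_decode(llr, frozen) == u)               # True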

Polar Code Construction

Constructing a polar code requires selecting which bit positions carry information and which are frozen. This selection depends on the channel characteristics and target operating point. Various methods exist for ranking channel reliabilities, from Monte Carlo simulation and density evolution to Gaussian approximation and more recent learning-based approaches.
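
One of the simplest construction methods tracks Bhattacharyya parameters under a binary erasure channel assumption: starting from the channel erasure probability, each polarization step maps a parameter z to 2z - z^2 for the degraded channel and z^2 for the upgraded one, and the positions with the largest final parameters are frozen. The sketch below uses that BEC assumption, with the index ordering matching the natural-order encoder sketched earlier.

def bec_reliabilities(n_stages, eps):
    """Bhattacharyya parameters of the 2^n_stages synthetic channels, assuming
    the underlying channel is a BEC with erasure probability eps."""
    z = [eps]
    for _ in range(n_stages):
        nxt = []
        for x in z:
            nxt.append(2 * x - x * x)   # degraded ("minus") channel
            nxt.append(x * x)           # upgraded ("plus") channel
        z = nxt
    return z                            # z[i] belongs to synthetic channel i

N, K = 8, 4
z = bec_reliabilities(3, eps=0.5)
# Freeze the N - K positions with the largest (least reliable) parameters.
frozen = set(sorted(range(N), key=lambda i: z[i], reverse=True)[:N - K])
print(sorted(frozen))                   # [0, 1, 2, 4] for this example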

For practical applications, the frozen bit positions are typically determined offline based on the target channel and stored in a table. More sophisticated approaches adapt the code construction to varying channel conditions or design universal codes that perform well across a range of channels. The flexibility in choosing frozen bit positions also enables rate-compatible designs where a family of codes at different rates shares a common structure, facilitating hybrid ARQ and adaptive modulation systems.

Interleaving

Interleaving is a technique that reorders transmitted symbols to spread burst errors across multiple codewords, converting error bursts that would overwhelm a single codeword into isolated errors distributed among many codewords where they can be individually corrected. Interleaving adds latency and memory requirements but is essential for channels with bursty error characteristics such as wireless fading channels and optical storage media.

The basic concept involves writing data into a memory array by rows and reading it out by columns (or vice versa). When burst errors corrupt consecutive transmitted symbols, de-interleaving at the receiver distributes these errors across different codewords. For an interleaver of depth D (the number of rows), a burst of up to D errors is spread so that each codeword receives at most one error, well within the correction capability of even simple codes.

Block Interleaving

Block interleavers organize data in a two-dimensional array, writing symbols into the array in one order and reading them out in a different order. The simplest form writes data by rows and reads by columns, so that symbols that are consecutive in the input stream end up separated by the number of rows in the output stream. The interleaving depth equals the number of rows, determining the maximum burst length that can be fully dispersed.
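
A row/column block interleaver and its inverse can be sketched in a few lines. With 4 rows and 5 columns, symbols that are adjacent at the input end up 4 positions apart after interleaving, and a channel burst of up to 4 symbols is spread across 4 different rows. The dimensions are illustrative.

ROWS, COLS = 4, 5          # interleaving depth 4: e.g. one codeword per row

def interleave(symbols):
    """Write by rows, read by columns."""
    assert len(symbols) == ROWS * COLS
    return [symbols[r * COLS + c] for c in range(COLS) for r in range(ROWS)]

def deinterleave(symbols):
    """Invert the mapping: put each received symbol back at its original index."""
    assert len(symbols) == ROWS * COLS
    original_index = [r * COLS + c for c in range(COLS) for r in range(ROWS)]
    out = [None] * (ROWS * COLS)
    for pos, idx in enumerate(original_index):
        out[idx] = symbols[pos]
    return out

data = list(range(20))
tx = interleave(data)
erased = [None if 6 <= p <= 9 else s for p, s in enumerate(tx)]   # 4-symbol burst
print(deinterleave(erased))   # the four erasures land in four different rows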

Block interleavers introduce a fixed delay equal to the array size, which must be traversed for both interleaving and de-interleaving. This delay is acceptable for many applications but may be problematic for interactive communications. The memory requirement is straightforward, needing storage for the complete interleaver array at both transmitter and receiver. Various optimizations reduce memory by clever addressing schemes that avoid the need to physically rearrange data.

Convolutional Interleaving

Convolutional interleavers offer reduced latency and memory requirements compared to block interleavers of equivalent burst-spreading capability. Instead of a rectangular array, a convolutional interleaver uses a set of shift registers with progressively increasing lengths. Symbols cycle through the registers sequentially, with each register introducing a different delay that effectively spreads the symbol stream.

A convolutional interleaver with I branches and an M-symbol basic delay increment provides burst-spreading similar to that of a comparable block interleaver, but with roughly half the end-to-end delay and roughly half the total memory. The Forney interleaver, closely related to Ramsey's earlier designs and commonly used in broadcasting standards, increases the shift register lengths linearly from branch to branch. Convolutional interleavers are standard in many communication systems including DVB and DAB broadcasting.
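
A minimal sketch models each branch as a FIFO delay line, with the commutator visiting the branches in turn; the deinterleaver uses the complementary delays so that every symbol experiences the same total delay. The preloaded registers emit dummy zeros during the start-up transient. Branch count and delay increment below are illustrative.

from collections import deque

I, M = 4, 1                     # illustrative: 4 branches, unit delay increment

def run(symbols, delays):
    """Pass a symbol stream through a bank of FIFO delay lines, cycling
    through the branches one symbol at a time (commutator behaviour)."""
    fifos = [deque([0] * d) for d in delays]     # preloaded with dummy zeros
    out = []
    for t, s in enumerate(symbols):
        fifo = fifos[t % len(delays)]
        fifo.append(s)
        out.append(fifo.popleft())
    return out

def interleave(symbols):
    return run(symbols, [j * M for j in range(I)])

def deinterleave(symbols):
    return run(symbols, [(I - 1 - j) * M for j in range(I)])

data = list(range(1, 21))
rx = deinterleave(interleave(data))
print(rx)   # the original sequence re-emerges after (I - 1) * M * I leading zeros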

Puncturing

Puncturing provides a simple method to create a family of codes at different rates from a single mother code by selectively deleting (not transmitting) certain parity symbols. The receiver, knowing the puncturing pattern, inserts erasures at the punctured positions before decoding. This approach enables rate-compatible coding schemes where a single encoder and decoder design supports multiple code rates, simplifying implementation of adaptive systems.

The puncturing pattern specifies which parity symbols are transmitted and which are discarded. Well-designed puncturing patterns remove symbols that contribute least to error correction capability, maintaining performance as close as possible to a purpose-built code at the punctured rate. Poor puncturing choices can dramatically degrade performance by eliminating critical redundancy.
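
Puncturing and depuncturing can be sketched directly: the transmitter drops the code symbols marked 0 in a repeating pattern, and the receiver re-inserts neutral values (erasures, or zero LLRs for a soft-decision decoder) at those positions before decoding. The pattern below, which would turn a rate-1/2 mother code into a rate-2/3 code, is illustrative.

PATTERN = [1, 1, 1, 0]        # keep 3 of every 4 code symbols: rate 1/2 -> rate 2/3

def puncture(code_symbols):
    """Drop the symbols at positions where the repeating pattern is 0."""
    return [s for i, s in enumerate(code_symbols) if PATTERN[i % len(PATTERN)]]

def depuncture(received, total_len, erasure=0.0):
    """Rebuild the full-length stream, inserting a neutral value (a zero LLR
    carries no information) at each punctured position."""
    out, symbols = [], iter(received)
    for i in range(total_len):
        out.append(next(symbols) if PATTERN[i % len(PATTERN)] else erasure)
    return out

soft = [+2.3, -1.1, +0.8, -3.0, +1.9, -0.4, +2.7, -2.2]   # soft code symbols (LLRs)
tx = puncture(soft)
print(tx)                          # 6 of the 8 symbols are transmitted
print(depuncture(tx, len(soft)))   # zeros re-inserted at the punctured positions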

Rate-Compatible Punctured Codes

Rate-compatible punctured codes (RCPC) form a nested family where lower-rate codes contain all the parity symbols of higher-rate codes plus additional symbols. This nesting enables efficient incremental redundancy (IR) retransmission schemes: the transmitter initially sends a high-rate codeword, and if decoding fails, transmits additional parity symbols that combine with the original transmission to form a more powerful lower-rate code.

The rate-compatible structure is particularly valuable for hybrid automatic repeat request (HARQ) systems that combine error correction with retransmission. Type-II HARQ with incremental redundancy provides excellent throughput by adapting the effective code rate to current channel conditions without prior knowledge of the channel state. The receiver accumulates multiple transmissions, with each increment adding error correction capability until successful decoding occurs.

Puncturing Pattern Design

Designing effective puncturing patterns requires balancing multiple considerations including achievable free distance at each rate, error coefficient (the number of minimum-weight error patterns), implementation complexity, and rate-compatible nesting requirements. Exhaustive search is feasible only for small constraint length codes; larger codes require heuristic optimization techniques.

For convolutional codes, the puncturing period typically matches the encoder period, with the pattern repeating regularly. The effective free distance after puncturing should be maximized for each target rate, subject to nesting constraints. For turbo and LDPC codes, puncturing can be applied to systematic bits, parity bits, or both, with different implications for performance and decoder complexity. Standards typically specify optimized puncturing patterns derived through extensive simulation and analysis.

Code Selection and Applications

Selecting the appropriate channel coding scheme for a specific application requires balancing multiple factors including required error rate, available bandwidth, latency constraints, implementation complexity, and power consumption. Each code family offers different trade-offs, and the optimal choice depends heavily on the application context.

For memory and storage applications, Reed-Solomon and BCH codes remain dominant due to their deterministic error correction capability and mature implementation technology. Communication systems increasingly employ LDPC codes (in Wi-Fi 6, 5G NR, and satellite communications) and polar codes (in 5G control channels) for their near-capacity performance. Turbo codes remain in use in applications including LTE and deep-space communication, where their proven performance justifies the decoder complexity.

Implementation technology plays a crucial role in code selection. High-throughput applications favor codes with parallel decoder architectures, such as LDPC codes with their inherently parallel message-passing algorithm. Low-power applications may prefer simpler codes or accept performance trade-offs to reduce computational requirements. Emerging applications in machine-type communications and IoT motivate continued research into codes optimized for very short block lengths and sporadic transmission patterns.

Summary

Channel coding provides essential protection for digital data transmission, enabling reliable communication over imperfect channels by adding structured redundancy that receivers can exploit for error detection and correction. The field has progressed remarkably from early Hamming and convolutional codes through Reed-Solomon codes to modern capacity-approaching schemes including turbo codes, LDPC codes, and polar codes.

Understanding channel coding requires familiarity with both the mathematical foundations of coding theory and the practical considerations of encoder and decoder implementation. The choice among available codes involves trade-offs between performance, complexity, latency, and power that depend on specific application requirements. As data rates continue to increase and new applications emerge, channel coding remains an active area of research and development, with ongoing work on short-block codes, joint source-channel coding, and codes for novel channel models.