Finite Field (Galois) Arithmetic

A finite field, also called a Galois field, is a set with finitely many elements on which addition, subtraction, multiplication, and division (excluding division by zero) are defined and obey the usual algebraic laws. Finite fields exist only when the number of elements is a prime power, written as q = p^m for a prime p and a positive integer m. Digital hardware works almost exclusively with the case p = 2, because the field GF(2) consists of the two values 0 and 1, and its arithmetic maps directly onto Boolean logic. The extension fields GF(2^m) built on top of GF(2) underpin much of modern error-control coding and cryptography.

The appeal of GF(2^m) for hardware is that addition reduces to the bitwise exclusive-OR (XOR) of m-bit values and therefore requires no carry propagation, while multiplication, though more involved, decomposes into shifts and XOR operations that map cleanly onto gates and registers. This article develops the representation of field elements, the core arithmetic operations, the role of irreducible and primitive polynomials, the table-based methods that accelerate multiplication and inversion, and the hardware structures that implement these operations in serial and parallel form. It closes with the two application domains that motivate most practical interest in the subject.

Field Fundamentals and GF(2^m) Representation

The simplest finite field, GF(2), contains only the elements 0 and 1. Addition is computed modulo 2, which makes it identical to the XOR operation, and multiplication is identical to the logical AND operation. Because 1 + 1 = 0 in this field, every element is its own additive inverse, and subtraction is therefore the same operation as addition. These properties carry upward into every extension field built from GF(2).

Polynomial Representation

An element of GF(2^m) is represented as a polynomial of degree less than m whose coefficients are drawn from GF(2). A polynomial such as x^3 + x + 1 therefore corresponds to the coefficient vector (1, 0, 1, 1), which is stored in hardware as the m-bit word 1011. There are exactly 2^m such polynomials, matching the field's order. This correspondence between polynomials and bit vectors is the foundation of every hardware implementation: a register holding m bits holds one field element, and the bit positions correspond to ascending powers of x.

Degree bound: Valid elements have degree at most m - 1, so an m-bit register represents every element exactly once.
Coefficient field: Each coefficient is a single bit, an element of GF(2).
Zero and one: The additive identity is the all-zero word, and the multiplicative identity is the polynomial 1.
Bit ordering: A consistent convention must be fixed so that hardware and software agree on which bit carries which power of x; the common choice maps the least significant bit to the constant term x^0 and the most significant bit to x^(m-1).

Alternative Bases

The polynomial representation described above uses the standard, or polynomial, basis, in which the basis elements are the powers 1, x, x^2, and so on up to x^(m-1). This basis is the most common because multiplication and reduction are straightforward. Two other bases appear in specialized designs and offer different trade-offs.

Normal basis: Uses the set of successive squares of a single element, so squaring becomes a simple cyclic shift of the coefficient word, which benefits exponentiation-heavy computations such as inversion.
Dual basis: Defined relative to a trace function; it simplifies certain serial multiplier structures such as the Berlekamp multiplier.
Basis conversion: Changing basis is a linear transformation over GF(2) and can be realized as a fixed matrix of XOR gates when interoperation is required.

Addition and Subtraction

Addition in GF(2^m) adds the two polynomials coefficient by coefficient, and because each coefficient lives in GF(2), every coefficient addition is computed modulo 2. The result is the bitwise XOR of the two operand words. No carry can propagate from one coefficient position to the next, which is the single most important property distinguishing GF(2^m) arithmetic from ordinary integer arithmetic.

Subtraction is identical to addition. Since each coefficient satisfies a = -a in GF(2), the negation of any element is the element itself, and so a - b produces the same word as a + b. This identity eliminates an entire class of operations that integer datapaths must support, and it explains why field adders are nothing more than arrays of two-input XOR gates with no ripple, no lookahead, and no propagation delay beyond a single gate.

Hardware cost: An m-bit adder is m independent XOR gates operating in parallel, with latency equal to one gate delay.
No overflow: The sum of two degree-(m-1) polynomials is again of degree at most m - 1, so the result never leaves the field and never requires reduction.
Accumulation: Summing many elements is a balanced XOR tree, which parallelizes naturally and underlies the syndrome and checksum computations used in coding.

Irreducible and Primitive Polynomials

Multiplication of two degree-(m-1) polynomials can produce a polynomial of degree up to 2m - 2, which lies outside the field. To bring the product back into the field, the arithmetic is performed modulo a fixed reduction polynomial of degree m. For the result to form a valid field, this reduction polynomial must be irreducible over GF(2), meaning that it cannot be factored into the product of two lower-degree polynomials with coefficients in GF(2). An irreducible polynomial plays the same role that a prime modulus plays in modular integer arithmetic.

Choosing the Reduction Polynomial

For any given m there are several irreducible polynomials, and the choice affects the cost of reduction hardware. Designers prefer polynomials with few nonzero terms because each nonzero term in the reduction polynomial contributes XOR gates to the reduction network.

Trinomials: Polynomials of the form x^m + x^k + 1 minimize reduction hardware and exist for many but not all values of m.
Pentanomials: Polynomials with five terms are used when no suitable trinomial exists for a given degree.
Standardized choices: The Advanced Encryption Standard fixes GF(2^8) with the polynomial x^8 + x^4 + x^3 + x + 1, while Galois/Counter Mode operates in GF(2^128) with a fixed pentanomial; standardizing the reduction polynomial is what guarantees interoperability between independent implementations.

Primitivity and Field Generators

A primitive polynomial is an irreducible polynomial whose root generates the entire multiplicative group of the field. The multiplicative group of GF(2^m) contains the 2^m - 1 nonzero elements, and when the root, conventionally denoted by the symbol alpha, is a generator, every nonzero element can be written as a power of alpha. The successive powers alpha^0, alpha^1, alpha^2, and so on cycle through all nonzero elements before returning to alpha^0.

Generator element: When the reduction polynomial is primitive, the element alpha = x is a generator of the multiplicative group.
Cyclic structure: The powers of alpha repeat with period 2^m - 1, which is precisely the structure exploited by the log and antilog tables described below.
Maximal-length sequences: The same primitivity condition that makes alpha a generator makes a linear feedback shift register with the corresponding feedback taps produce a maximal-length sequence.

Multiplication and Reduction

Field multiplication consists of two conceptual stages: an ordinary polynomial product over GF(2), followed by reduction modulo the irreducible polynomial. The polynomial product is a carry-free convolution of the coefficient vectors, computed with AND gates for the partial products and XOR gates for the accumulation. The reduction step then folds the high-degree terms back into the field using the relation that sets the reduction polynomial equal to zero.

Shift-and-XOR Multiplication

The bit-serial multiplication algorithm mirrors the shift-and-add method for integers but replaces addition with XOR and discards carries. The algorithm scans the bits of one operand, conditionally accumulating shifted copies of the other operand, and reduces after each shift whenever the running result reaches degree m.

Conditional accumulation: For each bit of the multiplier that equals one, the current shifted multiplicand is XORed into the product.
Shift and reduce: After each step the multiplicand is shifted left by one position; if the shift sets the bit at position m, the reduction polynomial is XORed in to bring the value back into range.
Latency: The serial algorithm completes in m iterations, trading throughput for a very small circuit.

The xtime Operation

Multiplication by the element x, often called the xtime operation in the context of the Advanced Encryption Standard, is the elementary building block of field multiplication. It shifts the coefficient word left by one bit and, if the shifted-out bit was set, XORs in the reduction polynomial. Repeated application of xtime, combined with XOR accumulation, expresses multiplication by any constant.

Single-step reduction: Multiplying by x can raise the degree by at most one, so at most one conditional reduction is required per step.
Constant decomposition: A constant multiplier such as multiplication by the element 0x03 is computed as xtime of the operand XORed with the operand itself.
Reuse: Because xtime is small and fast, iterated xtime underlies both the MixColumns transform of block ciphers and the syndrome evaluation of cyclic codes.

Logarithm and Antilogarithm Tables

For small fields such as GF(2^8), multiplication and division can be reduced to addition and subtraction of exponents by precomputing two lookup tables. Because every nonzero element equals a power of the generator alpha, an element b can be associated with the exponent i for which b = alpha^i. The function that returns this exponent is the discrete logarithm, and its inverse, which returns alpha^i given i, is the antilogarithm or exponential table.

Table Construction

The antilog table is built by starting from alpha^0 = 1 and repeatedly applying the xtime operation, which advances from alpha^i to alpha^(i+1). Each value is stored at its exponent index. The log table is then the inverse mapping, populated by inverting each entry of the antilog table.

Antilog table: Maps an exponent i to the field element alpha^i; for GF(2^8) it holds 255 nonzero entries indexed by exponent.
Log table: Maps a nonzero field element to its exponent; the zero element has no logarithm and is handled as a special case.
Period wraparound: Exponents are taken modulo 2^m - 1, so the sum of two exponents that exceeds the period wraps around to the start of the cycle.

Multiplication, Division, and Inversion by Table

With the two tables in place, multiplication of nonzero elements a and b is computed as antilog of the sum of their logarithms, taken modulo 2^m - 1. Division subtracts the logarithms, and the multiplicative inverse of an element is the antilog of the negation of its logarithm. These reductions are why table-based field arithmetic dominates software implementations of Reed-Solomon coding.

Multiplication: a times b equals antilog((log a + log b) mod (2^m - 1)), with a zero operand short-circuited to a zero result.
Division: a divided by b equals antilog((log a - log b) mod (2^m - 1)).
Inversion: The inverse of b equals antilog((2^m - 1 - log b)), giving a constant-time inverse for small fields.
Memory cost: The tables grow as 2^m entries and become impractical beyond roughly GF(2^16), at which point algorithmic methods are preferred.

Inversion and Division

Division is multiplication by a multiplicative inverse, so the central problem is computing the inverse of a nonzero field element. For small fields the antilog method or a direct lookup table suffices, but for the large fields used in elliptic-curve cryptography, inversion must be computed algorithmically because storing a full table is infeasible.

Inversion Algorithms

Two families of algorithms dominate. The extended Euclidean algorithm computes inverses directly through polynomial greatest-common-divisor steps, while exponentiation-based methods exploit the group structure to express the inverse as a power of the element.

Extended Euclidean algorithm: Computes the inverse through iterated polynomial division and is well suited to variable field sizes.
Fermat-based inversion: In GF(2^m), every nonzero element b satisfies b raised to the power (2^m - 1) equals 1, so the inverse equals b raised to the power (2^m - 2), computed by repeated squaring and multiplication.
Itoh-Tsujii algorithm: Reorganizes the Fermat exponentiation into an addition chain that drastically reduces the number of full multiplications, leaving mostly inexpensive squarings.

Squaring as a Special Case

Squaring is far cheaper than general multiplication in GF(2^m). Because the cross terms of a polynomial squared over GF(2) cancel in pairs, squaring a polynomial simply interleaves zeros between its coefficients, after which the result is reduced. The interleaving is pure wiring with no logic, so squaring reduces to a fixed linear reduction network and is exploited heavily by inversion algorithms that are built from chains of squarings.

Hardware Architectures

The structure of a field arithmetic unit reflects a throughput-versus-area trade-off identical in spirit to that found across digital design. Serial structures reuse a small datapath over many clock cycles, while parallel structures unroll the computation into combinational logic that completes in a single cycle.

Linear Feedback Shift Registers

The linear feedback shift register (LFSR) is the canonical sequential structure for field computation. An LFSR is a chain of flip-flops with XOR feedback taps positioned according to a feedback polynomial. When the polynomial is primitive, the register cycles through all nonzero field elements and produces a maximal-length pseudorandom sequence. LFSRs implement multiplication by alpha directly, evaluate the remainder of a polynomial division, and form the heart of cyclic-code encoders and many stream ciphers.

Feedback taps: The XOR taps correspond to the nonzero terms of the feedback polynomial.
Two configurations: The Fibonacci, or external-XOR, form places the feedback network before the register chain, while the Galois, or internal-XOR, form distributes XOR gates between stages for a shorter critical path.
Division by feedback: Loading data serially into a Galois LFSR computes the polynomial remainder, which is exactly the syndrome or check computation in coding.

Bit-Serial Multipliers

A bit-serial multiplier processes one bit of an operand per clock cycle, interleaving conditional accumulation with reduction. It minimizes silicon area at the cost of m clock cycles per multiplication, which makes it attractive for resource-constrained devices and for very large fields where a fully parallel multiplier would be prohibitively large.

Compact datapath: A single accumulation register and a small XOR network suffice.
Latency: One multiplication requires m cycles, so throughput is inversely proportional to field size.
Use cases: Preferred in compact cryptographic accelerators and embedded controllers with tight area budgets.

Bit-Parallel Multipliers

A bit-parallel multiplier computes the full product and reduction in combinational logic, delivering one result per clock cycle. The Mastrovito multiplier folds the reduction into the partial-product matrix, while Karatsuba decomposition reduces the number of subproducts at the cost of additional XOR gates and is widely used in high-throughput GF(2^128) units for authenticated encryption.

Mastrovito multiplier: Generates a reduced product matrix directly, balancing gate count against logic depth.
Karatsuba decomposition: Trades multiplications for additions, lowering the asymptotic gate count for large m.
Composite-field techniques: Representing GF(2^8) as GF((2^4)^2) shrinks the inversion logic in compact cipher implementations at the cost of conversion overhead.

Applications in Coding and Cryptography

Finite field arithmetic is not an end in itself; it is the computational substrate for two pillars of digital systems. Error-control coding uses fields to construct codes that detect and correct corruption, and cryptography uses them to build ciphers and authentication schemes whose security rests on the algebraic structure of the field.

Error-Control Coding

Reed-Solomon and BCH codes treat data symbols as elements of GF(2^m) and define codewords as evaluations of polynomials over the field. Encoding multiplies the message polynomial by a generator polynomial, and decoding computes syndromes, locates errors, and corrects them, all through field arithmetic. These codes protect optical discs, deep-space links, QR codes, and the data paths of storage and memory systems.

Reed-Solomon codes: Operate on m-bit symbols in GF(2^m) and correct bursts of symbol errors, making them ideal for media subject to clustered defects.
Cyclic redundancy checks: Compute a polynomial remainder in GF(2) using exactly the LFSR division described above.
Syndrome evaluation: Reduces to evaluating the received polynomial at successive powers of alpha, an operation tailor-made for field hardware.

Cryptographic Use

Block ciphers and message-authentication schemes lean on field arithmetic for their nonlinear and mixing layers. The Advanced Encryption Standard computes its substitution box as a multiplicative inverse in GF(2^8) followed by an affine map, and its MixColumns step is a matrix product over the same field. Galois/Counter Mode authenticates data by multiplying message blocks in GF(2^128), and elliptic-curve cryptosystems perform point arithmetic over fields that may be binary extension fields.

Substitution boxes: Built from field inversion to obtain strong nonlinearity with a compact algebraic description.
Authentication: GF(2^128) multiplication accumulates an authentication tag across the message in Galois/Counter Mode.
Elliptic curves: Binary-field curves perform point addition and doubling through chains of field multiplications, squarings, and inversions.

Summary

Finite field arithmetic provides a closed algebraic system over a fixed, finite set of values, and the binary extension fields GF(2^m) are uniquely suited to digital hardware. Field elements are polynomials stored as bit vectors, addition is carry-free XOR, and multiplication is a convolution followed by reduction modulo an irreducible polynomial. The choice of a primitive reduction polynomial endows the field with a cyclic multiplicative structure that log and antilog tables exploit to turn multiplication and division into table lookups for small fields.

For larger fields, algorithmic inversion through the extended Euclidean, Fermat, and Itoh-Tsujii methods replaces tabulation, and squaring remains inexpensive throughout because cross terms vanish over GF(2). Hardware ranges from compact bit-serial multipliers and linear feedback shift registers to high-throughput bit-parallel Mastrovito and Karatsuba structures, with composite-field representations offering a middle ground. Together these techniques make field arithmetic fast and area-efficient enough to sit in the critical path of error-control decoders and cryptographic accelerators alike, where it quietly guarantees that digital data arrives both intact and secure.

Electronics Guide