Electronics Guide

Number Systems and Codes

Digital electronics fundamentally operates on the representation and manipulation of information using discrete values. At the heart of every digital system lies the ability to encode numbers, characters, and data into patterns that electronic circuits can store, process, and transmit. Understanding number systems and codes is essential for anyone working with digital electronics, from designing arithmetic circuits to implementing communication protocols.

This article explores the various number systems used in digital electronics, methods for representing signed and floating-point numbers, character encoding standards, and specialized codes designed for error detection, efficient data transfer, and specific applications. Mastery of these concepts enables engineers to choose appropriate representations for different applications, convert between systems, and design circuits that correctly process digital information.

Positional Number Systems

Positional number systems express quantities using digits whose values depend on their position within the number. Each position represents a power of the base (also called radix), with positions to the left of the radix point representing increasing positive powers and positions to the right representing increasing negative powers.

Binary Number System (Base 2)

The binary number system forms the foundation of all digital electronics because it maps directly to the two-state nature of electronic switches. Binary uses only two digits: 0 and 1, corresponding to low and high voltage levels, off and on states, or the presence and absence of current.

In binary, each position represents a power of 2. Reading from right to left, positions represent 2^0 = 1, 2^1 = 2, 2^2 = 4, 2^3 = 8, and so on. For example, the binary number 1011 represents:

1 x 2^3 + 0 x 2^2 + 1 x 2^1 + 1 x 2^0 = 8 + 0 + 2 + 1 = 11 (decimal)

Binary arithmetic follows the same rules as decimal arithmetic but with only two digits. Addition produces a carry when the sum exceeds 1:

  • 0 + 0 = 0
  • 0 + 1 = 1
  • 1 + 0 = 1
  • 1 + 1 = 10 (0 with carry of 1)

Binary fractions work similarly, with positions to the right of the binary point representing 2^-1 = 0.5, 2^-2 = 0.25, 2^-3 = 0.125, and so forth. The binary number 0.101 equals 0.5 + 0.125 = 0.625 in decimal.
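
The positional weighting described above can be verified quickly in Python, which parses binary integers natively; the fraction is summed from negative powers of 2 by hand:

```python
# Positional weights in action: Python's int() parses binary directly
print(int("1011", 2))   # 11

# Fractions: sum negative powers of 2 for the bits after the binary point
bits = "101"
print(sum(int(b) * 2 ** -(i + 1) for i, b in enumerate(bits)))  # 0.625
```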

Octal Number System (Base 8)

Octal uses digits 0 through 7 and provides a compact representation of binary numbers because each octal digit corresponds exactly to three binary digits. This relationship made octal popular in early computing systems, particularly those with word lengths that were multiples of three bits.

Converting between binary and octal is straightforward: group binary digits into sets of three (starting from the radix point and working outward), then convert each group to its octal equivalent. For example:

Binary: 101 110 011 = Octal: 563

While less common in modern systems, octal still appears in Unix file permissions and some legacy applications. The Unix permission mode 755, for instance, sets read, write, and execute permissions for the owner (7 = 111 binary), and read and execute for group and others (5 = 101 binary).

Hexadecimal Number System (Base 16)

Hexadecimal has become the preferred shorthand for binary in modern computing because each hexadecimal digit maps to exactly four binary digits, matching the nibble (half-byte) structure of most computer architectures. Hexadecimal uses digits 0-9 and letters A-F to represent values 0 through 15.

Hexadecimal to Binary Conversion
Hexadecimal   Binary   Decimal
0             0000     0
1             0001     1
2             0010     2
3             0011     3
4             0100     4
5             0101     5
6             0110     6
7             0111     7
8             1000     8
9             1001     9
A             1010     10
B             1011     11
C             1100     12
D             1101     13
E             1110     14
F             1111     15

Hexadecimal notation appears throughout electronics: memory addresses, color codes in graphics (FF0000 for red), MAC addresses, and debugging output. A common notation prefixes hexadecimal numbers with "0x" (as in 0x1A3F) or suffixes with "h" (1A3Fh).

Converting an 8-bit byte to hexadecimal produces a two-digit number (00 to FF), while a 16-bit word produces four digits, and a 32-bit double word produces eight digits. This compact representation makes hexadecimal invaluable for examining memory contents and machine code.

Conversion Between Number Systems

Converting between number systems is a fundamental skill in digital electronics. Several methods exist, each suited to different situations.

Decimal to Binary: The repeated division method divides the decimal number by 2 repeatedly, recording the remainders. The binary number consists of the remainders read in reverse order. For example, converting 25:

  • 25 / 2 = 12 remainder 1
  • 12 / 2 = 6 remainder 0
  • 6 / 2 = 3 remainder 0
  • 3 / 2 = 1 remainder 1
  • 1 / 2 = 0 remainder 1

Reading remainders bottom to top: 25 decimal = 11001 binary.
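
The repeated-division method can be sketched in Python (the function name is illustrative):

```python
def to_binary(n):
    """Convert a non-negative decimal integer to a binary string
    by repeated division by 2, recording the remainders."""
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        remainders.append(str(n % 2))  # record remainder
        n //= 2
    # Remainders are produced least-significant first, so reverse them
    return "".join(reversed(remainders))

print(to_binary(25))  # 11001
```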

Binary to Decimal: Multiply each binary digit by its positional weight (power of 2) and sum the results, as shown in the earlier example.

Binary to Hexadecimal: Group binary digits into sets of four (padding with leading zeros if necessary), then convert each group to its hexadecimal equivalent. For example, 11001 becomes 0001 1001 = 19 hexadecimal.

Decimal Fractions to Binary: The repeated multiplication method multiplies the fractional part by 2 repeatedly, recording the integer parts. For example, converting 0.625:

  • 0.625 x 2 = 1.25, integer part = 1
  • 0.25 x 2 = 0.5, integer part = 0
  • 0.5 x 2 = 1.0, integer part = 1

Reading integer parts top to bottom: 0.625 decimal = 0.101 binary.

Note that some decimal fractions cannot be represented exactly in binary, leading to rounding errors. For instance, 0.1 decimal produces an infinitely repeating binary fraction.
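
The repeated-multiplication method, including a digit limit for non-terminating fractions like 0.1, can be sketched as:

```python
def fraction_to_binary(frac, max_bits=16):
    """Convert a decimal fraction in [0, 1) to a binary string by
    repeated multiplication by 2, recording the integer parts.
    Stops after max_bits digits for non-terminating fractions."""
    bits = []
    while frac > 0 and len(bits) < max_bits:
        frac *= 2
        if frac >= 1:
            bits.append("1")
            frac -= 1
        else:
            bits.append("0")
    return "0." + "".join(bits)

print(fraction_to_binary(0.625))  # 0.101
```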

Signed Number Representations

Representing negative numbers in binary requires a convention that distinguishes positive from negative values. Several methods exist, each with distinct advantages for different applications.

Sign-Magnitude Representation

Sign-magnitude uses the most significant bit (MSB) as a sign indicator: 0 for positive, 1 for negative. The remaining bits represent the magnitude of the number. For an 8-bit representation, +5 is 00000101 and -5 is 10000101.

While intuitive for humans, sign-magnitude presents problems for hardware implementation. It has two representations for zero (+0 and -0), complicating comparisons. More significantly, arithmetic operations require separate circuits for addition and subtraction because the sign bit must be handled differently from magnitude bits.

Sign-magnitude finds use primarily in floating-point representations and some specialized applications where its simplicity outweighs its arithmetic complexity.

One's Complement

One's complement represents negative numbers by inverting all bits of the positive number. To negate a value, change all 0s to 1s and all 1s to 0s. For 8 bits, +5 (00000101) becomes -5 (11111010).

Addition in one's complement works by adding numbers as unsigned values, then adding any carry-out back to the result (end-around carry). This allows the same hardware to perform both addition and subtraction, since subtracting B from A equals adding the one's complement of B to A.

One's complement still has two representations for zero: 00000000 (+0) and 11111111 (-0). This dual zero and the complexity of end-around carry have led to its replacement by two's complement in most modern systems, though it still appears in some networking applications such as IP header checksums.

The range of an n-bit one's complement number is -(2^(n-1) - 1) to +(2^(n-1) - 1). For 8 bits, this is -127 to +127.

Two's Complement

Two's complement has become the dominant representation for signed integers in digital systems due to its elegant properties that simplify arithmetic hardware. To negate a number, take the one's complement (invert all bits) and add 1.

For 8 bits: +5 = 00000101, one's complement = 11111010, two's complement (add 1) = 11111011 = -5.

The key advantages of two's complement include:

  • Single zero representation: Only 00000000 represents zero, eliminating comparison ambiguity
  • Symmetric arithmetic: Addition and subtraction use the same hardware; subtraction is addition of the negated value
  • Overflow detection: Overflow occurs when the carry into the sign bit differs from the carry out
  • Extended range: The range is -(2^(n-1)) to +(2^(n-1) - 1), one more negative value than one's complement

For 8 bits, two's complement represents -128 to +127. The most negative value (-128 = 10000000) has no positive counterpart within the same bit width, a characteristic that programmers must consider when negating values.

Reading a two's complement number: if the MSB is 0, interpret directly as positive. If the MSB is 1, the number is negative; negate it (invert and add 1) to find the magnitude.

Sign extension converts a two's complement number to a wider representation by replicating the sign bit. Extending 8-bit -5 (11111011) to 16 bits produces 1111111111111011, preserving the value. This property simplifies operations between values of different widths.
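
Encoding, decoding, and sign extension can all be sketched with bit masking in Python (function names are illustrative):

```python
def twos_complement(value, bits=8):
    """Encode a signed integer as its two's-complement bit pattern."""
    assert -(1 << (bits - 1)) <= value < (1 << (bits - 1)), "out of range"
    return value & ((1 << bits) - 1)  # masking wraps negatives around

def from_twos_complement(pattern, bits=8):
    """Decode a two's-complement bit pattern back to a signed integer."""
    if pattern & (1 << (bits - 1)):   # MSB set: the value is negative
        return pattern - (1 << bits)
    return pattern

print(format(twos_complement(-5, 8), "08b"))    # 11111011
print(from_twos_complement(0b11111011, 8))      # -5
# Sign extension: re-encoding at a wider width preserves the value
print(format(twos_complement(-5, 16), "016b"))  # 1111111111111011
```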

Excess-N (Biased) Representation

Excess-N representation adds a fixed bias to all values, shifting the range so that the encoded minimum value corresponds to all zeros. This representation simplifies comparisons because numeric ordering matches the binary ordering of encoded values.

Common biases include excess-127 (used in IEEE 754 single-precision floating-point exponents) and excess-1023 (double-precision exponents). In excess-127, the value 0 encodes as 01111111 (127 in unsigned binary), while -127 encodes as 00000000 and +128 encodes as 11111111.

Excess-N finds primary use in floating-point exponent fields where it enables simple magnitude comparison of floating-point numbers by comparing their bit patterns as unsigned integers.

Floating-Point Representation

Floating-point representation enables computers to handle very large and very small numbers with a consistent relative precision. By separating a number into a significand (mantissa) and an exponent, floating-point can represent a vast range of values that would be impractical with fixed-point notation.

IEEE 754 Standard

The IEEE 754 standard defines floating-point formats that have become universal in computing. The standard specifies single precision (32 bits), double precision (64 bits), and extended precision formats, along with rounding modes, special values, and exception handling.

An IEEE 754 number consists of three fields:

  • Sign bit (S): 0 for positive, 1 for negative
  • Exponent (E): Biased representation of the power of 2
  • Fraction (F): The fractional part of the significand

The value represented is: (-1)^S x 1.F x 2^(E - bias)

The "1." before the fraction is implicit (not stored), providing an extra bit of precision. This normalization requires that every nonzero number be adjusted so that its binary representation has exactly one 1 before the binary point.

Single Precision (32-bit)

Single-precision floating-point allocates 1 sign bit, 8 exponent bits, and 23 fraction bits. The exponent uses excess-127 bias, allowing actual exponents from -126 to +127 (stored exponent values 0 and 255 are reserved for special cases such as zero, denormals, infinity, and NaN).

Key characteristics:

  • Approximate decimal precision: 7 significant digits
  • Approximate range: 1.2 x 10^-38 to 3.4 x 10^38
  • Smallest positive normal number: 2^-126
  • Machine epsilon: 2^-23 (approximately 1.19 x 10^-7)

Example: Converting 12.375 to IEEE 754 single precision:

  1. Convert to binary: 12.375 = 1100.011
  2. Normalize: 1.100011 x 2^3
  3. Sign bit: 0 (positive)
  4. Exponent: 3 + 127 = 130 = 10000010
  5. Fraction: 10001100000000000000000 (23 bits after the implicit 1)
  6. Result: 0 10000010 10001100000000000000000 = 0x41460000
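
The result of the worked conversion can be checked with Python's struct module, which exposes the IEEE 754 bit pattern:

```python
import struct

def float_to_bits(x):
    """Return the IEEE 754 single-precision bit pattern of x as an int."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

bits = float_to_bits(12.375)
print(hex(bits))                           # 0x41460000
print(bits >> 31)                          # sign: 0
print(format((bits >> 23) & 0xFF, "08b"))  # exponent: 10000010
print(format(bits & 0x7FFFFF, "023b"))     # fraction field
```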

Double Precision (64-bit)

Double-precision floating-point provides: 1 sign bit, 11 exponent bits, and 52 fraction bits. The exponent uses excess-1023 bias.

Key characteristics:

  • Approximate decimal precision: 15-16 significant digits
  • Approximate range: 2.2 x 10^-308 to 1.8 x 10^308
  • Smallest positive normal number: 2^-1022
  • Machine epsilon: 2^-52 (approximately 2.22 x 10^-16)

Double precision is the default for most scientific and engineering calculations, providing sufficient range and precision for the vast majority of applications.

Special Values

IEEE 754 reserves certain bit patterns for special values:

  • Zero: Exponent and fraction both zero. IEEE 754 has both +0 and -0, which compare equal but may produce different results in some operations.
  • Infinity: Exponent all ones, fraction all zeros. Results from overflow or division by zero. Positive and negative infinity exist.
  • NaN (Not a Number): Exponent all ones, fraction nonzero. Results from undefined operations like 0/0, infinity minus infinity, or square root of a negative number. NaN propagates through calculations and compares unequal to everything, including itself.
  • Denormalized numbers: Exponent zero, fraction nonzero. Represent values smaller than the minimum normal number, gradually underflowing to zero. The implicit leading 1 becomes 0 for denormalized numbers.

Floating-Point Arithmetic Considerations

Floating-point arithmetic introduces subtleties that digital designers must understand:

  • Rounding errors: Most decimal fractions cannot be exactly represented in binary floating-point, leading to small errors that can accumulate
  • Associativity loss: (a + b) + c may not equal a + (b + c) due to rounding at different magnitudes
  • Catastrophic cancellation: Subtracting nearly equal numbers loses significant digits of precision
  • Overflow and underflow: Results exceeding the representable range become infinity or zero

IEEE 754 defines four rounding modes: round to nearest (even), round toward positive infinity, round toward negative infinity, and round toward zero. Round to nearest even is the default and minimizes statistical bias in repeated calculations.

Binary-Coded Decimal (BCD)

Binary-coded decimal represents each decimal digit with its four-bit binary equivalent, preserving the decimal structure within a binary system. While less efficient than pure binary, BCD simplifies decimal display and avoids the conversion errors inherent in binary-to-decimal translation.

BCD Encoding

In BCD, each decimal digit 0-9 maps to its natural binary equivalent using four bits:

  • 0 = 0000, 1 = 0001, 2 = 0010, 3 = 0011, 4 = 0100
  • 5 = 0101, 6 = 0110, 7 = 0111, 8 = 1000, 9 = 1001

The combinations 1010 through 1111 are unused in standard BCD, making it less space-efficient than binary. A two-digit BCD number requires 8 bits but can only represent 0-99 (100 values), while 8 bits of binary can represent 0-255 (256 values).

Example: The decimal number 47 in BCD is 0100 0111 (4 = 0100, 7 = 0111).

BCD Arithmetic

BCD arithmetic requires correction when results exceed the valid range 0-9. After adding two BCD digits, if the result exceeds 9 or produces a carry, add 6 (0110) to correct the result.

Example: Adding 8 + 5 in BCD:

  • 1000 + 0101 = 1101 (13 in binary, but invalid BCD)
  • Add correction: 1101 + 0110 = 10011
  • Result: 0001 0011 = 13 in BCD (tens digit 1, units digit 3)

Many processors include a decimal adjust instruction that performs this correction automatically after binary addition, enabling efficient BCD arithmetic.
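
The add-6 correction can be sketched as a single-digit BCD adder in Python (the function name and return convention are illustrative):

```python
def bcd_add_digit(a, b, carry_in=0):
    """Add two BCD digits (0-9). If the binary sum exceeds 9,
    add 6 (0110) to correct; returns (carry_out, bcd_digit)."""
    s = a + b + carry_in
    if s > 9:
        s += 6                 # decimal-adjust correction
        return 1, s & 0x0F     # carry out, corrected low nibble
    return 0, s

carry, digit = bcd_add_digit(0b1000, 0b0101)  # 8 + 5
print(carry, format(digit, "04b"))            # 1 0011 -> BCD 13
```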

Packed and Unpacked BCD

Packed BCD stores two decimal digits per byte, maximizing storage efficiency. This format is common in financial applications and databases where decimal accuracy is paramount.

Unpacked BCD stores one decimal digit per byte, typically in the lower nibble with the upper nibble set to zero or used for a zone field. This format simplifies processing at the cost of doubling storage requirements.

ASCII digits (0x30 to 0x39) form a type of unpacked BCD, with the upper nibble serving as a zone that identifies the character as a digit.

Applications of BCD

  • Financial calculations: Banks and accounting systems use BCD to avoid the rounding errors of binary floating-point in monetary calculations
  • Digital displays: Seven-segment displays connect naturally to BCD, with dedicated decoder chips converting BCD to segment patterns
  • Real-time clocks: Many RTC chips store time values in BCD format
  • Measurement instruments: Multimeters and other instruments often use BCD internally to maintain decimal precision

Gray Code

Gray code is a binary numeral system where successive values differ in only one bit. This single-bit change property makes Gray code valuable in applications where intermediate states during transitions could cause errors.

Gray Code Sequence

The 4-bit Gray code sequence differs from natural binary:

4-bit Gray Code
Decimal   Binary   Gray
0         0000     0000
1         0001     0001
2         0010     0011
3         0011     0010
4         0100     0110
5         0101     0111
6         0110     0101
7         0111     0100
8         1000     1100
9         1001     1101
10        1010     1111
11        1011     1110
12        1100     1010
13        1101     1011
14        1110     1001
15        1111     1000

Conversion Algorithms

Binary to Gray: The most significant bit remains the same. Each subsequent Gray bit is the XOR of the corresponding binary bit and the binary bit to its left.

Gray[n] = Binary[n] (for MSB)

Gray[i] = Binary[i+1] XOR Binary[i] (for other bits)

Gray to Binary: The most significant bit remains the same. Each subsequent binary bit is the XOR of the corresponding Gray bit and the previously computed binary bit.

Binary[n] = Gray[n] (for MSB)

Binary[i] = Binary[i+1] XOR Gray[i] (for other bits)
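
The per-bit XOR rules above collapse into compact integer operations: XOR with the value shifted right by one for binary-to-Gray, and an XOR fold for the inverse. A Python sketch:

```python
def binary_to_gray(n):
    """Each Gray bit is the XOR of a binary bit and the bit to its
    left, which collapses to n XOR (n >> 1)."""
    return n ^ (n >> 1)

def gray_to_binary(g):
    """Invert by XOR-folding: each binary bit is the XOR of all
    Gray bits at or above its position."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

print(format(binary_to_gray(0b1010), "04b"))  # 1111
print(format(gray_to_binary(0b1111), "04b"))  # 1010
```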

Applications

  • Rotary encoders: Position sensors use Gray code on encoder discs to prevent ambiguous readings when the sensor reads between tracks
  • Karnaugh maps: Gray code ordering ensures adjacent cells differ by one variable, simplifying Boolean minimization
  • Analog-to-digital converters: Some ADC designs use Gray code to reduce glitches during transitions
  • FIFO pointers: Asynchronous FIFO designs use Gray code for read and write pointers crossing clock domains
  • Genetic algorithms: Gray code encoding improves crossover and mutation operators in optimization problems

Excess-3 Code

Excess-3 (XS-3) is a self-complementing BCD code where each decimal digit is represented by its binary equivalent plus 3. This representation has the valuable property that the nine's complement of a number can be obtained by simply inverting all bits.

Excess-3 Encoding

Excess-3 Code
Decimal   BCD    Excess-3
0         0000   0011
1         0001   0100
2         0010   0101
3         0011   0110
4         0100   0111
5         0101   1000
6         0110   1001
7         0111   1010
8         1000   1011
9         1001   1100

Self-Complementing Property

The nine's complement of a decimal digit d is (9 - d). In excess-3, the one's complement (bit inversion) of any digit produces its nine's complement:

  • 0 (0011) inverted = (1100) = 9
  • 3 (0110) inverted = (1001) = 6
  • 4 (0111) inverted = (1000) = 5

This property simplifies decimal subtraction hardware by allowing subtraction through addition of the nine's complement plus one (similar to two's complement for binary).
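
The self-complementing property can be demonstrated for all ten digits with a short Python sketch (function names are illustrative):

```python
def excess3(d):
    """Excess-3 code for a decimal digit: binary of d + 3."""
    return (d + 3) & 0x0F

def nines_complement_via_invert(code):
    """Inverting the four excess-3 bits yields the nine's
    complement of the encoded digit."""
    return ((~code) & 0x0F) - 3

for d in range(10):
    assert nines_complement_via_invert(excess3(d)) == 9 - d
print(nines_complement_via_invert(excess3(4)))  # 5, i.e. 9 - 4
```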

Historical Context

Excess-3 was widely used in early electronic calculators and computers, particularly in machines that performed decimal arithmetic. While less common in modern systems, understanding excess-3 provides insight into the design considerations of early computing and the value of self-complementing codes.

Character Encoding: ASCII

The American Standard Code for Information Interchange (ASCII) established a standard mapping between characters and numeric codes, enabling text interchange between different computer systems. Originally a 7-bit code (128 characters), ASCII has become the foundation for modern character encoding.

ASCII Structure

ASCII divides its 128 code points into distinct regions:

  • Control characters (0-31, 127): Non-printable characters for communication control, including NUL (0), TAB (9), LF/newline (10), CR/carriage return (13), and DEL (127)
  • Punctuation and symbols (32-47, 58-64, 91-96, 123-126): Space (32) and the punctuation marks surrounding the digit and letter ranges
  • Digits (48-57): Characters '0' through '9'. Note that subtracting 48 (or 0x30) converts an ASCII digit to its numeric value
  • Uppercase letters (65-90): 'A' through 'Z'. Adding 32 converts to lowercase
  • Lowercase letters (97-122): 'a' through 'z'. Subtracting 32 converts to uppercase

ASCII Properties

ASCII's design includes several useful properties:

  • Numeric ordering: Character codes for digits and letters are sequential, enabling simple comparisons and sorting
  • Case conversion: Uppercase and lowercase letters differ only in bit 5 (value 32), enabling efficient case conversion with a single XOR or bit operation
  • Digit extraction: ASCII digits 0-9 occupy codes 48-57, so masking off the upper nibble (ANDing with 0x0F) yields the numeric value
  • Control characters: Control characters can be represented as Ctrl+letter, where the control character code equals the letter code with bits 5 and 6 cleared
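
These bit-level properties are easy to confirm in Python, where ord and chr expose the character codes:

```python
# ASCII bit tricks, one per property above
print(ord('7') - 0x30)       # digit to value by subtraction: 7
print(ord('7') & 0x0F)       # same, by masking the lower nibble: 7
print(chr(ord('a') ^ 0x20))  # toggle bit 5 (value 32) to change case: A
print(chr(ord('A') + 32))    # uppercase to lowercase: a
print(ord('G') & 0x1F)       # clear bits 5 and 6: Ctrl+G = 7 (BEL)
```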

Extended ASCII

Various 8-bit extensions to ASCII use codes 128-255 for additional characters. These extensions are not standardized, leading to the "code page" system where different regions use different mappings. Common extensions include:

  • ISO 8859-1 (Latin-1): Western European characters, accented letters
  • Windows-1252: Similar to Latin-1 with additional characters in the 128-159 range
  • Code page 437: Original IBM PC character set with line drawing and box characters

These incompatible extensions led to the development of Unicode as a universal character encoding standard.

Character Encoding: Unicode

Unicode provides a universal character set encompassing virtually all writing systems, symbols, and emoji used worldwide. Unlike ASCII, Unicode separates the concept of character identity (code points) from encoding (byte representation), enabling flexibility in storage and transmission.

Unicode Code Points

Unicode assigns each character a unique code point, written as U+XXXX (hexadecimal). The current Unicode standard defines over 140,000 characters across more than 150 scripts. Code points range from U+0000 to U+10FFFF, organized into 17 planes of 65,536 characters each:

  • Basic Multilingual Plane (BMP, Plane 0): U+0000 to U+FFFF, containing most commonly used characters
  • Supplementary Planes (1-16): U+10000 to U+10FFFF, containing less common scripts, historical characters, and emoji

The first 128 Unicode code points match ASCII exactly, ensuring backward compatibility with ASCII text.

UTF-8 Encoding

UTF-8 is a variable-length encoding that represents Unicode code points using one to four bytes. Its design maintains ASCII compatibility while efficiently encoding the full Unicode range:

  • 1 byte (0xxxxxxx): ASCII characters (U+0000 to U+007F)
  • 2 bytes (110xxxxx 10xxxxxx): Latin, Greek, Cyrillic, Hebrew, Arabic (U+0080 to U+07FF)
  • 3 bytes (1110xxxx 10xxxxxx 10xxxxxx): BMP characters including CJK (U+0800 to U+FFFF)
  • 4 bytes (11110xxx 10xxxxxx 10xxxxxx 10xxxxxx): Supplementary characters (U+10000 to U+10FFFF)

UTF-8 advantages include ASCII compatibility, self-synchronization (byte sequence boundaries are unambiguous), and no byte-order issues. It has become the dominant encoding for web content and Unix-like systems.
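
The variable byte lengths are visible directly in Python, whose str.encode produces UTF-8 bytes:

```python
# One character from each UTF-8 length class
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:  # A, é, €, emoji
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded.hex()}")
# U+0041 -> 1 byte, U+00E9 -> 2 bytes, U+20AC -> 3 bytes, U+1F600 -> 4 bytes
```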

UTF-16 and UTF-32

UTF-16 uses 16-bit code units. BMP characters require one code unit; supplementary characters require two (a surrogate pair). UTF-16 is used internally by Windows, Java, and JavaScript. Surrogate pairs use special code points (U+D800 to U+DFFF) reserved for this purpose.

UTF-32 uses 32-bit code units, directly encoding each Unicode code point. While simple and fixed-width, UTF-32 requires four bytes per character, making it memory-inefficient for most text. It finds use in internal processing where fixed-width simplifies algorithms.

Both UTF-16 and UTF-32 have byte-order variants (big-endian and little-endian), often indicated by a Byte Order Mark (BOM) at the beginning of the data.

Error-Detecting Codes

Error-detecting codes add redundancy to data, enabling receivers to detect (and sometimes correct) errors introduced during storage or transmission. The design of these codes balances detection capability against overhead.

Parity Bits

The simplest error-detecting code adds a single parity bit to each data word. The parity bit is set so that the total number of 1s in the combined word is either always even (even parity) or always odd (odd parity).

For even parity, the parity bit is the XOR of all data bits. If any single bit flips during transmission, the received parity will be incorrect, indicating an error.
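
The XOR-of-all-bits rule can be sketched as:

```python
def even_parity_bit(data):
    """Parity bit for even parity: the XOR of all data bits."""
    p = 0
    while data:
        p ^= data & 1   # fold each bit into the running parity
        data >>= 1
    return p

print(even_parity_bit(0b1011001))  # four 1s -> parity bit 0
print(even_parity_bit(0b1011101))  # five 1s -> parity bit 1
```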

Limitations of single-bit parity:

  • Detects only odd numbers of bit errors (1, 3, 5, ...)
  • Cannot detect even numbers of errors (2, 4, 6, ...)
  • Cannot identify which bit is in error (no correction possible)

Despite limitations, parity remains useful for simple applications and as a component of more sophisticated schemes. Memory systems often use parity to detect single-bit errors in RAM.

Checksums

Checksums compute a summary value from a block of data, appended to the block for verification. Simple checksums add all data values (bytes or words) and store the result. The receiver computes the same sum and compares.

Common checksum algorithms:

  • Simple sum: Add all bytes, typically keeping only the lower bits. Can miss errors that cancel (swapping bytes or offsetting errors)
  • One's complement sum: Used in IP headers; add with end-around carry, enabling the receiver to include the checksum in the sum (result should be all 1s)
  • Fletcher's checksum: Maintains two running sums for position sensitivity, detecting byte reordering
  • Adler-32: Similar to Fletcher but with different moduli, used in zlib compression
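
The one's complement sum with end-around carry can be sketched in Python; the 16-bit data words here are illustrative, not a real packet header:

```python
def ones_complement_checksum(words):
    """Internet-style checksum over 16-bit words: one's-complement
    sum (end-around carry), then invert. A receiver summing the data
    plus the checksum should get 0xFFFF if no error is detected."""
    total = 0
    for w in words:
        total += w
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return (~total) & 0xFFFF

data = [0x4500, 0x0073, 0x0000, 0x4000]
cksum = ones_complement_checksum(data)
# Receiver-side verification: the sum including the checksum is all 1s
total = sum(data) + cksum
while total >> 16:
    total = (total & 0xFFFF) + (total >> 16)
print(hex(cksum), hex(total))  # checksum, then 0xffff
```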

Cyclic Redundancy Check (CRC)

CRC treats data as a polynomial and computes the remainder when divided by a generator polynomial. CRCs provide strong error detection with efficient hardware implementation using shift registers and XOR gates.

Common CRC polynomials:

  • CRC-8: 8-bit CRC for small data blocks
  • CRC-16: Used in Modbus, USB, and other protocols
  • CRC-32: Used in Ethernet, ZIP files, PNG images; polynomial 0x04C11DB7

CRC properties:

  • Detects all single-bit errors
  • Detects all double-bit errors (with proper polynomial selection)
  • Detects all odd numbers of bit errors (if polynomial includes factor x+1)
  • Detects burst errors up to the CRC length

CRC computation can be implemented efficiently in hardware using linear feedback shift registers (LFSRs), or in software using lookup tables.
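
A bit-at-a-time software sketch of CRC-32 (the reflected form of polynomial 0x04C11DB7, as used by Ethernet, ZIP, and PNG) mirrors the shift-and-XOR structure of the hardware LFSR:

```python
def crc32_bitwise(data):
    """Bit-at-a-time CRC-32 over bytes, using the reflected
    polynomial 0xEDB88320 with initial value and final XOR of
    all ones, as in Ethernet/ZIP/PNG."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xEDB88320  # shift and XOR polynomial
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF

print(hex(crc32_bitwise(b"123456789")))  # 0xcbf43926, the standard check value
```

Table-driven implementations precompute the inner 8-bit loop for each possible byte, trading 1 KB of lookup table for roughly an 8x speedup.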

Hamming Codes

Hamming codes not only detect errors but can correct single-bit errors and detect (but not correct) double-bit errors (SECDED: Single Error Correction, Double Error Detection).

Hamming codes place check bits at power-of-2 positions (1, 2, 4, 8, ...). Each check bit covers specific data positions based on binary representation. When an error occurs, the check bits form a syndrome pointing to the error position.

For a (7,4) Hamming code with 4 data bits and 3 check bits:

  • Position 1 (check): covers positions 1, 3, 5, 7
  • Position 2 (check): covers positions 2, 3, 6, 7
  • Position 3 (data)
  • Position 4 (check): covers positions 4, 5, 6, 7
  • Positions 5, 6, 7 (data)

The syndrome (binary number formed by failed check bits) indicates the position of the error, enabling correction by flipping that bit. ECC memory uses Hamming codes or similar schemes to correct single-bit errors in RAM.
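
The (7,4) code above can be sketched in Python, with check bits computed over the listed position groups and the syndrome pointing at a flipped bit (function names are illustrative):

```python
def hamming74_encode(d):
    """Encode 4 data bits into a 7-bit codeword. Check bits sit at
    positions 1, 2, 4 and give even parity over their groups."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4                    # covers positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4                    # covers positions 3, 6, 7
    p4 = d2 ^ d3 ^ d4                    # covers positions 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]  # codeword positions 1..7

def hamming74_syndrome(c):
    """Recompute the checks; the syndrome value is the 1-based
    position of a single-bit error (0 means no error detected)."""
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]   # positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]   # positions 2, 3, 6, 7
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]   # positions 4, 5, 6, 7
    return s4 * 4 + s2 * 2 + s1

code = hamming74_encode([1, 0, 1, 1])
code[4] ^= 1                      # corrupt position 5 (a data bit)
pos = hamming74_syndrome(code)
print(pos)                        # 5: the syndrome locates the error
code[pos - 1] ^= 1                # flip it back to correct
```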

Data Compression Codes

Data compression codes reduce the size of data for efficient storage and transmission. Compression can be lossless (perfect reconstruction) or lossy (approximate reconstruction with smaller size).

Run-Length Encoding (RLE)

RLE replaces sequences of repeated values with a count and the value. Effective for data with long runs of identical values, such as simple graphics or sparse data.

Example: "AAAAAABBBCC" becomes "6A3B2C" or in binary form, a sequence of count-value pairs.

RLE appears in PCX graphics format, fax transmission (Group 3 fax), and as a component of more complex schemes. It can actually expand data if runs are short, so practical implementations include escape codes or only encode runs above a minimum length.
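
A minimal RLE encoder can be sketched as follows; real formats add escape codes so short runs don't expand the data:

```python
def rle_encode(s):
    """Run-length encode: replace each run of identical characters
    with its count followed by the character."""
    out = []
    i = 0
    while i < len(s):
        j = i
        while j < len(s) and s[j] == s[i]:
            j += 1              # extend the current run
        out.append(f"{j - i}{s[i]}")
        i = j
    return "".join(out)

print(rle_encode("AAAAAABBBCC"))  # 6A3B2C
```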

Huffman Coding

Huffman coding assigns shorter codes to more frequent symbols and longer codes to less frequent symbols, minimizing average code length. The code is prefix-free: no code is a prefix of another, enabling unambiguous decoding.

Huffman tree construction:

  1. Create a leaf node for each symbol with its frequency
  2. Repeatedly combine the two lowest-frequency nodes into a new internal node
  3. Continue until one root node remains
  4. Assign 0 to left branches and 1 to right branches (or vice versa)
  5. Read codes by traversing from root to each leaf

Huffman coding is optimal for symbol-by-symbol encoding but cannot exploit patterns across symbols. It forms the basis of JPEG, MP3, and many other compression formats.
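
The construction steps above can be sketched with a priority queue; the frequency values here are illustrative:

```python
import heapq

def huffman_codes(freqs):
    """Build a prefix-free Huffman code from {symbol: frequency}
    by repeatedly merging the two lowest-frequency nodes."""
    # Heap entries: (frequency, tiebreaker, {symbol: code-so-far})
    heap = [(f, i, {sym: ""})
            for i, (sym, f) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        # Prepend 0 for the left branch, 1 for the right branch
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, count, merged))
        count += 1
    return heap[0][2]

codes = huffman_codes({"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5})
for sym in sorted(codes):
    print(sym, codes[sym])
# The most frequent symbol gets the shortest code, and no code
# is a prefix of another.
```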

Arithmetic Coding

Arithmetic coding represents an entire message as a single number between 0 and 1, achieving compression closer to the theoretical entropy limit than Huffman coding. Each symbol narrows the range based on its probability, with the final number uniquely identifying the message.

Arithmetic coding can approach optimal compression for any probability distribution and naturally handles adaptive models where probabilities change based on context. It is used in JPEG 2000, H.264/H.265 video, and modern compression algorithms.

Dictionary-Based Compression

Dictionary-based methods replace repeated patterns with references to a dictionary of previously seen patterns:

  • LZ77: Uses a sliding window as the dictionary; matches reference a position and length in recent data. Used in gzip, DEFLATE, and PNG.
  • LZ78/LZW: Builds an explicit dictionary during encoding; references are dictionary indices. Used in GIF images and Unix compress.

Modern algorithms like LZMA (used in 7-Zip and xz) combine dictionary compression with sophisticated entropy coding for high compression ratios.

Practical Considerations

Choosing Number Representations

Selecting appropriate number representations depends on application requirements:

  • Fixed-point vs. floating-point: Fixed-point offers deterministic timing and simpler hardware but limited range; floating-point provides wide range and automatic scaling but introduces rounding complexities
  • Signed representation: Two's complement for general-purpose computing; sign-magnitude for floating-point significands; biased for floating-point exponents
  • Precision requirements: Match word size to needed precision while considering memory and performance implications
  • BCD vs. binary: Use BCD when decimal precision is critical and display conversion is frequent; use binary for general computation

Endianness

Multi-byte values can be stored with the most significant byte first (big-endian) or least significant byte first (little-endian). This affects data interchange between systems:

  • Big-endian: Natural reading order; used in network protocols (network byte order), some processors (Motorola 68k, SPARC)
  • Little-endian: Efficient for incrementing multi-byte values; used in x86, ARM (configurable), most modern processors

When exchanging binary data between systems, explicit conversion using functions like htons/ntohs (host-to-network short) ensures correct interpretation regardless of native byte order.
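
Python's struct module makes the two byte orders explicit, which is a convenient way to see the difference:

```python
import struct

value = 0x12345678
# ">" selects big-endian packing, "<" little-endian
big = struct.pack(">I", value)
little = struct.pack("<I", value)
print(big.hex())     # 12345678 (MSB first: network byte order)
print(little.hex())  # 78563412 (LSB first: x86 native order)
# socket.htonl / socket.ntohl perform the same host/network conversion
```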

Code Selection for Specific Applications

  • Position encoding: Gray code for encoders and sensors
  • Error detection in storage: CRC for block verification, ECC for bit-level correction
  • Communication protocols: CRC for frame checking, possibly with forward error correction for unreliable channels
  • Text storage: UTF-8 for modern applications, ASCII for legacy compatibility
  • Financial calculations: BCD or decimal floating-point to preserve exact decimal values

Summary

Number systems and codes form the vocabulary of digital electronics. Binary provides the foundation, directly mapping to electronic switching, while hexadecimal and octal offer human-readable compact notation. Signed number representations enable arithmetic on positive and negative values, with two's complement dominating due to its elegant properties. Floating-point extends the range of representable values at the cost of increased complexity and potential for rounding errors.

Specialized codes address specific needs: BCD maintains decimal precision, Gray code ensures single-bit transitions, and character codes like ASCII and Unicode enable text representation. Error-detecting codes from simple parity to sophisticated CRCs protect data integrity, while compression codes reduce storage and transmission requirements.

Mastery of these concepts enables digital designers to select appropriate representations for each application, convert between systems confidently, and understand the tradeoffs inherent in different encoding choices. Whether designing arithmetic logic units, communication protocols, or data storage systems, the principles of number systems and codes underpin every aspect of digital electronics.

Further Reading

  • Explore Boolean algebra and logic gates to understand how arithmetic operations are implemented in hardware
  • Study computer architecture to see how number representations affect processor design
  • Investigate digital signal processing for applications of fixed-point and floating-point arithmetic
  • Learn about communication systems and protocols that rely on error-detecting codes
  • Examine memory systems to understand ECC and data integrity mechanisms