Serializer-Deserializer Design
Serializer-deserializer circuits, commonly known as SerDes, form the critical interface between parallel digital logic operating at moderate clock rates and high-speed serial communication channels. By converting wide parallel data buses into narrow serial streams and vice versa, SerDes enables multi-gigabit data transmission over a minimal number of signal lines while maintaining compatibility with standard digital design practices and manufacturing processes.
The serializer takes parallel data words from the digital domain and produces a continuous stream of serial bits at the line rate. Conversely, the deserializer accepts the incoming serial stream and reconstructs the original parallel words for processing by digital logic. This seemingly straightforward conversion encompasses sophisticated timing, synchronization, and signal processing challenges that determine whether a communication link achieves reliable operation at target data rates.
Parallel-to-Serial Conversion
The serializer transforms parallel data words into a serial bit stream by transmitting each bit sequentially at a much higher rate than the parallel interface clock. This conversion enables high aggregate bandwidth over a single differential pair, eliminating the timing skew and routing complexity that plague wide parallel buses at high speeds.
Multiplexer Tree Architecture
The most common serializer architecture employs a tree of multiplexers that progressively combine parallel inputs into the final serial output. A typical implementation begins with a parallel data word, perhaps 8 or 10 bits wide, and uses multiple stages of 2:1 multiplexers to produce the serial bit stream. Each multiplexer stage halves the number of data paths while doubling the data rate, ultimately producing a single output operating at the full line rate.
The multiplexer tree requires careful attention to timing at each stage. Clock signals for each multiplexer level must arrive with precise phase relationships to ensure correct bit ordering and avoid glitches at output transitions. Retiming flip-flops between stages help maintain signal integrity and provide well-defined timing boundaries. The final output stage typically employs current-mode logic or other high-speed circuit techniques capable of clean switching at multi-gigabit rates.
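As an illustration of the tree structure, the following behavioral sketch (Python, with hypothetical function names) models each 2:1 stage as an interleave of two half-rate bit streams; it captures the bit ordering of the tree, not circuit-level timing.

```python
# Behavioral sketch of a 2:1 multiplexer-tree serializer (bit ordering only).
# Each recursion level corresponds to one mux stage that interleaves two
# half-rate streams into a stream at twice the rate.

def interleave(even_stream, odd_stream):
    """Model of a 2:1 mux output: alternate bits from its two inputs."""
    out = []
    for e, o in zip(even_stream, odd_stream):
        out.extend([e, o])
    return out

def mux_tree_serialize(bits):
    """Even-indexed bits feed one half-rate branch, odd-indexed bits the other;
    the output mux interleaves the two branches at the full line rate."""
    if len(bits) == 1:
        return list(bits)
    return interleave(mux_tree_serialize(bits[0::2]),
                      mux_tree_serialize(bits[1::2]))

# An 8-bit word needs three stages; bit 0 of the parallel word is sent first.
word = [1, 0, 1, 1, 0, 0, 1, 0]
print(mux_tree_serialize(word))   # -> [1, 0, 1, 1, 0, 0, 1, 0]
```

Only the outermost interleave corresponds to hardware running at the full line rate; each deeper recursion level models a stage running at half the rate of the one above it.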
Shift Register Serialization
An alternative serialization approach loads parallel data into a shift register and clocks out bits sequentially. When the parallel load signal activates, the entire data word transfers into the register simultaneously. The high-speed serial clock then shifts bits out one at a time until the register empties, at which point a new parallel word loads.
Shift register serializers offer conceptual simplicity and straightforward timing analysis. However, the need to operate the entire shift register at the line rate can limit maximum speed compared to multiplexer trees where only the final stages run at full speed. Hybrid architectures combine shift register and multiplexer approaches to balance complexity, speed, and power consumption.
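A minimal behavioral sketch of the parallel-load, serial-shift operation, assuming LSB-first transmission and illustrative class and method names:

```python
# Parallel-in serial-out (PISO) serializer sketch: the word is captured in one
# parallel-clock cycle, then shifted out one bit per serial-clock cycle.

class PisoSerializer:
    def __init__(self, width=8):
        self.width = width
        self.reg = [0] * width
        self.count = 0          # bits shifted out since the last load

    def load(self, word_bits):
        """Parallel load: capture the whole word on the parallel clock edge."""
        assert len(word_bits) == self.width
        self.reg = list(word_bits)
        self.count = 0

    def shift(self):
        """Serial clock edge: emit the next bit and shift the register."""
        bit = self.reg[0]
        self.reg = self.reg[1:] + [0]
        self.count += 1
        return bit

# Usage: emit one 8-bit word as a serial stream, LSB first.
ser = PisoSerializer()
ser.load([1, 0, 1, 1, 0, 0, 1, 0])
stream = [ser.shift() for _ in range(8)]   # -> [1, 0, 1, 1, 0, 0, 1, 0]
```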
Data Path Timing
Proper timing through the serializer ensures that each bit occupies its correct position in the output stream. The parallel input data must remain stable during the serialization window, typically requiring setup and hold times relative to the parallel clock. Pipeline registers may add latency but provide cleaner timing boundaries and enable higher operating frequencies.
The relationship between parallel and serial clocks must maintain precise phase alignment. Any phase drift causes bits to shift position in the output stream, corrupting the data. Phase-locked loops generate the high-speed serial clock from a lower-frequency reference while maintaining the required phase relationship with the parallel interface clock.
Serial-to-Parallel Conversion
The deserializer performs the inverse operation, accepting a continuous serial bit stream and reconstructing the original parallel data words. This process requires not only capturing each bit but also determining where word boundaries occur in the continuous stream and maintaining proper alignment despite variations in the incoming signal.
Demultiplexer Architecture
Demultiplexer-based deserializers reverse the multiplexer tree structure used in serializers. The incoming serial stream feeds a tree of 1:2 demultiplexers that progressively separate the data into parallel paths. Each stage operates at half the rate of the preceding stage, ultimately producing a parallel word at the target output rate.
Critical timing requirements include proper sampling of the incoming data at each demultiplexer stage and correct routing based on the clock phase. The recovered clock must align precisely with bit transitions to sample data at the center of each bit period, maximizing noise margin and tolerance to jitter.
Shift Register Deserialization
Shift register deserializers clock incoming bits into a serial-in parallel-out shift register at the line rate. After accumulating a complete word, the parallel outputs transfer to a holding register for downstream processing while the shift register continues accepting new bits. This ping-pong arrangement maintains continuous operation without gaps.
The shift register approach requires operating at the full line rate throughout the register, which may limit maximum speed. However, the architecture provides straightforward implementation and clear timing relationships. Proper word alignment requires additional logic to identify word boundaries and control the parallel load timing.
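The same style of sketch on the receive side shows the ping-pong hand-off: bits accumulate in a shift register, and each complete word is copied to a holding register while shifting continues. Names and widths are illustrative.

```python
# Serial-in parallel-out (SIPO) deserializer sketch with a holding register.

class SipoDeserializer:
    def __init__(self, width=8):
        self.width = width
        self.shift_reg = []
        self.held_word = None   # last complete word, for the parallel interface

    def push_bit(self, bit):
        """Serial clock edge: shift one bit in; transfer a full word when ready."""
        self.shift_reg.append(bit)
        if len(self.shift_reg) == self.width:
            self.held_word = list(self.shift_reg)   # parallel transfer
            self.shift_reg = []                     # keep accepting new bits
        return self.held_word

# Usage: recover two 8-bit words from a 16-bit stream.
rx = SipoDeserializer()
stream = [1, 0, 1, 1, 0, 0, 1, 0,  0, 1, 1, 0, 1, 0, 0, 1]
words = []
for b in stream:
    rx.push_bit(b)
    if len(rx.shift_reg) == 0:          # a word boundary was just crossed
        words.append(rx.held_word)
# words -> [[1,0,1,1,0,0,1,0], [0,1,1,0,1,0,0,1]]
```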
Oversampling Techniques
Some deserializer architectures sample the incoming data at multiples of the bit rate, typically 2x, 4x, or higher oversampling ratios. Multiple samples per bit period enable digital processing to determine optimal sampling points and adapt to changing channel conditions. The additional samples also facilitate phase interpolation for fine-grained clock adjustment.
Oversampling relaxes analog circuit requirements by allowing digital algorithms to select the best sample from multiple candidates. However, the approach increases power consumption and circuit complexity due to the higher sampling rate. The trade-off between analog precision and digital processing depends on the target data rate, power budget, and available circuit technology.
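A rough Python sketch of 4x blind oversampling illustrates the idea of picking the sample farthest from detected transitions; a practical implementation adapts continuously and handles noise, which this toy version ignores.

```python
# Minimal sketch of 4x blind oversampling: histogram where transitions fall,
# then take the sample roughly half a bit period away from the edge phase.

def recover_bits_4x(samples):
    """samples: list of 0/1 taken at 4x the bit rate; returns recovered bits."""
    edge_hist = [0, 0, 0, 0]
    for i in range(1, len(samples)):
        if samples[i] != samples[i - 1]:
            edge_hist[i % 4] += 1           # phase at which the new bit starts
    edge_phase = edge_hist.index(max(edge_hist))
    center_phase = (edge_phase + 2) % 4     # roughly the center of the bit
    return [samples[i] for i in range(center_phase, len(samples), 4)]

# Usage: bits 1,0,1,1 sampled 4x (each bit seen four times).
samples = [1]*4 + [0]*4 + [1]*4 + [1]*4
print(recover_bits_4x(samples))   # -> [1, 0, 1, 1]
```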
Clock Multiplication
Serializer-deserializer circuits require high-frequency clocks derived from lower-frequency references. Clock multiplication generates the line-rate clock needed for serialization while maintaining synchronization with system timing. The quality of this clock directly impacts transmit jitter and overall link performance.
Integer-N Phase-Locked Loops
Integer-N PLLs multiply the reference frequency by an integer factor N by dividing the VCO output and comparing it to the reference in a phase detector. The loop filter converts the phase error into a control voltage that adjusts the VCO frequency until the divided output matches the reference. Once locked, the VCO output provides a clock at N times the reference frequency.
The multiplication factor determines the relationship between reference frequency and output frequency. For example, a 100 MHz reference with N=100 produces a 10 GHz output clock suitable for 10 Gbps serialization. The PLL loop bandwidth trades off jitter filtering against acquisition speed and reference spur suppression.
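The following simplified phase-domain model (Python, illustrative gains, not a tuned design) steps an integer-N loop once per reference cycle and shows the divided VCO settling at N times the reference.

```python
# Discrete-time, phase-domain sketch of an integer-N PLL. Each reference cycle,
# the divided VCO phase is compared to the reference phase and a proportional-
# integral control nudges the VCO frequency toward N * f_ref.

f_ref = 100e6          # 100 MHz reference
N = 100                # multiplication factor -> 10 GHz target
f_free = 9.5e9         # VCO free-running frequency, deliberately off target
f_vco = f_free
vco_phase = 0.0        # phases counted in cycles
ref_phase = 0.0
integ = 0.0
kp, ki = 2e8, 2e7      # illustrative loop gains (Hz per cycle of phase error)

for _ in range(2000):
    ref_phase += 1.0                       # one reference period elapses
    vco_phase += f_vco / f_ref             # VCO cycles elapsed in that period
    err = ref_phase - vco_phase / N        # phase error after the divide-by-N
    integ += err
    f_vco = f_free + kp * err + ki * integ # control steers the VCO frequency

print(round(f_vco / 1e9, 3))   # -> ~10.0 GHz, i.e. N times the reference
```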
Fractional-N Synthesis
Fractional-N PLLs achieve non-integer frequency multiplication by dynamically modulating the divider ratio. A delta-sigma modulator controls the divider, alternating between integer values to produce an average division ratio with fractional precision. This enables generating specific frequencies not achievable with integer multiplication alone.
The modulated divider introduces quantization noise that appears as phase noise on the output. Careful delta-sigma modulator design shapes this noise to frequencies where the loop filter attenuates it. Higher-order modulators push noise to higher frequencies for better suppression but require more complex implementation and careful stability analysis.
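A first-order accumulator captures the basic dithering mechanism: it overflows at a rate set by the fractional part, bumping the divide value from N to N+1 so that the average ratio equals N plus the fraction. Higher-order delta-sigma modulators replace this accumulator in practice; the parameters below are illustrative.

```python
# First-order delta-sigma style control of a fractional-N divider.

def fractional_divider_sequence(n_int, frac, cycles, bits=24):
    """Return per-cycle divide values whose average is n_int + frac (0 <= frac < 1)."""
    step = int(frac * (1 << bits))
    acc = 0
    divides = []
    for _ in range(cycles):
        acc += step
        carry = acc >> bits            # 1 on accumulator overflow
        acc &= (1 << bits) - 1
        divides.append(n_int + carry)  # divide by N or N+1 this cycle
    return divides

seq = fractional_divider_sequence(100, 0.25, 10000)
print(sum(seq) / len(seq))   # -> 100.25, the average division ratio
```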
Jitter Considerations
Clock jitter directly translates to transmit jitter on the serial output, potentially causing bit errors at the receiver. The PLL must attenuate jitter on the reference clock while contributing minimal jitter from its own circuits. Voltage-controlled oscillator phase noise often dominates the jitter budget, making VCO design critical for high-performance SerDes.
Reference jitter within the PLL bandwidth passes through to the output, while jitter above the bandwidth is attenuated. Conversely, VCO phase noise within the bandwidth is suppressed by the feedback loop, but noise above the bandwidth appears directly on the output. Optimal bandwidth selection balances these competing requirements based on the spectral characteristics of reference and VCO noise.
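In the standard linearized PLL model, with G(s) denoting the open-loop gain around the phase detector, loop filter, VCO, and divider, these two behaviors correspond to complementary transfer functions (a textbook small-signal result, stated here for reference):

```latex
\[
  \frac{\phi_{\mathrm{out}}}{\phi_{\mathrm{ref}}}(s) \;=\; N\,\frac{G(s)}{1+G(s)}
  \qquad \text{(low-pass: reference jitter passes below the loop bandwidth)}
\]
\[
  \frac{\phi_{\mathrm{out}}}{\phi_{\mathrm{vco}}}(s) \;=\; \frac{1}{1+G(s)}
  \qquad \text{(high-pass: VCO phase noise is suppressed below the loop bandwidth)}
\]
```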
Multi-Phase Clock Generation
Many SerDes architectures require multiple clock phases to control different serialization stages. A common approach divides the VCO output to generate quadrature phases (0, 90, 180, 270 degrees) or finer phase increments. These phases enable time-interleaved operations and provide the timing signals for multiplexer trees.
Phase interpolators generate intermediate phases by combining adjacent outputs from the divider or VCO. Digital control of interpolator weights enables fine phase adjustment for clock and data recovery applications. The interpolator resolution and linearity affect the precision of phase control and contribute to deterministic jitter.
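An idealized model of interpolation between a 0 degree and a 90 degree clock shows why linearity is a concern: equal weight steps do not produce equal phase steps. The code below is a mathematical sketch, not a circuit model.

```python
# Idealized phase interpolator: blend the 0-deg clock (weight 1 - w) with the
# 90-deg clock (weight w) and report the phase of the resulting waveform.

import math

def interpolated_phase_deg(code, steps=16):
    """Phase produced by a digital code selecting weight w = code / steps."""
    w = code / steps
    return math.degrees(math.atan2(w, 1.0 - w))

for code in (0, 4, 8, 12, 16):
    print(code, round(interpolated_phase_deg(code), 1))
# -> 0.0, 18.4, 45.0, 71.6, 90.0: linear weights give nonlinear phase steps,
#    one contributor to the interpolator linearity error noted above.
```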
Clock Recovery
The deserializer must recover timing information from the incoming serial data stream since most high-speed serial links do not include a separate clock signal. Clock and data recovery circuits generate a sampling clock synchronized to bit transitions in the data, enabling correct data capture despite frequency differences between transmitter and receiver reference oscillators.
Phase-Locked Loop CDR
Traditional CDR architectures employ a phase-locked loop that adjusts a local oscillator to track the incoming data timing. A phase detector compares data transitions to clock edges, producing an error signal that drives the loop filter. The filtered error controls the VCO or phase interpolator, adjusting timing until the clock samples data at optimal points.
The CDR loop bandwidth determines tracking characteristics. Wider bandwidth enables faster acquisition and better tracking of low-frequency phase variations but passes more high-frequency jitter to the sampling clock. Narrower bandwidth provides better jitter filtering but slower response to frequency changes. Protocol requirements and channel characteristics guide bandwidth selection.
Bang-Bang Phase Detectors
Bang-bang phase detectors make binary early/late decisions by comparing data transition timing to clock edges. Each data transition produces either an "early" or "late" indication regardless of the actual phase error magnitude. The loop integrates these decisions to drive the clock phase toward the optimal sampling point.
The binary nature of bang-bang detection creates limit cycle oscillations called dithering jitter, as the loop cannot settle precisely at zero phase error but instead hunts around it. Dither amplitude depends on loop gain and update rate. Despite this limitation, bang-bang detectors offer robust operation, simple implementation, and tolerance to input amplitude variations.
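A proportional-only bang-bang loop is easy to model and makes the hunting behavior visible; real CDRs add an integral path for frequency tracking, which is omitted here. The step size and target phase are arbitrary.

```python
# Proportional-only bang-bang CDR sketch: a binary early/late decision moves
# the sampling phase by a fixed step each update, so the loop dithers around
# the optimal point rather than settling exactly on it.

def run_bang_bang_cdr(optimal_phase=23.7, phase_step=0.5, n_updates=200):
    phase = 0.0
    history = []
    for _ in range(n_updates):
        late = phase > optimal_phase           # binary decision, no magnitude
        phase += -phase_step if late else +phase_step
        history.append(phase)
    return history

traj = run_bang_bang_cdr()
print(min(traj[-20:]), max(traj[-20:]))   # -> 23.5 24.0: hunting about the target
```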
Linear Phase Detectors
Linear phase detectors produce an output proportional to the phase error magnitude, enabling smoother tracking without dithering jitter. These detectors typically compare the analog signal level at sampling instants to reference thresholds, inferring phase error from the sampled amplitude. The Mueller-Muller baud-rate detector is a common example of this approach, in contrast to the binary Alexander detector described above.
Linear detection requires more sophisticated analog circuitry and may exhibit sensitivity to signal amplitude variations. Some architectures combine linear detection for steady-state operation with bang-bang detection for acquisition, capturing the benefits of both approaches across different operating conditions.
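For reference, one common form of the Mueller-Muller timing-error function combines the current and previous samples with the corresponding symbol decisions; sign conventions vary between references, and this sketch follows one of them.

```python
# One common form of the Mueller-Muller baud-rate timing-error function.

def mm_phase_error(y_curr, y_prev, d_curr, d_prev):
    """y_*: sampled amplitudes; d_*: decided symbol values (+1/-1).
    The sign of the result indicates early versus late sampling."""
    return d_prev * y_curr - d_curr * y_prev
```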
Frequency Acquisition
Before phase tracking can begin, the CDR must acquire the approximate frequency of the incoming data. Reference-based CDRs compare the recovered frequency to a local crystal reference, limiting the required acquisition range to the tolerance of the two oscillators (typically hundreds of parts per million). Referenceless CDRs must acquire frequency from the data alone, requiring wider tuning range and more sophisticated acquisition algorithms.
Frequency detection circuits monitor phase error trends to distinguish frequency offset from phase noise. Consistent early or late indications suggest frequency error requiring correction, while random early/late patterns indicate phase noise around the correct frequency. Acquisition state machines guide the CDR through frequency acquisition before transitioning to phase tracking mode.
Word Alignment
After recovering individual bits from the serial stream, the deserializer must determine where word boundaries occur. Without explicit framing, the continuous bit sequence could be interpreted starting at any of N positions for an N-bit word, only one of which represents the correct alignment. Word alignment mechanisms identify the proper boundaries and maintain alignment during operation.
Alignment Challenges
The serial bit stream carries no inherent indication of word boundaries. Consider an 8-bit parallel word: the deserializer produces 8 bits per parallel clock cycle, but which 8 consecutive bits constitute a valid word depends on alignment. Misalignment by even one bit position produces completely incorrect data, typically appearing as random garbage at the receiver output.
Alignment must be established during link initialization and maintained throughout operation. Any slip in alignment, whether from bit errors, clock glitches, or other disturbances, corrupts all subsequent data until realignment occurs. Robust systems include mechanisms to detect alignment loss and reinitiate the alignment process.
Training Sequences
Many protocols define specific training sequences transmitted during link initialization that the receiver uses to establish alignment. These sequences contain recognizable patterns that produce unique outputs only when correctly aligned. The receiver shifts alignment incrementally until detecting the expected pattern, then transitions to normal operation.
Training sequences may also serve other purposes including clock recovery lock verification, equalization adaptation, and link quality assessment. The duration of training affects link startup time and must accommodate worst-case conditions while not unduly delaying normal operation. Some protocols support periodic retraining to readapt to changing channel conditions.
Barrel Shifter Implementation
A barrel shifter enables rapid alignment adjustment by selecting which bit positions map to the parallel output. The shifter accepts more bits than the word width (typically double) and selects a contiguous subset based on an alignment offset. Changing the offset by one position shifts all output bits by one position in the serial stream.
The barrel shifter operates at the parallel clock rate rather than the line rate, simplifying implementation. Alignment control logic monitors for proper alignment and adjusts the offset when misalignment is detected. The shift can occur dynamically during operation if protocols support it, or only during initialization for simpler implementations.
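A minimal sketch of the selection operation, assuming the shifter holds the two most recent deserialized words and a list-of-bits representation:

```python
# Barrel-shifter word alignment sketch: keep the last two deserialized words
# (2 * width bits) and select a width-bit window at a programmable offset.

def barrel_select(prev_word, curr_word, offset, width=10):
    """Return `width` bits starting `offset` bits into the concatenated window."""
    window = prev_word + curr_word          # lists of bits, oldest first
    assert 0 <= offset < width
    return window[offset:offset + width]

# Usage: a 10-bit word straddles two raw deserializer outputs; offset 3 realigns it.
W = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]          # the correctly framed word
raw_a = [0, 0, 0] + W[:7]                   # framing is off by three bits
raw_b = W[7:] + [1, 1, 1, 1, 1, 1, 1]
assert barrel_select(raw_a, raw_b, offset=3) == W
```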
Comma Detection
Comma characters provide explicit markers in encoded data streams that enable reliable word alignment without special training sequences. Line codes like 8b/10b define specific bit patterns called commas that do not appear within any valid encoded character, allowing unambiguous identification regardless of alignment.
Comma Character Properties
A valid comma character must be recognizable regardless of word alignment, meaning the comma pattern cannot appear as any rotation or combination of other valid code words. In 8b/10b encoding, the K28.5 character serves as the primary comma with distinctive patterns that satisfy this uniqueness requirement. The comma also maintains DC balance properties required by the encoding.
Commas appear in the data stream to mark packet boundaries, provide idle patterns during gaps between data, and enable receiver alignment. Protocols specify when commas must be transmitted and how receivers should respond upon detecting them. Some protocols use comma sequences rather than single commas for more robust alignment.
Detection Circuit Implementation
Comma detection circuits compare incoming bit patterns against known comma values. Since alignment is unknown during initial acquisition, the detector must examine all possible alignments simultaneously. A bank of comparators checks for comma matches at each possible offset, flagging when a match occurs.
Upon detecting a comma, the alignment logic adjusts the barrel shifter offset to place the comma at the proper word boundary. Once aligned, the detector continues monitoring for commas to verify alignment remains correct. Detecting commas at unexpected positions indicates loss of alignment requiring correction.
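The sketch below scans every offset of a bit window for the 7-bit comma core of K28.5 and its complement. The bit values and ordering assume the conventional a-bit-first 8b/10b transmission order and should be adapted to the convention actually in use.

```python
# Parallel comma detection sketch: check every bit offset of a sliding window.

COMMA_NEG = [0, 0, 1, 1, 1, 1, 1]      # comma core of K28.5 at negative disparity
COMMA_POS = [1, 1, 0, 0, 0, 0, 0]      # complementary core at positive disparity

def find_comma_offsets(window):
    """Return every offset in `window` (a list of bits) where a comma starts."""
    hits = []
    for off in range(len(window) - len(COMMA_NEG) + 1):
        candidate = window[off:off + 7]
        if candidate == COMMA_NEG or candidate == COMMA_POS:
            hits.append(off)
    return hits

# Usage: a misaligned stream containing K28.5 (negative disparity) = 0011111010.
stream = [1, 0, 1] + [0, 0, 1, 1, 1, 1, 1, 0, 1, 0] + [0, 1, 1, 0]
print(find_comma_offsets(stream))   # -> [3]; set the barrel shifter to offset 3
```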
Alignment State Machine
An alignment state machine coordinates the acquisition and maintenance of word alignment. Initial states search for comma characters while trying different alignments. Upon finding a comma, the machine adjusts alignment and transitions to a verification state that confirms consistent comma detection. Only after verification does the machine declare alignment achieved and enable data output.
During normal operation, the state machine monitors for alignment errors indicated by commas appearing at wrong positions or other error conditions. Configurable thresholds determine how many errors trigger realignment, balancing sensitivity to genuine problems against robustness to transient disturbances. The machine may attempt local correction before falling back to full reacquisition.
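A minimal state machine with hypothetical state names and thresholds illustrates the search, verification, and locked phases along with the error-count fallback:

```python
# Alignment state machine sketch: SEARCH until a comma is found, VERIFY that
# several commas land on the same boundary, then LOCKED with an error counter
# that forces reacquisition. Thresholds are illustrative, not protocol values.

def alignment_fsm(events, verify_needed=3, error_limit=4):
    """events: iterable of (comma_seen, on_expected_boundary) pairs, one per word."""
    state, good, errors = "SEARCH", 0, 0
    trace = []
    for comma_seen, on_boundary in events:
        if state == "SEARCH":
            if comma_seen:
                state, good = "VERIFY", 1      # barrel-shifter offset adjusted here
        elif state == "VERIFY":
            if comma_seen and on_boundary:
                good += 1
                if good >= verify_needed:
                    state = "LOCKED"
            elif comma_seen:
                state, good = "SEARCH", 0      # comma moved: start over
        elif state == "LOCKED":
            if comma_seen and not on_boundary:
                errors += 1
                if errors >= error_limit:
                    state, good, errors = "SEARCH", 0, 0
            elif comma_seen:
                errors = 0                     # a good comma clears the count
        trace.append(state)
    return trace

events = [(False, False), (True, True), (True, True), (True, True), (False, False)]
print(alignment_fsm(events)[-1])   # -> "LOCKED"
```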
Comma Masking and Filtering
Not all comma detections necessarily indicate boundaries requiring alignment action. Data patterns might occasionally mimic comma sequences, particularly in systems without strict encoding rules. Comma filtering applies additional criteria to distinguish true alignment markers from coincidental matches.
Comma masking allows portions of the comma pattern to vary while still triggering detection. This accommodates protocols where comma variations indicate different boundary types or carry additional information. The mask pattern specifies which bits must match exactly and which can assume any value while still indicating a comma.
Elastic Buffers and Rate Matching
Even after clock recovery locks, slight frequency differences between transmitter and receiver persist due to reference oscillator tolerances. These differences cause the receiver to gain or lose bits over time relative to the transmitter. Elastic buffers accommodate this rate mismatch by adding or removing data during non-critical intervals.
FIFO Buffer Architecture
The elastic buffer implements a first-in first-out memory between the recovered clock domain and the local clock domain. Data enters the FIFO at the recovered rate and exits at the local rate. The FIFO depth absorbs rate variations, increasing when the receiver runs slow and decreasing when it runs fast relative to the transmitter.
The FIFO must be large enough to accommodate rate differences over the interval between adjustments. Nominal operation maintains approximately half-full status, providing margin in both directions. Threshold detectors monitor fill level and trigger corrective action when approaching overflow or underflow conditions.
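A behavioral sketch with illustrative depth and thresholds shows the fill-level monitoring that drives skip and insert requests, which are discussed next:

```python
# Elastic buffer sketch: writes in the recovered-clock domain, reads in the
# local-clock domain, half-full as the nominal operating point.

from collections import deque

class ElasticBuffer:
    def __init__(self, depth=16):
        self.fifo = deque()
        self.depth = depth
        self.high = depth * 3 // 4      # approaching overflow: request a skip
        self.low = depth // 4           # approaching underflow: request an insert

    def write(self, word):
        """Recovered-clock domain: push one word (drop it if truly full)."""
        if len(self.fifo) < self.depth:
            self.fifo.append(word)

    def read(self, idle_word="IDLE"):
        """Local-clock domain: pop one word, substituting idle if empty."""
        return self.fifo.popleft() if self.fifo else idle_word

    def correction_needed(self):
        """Tell the rate-matching logic which adjustment to schedule, if any."""
        fill = len(self.fifo)
        if fill >= self.high:
            return "SKIP"      # remove a removable (idle) word at the next chance
        if fill <= self.low:
            return "INSERT"    # insert an idle word at the next chance
        return None
```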
Skip and Insert Operations
When the elastic buffer approaches overflow, a skip operation removes data to reduce fill level. When approaching underflow, an insert operation adds data to increase fill level. These operations must occur at positions where the added or removed data does not corrupt the information being transferred.
Protocols typically designate specific character types or positions eligible for skip and insert operations. Idle characters between packets provide natural adjustment opportunities. Some protocols embed special skip-ordered sets that receivers recognize and use for rate matching. The adjustment mechanism must maintain alignment and error checking across the operation.
Error Detection and Monitoring
SerDes circuits include mechanisms to detect errors and monitor link quality. These features support debugging during development, manufacturing test, and runtime health monitoring. Early detection of degrading conditions enables corrective action before errors corrupt data.
Code Violation Detection
Encoded data streams should contain only valid code words. When the deserializer detects a bit pattern that does not correspond to any valid code, a code violation has occurred indicating a transmission error. Error counters accumulate violation counts, and threshold comparisons trigger alarms when error rates exceed acceptable levels.
Code violations may indicate various problems including excessive noise, inadequate equalization, clock recovery issues, or alignment loss. The pattern of violations can help diagnose root causes. Burst errors suggest transient disturbances while steady error rates point to marginal signal quality requiring investigation.
Disparity Error Detection
Encoded systems like 8b/10b track running disparity to ensure DC balance. Each code word either maintains neutral disparity or must alternate with words of opposite disparity. Disparity errors occur when the running count indicates improper sequencing, revealing errors that might not produce invalid codes.
Disparity checking provides additional error detection beyond code validation. Some errors produce valid codes with incorrect disparity, detectable only through disparity tracking. The combination of code and disparity checking improves overall error detection coverage.
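A simplified whole-word disparity check illustrates the rule; full 8b/10b encoding also constrains the 6-bit and 4-bit sub-blocks, which this sketch omits.

```python
# Simplified running-disparity check: each 10-bit code word should be balanced
# or carry a +/-2 imbalance, and an imbalanced word must drive the running
# disparity back toward neutral.

def check_disparity(words, rd=-1):
    """words: list of 10-bit code words (lists of bits). Returns the error count."""
    errors = 0
    for w in words:
        imbalance = sum(w) - (len(w) - sum(w))     # ones minus zeros
        if imbalance == 0:
            continue                               # neutral word: RD unchanged
        if imbalance == +2 and rd == -1:
            rd = +1                                # allowed: RD moves positive
        elif imbalance == -2 and rd == +1:
            rd = -1                                # allowed: RD moves negative
        else:
            errors += 1                            # wrong sign, or +/-4, +/-6, ...
    return errors
```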
Eye Margin Monitoring
Eye margin monitoring deliberately samples data at non-optimal positions to characterize available timing and voltage margin. By measuring error rates as sampling points move away from center, the system maps the boundaries of the data eye. This information reveals how much margin exists against various impairments.
Runtime eye monitoring enables tracking signal quality changes over time due to temperature drift, aging, or environmental factors. Declining margins trigger alerts before errors actually occur, enabling proactive maintenance. The monitoring function may run continuously at low impact or periodically with more thorough characterization.
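The scan itself is simple to express; the sketch below uses a toy error model purely as a stand-in for real sampled error counts.

```python
# Timing-margin scan sketch: sweep the sampling phase away from center and
# record errors at each offset; the zero-error span is the horizontal opening.

import random

def bit_errors_at_offset(offset_ui, n_bits=10_000, eye_half_width=0.35):
    """Toy stand-in for measured errors: error probability rises once the
    sampling offset leaves an assumed eye opening of +/-0.35 UI."""
    random.seed(0)
    p_err = 0.0 if abs(offset_ui) < eye_half_width else \
            min(0.5, (abs(offset_ui) - eye_half_width) * 5)
    return sum(random.random() < p_err for _ in range(n_bits))

def scan_eye(step_ui=0.05, span_ui=0.5):
    offsets = [i * step_ui for i in range(-int(span_ui / step_ui),
                                          int(span_ui / step_ui) + 1)]
    return {round(off, 2): bit_errors_at_offset(off) for off in offsets}

margins = scan_eye()
open_region = [off for off, errs in margins.items() if errs == 0]
print(min(open_region), max(open_region))   # horizontal eye opening edges, in UI
```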
Implementation Considerations
Practical SerDes design involves numerous trade-offs across performance, power, area, and complexity. The target application's requirements guide these decisions, with different emphases for low-power mobile devices versus high-performance data center equipment.
Power Optimization
SerDes circuits often dominate power consumption in high-speed systems. The analog circuits including drivers, receivers, and VCOs require careful design to minimize power while maintaining performance. Power scales with data rate, making efficiency increasingly critical at higher speeds.
Architectural choices affect power consumption significantly. Serializer multiplexer depth, deserializer oversampling ratio, and equalization complexity all impact power. Low-power modes that reduce functionality during idle periods conserve energy when full performance is unnecessary. Advanced process technologies enable lower power at equivalent performance levels.
Process Technology Selection
The choice of semiconductor process technology affects achievable data rates, power consumption, and integration options. Finer geometry processes enable faster transistors and lower power but may compromise analog circuit performance. Some designs use mixed processes, placing digital logic in advanced nodes while implementing sensitive analog circuits in optimized processes.
Process variations affect SerDes performance across manufacturing lots and operating conditions. Robust designs include calibration and adaptation mechanisms that compensate for variations. Characterization across process corners validates operation under worst-case conditions and establishes specifications achievable in production.
Testing and Characterization
SerDes testing verifies both parametric performance and functional correctness. Parametric tests measure jitter, voltage levels, rise times, and other analog characteristics. Functional tests verify data integrity across various patterns and operating conditions. Built-in self-test features enable testing without expensive external equipment.
Loopback modes connect transmitter output to receiver input for self-testing without external connections. Different loopback points test different portions of the circuit. Pattern generators and checkers built into the SerDes enable bit error rate testing at speed. These features support manufacturing test and field diagnostics.
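A PRBS7 generator and self-synchronizing checker of the kind typically built in for loopback testing can be sketched as follows; the seed value and framing are illustrative.

```python
# PRBS7 (x^7 + x^6 + 1) pattern generator and self-synchronizing error checker.

def prbs7_generate(n_bits, state=0x7F):
    """Fibonacci LFSR: output is bit 6; the XOR of bits 6 and 5 enters bit 0."""
    out = []
    for _ in range(n_bits):
        out_bit = (state >> 6) & 1
        feedback = ((state >> 6) ^ (state >> 5)) & 1
        out.append(out_bit)
        state = ((state << 1) | feedback) & 0x7F
    return out

def prbs7_count_errors(rx_bits):
    """Self-synchronizing check: every bit must equal the XOR of the bits 7 and
    6 positions earlier, so no explicit seed alignment is required."""
    return sum(1 for k in range(7, len(rx_bits))
               if rx_bits[k] != (rx_bits[k - 7] ^ rx_bits[k - 6]))

# Loopback-style usage: a clean stream checks error-free; a flipped bit does not.
tx = prbs7_generate(1000)
assert prbs7_count_errors(tx) == 0
tx[500] ^= 1
print(prbs7_count_errors(tx))   # -> 3 (one line error hits three parity checks)
```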
Summary
Serializer-deserializer design encompasses the fundamental techniques enabling high-speed digital communication over serial links. Parallel-to-serial conversion through multiplexer trees or shift registers creates the serial bit stream, while serial-to-parallel conversion reconstructs the original data. Clock multiplication generates the high-speed timing required for serialization, and clock recovery extracts timing from the incoming data stream.
Word alignment ensures correct interpretation of the continuous bit stream, with comma detection providing explicit markers for reliable synchronization. Elastic buffers handle rate differences between transmitter and receiver clocks. Error detection and monitoring maintain link integrity and enable diagnosis of problems. Together, these techniques form the foundation of modern high-speed serial communication, enabling the multi-gigabit interfaces that connect components within systems and systems to networks.