Router Architecture

Router architecture is shown in the figure below :

Router Architecture

Four router components can be identified :

  1. Input Ports : An input port performs several key functions. It performs the physical layer function of terminating an incoming physical link at a router; this is shown in the leftmost box of the input port and the rightmost box of the output port in the figure above. An input port also performs link-layer functions needed to interoperate with the link layer at the other side of the incoming link; this is represented by the middle boxes in the input and output ports. Perhaps most crucially, the lookup function is also performed at the input port; this will occur in the rightmost box of the input port. It is here that the forwarding table is consulted to determine the router output port to which an arriving packet will be forwarded via the switching fabric. Control packets (for example, packets carrying routing protocol information) are forwarded from an input port to the routing processor. Note that the term port here – referring to the physical input and output router interfaces – is distinctly different from the software ports associated with network applications and sockets.
  2. Switching Fabric : The switching fabric connects the router’s input ports to its output ports. This switching fabric is completely contained within the router – a network inside of a network router!
  3. Output Ports : An output port stores packets received from the switching fabric and transmits these packets on the outgoing link by performing the necessary link-layer and physical-layer functions. When a link is bidirectional (that is, carries traffic in both directions), an output port will typically be paired with the input port for that link on the same line card (a printed circuit board containing one or more input ports, which is connected to the switching fabric).
  4. Routing Processor : The routing processor executes the routing protocols, maintains routing tables and attached link state information, and computes the forwarding table for the router. It also performs network management functions.

A router’s input ports, output ports, and switching fabric together implement the forwarding functions and are almost always implemented in hardware, as shown in the figure above.

These forwarding functions are sometimes collectively referred to as the router forwarding plane. To appreciate why a hardware implementation is needed, consider that with a 10 Gbps input link and a 64-byte IP datagram, the input port has only 51.2 ns to process the datagram before another datagram may arrive. If N ports are combined on a line card (as is often done in practice), the datagram-processing pipeline must operate N times faster, which is far too fast for a software implementation. Forwarding plane hardware can be implemented either using a router vendor’s own hardware designs, or constructed using purchased merchant-silicon chips (e.g. as sold by companies such as Intel and Broadcom).
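The 51.2 ns figure follows directly from the link rate and the datagram size. As a quick check, the arithmetic can be sketched in Python. The function name and the `ports_per_card` parameter are illustrative, and the model assumes back-to-back datagrams with no framing overhead.

```python
def per_datagram_budget_ns(link_rate_bps: float, datagram_bytes: int,
                           ports_per_card: int = 1) -> float:
    """Time available to process one datagram, in nanoseconds.

    Assumes datagrams arrive back to back, so the budget is the time one
    datagram takes to arrive on the link, divided among the ports that
    share a single processing pipeline.
    """
    bits = datagram_bytes * 8
    arrival_time_ns = bits * 1e9 / link_rate_bps
    return arrival_time_ns / ports_per_card


# 64 bytes = 512 bits; 512 bits at 10 Gbps take 51.2 ns to arrive.
print(f"{per_datagram_budget_ns(10e9, 64):.1f} ns")
# Same link rate, but 4 ports sharing one pipeline on a line card.
print(f"{per_datagram_budget_ns(10e9, 64, ports_per_card=4):.1f} ns")
```

With N ports per line card the budget shrinks by a factor of N, which is the "N times faster" requirement mentioned above.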

While the forwarding plane operates at the nanosecond time scale, a router’s control functions (executing the routing protocols, responding to attached links that go up or down, and performing management functions) operate at the millisecond or second timescale. The router control plane functions are usually implemented in software and execute on the routing processor (typically a traditional CPU).

Before delving into the details of a router’s control and data plane, let’s return to our analogy where forwarding is compared to cars entering and leaving an interchange. Let’s suppose that the interchange is a roundabout, and that before a car enters the roundabout, a bit of processing is required – the car stops at an entry station and indicates its final destination (not at the local roundabout, but the ultimate destination of its journey). An attendant at the entry station looks up the final destination, determines the roundabout exit that leads to that final destination, and tells the driver which roundabout exit to take. The car enters the roundabout (which may be filled with other cars entering from other input roads and heading to other roundabout exits) and eventually leaves at the prescribed roundabout exit ramp, where it may encounter other cars leaving the roundabout at that exit.

We can recognize the principal router components from the above figure in this analogy: the entry road and entry station correspond to the input port (with a lookup function to determine the local outgoing port); the roundabout corresponds to the switch fabric; and the roundabout exit road corresponds to the output port. With this analogy, it’s instructive to consider where bottlenecks might occur.

What happens if cars arrive blazingly fast (for example, the roundabout is in Germany or Italy!) but the station attendant is slow? How fast must the attendant work to ensure there’s no backup on an entry road? Even with a blazingly fast attendant, what happens if cars traverse the roundabout slowly – can backups still occur? And what happens if most of the entering cars all want to leave the roundabout at the same exit ramp – can backups occur at the exit ramp or elsewhere? How should the roundabout operate if we want to assign priorities to different cars, or block certain cars from entering the roundabout in the first place? These are all analogous to critical questions faced by router and switch designers.

Input Processing

A detailed view of input processing is given in the figure below :

Router Input Processing

As discussed above, the input port’s line termination function and link-layer processing implement the physical and link layers for that individual input link. The lookup performed in the input port is central to the router’s operation – it is here that the router uses the forwarding table to look up the output port to which an arriving packet will be forwarded via the switch fabric. The forwarding table is computed and updated by the routing processor, with a shadow copy typically stored at each input port. The forwarding table is copied from the routing processor to the line cards over a separate bus (e.g. a PCI bus) indicated by the dashed line from the routing processor to the input line cards.

With a shadow copy, forwarding decisions can be made locally, at each input port, without invoking the centralized routing processor on a per-packet basis, thus avoiding a centralized processing bottleneck.

Given the existence of a forwarding table, lookup is conceptually simple: we just search through the forwarding table looking for the longest prefix match. But at Gigabit transmission rates, this lookup must be performed in nanoseconds.

Thus, not only must lookup be performed in hardware, but techniques beyond a simple linear search through a large table are needed. Special attention must also be paid to memory access times, resulting in designs with embedded on-chip DRAM and faster SRAM (used as a DRAM cache) memories. Ternary Content Addressable Memories (TCAMs) are also often used for lookup. With a TCAM, a 32-bit IP address is presented to the memory, which returns the content of the forwarding table entry for that address in essentially constant time. The Cisco 8500 has a 64K CAM for each input port.
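To make longest prefix matching concrete, here is a minimal software sketch of a TCAM-style table in Python. It is only an illustration: the class name `TernaryTable`, the example prefixes, and the port numbers are assumptions, and the sequential loop stands in for what a real TCAM does by comparing all entries in parallel in hardware.

```python
import ipaddress


class TernaryTable:
    """Software stand-in for a TCAM holding IPv4 forwarding entries.

    Each entry is a (value, mask) pair plus an output port. Entries are
    kept sorted with the longest prefix first, so the first entry that
    matches is the longest prefix match.
    """

    def __init__(self):
        self._entries = []  # tuples of (prefix_len, value, mask, port)

    def add(self, prefix: str, port: int) -> None:
        net = ipaddress.ip_network(prefix)
        self._entries.append(
            (net.prefixlen, int(net.network_address), int(net.netmask), port)
        )
        # Longest prefixes first: this ordering plays the role of TCAM priority.
        self._entries.sort(key=lambda entry: entry[0], reverse=True)

    def lookup(self, address: str):
        addr = int(ipaddress.ip_address(address))
        for _, value, mask, port in self._entries:
            # Bits outside the mask are "don't care" bits.
            if addr & mask == value:
                return port
        return None  # no entry matched, not even a default route


table = TernaryTable()
table.add("200.23.16.0/21", 0)
table.add("200.23.24.0/24", 1)
table.add("200.23.24.0/21", 2)
table.add("0.0.0.0/0", 3)  # default route

# 200.23.24.170 matches both the /24 and the second /21; the /24 wins.
print(table.lookup("200.23.24.170"))
```

The linear scan here is exactly the kind of search that is too slow at line rate; the sketch shows what is being computed, not how hardware achieves it in constant time.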

Once a packet’s output port has been determined via the lookup, the packet can be sent into the switching fabric. In some designs, a packet may be temporarily blocked from entering the switching fabric if packets from other input ports are currently using the fabric. A blocked packet will be queued at the input port and then scheduled to cross the fabric at a later point in time.

Although “lookup” is arguably the most important action in input port processing, many other actions must be taken:

  1. Physical- and link-layer processing must occur
  2. The packet’s version number, checksum and time-to-live field must be checked, and the latter two fields rewritten
  3. Counters used for network management (such as the number of IP datagrams received) must be updated.
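The header checks in step 2 can be sketched as follows for IPv4. This is a simplified illustration, not a complete router implementation: the function names are hypothetical, IP options are not interpreted, and a real router would also send an ICMP message when the time-to-live expires. The field offsets (version and header length in byte 0, TTL in byte 8, checksum in bytes 10 and 11) follow the standard IPv4 header layout.

```python
import struct
from typing import Optional


def internet_checksum(header: bytes) -> int:
    """16-bit one's-complement checksum over the given header bytes."""
    if len(header) % 2:
        header += b"\x00"
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def process_ipv4_header(header: bytes) -> Optional[bytes]:
    """Check version, checksum and TTL; return the rewritten header.

    Returns None when the datagram should be dropped.
    """
    if not header:
        return None
    version = header[0] >> 4
    header_len = (header[0] & 0x0F) * 4
    if version != 4 or header_len < 20 or len(header) < header_len:
        return None
    header = header[:header_len]
    # A valid header, checksum field included, sums to a checksum of zero.
    if internet_checksum(header) != 0:
        return None
    ttl = header[8]
    if ttl <= 1:
        return None  # TTL would reach zero: drop instead of forwarding
    out = bytearray(header)
    out[8] = ttl - 1                  # rewrite the time-to-live field
    out[10:12] = b"\x00\x00"          # zero the checksum before recomputing
    out[10:12] = struct.pack("!H", internet_checksum(bytes(out)))
    return bytes(out)


# Example 20-byte header with TTL 0x40 and a valid checksum.
example = bytes.fromhex("450000730000400040" "11b861c0a80001c0a800c7")
rewritten = process_ipv4_header(example)
print(rewritten.hex() if rewritten else "dropped")
```

Because the TTL changes at every hop, the checksum must be recomputed at every hop as well, which is why both fields are rewritten together.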

The input port steps of looking up an IP address (“match”) and then sending the packet into the switching fabric (“action”) are a specific case of a more general “match plus action” abstraction that is performed in many network devices, not just routers.

In link-layer switches, link-layer destination addresses are looked up and several actions may be taken in addition to sending the frame into the switching fabric towards the output port.

In firewalls, devices that filter out selected incoming packets, an incoming packet whose header matches given criteria (e.g. a combination of source/destination IP addresses and transport-layer port numbers) may be prevented from being forwarded (action).

In a network address translator (NAT), an incoming packet whose transport-layer port number matches a given value will have its port number rewritten before forwarding (action). Thus, the “match plus action” abstraction is both powerful and prevalent in network devices.
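The “match plus action” abstraction can be sketched as a list of rules, each pairing a predicate with a function applied on a match. The `Packet` fields, the `Rule` type, and the example port numbers below are hypothetical and chosen only for illustration; real devices match on many more header fields.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str
    dst_port: int


@dataclass(frozen=True)
class Rule:
    match: Callable[[Packet], bool]                 # the "match" part
    action: Callable[[Packet], Optional[Packet]]    # the "action" part


def apply_rules(rules: List[Rule], packet: Packet) -> Optional[Packet]:
    """Apply the first matching rule; None means the packet is dropped."""
    for rule in rules:
        if rule.match(packet):
            return rule.action(packet)
    return packet  # no rule matched: forward unchanged


rules = [
    # Firewall-style rule: block packets addressed to port 23.
    Rule(match=lambda p: p.dst_port == 23, action=lambda p: None),
    # NAT-style rule: rewrite port 8080 to port 80 before forwarding.
    Rule(match=lambda p: p.dst_port == 8080,
         action=lambda p: replace(p, dst_port=80)),
]

print(apply_rules(rules, Packet("10.0.0.1", "10.0.0.2", 8080)))
```

The same loop covers the firewall and NAT cases described above; only the predicates and actions differ.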

Switching

The switching fabric is at the very heart of a router, as it is through this fabric that the packets are actually switched (that is, forwarded) from an input port to an output port. Switching can be accomplished in a number of ways, as shown in the figure below :

Router Switching

  • Switching via memory : The simplest, earliest routers were traditional computers, with switching between input and output ports being done under direct control of the CPU (routing processor). Input and output ports functioned as traditional I/O devices in a traditional operating system. An input port with an arriving packet first signalled the routing processor via an interrupt. The packet was then copied from the input port into processor memory. The routing processor then extracted the destination address from the header, looked up the appropriate output port in the forwarding table, and copied the packet to the output port’s buffers.

In this scenario, if the memory bandwidth is such that B packets per second can be written into, or read from, memory, then the overall forwarding throughput (the total rate at which packets are transferred from input ports to output ports) must be less than B/2. Note also that two packets cannot be forwarded at the same time, even if they have different destination ports, since only one memory read/write over the shared system bus can be done at a time.

Many modern routers switch via memory. A major difference from early routers, however, is that the lookup of the destination address and the storing of the packet into the appropriate memory location are performed by processing on the input line cards. In some ways, routers that switch via memory look very much like shared-memory multiprocessors, with the processing on a line card switching (writing) packets into the memory of the appropriate output port. Cisco’s Catalyst 8500 series switches forward packets via a shared memory.

  • Switching via a Bus : In this approach, an input port transfers a packet directly to the output port over a shared bus, without intervention by the routing processor. This is typically done by having the input port prepend a switch-internal label (header) to the packet indicating the local output port to which this packet is being transferred and transmitting the packet onto the bus. The packet is received by all output ports, but only the port that matches the label will keep the packet. The label is then removed at the output port, as this label is only used within the switch to cross the bus. If multiple packets arrive at the router at the same time, each at a different input port, all but one must wait since only one packet can cross the bus at a time. Because every packet must cross the single bus, the switching speed of the router is limited to the bus speed: in our roundabout analogy, this is as if the roundabout could only contain one car at a time. Nonetheless, switching via a bus is often sufficient for routers that operate in small local area and enterprise networks. The Cisco 5600 switches packets over a 32 Gbps backplane bus.
  • Switching via an Interconnection Network : One way to overcome the bandwidth limitation of a single, shared bus is to use a more sophisticated interconnection network, such as those that have been used in the past to interconnect processors in a multiprocessor computer architecture. A crossbar switch is an interconnection network consisting of 2N buses that connect N input ports to N output ports, as shown in the figure above. Each vertical bus intersects each horizontal bus at a crosspoint, which can be opened or closed at any time by the switch fabric controller (whose logic is part of the switch fabric itself). When a packet arrives from port A and needs to be forwarded to port Y, the switch controller closes the crosspoint at the intersection of buses A and Y, and port A then sends the packet onto its bus, which is picked up (only) by bus Y. Note that a packet from port B can be forwarded to port X at the same time, since the A-to-Y and B-to-X packets use different input and output buses. Thus, unlike the previous two switching approaches, crossbar networks are capable of forwarding multiple packets in parallel. However, if two packets from two different input ports are destined to the same output port, then one will have to wait at the input, since only one packet can be sent over any given bus at a time.

More sophisticated interconnection networks use multiple stages of switching elements to allow packets from different input ports to proceed towards the same output port at the same time through the switching fabric.
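The crossbar behaviour described above (transfers proceed in parallel unless two inputs want the same output) can be sketched as a per-time-slot scheduling decision. The function name and the first-come arbitration order are assumptions made for illustration; real fabric schedulers use fairer arbitration schemes.

```python
from typing import Dict, Tuple


def schedule_crossbar(requests: Dict[str, str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Decide which head-of-line packets cross the fabric in one time slot.

    `requests` maps each input port to the output port its head-of-line
    packet wants. Returns (granted, waiting): granted transfers use
    distinct input and output buses, so they can proceed in parallel.
    """
    granted: Dict[str, str] = {}
    waiting: Dict[str, str] = {}
    busy_outputs = set()
    for input_port, output_port in requests.items():
        if output_port in busy_outputs:
            # Output bus already in use this slot: queue at the input.
            waiting[input_port] = output_port
        else:
            granted[input_port] = output_port
            busy_outputs.add(output_port)
    return granted, waiting


# A-to-Y and B-to-X use different buses and go in parallel;
# C also wants Y, so it must wait at its input port.
granted, waiting = schedule_crossbar({"A": "Y", "B": "X", "C": "Y"})
print(granted)
print(waiting)
```

Each input appears at most once in `requests`, so only output conflicts need checking; the waiting packets would be offered again in the next time slot.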

Output Processing

Output port processing, shown in the figure below, takes packets that have been stored in the output port’s memory and transmits them over the output link. This includes selecting and de-queuing packets for transmission, and performing the needed link-layer and physical-layer transmission functions.

Router Output Processing