by Anirudh Sivaraman, Suvinay Subramanian, Mohammad Alizadeh, Sharad Chole, Shang-Tse Chuang,
Anurag Agrawal, Hari Balakrishnan, Tom Edsall, Sachin Katti, and Nick McKeown
MIT CSAIL, Cisco Systems, Barefoot Networks, and Stanford University
The Push-In First-Out Queue (PIFO) is a priority queue data structure that allows packets to be "pushed" (enqueued) into an arbitrary location in the queue, but only dequeued from the head. PIFOs enable programmable packet scheduling: by programming how each packet's location in the PIFO is computed as it arrives, PIFOs allow network operators to express a wide variety of practical schedulers. Further, PIFOs are feasible in hardware at the high clock rates (1 GHz) typical of high-end switches today.The PIFO paper Source code
What motivated this work?
Today's switch schedulers are fixed. As an operator, beyond tweaking a few parameters in an existing scheduling algorithm, there is no way to express a new scheduling algorithm once a switch has been built.
This capability to program a switch is useful in many ways. It allows network operators to customize their network by running specific schedulers for specific application objectives. For instance, the Shortest Remaining Processing Time scheduler minimizes flow completion times, Fair Queueing shares constrained link capacity equitably among contending flows, and least slack time first minimizes tail packet delays.
Yet, despite demonstrating significant benefits in simulations, these schedulers haven't been realized in practice because there is no way to modify an existing switch to support a new scheduler. PIFOs aim to change that by making the switch scheduler programmable.
How do PIFOs work?
Any scheduling algorithm makes two decisions: in what order packets are transmitted and when? PIFOs exploit the observation that in many practical schedulers, these decisions can be made as the packet is enqueued. Put differently, a packet's place in the scheduling order or its scheduling time can be determined at enqueue, exploiting the fact that the relative order of already buffered packets does not need to change.
An example of this is strict priority scheduling, where each packet has a priority associated with it. Using this priority, a packet can be enqueued into its right location in an array that maintains scheduling order. Further, packets that are already enqueued don't need to be reordered relative to each other. This sorted array is the PIFO, and a packet's location is determined by writing a program to compute it every time a packet arrives. This program computing packet locations is what makes PIFOs programmable.
Now, there are algorithms where the relative order of already buffered packets does need to change with new packet arrivals. An example is Hierarchical Packet Fair Queueing (HPFQ), which divides link capacity among classes in some ratio, and then recursively in some ration between flows within a class, and so on. However, with a minor change to the PIFO to enqueue either a packet or a pointer to another PIFO, HPFQ can be implemented using a tree of PIFOs. This ability to compose PIFOs together is powerful and allows us to similarly express many hierarchical schedulers. Section 2.2 of the paper has further details.
What can you express using PIFOs?
What can't you express?
PIFOs can't express algorithms where dequeues from a particular flow are limited to a specific maximum rate. While PIFOs can express a maximum flow rate limit before packets are enqueued into a PIFO, they can't do so on the dequeue side. Section 3.5 of the paper explains this.
How practical is a PIFO in hardware?
We implemented a PIFO in Verilog. Our hardware design exploits the fact that most schedulers schedule across flows, not packets. For these schedulers, a packet's priority in a PIFO increases monotonically within a flow. Exploiting this fact, we sort only the head packets of all flows, and store the remaining packets in a bank of first-in first-out queues (FIFOs).
Our hardware implementation is a sorted array of these head packets. An incoming packet from a new flow is inserted in the right location by comparing its priority in parallel to all head packets and then shifting the array to accomodate the new packet. When the PIFO is dequeued, we shift out the top element of the array, fetch the next packet of the just-dequeued flow from the bank of FIFOs, and then insert this next packet into the array.
Our current implementation supports 1024 flows within a PIFO that stores up to 60000 elements. 5 such "PIFO blocks" can implement a 5-level hierarchical scheduler and cost less than 4% in chip area. An ideal hardware implementation would support as many flows as the number of PIFO elements: in the extreme, each PIFO element belongs to a distinct flow. We can currently support up to 2048 flows, so we are still a ways off from 60000 elements/flows. However, we think this can be optimized further.
This work is by
Sachin Katti, and
For questions/comments, please e-mail
anirudh at csail dot mit dot