Marple: Language-directed hardware design for network performance monitoring

Marple is a language for network operators to query network performance. Modeled on familiar functional constructs like map, filter, groupby, and zip, Marple is backed by a new programmable key-value store primitive on switch hardware. The key-value store performs flexible aggregations at line rate (e.g., a moving average of queueing latencies per flow), and scales to millions of keys. We have built a Marple compiler that targets a P4-programmable software switch and a simulator for high-speed programmable switches. Marple can express, on switches, queries that could previously run only on end hosts, and these queries occupy only a modest fraction of a switch’s hardware resources.
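To make the "moving average of queueing latencies per flow" example concrete, here is a minimal Python sketch of the filter, map, and groupby steps. It emulates the semantics only; it is not Marple syntax. The `Packet` record, its field names (`flow`, `tin`, `tout`), the weight `ALPHA`, and the function `ewma_per_flow` are all assumptions made for illustration.

```python
from collections import namedtuple

# Hypothetical packet record; field names are illustrative, not Marple's schema.
Packet = namedtuple("Packet", ["flow", "tin", "tout"])

ALPHA = 0.5  # assumed weight given to the newest sample


def ewma_per_flow(packets, alpha=ALPHA):
    """Emulate filter -> map -> groupby with a per-flow moving average."""
    state = {}  # stands in for the switch key-value store (flow -> average)
    for pkt in packets:
        # filter: skip records whose timestamps are inconsistent
        if pkt.tout < pkt.tin:
            continue
        # map: derive the queueing latency from enqueue/dequeue times
        latency = pkt.tout - pkt.tin
        # groupby: fold this packet into its flow's running average
        prev = state.get(pkt.flow)
        if prev is None:
            state[pkt.flow] = latency
        else:
            state[pkt.flow] = alpha * latency + (1 - alpha) * prev
    return state
```

Each packet touches exactly one key and does a constant amount of work, which is the property a per-packet aggregation needs in order to run at line rate.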

Why have a query language once you have the switch primitives?: The query language provides a way to specify the monitoring intent of the network operator. Without it, programming the raw capabilities of the switch primitives would be challenging, much as programming in assembly language is. A user should think directly about what they want to measure, not about how to collect the necessary measurements. The latter should be the work of a compiler and run-time system. The query specification allows an operator to state complex filtering and aggregation operations on packet data at a high level of abstraction.
Can Marple be used to track the performance of a packet as it traverses the network?: Yes. For example, our paper includes a query that selects packets with a high end-to-end latency by summing the queueing latency of each unique packet along its trajectory through the network.
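The idea behind that query can be sketched in a few lines of Python. This is an illustration of the logic, not the query from the paper: the record format `(pkt_id, switch, queue_latency)`, the function name `high_latency_packets`, and the `threshold` parameter are hypothetical.

```python
from collections import defaultdict


def high_latency_packets(records, threshold):
    """Sum per-hop queueing latency per packet, then keep the slow ones.

    records: iterable of (pkt_id, switch, queue_latency) tuples
             (a hypothetical format, one tuple per hop a packet traverses).
    """
    total = defaultdict(int)
    for pkt_id, _switch, queue_latency in records:
        # groupby packet id: accumulate latency across all hops
        total[pkt_id] += queue_latency
    # filter: keep only packets whose end-to-end sum exceeds the threshold
    return {pid: t for pid, t in total.items() if t > threshold}
```

For instance, a packet that waits 5 units at one switch and 20 at the next has an end-to-end queueing latency of 25 and would be selected for any threshold below that.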
How can Marple's key-value store process all packets, while existing technologies like NetFlow/IPFIX must rely on heavy sampling? Are the requirements for Marple on switches realistic?: Marple's key-value store relies on a split design involving a fast cache (in on-chip SRAM) and a slow backing store (in off-chip DRAM, either on the switch CPU or a separate server). The on-chip SRAM can process every single packet owing to its small access latency (~1ns). Software implementations of NetFlow must sample heavily since they cannot operate at the switch's line rate. While hardware implementations of NetFlow can process every packet, they cannot keep a record of every flow, since the flow lookup table cannot insert new flows at line rate in the presence of hash collisions in the table. Marple solves precisely this problem with its cache design (i.e., inserting the new key on a cache miss) and merging (i.e., maintaining correctness for evicted values).
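The cache-plus-merge idea can be sketched as follows for the simplest case, a per-key packet counter, where merging an evicted value into the backing store is plain addition. This is a simplified model under stated assumptions, not the hardware design: the class `SplitCounterStore`, the direct-mapped cache layout, and the use of Python's `hash` are all illustrative choices.

```python
class SplitCounterStore:
    """Toy model: small fast cache in front of a large slow backing store."""

    def __init__(self, cache_slots):
        self.cache_slots = cache_slots
        self.cache = {}    # slot index -> (key, count); models on-chip SRAM
        self.backing = {}  # key -> merged count; models off-chip DRAM

    def _slot(self, key):
        # Assumed direct-mapped placement: one candidate slot per key.
        return hash(key) % self.cache_slots

    def update(self, key):
        """Count one packet for `key`; never refuses an insert."""
        slot = self._slot(key)
        entry = self.cache.get(slot)
        if entry is not None and entry[0] != key:
            # Collision: evict the resident key and merge its count
            # into the backing store (addition is correct for counters).
            old_key, old_count = entry
            self.backing[old_key] = self.backing.get(old_key, 0) + old_count
            entry = None
        count = entry[1] if entry is not None else 0
        # On a miss the new key starts from zero in the cache.
        self.cache[slot] = (key, count + 1)

    def total(self, key):
        """True count = merged evictions plus whatever is still cached."""
        entry = self.cache.get(self._slot(key))
        cached = entry[1] if entry is not None and entry[0] == key else 0
        return self.backing.get(key, 0) + cached
```

With a single cache slot every new key forces an eviction, yet the totals stay exact because each evicted count is merged rather than discarded. Aggregations other than simple sums need their own merge rule, which this sketch does not cover.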
Can Marple be used to diagnose root causes for microbursts?
