Srinivas Narayana




alephtwo AT csail DOT mit DOT edu




Prior Research Experiences

I've worked on research and class projects at Princeton, and interned at Microsoft (Redmond, Washington), Google (Mountain View, California) and IBM Research (Bangalore, India). I briefly describe those projects below in reverse chronological order.

Programmable network path queries

Understanding the spatial flow of traffic in a network is valuable for control and troubleshooting tasks. Examples include measuring an ingress-egress traffic matrix for ISP traffic engineering, identifying source-sink flows traversing a congested (or faulty) link, and determining reachability violations after packets have been rewritten via NAT. There is no general and accurate way to perform these kinds of measurements today. Current approaches are fraught with complexities such as conflating link-level measurements with packet forwarding policy, making sense of potentially large traffic shifts within a measurement epoch as topology and forwarding policies change, and the inaccuracy of indirect measurements when packet forwarding policies are non-invertible. Software-defined networking (SDN) offers a promising foundation for solving this problem: a query system running on an SDN controller can directly read the existing packet forwarding policy, track changes to it, and modify it (e.g., transparently introduce state on packets) to make direct traffic measurements possible in a general way. In this work, I'm building a measurement stack on the Pyretic SDN controller that provides an interface for path queries, through which an application can request aggregated statistics, or even individual packets, that satisfy packet path expressions. A query run-time system generates and tracks the OpenFlow rules that combine independently specified forwarding policy and measurement queries.
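
As a rough illustration of what a path query expresses (this is not the Pyretic interface, and every name below is hypothetical), the sketch treats each packet's trajectory as a sequence of switch names and a path expression as a regular expression over that sequence. It post-processes recorded trajectories, whereas the system described above compiles queries into forwarding rules; the sketch only shows the query semantics.

```python
import re
from collections import Counter

# Hypothetical packet trajectories: the ordered switches each packet traversed.
trajectories = [
    ["s1", "s2", "s4"],
    ["s1", "s3", "s4"],
    ["s2", "s3", "s4"],
    ["s1", "s2", "s3", "s4"],
]

def encode(path):
    """Encode a trajectory as a space-separated string for regex matching."""
    return " ".join(path)

def path_query(pattern, paths):
    """Return the trajectories whose full hop sequence matches `pattern`."""
    regex = re.compile(pattern)
    return [p for p in paths if regex.fullmatch(encode(p))]

def traffic_matrix(paths):
    """Count packets per (ingress, egress) switch pair."""
    return Counter((p[0], p[-1]) for p in paths)

# Any packet that crosses the (assumed congested) link s3 -> s4,
# regardless of where it entered or left the network.
via_link = path_query(r"(\S+ )*s3 s4( \S+)*", trajectories)

# Source-sink breakdown of the traffic on that link.
matrix = traffic_matrix(via_link)
```

Here `via_link` holds the three trajectories that use the s3 -> s4 link, and `matrix` attributes them to their ingress-egress pairs, which is the kind of source-sink breakdown mentioned above.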

Detecting application-level denial of service attacks in Microsoft's Azure cloud network

Denial of service attacks cause significant downtime and financial damage to operational networks and services. Such attacks increasingly target application-specific resources such as thread pools and server compute resources. Detecting these attacks is challenging because they have no obviously discernible signature on the network: even if every packet is inspected, the impact is often experienced only on application-specific resources. During this internship, we used endpoint information in the form of TCP connection parameters (bytes sent/received, number of timeouts, etc.) and Windows performance counters (CPU utilization, memory pool parameters, etc.) to detect attack scenarios, as well as the specific connections carrying the attack. We trained a classifier that detected these attacks with high accuracy in controlled environments.

Measuring impact of network reconfiguration in centralized TE on Google's production WAN

Logical centralization of traffic engineering in a private backbone network can provide high network utilization, resulting from deterministic, fine-grained control of traffic flows. However, reconfigurations are not reflected on the data plane instantaneously, as network devices are geographically distributed, with varying reprogramming delays. During this internship, I studied the impact of distributed data plane updates triggered by traffic engineering in Google's production wide-area network, which supports large data transfers between compute clusters. In particular, we focused on adverse effects on packet loss rates and link utilization due to pathological orderings in which different devices are updated. Simulations of worst-case update orderings with production workloads showed that links can sometimes be loaded by as much as 100% over capacity. Yet surprisingly, an analysis of loss rates and link loads measured from counters on production switches revealed that there is little adverse impact on these metrics during network transition periods (at a measured time scale of tens of seconds). We believe that a combination of high link utilization, the presence of long-lived bulk transfer TCP flows, and switch QoS significantly simplifies the TE design: distributed data plane updates might as well be considered atomic, and need not be sequenced.
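
To make the worst-case figure concrete, here is a toy sketch (not the simulator used in the study; the topology, capacities, and demands are invented) in which two flows swap links. If one flow is moved before the other, both briefly share a link and load it to twice its capacity, i.e., 100% over capacity. The model ignores TCP's reaction and switch QoS, which are the factors credited above for the benign behaviour seen in production.

```python
from itertools import permutations

CAPACITY = {"A": 10, "B": 10}    # hypothetical link capacities
DEMAND = {"f1": 10, "f2": 10}    # hypothetical flow demands
OLD = {"f1": "A", "f2": "B"}     # flow -> link before the TE update
NEW = {"f1": "B", "f2": "A"}     # flow -> link after the TE update

def link_loads(routing):
    """Total demand placed on each link by the given flow -> link routing."""
    loads = {link: 0 for link in CAPACITY}
    for flow, link in routing.items():
        loads[link] += DEMAND[flow]
    return loads

def worst_overload(old, new):
    """Largest load/capacity ratio over every intermediate state of every update order."""
    worst = 0.0
    for order in permutations(old):
        routing = dict(old)
        for flow in order:
            routing[flow] = new[flow]          # this flow's path is reprogrammed
            loads = link_loads(routing)
            worst = max(worst, max(loads[l] / CAPACITY[l] for l in CAPACITY))
    return worst

ratio = worst_overload(OLD, NEW)   # 2.0: a link transiently carries twice its capacity
```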

Joint data center and route selection for online services

The performance of user-facing services (e.g., Amazon, Facebook, Google search) depends heavily on which data centers handle client requests, and which wide-area paths carry the response traffic. We observe that selecting data centers and network routes independently, as is common for today's services, can lead to much worse performance or higher costs than a coordinated decision. We have designed a system that achieves a desired cost-performance tradeoff by jointly optimizing the selection of data centers and routes, while retaining the administrative separation between them. Our evaluation shows that our system converges quickly in practice and offers lower cost and much better performance than existing solutions---illustrating benefits of compatible objectives between request mapping and response routing systems, sharing information, and employing optimization models that admit optimal distributed computation.

Past projects

An OpenFlow controller for interoperable 802.1D Ethernet spanning tree

OpenFlow is rapidly being embraced by industry as a practical next step to mitigate the "ossification" of computer network stacks. To meet this goal, OpenFlow needs to be incrementally deployable alongside legacy network equipment, which necessitates backwards compatibility with this equipment and the protocols it implements. Today, there is some support for emulating protocol functionality on a fully OpenFlow network, and for backwards compatibility in the form of OpenFlow-capable hardware (run exclusively in either traditional L2/L3 mode or OpenFlow mode). However, full support for simultaneous legacy interoperability and controller/flow-table customization is lacking. We take a modest step in this direction by implementing the Ethernet Spanning Tree Protocol (STP, 802.1D) on an OpenFlow network. The controller runs the STP algorithm separately for every switch, generating STP packets and instructing switches to send and receive them. We have designed a modular architecture for our protocol implementation that makes the system easy to extend to other protocols.

Configurable line-rate traffic monitoring on a NetFPGA

NetFPGA is a programmable PCI card with an on-board FPGA and Gigabit Ethernet ports. We developed a tool that implements configurable counter-based and field-based packet sampling on a NetFPGA, based on the PSAMP RFC (5476). The samplers are configured by setting registers on the FPGA, which are accessible to the user through the simple command-line interface of the NetFPGA.
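
The two kinds of sampler can be sketched in software as follows. This is a minimal illustration under my own assumptions about what "counter-based" and "field-based" mean here (select every N-th packet, and select packets whose header fields equal configured values); the class and field names are hypothetical, and the constructor arguments stand in for the values that would be written to registers on the card.

```python
from dataclasses import dataclass

@dataclass
class Packet:
    src: str
    dst: str
    proto: int

class CountSampler:
    """Counter-based sampling: select one packet out of every `period` packets."""
    def __init__(self, period):
        self.period = period   # stands in for a configuration register
        self.seen = 0

    def select(self, pkt):
        self.seen += 1
        return self.seen % self.period == 0

class FieldFilter:
    """Field-based selection: select packets whose fields equal the configured values."""
    def __init__(self, **fields):
        self.fields = fields   # e.g. proto=6 selects TCP packets

    def select(self, pkt):
        return all(getattr(pkt, name) == value for name, value in self.fields.items())

# Ten packets alternating between TCP (6) and UDP (17).
packets = [Packet("10.0.0.1", "10.0.0.2", 6 if i % 2 == 0 else 17) for i in range(10)]

every_fourth = CountSampler(period=4)
counted = [p for p in packets if every_fourth.select(p)]   # the 4th and 8th packets

tcp_only = FieldFilter(proto=6)
matched = [p for p in packets if tcp_only.select(p)]       # the five TCP packets
```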

Stability of explicit congestion control protocols

Rate Control Protocol (RCP) is a transport mechanism that uses explicit rate feedback from points in the network to traffic sources to achieve small flow completion times. Our investigations into RCP stability stem from two observations. First, small-buffer variants of RCP that control queues through the mean of their distributions exhibit oscillatory behaviour inside their "stable" regions when flow bandwidth-delay products are reduced. Second, we found parameter regions under this regime in which queue and rate instabilities occur in the presence of queue feedback, but not otherwise. To explain these non-intuitive observations, we modelled the small-buffer RCP feedback loop with explicit queue evolutions, and analytically found necessary and sufficient stability conditions whose predictions agree with our initial observations. We also characterized the observed instabilities just outside the stable region through Hopf bifurcations.
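
For context, the rate update usually given for RCP in its original proposal is reproduced below. It is included as background only, on the assumption that the small-buffer variants studied here modify this basic form; the summary above does not state the exact model analysed.

```latex
R(t) = R(t - T)\left[\, 1 + \frac{\dfrac{T}{d_0}\left(\alpha\,\bigl(C - y(t)\bigr) - \beta\,\dfrac{q(t)}{d_0}\right)}{C} \,\right]
```

Here R(t) is the rate advertised to flows, T the update interval, d_0 the average round-trip time, C the link capacity, y(t) the measured input traffic rate, q(t) the queue size, and α and β gain parameters. The β term is the queue feedback, so setting β = 0 corresponds to the case without queue feedback.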

Transactions on the World Wide Telecom Web (WWTW)

WWTW (also known as the "spoken web") is a voice-driven equivalent of the WWW over the telecom network, started as a pilot project by IBM Research India to enable developing regions to leverage the benefits of the WWW through mobile phones (which are only required to have a simple numeric keypad and a voice connection). As part of my summer internship, we developed a mechanism for securing financial transactions over this medium, using social trust to provide additional authentication factors. Our work was peer-reviewed internally at IBM Research India and a patent application has been filed.