Sponsor: National Science Foundation (CNS-0830961)
PI:
Eytan Modiano; Dates: 9/2008 – 8/2012
Project overview
Modern communication networks are constructed using a layered
approach, with one or more electronic layers (e.g., IP, ATM, SONET) built on
top of an optical fiber network. This multitude of layers is used in order to
simplify network design and operations and to enable efficient sharing of
network resources. However, this layering also gives rise to certain
inefficiencies and interoperability issues. In this project we focus on the
impact of layering on network survivability, i.e., protection and restoration
in the face of link failures.
Networks often rely on the electronic layers to provide most
protection and restoration services. However, in a layered network, the
protection mechanisms provided at the electronic layer may not be robust in the
face of failures in the underlying optical layer. For example, SONET networks
typically provide protection against single link failures using a ring network
architecture, and protection in general ÒmeshÓ networks (e.g., ATM, WDM) is
typically provided using disjoint paths. However, even electronic topologies
that are designed to be tolerant of single link failures, once they are
embedded on the physical (e.g., fiber) topology, may no longer be survivable to
single physical (fiber) link failures. This is because the failure of a single
fiber link can lead to the failure of multiple links in the electronic
topology, which may subsequently leave the electronic topology disconnected.
Thus, network survivability mechanisms often cannot provide their claimed level
of protection and restoration, when embedded on a physical topology.
I. Metrics for layered network survivability
In [1] we show that standard survivability metrics, such as
graph connectivity, have different meaning in the context of layered network
architectures. Thus, we developed new metrics that capture the interactions
across layers. These metrics will allow us to compare the survivability
performance of different network architectures, and provide insights into the
design of cross-layer network architectures that are highly survivable.
In particualr, we introduce the Min Cross Layer Cut as a new
metric for measuring the survivability of layered networks. Min Cross Layer Cut
is a natural extension of the single layer min-cut metric and corresponds to
the minimum number of physical fiber cuts that would disconnect the logical
topology of the layered network. We
show that in the layered setting, fundamental theorems such as the
max-flow-min-cut theorem no longer hold.
We then develop new mechanisms for routing the logical links of the network
on the physical toplogy so as to maximize the Min Cross Layer Cut. This allows
the resulting cross-layer architecture to be resilient to failures across
layers.
II. Reliability in layered networks with probabilistic
failures
In [2] we develop diverse routing schemes for dealing with
multiple, possibly correlated, failures. While disjoint path protection can
effectively deal with isolated single link failures, recovering from multiple
failures is not guaranteed. In particular, events such as natural disasters or
intentional attacks can lead to multiple correlated failures, for which
recovery mechanisms are not well understood. We take a probabilistic view of
network failures where multiple failure events can occur simultaneously, and
develop algorithms for finding diverse routes with minimum joint failure
probability. Moreover, we develop a novel Probabilistic Shared Risk Link Group
(PSRLG) framework for modeling correlated failures. In this context, we
formulate the problem of finding two paths with minimum joint failure
probability as an Integer Non-Linear Program (INLP), and develop approximations
and linear relaxations that can find nearly optimal solutions in most cases.
In [3] we consider network reliability in layered networks
where the lower layer experiences random link failures. In layered networks,
each failure at the lower layer may lead to multiple failures at the upper
layer. We generalize the classical polynomial expression for network
reliability to the multi-layer setting. Using random sampling techniques, we
develop polynomial time approximation algorithms for the failure polynomial.
Our approach gives an approximate expression for reliability as a function of
the link failure probability, eliminating the need to resample for different
values of the failure probability. Furthermore, it gives insight on how the
routings of the logical topology on the physical topology impact network
reliability. We show that maximizing the min cut of the (layered) network
maximizes reliability in the low failure probability regime. Based on this
observation, we develop algorithms for routing the logical topology to maximize
reliability.
In [4] we consider the impact of random failures that are geographically
correlated on network connectivity. Fiber-optic networks are vulnerable to
natural disasters, such as tornadoes or earthquakes, as well as to physical failures,
such as an anchor cutting underwater fiber cables. Such real-world events occur
in specific geographical locations and disrupt specific parts of the network.
Therefore, the geography of the network determines the effect of physical
events on the networkÕs connectivity and capacity. In [4] we developed tools to analyze
network reliability after a ÔrandomÕ geographic disaster. In particular, we
consider disasters that take the form of a ÔrandomÕ line in a plane. Using
results from geometric probability, we are able to calculate some network performance
metrics to such a disaster in polynomial time. In particular, we can evaluate
average two-terminal reliability in polynomial time under ÔrandomÕ line-cuts.
This is in contrast to the case of independent link failures for which there
exists no known polynomial time algorithm to calculate this reliability metric.
Our novel approach provides a promising new direction for modeling and
designing networks to lessen the effects of geographical disasters or attacks.
In [5] we consider the problem of survivability in multilayer
networks. Our objective is to find a set of paths for a given source-destination
pair that will survive any single link failure. In single layered networks, a
pair of disjoint paths can be used to provide protection. However, disjoint
paths may not exist in layered networks. In [5] we study the problem of finding
paths that may not be disjoint but together will survive any single failure. First, we address the problem of finding
the minimum number of survivable paths. Then, we focus on two versions of this problem:
one where the length of a path is restricted, and another where the number of
paths sharing a fiber is restricted. We prove that in general, finding the
minimum survivable paths set is NP-hard. However, both restricted versions of
the problem can be solved in polynomial time. We present several Integer Linear
Programming (ILP) formulations, and develop heuristics and approximation
algorithms. We also consider the
problem of finding a set of survivable paths that uses the minimum number of
fibers. We show that this problem is NP-hard in general, and develop heuristics
and approximation algorithms with provable approximation bounds.
In [6] we developed a scheme in which a dedicated backup
network is designed to provide protection from random link failures. Upon a
link failure in the primary network, traffic is rerouted through a preplanned
path in the backup network. We introduce a novel approach for dealing with
random link failures, in which probabilistic survivability guarantees are
provided to limit capacity over-provisioning. We show that the optimal backup routing
strategy in this respect depends on the reliability of the primary network.
Specifically, as primary links become less likely to fail, the optimal backup
networks employ more resource sharing amongst backup paths. We apply results
from the field of robust optimization to formulate an ILP for the design and
capacity provisioning of these backup networks. We also propose a simulated
annealing heuristic to solve this problem for large-scale networks. We use extensive simulation results to
verify our analysis and approach, and demonstrate significant capacity savings
through the use of the robust optimization approach. In [7] we apply a similar approach
to networks where the traffic demand is stochastic, and develop capacity
allocation scheme, using robust optimization, that is robust to random
fluctuation in the end-user demand.
IV.
Enhancing reliability through lightpath routing
In [8] we study
the reliability maximization problem in WDM networks with random link failures.
Reliability in these networks is defined as the probability that the logical
network is connected, and it is determined by the underlying lightpath routing
and the link failure probability. We show that in general the optimal lightpath
routing depends on the link failure probability, and characterize the
properties of lightpath routings that maximize the reliability in different
failure probability regimes. In
particular, we show that in the low failure probability regime, maximizing the Òcross-layerÓ
min cut of the (layered) network maximizes reliability, whereas in the high
failure probability regime, minimizing the spanning tree of the network
maximizes reliability. Motivated by these results, we develop lightpath routing
algorithms for reliability maximization.
V. Providing protection using recovery domains
In [10] we consider the problem of providing network
protection that guarantees the maximum amount of time that flow can be
interrupted after a failure. This is in contrast to schemes that offer no
recovery time guarantees, such as IP rerouting, or the prevalent local recovery
scheme of Fast ReRoute, which often over-provisions resources to meet recovery
time constraints.
To meet these recovery time guarantees, we provide a novel and
flexible solution by partitioning the network into failure independent Òrecovery
domainsÓ, where within each domain, the maximum amount of time to recover from
a failure is guaranteed. We develop an optimal solution in the form of an MILP
for both the case when backup capacity can and cannot be shared. This approach
provides protection with guaranteed recovery times using up to 45% less
protection resources than local recovery. In a related work [9] we also
develop schemes that provide availability guarantees to different sessions by
proper choice of the recovery path, and the spare capacity allocation.
VI. Participants
Prof. Eytan Modiano, PI
Dr. Hyang-Won Lee, Postdoc
Dr. Kayi Lee, Graduate
student, now at Google Research
Dr. Sebastian Neumayer, Graduate
student, now at MIT Lincoln Laboratory
Marzieh Parandehgheibi, Graduate
student
Matt Johnston, Graduate student
Greg Kuperman, Graduate
student
Publications
[1] Kayi Lee and Eytan Modiano, ÒCross-Layer Survivability in
WDM-based Networks,Ó IEEE Infocom, Rio De Janeiro, April 2009. Extended version in IEEE/ACM
Transactions on Networking, 2011.
[2] Hyang-Won Lee and Eytan Modiano, ÒDiverse Routing in
Networks with Probabilistic Failures,Ó IEEE Infocom, Rio De Janeiro, April
2009. Extended version in IEEE/ACM
Transaction on Networking, December, 2010.
[3] Kayi Lee, Hyang-Won Lee and Eytan Modiano, ÒReliability
in Layered Networks with Random Link Failures,Ó, IEEE Infocom, San Diego, March
2010. Extended version in IEEE/ACM
Transactions on Networking, 2011.
[4] Sebastian Neumayer and Eytan Modiano, ÒNetwork Reliability with Geographically Correlated Failures,Ó IEEE INFOCOM 2010, San Diego, CA. Extended version in IEEE Transactions on
Networking, 2012.
[5] Marzieh Parandehgheibi, Hyang-Won Lee and Eytan Modiano,ÓSurvivable
Paths in Multilayer Networks,Ó in
Conference on Information Science and Systems, March, 2012.
[6] Matthew Johnston, Hyang-Won Lee, Eytan Modiano, ÒA
Robust Optimization Approach to Backup Network Design with Random Failures,Ó
IEEE Infocom, Shanghai, China, April 2011.
[7] M. Johnston, H.W. Lee, E. Modiano, ÒRobust
Network Design for Stochastic Traffic Demands,Ó IEEE
Globecom, Next Generation Networking Symposium, Houston, TX, December 2011.
[8] Kayi Lee, Hyang-Won Lee and Eytan Modiano, ÒMaximizing
Reliability in WDM Networks through Lightpath Routing,Ó IEEE Globecom,
December, 2011.
[9] Greg Kuperman, Eytan Modiano, Aradhana Narula-Tam, ÒNetwork
Protection with Multiple Availability Guarantees,Ó IEEE ICC workshop on New
Trends in Optical Networks Survivability, June 2012.
[10] Greg Kuperman, Eytan Modiano ÒNetwork Protection with
Guaranteed Recovery Times using Recovery Domains,Ó Submitted to IEEE Infocom 2013.