9.1.2 Use of OSPFScan for Detailed Analysis
In this section, we touch on ways in which the OSPFScan has been
used for analyzing long-term behavior of OSPF.
For both networks where the monitor is deployed,
in addition to archiving all LSAs,
we also archive topology snapshots and
LSAG message logs.
Furthermore, we use the OSPFScan to extract change LSAs and
topology change records, and to compute routing tables
for each router, grouped by 24-hour intervals.
All this data (raw and change LSAs, topology change records,
routing tables, topology snapshots, and LSAG message logs)
forms the data repository for the OSPFScan analysis.
Although this is redundant
(raw LSAs are sufficient to construct all other forms of data),
we have found that keeping the derived data greatly assists
interactive analysis of OSPF behavior.
To illustrate, suppose a user is interested in analyzing how the
path between two end-points evolved over time.
It is much faster to automatically compute paths between
two end-points using the routing table data than
to construct the paths from raw LSAs.
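As a rough illustration of why routing tables make this query cheap, the sketch below traces a path by following next hops through one archived snapshot, then reports the times at which the path changed. The data layout (`routing_tables[router][destination] = next_hop`) and the function names are assumptions made for this example, not the OSPFScan's actual format or interface, and equal-cost multipath is ignored for brevity.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: tables[router][destination] = next-hop router.
RoutingTables = Dict[str, Dict[str, str]]


def trace_path(tables: RoutingTables, src: str, dst: str) -> Optional[List[str]]:
    """Follow next hops from src to dst in one snapshot.

    Returns the list of routers on the path, or None if a route is
    missing or the next hops form a loop.
    """
    path = [src]
    visited = {src}
    current = src
    while current != dst:
        next_hop = tables.get(current, {}).get(dst)
        if next_hop is None or next_hop in visited:
            return None  # missing route or forwarding loop
        path.append(next_hop)
        visited.add(next_hop)
        current = next_hop
    return path


def path_history(
    snapshots: List[Tuple[float, RoutingTables]], src: str, dst: str
) -> List[Tuple[float, Optional[List[str]]]]:
    """Given time-ordered (timestamp, tables) snapshots, return only the
    points at which the src-to-dst path differs from the previous one."""
    history = []
    last = object()  # sentinel that never equals a real path
    for timestamp, tables in snapshots:
        path = trace_path(tables, src, dst)
        if path != last:
            history.append((timestamp, path))
            last = path
    return history
```

Each lookup is a dictionary access per hop, whereas starting from raw LSAs would require rebuilding the topology and rerunning a shortest-path computation for every snapshot.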
Specific illustrations of the OSPFScan usage include:
1. Duplicate LSA analysis:
The LSA traffic analysis in the enterprise network
by the OSPFScan [7] revealed excessive
duplicate LSA traffic.
For some OSPF areas, the duplicate LSA traffic
formed 33% of the overall LSA traffic.
Subsequent analysis identified the root cause of the excessive traffic
and led to preventative measures, details of which can be found
in [7].
2. Change LSA statistics:
The SPF calculation on Cisco routers is paced by
two timers [20]:
(i) spf-delay, which specifies how long
OSPF waits between receiving a topology change
and starting an SPF computation; and
(ii) spf-holdtime, which sets the minimum lag
between two successive SPF computations.
In order to reduce OSPF convergence time, it is desirable
to decrease these timers to small values; however,
reducing these values too much can lock the routers into performing
excessive SPF calculations, possibly destabilizing the network.
Analysis of the inter-arrival time of change LSAs in the network
can help administrators configure these timers to "good" values.
The network administrators of the ISP network have done precisely this.
To facilitate the process,
we built a web-site on top of the change LSA repository,
providing statistics such as minimum, maximum, mean, standard deviation
and empirical CDF of inter-arrival times of change LSAs over a given
time period and for a given LSA type.
3. Availability analysis:
Assessing reliability and availability of intra-domain routing
is crucial for deploying new services and associated service
assurances into the network.
OSPF monitor data has proved very useful
in answering questions such as:
what is the mean down-time and mean service-time
for links and routers in the network at the IP level?
Again, we created a web-site to answer such questions for the ISP
network.
The site relies on the topology change records stored in the repository.
4. Use of OSPF routing tables:
For each router, the routing table archive
contains the entire history of routing tables across the measurement
interval (e.g., several months or longer).
This data is being used by the ISP network engineering teams
to determine and analyze end-to-end paths
within the network at any instant of time,
to correlate OSPF routing changes with I-BGP updates
seen in the network [18], and
to analyze how OSPF events impact the traffic flow within
the network by correlating this data with active probing.
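The statistics mentioned under change LSA statistics and availability analysis can be sketched as follows. This is a minimal illustration, not the implementation behind the web-sites described above: it assumes change LSA arrivals are available as a list of numeric timestamps, and that the topology change records for one link or router can be reduced to time-ordered `(timestamp, is_up)` pairs. Both formats and all function names are hypothetical.

```python
import statistics
from bisect import bisect_right
from typing import Dict, List, Optional, Sequence, Tuple


def interarrival_times(timestamps: Sequence[float]) -> List[float]:
    """Gaps between consecutive change LSA arrivals."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def summarize(gaps: Sequence[float]) -> Dict[str, float]:
    """Minimum, maximum, mean and (population) standard deviation."""
    if not gaps:
        raise ValueError("need at least two timestamps")
    return {
        "min": min(gaps),
        "max": max(gaps),
        "mean": statistics.mean(gaps),
        "stdev": statistics.pstdev(gaps),
    }


def empirical_cdf(gaps: Sequence[float], x: float) -> float:
    """Fraction of inter-arrival times that are <= x."""
    ordered = sorted(gaps)
    return bisect_right(ordered, x) / len(ordered)


def mean_down_and_service_time(
    events: Sequence[Tuple[float, bool]]
) -> Tuple[Optional[float], Optional[float]]:
    """Mean down-time and mean service-time (up-time) for one element.

    events: time-ordered (timestamp, is_up) records; each state lasts
    until the next record. The final record's duration is unknown and
    is therefore not counted.
    """
    down, up = [], []
    for (start, is_up), (end, _) in zip(events, events[1:]):
        (up if is_up else down).append(end - start)

    def mean(values: List[float]) -> Optional[float]:
        return sum(values) / len(values) if values else None

    return mean(down), mean(up)
```

For timer tuning, a value such as `empirical_cdf(gaps, spf_delay)` gives the fraction of change LSAs that arrive within `spf_delay` seconds of the previous one, which is one way to gauge how many changes a single SPF run would absorb.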
aman shaikh
2004-02-07