Protecting a Cisco Catalyst 6500 Switch Against Layer 2 Loops

NetCraftsmen®

Introduction

Should you be concerned about a Layer 2 loop in your campus network? My answer is: absolutely. In this article, I explain why you should take steps to protect your campus network, and I provide practical steps for reducing the effects of a Layer 2 loop on a Cisco Catalyst 6500. Before delving into details, let’s first define what a Layer 2 loop is. A Layer 2 loop occurs in a campus network when more than one active Layer 2 forwarding path exists between two given switches. A switch that receives a broadcast frame floods it out all of its other trunk ports and access ports in the same VLAN. In the presence of a loop, this flooding amplifies broadcast frames, which remain trapped within the loop indefinitely because Ethernet frames carry no time-to-live field. This phenomenon is known as a broadcast storm. The resulting volume of broadcast frames exhausts bandwidth and overloads switch CPUs. A broadcast storm brings a network to an unusable state, and in certain cases network administrators may even lose the ability to access devices through the console.
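The amplification effect described above can be illustrated with a small simulation. The sketch below is a simplified model and does not represent the lab topology used later in this article: the switch names, the flood function, and both example topologies are assumptions made for illustration. Each switch floods a received broadcast out every port except the one it arrived on, and no spanning tree blocks any link.

```python
def flood(links, source, hops):
    """Return the number of broadcast copies in flight after each hop.

    links  -- list of (switch, switch) pairs, one link per pair (hypothetical topology)
    source -- switch that originates a single broadcast frame
    hops   -- number of forwarding rounds to simulate
    """
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)

    # Each copy in flight is recorded as (sending switch, receiving switch).
    in_flight = [(source, n) for n in neighbors[source]]
    counts = [len(in_flight)]
    for _ in range(hops):
        next_flight = []
        for sender, receiver in in_flight:
            # Flood out every port except the one the frame arrived on.
            for n in neighbors[receiver]:
                if n != sender:
                    next_flight.append((receiver, n))
        in_flight = next_flight
        counts.append(len(in_flight))
    return counts


# Four fully meshed switches with no blocked links: the copies double every hop.
full_mesh = [("A", "B"), ("A", "C"), ("A", "D"),
             ("B", "C"), ("B", "D"), ("C", "D")]
looped = flood(full_mesh, "A", 4)      # [3, 6, 12, 24, 48]

# The same switches in a loop-free star: the broadcast dies after one hop.
star = [("A", "B"), ("A", "C"), ("A", "D")]
loop_free = flood(star, "A", 4)        # [3, 0, 0, 0, 0]
```

In the looped case the number of copies grows without bound, which is why bandwidth and CPU are exhausted so quickly. In the loop-free case the broadcast is delivered once and disappears.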

The Spanning Tree Protocol (STP) was designed to ensure loop-free Layer 2 topologies. Even with STP in use, several situations can still create Layer 2 loops, such as wiring mistakes, misconfigured hosts (bridged interfaces), switch configuration mistakes, and loss of BPDU keepalives.

You may have a good campus network design, but human errors are still possible. It is critical to look beyond what can be prevented, and ask yourself the following question: if a Layer 2 loop were to occur, would my campus network be able to sustain it? If the answer to this question is no, or if you are not sure, I encourage you to explore the solution presented in this article to mitigate the impact of a Layer 2 loop.

I got involved in exploring solutions to protect the Catalyst 6500 because one of our customers had experienced Layer 2 issues and wanted to leverage existing Cisco tools to alleviate the effects of Layer 2 loops on their Cisco Catalyst 6500 distribution switches. Their campus network follows a three-layer model with Layer 2 connections between access and distribution switches, and Layer 3 connections between distribution and core switches.

This article presents the steps I took to develop a solution to protect the Catalyst 6500 in the presence of a Layer 2 loop. The sections that follow describe each step in turn.

How to Simulate a Layer 2 Loop

Four switches were used in a lab environment to simulate a Layer 2 loop, as illustrated in the figure below. VLAN 10 was created as a user VLAN to simulate communication between two users’ PCs.

 

The configuration shown below was implemented on all switches to create a Layer 2 loop, specifically a spanning tree loop on VLAN 10.

Distribution-Sw1(config)#no spanning-tree vlan 10

Within one minute after issuing the command above, the CPU utilization had risen to 99% for all four switches. To display the current CPU utilization, the command “show processes cpu” was used, as shown below.

Distribution-Sw1# show processes cpu sorted | exclude 0.00%__0.00%__0.00%
CPU utilization for five seconds: 99%/92%; one minute: 99%; five minutes: 99%
PID Runtime(ms)   Invoked   uSecs    5Sec    1Min    5Min    TTY Process
------------------------    Output omitted   ----------------------------------------------------

With the CPU reaching 99%, routing adjacencies were flapping, and communication within user VLAN 10 between PC1 and PC2 was no longer possible. A set of measurements was taken, and the results are shown in the figure below.

To develop a good understanding of the type of traffic responsible for high CPU utilization, it was essential to monitor the traffic processed by the control plane.

How to Monitor the Control Plane Traffic

The figure below illustrates the monitoring context for analyzing traffic that is processed by the Catalyst 6500 control plane. What is needed is a Switched Port Analyzer (SPAN), a host running a network protocol analyzer, and an Ethernet cable.

Control Plane Traffic Monitoring using SPAN

The configuration shown below defines port Gigabit 5/24 as the SPAN destination port for CPU traffic. The traffic processed by the Route Processor (RP) and the Switch Processor (SP) is copied to Gigabit 5/24.

Distribution-Sw1(config)#monitor session 2 type local
Distribution-Sw1(config-mon-local)#source cpu rp
Distribution-Sw1(config-mon-local)#source cpu sp
Distribution-Sw1(config-mon-local)#destination interface gigabit 5/24

The command shown below checks the status of the newly created SPAN session.

Distribution-Sw1# show monitor session 2
Session 2
---------
Type                   : Local Session
Status                 : Admin Disabled
Egress SPAN Replication State:
Operational mode       : -
Configured mode        : -

It is important to note that after the CPU SPAN session is created, it defaults to “Admin Disabled”. To make it operational, a “no shut” command is needed on the SPAN session.

Distribution-Sw1(config)#monitor session 2
Distribution-Sw1(config-mon-local)#no shut

After the “no shut” command is issued, the SPAN session is put in Admin Enabled mode, as shown in the output below.

Distribution-Sw1# show monitor session 2
Session 2
---------
Type                   : Local Session
Status                 : Admin Enabled
Source Ports           :
Both               : rp,sp
Destination Ports      : Gi5/24
Egress SPAN Replication State:
Operational mode       : Centralized
Configured mode        : Centralized (default)

With the Layer 2 loop simulated and the traffic processed by the control plane being monitored, the next step was to select which tools, or combination of tools, could be used to mitigate the impact of the Layer 2 loop. The tools considered for the mitigation were:

  • Control plane policing
  • Storm control
  • Hardware rate limiting

Control Plane Policing

Control Plane Policing is a feature in Cisco routers and switches that enables administrators to configure QoS policies to protect the control plane against reconnaissance, denial-of-service (DoS) attacks, and other scenarios that can lead to exhaustion of CPU resources.

To limit the impact on the switches’ CPUs, control plane policing was applied to limit traffic that could potentially affect CPU utilization and to protect other traffic, such as routing protocol traffic, against CPU resource starvation. Control plane policing uses the same class-map and policy-map commands that you may be familiar with from configuring Quality of Service. The process of creating and applying control plane policing consists of the following three steps:

  1. Use class-map to classify traffic processed by the control plane
  2. Use policy-map to apply policing on classified traffic
  3. Apply policy-map to the control plane

The traffic targeted for classification included EIGRP, HSRP, SSH, SNMP, TACACS, DHCP, IGMP, and PIM. I created the following class maps and the access lists they reference:

class-map match-all class-eigrp
   match access-group name EIGRP

class-map match-all class-mgmt
   match access-group name MGMT

class-map match-all class-hsrp
   match access-group name HSRP

class-map match-all class-pim
   match access-group name PIM

class-map match-all class-igmp
   match access-group name IGMP

class-map match-all class-dhcp
   match access-group name DHCP


ip access-list extended EIGRP
   permit eigrp any host 224.0.0.10

ip access-list extended HSRP
   permit udp any host 224.0.0.2 eq 1985
   permit udp any host 224.0.0.102 eq 1985

ip access-list extended PIM
   permit pim any 224.0.0.0 0.0.0.255

ip access-list extended IGMP
   permit igmp any 224.0.0.0 31.255.255.255

ip access-list extended MGMT
   permit tcp any any eq tacacs
   permit tcp any any eq 22
   permit udp any any eq snmp
   permit icmp any any

ip access-list extended DHCP
   permit udp any eq bootpc any eq bootps
   permit udp any eq bootps any eq bootpc
   permit udp any eq bootps any eq bootps
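The wildcard masks in the access lists above determine which destination addresses each class matches. The sketch below, written in Python rather than IOS, shows how a wildcard mask is evaluated: bits set to 1 in the wildcard are ignored, and bits set to 0 must match the base address. The function name acl_matches and the sample addresses are assumptions chosen for illustration.

```python
import ipaddress


def acl_matches(address, base, wildcard):
    """Return True if address matches base under an IOS-style wildcard mask."""
    addr = int(ipaddress.IPv4Address(address))
    base_int = int(ipaddress.IPv4Address(base))
    wild = int(ipaddress.IPv4Address(wildcard))
    # Wildcard bits set to 1 are "don't care"; the remaining bits must match.
    care_bits = ~wild & 0xFFFFFFFF
    return (addr & care_bits) == (base_int & care_bits)


# PIM entry: 224.0.0.0 0.0.0.255 covers 224.0.0.0 through 224.0.0.255.
pim_all_routers = acl_matches("224.0.0.13", "224.0.0.0", "0.0.0.255")    # True
pim_other_group = acl_matches("224.0.1.40", "224.0.0.0", "0.0.0.255")    # False

# IGMP entry: 224.0.0.0 31.255.255.255 fixes only the top three bits,
# so it covers 224.0.0.0 through 255.255.255.255.
igmp_group = acl_matches("239.1.1.1", "224.0.0.0", "31.255.255.255")     # True
igmp_unicast = acl_matches("192.168.1.1", "224.0.0.0", "31.255.255.255") # False
```

Note that the IGMP wildcard is wider than the IPv4 multicast range, 224.0.0.0/4, which would correspond to a wildcard of 15.255.255.255.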

I created a policy-map named copp-policy to apply policing restrictions to traffic processed by the control plane. As an example, for DHCP traffic I created the following policy.

class  class-dhcp
   police 32000 conform-action transmit exceed-action drop

In the policy statement above, the switch processes DHCP traffic up to a threshold of 32000 bits/sec, and any traffic in excess of this threshold is dropped by the switch.
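The behavior of a policer such as the one above is commonly described with a token bucket. The sketch below is a minimal single-rate token bucket in Python. It is not the Catalyst 6500 hardware implementation: the Policer class, the 1000-byte burst, and the 400-byte packet size are assumptions chosen only to make the arithmetic easy to follow.

```python
class Policer:
    """Minimal single-rate token bucket (illustrative model only)."""

    def __init__(self, rate_bps, burst_bytes):
        self.rate_bytes_per_sec = rate_bps / 8
        self.burst_bytes = burst_bytes
        self.tokens = burst_bytes      # the bucket starts full
        self.last_ms = 0

    def offer(self, now_ms, packet_bytes):
        """Return 'transmit' if the packet conforms, otherwise 'drop'."""
        elapsed_ms = now_ms - self.last_ms
        self.last_ms = now_ms
        # Refill tokens for the elapsed time, never beyond the burst size.
        refill = elapsed_ms * self.rate_bytes_per_sec / 1000
        self.tokens = min(self.burst_bytes, self.tokens + refill)
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return "transmit"          # conform-action
        return "drop"                  # exceed-action


# Offer one 400-byte packet every 10 ms for one second (320000 bps offered)
# to a 32000 bps policer with an assumed 1000-byte burst.
policer = Policer(rate_bps=32000, burst_bytes=1000)
actions = [policer.offer(t, 400) for t in range(0, 1000, 10)]
transmitted = actions.count("transmit")   # 12 packets
dropped = actions.count("drop")           # 88 packets
```

Under these assumptions, the initial burst lets two packets through immediately, after which one packet conforms every 100 ms, which matches the 4000 bytes per second that a 32000 bps rate allows.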

The values used for policing were based on measurements of actual traffic in the operational network (using the SPAN port). Below is the complete list of policing statements applied to classified traffic.

policy-map copp-policy
   class  class-eigrp
      police 32000 conform-action transmit exceed-action transmit ! protection of EIGRP traffic
   
   class  class-hsrp
      police 32000 conform-action transmit exceed-action transmit ! protection of HSRP traffic

   class  class-mgmt
      police 512000 conform-action transmit exceed-action drop

   class  class-pim
      police 32000 conform-action transmit exceed-action drop

   class  class-igmp
      police 100000 conform-action transmit exceed-action drop

   class  class-dhcp
      police 32000 conform-action transmit exceed-action drop

   class  class-default
      police 2000000 conform-action transmit exceed-action drop
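One way to read the policy map above is to separate the classes that are capped from those that are only measured. The Python sketch below copies the rates and exceed actions from copp-policy and adds up the capped rates. The variable and function names are assumptions used for illustration.

```python
# (class name, police rate in bps, exceed action), copied from copp-policy
COPP_POLICY = [
    ("class-eigrp", 32000, "transmit"),
    ("class-hsrp", 32000, "transmit"),
    ("class-mgmt", 512000, "drop"),
    ("class-pim", 32000, "drop"),
    ("class-igmp", 100000, "drop"),
    ("class-dhcp", 32000, "drop"),
    ("class-default", 2000000, "drop"),
]


def capped_rate_bps(policy):
    """Sum the rates of the classes whose excess traffic is dropped."""
    return sum(rate for _, rate, exceed in policy if exceed == "drop")


def uncapped_classes(policy):
    """List the classes whose traffic is transmitted even above the rate."""
    return [name for name, _, exceed in policy if exceed == "transmit"]


total_capped = capped_rate_bps(COPP_POLICY)   # 2676000 bps
protected = uncapped_classes(COPP_POLICY)     # ['class-eigrp', 'class-hsrp']
```

The capped classes together can deliver at most about 2.7 Mbps of conforming traffic to the control plane, while EIGRP and HSRP traffic is never dropped by this policy because both actions are set to transmit.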

After creating the policy map, I applied it to the control plane as shown below:

control-plane
   service-policy input  copp-policy

After the policy map was applied to the distribution switch control planes, the CPU utilization dropped from 99% to an average of 92%, as shown in the figure below.

 

With control plane policing applied, communication was once again possible within user VLAN 10 between PC1 and PC2. Despite the reduction in CPU utilization, intermittent packet losses were observed.

To further alleviate the impact of the Layer 2 loop, the next step was to use an additional tool. Because a Layer 2 loop amplifies large volumes of broadcast and multicast traffic, as explained earlier, storm control was the logical tool to select.

Storm Control

Traffic storm control is a feature in Cisco switches that can be used to monitor broadcast, multicast, and unicast traffic levels entering a given interface over a 1-second interval. Traffic gets dropped during the monitoring interval when configured thresholds are exceeded.

After observing the amount of multicast and broadcast traffic in the operational network (using the SPAN port), it was estimated that 5% of the total bandwidth was sufficient to accommodate all broadcast and multicast traffic. The following configuration was applied to all switch trunk ports to set the storm control threshold to 5%.

Distribution-Sw1(config)#interface range f0/21 - 24
Distribution-Sw1(config-if-range)#storm-control multicast level 5
Distribution-Sw1(config-if-range)#storm-control broadcast level 5
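To make the 5% threshold concrete, the Python sketch below converts a storm control level into bits per second for a given link speed and applies a simplified model of the 1-second monitoring interval. The function names and the offered traffic figures are assumptions for illustration, and the model ignores platform-specific details of how the hardware counts traffic.

```python
def storm_threshold_bps(link_bps, level_percent):
    """Convert a storm-control level (percent of link bandwidth) to bps."""
    return link_bps * level_percent // 100


def forwarded_bits(offered_bits, link_bps, level_percent):
    """Bits forwarded during one 1-second interval (simplified model).

    Traffic passes until the threshold is reached; the remainder of the
    traffic offered in that interval is dropped.
    """
    return min(offered_bits, storm_threshold_bps(link_bps, level_percent))


fast_ethernet = storm_threshold_bps(100_000_000, 5)       # 5000000 bps
gigabit = storm_threshold_bps(1_000_000_000, 5)           # 50000000 bps

# During a storm offering 80 Mbps of broadcast on a 100 Mbps trunk,
# only the first 5 Mbps worth of broadcast is forwarded each second.
storm = forwarded_bits(80_000_000, 100_000_000, 5)        # 5000000
normal = forwarded_bits(1_000_000, 100_000_000, 5)        # 1000000
```

Because the level is a percentage of link bandwidth, the same configuration line allows ten times more broadcast traffic on a Gigabit trunk than on a Fast Ethernet trunk.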

Combining control plane policing with storm control resulted in a significant improvement in CPU utilization, as shown in the figure below.

 

Communication within user VLAN 10 between PC1 and PC2 returned to normal with no packet losses. Although CPU utilization was below 40% on average, spikes of up to 95% were observed for short periods of time. An examination of the traffic processed by the distribution switch control planes revealed a large amount of STP BPDU, VTP, and ARP traffic. To reduce the volume of this traffic, the hardware rate-limiting tool available on the Catalyst 6500 was selected as the next improvement step.

Hardware Rate Limiting

Cisco Catalyst 6500 switches with a Supervisor 720 or Supervisor 32 engine provide hardware rate limiters that can be used to limit specific Layer 2 and Layer 3 traffic that is processed by the control plane. The advantage of hardware rate limiters is that their use does not impact CPU utilization.

The following configuration was implemented on the distribution switches. The selected parameters were based on measurements of actual traffic in the operational network (using the SPAN port).

Distribution-Sw1(config)#mls rate-limit layer2 pdu 620 10
Distribution-Sw1(config)#mls qos protocol ARP police 64000 2000

The command “mls rate-limit layer2 pdu 620 10” rate limits Layer 2 PDU packets (including STP BPDUs, DTP, PAgP, CDP, and VTP packets) to a maximum of 620 packets per second with a burst of 10 packets.

The command “mls qos protocol ARP police 64000 2000” rate limits ARP packets to a maximum of 64000 bps with a burst size of 2000 bytes.
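The two limiters above use different units, packets per second for Layer 2 PDUs and bits per second for ARP, so a rough conversion helps when comparing them. The Python sketch below assumes that every frame is a minimum-size 64-byte Ethernet frame. That frame size and the function names are assumptions for illustration, and the way the hardware accounts for frame length may differ.

```python
MIN_ETHERNET_FRAME_BYTES = 64   # assumed frame size for this estimate


def max_packets_per_second(rate_bps, frame_bytes=MIN_ETHERNET_FRAME_BYTES):
    """Approximate packet rate permitted by a bits-per-second policer."""
    return rate_bps // (frame_bytes * 8)


def bits_per_second(packets_per_second, frame_bytes=MIN_ETHERNET_FRAME_BYTES):
    """Approximate bandwidth consumed by a packets-per-second limiter."""
    return packets_per_second * frame_bytes * 8


arp_pps = max_packets_per_second(64000)   # 125 ARP packets per second
pdu_bps = bits_per_second(620)            # 317440 bps of Layer 2 PDUs
```

Under this assumption, the ARP policer admits roughly 125 ARP packets per second, and the Layer 2 PDU limiter admits roughly 317 kbps of PDU traffic.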

By combining control plane policing, storm control, and hardware rate limiting, the CPU utilization was reduced to an average of 5% with a maximum peak of 28%, as shown in the figure below.

 

With such a reduction in CPU utilization, no perceptible impact was observed on the performance of the switches or of user VLAN 10, despite the presence of the Layer 2 loop.

Results from All Scenarios

The figure below depicts the results from all scenarios side by side, which allows a visual comparison of the CPU utilization achieved in each one.

 

Design Recommendations

The following are some basic recommendations that can help reduce the risk of Layer 2 loops:

  • Limit VLANs to a single wiring closet whenever possible.
  • Use UniDirectional Link Detection (UDLD) aggressive mode to prevent spanning tree loops caused by unidirectional links.
  • As applicable, use the BPDU guard, loop guard, and root guard switch features to protect against undesirable changes in the spanning tree topology.
  • Disable Dynamic Trunking Protocol (DTP) by using the “switchport nonegotiate” command on switch ports to prevent automatic negotiation of trunks and access ports.
  • Assign unused ports to an unused VLAN, and shut them down administratively to prevent unauthorized users from connecting devices to them.

Conclusion

The results that I have presented show that by combining control plane policing, storm control, and hardware rate limiters, a significant and consistent reduction in CPU utilization can be achieved in the presence of Layer 2 loops. The configuration parameters used to define control plane policing, storm control, and hardware rate limiters were based on a 72-hour measurement of the traffic processed by switch control planes in the operational network using the SPAN port. These results are based on tests performed in a lab setting, but the conclusions reached are representative of what can be achieved in operational environments.

Even in a well-designed campus network, Layer 2 loops may occur due to wiring mistakes, misconfigured hosts (bridged interfaces), switch configuration mistakes, and loss of BPDU keepalives. By implementing the mitigating tools presented in this article, you can prevent the Catalyst 6500 from being overwhelmed in the presence of a Layer 2 loop.

You should follow Cisco design guidelines for campus networks to minimize the risk of spanning tree loops. One important goal in campus network design is to minimize, as much as possible, the span of broadcast domains by limiting VLANs to a single wiring closet and by using high-speed Layer 3 switching for the core layer instead of Layer 2 switching. Reducing the scope of broadcast domains within a campus yields a shorter spanning-tree diameter, which inherently reduces the risk and scope of Layer 2 loops.

