EtherChannel, MAC Persistency, and the 3850 Switch Stack

Author
Carole Warner Reece
Architect

I have been thinking about the interaction between LACP system IDs, port-channels on switch stacks, and virtual port-channels on Nexus gear. I saw some ‘interesting’ behaviour last week when helping a customer update the software on his 3850 stack. His stack was connected to a vPC port-channel on a pair of Nexus 7000s. We saw a stack event taking down the entire port channel when the active stack member failed over to the standby member. (This event did not impact the vPC peer link, or the SVIs on the VLANs on the port-channel.)

I am happy to report that the new stack software (CAT3K_CAA-UNIVERSALK9-M, Version 03.02.02.SE) is pretty solid: when I tried to replicate the issue this weekend in a maintenance window, I could not break the port-channel with the 3.2.2 3850X image. I think the current port-channel behavior is still worth discussing.

Background

When a switch stack forms a port-channel, the active stack member’s MAC address is used in the LACP ID that identifies the port-channel. Similarly, when a Nexus 7000 or 5000 vPC pair forms a vPC port-channel, the vPC virtual system-ID is used in the LACP ID.

Earlier stack port-channel code has had issues. For example, the ASA 9.1 configuration documentation states:

The ASA does not support connecting an EtherChannel to a switch stack.

The next line of the docs somewhat explains why:

If the ASA EtherChannel is connected cross stack, and if the Master switch is powered down, then the EtherChannel connected to the remaining switch will not come up.

However, a port-channel to a 6500 VSS or a Nexus 7000 vPC is supported. There has been some discussion in the Cisco Support Community on this issue with the firewall and the 3750 switch stack: https://supportforums.cisco.com/thread/2198683

I believe I saw the EtherChannel failing to stay up on the remaining switch last week — before the software upgrade. Before the upgrade, when we failed over from the active switch member in a stack to the standby switch, the port-channel on the new active switch went down and stayed down until the previous master switch was reloaded. This was not very desirable!

Switch Stack Documentation
The 3850 and 3750 Configuring EtherChannels documentation mentions:

With LACP, the system-id uses the stack MAC address from the stack master, and if the stack master changes, the LACP system-id can change. If the LACP system-id changes, the entire EtherChannel will flap, and there will be an STP reconvergence. Use the stack-mac persistent timer command to control whether or not the stack MAC address changes during a master failover.
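
To make the flap concrete, here is a minimal Python sketch of why a changed stack MAC looks like a different partner to the far end of the channel. It assumes the LACP system identifier can be modeled as a (system priority, MAC) pair, with 32768 as the usual default priority. The class and function names are illustrative only and do not come from IOS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LacpSystemId:
    """Illustrative model: LACP system ID = system priority + system MAC."""
    priority: int  # assumed default LACP system priority is 32768
    mac: str       # on a stack, this is the stack MAC address


def partner_unchanged(before: LacpSystemId, after: LacpSystemId) -> bool:
    """In this model the far end keeps the bundle only if the ID is identical."""
    return before == after


# MAC addresses taken from the 'show switch' captures later in this post.
old_id = LacpSystemId(32768, "44ad.d96c.ad00")  # stack MAC owned by switch 1
new_id = LacpSystemId(32768, "44ad.d912.c000")  # if switch 2's MAC took over

print(partner_unchanged(old_id, old_id))  # True: the stack MAC persisted
print(partner_unchanged(old_id, new_id))  # False: the far end sees a new system
```

The second case is the one the documentation warns about: the whole EtherChannel renegotiates, not just the links on the failed member.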

However, this document does not actually tell you how to set the timer. The ‘Managing Switch Stacks’ guide has more information:

Use the persistent MAC address feature to set a time delay before the stack MAC address changes. During this time period, if the previous active switch rejoins the stack, the stack continues to use its MAC address as the stack MAC address, even if the switch is now a stack member and not an active switch. If the previous active switch does not rejoin the stack during this period, the switch stack takes the MAC address of the new active switch as the stack MAC address.
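
The documented rule can be sketched as a small decision function. This is only a model of the text quoted above, not of what IOS actually does internally. The function name and parameters are hypothetical, and two points are assumptions: that an Indefinite wait means the stack MAC is never replaced, and that a rejoin exactly at the timer limit still counts as in time.

```python
from typing import Optional


def stack_mac_after_failover(
    old_active_mac: str,
    new_active_mac: str,
    timer_minutes: Optional[int],
    rejoin_after_minutes: Optional[float],
) -> str:
    """Return the stack MAC implied by the documentation after a failover.

    timer_minutes=None models 'Indefinite'; rejoin_after_minutes=None models
    a previous active switch that never rejoins the stack.
    """
    if timer_minutes is None:
        # Assumption: with an indefinite wait the stack MAC is never replaced.
        return old_active_mac
    if rejoin_after_minutes is not None and rejoin_after_minutes <= timer_minutes:
        # Previous active switch rejoined in time: its MAC stays the stack MAC.
        return old_active_mac
    # Timer expired first: the new active switch's MAC becomes the stack MAC.
    return new_active_mac


OLD, NEW = "44ad.d96c.ad00", "44ad.d912.c000"
print(stack_mac_after_failover(OLD, NEW, 8, 7))        # 44ad.d96c.ad00
print(stack_mac_after_failover(OLD, NEW, 4, 7))        # 44ad.d912.c000
print(stack_mac_after_failover(OLD, NEW, None, None))  # 44ad.d96c.ad00
```

Under this model, a timer longer than the reload time keeps the LACP system-id stable across a reload, while a shorter timer lets it change.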

Default Configuration
When we looked at the issue last week, with no stack-mac persistent timer configured, the Mac persistency wait time was indefinite:

AS-01#sh swit  
Switch/Stack Mac Address : 44ad.d96c.ad00 - Local Mac Address
Mac persistency wait time: Indefinite
                                             H/W   Current
Switch#   Role    Mac Address     Priority Version  State 
------------------------------------------------------------
*1       Active   44ad.d96c.ad00     10     V02     Ready               
 2       Standby  44ad.d912.c000     1      V02     Ready               

AS-01#

We updated the image to 3.2.2, set the stack-mac persistent timer to 8 minutes (since the typical reload seemed to take under 7 minutes), did some testing, and it all seemed to work fine.

Testing After Software Image Update
So this weekend I tried to replicate the issue under the new 3.2.2 image. I removed the stack-mac persistent timer command, saved the configs, forced a failover, and saw no issues with the port-channel status on the new active switch. I forced a failover back, again with no issues on the new active switch. With no stack-mac persistent timer command, the Mac persistency wait time was still Indefinite. (This means forever, I believe…)

I did learn how to clear a stale RSA host key on my Mac pretty quickly. If you are a Mac user and you SSH to a device while doing this sort of testing, you may get a message like:

~ cwr$ ssh -l admin 10.18.2.15
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the RSA key sent by the remote host is
56:6d:da:2f:b8:64:87:ad:53:b6:b8:7d:13:4d:8f:8f.
Please contact your system administrator.
Add correct host key in /Users/cwr/.ssh/known_hosts to get rid of this message.
Offending RSA key in /Users/cwr/.ssh/known_hosts:225
RSA host key for 10.18.2.15 has changed and you have requested strict checking.
Host key verification failed.
~ cwr$

You can clean this up pretty easily by removing the RSA host key for the IP address:

~ cwr$ ssh-keygen -R 10.18.2.15
/Users/cwr/.ssh/known_hosts updated.
Original contents retained as /Users/cwr/.ssh/known_hosts.old
~ cwr$ ssh -l admin 10.18.2.15
The authenticity of host '10.18.2.15 (10.18.2.15)' can't be established.
RSA key fingerprint is 56:6d:da:2f:b8:64:87:ad:53:b6:b8:7d:13:4d:8f:8f.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '10.18.2.15' (RSA) to the list of known hosts.
Password:

I did try several combinations to try to break the EtherChannel: no stack-mac persistent timer, stack-mac persistent timer 8, stack-mac persistent timer 4, and stack-mac persistent timer 1.

The good news for high availability is that the IOS appears to ignore this command when you are reloading a switch in the stack. As the previously-active switch goes through several role and state changes, the newly-active switch keeps the cross-stack port-channel up. Here is what the roles and states look like:

AS-01# sh switch
Switch/Stack Mac Address : 44ad.d96c.ad00 - Foreign Mac Address
Mac persistency wait time: 4 mins
                                             H/W   Current
Switch#   Role    Mac Address     Priority Version  State
------------------------------------------------------------
 1       Member   0000.0000.0000     0      0       Removed
*2       Active   44ad.d912.c000     1      V02     Ready


DCH-AS-INT-01#sh switch
Switch/Stack Mac Address : 44ad.d96c.ad00 - Local Mac Address
Mac persistency wait time: 4 mins
                                             H/W   Current
Switch#   Role    Mac Address     Priority Version  State
------------------------------------------------------------
 1       Member   44ad.d96c.ad00     10     0       Initializing
*2       Active   44ad.d912.c000     1      V02     Ready


DCH-AS-INT-01#sh switch
Switch/Stack Mac Address : 44ad.d96c.ad00 - Local Mac Address
Mac persistency wait time: 4 mins
                                             H/W   Current
Switch#   Role    Mac Address     Priority Version  State
------------------------------------------------------------
 1       Member   44ad.d96c.ad00     10     V02     Syncing
*2       Active   44ad.d912.c000     1      V02     Ready

DCH-AS-INT-01#sh switch
Switch/Stack Mac Address : 44ad.d96c.ad00 - Local Mac Address
Mac persistency wait time: 4 mins
                                             H/W   Current
Switch#   Role    Mac Address     Priority Version  State
------------------------------------------------------------
 1       Member   44ad.d96c.ad00     10     V02     Ready
*2       Active   44ad.d912.c000     1      V02     Ready

 !! Note: the state shows Ready, but switch 1 is not really ready because the HA sync has not happened yet.

DCH-AS-INT-01#sh switch
Switch/Stack Mac Address : 44ad.d96c.ad00 - Local Mac Address
Mac persistency wait time: 4 mins
                                             H/W   Current
Switch#   Role    Mac Address     Priority Version  State
------------------------------------------------------------
 1       Standby  44ad.d96c.ad00     10     V02     HA sync in progress
*2       Active   44ad.d912.c000     1      V02     Ready

DCH-AS-INT-01#sh switch
Switch/Stack Mac Address : 44ad.d96c.ad00 - Local Mac Address
Mac persistency wait time: 4 mins
                                             H/W   Current
Switch#   Role    Mac Address     Priority Version  State
------------------------------------------------------------
 1       Standby  44ad.d96c.ad00     10     V02     Ready
*2       Active   44ad.d912.c000     1      V02     Ready

My guess based on my experiments with various sized timers is that the IOS knows there is a switch loading, and is ignoring the stack-mac persistent timer until the loading switch is up and its Mac address can be reviewed. This will help with high availability. It should remove the caveat from the ASA of ‘not supporting an EtherChannel with a switch stack…’
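
If you want to repeat this kind of test and track the role and state changes without reading each capture by eye, a small parser for the `show switch` output can help. The sketch below is Python, assumes the column layout seen in the captures in this post, and uses hypothetical function names. How you collect the output (SSH, console log, and so on) is left out.

```python
import re

# Sample taken from the first 'show switch' capture in this post.
SAMPLE = """\
Switch/Stack Mac Address : 44ad.d96c.ad00 - Local Mac Address
Mac persistency wait time: Indefinite
                                             H/W   Current
Switch#   Role    Mac Address     Priority Version  State
------------------------------------------------------------
*1       Active   44ad.d96c.ad00     10     V02     Ready
 2       Standby  44ad.d912.c000     1      V02     Ready
"""

HEADER = re.compile(r"Switch/Stack Mac Address\s*:\s*(\S+)\s+-\s+(Local|Foreign)")
TIMER = re.compile(r"Mac persistency wait time:\s*(.+?)\s*$")
# marker, switch number, role, MAC, priority, H/W version, state (may be multi-word)
ROW = re.compile(
    r"^\s*(\*?)(\d+)\s+(\S+)\s+((?:[0-9a-f]{4}\.){2}[0-9a-f]{4})"
    r"\s+(\d+)\s+(\S+)\s+(.+?)\s*$"
)


def parse_show_switch(text):
    """Parse 'show switch' text into the stack MAC, timer, and member rows."""
    parsed = {"stack_mac": None, "mac_origin": None, "wait_time": None, "members": []}
    for line in text.splitlines():
        header = HEADER.search(line)
        timer = TIMER.search(line)
        row = ROW.match(line)
        if header:
            parsed["stack_mac"], parsed["mac_origin"] = header.groups()
        elif timer:
            parsed["wait_time"] = timer.group(1)
        elif row:
            star, num, role, mac, priority, version, state = row.groups()
            parsed["members"].append({
                "switch": int(num), "active": star == "*", "role": role,
                "mac": mac, "priority": int(priority),
                "version": version, "state": state,
            })
    return parsed


def all_ready(parsed):
    """True when every listed member reports the state 'Ready'."""
    return all(m["state"] == "Ready" for m in parsed["members"])


parsed = parse_show_switch(SAMPLE)
print(parsed["stack_mac"], parsed["mac_origin"], parsed["wait_time"])
print([(m["switch"], m["role"], m["state"]) for m in parsed["members"]])
print(all_ready(parsed))
```

As the captures above show, a member can report Ready before the HA sync has run, so a check like `all_ready` should also confirm that one member holds the Standby role before you call the stack fully recovered.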

Summary

If you are running a cross-stack EtherChannel on a switch stack, you probably should update to 3.2.2 (or the 3750 equivalent) for improved EtherChannel high availability.

— cwr

2 responses to “EtherChannel, MAC Persistency, and the 3850 Switch Stack”

  1. Great article and very helpful, but I’ve got a question.
    You weren’t able to break the EtherChannel but this was all with the 3.2.2 software right? Have you tried the different commands on the older software also to confirm what you thought you saw with your customer and to confirm the bug?

    Thanks for your reply.

    Kind regards,
    Daniel

  2. Hi Daniel –
    No, I have not been able to verify the issues on older code. I don’t have access to other 3850s, so although I saw weird enough behavior that I wanted to capture screen output in a Saturday session, I could not document the perceived issues.

    Carole
