This blog post is a quick summary of some port-related tidbits I’ve run across recently.
Port Security Sticky
We recently upgraded a Dev lab data center with new Cisco 6509 switches (Sup720, 10 Gbps links) serving 800+ physical servers and 1000+ virtual machines. Call it “the first-stage Nortel-ectomy.”
Subsequently, Operations asked us to turn on port security to discourage server admins from moving Ethernet cables to other switch ports. Some of the server admins at the site clearly don’t understand that the switch ports may well not be in the same VLAN, let alone have other settings matching the server. I thought “OK, I’ve mostly seen port security used for closet switches, but that sounds like a reasonable goal. Maintenance could be a bit of work, but …”.
After some negotiation, the consensus was to use port security sticky, with a fairly high maximum MAC address count to accommodate VM flexibility on VMware ESX servers. Something like:
switchport port-security
switchport port-security mac-address sticky
switchport port-security maximum 50
A couple of servers subsequently had problems. In most if not all cases, they were dual-homed to Cisco switches in a VSS pair.
Testing pointed to the cause. Once a MAC address has been learned on one secured port, you cannot then use that MAC on another port. So if you’re thinking that port security just controls how many, and which, MAC addresses can be used on a port, well, that’s half of it.
The other half is that the learned MAC addresses cannot be used on any other port, which the Cisco reference guide doesn’t really spell out. Although if you think about it, that is an even more effective form of what we thought we intended.
The other half of the problem? (Have you figured it out?)
A few of the servers were set up for active/passive failover teaming and connected to VSS switch pairs. So when the active NIC fails, its MAC shifts to the passive half of the “teamed” interface and appears on the other switch. But since the VSS pair is logically one switch, that’s like moving a cable from one port to another. Bzzzzt! Port security violation: packets dropped or the port shut down.
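To make both halves concrete, here is a toy Python model of sticky port security on one logical switch, with the VSS pair treated as a single switch. It is a sketch of the behavior described above under those assumptions, not Cisco’s implementation. The class name, the port names (Gi1/1/1 on chassis 1, Gi2/1/1 on chassis 2), and the MAC address are all hypothetical.

```python
# Toy model of sticky port security on ONE logical switch (a VSS pair
# behaves as one switch). Hypothetical names; not Cisco's implementation.


class StickyPortSecurity:
    def __init__(self, maximum: int = 50) -> None:
        self.maximum = maximum                       # like "port-security maximum"
        self.secured: dict[str, set[str]] = {}       # port -> sticky-learned MACs
        self.owner: dict[str, str] = {}              # MAC -> port it was secured on
        self.violations: list[tuple[str, str]] = []  # (port, MAC) that violated

    def receive(self, port: str, mac: str) -> bool:
        """Return True if the frame is accepted, False if it is a violation."""
        macs = self.secured.setdefault(port, set())
        if mac in macs:
            return True                              # already secured on this port
        if mac in self.owner:
            # Second half: a MAC secured on one port may not show up on another.
            self.violations.append((port, mac))
            return False
        if len(macs) >= self.maximum:
            # First half: the port already holds its maximum number of MACs.
            self.violations.append((port, mac))
            return False
        macs.add(mac)                                # sticky-learn the new address
        self.owner[mac] = port
        return True


# Active/passive team on a VSS pair: Gi1/1/1 on chassis 1, Gi2/1/1 on chassis 2.
vss = StickyPortSecurity(maximum=50)
team_mac = "0050.56aa.0001"               # hypothetical teamed-NIC MAC
print(vss.receive("Gi1/1/1", team_mac))   # active NIC: True (MAC becomes sticky)
print(vss.receive("Gi2/1/1", team_mac))   # after failover: False (violation)
```

The second call fails because the MAC is already secured on Gi1/1/1, even though the port count on Gi2/1/1 is nowhere near 50. Whether a real switch then drops the frame or err-disables the port depends on the configured violation mode, which this model does not attempt to capture.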
Conclusion: avoid “sticky” with active/passive link pairs, whether they connect to a single switch or to a VSS pair.
What Could Possibly Go Wrong?
I have also been having fun trying to think through the various ways servers or VMware (or other hypervisors’ networking components) could go awry if there is miscommunication while we cut over to new switches in the Production data center. That is probably a good exercise of your basic switching knowledge.
Some of the servers we’re dealing with connect to Nortel switch pairs that do “SMLT” (Split Multi-Link Trunking); think of it as similar to Cisco VSS (or vice versa). The servers do EtherChannel (well, “teaming”) to both chassis. The EtherChannel is currently hard-coded on. Yes, I’d prefer LACP with negotiation, but a strong lesson learned is “make minimal changes, so when something doesn’t work, you have some idea that it was, say, re-patching that caused it, rather than one of several things that all changed.”
In moving to a new switch, what can go wrong? Consider VMware ESX servers with 4 NICs. They could:
(1) Be mis-cabled so that two EtherChannel ports go to non-EtherChannel ports on a single or VSS switch (or be cabled correctly, but to switch ports not set up for EtherChannel)
(2) Be mis-cabled or misconfigured so that active/passive ports go to EtherChanneled ports on the switch
(3) Have trunked VLANs that don’t match those allowed on the switch trunk ports
(4) Have trunking configured where the switch port is not configured for trunking
(5) Have the usual address / subnet / default gateway errors, speed or duplex mismatches, or the wrong access VLAN on the port the server is cabled to
In cases (3) through (5), the problem is likely to show up as “can’t ping the default gateway,” either for the physical host itself or for some or all of the VMs on it.
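Before the Production cutover, a side-by-side comparison of what each host expects and what its switch ports provide would catch most of these. The sketch below assumes both sides have already been collected into simple records. The `ServerTeam`/`SwitchPorts` fields and the `audit()` helper are hypothetical. It can only check the speed/duplex/access-VLAN part of case (5), since addressing and gateway errors live on the host.

```python
from dataclasses import dataclass, field


@dataclass
class ServerTeam:
    """Hypothetical record of how an ESX host's NIC team is set up."""
    mode: str                                     # "etherchannel" or "active-passive"
    trunking: bool                                # does the host tag VLANs?
    vlans: set[int] = field(default_factory=set)  # tagged VLANs, or the one untagged VLAN
    speed_duplex: str = "auto"


@dataclass
class SwitchPorts:
    """Hypothetical record of the switch ports that team is cabled to."""
    channeled: bool                               # ports bundled into an EtherChannel?
    trunking: bool
    vlans: set[int] = field(default_factory=set)  # trunk allowed list, or the access VLAN
    speed_duplex: str = "auto"


def audit(host: ServerTeam, ports: SwitchPorts) -> list[int]:
    """Return which of cases (1)-(5) this host/switch pairing hits."""
    cases = []
    if host.mode == "etherchannel" and not ports.channeled:
        cases.append(1)                           # bundle on host, none on switch
    if host.mode == "active-passive" and ports.channeled:
        cases.append(2)                           # often harmless (see case 2 discussion)
    if host.trunking and ports.trunking and not host.vlans <= ports.vlans:
        cases.append(3)                           # host tags a VLAN the trunk won't carry
    if host.trunking and not ports.trunking:
        cases.append(4)                           # host trunks, switch port is access
    wrong_access_vlan = (not host.trunking and not ports.trunking
                         and host.vlans != ports.vlans)
    if wrong_access_vlan or host.speed_duplex != ports.speed_duplex:
        cases.append(5)                           # only the switch-visible part of (5)
    return cases


host = ServerTeam(mode="etherchannel", trunking=True, vlans={10, 20})
ports = SwitchPorts(channeled=False, trunking=True, vlans={10, 20})
print(audit(host, ports))  # [1]
```

A real audit would pull the switch side from the running configs and the host side from the hypervisor. The point of the sketch is that each listed case reduces to a simple comparison once both sides are written down.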
If you think about case (2), there probably isn’t a problem. Only one side is active at a time, and packets that go into most servers don’t check back out, unless someone deliberately enabled bridging on the server (rare, but it has happened to one of our staff: an STP loop caused by an administrator bridging on a server!). If bridging is enabled, the Cisco switch will see its own BPDU come back on the EtherChannel and err-disable it.
The one that seems the most interesting (to me, anyway) is (1). What is the problem with it? Well, if the server end load-balances the way Cisco switches do, probably none, since frames with a given source MAC will probably use only one link or the other. But what if the server does some form of alternate-link or round-robin EtherChannel for load balancing (rather than load sharing)? Then a source MAC address might appear on one port, then another, on the same physical or VSS switch. If that happens a lot, the switch is probably going to spend some CPU capacity relearning the MAC, unless MAC learning is hardware-based, as in the 6500.
See also “Common Causes of Slow IntraVLAN and InterVLAN Connectivity in Campus Switch Networks” at http://www.cisco.com/en/US/tech/tk389/tk689/technologies_tech_note09186a00801f9eb3.shtml.
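One way to see why the distribution method matters in case (1) is to count how often a switch whose two ports are *not* bundled would see the same source MAC change ports. This is a small simulation sketch. CRC32 stands in for whatever hash the server’s teaming driver actually uses, and the function names and MAC are hypothetical.

```python
import zlib
from itertools import cycle, islice


def mac_moves(links: list[int]) -> int:
    """How many times the switch sees the source MAC arrive on a different port."""
    return sum(1 for prev, cur in zip(links, links[1:]) if prev != cur)


def hash_by_src_mac(src_mac: str, n_links: int, n_frames: int) -> list[int]:
    # Hash-based sharing: one source MAC always maps to the same link.
    # CRC32 is a stand-in, not any vendor's real hash.
    link = zlib.crc32(src_mac.encode()) % n_links
    return [link] * n_frames


def round_robin(n_links: int, n_frames: int) -> list[int]:
    # Per-frame round-robin: consecutive frames go out consecutive links.
    return list(islice(cycle(range(n_links)), n_frames))


mac = "0050.56aa.0001"                           # hypothetical VM MAC
print(mac_moves(hash_by_src_mac(mac, 2, 1000)))  # 0
print(mac_moves(round_robin(2, 1000)))           # 999
```

With per-MAC hashing, the switch never sees a move. With round-robin across two links, every frame after the first is a move, and that is the relearning load the paragraph above worries about.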
It looks like the final answer to this one requires a lab with a packet generator that can fire off rapid frames with the same source MAC, alternating between two links.
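As a starting point for that lab test, a crude generator could run on a Linux host with two NICs cabled to the switch under test. This minimal Python sketch assumes raw `AF_PACKET` sockets, which are Linux-only and require root. The interface names, MAC addresses, and function names are placeholders. EtherType 0x88B5 is one of the IEEE local experimental values and is chosen only so the test frames are easy to distinguish from real traffic. A purpose-built packet generator will send far faster than a Python loop.

```python
import socket
import struct
from itertools import cycle, islice


def mac_bytes(mac: str) -> bytes:
    """Accept Cisco-style "0050.56aa.0001" or "00:50:56:aa:00:01"."""
    return bytes.fromhex(mac.replace(".", "").replace(":", "").replace("-", ""))


def build_frame(src_mac: str, dst_mac: str, ethertype: int = 0x88B5) -> bytes:
    """Minimum-size Ethernet frame: 60 bytes; the NIC appends the FCS."""
    header = mac_bytes(dst_mac) + mac_bytes(src_mac) + struct.pack("!H", ethertype)
    return header + bytes(60 - len(header))      # zero-pad the payload


def alternating_schedule(ifaces: list[str], count: int) -> list[str]:
    """eth1, eth2, eth1, eth2, ... for `count` frames."""
    return list(islice(cycle(ifaces), count))


def blast(ifaces: list[str], src_mac: str, dst_mac: str, count: int) -> None:
    """Send `count` frames with one source MAC, alternating links (Linux, root)."""
    frame = build_frame(src_mac, dst_mac)
    socks = {}
    try:
        for name in ifaces:
            socks[name] = socket.socket(socket.AF_PACKET, socket.SOCK_RAW)
            socks[name].bind((name, 0))          # send-only raw socket on this NIC
        for name in alternating_schedule(ifaces, count):
            socks[name].send(frame)
    finally:
        for s in socks.values():
            s.close()


# Example with placeholder interface names and MACs, run as root on Linux:
# blast(["eth1", "eth2"], "0050.56aa.0001", "0000.5e00.5301", 100_000)
```

While it runs, you would watch the switch’s CPU and MAC table to see how it copes with the source MAC flapping between the two ports.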
What’s your answer?
Helping Operations Out
I mentioned that we’d been thrown some trouble tickets to report back on that identified the interface only by its SNMP ifIndex. It is kind of hard to figure out which interface is the problematic one without an SNMP tool. I mentioned this to our President, David Yarashus, and he came right back with a show command whose output is illustrated below. I had never really explored this branch of show commands, since my first forays into it proved less than exciting. (And since the freeware GetIf software solved most of my “quick SNMP” needs, at least in the days before people started locking it down with ACLs.)
Async0/0/0: Ifindex = 4
FastEthernet0/0: Ifindex = 1
FastEthernet0/0-mpls layer: Ifindex = 15
Loopback0: Ifindex = 14
Null0: Ifindex = 3
Tunnel2: Ifindex = 12
Tunnel2-mpls layer: Ifindex = 13
FastEthernet0/1: Ifindex = 2
FastEthernet0/1-mpls layer: Ifindex = 20
Async0/0/1: Ifindex = 5
Async0/0/2: Ifindex = 6
Async0/0/3: Ifindex = 7
Async0/0/4: Ifindex = 8
Async0/0/5: Ifindex = 9
Async0/0/6: Ifindex = 10
Async0/0/7: Ifindex = 11
FastEthernet0/0.2-802.1Q vLAN subif: Ifindex = 16
FastEthernet0/0.3-802.1Q vLAN subif: Ifindex = 17
FastEthernet0/0.17-802.1Q vLAN subif: Ifindex = 18
FastEthernet0/0.77-802.1Q vLAN subif: Ifindex = 19
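If that output is saved to a file, a few lines of Python can turn it into an ifIndex lookup for the trouble tickets. This sketch assumes every relevant line has the `name: Ifindex = N` shape shown above and skips anything else. The function name is hypothetical. Where SNMP access is allowed, the same mapping is also available from the IF-MIB ifDescr column (OID 1.3.6.1.2.1.2.2.1.2.<ifIndex>).

```python
import re

# "FastEthernet0/0.2-802.1Q vLAN subif: Ifindex = 16" -> ("FastEthernet0/0.2-...", 16)
LINE_RE = re.compile(r"^(?P<name>.+?):\s*Ifindex\s*=\s*(?P<index>\d+)$")


def parse_ifindex_table(text: str) -> dict[int, str]:
    """Map ifIndex -> interface name; lines that don't match are skipped."""
    table: dict[int, str] = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            table[int(m.group("index"))] = m.group("name")
    return table


output = """\
FastEthernet0/0: Ifindex = 1
Tunnel2: Ifindex = 12
FastEthernet0/0.77-802.1Q vLAN subif: Ifindex = 19
"""
by_index = parse_ifindex_table(output)
print(by_index[12])  # Tunnel2
print(by_index[19])  # FastEthernet0/0.77-802.1Q vLAN subif
```

A ticket that mentions only “ifIndex 12” then resolves to Tunnel2 on that router.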