Fixing problems at the right OSI layer

Author
Terry Slattery
Principal Architect

Have you ever used the OSI model to aid your troubleshooting?  I’ve used it to isolate the causes of problems and to focus my troubleshooting so that I can solve them quickly.
Many years ago I encountered a problem that has become a good interview question.  I was at a prospect’s site and they were having connectivity problems at site A (see the network diagram below).  A CSU/DSU had died at site C on the 1.1.1.1 interface a few days prior, and after replacing the CSU/DSU, they had not been able to get the link working.  We were at site A (1.1.1.2).

[Network diagram: troubleshooting-layers]

I had the engineer working on it run ‘show interfaces’, which showed the interface as up/up.  Since the link was showing an operational up status, I knew that it was passing HDLC keepalives, so both the physical and data link layers were working.  Yet pings to the next-hop router failed, so the problem had to be at Layer 3.  The next step was to determine why.
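The reasoning above can be sketched as a small bottom-up decision function. This is only an illustration: the function name `layer_to_investigate` and its parameters are hypothetical, and it assumes the usual reading of the two status fields, where the first reflects the physical line and the second reflects the line protocol (keepalives).

```python
def layer_to_investigate(line_status: str, protocol_status: str, ping_ok: bool) -> str:
    """Return the lowest OSI layer that is still suspect.

    Assumption: line_status is the physical line state and protocol_status
    is the line protocol (keepalive) state, as in an "up/up" status pair.
    """
    line_up = line_status.strip().lower() == "up"
    protocol_up = protocol_status.strip().lower() == "up"

    if not line_up:
        # No carrier or signal: start with cabling, CSU/DSU, and the circuit.
        return "Layer 1 (physical)"
    if not protocol_up:
        # Line is up but keepalives are failing: check encapsulation and framing.
        return "Layer 2 (data link)"
    if not ping_ok:
        # Layers 1 and 2 are working, yet the next hop is unreachable.
        return "Layer 3 (network)"
    return "Layer 4 or above"


# The situation in the story: up/up, but pings to the next hop fail.
print(layer_to_investigate("up", "up", ping_ok=False))  # Layer 3 (network)
```

Working from the bottom up means each check rules out a whole layer before the next one is considered.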

I had the customer enable ‘debug ip packet’ on the interface.  Sure enough, a packet soon arrived and the debug output showed that its source address was 2.2.2.1.  No wonder pings didn’t work.  The other end of the link was in a different subnet!  How did a packet from 2.2.2.1 arrive on this link?  Well, in the haste of replacing the CSU/DSU, the technician had unplugged the links for both site A and site B, probably because they were not well labeled.  The CSU/DSU was replaced, but then the technician had to reconnect both CSU/DSUs to the right interface connectors.  Without labeling, he had a 50% chance of getting it wrong.  Sure enough, he connected them backwards, so neither site A nor site B was able to pass traffic.
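The check I did by eye on the debug output can be sketched with Python's standard `ipaddress` module. The addresses are the ones from the story, but the /30 mask is an assumption (the subnet mask on the link is not something I recorded here), and the helper name `same_subnet` is hypothetical.

```python
import ipaddress


def same_subnet(local_interface: str, observed_source: str) -> bool:
    """True if observed_source falls inside the local interface's subnet."""
    iface = ipaddress.ip_interface(local_interface)
    return ipaddress.ip_address(observed_source) in iface.network


# Assumption: a /30 point-to-point link; the real mask is not given above.
LOCAL_INTERFACE = "1.1.1.2/30"

print(same_subnet(LOCAL_INTERFACE, "1.1.1.1"))  # True: the expected neighbor
print(same_subnet(LOCAL_INTERFACE, "2.2.2.1"))  # False: the address actually seen
```

A source address outside the local subnet on a point-to-point link is a strong hint that the wrong device is at the other end of the cable.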

Using OSI layering allowed me to quickly identify that the problem was at Layer 3 and focus my troubleshooting at that layer.

This brings me to a more recent example, which has yet to be fully resolved.  I was reviewing the issues that NetMRI identified at a customer site and found a router that was reporting over 50,000 TTL Exceeded messages in a day.  A few hundred TTL Exceeded messages might be the result of traceroute tests, particularly if there are automated traceroutes being done.  But tens of thousands is certainly a sign that something has created a routing loop.  Strangely, no users had complained.
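To see why a routing loop produces TTL Exceeded messages, here is a toy model of hop-by-hop forwarding. It is a simplified sketch, not a description of the customer's network: the router names, the `next_hop` tables, and the `"DELIVERED"` marker are all hypothetical.

```python
def route_packet(next_hop: dict, first_router: str, ttl: int) -> tuple:
    """Forward a packet hop by hop, decrementing the TTL at each router.

    next_hop maps a router name to the next router, or to "DELIVERED"
    when that router hands the packet to its destination.
    """
    router = first_router
    hops = 0
    while router != "DELIVERED":
        ttl -= 1
        if ttl == 0:
            # This router drops the packet and would send ICMP Time Exceeded.
            return ("ttl_exceeded", router, hops)
        router = next_hop[router]
        hops += 1
    return ("delivered", None, hops)


healthy = {"R1": "R2", "R2": "DELIVERED"}
looping = {"R1": "R2", "R2": "R1"}  # each router points back at the other

print(route_packet(healthy, "R1", ttl=64))
print(route_packet(looping, "R1", ttl=64))
```

In the looping case every packet for the affected destination circulates until its TTL runs out, so each one generates a TTL Exceeded message. That is how the count can climb into the tens of thousands.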

I enabled ‘debug ip icmp’ to see what IP addresses were involved.  Debugging ICMP is typically not a big deal because these messages tend to be low volume.  I also minimized the logging load (‘no logging console’ and ‘logging buffered’) to reduce the impact of debugging on the production router.

Note to MIB developers: if you add a counter for packet errors like TTL Exceeded, Destination Unreachable, or Port Unreachable, please also create a set of variables to keep the values of important information like addresses, protocols, and port numbers so that the management station can report the systems that have been causing the problem.  Just keeping the last value would be very valuable.  Even better would be a small round-robin cache of the values from the last 4 or 8 packets.
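The round-robin cache I have in mind could look something like the sketch below. It is not a MIB definition, only an illustration of the data structure; the class and field names (`OffendingPacket`, `TtlExceededStats`) are hypothetical, and the addresses come from the documentation ranges.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OffendingPacket:
    """Details worth keeping about a packet that triggered an error counter."""
    source: str
    destination: str
    protocol: str
    port: Optional[int] = None


class TtlExceededStats:
    """An error counter plus a fixed-size history of the most recent offenders."""

    def __init__(self, history_size: int = 8):
        self.count = 0
        # deque(maxlen=N) discards the oldest entry once N entries are stored.
        self.recent = deque(maxlen=history_size)

    def record(self, packet: OffendingPacket) -> None:
        self.count += 1
        self.recent.append(packet)


stats = TtlExceededStats(history_size=8)
for i in range(10):
    stats.record(OffendingPacket(f"192.0.2.{i}", "198.51.100.1", "udp", 33434 + i))

print(stats.count)             # 10: the counter keeps the full total
print(len(stats.recent))       # 8: only the last 8 packets are retained
print(stats.recent[0].source)  # 192.0.2.2: the two oldest entries were dropped
```

The memory cost is fixed and small, yet the management station can name the systems involved without anyone having to enable debugging on the router.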

One of the network engineers looked at the problem and found that the root bridge of the spanning tree for that subnet was not properly specified.  Once the root bridge was corrected, the volume of TTL Exceeded messages went down.  Hmmm.  Fix a problem at Layer 2 and the Layer 3 problem disappears.  That means that a future change in Layer 2 may cause the Layer 3 problem to resurface.  So the real Layer 3 problem has not been identified and corrected.  I eventually want to go back to this problem and determine the real cause and the correct fix.  I’m sure it will be interesting and I’ll learn something from it.

Good luck with your next troubleshooting task.  Use the OSI model to help solve it faster.

-Terry

_____________________________________________________________________________________________

Re-posted with Permission 

NetCraftsmen would like to acknowledge Infoblox for their permission to re-post this article which originally appeared in the Applied Infrastructure blog under http://www.infoblox.com/en/communities/blogs.html

