Exposing Hidden Costs in Network Operations

Author
Peter Welcher
Architect, Operations Technical Advisor

Many IT shops are facing budgetary challenges: Support more technology with the same (or smaller) budget. What happens? This is where you enter the Twilight Zone (cue eerie theme music) of hidden costs in network operations. Let’s take a look at some common things we at NetCraftsmen see in many networks.

Problem: When funding is tight, your staff can get spread thinner and thinner. One consequence is that the workload leads to stress, and people quit.

Hidden costs: Replacing lost employees costs more. Salaries are going up, and the number of skilled network engineers with experience is not growing as quickly. You may well end up with someone less skilled, after burning time you don’t have on many interviews. And then you have to bring the new employee up to speed on your network …

Solution: Don’t burn out your staff. And do say “thank you” and provide other forms of positive feedback — “well done” certificates of appreciation, cash cards, desk toys, or whatever works.

Problem: If you are under-staffed, the result can be rushed planning, sloppy rollback plans, too little time for testing, and the deployment of untested changes.

Hidden costs: Mistakes, rework, and worse: outages.

Solution: Don’t ask for too much, and do create an environment in which staff can safely report that they didn’t have time for adequate prep. Forcing staff to hit deadlines, without providing the necessary preparation time and resources, will result in cut corners and outages.

Problem: Person A is working on a project but has to wait for Person B to do something before continuing. Meanwhile, Person B is waiting for Person C to do something, but C is waiting for A, perhaps on some other project. If everyone is too busy, then every task gets queued for processing by every person involved. This is a mutual lock condition, what computing calls a deadlock.

Hidden cost: Projects take forever to be completed.

Solution: Establish clear priorities, with everyone on the same page as to the highest priority projects.
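The A-waits-for-B-waits-for-C-waits-for-A situation can be sketched as a loop in a "who is waiting on whom" map. The following is only an illustrative Python sketch: the `waits_on` mapping and the function name are hypothetical, and in practice the data would have to come from a project tracker or a quick team survey.

```python
def find_wait_cycle(waits_on):
    """Return one circular chain of waiters (e.g. A -> B -> C -> A), or None.

    waits_on is a hypothetical mapping: person -> list of people they are
    currently blocked on.
    """
    state = {}   # person -> "active" (on the current chain) or "done"
    path = []    # the chain of waiters currently being followed

    def visit(person):
        state[person] = "active"
        path.append(person)
        for blocker in waits_on.get(person, []):
            if state.get(blocker) == "active":
                # We looped back onto the current chain: a mutual lock.
                return path[path.index(blocker):] + [blocker]
            if blocker not in state:
                cycle = visit(blocker)
                if cycle:
                    return cycle
        path.pop()
        state[person] = "done"
        return None

    for person in waits_on:
        if person not in state:
            cycle = visit(person)
            if cycle:
                return cycle
    return None


blocked = {"A": ["B"], "B": ["C"], "C": ["A"]}
print(find_wait_cycle(blocked))  # ['A', 'B', 'C', 'A']
```

Once the loop is visible, clear priorities tell you which link to break first.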

Problem: A slight variant of this occurs when multi-person projects with rushed planning get bottlenecked on missed steps. Similarly, rushed equipment orders may need follow-up orders, as components or optics inadvertently got left out of the bill of materials. If your procurement group is slow, this just made their queuing delay greater – and it sure held up your project.

Hidden cost: Rework, delay.

Bonus hidden cost: You just doubled the project delay due to the procurement cycle. If your procurement department is slow (are there any that are not?), that’s particularly painful.

Solution: Make sure that effective planning is a clear priority (without overdoing it). It is better to take a little time up front to make sure everything is accounted for than to suffer big delays later. Your staff needs to understand that.

Problem: An overly busy staff is the equivalent of a computer that is thrashing. It gets so busy switching tasks, and swapping virtual memory off slow disk drives, that little useful work gets done. About all you can do in response is to reduce the CPU and memory workload. The same goes for network or server staff.

Managerial hidden cost: You get ulcers trying to make it all work.

Solution: Prioritize. And don’t take on more than can realistically get accomplished.
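To make the thrashing analogy concrete, here is a rough back-of-the-envelope sketch in Python. The model (a fixed number of minutes lost on every task switch) and all of the numbers are assumptions chosen for illustration, not measurements.

```python
def useful_hours(workday_hours, task_switches, minutes_lost_per_switch):
    """Hours left for real work after task-switching overhead.

    Assumes (for illustration only) that every switch costs the same
    fixed number of minutes to get back up to speed.
    """
    lost_hours = task_switches * minutes_lost_per_switch / 60
    return max(0.0, workday_hours - lost_hours)


# Illustrative numbers: an 8-hour day with 15 minutes lost per switch.
print(useful_hours(8, 4, 15))   # 7.0 hours of useful work
print(useful_hours(8, 24, 15))  # 2.0 hours: the engineer is thrashing
```

The point of the sketch is the shape of the curve rather than the exact figures: the only lever in it is the number of switches, which is what prioritizing reduces.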

Problem: Have you seen documentation actually getting updated before or as changes are made? Good documentation, diagrams, and a set of core tables of information represent cached information that took time to develop. If staff keeps putting the documentation/diagram updates off due to being “too busy,” the consequences can be serious.

Hidden cost: New staff and consultants will require verbal documentation, which ties up existing and new staff as they must continually explain the basics of how various aspects of the network work.

Bonus hidden cost: Someone quits or gets hit by a truck, and nobody else knows how that part of the network works, or how to configure some devices.

Solution: Documentation and diagrams that are up to date can at least provide solid clues as to how something is supposed to work. This is particularly important when the “something” is either complicated or not obvious. Even better, document changes before they happen. Doing so can help staff spot hidden complexities or problems in proposed changes.

Problem: Diagrams that have become obsolete (see also above).

Hidden cost: When networking professionals troubleshoot, they connect to devices, use CDP, and develop a pencil sketch to diagram the part of the network they’re troubleshooting. The next time something happens, they do it all over again. This can become a vicious cycle.

Very hidden cost: Your Mean Time to Repair (MTTR) is far longer due to the rework. More downtime!

Solution: Selected documentation and diagrams are pure gold, and must be maintained. Help staff understand priorities regarding what must be kept up to date — as opposed to lower priority, “nice to have” items. If those pencil sketches are still happening, have staff take an extra 30-60 minutes to re-sketch cleanly, and take a picture of the sketch. It might just save time, next time around – or help someone later develop a solid Visio diagram.

Problem: Network management platforms don’t get maintained as carefully as they should when staff is busy.

Hidden cost: Murphy’s Law says the router that just died is the one that the network management tool was unable to capture a configuration for, because nobody noticed the failed backups and fixed them. Or the interface you need capacity or other data on, or that is acting up, was never “managed,” so you have no historical baseline.

Solution 1: Implement scheduled verification that network management tool maintenance is getting done, and that new devices and interfaces are managed by the tools, etc. (Be sure to avoid “tool-itis”— having too many tools, which consequently are poorly maintained, so nobody uses them.) It is useful to analyze the set of network management tools. We recommend tools based on critical functions and number of staff. Stick to the base set of tools, and make sure everyone is trained in using and maintaining them. Spreading the maintenance load around helps, particularly when the designated “owner” is too busy.

Solution 2: Outsource maintenance of network management tools.
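The scheduled verification in Solution 1 can be as simple as comparing the device inventory against what each tool actually covers. The Python sketch below is a minimal illustration under stated assumptions: the three input lists and the device names are hypothetical, and in practice each list would be exported from your own inventory, configuration backup tool, and monitoring tool.

```python
def audit_tool_coverage(inventory, config_backups, monitored):
    """Report devices the network management tools are not covering.

    All three arguments are hypothetical lists of device names:
    the full inventory, devices with a current saved configuration,
    and devices being polled for performance data.
    """
    inventory = set(inventory)
    return {
        "no_config_backup": sorted(inventory - set(config_backups)),
        "not_monitored": sorted(inventory - set(monitored)),
    }


report = audit_tool_coverage(
    inventory=["core-rtr-1", "core-rtr-2", "edge-sw-7"],
    config_backups=["core-rtr-1", "edge-sw-7"],
    monitored=["core-rtr-1", "core-rtr-2"],
)
print(report)
# {'no_config_backup': ['core-rtr-2'], 'not_monitored': ['edge-sw-7']}
```

Run on a schedule, a report like this surfaces the gaps before Murphy’s Law does.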

Problem: Staff doesn’t have time to keep up with new Cisco/vendor features.

Hidden cost: Staff implements something, and then discovers a two-year-old feature that would have made it easier and faster. Ensuing choice: Rework the project, or stick with what’s already deployed.

Solution: Manage skills (new product awareness, new technology awareness, and deeper skill sets). Some time must be put into developing skills, but figuring out just how much time requires a balancing act. (A related management task is ensuring you have double coverage of each technical skill, so that there is no “single staff point of failure.”) If/when Murphy’s Law kicks in, you don’t want to find out your <whatever> specialist is abroad.

Problem: Changing gears slightly, there can be problems at play other than understaffing. Staff being insufficiently skilled or experienced can cause similar symptoms, as can inefficient/under-productive staff. Identifying either condition might not be easy.

A related problem: A poor or overly complex design can really exacerbate time demands. The complexity just slows everything down, indirectly making staff less productive. Most training focuses on how to configure devices, not on good design. The Cisco CCDA and CCDP certifications (and related courses and books) focus on design skills. Those skills, plus experience, plus some common sense, are prerequisites for good design.

Vendors come up with new technology. Their marketing emphasizes the benefits and value. But just because the technology is there, cool, etc., does not mean you have to deploy it. First consider the pros and cons, especially any increment in the complexity level.

Solution: A NetCraftsmen network assessment can identify problem areas and suggest design improvements, or configuration/stability improvements. We can also talk to staff and suggest training and skills improvements. Staff efficiency is harder to pin down.

Solution 2: Use NetCraftsmen for strategic planning and periodic design/migration planning review.

Well, now that we’ve commiserated about your woes, how about a positive angle? What else can be done about the problems we’ve identified here?

Possible solutions:

  • Hire more staff.
  • If your team is thrashing, shed/defer some tasks and re-prioritize.
  • Give team members clear priorities to ensure the most important things get done. Some progress goes a long way toward making everyone happier and less stressed.
  • Use NetCraftsmen to help you hire more staff. Staff with stronger skills may cost more but may get the job done more efficiently.
  • Leverage our design or managed services offerings to supplement staff skills (especially in network design) or to offload tasks.
  • Outsource some tasks, services, etc.

The book The Phoenix Project is interesting reading concerning IT operations. We like the term “accrued technical debt.” Generally speaking, every time a project is rushed and corners are cut, it is like going into debt. Hidden cost: You end up paying “interest” on that debt, in the form of unplanned downtime and maintenance. The book also has some interesting observations about queuing and resource bottlenecks, and how they affect time to completion. If you cut enough corners, the unplanned work will leave staff with no time available for projects!

Are there costs lurking within your own network operations that you may not be aware of? Contact us to start a conversation about how to find out.

Comments

Comments are welcome, whether in agreement or informative disagreement with the above. Thanks in advance!

Hashtags: #HiddenCosts, #NetworkOperations, #NetworkInfrastructure, #NetOps

Twitter: @pjwelcher

Disclosure Statement

Cisco Champion 2014; Cisco Certified 15 Years
