Designing for Maintainability? Start with your Team.

Author
Ryan Harris
Sr. Network Engineer

I’ve had the pleasure of working at several organizations, large and small, in both internal IT roles and outside consulting positions, and have noted that the cultural structure of IT architecture teams is one of the markers of success or failure. Organizations where isolated teams make architectural decisions with only their own domain interests in mind tend to end up with a collection of point solutions rather than a cohesive design that incorporates all aspects of the IT landscape.

What is a point solution? A point solution is a product that solves a single use case. The problem with point solutions is that they do not scale with the business, so when the business needs to expand or change, the company must buy another product to solve the next problem that comes along. Point solutions also carry a high management overhead and come with increased costs and outages. Each point solution has its own problems: licenses that vary from platform to platform, and different philosophies on how network traffic should flow and where inspection or filtering should occur.

In the case of the current industry trend towards SASE product models, point solutions are the bane of both network and security administrators’ existence. Imagine a scenario where you deploy TLS decryption with one product, send the unencrypted data out to another appliance for URL filtering, then to another appliance for IPS functionality, and then all the way back into the TLS decryption appliance to be sent back out. This type of solution is hard to manage because your team needs to manage three products, and if there’s a problem somewhere, you’ve got to spend hours arguing with different support providers to get someone to accept blame. That’s not to mention the performance headaches that are going to arise from this type of solution.

But how does an organization arrive at a solution like this in the first place? It happens when multiple teams (or individuals) make isolated decisions, one at a time, to fix their immediate problem. Each focuses on buying the best possible solution to that immediate problem without considering how it integrates with everything else or who manages it going forward.

How do you reconcile existing point solutions with a need for a holistic approach?

Take a step back and look at the big picture. Take stock of every tool you’ve currently got in production. Take stock of where your technological gaps are. Take stock of your team and their capabilities and knowledge. You’ve no doubt got multiple products that are on varying lifecycles with different renewal and upgrade needs, some of which are immediate and some of which are in the distant future.

Work across teams to find solutions that meet multiple needs. Many products on the market can meet the needs of more than one team. The truth is that part of the problem here is that teams aren’t talking to each other when they’re making decisions.

Once you’re at a point where you’re able to see the big picture of the organization, seek out solutions that cover as many areas as possible and discard the point solutions that are left over. Understand that the sunk cost fallacy is real, and that there are going to be some tough conversations when you’re advocating for replacing a solution that’s not pulling its weight but that cost half a million dollars a year ago and took countless hours to get working.

How do you design a solution that can be reasonably managed?

The ideal solution should be able to reduce costs, secure as much of your infrastructure as possible, and simplify your life.

Avoid a best-of-breed approach and find a vendor that provides the best overall solution for your organization, especially when we’re talking about security service edge (SSE) products. It is important to keep in mind that your technology stack should make life easier not only for you but also for the people who are using it.

So how do you approach this process?

Identify a leader with a vision

A leader with a vision is someone who has a clear idea of what’s best for the organization and can make decisions accordingly. I don’t necessarily mean the manager of the team here, but someone who knows the technology at hand and understands the goals of the organization. You may not get a volunteer raising their hand, since many people are reluctant to take up leadership positions for a variety of reasons. Look for someone who knows that it is necessary to be decisive and make tough decisions to push the team and company forward.

Don’t have that person internal to your organization? Hire professionals like NetCraftsmen to help you assess your needs and formulate a plan.

Formalize Inter-team Collaboration

A lot of IT teams that I’ve been a part of were very isolated, and there was little collaboration across verticals. Network engineering wasn’t talking well enough with security, who wasn’t talking well enough with the server teams, and so on. Most of the communication between teams was accomplished through tickets that were lobbed over the metaphorical fence at another discipline. Collaboration is the key to success in IT architecture. In some ways it is more important than the actual design of any individual component, because it fosters a more holistic design philosophy.

In the post-COVID world, there are a lot of people saying that they’ll never need to go back to an office again, but I take issue with the idea that your team can function in a completely isolated way. I think it’s obvious that daily in-office work isn’t necessary, and can even be a hindrance, but regular on-site sessions can be extremely beneficial. Workshops that get multiple teams together to talk openly about the issues they’re having, and about ways that other teams can help solve those problems, are a great team-building opportunity beyond a quarterly happy hour.

Design for Automation

Avoid those one-off designs that solve the problem in front of you today and think about tomorrow.

It’s easy to make a design concession here or there but after running a large network for years, you’re likely to find that those occasional concessions build up to an unmaintainable mess that requires more time and manpower to keep running. Don’t be afraid to tell people no when they come to you asking for a one-off solution, or look back to point one and make sure your team has a leader that’s willing to tell people no.

What I’m saying here is that your goal for network architecture should be a cookie-cutter design: standardize on the same hardware as much as possible and be consistent in its use, reuse VLAN IDs, and standardize on a single subnet size for each purpose and use it everywhere.
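To make the cookie-cutter idea concrete, here is a minimal Python sketch of a per-site addressing plan. The VLAN IDs, purposes, the /24 subnet size, and the example site blocks are all assumptions I picked for illustration, not a recommendation for any particular network. The point is only that every site gets the same VLAN IDs and the same subnet layout, so the plan can be generated rather than hand-crafted.

```python
import ipaddress

# Hypothetical standard: the same VLAN ID and the same subnet position
# (index within the site block) for each purpose at every site.
STANDARD_VLANS = {
    "users": (10, 0),
    "voice": (20, 1),
    "printers": (30, 2),
    "management": (99, 3),
}


def site_plan(site_block, prefix_len=24):
    """Carve a site's address block into identically sized subnets.

    site_block is the block assigned to one site, e.g. "10.20.0.0/20".
    Every purpose gets one subnet of prefix_len, always at the same index.
    """
    block = ipaddress.ip_network(site_block)
    subnets = list(block.subnets(new_prefix=prefix_len))
    plan = {}
    for purpose, (vlan_id, index) in STANDARD_VLANS.items():
        subnet = subnets[index]
        plan[purpose] = {
            "vlan": vlan_id,
            "subnet": str(subnet),
            # Assumed convention: the first host address is the gateway.
            "gateway": str(subnet.network_address + 1),
        }
    return plan


if __name__ == "__main__":
    for purpose, details in site_plan("10.20.0.0/20").items():
        print(purpose, details)
```

Because the layout is identical everywhere, anyone on the team can predict the voice subnet or the management gateway at a site they have never touched.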

In a previous job, we were able to drastically reduce the number of support requests that our network team received each week by building tools that enabled help-desk staff to resolve requests without our involvement and by providing self-service tools to other IT staff. The net result was that the team was able to spend more time focusing on long-term projects, and end users had their issues resolved more quickly.

In another project, we deployed nearly a thousand switches using automation to provision and manage configurations. The benefits of this were that we could dictate and enforce standards across the entire network and quickly iterate on any design components that needed to change.
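As a rough illustration of what "dictate and enforce standards" can look like, the Python sketch below renders an intended configuration from a single template and then reports any intended lines missing from a device's running configuration. This is not the tooling we used on that project. The template text is a simplified, vendor-neutral stand-in, and the names `render_config` and `find_drift` are hypothetical.

```python
from string import Template

# Simplified, vendor-neutral stand-in for a real switch template.
SWITCH_TEMPLATE = Template(
    "hostname $hostname\n"
    "vlan $user_vlan\n"
    " name users\n"
    "vlan $mgmt_vlan\n"
    " name management\n"
)


def render_config(hostname, user_vlan=10, mgmt_vlan=99):
    """Render the intended config for one switch from the shared template."""
    return SWITCH_TEMPLATE.substitute(
        hostname=hostname, user_vlan=user_vlan, mgmt_vlan=mgmt_vlan
    )


def find_drift(intended, running):
    """Return the intended lines that are missing from the running config."""
    running_lines = {line.rstrip() for line in running.splitlines()}
    return [
        line for line in intended.splitlines()
        if line.rstrip() not in running_lines
    ]


if __name__ == "__main__":
    intended = render_config("sw-01")
    # Simulate a switch where someone changed the management VLAN by hand.
    running = intended.replace("vlan 99", "vlan 98")
    print(find_drift(intended, running))
```

Changing the standard then means editing one template and re-rendering, rather than logging in to a thousand switches.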

Simplify for maintainability and reliability

I think a lot of people might read this headline and take it to mean that “simple” equates to older technologies that have been around a while and are well understood. It could be interpreted as me advocating against newer design trends such as VXLAN-based LAN designs, SD-WAN, or cloud-based security services. What I’m actually advocating for is a shift in mindset: optimize your designs to reduce unnecessary complexity.

The classic three-tier LAN design has a lot of Band-Aids applied to work around shortcomings in the hardware on which it was developed. These are problems we no longer have. Examples abound: routing, first-hop redundancy protocols, spanning tree, EtherChannels, stacking, and more. Then, on top of those protocols, there are optimizations to each one that can help the design be more deterministic.

By contrast, a layer 3 LAN design is surprisingly simple from a network perspective. We can replace most of the additional protocols in the three-tier design with just a routing protocol that, incidentally, is faster to converge. Unfortunately, this design does introduce some additional complexity with subnetting and may not work well for networks that have applications with L2 adjacency needs. Fortunately, VXLAN networks, which can address those L2 adjacency needs, are becoming more popular and easier to build with automation.

This is of course one example among many. I’ll leave you with the concept of “Muntzing” where you reduce the number of components in an electrical circuit down until it breaks and then add back the last part. Take that mindset forth with you and you’ll find how easy the resulting architecture is to manage and how reliable it can be.
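For the curious, the Muntzing loop can be sketched in a few lines of Python. This is only a toy model: the component names and the `still_works` check are made up, and in a real network "does it still work?" is a lab test or a design review, not a function call.

```python
def muntz(components, still_works):
    """Remove components one at a time, keeping each removal only if the
    design still works; a part whose removal breaks things is put back."""
    kept = list(components)
    for part in list(components):
        trial = [c for c in kept if c != part]
        if still_works(trial):
            kept = trial  # the design survived without this part
    return kept


if __name__ == "__main__":
    # Hypothetical design in which only routing and links are truly required.
    required = {"routing", "links"}
    design = ["routing", "spanning-tree", "fhrp", "links", "stacking"]
    print(muntz(design, lambda parts: required <= set(parts)))
```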