I just HAD to write this blog, given the title idea!
More seriously, several themes in networking and automation are coming together, and the overall troubleshooting picture is getting less fuzzy.
This blog aims to share some of that new thinking and excitement, including basics about tools that may help you in your networking work, especially troubleshooting.
Design Driven Network Assurance
The starting point is the idea of SSOT: a Single Source of Truth that many have picked up on.
Roughly stated: SSOT means you start with NetBox or some such tool documenting what’s in your network, where it is, cabling, etc. Then, use that as the basis for deployment automation. Deployment starts by putting design info into the SSOT and automating from there (ideally).
What’s new was extremely well-articulated by Jeremy Schulman in a NANOG talk about an ongoing project for Major League Baseball. See the link below to watch it.
My summary of some key takeaways:
- As part of the design, you should specify validation tests to confirm proper deployment.
- Automation can do the deployment configuration part, but people/hands do things like cabling. People can make mistakes.
- Both aspects matter and need to be done correctly. Validation confirms that for deployment – so you don’t have to go back and fix hidden errors later. (E.g. two uplinks, one connected wrong. Or a missing/misconfigured routing peering. Only discovered when the correctly built component fails.)
- If you have design validation tests, they can also tell you on the fly when something is broken.
I’m going to call this “DDNA” for the duration of this blog. Yet Another Acronym!
One key benefit is that this catches silly deployment mistakes that bite you later. Another is that it detects changes from a working state and pinpoints key details when there is a deviation from the validated design, such as a link failure.
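To make the "one uplink connected wrong" case concrete, here is a minimal sketch of a design validation test in Python. It is not Jeremy's code or any particular tool's API. The `Link` class, the `validate_cabling` function, and the device and port names are all made up for illustration, and the "observed" data is hard-coded where a real check would collect something like LLDP neighbors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One designed or observed cable: local device/port to remote device/port."""
    device: str
    port: str
    peer: str
    peer_port: str


def validate_cabling(designed, observed):
    """Return a list of problems; an empty list means the build matches the design."""
    problems = []
    # Index what was actually seen by (device, port) for quick lookup.
    seen = {(link.device, link.port): link for link in observed}
    for want in designed:
        got = seen.get((want.device, want.port))
        if got is None:
            problems.append(
                f"{want.device} {want.port}: no neighbor seen, "
                f"expected {want.peer} {want.peer_port}"
            )
        elif (got.peer, got.peer_port) != (want.peer, want.peer_port):
            problems.append(
                f"{want.device} {want.port}: connected to {got.peer} {got.peer_port}, "
                f"expected {want.peer} {want.peer_port}"
            )
    return problems


# Hypothetical design data, as it might be exported from the SSOT.
designed = [
    Link("leaf1", "Eth1", "spine1", "Eth1"),
    Link("leaf1", "Eth2", "spine2", "Eth1"),
]
# Hypothetical observed neighbors; the second uplink is mis-cabled.
observed = [
    Link("leaf1", "Eth1", "spine1", "Eth1"),
    Link("leaf1", "Eth2", "spine1", "Eth2"),
]

for problem in validate_cabling(designed, observed):
    print(problem)
```

The same function serves both purposes described above: run it once at deployment to catch cabling mistakes, then keep running it to flag a link that later disappears or moves.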
The potential is to make troubleshooting basic connectivity and route peering problems much easier and faster. Those may well account for the majority of outages and problems, so that’s useful! (YMMV) That also speaks to the question of whether the up-front work is worth it. I see it as another instance of “you can pay me now or you can pay me later.”
Watch the video: there’s a lot more to be said, and Jeremy says it much better (and even more enthusiastically) than I can.
How Does DDNA Fit In?
Well, DDNA clearly takes work and discipline up front, although deployment can be done incrementally. The basic approach as I understand it is to have an expanding region of the network that was deployed via DDNA, or where DDNA was retrofitted and validated.
As for fit, the up-front testing makes sure things get built correctly, which is a pretty good win: you don’t have to track down deployment errors later. With ongoing testing, the automation system will detect changes that break one or more of the validation criteria, giving you an easily resolved trouble ticket.
I’ve recently blogged about AIOps (and there’s likely another coming). The DDNA approach is a bit more brute-force, but complements AIOps nicely. I see AIOps as potentially detecting performance problems indicated by telemetry: too much traffic, delays/errors on links or in app connections, etc. And with AIOps correlation, it may well be able to correlate app or security issues with network events, including those detected by the DDNA validation.
In short, DDNA solves the basic buildout side of things, and AIOps may help address the other Hard-To-Solve failure modes.
The pyATS tool was apparently built by Cisco for (among other things) code validation, e.g., validating new features or bug fixes in their products in various labs.
For a bit more detail than I intend to provide here, see Dan Wade’s blog about it, and his other blogs and videos (google “devnetdan pyats”). Links for both are below.
There are also lots of other web resources for pyATS.
My first impression of pyATS (well before Dan) was that it was huge. That remains my impression: pyATS provides a *lot* of pre-built test coverage, perhaps thousands of things. It has some multi-vendor capabilities (presumably to support interoperability testing). It is a large, complicated tool.
Relevance: pyATS connects to lab (or production) gear, collects and parses data, and compares it to a desired state. That sounds like what Jeremy is doing for MLB. No, I don’t know whether he uses pyATS or something else.
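The "compare it to a desired state" step can be sketched in plain Python. To be clear, this is not pyATS code and does not use its API. The `diff_state` function and the dictionary layout are assumptions for illustration, with `parsed` standing in for whatever structured data a parser would return from a device.

```python
def diff_state(desired, actual, path=""):
    """Recursively compare desired state to parsed state.

    Returns a list of (path, desired_value, actual_value) deviations.
    Only keys present in `desired` are checked, so extra operational
    data in `actual` is ignored.
    """
    deviations = []
    for key, want in desired.items():
        here = f"{path}/{key}" if path else str(key)
        if key not in actual:
            deviations.append((here, want, None))
        elif isinstance(want, dict) and isinstance(actual[key], dict):
            deviations.extend(diff_state(want, actual[key], here))
        elif actual[key] != want:
            deviations.append((here, want, actual[key]))
    return deviations


# Hypothetical desired state: both BGP peers should be up.
desired = {
    "bgp": {
        "10.0.0.1": {"state": "Established"},
        "10.0.0.2": {"state": "Established"},
    }
}
# Hypothetical parsed output from a device; one peering is down.
parsed = {
    "bgp": {
        "10.0.0.1": {"state": "Established", "prefixes": 120},
        "10.0.0.2": {"state": "Idle", "prefixes": 0},
    }
}

for where, want, got in diff_state(desired, parsed):
    print(f"{where}: expected {want!r}, found {got!r}")
```

Checking only the keys named in the desired state keeps the test focused on what the design cares about, so counters and other noisy fields don't trigger false alarms.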
For what it’s worth, BlueAlly/NetCraftsmen’s Dan Wade and Cisco’s John Capobianco have a Pearson/Cisco Press book contract for a book about pyATS. The book should be available in 2024.
NUTS stands for Network Unit Testing System. It is a pyATS alternative.
The short version is that NUTS uses Nornir and the Pytest test framework, driving the testing via “test bundles” written in YAML. I interpret that as less Python coding, if not little Python coding, with testing specified via YAML rather than code.
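The idea of "tests as data, not code" can be shown with a toy runner. This is emphatically not how NUTS is implemented, and the keys below (`check`, `data`) are not the NUTS test bundle schema. They are hypothetical names chosen to show the shape of the approach, with Python literals standing in for what a YAML file might load into and a hard-coded `state` standing in for data gathered from devices.

```python
# Hypothetical test bundles, shaped like what a YAML file might load into.
bundles = [
    {
        "check": "interface_up",
        "data": [
            {"host": "leaf1", "port": "Eth1"},
            {"host": "leaf1", "port": "Eth2"},
        ],
    },
    {
        "check": "bgp_established",
        "data": [{"host": "leaf1", "peer": "10.0.0.1"}],
    },
]

# Hypothetical collected state, standing in for data gathered from devices.
state = {
    "leaf1": {
        "interfaces": {"Eth1": "up", "Eth2": "down"},
        "bgp": {"10.0.0.1": "Established"},
    }
}


def interface_up(state, host, port):
    """True if the named interface is reported as up."""
    return state[host]["interfaces"].get(port) == "up"


def bgp_established(state, host, peer):
    """True if the BGP session to the peer is Established."""
    return state[host]["bgp"].get(peer) == "Established"


# The check code is written once; the bundles only name a check and supply data.
CHECKS = {"interface_up": interface_up, "bgp_established": bgp_established}


def run_bundles(bundles, state):
    """Run every data row of every bundle; return (check, row, passed) tuples."""
    results = []
    for bundle in bundles:
        check = CHECKS[bundle["check"]]
        for row in bundle["data"]:
            results.append((bundle["check"], row, check(state, **row)))
    return results


for name, row, passed in run_bundles(bundles, state):
    print("PASS" if passed else "FAIL", name, row)
```

Adding a test for a new device or port then means adding a row of data, not writing more Python, which is how I read the appeal of the YAML-driven approach.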
At present, it “feels” to me like NUTS is lighter weight than pyATS: fewer dependencies and a smaller suite of code overall. Maybe NUTS will be Dan and John’s next book – no pressure, guys!
Jeremy Schulman re MLB “Design Driven Network Assurance”: design and buildout validation test-driven methodology: https://www.youtube.com/watch?v=lIn_Cu5vYEw
D. Wade re pyATS: https://netcraftsmen.com/network-validation-with-pyats/
More D. Wade on pyATS: https://google.com/search?q=devnetdan%20pyats
PyPI pyATS: https://pypi.org/project/pyats/
pyATS home page: https://developer.cisco.com/pyats/
NUTS PyPI site: https://pypi.org/project/nuts/
NUTS on GitHub: https://github.com/network-unit-testing-system/nuts
NUTS Documentation: https://nuts.readthedocs.io/en/latest/
Coming pyATS test-driven automation book: https://twitter.com/John_Capobianco/status/1685762798247145472
Thanks to Dan Wade for making me aware of NUTS!
I refer you to him if you have any related coding questions (NOT kidding).