Network Validation with pyATS

TL;DR: pyATS is an automation testing framework that is paired with a parsing library called Genie. With over 1500 parsers available, Genie can parse device output from multiple vendors, including Cisco, Juniper, and F5 (BIG-IP). Together, pyATS and Genie give you a complete test suite that can provide confidence that your network is healthy.

Have you ever been asked by your manager, “Can we confirm feature ‘X’ is configured and working across every device in our network?” This may be a simple feature such as SNMPv3, or something more complex like a specific routing design. Validating the operation of your network at any point in time can be a difficult task. However, we can ease that pain with an automation testing framework, such as pyATS.

Cisco pyATS is an automation testing framework that Cisco has built and used internally to test and validate features in its NOS platforms. It was released to the public in 2017. The core pyATS framework is still closed-source. However, the companion parsing library, Genie (you may also hear it called the pyATS library), has been open-sourced, and public contributions are encouraged. This post will focus on the pyATS framework.

pyATS OVERVIEW

The pyATS framework is expansive and can seem daunting with the number of features it provides. The goal of this blog post is to stay at the “10,000 ft view”. There may be further blog posts that dive deeper into the individual features.

PyATS is a test automation framework designed to create and run consistent tests against your infrastructure. The companion parsing library I mentioned earlier, Genie, is designed to be used with pyATS. Together, these two libraries create a complete test suite, with a testing framework and vendor-agnostic data parsers. The Genie library could fill a whole separate blog post. The one takeaway is that it’s a powerful data parsing library that pairs very well with pyATS while remaining separate from it.

pyATS COMPONENTS

Now that you know that pyATS is used for building and running tests against your infrastructure, let’s dive into some of the components that make up the library.

TESTBED

The testbed is essentially your device inventory file. Other automation frameworks, such as Nornir and Ansible, have a similar concept. A testbed file is a YAML file that describes the devices you are running tests against. Some of the important components include the device hostname, IP address, OS type (used to dictate which Genie parsers to use), and a set of credentials to connect to the device. There are plenty more data points you can include in the testbed, such as how to connect to the device (CLI, YANG, REST). You can even describe the testing environment’s topology by defining device interfaces and how the devices connect to one another by defining links within your testbed file. If you’re interested in learning more about the different data points, check out the links in the references at the end of the post.
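To make the shape of a testbed more concrete, here is a minimal sketch written as a Python dictionary that mirrors the layout of a testbed YAML file. The device names, addresses, link name, and the `PYATS_PASSWORD` environment variable are all placeholders I made up for illustration, and `find_problems` is my own sanity check, not pyATS’s schema validation. To my knowledge, `pyats.topology.loader.load` accepts a dictionary like this as well as a path to a YAML file, but treat that as an assumption and check the documentation.

```python
import os

# Hypothetical two-router testbed, laid out like a pyATS testbed YAML file.
TESTBED = {
    "devices": {
        "edge-rtr-1": {
            "os": "iosxe",  # tells Genie which parsers to use
            "type": "router",
            "credentials": {
                "default": {
                    "username": "admin",
                    # Placeholder: read the password from the environment.
                    "password": os.environ.get("PYATS_PASSWORD", "changeme"),
                }
            },
            "connections": {"cli": {"protocol": "ssh", "ip": "192.0.2.1"}},
        },
        "edge-rtr-2": {
            "os": "iosxe",
            "type": "router",
            "credentials": {
                "default": {
                    "username": "admin",
                    "password": os.environ.get("PYATS_PASSWORD", "changeme"),
                }
            },
            "connections": {"cli": {"protocol": "ssh", "ip": "192.0.2.2"}},
        },
    },
    # Interfaces that share a link name are connected to one another.
    "topology": {
        "edge-rtr-1": {
            "interfaces": {
                "GigabitEthernet1": {"link": "wan-link-1", "type": "ethernet"}
            }
        },
        "edge-rtr-2": {
            "interfaces": {
                "GigabitEthernet1": {"link": "wan-link-1", "type": "ethernet"}
            }
        },
    },
}

# Keys this sketch expects on every device (an assumption, not the pyATS schema).
REQUIRED_DEVICE_KEYS = ("os", "credentials", "connections")


def find_problems(testbed):
    """Return a list of human-readable problems found in the testbed dict."""
    problems = []
    devices = testbed.get("devices", {})
    for name, device in devices.items():
        for key in REQUIRED_DEVICE_KEYS:
            if key not in device:
                problems.append(f"{name}: missing '{key}'")
    # Every device named in the topology should also be defined as a device.
    for name in testbed.get("topology", {}):
        if name not in devices:
            problems.append(f"{name}: in topology but not in devices")
    return problems


if __name__ == "__main__":
    print(find_problems(TESTBED) or "testbed looks complete")
```

In practice you would normally keep this data in a YAML file and let pyATS load it. The dictionary form is only used here to keep the example in one language.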

TESTCASE

A testcase is a collection of smaller tests, aiming to validate a specific feature or functionality. For example, you may write a testcase to validate that BGP is up and operational. This testcase may include smaller tests that validate BGP neighbor relationships, confirm that BGP routes are present in the routing table, and so on. The individual test results roll up to the testcase result. If you’d like to learn more about testcases and other sections that make up a testscript, check out the links in the references at the end of the post.
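The roll-up idea can be sketched in plain Python. This is not the pyATS API, and pyATS has more result types than the two used here. The neighbor and route data are invented and only loosely resemble what a parser might return. The sketch simply shows two small checks feeding one overall testcase result.

```python
def neighbors_established(neighbors):
    """True when there is at least one neighbor and all are Established."""
    return bool(neighbors) and all(
        state == "Established" for state in neighbors.values()
    )


def bgp_routes_present(routes):
    """True when at least one route in the table was learned via BGP."""
    return any(route["protocol"] == "bgp" for route in routes)


def run_testcase(tests):
    """Run each named check and roll the results up to one overall result.

    Simplified roll-up rule (an assumption for this sketch): the testcase
    passes only if every individual test passes.
    """
    results = {
        name: "passed" if check() else "failed" for name, check in tests.items()
    }
    overall = "passed" if all(r == "passed" for r in results.values()) else "failed"
    return results, overall


# Hypothetical parsed device state.
NEIGHBORS = {"192.0.2.2": "Established", "192.0.2.6": "Idle"}
ROUTES = [
    {"prefix": "198.51.100.0/24", "protocol": "bgp"},
    {"prefix": "10.0.0.0/8", "protocol": "static"},
]

results, overall = run_testcase(
    {
        "neighbors_established": lambda: neighbors_established(NEIGHBORS),
        "bgp_routes_present": lambda: bgp_routes_present(ROUTES),
    }
)

if __name__ == "__main__":
    print(results, overall)
```

Here one neighbor is stuck in Idle, so the neighbor test fails and the whole BGP testcase is reported as failed, even though the route check passes.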

TESTSCRIPT

A testscript is a Python file used to structure test sections. Each testscript has its own reporting and logging. Testscripts are meant to be extensible, so that you can add testcases in the future. A testscript can be executed as a standalone script, with results printed to STDOUT, or as part of a job. Standalone execution is popular for rapid development, but once a testscript is ready for production use, it should be executed as part of a job.

JOB

Jobs in pyATS allow you to run multiple testscripts. Within a job, each testscript is executed as a task. Each task aggregates its logs and results to a single log file and reporter object. The logging and reporting mechanisms within a job can be a separate post. For now, just know that a task’s logs and results are aggregated when being run within a job. After a job is run, an archive is created. An archive is a zipped folder containing results files (XML/JSON formats), log files, and some additional runtime information. These archives can be useful for further results analysis.
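As a rough sketch, a job file is a Python file with a `main(runtime)` function that calls `run()` once per testscript. The testscript names below are hypothetical placeholders. The import sits inside `main()` only so that the file can be read without pyATS installed. Job files more commonly import `run` at the top of the file.

```python
# job.py: a minimal pyATS job sketch.
# The testscript names are hypothetical; point them at your own files.
TESTSCRIPTS = ["bgp_tests.py", "snmp_tests.py"]


def main(runtime):
    """Entry point that pyATS calls when the job is launched."""
    # Imported here so this file can be inspected without pyATS installed.
    from pyats.easypy import run

    for script in TESTSCRIPTS:
        # Each run() call executes one testscript as a task within the job.
        run(testscript=script, runtime=runtime)
```

A job like this is typically launched from the command line with `pyats run job job.py --testbed-file testbed.yaml`, where `testbed.yaml` is your own testbed file.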

There are many more components that make up pyATS, but these are some of the important pieces. For more information about the other components, I highly recommend checking out the pyATS documentation (link in the references).

USE CASES

To get your creativity flowing, let’s take a look at a few use cases that would be great fits for utilizing pyATS.

Certifying a new network OS version
Validating operational state of the network before/after a change
Running intrusive tests to ensure network resilience

This list is not exhaustive and is only used for demonstration. Let’s take a quick look at each one.

CERTIFYING A NETWORK OS VERSION

One of the worst things that can happen when you are upgrading devices on your network to a new version of software is running into a bug. This bug may be obvious, or it may be rooted deep in the OS and only triggered when a specific feature is configured. Regardless, management and other stakeholders do not care that a bug was triggered. They want to know why it wasn’t caught before rolling out OS upgrades to production devices. PyATS can provide a level of certainty that a new OS version works with the specific hardware and software features you have configured in your network. The pyATS testing framework can configure the features you care about, validate each feature’s functionality, and clean up after testing has completed. It’s an automated approach that can quickly become a de facto process before a new OS is rolled out to production.

VALIDATING OPERATIONAL STATE OF THE NETWORK BEFORE/AFTER A CHANGE

Validating changes on the network has been an issue as old as time. It’s a part of every engineer’s change plan but can sometimes be forgotten when it comes to documentation. PyATS provides the framework to confirm the operational state of your network and has built-in reporting functionality for you to quickly figure out what validation checks have passed or failed. PyATS also provides extensive logging that captures device logs, so you will be able to provide all the proper documentation that shows you confirmed the change was successful.
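The core of a before/after check is a comparison of two snapshots of parsed state. Genie ships its own diff utility for this (`genie.utils.diff.Diff`). The standard-library sketch below only illustrates the idea, and the snapshot contents are invented for the example.

```python
def flatten(state, prefix=()):
    """Flatten a nested dict into {path_tuple: leaf_value}."""
    flat = {}
    for key, value in state.items():
        path = prefix + (key,)
        if isinstance(value, dict) and value:
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def diff_states(before, after):
    """Report what was removed, added, or changed between two snapshots."""
    b, a = flatten(before), flatten(after)
    return {
        "removed": {k: b[k] for k in b.keys() - a.keys()},
        "added": {k: a[k] for k in a.keys() - b.keys()},
        "changed": {k: (b[k], a[k]) for k in b.keys() & a.keys() if b[k] != a[k]},
    }


# Hypothetical snapshots taken before and after a change window.
BEFORE = {
    "bgp": {
        "neighbors": {
            "192.0.2.2": {"state": "Established", "prefixes_received": 12}
        }
    },
    "interfaces": {"GigabitEthernet1": {"oper_status": "up"}},
}
AFTER = {
    "bgp": {"neighbors": {"192.0.2.2": {"state": "Idle"}}},
    "interfaces": {"GigabitEthernet1": {"oper_status": "up"}},
}

report = diff_states(BEFORE, AFTER)

if __name__ == "__main__":
    for category, entries in report.items():
        for path, value in entries.items():
            print(category, " > ".join(path), value)
```

An empty report means the operational state you captured is unchanged. Anything listed under "removed" or "changed" is a candidate for a failed validation check.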

RUNNING INTRUSIVE TESTS TO ENSURE NETWORK RESILIENCE

I would consider this to be a more advanced use case, and one that shouldn’t be attempted until you have buy-in from higher levels of management. Once you are comfortable with running read-only tests against your network, you can begin introducing a little bit of “chaos”. The proper name for this type of testing is “chaos engineering”. Netflix popularized this practice through a tool they built called Chaos Monkey (https://netflix.github.io/chaosmonkey/). The idea is that random configuration is pushed to your production environment to ensure the infrastructure is resilient to failures. This random configuration may include shutting down BGP on a core router or rebooting a few devices. Whatever the chaos may be, the idea is to purposely cause failures within the infrastructure. You may be asking yourself, “Why should I perform such a cruel act against myself and my team?” Well, the intention is for you to gain exposure to the faults within your network (and fix them!) before a real catastrophic failure occurs.

Disclaimer: I’ve never experienced chaos engineering in a production environment. I included this use case to help showcase what’s possible once you’ve gained confidence in your infrastructure, using automation.

SAMPLE CODE

In December 2022, I held an internal tech talk at NetCraftsmen about writing pyATS testscripts. In the demo, I built a pyATS testscript that contains a testcase for testing BGP. The BGP testcase contains the following tests: check for established BGP neighbors, shut down BGP by shutting the WAN interface, check the routing table for received BGP routes (should be none), reactivate BGP, and finally, check the routing table again for received BGP routes (should see BGP routes). The purpose of this demo was to show how we can check BGP functionality using ‘show’ commands, while changing the test environment.

Here’s a link to the code repository: https://github.com/dannywade/20221215-pyATS-Testscripts

Feel free to open a GitHub issue to ask questions or provide feedback.

WRAPPING UP

We went over a lot in this blog post, and I definitely didn’t cover all the features of pyATS. The purpose of this post was to touch on some of the high-level concepts within the pyATS framework, and hopefully get you thinking about how you can introduce automated network testing into your environment. If you’re interested in automated network testing and not sure where to start, please feel free to contact us and we can get you started!

REFERENCES/BACKGROUND READING

pyATS documentation: https://pubhub.devnetcloud.com/media/pyats/docs/overview/index.html
pyATS testbed/topology: https://pubhub.devnetcloud.com/media/pyats/docs/topology/introduction.html
pyATS testscript structure: https://pubhub.devnetcloud.com/media/pyats/docs/aetest/structure.html#script-structure
Chaos Engineering: https://www.infoworld.com/article/3543233/what-is-chaos-monkey-chaos-engineering-explained.html
Chaos Monkey at Netflix: https://netflixtechblog.com/netflix-chaos-monkey-upgraded-1d679429be5d
Blog series on pyATS/Genie: https://devnetdan.com/2021/05/04/pyats-and-genie-part-1/
