Building Configuration Policy Checks, Part 3

Author
Terry Slattery
Principal Architect

Implementing Changes

Once you have defined your network policies and implemented them in a configuration policy check (see the prior posts Building Configuration Policy Checks, Part 1 and Building Configuration Policy Checks, Part 2), you need a way to correct any non-compliant configurations. An obvious choice for remediation is to simply log in to the offending device and type in the correct configuration statements. If you have a couple of devices to correct, that may well be the right choice, provided that you don’t make a silly mistake while entering the commands. But is there a better way?

As in most networking projects, the answer starts with “It depends…” There are tradeoffs that dictate the approach or solution for different situations. If the configuration change is consistent across all devices of a given OS, and the change needs to happen once for a number of devices, then you might consider using NetMRI’s Ad-Hoc Batch function (or a similar mechanism in other products). It allows you to enter a set of commands to be executed on one or more devices. You then select the list of devices on which to run the commands, and NetMRI does the job for you, keeping track of the success or failure of each device’s configuration change. This works for simple changes that are the same across a group of devices.

Quite often, the configurations aren’t that easy. A change may need to be applied to different OS versions, each of which has a slightly different syntax. That’s where a script makes life easy. In NetMRI, sections of the script can have filters applied so that a section is executed only when its filter criteria match. Here is an example of a script that includes commands for either IOS or CatOS. The OS type is determined by the filter criteria in the braces.

Action-Commands: { $sysDescr like /IOS/ }
   conf t
   ntp server 10.1.1.200
   ntp server 10.1.2.200
   end
   wr

Action-Commands: { $sysDescr like /Catalyst Operating System/ }
   set ntp server 10.1.1.200
   sleep: 1
   set ntp server 10.1.2.200
   sleep: 1
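
To make the filtering idea concrete outside NetMRI, here is a minimal Python sketch of the same selection logic. It is not NetMRI's implementation: the COMMAND_SETS table, the commands_for function, and the sample sysDescr strings are hypothetical, and it assumes the "like" filter behaves as a regular-expression search against sysDescr.

```python
import re

# Hypothetical command sets keyed by a sysDescr pattern, mirroring the two
# Action-Commands sections in the NetMRI script above.
COMMAND_SETS = [
    (re.compile(r"IOS"), [
        "conf t",
        "ntp server 10.1.1.200",
        "ntp server 10.1.2.200",
        "end",
        "wr",
    ]),
    (re.compile(r"Catalyst Operating System"), [
        "set ntp server 10.1.1.200",
        "set ntp server 10.1.2.200",
    ]),
]


def commands_for(sys_descr):
    """Return the commands from every section whose filter matches sys_descr."""
    selected = []
    for pattern, commands in COMMAND_SETS:
        if pattern.search(sys_descr):
            selected.extend(commands)
    return selected


if __name__ == "__main__":
    # Illustrative sysDescr values, not captured from real devices.
    for descr in ("Cisco IOS Software, C3750 Software",
                  "Cisco Catalyst Operating System Software, Version 8.4"):
        print(descr, "->", commands_for(descr))
```

A device whose sysDescr matches neither pattern gets an empty command list, which is the behavior you want: the script does nothing on a device type it was not written for.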

Automating Hundreds of Changes

Correcting a few configurations is easy. Making the same correction to tens of devices is time-consuming, but is still possible using cut-n-paste methods. When hundreds of devices are involved, the task becomes both challenging and boring. Assuming that it takes four minutes per config change, 100 devices would take about 6.7 hours of continuous work. At 1000 devices, that’s nearly 67 hours. This level of effort means that either other tasks don’t get performed or this task takes weeks to complete.
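
The effort estimate is simple arithmetic, and a few lines of Python make it easy to rerun with your own numbers. The function name manual_hours is just an illustrative choice; the four-minute figure is the assumption stated above.

```python
MINUTES_PER_CHANGE = 4  # per-device estimate used in the text


def manual_hours(device_count, minutes_per_change=MINUTES_PER_CHANGE):
    """Hours of continuous manual work to change device_count devices."""
    return device_count * minutes_per_change / 60


if __name__ == "__main__":
    for count in (100, 500, 1000):
        print(f"{count} devices: {manual_hours(count):.1f} hours")
```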

Using a scripting tool to automate the config update process lets you complete the task very quickly. I’ve used scripts to change nearly 500 device configurations in a few hours. Although the script itself took a few hours to develop, the changes were implemented in hours instead of a week of manual work. I also knew that the changes were correct, with none of the mistakes that can creep into a cut-n-paste operation.

Fixing Configuration Policy Check Failures

When a configuration policy check fails, the set of failed devices can easily be selected (at least within NetMRI) to have their configurations corrected. Even if only a few devices fail the configuration policy, using a script to correct them ensures that the resulting configurations are consistent.

Modifying Script Behavior

Some configuration changes are very complex and require complex scripts to implement. A good configuration management system will contain a scripting mechanism that can execute commands and use the results of those commands to modify script behavior. For example, a configuration that needs to be applied to Core-Core interfaces (see Device and Interface Tagging) can first do a ‘show interfaces’ command and select only those interfaces whose description contains “TAG:Core-Core”.

The ability to execute commands is very powerful. Sometimes you need to determine the operational state of a device before making changes, such as checking neighbors (‘show cdp neighbor’) or whether a specific route exists in a routing table (‘show ip route’). The script can then modify its actions, perhaps changing the configuration or performing some other operation based on the operational data.
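
As a rough illustration of using command output to steer a script, the following Python sketch picks out the interfaces whose description carries the “TAG:Core-Core” tag and builds per-interface commands for them. Everything here is an assumption for illustration: the function names, the sample output (trimmed and invented, in the general shape of IOS ‘show interfaces’ output), and the load-interval command standing in for whatever change you actually need.

```python
def tagged_interfaces(show_output, tag="TAG:Core-Core"):
    """Return names of interfaces whose Description line contains tag.

    Assumes each interface block starts with an unindented line whose first
    word is the interface name, followed by indented detail lines.
    """
    selected = []
    current = None
    for line in show_output.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            current = line.split()[0]  # start of a new interface block
        elif current and line.strip().startswith("Description:") and tag in line:
            selected.append(current)
    return selected


def interface_commands(interfaces, change_lines):
    """Build config-mode commands applying change_lines to each interface."""
    commands = []
    for name in interfaces:
        commands.append(f"interface {name}")
        commands.extend(change_lines)
    return commands


# Invented sample output for illustration only.
SAMPLE_OUTPUT = """\
GigabitEthernet0/1 is up, line protocol is up
  Hardware is Gigabit Ethernet, address is 0011.2233.4455
  Description: TAG:Core-Core uplink to core2
GigabitEthernet0/2 is up, line protocol is up
  Hardware is Gigabit Ethernet, address is 0011.2233.4456
  Description: TAG:Core-Access downlink to access1
GigabitEthernet0/3 is administratively down, line protocol is down
  Hardware is Gigabit Ethernet, address is 0011.2233.4457
"""

if __name__ == "__main__":
    core = tagged_interfaces(SAMPLE_OUTPUT)
    print(interface_commands(core, ["load-interval 30"]))
```

Interfaces with no description, or with a different tag, are skipped, so the change lands only where the tag says it belongs.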

Simultaneous Changes

Of course, the normal procedures for performing configuration changes also apply to automated changes. In particular, be careful to avoid multiple simultaneous changes, at least in Cisco gear. It is difficult to predict what will result from simultaneous changes by two or more users, so it is best to avoid them. Use a change control process that informs everyone of the time period in which you’ll be making changes so that they can avoid making changes during that time.

Script Creation and Testing

An automated script can install a bad configuration in a lot of devices very quickly, so it is valuable to create a process for testing scripts as they are created. I start with a single device of each major type (e.g. IOS or CatOS). Once I am confident that the script works for one device, I’ll run it on a few devices and check that the right commands were executed. For this step, I pick a few different device models that run the same OS type (e.g. 3750, 6500, 7600, etc.), making sure that there are no variances in IOS implementations across different device types. Once I’ve satisfied myself that the script will run correctly on all OS and model types, I run it on the remaining devices in the network. To facilitate this testing process, I create several device groups:

  1. A group for early testing and development, containing one or two devices.
  2. A group for testing multiple device models of the same OS type (3560, 3750, 6500).
  3. The remaining devices in the network of the same OS type (everything except groups 1 and 2).

I do development and testing using Group 1, then test against multiple hardware models using Group 2, and if that is successful, run the script on Group 3. It doesn’t take long to determine that a script is working correctly and to propagate the change across the network. Group 2’s purpose in the test is to make sure that any problems are found early and with only a subset of all devices, making it easy to back out the change, if needed.
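
The three-group process can be sketched in a few lines of Python. This is only a sketch under stated assumptions: remaining_devices and staged_rollout are hypothetical helpers, the device names are made up, and run_script stands in for whatever your tool uses to push the script to one device and report success or failure.

```python
def remaining_devices(all_devices, *earlier_groups):
    """Group 3: every device that is not already in an earlier group."""
    used = set().union(*earlier_groups)
    return [d for d in all_devices if d not in used]


def staged_rollout(groups, run_script):
    """Run run_script(device) group by group, in order.

    groups is a list of (name, devices) pairs. Returns a dict mapping each
    attempted group name to its list of failed devices. Stops after the
    first group that has any failure, so later groups are never touched.
    """
    results = {}
    for name, devices in groups:
        failures = [d for d in devices if not run_script(d)]
        results[name] = failures
        if failures:
            break
    return results


if __name__ == "__main__":
    # Made-up inventory for illustration.
    inventory = ["sw1", "sw2", "sw3", "sw4", "sw5"]
    group1 = ["sw1"]
    group2 = ["sw2", "sw3"]
    group3 = remaining_devices(inventory, group1, group2)

    # Stand-in for the real push: pretend the script fails on sw3.
    outcome = staged_rollout(
        [("group1", group1), ("group2", group2), ("group3", group3)],
        run_script=lambda device: device != "sw3",
    )
    print(outcome)
```

Stopping at the first group with a failure is the point of the design: a problem found in Group 2 affects only a handful of devices, which keeps the back-out small.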

Good luck with your network configuration compliance and automation tasks.

-Terry

_____________________________________________________________________________________________

Re-posted with Permission 

NetCraftsmen would like to acknowledge Infoblox for their permission to re-post this article which originally appeared in the Applied Infrastructure blog under http://www.infoblox.com/en/communities/blogs.html
