At NetCraftsmen, we’ve been doing network assessments and network designs on a regular basis for some pretty large customers. NetMRI has become a strategic component of our assessment process.
While there are multiple facets to a good assessment, I’m going to focus this article on understanding the customer’s existing network. Manual methods of network discovery and assessment have numerous problems. They are error-prone, they take a long time, and they require a skilled network engineer to perform a lot of repetitive tasks, which further increases the possibility of human error. As the network grows, the task becomes increasingly difficult. It becomes nearly impossible for the network engineer to provide significant value in the assessment when much of the time is spent collecting basic information.
NetMRI automates network discovery, data collection, and analysis of the collected data, allowing us to focus on the assessment and on providing value to our customers. We set it up, get it started, and perform other tasks while it discovers network devices, archives configurations, and collects operational data. After 2-3 days of data collection, NetMRI will have enough data for us to perform an assessment (we often let it run for a week so that we have a week of operational data). We then examine the following data:
- Physical
  - Inventory of devices, identifying those that are End-of-Life or End-of-Sale
  - OS versions and whether there are multiple versions per hardware platform
  - Environmental data such as power supplies, fans, temperature
  - Interfaces in up/down state (router interfaces and switch trunking interfaces)
  - Duplex mismatches and ports running at 100/Half, which is an unusual configuration
  - Interface errors and discards
  - Interfaces with high utilization

- Layer 2 (Switching)
  - Large VLANs (STP domains) spanning many switches, which are vulnerable to STP loops
  - Number of VLANs and their naming and numbering
  - Root bridge selection in each VLAN
  - STP loop protection mechanisms: BPDUGuard, LoopGuard, RootGuard, etc.

- Layer 3 (Routing)
  - Routing domains, which indicate routing complexity
  - Routing protocols in use and which devices are running them
  - Routers that are originating default routes, which are possible black holes
  - HSRP/VRRP/GLBP routers with no peer router, compromising redundancy
  - Subnets with no edge devices or subnets that are nearly full
  - IP addressing

- Configurations
  - Basic service consistency (syslog, SNMP, AAA, NTP)
  - QoS configuration consistency
  - Running configs that are not saved
  - A configuration repository that we can grep for specific settings
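
To illustrate the kind of check the configuration items above describe, here is a minimal sketch in Python that scans a directory of archived configs for basic service statements. It is not how NetMRI does it. The directory layout, the `.cfg` extension, and the function names are hypothetical, and the patterns assume Cisco IOS-style syntax (`logging host`, `snmp-server`, `aaa new-model`, `ntp server`); other platforms would need different patterns.

```python
import re
from pathlib import Path

# Assumed Cisco IOS-style statements for each basic service.
# These patterns are illustrative and would need tuning per platform.
REQUIRED = {
    "syslog": re.compile(r"^logging (host )?\d+\.\d+\.\d+\.\d+", re.M),
    "snmp": re.compile(r"^snmp-server ", re.M),
    "aaa": re.compile(r"^aaa new-model", re.M),
    "ntp": re.compile(r"^ntp server ", re.M),
}


def missing_services(config_text):
    """Return the names of basic services with no matching config line."""
    return [name for name, pattern in REQUIRED.items()
            if not pattern.search(config_text)]


def audit(config_dir):
    """Map each device (file stem) to the services its config lacks.

    Assumes one archived config per device, stored as <device>.cfg.
    Devices with all services configured are left out of the report.
    """
    report = {}
    for path in sorted(Path(config_dir).glob("*.cfg")):
        missing = missing_services(path.read_text(errors="replace"))
        if missing:
            report[path.stem] = missing
    return report
```

Calling `audit("configs/")` on such a directory would return a dictionary of devices and the services each one lacks, which gives a quick list of where to look first.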
 
We use the data to learn how consistently the network has been maintained, what types of latent problems exist, whether it follows best practice designs, and where it can be improved. Consistency of operations tells us how well the network is run. Large VLANs without STP loop protection mechanisms in place tell us that current design practices are not in use.
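
One of the consistency signals we look at, multiple OS versions on the same hardware platform, is easy to check once the inventory has been exported. A minimal sketch in Python, assuming a hypothetical export in which each device is a dict with `model` and `os_version` keys (these names and the sample values are illustrative, not NetMRI's schema):

```python
from collections import defaultdict


def mixed_versions(inventory):
    """Return platforms that run more than one OS version.

    `inventory` is an iterable of dicts with hypothetical keys
    "model" and "os_version". The result maps each inconsistent
    platform to a sorted list of the versions found on it.
    """
    versions = defaultdict(set)
    for device in inventory:
        versions[device["model"]].add(device["os_version"])
    return {model: sorted(found)
            for model, found in versions.items()
            if len(found) > 1}


# Made-up sample data for illustration only.
sample = [
    {"name": "sw1", "model": "platform-A", "os_version": "12.2(55)"},
    {"name": "sw2", "model": "platform-A", "os_version": "12.2(58)"},
    {"name": "rtr1", "model": "platform-B", "os_version": "15.1(4)"},
]
```

Here `mixed_versions(sample)` flags only `platform-A`, since it is the one platform with two different versions in the sample.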
NetMRI allows us to focus our efforts on helping the customer understand the risks of the current network design and how operations can be improved. The result of a network assessment is that the customer can improve the network, starting with the highest-risk items. As the network design and operations improve, there is less downtime, which translates into greater business productivity for our customers.
-Terry
_____________________________________________________________________________________________
Re-posted with Permission
NetCraftsmen would like to acknowledge Infoblox for their permission to re-post this article, which originally appeared in the Applied Infrastructure blog under http://www.infoblox.com/en/communities/blogs.html

