
October 12, 2009
Terry Slattery

Handling NMS Performance Data, Part 2

In last week’s post, Handling NMS Performance Data, Part 1, I described collecting network performance data.  This week, I want to describe how to store the collected data efficiently.  I have heard stories about vendors who used a relational DB to store interface performance data and whose systems performed poorly at large scale – more than 50,000 interfaces per polling engine.

Most NMS developers are also good database developers, so they naturally prefer to store data directly in a relational database.  It makes their lives easy because they can run SQL queries that do a lot of the work for them, and it gives them one common interface for all of their interactions with the data.  But there is a cost to this approach: the DB API is relatively heavy-weight because of its relational capabilities.  What we have is a typical optimization tradeoff.  Is the developers’ time more important than the time the system spends handling the data?  A number of NMS development efforts have performed poorly because the wrong tradeoff was selected.

What causes the slow performance?  A relational database is powerful because it allows the developer to easily create relationships between data and run powerful queries against that data and its relationships.  It also reduces data storage in many cases because it can store metadata in one place and reference it from multiple places.  In a network, the metadata might be the device’s name, its management addresses, its location, etc., all referenced by a unique device ID.  An interface or configuration entry in the DB can simply reference the device by its ID to access the higher-level metadata about the device.  One change to the metadata is reflected immediately in all references to it instead of being duplicated for each interface.  This is all good.
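The device/interface split described above can be sketched with a tiny normalized schema.  SQLite is used here only as a convenient stand-in, and the table and column names are hypothetical, not taken from any particular NMS:

```python
import sqlite3

# Hypothetical normalized schema: device metadata lives in one row,
# and each interface references it by device_id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE device (
        device_id  INTEGER PRIMARY KEY,
        name       TEXT,
        mgmt_addr  TEXT,
        location   TEXT
    );
    CREATE TABLE interface (
        if_id      INTEGER PRIMARY KEY,
        device_id  INTEGER REFERENCES device(device_id),
        if_name    TEXT
    );
""")
conn.execute("INSERT INTO device VALUES (1, 'core-rtr-1', '10.0.0.1', 'HQ')")
conn.execute("INSERT INTO interface VALUES (10, 1, 'Gi0/1')")

# One UPDATE to the device row is seen by every interface that references it.
conn.execute("UPDATE device SET location = 'DR site' WHERE device_id = 1")
row = conn.execute("""
    SELECT d.name, d.location, i.if_name
    FROM interface i JOIN device d ON d.device_id = i.device_id
    WHERE i.if_id = 10
""").fetchone()
print(row)  # ('core-rtr-1', 'DR site', 'Gi0/1')
```

The join pulls the current device metadata for any interface without duplicating it per interface, which is exactly the benefit described above.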

The problem occurs when high volumes of data must be handled.  The performance penalty arises because a relational DB must index the data as it is inserted in order to extract it quickly.  If indexing is not done, read operations (SELECT statements, in SQL terms) take longer.  So there is a performance penalty on either the inserts or the reads.  On top of the insert operation, we need to add DB logging, which is similar to a real-time backup (most DBs allow the log to be replayed from a known checkpoint to bring the DB back up to date after a system crash).  Even though the log may be – and should be – on a different disk than the DB itself, the DB still consumes memory and CPU to perform the logging.  The ease of use comes at a price.
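The insert-versus-read tradeoff can be seen directly in SQLite (again only a stand-in for whatever relational DB an NMS might use; the table and index names are hypothetical).  Without an index, a time-range query scans every row; with one, the query becomes an index search, but every subsequent insert must also maintain that index:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (if_id INTEGER, ts INTEGER, octets INTEGER)")
conn.executemany("INSERT INTO sample VALUES (?, ?, ?)",
                 [(i % 100, i, i * 10) for i in range(10_000)])

# Without an index, this range query scans the whole table.
plan_before = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM sample WHERE ts BETWEEN 100 AND 200"
).fetchall()

# With an index, reads become fast searches -- but every later INSERT
# now pays the cost of updating the index as well as the table.
conn.execute("CREATE INDEX idx_sample_ts ON sample (ts)")
plan_after = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM sample WHERE ts BETWEEN 100 AND 200"
).fetchall()

print(plan_before[-1][-1])  # full-table scan
print(plan_after[-1][-1])   # search using idx_sample_ts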

Is there an alternative?  Yes.  All NMS systems roll up the collected data over longer time intervals, typically an hour.  The roll-up calculations typically record values such as MIN, MAX, AVG, and 95th percentile.  These are the values used for performance thresholds, error-rate thresholds, trend analysis, and correlation.  Keep the collected data for the current roll-up period in an in-memory cache (memory is inexpensive these days, so use it to optimize system performance).  An efficient data structure allows very rapid access to the data in the cache.  The roll-up data is computed from the cache and stored in the DB.  This approach applies the power of the relational DB to the summaries, which is what is normally queried.  The raw data in the cache is then written directly to the filesystem, using an on-disk data structure that makes the raw data easy to access.
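A minimal sketch of such a roll-up, assuming per-minute samples and the nearest-rank definition of the 95th percentile (NMS products vary in exactly which percentile method they use):

```python
import math

def rollup(samples):
    """Summarize one roll-up period of raw samples into MIN, MAX, AVG,
    and the 95th percentile (nearest-rank method)."""
    vals = sorted(samples)
    n = len(vals)
    p95 = vals[max(0, math.ceil(0.95 * n) - 1)]
    return {
        "min": vals[0],
        "max": vals[-1],
        "avg": sum(vals) / n,
        "p95": p95,
    }

# One hour of hypothetical per-minute samples for one interface.
cache = list(range(1, 61))   # the in-memory cache for the roll-up period
summary = rollup(cache)      # only this summary goes into the DB
print(summary)  # {'min': 1, 'max': 60, 'avg': 30.5, 'p95': 57}
```

Only the small summary dictionary is inserted into the relational DB each hour; the sixty raw samples never touch it.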

Why does this work well?  In normal use, the raw data is rarely accessed.  It is used to create the roll-up summary data that drives network performance trending.  The network staff typically examines only a few interfaces each day, so the best approach is to optimize the raw data storage mechanism for that access pattern.  The result is a big performance boost over using the DB to store raw data.
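One simple on-disk structure that keeps the raw data easy to access is a file of fixed-size binary records per interface, so the k-th sample is one seek away.  This is a sketch under my own assumptions (record fields, file naming), not any particular product's format:

```python
import os
import struct
import tempfile

# Fixed-size record: timestamp, ifInOctets, ifOutOctets (all 64-bit).
REC = struct.Struct("<QQQ")

def append_samples(path, samples):
    """Append raw samples for one interface; sequential writes are cheap."""
    with open(path, "ab") as f:
        for ts, rx, tx in samples:
            f.write(REC.pack(ts, rx, tx))

def read_sample(path, k):
    """Random access to the k-th raw sample: one seek, one read."""
    with open(path, "rb") as f:
        f.seek(k * REC.size)
        return REC.unpack(f.read(REC.size))

# One file per interface (hypothetical naming scheme).
path = os.path.join(tempfile.mkdtemp(), "Gi0_1.raw")
append_samples(path, [(1000, 10, 20), (1060, 30, 40), (1120, 50, 60)])
print(read_sample(path, 1))  # (1060, 30, 40)
```

Because the records are fixed-size and time-ordered, files like this are also trivial to archive to a SAN or to scan sequentially when a detailed display of one interface is requested.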

What are the advantages of this approach?

  • Reduced database storage requirements.
  • Improved database performance.
  • Less contention for database resources and disk I/O.
  • Raw data is more efficiently stored.
  • Historical raw data can be easily moved to a SAN for long-term storage.
  • Detailed displays of performance data are easily generated as long as the raw data is easily accessed.
  • Micro sampling of specific interfaces can be done without a major impact on the polling engine.
  • Remote collectors can perform the periodic roll-up calculations and forward only the required data to the NMS analysis engine.  Or, even better, keep all the data local and have the central analysis system download rules to the polling engine, where preliminary identification of interfaces matching given criteria can be performed.

Using these techniques, an NMS can increase its data collection performance and decrease its database storage requirements.  The end result is an increase in overall system performance, which can be applied to making the UI run faster.  And that’s a good thing.

-Terry

_____________________________________________________________________________________________

Re-posted with Permission 

NetCraftsmen would like to acknowledge Infoblox for their permission to re-post this article which originally appeared in the Applied Infrastructure blog under http://www.infoblox.com/en/communities/blogs.html

Terry Slattery

Principal Architect

Terry Slattery is a Principal Architect at NetCraftsmen, an advanced network consulting firm that specializes in high-profile and challenging network consulting jobs. Terry is currently working on network management, SDN, business strategy consulting, and interesting legal cases. He is the founder of Netcordia, inventor of NetMRI, has been a successful technology innovator in networking during the past 20 years, and is co-inventor on two patents. He has a long history of network consulting and design work, including some of the first Cisco consulting and training. As a consultant to Cisco, he led the development of the current Cisco IOS command line interface. Prior to Netcordia, Terry founded Chesapeake Computer Consultants, which became a Cisco premier training and consulting partner. At Chesapeake, he co-invented and patented the v-LAB system to provide hands-on access to real hardware for the hands-on component of internetwork training classes. Terry co-authored the successful McGraw-Hill text "Advanced IP Routing in Cisco Networks," is the second CCIE (1026) awarded, and is a regular speaker at Enterprise Connect and Interop. He currently blogs at TechTarget, No Jitter and our very own NetCraftsmen.
