Handling NMS Performance Data, Part 2
I described collecting network performance data in last week’s blog, Handling NMS Performance Data, Part 1. This week, I want to describe how to store the collected data efficiently. I have heard stories about vendors who used a relational DB to store interface performance data and how those systems performed poorly at large scale (over 50,000 interfaces per polling engine).
Most NMS developers are also good database developers, so they naturally prefer storing data directly in a relational database. It makes their lives easier: they can run SQL queries that do a lot of the work for them, and the database provides a common interface for all their interactions with the data. But this approach has a cost. The DB API is relatively heavyweight because of its relational capabilities. What we have is a typical optimization tradeoff: is the time the developers spend more important than the time the system spends handling the data? A number of NMS development efforts have performed poorly because the wrong tradeoffs were selected.
What causes the slow performance? A relational database is powerful because it allows the developer to easily create relations between data and run powerful queries against that data and its relationships. It also reduces data storage in many cases because it can store metadata in one place and reference it from multiple places. In a network, the metadata might be the device’s name, its management addresses, its location, etc., all referenced by a unique device ID. An interface or configuration entry in the DB can simply reference the device by its ID to get access to the higher-level metadata about the device. One change in the metadata is reflected immediately in all references to that data instead of being duplicated for each interface. This is all good.
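The normalization described above can be sketched in a few lines of SQL. This is an illustrative example using SQLite as a stand-in relational DB; the table and column names are my own assumptions, not from any particular NMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE device (
    device_id INTEGER PRIMARY KEY,
    name      TEXT,
    mgmt_addr TEXT,
    location  TEXT
);
CREATE TABLE interface (
    if_id     INTEGER PRIMARY KEY,
    device_id INTEGER REFERENCES device(device_id),
    if_name   TEXT
);
""")
# Device metadata is stored once; each interface references it by ID.
conn.execute("INSERT INTO device VALUES (1, 'core-rtr-1', '10.0.0.1', 'DC-East')")
conn.execute("INSERT INTO interface VALUES (100, 1, 'GigabitEthernet0/1')")
conn.execute("INSERT INTO interface VALUES (101, 1, 'GigabitEthernet0/2')")

# Renaming the device in one place is immediately visible from every
# interface row via the join -- no duplicated metadata to update.
conn.execute("UPDATE device SET name = 'core-rtr-1a' WHERE device_id = 1")
rows = conn.execute("""
    SELECT d.name, i.if_name
    FROM interface i JOIN device d ON d.device_id = i.device_id
    ORDER BY i.if_id
""").fetchall()
print(rows)
```

Both interface rows now report the updated device name, even though only the single device row changed.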
The problem occurs when high volumes of data must be handled. The performance problem arises because a relational DB must index the data as it is inserted in order to extract it quickly later. If indexing is not done, the DB read operations (SELECT statements in SQL) take longer. So there is a performance penalty on either the inserts or the reads. On top of the insert operation, we need to add DB logging, which is similar to a real-time backup (most DBs allow the log to be replayed from a known checkpoint to bring a DB back up to date after a system crash). Even though the log may be (and should be) on a different disk than the DB itself, the DB consumes memory and CPU to perform the logging. The ease of use comes at a price.
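The insert-versus-read tradeoff can be seen directly in a query planner. Below is a small sketch, again using SQLite as a stand-in (the table and index names are illustrative): without an index, the read is a full-table scan; with an index, the read becomes a fast search, but every insert now also pays to maintain the index structure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (if_id INTEGER, ts INTEGER, octets INTEGER)")

# Without an index, every read must scan the whole table.
plan_before = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM sample WHERE if_id = 42"
).fetchone()[3]

# With an index, the read becomes a quick search, but every
# INSERT now also updates the index.
conn.execute("CREATE INDEX idx_if ON sample (if_id)")
plan_after = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM sample WHERE if_id = 42"
).fetchone()[3]

print(plan_before)  # a full-table SCAN
print(plan_after)   # a SEARCH using the index
```

At 50,000+ interfaces per polling engine, that per-insert index maintenance (plus the write-ahead logging mentioned above) is exactly where the time goes.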
Is there an alternative? Yes. All NMS systems roll up the collected data over longer time intervals, typically an hour. The roll-up calculations typically record values such as MIN, MAX, AVG, and 95th percentile. These are the values used in performance thresholding, error-rate thresholds, trend analysis, and correlation. Keep the collected data that is required for the roll-up period in an in-memory cache (memory is inexpensive these days, so use it to optimize system performance). An efficient data structure allows very rapid access to the data in the cache. The roll-up data is created from the cache and stored in the DB. This approach applies the power of the relational DB to the summaries, which is where it is normally needed. The raw data in the cache is then written directly to the filesystem, using an on-disk data structure that makes the raw data easy to access.
Why does this work well? In normal use, the raw data is rarely accessed; it is used mainly to create the roll-up summary data that drives network performance trending. The network staff typically examines only a few interfaces each day, so the best approach is to optimize the raw data storage mechanism for fast writes. The result is a big performance boost over using the DB to store raw data.
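One simple on-disk structure for the raw data is a flat file per interface of fixed-width records, appended sequentially as the cache is flushed. The sketch below is an illustrative assumption (record layout and file naming are mine): appends need no index maintenance, and any individual sample can still be found with a single seek.

```python
import os
import struct
import tempfile

# Fixed-width record: 8-byte timestamp + 8-byte octet counter.
RECORD = struct.Struct("<QQ")

def append_sample(path, ts, octets):
    # Sequential append: no index to maintain, no logging overhead.
    with open(path, "ab") as f:
        f.write(RECORD.pack(ts, octets))

def read_sample(path, n):
    # Random access by record number: one seek to n * record size.
    with open(path, "rb") as f:
        f.seek(n * RECORD.size)
        return RECORD.unpack(f.read(RECORD.size))

path = os.path.join(tempfile.mkdtemp(), "ifindex-100.raw")
for ts, octets in [(1000, 500), (1300, 750), (1600, 900)]:
    append_sample(path, ts, octets)

print(read_sample(path, 1))  # (1300, 750)
```

Because the records are fixed-width and timestamps are monotonic, the rare manual lookup of a specific interval is still cheap, while the common case (the write path) is as fast as the disk allows.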
What are the advantages of this approach?
Using these techniques, an NMS can increase its data collection performance and decrease its database storage requirements. The end result is an increase in overall system performance, which can be applied to making the UI run faster. And that’s a good thing.
NetCraftsmen would like to acknowledge Infoblox for their permission to re-post this article which originally appeared in the Applied Infrastructure blog under http://www.infoblox.com/en/communities/blogs.html