Last week I helped a customer with some firewall throughput testing that produced interesting results. It's instructive to walk through the results and see how we worked out what was happening from the data we collected.
The test configuration was a Smartbits packet tester connected to a pair of Gigabit interfaces on the firewall as shown below.
We were using the Throughput test, configured to increase the load in steps and to stop at the first step that showed any packet loss. The traffic was a single UDP stream from one source to one destination, simulating voice and video traffic.
Background: The Smartbits can be configured in two modes:
- Step mode, which increases in equal steps that you define.
- Binary mode, in which each step is halfway between the previous step and the maximum. Using binary mode from 10% to 100% would test at 10%, 55%, 77.5%, 88.75%, 94.38%, 97.19%, 98.59%, 99.30%, and 100%.
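The two load sequences can be sketched in a few lines of Python. This is not the Smartbits implementation. In particular, the rule for when binary mode stops halving and jumps to the maximum (once the remaining gap is no more than a `resolution` of 1%) is an assumption chosen so that the output matches the nine loads listed above.

```python
def step_progression(start, maximum, increment):
    """Step mode: equal, user-defined increments, ending at the maximum load."""
    steps = []
    load = start
    while load < maximum:
        steps.append(load)
        load += increment
    steps.append(maximum)
    return steps


def binary_progression(start, maximum, resolution=1.0):
    """Binary mode: each step is halfway between the previous step and the maximum.

    Assumption (hypothetical): stop halving once the gap to the maximum is no
    more than `resolution` percent, then test the maximum itself.
    """
    steps = [start]
    load = start
    while maximum - load > resolution:
        load = (load + maximum) / 2
        steps.append(load)
    steps.append(maximum)
    return steps


print(step_progression(10, 100, 5))
print(binary_progression(10, 100))
```

The binary call yields 10, 55, 77.5, 88.75, 94.375, 97.1875, 98.59375, 99.296875 and 100, the same nine loads as the list above before rounding. Step mode spends its iterations evenly across the range, while binary mode concentrates them near the maximum.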
On the first run, the test saw loss in its first iteration, at 10% load, and stopped. Only 13 packets were lost, so we raised the loss threshold to 0.1%. With that threshold, the test ran to completion, and the only packet loss was in the 10% load iteration.
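The stop-on-loss behavior and the effect of the 0.1% threshold can be modeled with a small sketch. `send_at_load` is a hypothetical callback standing in for one tester iteration, and the 100,000 frames per iteration is an assumed figure (the post doesn't give the frame count). For reference, 13 lost frames stay under a 0.1% threshold whenever an iteration sends more than 13,000 frames.

```python
def run_throughput_test(loads, send_at_load, loss_threshold_pct=0.0):
    """Step through offered loads and stop at the first load whose frame loss
    exceeds loss_threshold_pct.

    send_at_load(load) is a hypothetical callback that runs one iteration and
    returns (frames_sent, frames_received).
    Returns (highest passing load or None, first failing load or None).
    """
    highest_pass = None
    for load in loads:
        sent, received = send_at_load(load)
        loss_pct = 100.0 * (sent - received) / sent
        if loss_pct > loss_threshold_pct:
            return highest_pass, load
        highest_pass = load
    return highest_pass, None


def make_fake_tester(frames_per_iteration=100_000, lost_on_first=13):
    """Hypothetical stand-in for the tester: only the first iteration loses
    frames, mimicking the results described above."""
    iterations = 0

    def send_at_load(load):
        nonlocal iterations
        iterations += 1
        lost = lost_on_first if iterations == 1 else 0
        return frames_per_iteration, frames_per_iteration - lost

    return send_at_load


loads = list(range(10, 101, 10))  # hypothetical 10% steps
strict = run_throughput_test(loads, make_fake_tester())
relaxed = run_throughput_test(loads, make_fake_tester(), loss_threshold_pct=0.1)
print(strict, relaxed)
```

With a zero-loss criterion the run fails at the first load and reports no passing load. With the 0.1% threshold, the 13 lost frames (0.013% of the assumed 100,000) are tolerated and the run completes.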
At first, we suspected that ARP learning was causing packets to be dropped as the test started, but the configurations of both the Smartbits and the firewall looked good.
After experimenting with some of the Smartbits configuration parameters, we found that there was no packet loss at any load of 4% or less. That wasn't conclusive, but it hinted that the learning process was not the problem. We did consider that at 4% load the firewall might be buffering the first packet, so the learning could have completed before the second packet arrived.
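The buffering idea depends on how much time separates consecutive packets at each load. The sketch below computes that gap for a Gigabit link. It assumes the load percentage is measured against the full line rate, including the 8-byte preamble/SFD and 12-byte inter-frame gap, and the 64- and 1518-byte frame sizes are hypothetical since the post doesn't say which size was used.

```python
LINE_RATE_BPS = 1_000_000_000   # Gigabit Ethernet
ETHERNET_OVERHEAD_BYTES = 20    # 8-byte preamble/SFD + 12-byte minimum inter-frame gap


def inter_frame_time_us(load_pct, frame_bytes):
    """Time between the starts of consecutive frames, in microseconds.

    Assumes load_pct is a percentage of the full line rate, with preamble and
    inter-frame gap counted as part of each frame's time on the wire.
    """
    bits_per_frame = (frame_bytes + ETHERNET_OVERHEAD_BYTES) * 8
    frames_per_second = LINE_RATE_BPS * load_pct / 100 / bits_per_frame
    return 1e6 / frames_per_second


for size in (64, 1518):  # hypothetical frame sizes
    for load in (4, 10):
        gap = inter_frame_time_us(load, size)
        print(f"{size}-byte frames at {load}% load: {gap:.1f} us between frames")
```

Whatever the frame size, the gap at 4% load is 2.5 times the gap at 10%. If the firewall holds the first packet while it sets up the flow, a slower stream gives that setup more time to finish before the next packet shows up.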
At that point, the firewall vendor engineer who was working in the lab mentioned that the firewall was stateful and that its state timer for UDP was 60 seconds. After thinking about that for a bit, we found that we could configure the Smartbits to ARP on each iteration in a test run and to wait 70 seconds on each ARP attempt. Because 70 seconds is longer than the 60-second UDP state timer, the state created by one iteration would expire before the next iteration started. Sure enough, we saw packet loss on every iteration in the test run. This time we switched to step mode and set the increment to 5% so we had good resolution on the packet loss at each step (see the Step Throughput Test figure).
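The theory can be expressed as a small simulation. This is a hypothetical model, not the vendor's documented behavior: state expires after the 60-second idle timer, installing new state takes a made-up `setup_delay`, the first packet of a new flow is buffered until then, and any further packets that arrive before the state is ready are dropped. The addresses, ports, packet rates, and delay are all illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class UdpStateTable:
    """Hypothetical model of a stateful firewall's UDP state handling."""
    idle_timeout: float = 60.0     # UDP state timer quoted by the vendor engineer (s)
    setup_delay: float = 0.00105   # hypothetical time to install new state (s)
    flows: dict = field(default_factory=dict)  # flow -> (ready_at, last_seen)

    def packet(self, flow, now):
        """Return True if the packet is forwarded, False if it is dropped."""
        entry = self.flows.get(flow)
        if entry is None or now - entry[1] > self.idle_timeout:
            # No live state: buffer this packet and start installing state.
            self.flows[flow] = (now + self.setup_delay, now)
            return True
        ready_at, _ = entry
        self.flows[flow] = (ready_at, now)  # every packet refreshes the idle timer
        return now >= ready_at


def run_iteration(table, flow, start, rate_pps, count):
    """Send `count` evenly spaced packets from `start`; return (dropped, last send time)."""
    gap = 1.0 / rate_pps
    dropped = 0
    for i in range(count):
        if not table.packet(flow, start + i * gap):
            dropped += 1
    return dropped, start + (count - 1) * gap


flow = ("192.0.2.1", 5000, "198.51.100.1", 5000)  # hypothetical addresses and ports
table = UdpStateTable()
lost1, end1 = run_iteration(table, flow, 0.0, 10_000, 1_000)
lost2, end2 = run_iteration(table, flow, end1 + 70, 10_000, 1_000)  # wait > 60 s timer
lost3, _ = run_iteration(table, flow, end2 + 30, 10_000, 1_000)     # wait < 60 s timer
print(lost1, lost2, lost3)
```

In this model, loss appears only in iterations that start after the state has expired, and a slow enough stream (packets spaced farther apart than `setup_delay`) loses nothing even when the state is new. That matches the pattern we saw, but it remains a sketch of the theory rather than a confirmed description of the firewall.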
We ran some additional tests in which we watched the firewall's UDP state time out. Every iteration that ran while the state existed had no loss, and every iteration that ran after the state had expired showed loss. That was good enough for us. (To be absolutely sure, we'd need to look at packet capture data; it would be interesting to see which packets were dropped.) We did notice an interesting spike in packet loss around 65-70% load but didn't investigate it further.
The question now was whether the loss incurred when establishing a flow was significant enough to cause problems in the production network. At 4%, the data rate is 40 Mbps on the 1 Gbps interface. We aren't aware of any single source of UDP data that uses that much bandwidth. While the aggregate of many flows may exceed that figure, we are confident that no single flow in this network would use that much bandwidth.
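The 40 Mbps figure is 4% of the 1 Gbps line rate. If the Smartbits load percentage includes preamble and inter-frame gap (an assumption), the rate of actual Ethernet frames at that load is somewhat lower, especially for small frames. A short sketch of the arithmetic, with hypothetical frame sizes:

```python
LINE_RATE_BPS = 1_000_000_000   # 1 Gbps interface
ETHERNET_OVERHEAD_BYTES = 20    # preamble/SFD (8) + minimum inter-frame gap (12)


def offered_rates_mbps(load_pct, frame_bytes):
    """Return (Mbps on the wire, Mbps of Ethernet frames) at a given load.

    Assumes load_pct is a percentage of the full line rate, preamble and
    inter-frame gap included.
    """
    wire_mbps = LINE_RATE_BPS * load_pct / 100 / 1e6
    frame_mbps = wire_mbps * frame_bytes / (frame_bytes + ETHERNET_OVERHEAD_BYTES)
    return wire_mbps, frame_mbps


for size in (64, 512, 1518):  # hypothetical frame sizes
    wire, frames = offered_rates_mbps(4, size)
    print(f"{size}-byte frames at 4%: {wire:.1f} Mbps on the wire, {frames:.1f} Mbps of frames")
```

Either way, a single flow would need to carry roughly 30 to 40 Mbps before it reached the load at which we began to see loss on flow setup.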
We did not run any tests with a mix of traffic. That would have been a better test, but it would have taken a lot more time to set up. It would be interesting to see whether creating multiple flows at once changes the packet loss characteristics.
The key point of this story is that we had to think about the data we were seeing, come up with a possible cause for the packet loss, and then test that theory. Doing so gave us a good idea of some of the firewall's limitations and whether they would affect the network's operation.
-Terry
_____________________________________________________________________________________________
Re-posted with Permission
NetCraftsmen would like to acknowledge Infoblox for their permission to re-post this article which originally appeared in the Applied Infrastructure blog under http://www.infoblox.com/en/communities/blogs.html