The Cisco TFTP Service
The Cisco Unified Communications Manager (CUCM) solution architecture has a TFTP service that you enable on one or more servers in a given cluster. Depending on the size of your deployment, you may have this service running on the Publisher node or on multiple nodes dedicated to providing supplementary services (i.e., non-call-processing services) to the cluster and its endpoints. I typically recommend having two TFTP services in the cluster, and I prefer not to run this service on the Publisher or call processing nodes in larger environments.
The role of the Cisco TFTP service is to serve files, such as firmware images, to requesting services or endpoints. In addition, the TFTP service is responsible for generating configuration and security files. For the binary images that the TFTP service provides, you must load or install a package containing the appropriate firmware loads. A package usually contains multiple binary images that are used by the phone. Once you load the package, all binary files are stored on the TFTP server. You can view these files from the CM platform web portal or via the command shell:
admin: file list tftp *
For device configurations, the TFTP service creates configuration files based on the data entered into the CUCM admin web portal, via the AXL/SOAP interface, or through the Bulk Administration Tool. Whichever interface is used, all roads lead to the same place – DBL Helper. The process summary follows:
1. DBL writes configuration data to the database on the publisher node
2. Database information is replicated to all cluster nodes
3. Notify process informs all CUCM nodes to reset the record instance
4. DBLNotify informs the TFTP service of changed record information
The Cisco SRND for Cisco Unified Communications 7.0 provides a solid discussion of what type of performance can be expected using the TFTP protocol. The basic idea is that each client requesting a firmware file takes up a TFTP session. Server performance capabilities should come into play, but the network plays a huge role in actual performance.
The first factor is the round trip time, or overall network latency, between the requesting endpoint and the TFTP server. As with call signaling and media processing, more latency has an overall negative impact on performance. The second factor is packet loss. TFTP operates as a lock-step protocol, which means the sender will send one packet and wait for a response before sending another packet. There is a prescribed “wait time” (4 seconds by default) that the TFTP service uses to determine if a packet should be re-transmitted. The formula one could use to estimate the time it takes to transmit a file using TFTP is:
TransferTime = FileSize * ((RoundTripTime + ErrorRate * Timeout) / 512000)
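Here FileSize is in bytes, RoundTripTime and Timeout are in milliseconds, ErrorRate is a fraction (0.01 for 1%), and the result is in seconds. To get a feel for what that means to you, check out the following data points.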
| Parameter | Example A | Example B | Example C |
|---|---|---|---|
| Round Trip Time (ms) | 5 | 10 | 5 |
| Size of files (bytes) | 6,300,000 | 6,300,000 | 6,300,000 |
| Assumed Error Rate (percent) | 1% | 1% | 2% |
| Timeout period (ms) | 4000 | 4000 | 4000 |
| File Transfer time (seconds) | 553.72 | 615.24 | 1045.9 |
| File Transfer time (minutes) | 9.23 | 10.26 | 17.44 |
Using Example A as a baseline, we have a 5 ms round trip time (RTT) and a 1% error rate on the network. In Example B, doubling the RTT to 10 ms increases the transfer time by about 1 minute. In Example C, the RTT is the same as in A but the error rate rises by one percentage point, to 2%. The net result is that the total transfer time increases by approximately 8 minutes. The data in this table is for a single download instance. In reality, when using the traditional TFTP approach, you will have multiple phones attempting TFTP downloads across your network at the same time. If any network link gets saturated at any point during the download foray, your RTT and error rate may increase. If you find yourself in this situation, you may also see multiple phones fail to upgrade their firmware after repeated failed attempts. Not a nice scenario at all.
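If you want to plug in your own numbers, the formula is easy to script. The following is a minimal sketch, not an official calculator. It assumes the 512000 in the formula is the default 512-byte TFTP block size multiplied by 1000 ms per second, which is my reading of the formula rather than something the SRND spells out here. The function and variable names are made up for illustration.

```python
BLOCK_SIZE_BYTES = 512  # default TFTP data block size (assumed source of the 512000 constant)


def tftp_transfer_seconds(file_size_bytes, rtt_ms, error_rate, timeout_ms=4000):
    """Estimate a lock-step TFTP transfer time in seconds.

    error_rate is a fraction (0.01 for 1%); rtt_ms and timeout_ms are milliseconds.
    """
    blocks = file_size_bytes / BLOCK_SIZE_BYTES
    # Each block costs one round trip, plus a timeout for the share of blocks lost.
    per_block_ms = rtt_ms + error_rate * timeout_ms
    return blocks * per_block_ms / 1000.0


# The three examples discussed above.
EXAMPLES = {
    "A": {"rtt_ms": 5, "error_rate": 0.01},
    "B": {"rtt_ms": 10, "error_rate": 0.01},
    "C": {"rtt_ms": 5, "error_rate": 0.02},
}


def estimate_examples(file_size_bytes=6_300_000):
    """Return {example name: estimated seconds} for the three examples."""
    return {
        name: tftp_transfer_seconds(file_size_bytes, **params)
        for name, params in EXAMPLES.items()
    }


if __name__ == "__main__":
    for name, seconds in estimate_examples().items():
        print(f"Example {name}: {seconds:.1f} s ({seconds / 60:.1f} min)")
```

The results land within a few hundredths of a second of the figures in the table, so the small differences look like rounding. Changing `rtt_ms` or `error_rate` to match one of your WAN sites gives a rough feel for a single download, keeping in mind that real upgrades involve many phones at once.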
In addition to the standard TFTP approach, we are going to discuss two other options available to Cisco customers. Regardless of which option you use, we recommend that customers consider two operational concepts when thinking about how to deploy TFTP in their CUCM environment. The first concept is having more than one TFTP server in the cluster. The obvious benefit is redundancy, but there is a load balancing opportunity here as well. By staggering Option 150 configurations in DHCP scopes, you can roughly distribute the load across multiple TFTP servers.
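One way to picture the staggering is to rotate the order in which the TFTP servers are listed from one DHCP scope to the next. The sketch below only produces a plan on paper. It assumes every scope lists all TFTP servers in Option 150 and that only the order changes. The scope ranges and server addresses are invented, and nothing here touches a real DHCP server.

```python
def stagger_option_150(scopes, tftp_servers):
    """Return {scope: ordered TFTP server list}, rotating the first server per scope."""
    assignments = {}
    count = len(tftp_servers)
    for index, scope in enumerate(scopes):
        shift = index % count
        # Rotate the list so each scope starts with a different primary server.
        assignments[scope] = tftp_servers[shift:] + tftp_servers[:shift]
    return assignments


# Hypothetical scopes and TFTP server addresses, for illustration only.
SCOPES = ["10.1.10.0/24", "10.1.20.0/24", "10.1.30.0/24", "10.1.40.0/24"]
TFTP_SERVERS = ["10.0.0.11", "10.0.0.12"]

PLAN = stagger_option_150(SCOPES, TFTP_SERVERS)

if __name__ == "__main__":
    for scope, order in PLAN.items():
        print(f"{scope} -> option 150: {', '.join(order)}")
```

With two servers, odd and even scopes end up with opposite primary servers. How evenly that splits the actual load depends on how many phones sit in each scope, so treat it as a rough distribution.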
The second concept comes into play when pushing out the firmware. If you have Option 150 assignments balanced, then you can use this “balance” when putting together your implementation plan. You can break your deployed IP phones into smaller groups, organized by primary TFTP server. For instance, say you have four groups of phones: two of these groups use TFTP-SERVER-A and two use TFTP-SERVER-B. You can then push firmware to TFTP-SERVER-A group 1 and TFTP-SERVER-B group 1 at the same time.
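The grouping idea above can be sketched as a small planning helper. This is only an illustration of the scheduling logic, assuming you already know which primary TFTP server each group of phones uses. The group names are invented, and the script does not push firmware to anything.

```python
from collections import defaultdict
from itertools import zip_longest


def plan_waves(groups):
    """Build upgrade waves with at most one phone group per TFTP server in each wave.

    groups is a list of (group_name, primary_tftp_server) tuples.
    """
    by_server = defaultdict(list)
    for name, server in groups:
        by_server[server].append(name)
    waves = []
    # Take the next pending group from every server for each wave.
    for batch in zip_longest(*by_server.values()):
        waves.append([group for group in batch if group is not None])
    return waves


# Hypothetical phone groups matching the four-group example.
GROUPS = [
    ("A-group-1", "TFTP-SERVER-A"),
    ("A-group-2", "TFTP-SERVER-A"),
    ("B-group-1", "TFTP-SERVER-B"),
    ("B-group-2", "TFTP-SERVER-B"),
]

WAVES = plan_waves(GROUPS)

if __name__ == "__main__":
    for number, wave in enumerate(WAVES, start=1):
        print(f"Wave {number}: {', '.join(wave)}")
```

Each wave hits every TFTP server with one group at a time, so neither server carries two groups at once.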
The net benefit you get from the above methods depends on the size of your deployment and whether you have a cluster geographically distributed across multiple data centers. In larger deployments you probably have TFTP servers in at least two data centers and you may have redundant network paths from WAN offices to these data centers. You may also be running a routing protocol that natively load balances across equal cost paths. Take all of this into consideration when you determine your TFTP design and make sure you optimize for operations. Again, this should be done whether you are using the standard “vanilla” method for pushing firmware or one of the following.
Options – Load Server
This option has actually been around for a long time but not many people (at least folks I have worked with) are aware of it or use it. The basic idea is that the administrator can assign a TFTP server to each individual phone record using the “Load Server” parameter on the record itself. This is only used for downloading firmware to the phone. The following figure illustrates the concept.
On the surface, this sounds like a great idea. This approach does have a huge upside of localizing TFTP traffic. However, it does have some potential drawbacks. Fortunately, most of the downside can be controlled if the correct operational discipline is applied. Some of the areas to be aware of:
Distribution and Synchronization (build your own automation):
When you use a Load Server, you are typically pointing the phone to a TFTP service that is running on its local LAN (or other optimal location). This server can be a Windows box, a Linux box, or even an IOS router configured to serve TFTP files. In all cases, the Load Server is not a member of the CUCM cluster, and this means you have to manually copy files to all Load Servers. Of course, you can mitigate potential errors here using scripting tools that automate the process of pulling files down from one of the CUCM TFTP servers and then pushing them to the Load Servers.
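Whatever tool you use to copy the files, it helps to verify that each Load Server really holds the same files as your reference copy. Here is a minimal sketch of that check. It assumes the firmware files have already been pulled into a local staging directory and that the Load Server's TFTP directory is reachable as a path (a mounted share, for example). The file names in the demo are made up.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digests(root):
    """Map each file path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            relative = path.relative_to(root).as_posix()
            digests[relative] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def sync_report(reference_dir, load_server_dir):
    """List files that are missing from, or differ on, the Load Server copy."""
    reference = file_digests(reference_dir)
    target = file_digests(load_server_dir)
    missing = sorted(set(reference) - set(target))
    stale = sorted(
        name for name in reference.keys() & target.keys()
        if reference[name] != target[name]
    )
    return {"missing": missing, "stale": stale}


def demo():
    """Build two throwaway directories and compare them."""
    with tempfile.TemporaryDirectory() as ref, tempfile.TemporaryDirectory() as tgt:
        Path(ref, "firmware-a.bin").write_bytes(b"new load")    # hypothetical names
        Path(ref, "firmware-b.bin").write_bytes(b"other load")
        Path(tgt, "firmware-a.bin").write_bytes(b"old load")
        return sync_report(ref, tgt)


if __name__ == "__main__":
    print(demo())
```

Running a report like this after every firmware load gives you a list of files to re-copy before any phone is pointed at that Load Server.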
Moves-Adds-Changes (make sure you audit configs):
Since each phone is assigned a Load Server there is a huge potential for misconfiguration or configurations that do not correlate to your intended design. This is because it is a completely manual process. First and foremost, you have to be aware that there is no “fall back” capability when using the Load Server options. So, if someone fat fingers the Load Server’s address on a phone record, that phone is toast. The phone will try to start a TFTP session with the Load Server and when that fails the phone doesn’t fall back to the TFTP server.
As with other config aspects for phone records, you should incorporate the Load Server parameter in your bulk provisioning plan, and you should include checking and validating Load Server parameters in your regular system optimization plans. Don’t have a regular system optimization plan? Well, that is a topic for another day.
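A periodic audit of the Load Server field can be as simple as comparing an export of phone records against the design you intended. The sketch below assumes a CSV export with `device_name`, `phone_ip`, and `load_server` columns. Those column names are hypothetical and are not the actual Bulk Administration Tool export headers. It also assumes Load Servers are referenced by IP address and that each phone subnet maps to exactly one approved Load Server.

```python
import csv
import io
import ipaddress

# Hypothetical design: phone subnet -> approved Load Server address.
APPROVED = {
    "10.1.10.0/24": "10.1.10.5",
    "10.1.20.0/24": "10.1.20.5",
}

# Made-up export; the second record has a letter O typed in place of a zero.
SAMPLE = """device_name,phone_ip,load_server
SEP000000000001,10.1.10.21,10.1.10.5
SEP000000000002,10.1.10.22,10.1.1O.5
SEP000000000003,10.1.20.23,10.1.10.5
SEP000000000004,10.1.20.24,
"""


def audit_load_servers(csv_text, approved=APPROVED):
    """Return (device, problem) tuples for records whose Load Server looks wrong."""
    networks = {ipaddress.ip_network(net): srv for net, srv in approved.items()}
    problems = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        device = row["device_name"]
        load_server = (row["load_server"] or "").strip()
        if not load_server:
            continue  # no Load Server set: the phone uses its normal TFTP server
        try:
            ipaddress.ip_address(load_server)
        except ValueError:
            problems.append((device, f"not a valid IP address: {load_server!r}"))
            continue
        phone_ip = ipaddress.ip_address(row["phone_ip"])
        expected = next((srv for net, srv in networks.items() if phone_ip in net), None)
        if expected != load_server:
            problems.append((device, f"expected {expected}, found {load_server}"))
    return problems


if __name__ == "__main__":
    for device, problem in audit_load_servers(SAMPLE):
        print(device, "->", problem)
```

In the sample, the second phone has a mistyped address and the third points at another site's Load Server. Both are exactly the kind of mistake that leaves a phone unable to download firmware, since there is no fallback.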
Diagnostics and Troubleshooting (using different tools):
When using the Load Server option you no longer have the ability to look at the CUCM syslogs to troubleshoot any issues that crop up related to firmware downloads. The phone also doesn’t give much in the way of diagnostics for TFTP download. This is simply because the TFTP server is not part of the CUCM cluster. Depending on the application you are using for TFTP you may or may not be able to enable diagnostics tracing. If you can enable tracing/logging you definitely should and if you can push the logs to a centralized syslog server that also collects the CUCM syslogs then that is optimal. In large environments, this still gets hairy and it all comes down to making sure your operational procedures are well defined and well communicated.
Options – Peer Firmware Sharing
Peer firmware sharing is a feature that was added to Cisco phone firmware around the 8.3(1) release. Basically, phones participating in this firmware distribution model will form a peering relationship in a tree-based hierarchy. One phone will peer with up to two other phones. Once the peering relationship is established, the root phone will retrieve the upgraded firmware files from the Cisco TFTP server and then distribute the files to associated peers after the files are downloaded. The objective is to minimize dependency on the WAN during bulk firmware upgrades and to reduce the overall time needed to upgrade a large phone environment. The following diagram provides a basic illustration of the process.
First, phones use UDP broadcasts to ask for a specific upgrade file. One phone assumes the root role for the firmware file distribution hierarchy. Each phone then establishes a TCP connection to its respective child nodes, and the distribution tree is established. Once the root completes the download for a particular file, it will copy the file to its child nodes and move to the next download file in the firmware set.
Unlike the Load Server option, if there is a failure when trying to establish a peering relationship, the phone experiencing the issue will fall back to the standard TFTP methodology. It should also be noted that the peering relationship is only established between phones that use the same firmware and are on the same subnet. So, if you have a mixture of 7962, 7942, and 9971 phones on a single subnet, a total of two trees will be established within the subnet. The 7942/62 phones use the same firmware build and will establish one tree, while the 9971 phones will establish a separate tree.
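To estimate how many trees (and therefore how many WAN downloads) a site will produce, you can group phones by subnet and firmware build. The sketch below does only that counting. The model-to-firmware mapping covers just the three models mentioned above, and the phone names and subnet are invented. It does not model how phones elect a root or pick their peers.

```python
from collections import defaultdict

# Firmware grouping taken from the example above: 7942 and 7962 share a build.
FIRMWARE_FAMILY = {"7942": "7942/7962", "7962": "7942/7962", "9971": "9971"}

# Hypothetical phones as (name, model, subnet).
PHONES = [
    ("phone-1", "7962", "10.1.10.0/24"),
    ("phone-2", "7962", "10.1.10.0/24"),
    ("phone-3", "7942", "10.1.10.0/24"),
    ("phone-4", "9971", "10.1.10.0/24"),
    ("phone-5", "9971", "10.1.10.0/24"),
]


def peer_trees(phones):
    """Group phones into candidate trees keyed by (subnet, firmware family)."""
    trees = defaultdict(list)
    for name, model, subnet in phones:
        trees[(subnet, FIRMWARE_FAMILY[model])].append(name)
    return dict(trees)


def wan_downloads(phones, peer_sharing=True):
    """Count firmware downloads from the TFTP server: one per tree, or one per phone."""
    return len(peer_trees(phones)) if peer_sharing else len(phones)


if __name__ == "__main__":
    for key, members in peer_trees(PHONES).items():
        print(key, "->", members)
    print("downloads with peer sharing:", wan_downloads(PHONES))
    print("downloads without peer sharing:", wan_downloads(PHONES, peer_sharing=False))
```

For the five sample phones, the grouping yields two trees, matching the 7942/62 and 9971 example, versus five separate downloads with the standard approach.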
The choice that is made in your operational environment depends on many things. First, if you haven’t already done so, build some redundancy into your TFTP solution of choice. Second, you may want to consider leveraging DHCP configurations to distribute assignment of your TFTP server. Why? Well, these methods only address distribution of phone firmware and do not help with distribution of any other file that may be served by the TFTP server (e.g., configuration files, ring tones, etc.). Also, aside from the Load Server option, all options assume that there is still some portion of the environment downloading firmware files from the TFTP servers. Finally, if there is some issue encountered during peer firmware sharing, phones will still fall back to their TFTP configuration. So, it is better to design for the worst case scenario.
Once you get past the TFTP server roles, you have to determine what your network can handle. You will need to look at this within the network models you have established (e.g., small office, medium office, large office, metro office, etc.). Basically, understand what your network can or can’t accommodate. You should make sure you have a decent optimization plan in place to keep things finely tuned.
Maybe a comparison table would be nice to wrap this up.
| Option | Advantages | Disadvantages |
|---|---|---|
| Legacy TFTP | Proven distribution | High bandwidth requirements; multiple requests for the same file; high load on TFTP servers |
| Load Server | Local LAN distribution (frees up WAN); can distribute load over multiple TFTP servers; minimal load on TFTP servers (config files during reload) | Must be enabled on each phone; admin must manually copy files to Load Servers; no fallback to TFTP on failure; prone to user/admin error during record configuration/maintenance |
| Peer Firmware Sharing | Minimize WAN download to one per phone model on a subnet; uses TCP (local peers); fall back to TFTP if peering tree fails; reduced load on TFTP server | Must be enabled on each phone; hierarchy formed for each phone model; hierarchy limited to a single subnet |