Introducing Changes in a Large UC Deployment - The need for lower environment testing

Intro

Most customers who have a Cisco UC deployment in production either have a functioning lab, had a functioning lab, or have wanted one. The need for a lab environment shouldn’t require too much debate, especially if you are part of the group of folks (like me) who have been working with Cisco UC solutions going back to the days when the solution was called AVVID. Back then, everything required exhaustive testing before going into production because it was “bleeding edge”, a term that for me now means “riddled with bugs” or “pain in the a**” or “my wife isn’t talking to me this week”.

Core components of the Cisco UC solution are way more stable than they used to be, and I don’t feel the need to stress test every software release, firmware release, or configuration the way I had to in the past. That being said, I still think customers should have some sort of lab for testing.

I also think that in larger environments, there is a need to have a small community of end users (preferably across various business units) running on a production system that is used as the “pilot” system for new features and configuration changes.

The Lab

Feature and functionality testing in a lab is still a must. When you are first trying something out, you should do so in a lab environment that is completely segregated from production. I also think that customers should have a lab environment for testing interoperability between Cisco UC applications (like UCCX, CUPS, etc.) as well as between Cisco UC apps and 3rd party applications (like AD/LDAP, Cistera, OCS, Exchange 2007).

It is also a good idea to have a lab environment where you can test upgrade procedures, especially if you have a large portion of the Cisco UC portfolio installed in production. Upgrades are still a sticky proposition, even though backing out of a failed upgrade is easier with most of the modern Cisco UC applications.

This lab could have real servers or use VMware. I do most of my initial testing on VMware and only use physical servers when it is required (such as testing a firmware upgrade procedure for an HDD array controller).

The Lower Environment

Of the customers who do have something that resembles a lab, very few actually have what my team calls a “lower environment”. At a previous position we called it the “Alpha Cluster”, a catchy name that embodies the objective of this “lower environment” solution. The idea is simple: you build out a smaller scale model of your production UC solution in your production environment. A percentage of your user base is provisioned on this system and they use it for day-to-day operations.

In every way, the “lower environment” solution should be treated like production. That means it is connected to your production network, uses the same security measures applied in production, integrates with LDAP, uses the corporate antivirus, and so on. It should be subjected to the same change control policies or a special subset of those policies. The main difference is that the lower environment is where new software versions, feature sets, configurations, etc. show up first after they are researched and vetted in the lab.
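For those who like to see an idea in code, here is a minimal Python sketch of one way to choose the pilot community so that every business unit is represented. It is only an illustration: the function name `pick_pilot_users`, the `(user_id, business_unit)` input shape, and the 25% fraction in the usage note are my assumptions, not something the UC applications provide.

```python
import math
import random
from collections import defaultdict


def pick_pilot_users(users, fraction, seed=0):
    """Pick roughly `fraction` of the users from each business unit.

    `users` is a list of (user_id, business_unit) tuples (a hypothetical
    export from your directory). Every business unit contributes at
    least one user so that no group is left out of the pilot.
    """
    by_unit = defaultdict(list)
    for user_id, unit in users:
        by_unit[unit].append(user_id)

    rng = random.Random(seed)  # seeded so the selection is repeatable
    pilot = {}
    for unit, members in sorted(by_unit.items()):
        count = max(1, math.ceil(len(members) * fraction))
        pilot[unit] = sorted(rng.sample(members, count))
    return pilot
```

Calling `pick_pilot_users(users, 0.25)` on a list with 8 sales users, 4 finance users, and 1 legal user would return 2, 1, and 1 users respectively. In practice you would probably hand-pick some of these people, since a pilot user who never reports problems is not much help.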

The Process

In this operational model, a customer would still do the necessary research such as reading release notes, talking to their Cisco account teams/Cisco partner team, etc. After doing their homework, then the customer would test the application, feature, software version, etc. in their lab.

Once the customer is satisfied with the research and lab tests, then they would load the software, add the application, or update the configuration in the lower environment before finally introducing the change in “full” production. So, for those that like lists, the high level milestones for introducing a new software version/feature/etc. into your production environment could be:

1. Research:

Yes, this means you have to read. Depending on the nature of your change you may have to read quite a bit.

2. Lab Test:

Configure your lab appropriately and test the change you are considering for production. Your change control policy should mandate a certain degree of lab testing before you unleash your change on those innocent people we call “end users”.

3. Lower Environment:

Introduce your changes into your lower environment. Your change control policy should mandate this. You should also have a change control procedure where you coordinate with end users and support staff as you normally would. You should plan on having a significant gap between the time you introduce a change in your lower environment and the time you roll that change into production.

This “bake in” period should be used to support the following activities:

Your operational/help desk staff needs time to get familiar with what has changed.
Your operational/help desk staff needs to incorporate changes in their support triage processes/work flows.
If you use a help desk tool like Remedy or something similar, the admin team that supports that tool may need time to incorporate the changes into their tool.
For changes that introduce a deviation in work flow processes for users, you need time to get feedback from users on the expected versus actual improvements.
You need to find out what effect human error will have on your new toy.
Your network management teams need to incorporate any changes in thresholds, entities, MIBs, traps, escalation, etc.
You need to monitor event logs, performance attributes, and other key elements closely during the “trial period” in the lower environment system. If you have a baseline for the lower environment (which you should), then you should see if there is any difference between the baseline and the system after introducing your change.
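The baseline comparison in the last item can be sketched in a few lines of Python. This is a rough illustration under my own assumptions: the metric names in the usage note, the 10% tolerance, and the function name `compare_to_baseline` are all made up, and how you actually collect the readings depends on your monitoring tools.

```python
def compare_to_baseline(baseline, current, tolerance=0.10):
    """Return metrics whose relative change from baseline exceeds tolerance.

    Both arguments map a metric name to a numeric reading. Metrics that
    are missing from `current` are reported too, since a counter that
    disappears after a change is worth a look.
    """
    findings = {}
    for name, base in baseline.items():
        if name not in current:
            findings[name] = "missing after change"
            continue
        now = current[name]
        if base == 0:
            # Relative change is undefined; flag any non-zero reading.
            if now != 0:
                findings[name] = f"was 0, now {now}"
            continue
        change = (now - base) / abs(base)
        if abs(change) > tolerance:
            findings[name] = f"{change:+.0%} vs baseline"
    return findings
```

For example, with a baseline of `{"cpu_pct": 20.0, "registered_phones": 400, "failed_calls_per_hr": 0}` and a post-change reading of `{"cpu_pct": 30.0, "registered_phones": 398, "failed_calls_per_hr": 3}`, the function flags the CPU and failed-call metrics and leaves the phone count alone, because a drop of two phones is within the tolerance.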

Depending on how efficient your operation is, you should be able to get through all of these tasks within a 2-3 week window. The added benefit is that the system is being used by real people this whole time. So, you should feel more confident by the time you get to a full production deployment. The worst case is you find a major problem that causes you to delay deployment in production. That is actually a blessing in disguise. It is easier to deal with a problem when 90% of your end users are unaware it ever existed.

4. Production:

Finally, after you have done your research, tested in the lab, and tested in your lower environment, you are ready to roll this solution into final production. At this point, you should have a certain level of mastery and confidence about what you are doing.

You should still monitor your solution’s behavior in the week following your change. You should compare baselines, create new ones, and have an escalation plan in place just in case the whole thing goes south. As a wise man once said: plan for the worst.
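If your change control process is tracked in a tool, the four milestones above can be enforced as a simple gate. The following Python sketch is one way to express that, assuming a hypothetical `Change` record and a 14-day minimum bake-in (my reading of the low end of a 2-3 week window); none of these names come from a real change control product.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

STAGES = ["research", "lab", "lower_environment", "production"]


@dataclass
class Change:
    name: str
    completed: list  # stages the change has already entered, in order
    lower_env_start: Optional[date] = None  # when the bake-in began


def next_stage(change, today, min_bake_in=timedelta(days=14)):
    """Return the next stage the change may enter, or why it must wait."""
    if change.completed != STAGES[:len(change.completed)]:
        return "blocked: stages were skipped or done out of order"
    if len(change.completed) == len(STAGES):
        return "done"
    upcoming = STAGES[len(change.completed)]
    if upcoming == "production":
        # Production is only allowed after the bake-in period has elapsed.
        if change.lower_env_start is None:
            return "blocked: no lower environment start date recorded"
        if today - change.lower_env_start < min_bake_in:
            return "blocked: bake-in period not finished"
    return upcoming
```

A change that went into the lower environment on April 1 would be blocked from production on April 10 and allowed on April 15. A change that lists only "lab" as completed is rejected outright, because the research step was skipped.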

Conclusion

People try to push changes into production without fully vetting them. I know people “talk” about planning, but they hardly ever demonstrate the discipline needed to execute a plan. I also believe that if we were to take a task and let “the planner” duke it out with “the non-planner”, “the planner” would probably have a fully functional final solution completed well ahead of “the non-planner”. Unfortunately, “the non-planner” will have the milestone box checked as completed first, and few people ever go back to ensure that things are working as promised.

The folks that hold the technology mind share have a responsibility to their end users. This responsibility is to deliver solutions that have been thoroughly tested. These solutions should be wrapped in a package that comes with an operational plan for supporting the new features/configs/etc. and the users efficiently.

A final word of advice here:

If you are going into your change controls for production deployment with phrases like “I hope it works” or “knock on wood” or “we’ll see” then you have problems and are living on borrowed time.

Those people we call end users expect and deserve a certain commitment to operational discipline. They make assumptions that we are delivering solutions that we have tested and know how to support. While there is always a certain degree of the “unknown” in our particular technology vertical, we should still always strive to avoid subjecting end users to “half-baked” solutions.

2 responses to “Introducing Changes in a Large UC Deployment – The need for lower environment testing”

Mark Mendonca says:

April 15, 2010 at 2:42 am

Bill,

This is a great idea. I’ve been after my boss to have a lab set up for some time now. Having a lower environment would be even better. All of our gateways are MGCP. In your lower environment, how do you have this set up? Does the lower environment route calls to the same gateways as the production system?

Mark

William Bell says:

April 15, 2010 at 3:23 am

Mark,

It is fine to integrate the lower environment to the production system. Some folks may have adequate money to stand up a dedicated PRI or SIP trunk to the lower environment, but most people will not.

So a viable alternative is to set up an Intercluster Trunk between the lower environment and the production system. Configure the dial plan so that Offnet/PSTN calls from the lower environment tandem through production to the MGCP/H.323/SIP/etc. gateways. Of course, if you are using H.323/SIP you could also go direct to the gateways if you prefer.

Regards,
Bill
