Device Registration and VCS Clustering
Introduction and Background
Introduction
When running a VCS cluster, some consideration needs to be given to the endpoint configuration parameters for client registration since providing a fault-tolerant solution for client registration and re-registration should be a critical element of any production VCS design. This article discusses several approaches for managing SIP and H.323 device registrations to a clustered VCS. (I previously provided some general information on the Cisco VCS Clustering Configuration.)
This article covers the following areas:
- Background
- Example Environment
- Leveraging DNS
- Endpoint Registration Approach: H.323
- Endpoint Registration Approach: SIP
- Other Considerations
A Cisco VCS cluster can contain up to 6 member peers. From a call routing perspective, a VCS cluster acts as a single Local Zone. In a fully provisioned cluster, only 4 VCS peers can be used for active registrations. The 2 remaining VCS peers are intended for failover. Endpoints and IPVC service providers can register to any VCS peer by IP address or hostname. Providing a redundant mechanism for handling client registration requests should be a critical element of any production design. Equally important is the ability for registered clients to handle failure events that impact communication between the client and the attached VCS cluster member.
To facilitate our discussion we will lay out an example configuration. Then we will go into design options related to client registrations. Since the Cisco VCS can be an H.323 gatekeeper and a SIP proxy/registrar server, we shall look at each standard independently.
For this discussion, we are focusing on VCS version X6.
The discussion in this article is based on the following example VCS configurations.
VCS Control (Internal):
The VCS Control cluster is the primary call processing solution for endpoints and IPVC infrastructure resources (e.g. MCU, ISDN GW) deployed on the internal or corporate network.
Role | Hostname | IP Address |
VCS Master Peer | vcsc01.netcraftsmen.net | 10.1.1.32 |
VCS Member Peer | vcsc02.netcraftsmen.net | 10.2.2.32 |
The VCS Control cluster name is: vcsc.netcraftsmen.net
VCS Expressway (External):
The VCS Expressway cluster is used as the registration point for endpoints deployed outside of the organization’s autonomous network (e.g. the Internet). The Expressway can also peer with call processing systems in other autonomous domains. The Expressway is also used to facilitate firewall traversal between the VCS Control and Expressway.
Role | Hostname | IP Address |
VCS Master Peer | vcse01.netcraftsmen.net | 192.168.1.32 |
VCS Member Peer | vcse02.netcraftsmen.net | 192.168.1.33 |
The VCS Expressway cluster name is: vcse.netcraftsmen.net
Methodology
Diverse Methodology
In my humble opinion, if you have a diverse environment with different endpoint types (vendor or version) then you will need to leverage multiple techniques to ensure registration fault tolerance.
You will also need to pay attention to the nuances between devices or clients from the same manufacturer. For instance, the E20 leverages Name Authority Pointer (NAPTR) queries to determine whether it is “internal” or “external”. Movi, on the other hand, has two separate entries for an “internal” and “external” SIP proxy.
A key component of the design approaches outlined in this article is DNS. For a design to address the needs of different endpoint models and versions, it must leverage DNS SRV Resource Records (RRs), DNS host (“A”) RRs, and DNS round-robin techniques. For some devices, Name Authority Pointer (NAPTR) RRs may be required. Finally, leveraging a “split DNS” approach may help optimize client registrations for clients such as Cisco Movi.
DNS SRV Resource Records
RFC 2782 defines the DNS SRV resource record. A SRV record takes the form of:
_service._proto.name TTL class SRV priority weight port target
- service: The symbolic name of the hosted service (e.g. sip)
- proto: The transport used to access the hosted service (i.e. tcp or udp)
- name: The domain name or FQDN identifying the domain of the hosted service
- ttl: The “time to live” field (standard DNS TTL)
- class: The standard DNS class field. For SRV RRs this is always “IN”
- priority: The priority for the target host associated with the SRV RR. A lower priority value means “more preferred”
- weight: The weighted value for a target host. Leveraged when target hosts have equal priority value. This is a relative weight.
- port: The TCP or UDP port number used to access the hosted service
- target: The hostname of the target host, which must resolve through a DNS A or AAAA record (NOTE: CNAME records are not permitted)
Wikipedia also provides a decent discussion of DNS SRV records.
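As a minimal sketch of the field layout above, the following Python snippet splits a zone-file style SRV line into its named fields. The `SrvRecord` type and the `parse_srv` function are hypothetical helpers introduced only for illustration; they are not part of any VCS or DNS library.

```python
from typing import NamedTuple


class SrvRecord(NamedTuple):
    """One SRV resource record, split into the fields described above."""
    service: str
    proto: str
    name: str
    ttl: int
    rr_class: str
    priority: int
    weight: int
    port: int
    target: str


def parse_srv(line: str) -> SrvRecord:
    """Parse '_service._proto.name TTL class SRV priority weight port target'."""
    owner, ttl, rr_class, rr_type, priority, weight, port, target = line.split()
    if rr_type != "SRV":
        raise ValueError(f"not an SRV record: {rr_type}")
    # The owner name starts with two underscore-prefixed labels.
    service, proto, name = owner.split(".", 2)
    if not (service.startswith("_") and proto.startswith("_")):
        raise ValueError(f"malformed SRV owner name: {owner}")
    return SrvRecord(service[1:], proto[1:], name, int(ttl), rr_class,
                     int(priority), int(weight), int(port), target)


record = parse_srv(
    "_sip._tcp.vcsc.netcraftsmen.net. 86400 IN SRV 10 0 5060 vcsc01.netcraftsmen.net."
)
print(record.service, record.proto, record.port, record.target)
```

The example line uses the cluster name and peer hostname from the example environment in this article.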
H.323 and DNS SRV
H.323v5 Annex O extended the H.323 standard protocol stack to support DNS procedures for gatekeeper discovery. This is useful when statically provisioning endpoints with gatekeeper information. Leveraging mechanisms available in DNS SRV records allows gatekeeper redundancy and load balancing schemes to be deployed transparently.
H.323 Annex O defines the following symbolic names to be used in the service field of the SRV record:
Service | Name | Meaning |
h323ls | Location Service | H.323 entity supporting H.225.0 LRQ procedure |
h323rs | Registration Service | H.323 entity supporting H.225.0 RRQ procedure |
h323cs | Call Signaling | H.323 entity that supports H.225.0 call signaling |
h323be | Border Element | H.323 entity supporting communications as defined in Annex G/H.225.0 |
SIP and DNS SRV
IETF RFC 3263 defines how SIP agents can leverage DNS to resolve a SIP Uniform Resource Identifier (URI) into the IP address, port, and transport protocol of the next hop contact. The method in RFC 3263 may be applied to SIP register messages. RFC 3263 defines the following symbolic names to be used in the service field of the SRV record.
Service | Name | Meaning |
sip | Session Initiation Protocol | SIP messages over TCP or UDP. |
sips | Secure SIP | SIP messages that leverage Transport Layer Security (TLS). TLS is only supported over TCP. |
Endpoint Registration Approach: H.323
We are going to assume that gatekeeper configurations on IPVC endpoints will be manual. Auto-gatekeeper discovery is beyond the scope of this article. Based on Cisco best practice, IPVC system designers should leverage the following methods for identifying gatekeepers to endpoints and infrastructure devices (in order of preference):
- DNS SRV Resource Records
- DNS A records using a round-robin methodology
- Static IP address assignments
Initial Registration DNS SRV
Using DNS SRV records is the preferred approach to providing redundant gatekeeper lists to IPVC endpoints and infrastructure devices for initial registration. From the endpoint’s perspective, you configure a hostname such as vcsc.netcraftsmen.net. You don’t configure the service or protocol fields on the client. You will notice that the hostname configured is identical to the VCS cluster name. Using the cluster name isn’t necessarily required but it is preferred.
When an endpoint that supports DNS SRV is configured with a gatekeeper hostname, it will issue a DNS SRV request on startup. The assigned DNS resolver returns the SRV record with all VCS peer IP addresses, associated priorities, and associated weights. The endpoint will then attempt to register with the most preferred VCS peer, which is the one with the lowest priority value.
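To illustrate why the service and protocol fields are not configured on the client, here is a rough Python sketch of how an endpoint might derive its SRV query names from the single configured hostname. The `srv_query_names` function and the `H323_SERVICES` list are hypothetical; the service/transport pairs simply mirror the example records used in this article, and real endpoints may query a different set.

```python
# Service/transport pairs mirroring the example SRV records in this article.
# Which of these a given endpoint actually queries is an assumption.
H323_SERVICES = [("h323rs", "udp"), ("h323ls", "udp"), ("h323cs", "tcp")]


def srv_query_names(configured_host, services):
    """Build the SRV owner names for a hostname configured on the endpoint."""
    host = configured_host.rstrip(".")
    return [f"_{service}._{proto}.{host}." for service, proto in services]


for query_name in srv_query_names("vcsc.netcraftsmen.net", H323_SERVICES):
    print(query_name)
```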
SRV records can be used to provide redundancy to VCS cluster peers as well as identifying backup or secondary VCS clusters. In very large deployments this can be used in conjunction with geographic proximity of the requesting endpoint to provide all kinds of fancy registration scenarios. For our purposes, we are going to keep it simple. Building on our example, we could have the following SRV records.
_h323ls._udp.vcsc.netcraftsmen.net. 86400 IN SRV 10 0 1719 vcsc02.netcraftsmen.net.
_h323ls._udp.vcsc.netcraftsmen.net. 86400 IN SRV 10 0 1719 vcsc01.netcraftsmen.net.
_h323rs._udp.vcsc.netcraftsmen.net. 86400 IN SRV 10 0 1719 vcsc02.netcraftsmen.net.
_h323rs._udp.vcsc.netcraftsmen.net. 86400 IN SRV 10 0 1719 vcsc01.netcraftsmen.net.
_h323cs._tcp.vcsc.netcraftsmen.net. 86400 IN SRV 10 0 1720 vcsc02.netcraftsmen.net.
_h323cs._tcp.vcsc.netcraftsmen.net. 86400 IN SRV 10 0 1720 vcsc01.netcraftsmen.net.
In this example, we are using an equal priority and weight for all VCS peers. This should result in a balanced distribution of registrations over both cluster peers. We are also specifying a TTL value of 24 hours (86400 seconds). This is recommended because the endpoint will cache ALL addresses returned by the SRV request. Having a higher TTL will minimize the number of requests that need to be handled by the DNS server(s).
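The effect of equal priority and weight can be sketched in Python. The function below is an illustrative take on the target ordering procedure described in RFC 2782 (lowest priority value first, weighted random choice within a priority); the `order_srv_targets` name and the tuple layout are assumptions made for this example, and actual endpoint firmware may order targets differently.

```python
import random
from collections import defaultdict


def order_srv_targets(records, rng=random):
    """Order (priority, weight, port, target) tuples for connection attempts.

    Lower priority values come first. Within one priority, targets are drawn
    at random in proportion to their weight.
    """
    by_priority = defaultdict(list)
    for rec in records:
        by_priority[rec[0]].append(rec)

    ordered = []
    for priority in sorted(by_priority):
        pool = list(by_priority[priority])
        rng.shuffle(pool)                      # arbitrary starting order
        pool.sort(key=lambda r: r[1] != 0)     # zero-weight records first
        while pool:
            total = sum(r[1] for r in pool)
            pick = rng.randint(0, total)
            running = 0
            for i, rec in enumerate(pool):
                running += rec[1]
                if running >= pick:
                    ordered.append(pool.pop(i))
                    break
    return ordered


peers = [
    (10, 0, 1719, "vcsc01.netcraftsmen.net."),
    (10, 0, 1719, "vcsc02.netcraftsmen.net."),
]
print(order_srv_targets(peers)[0][3])  # either peer may come first
```

With both peers at the same priority and weight, each endpoint picks its first target at random, which is what spreads registrations across the cluster over a large population of endpoints.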
Initial Registration DNS Round-Robin
Your IPVC design should account for both DNS SRV and DNS round-robin if you have endpoints which do not support DNS SRV for H.323 registration/location services. It is also worth noting that endpoints that do leverage SRV will fall back to requesting DNS “A” resource records if there are no SRV records.
If the endpoint doesn’t support SRV or there is no SRV record response, then the endpoint will perform a DNS A-record lookup. The DNS server responds with an IP address and the endpoint attempts registration. If that IP address doesn’t respond, then the endpoint will submit another DNS A-record lookup. This will be repeated until the endpoint can register to a gatekeeper (VCS peer).
Using our example, we may have the following A-records configured in a round-robin fashion:
vcsc.netcraftsmen.net. 60 IN A 10.1.1.32
vcsc.netcraftsmen.net. 60 IN A 10.2.2.32
Again, we used the cluster name for the round-robin configuration. This is a good idea because you can then have a standard gatekeeper configuration for all endpoints, regardless of whether they support SRV record lookups or not. When using DNS A-records in a round-robin configuration, it is recommended to use relatively short TTL values. In our example, we use 60 seconds. I like to set the TTL to a value which is identical to my SIP and H.323 re-registration interval (which I adjust to 60 seconds).
Using DNS round-robin has the drawback of requiring the endpoint to make multiple DNS requests in the event that the VCS associated with the first DNS query response is unreachable.
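The cost of that retry behavior can be shown with a small Python simulation. This is only a sketch: `register_via_round_robin`, `resolve_a`, and `try_register` are hypothetical names, the resolver is simulated with a rotating list instead of a real DNS query, and DNS caching on the endpoint is not modeled.

```python
from itertools import cycle


def register_via_round_robin(hostname, resolve_a, try_register, max_attempts=4):
    """Repeat the A-record lookup until a peer accepts the registration.

    resolve_a(hostname) returns one IP address per call, as a round-robin DNS
    answer would. try_register(ip) returns True on success.
    Returns (ip, dns_queries) on success or (None, dns_queries) on failure.
    """
    for attempt in range(1, max_attempts + 1):
        ip = resolve_a(hostname)  # one DNS query per registration attempt
        if try_register(ip):
            return ip, attempt
    return None, max_attempts


# Simulated round-robin answers for the cluster name used in this article.
answers = cycle(["10.1.1.32", "10.2.2.32"])
ip, queries = register_via_round_robin(
    "vcsc.netcraftsmen.net",
    resolve_a=lambda host: next(answers),
    try_register=lambda addr: addr == "10.2.2.32",  # pretend 10.1.1.32 is down
)
print(ip, queries)
```

In this simulated run the first answer points at the unreachable peer, so the endpoint needs a second DNS query before it can register.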
Initial Registration IP Address
This is the least preferred approach because it represents a single point of failure with gatekeeper registration. You specify an IP address and hope the VCS device is online.
Re-Registration and Fault Tolerance
No matter which method is used, when an H.323 endpoint registers to a VCS cluster member, the gatekeeper process provides the endpoint with an alternate gatekeeper list. The VCS builds this alternate gatekeeper list from the list of VCS cluster peer members. This alternate gatekeeper list will be used by the endpoint for all further re-registration transactions. The advantage of the alternate gatekeeper functionality is that endpoints will use this list in the event the gatekeeper holding the active registration should become unavailable. In other words, an H.323 endpoint that has an alternate gatekeeper list won’t rely on DNS in the event of failover.
The endpoint still needs to detect the failure, which means that there is an impact to service. The duration of that impact is roughly equivalent to the re-registration timer value.
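The failover decision can be sketched in a few lines of Python. This is an illustration of the idea only, not the actual endpoint or VCS logic; `next_gatekeeper` and `is_reachable` are hypothetical names, and the check is assumed to run once per re-registration interval.

```python
def next_gatekeeper(current, alternates, is_reachable):
    """Pick the gatekeeper to use for the next re-registration.

    Stay on the current gatekeeper while it answers. Otherwise walk the
    alternate gatekeeper list received at registration time. No DNS query
    is involved in either case.
    """
    if is_reachable(current):
        return current
    for candidate in alternates:
        if candidate != current and is_reachable(candidate):
            return candidate
    return None  # no cluster peer answered


# Alternate gatekeeper list built from the example VCS Control peers.
alternates = ["10.1.1.32", "10.2.2.32"]
down = {"10.1.1.32"}  # pretend the registered peer has failed
chosen = next_gatekeeper("10.1.1.32", alternates, lambda gk: gk not in down)
print(chosen)
```

Because the check only happens when the endpoint next tries to re-register, a shorter re-registration interval shortens the outage at the cost of more registration traffic.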
Endpoint Registration Approach: SIP
Based on Cisco best practice, IPVC system designers should leverage the following methods for identifying SIP registrars to endpoints and infrastructure devices (in order of preference):
- SIP Outbound (RFC 5626)
- DNS SRV Resource Records
- DNS A records using a round-robin methodology
- Static IP address assignment
Registration Using SIP Outbound
Some Cisco endpoints can leverage a parameter called SIP Outbound. Using this option, an endpoint is configured to have more than one SIP registrar/proxy address and has the “SIP Outbound” optional parameter enabled. When configured in this manner, the endpoint will actually open separate TCP connections to each configured SIP registrar/proxy server. The endpoint keeps all sessions open (assuming there is an available network path).
An example endpoint configuration:
- Proxy 1
- Server discovery: Manual
- Server address: vcsc01.netcraftsmen.net (or 10.1.1.32)
- Proxy 2
- Server discovery: Manual
- Server address: vcsc02.netcraftsmen.net (or 10.2.2.32)
- Outbound: On
Using the above configuration our endpoint will establish a SIP connection to 10.1.1.32 and 10.2.2.32. Since the endpoint is registering to both peers simultaneously, there is no service disruption in the event one of the VCS registrations is broken (i.e. due to a VCS or network path failure).
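The benefit of holding several registrations at once can be modeled with a toy Python class. This is a conceptual sketch only: `OutboundRegistration` is a hypothetical name, and it does not implement the SIP signaling or keepalive mechanics defined in RFC 5626.

```python
class OutboundRegistration:
    """Track one registration flow per configured SIP proxy."""

    def __init__(self, proxies):
        # Every configured proxy gets its own flow, all opened up front.
        self.flows = {proxy: True for proxy in proxies}

    def flow_lost(self, proxy):
        self.flows[proxy] = False

    def flow_restored(self, proxy):
        self.flows[proxy] = True

    def active_proxies(self):
        return [proxy for proxy, up in self.flows.items() if up]

    def in_service(self):
        # The endpoint stays reachable while at least one flow is up.
        return bool(self.active_proxies())


reg = OutboundRegistration(["10.1.1.32", "10.2.2.32"])
reg.flow_lost("10.1.1.32")  # simulate a VCS or network path failure
print(reg.in_service(), reg.active_proxies())
```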
Registration Using DNS SRV
Using DNS SRV for SIP registrations is very similar to the method described for H.323. From the endpoint’s perspective, you configure a hostname such as vcsc.netcraftsmen.net. You don’t configure the service or protocol fields on the client. You will notice that the hostname configured is identical to the VCS cluster name. Using the cluster name isn’t necessarily required but it is preferred.
When an endpoint that supports DNS SRV is configured with a SIP proxy hostname, it will issue a DNS SRV request on startup. The assigned DNS resolver returns the SRV record with all VCS peer IP addresses, associated priorities, and associated weights. The endpoint will then attempt to register with the most preferred VCS peer, which is the one with the lowest priority value.
SRV records can be used to provide redundancy to VCS cluster peers as well as identifying backup or secondary VCS clusters. In very large deployments this can be used in conjunction with geographic proximity of the requesting endpoint to provide all kinds of fancy registration scenarios. For our purposes, we are going to keep it simple. Building on our example, we could have the following SRV records.
_sip._tcp.vcsc.netcraftsmen.net. 86400 IN SRV 10 0 5060 vcsc02.netcraftsmen.net.
_sip._tcp.vcsc.netcraftsmen.net. 86400 IN SRV 10 0 5060 vcsc01.netcraftsmen.net.
_sips._tcp.vcsc.netcraftsmen.net. 86400 IN SRV 10 0 5061 vcsc02.netcraftsmen.net.
_sips._tcp.vcsc.netcraftsmen.net. 86400 IN SRV 10 0 5061 vcsc01.netcraftsmen.net.
The SRV records defined above facilitate SIP sessions that leverage TCP or TLS transport mechanisms. We don’t specify a UDP configuration because the call control channel for SIP conversations involves messages that are too large to be carried on a packet-based (vs. stream-based) transport.
In this example, we are using an equal priority and weight for all VCS peers. This should result in a balanced distribution of registrations over both cluster peers. We are also specifying a TTL value of 24 hours (86400 seconds). This is recommended because the endpoint will cache ALL addresses returned by the SRV request. Having a higher TTL will minimize the number of requests that need to be handled by the DNS server(s).
Registration Using DNS Round-Robin
As with H.323, when an endpoint doesn’t support method 1 (SIP Outbound) or method 2 (DNS SRV), we fall back to using a DNS round-robin approach. In our example, SIP clients would use the exact same configuration presented for H.323 clients.
Registration Using Static IP Address
This is the least preferred approach because it represents a single point of failure for SIP registration. You specify an IP address and hope the VCS device is online.
Re-Registration and Fault Tolerance
The closest analog to the H.323 “Alternate Gatekeeper” feature in SIP is found in RFC 5626. An endpoint that supports the ability to establish and maintain divergent SIP connections offers the optimum fault tolerant model.
Endpoints that do not support this capability (or are not configured to support it) must rely on one of the other three approaches. Using the DNS SRV approach, an endpoint that loses connection to its primary VCS will use the cached DNS SRV response to establish a connection to an alternate SIP proxy/registrar. With DNS round-robin, the endpoint must query DNS again, which can add considerable lag time.
Cisco Movi Clients
The Cisco Movi client is a SIP-only client that supports DNS SRV. Therefore it shall prefer SIP registration option 2 (SIP Outbound is not supported). Movi clients are actually provisioned with separate parameters for “internal” and “external” registrations. This approach means that some thought must be given to how DNS records are resolved by internal and external DNS clients.
When the Movi client starts, it will always attempt to register to the VCS host configured as the “internal” VCS. If this registration attempt fails, then the client will try the external VCS. If you are using DNS names for your internal/external VCS parameters, then the first step Movi takes to register to the internal VCS is to query DNS to resolve the hostname to an IP address.
Therefore, NetCraftsmen recommends that DNS be configured so that queries from external (i.e. Internet) clients cannot resolve DNS SRV or host records for the VCS Control cluster. This will optimize the time it takes for a Movi client on the Internet to determine it should register to the VCS Expressway (i.e. “external”) cluster.
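The internal-then-external behavior, and the way a split DNS view short-circuits it, can be sketched in Python. This is a simplified model rather than Movi's actual implementation; `choose_movi_registrar`, `resolve`, and `try_register` are hypothetical names, and the peer hostnames stand in for whatever internal and external addresses are provisioned.

```python
def choose_movi_registrar(internal_host, external_host, resolve, try_register):
    """Try the internal VCS first, then fall back to the external VCS.

    resolve(host) returns an IP address, or None when the name does not
    resolve. try_register(ip) returns True on success.
    """
    for host in (internal_host, external_host):
        ip = resolve(host)
        if ip is None:
            # With split DNS, an Internet client fails here immediately
            # instead of waiting for a registration attempt to time out.
            continue
        if try_register(ip):
            return host
    return None


# DNS as seen from the Internet: only the Expressway peer resolves.
external_view = {"vcse01.netcraftsmen.net": "192.168.1.32"}
chosen = choose_movi_registrar(
    "vcsc01.netcraftsmen.net",
    "vcse01.netcraftsmen.net",
    resolve=external_view.get,
    try_register=lambda ip: True,
)
print(chosen)
```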
SRV for Registration vs. SRV for Call Routing/Searching
The design approach outlined in this article is focused on facilitating the registration process only. The registration process should be treated separately from call processing. DNS SRV and NAPTR records supporting call routing may be different than those used to support the registration process.