Fixing FCIP and SRDF

Author
Peter Welcher
Architect, Operations Technical Advisor

Today’s blog is about things to look out for, and some things to look at, to fix FCIP and SRDF performance problems. FCIP and SRDF are technologies used to replicate SAN data over MAN or WAN — specialized but critical applications running on the IP network. I can see Terry Slattery is having so much fun writing about Application Performance problems that I want to join in! So we’ll put the Simplicity in Design series on hold for a day. I have been drafting a blog about Simple Design for the Datacenter, but it keeps getting longer and longer. It appears Datacenters are anything but Simple!

Here are some links to what Terry has written recently about Application Performance improvement:

And an article of mine about TCP performance (Terry also independently wrote about Mathis’ formula):

I should note that these are all about things that have been on our minds recently in work we’ve been doing at a couple of sites. Of course we’re not identifying where or who, and wish to share some ideas we’ve had or things we’ve considered that might be useful to you.

Let’s focus on some things relating to FCIP (Fiber Channel over IP) and SRDF (EMC SAN replication software) performance.

What is FCIP?

FCIP is a TCP-based tunnel, typically between two Cisco MDS switches (or, of course, SAN devices from another vendor). You license the ability to run FCIP, and you have to have the right hardware. When you configure FCIP, there are a bunch of parameters to set, and tuning tools to help estimate them. Under Advanced Settings, you can tune retransmission timers (so FCIP doesn't retransmit too quickly), selective acknowledgements, and so on. The TCP window size is derived from a key set of parameters: the minimum and maximum bandwidth available, and the estimated Round Trip Time (which is also measured dynamically). There are some other parameters available, but my goal here is to sketch out some ideas; if they interest you, you can pursue the details in the Cisco documentation.
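To make that concrete, here is a minimal sketch of the shape of an FCIP configuration on a Cisco MDS: a profile bound to the local Gigabit Ethernet IP address, and an FCIP interface that uses the profile and points at the peer. The profile number and addresses are illustrative placeholders, and the TCP tuning knobs discussed above (bandwidth, RTT, SACK) all live under the fcip profile; check the Cisco documentation for your platform and release before reusing any of this.

    fcip profile 10
      ip address 10.10.10.1          ! local IPS Gigabit Ethernet IP (placeholder)
    interface fcip 10
      use-profile 10
      peer-info ipaddr 10.10.10.2    ! remote MDS FCIP endpoint (placeholder)
      no shutdown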

Making FCIP Efficient

FCIP just encapsulates Fiber Channel (FC) frames of up to 2148 bytes inside TCP/IP. That may suggest MTU problems to you. Normally, the Cisco MDS will use Path MTU Discovery (P-MTU-D) to determine the MTU size along the path and fragment as needed.

For more details, see the lucid documentation at Cisco – Configuring FCIP. I should probably make the point that if you don’t know what you’re doing, be careful with the advanced parameters: you can really reduce FCIP’s performance if you get them wrong.

From Cisco slideware found elsewhere, it appears that if P-MTU-D discovers a typical MTU of, say, 1500 bytes, there is some degradation in FCIP throughput, as you’d expect. So if your WAN or MAN path between SAN sites supports jumbo frames, enabling them end to end should help FCIP throughput.
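The arithmetic behind that: a full-size 2148-byte FC frame plus the FCIP, TCP, and IP headers won’t fit in a 1500-byte IP packet, so each FC frame ends up split across two packets; with jumbos it fits in one. As a hedged sketch (the interface and MTU value are illustrative, and every device along the MAN/WAN path must support at least that MTU as well), enabling jumbos on the MDS Gigabit Ethernet port carrying FCIP looks roughly like this:

    interface GigabitEthernet2/1
      switchport mtu 2300    ! large enough for a full FC frame plus FCIP/TCP/IP overhead
      no shutdown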

The Fine Manual (above link) notes that:

  • The default TCP settings may well not be appropriate on WAN links and tuning will be helpful.
  • If your estimated Round Trip Time (RTT) is significantly less than what FCIP actually measures, it may slow performance.
  • You can turn on Selective ACK if you’re experiencing a lot of dropped packets within a send window.
  • The Congestion Window Monitoring (CWM) feature monitors congestion after an idle period and determines the maximum burst size allowed after that idle period. You can tune it, carefully, if there is buffering in the IP network. The burst size in bytes is computed and (my interpretation of the documentation) lies somewhere between min-bandwidth x RTT and max-bandwidth x RTT.

Defaults: on a Gig link, min bandwidth is 500 Mbps, max is 1 Gbps, RTT 1 msec. (Note that the latter is darn quick: local LAN not MAN/WAN type latency.)
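For a MAN or WAN, you would raise the RTT (and possibly the bandwidth figures) to match reality, since max-bandwidth x RTT is roughly the window FCIP needs to keep the pipe full: at 1 Gbps and a 10 msec RTT that is about 1.25 MB. A hedged example with purely illustrative numbers, configured under the fcip profile sketched earlier:

    fcip profile 10
      ! 1 Gbps max, 500 Mbps guaranteed minimum, 10 msec measured RTT (illustrative)
      tcp max-bandwidth-mbps 1000 min-available-bandwidth-mbps 500 round-trip-time-ms 10
      ! enable SACK if you see many drops within a send window
      tcp sack-enable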

Yes, we’re talking advanced TCP behavior here. Did I mention: be careful when tuning?

Conclusions re Fixing FCIP

Make sure the tuning parameters are set correctly.

Use jumbos on the WAN or MAN path if possible.

Use QoS to reserve at least the configured min bandwidth for FCIP.
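For the QoS piece, FCIP rides on TCP port 3225, so a standard MQC policy on the WAN edge router can guarantee it the configured minimum. A hedged sketch, with the ACL, class, policy, and interface names plus the bandwidth figure all invented for the example:

    ip access-list extended FCIP-TRAFFIC
      permit tcp any any eq 3225
      permit tcp any eq 3225 any
    class-map match-all FCIP
      match access-group name FCIP-TRAFFIC
    policy-map WAN-EDGE-OUT
      class FCIP
        bandwidth 500000        ! kbps: matches the 500 Mbps min-bandwidth on the FCIP profile
    interface GigabitEthernet0/1
      service-policy output WAN-EDGE-OUT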

Set the max bandwidth to the worst case bandwidth available to FCIP, i.e. link speed minus other traffic on the link.

What is SRDF?

SRDF stands for “Symmetrix Remote Data Facility”, and is a family of software products from EMC for SAN replication. See for example Symmetrix Remote Data Facility and Symmetrix Remote Data Facility (SRDF). SRDF/S and SRDF/A are the synchronous and asynchronous replication products, and then there are cluster and other products that extend those basic products.

Basic storage fact of life: if you’re doing synchronous replication, the source has overhead and delay because remote write operations must be confirmed before the local write completes. (Propagation in fiber runs roughly 5 microseconds per km each way, so distance translates directly into added write latency.) That limits the distance for synchronous replication to 300 km over fiber, per EMC SRDF Synchronous. Asynchronous avoids that.

It was amazingly hard to find online references providing clarity as to how SRDF/S and SRDF/A actually transport the data (native FC, native FCIP hence TCP, other?). I note that WAN acceleration products generally claim to accelerate SRDF/A, and some also claim to do SRDF/S. The product link is the only lucid reference on this topic I’ve been able to find. It says that SRDF WAN connections connect via an FCIP or iFCP gateway, or natively over TCP/IP via the EMC Symmetrix GigE Director, and then across the TCP/IP network. FC over SONET, or ESCON to Fast Ethernet or SONET, are other possibilities.

Cisco has a whitepaper about running SRDF/S over FCIP, see EMC and Cisco: Business Continuance Solutions White Paper.

My conclusion: Cisco FCIP makes your MAN or WAN look like FC over local fiber. So you can spoof SRDF into thinking it has a fiber connection, or if you have the relevant EMC products, you can get EMC to send the SRDF traffic as FCIP. In either case, the MAN or WAN routers see TCP traffic, which is where WAN accelerators come in. They can also LZW compress the traffic, which is a form of de-duplication in effect.

I looked to see if SRDF uses Path MTU Discovery. Interestingly, the EMC WAN Accelerator Guide recommends turning P-MTU-D off! It isn’t clear why.

Conclusions re Fixing SRDF

Find out how the local site is doing SRDF. Make sure synchronous replication is not being used over a long distance.

If the site is using EMC-based FCIP, determine the settings and consider tuning. As with most SAN technology, once you get outside the basics, you’re supposed to call support. That probably explains why I’m not seeing EMC documentation about tuning SRDF FCIP.

See the FCIP conclusions above.

Do You Really Have Jumbos?

I’d like to finish with a story. Site X configured jumbos in the distribution and core layers of its MAN network, and on the attached Cisco MDS switches running FCIP; in other words, on all the links along the FCIP path. Presumably FCIP ran happily over that for some period of time.

Then Site X decided to use MPLS in the core. Probably the site admins intended to ensure room for MPLS labels, so the interfaces were configured with “mpls mtu 1524”. The problem is that this command actually causes MPLS traffic to use the smaller MTU of 1524, rather than the interface MTU of 9000 or whatever is configured.

This may have been noticed (sharp eyes!) because MPLS best practice was followed, i.e. an attempt was made to use the mpls ldp advertise-labels command to advertise labels only for loopback addresses. Unfortunately, the syntax for that changed a few years back, as I discovered painfully about a year ago. You now need to specify both the from and to fields, making the command harder to use. So what was happening was that the FCIP destination subnet was getting a label assigned to it, causing the FCIP traffic to be MPLS labelled, and hence it had to be fragmented.

Since Cisco FCIP runs P-MTU-D by default, the MDS presumably discovered that, and was then doing the fragmentation rather than the first MPLS device. And since P-MTU-D (in the implementations I’ve heard of) relies on deliberately dropped packets every couple of minutes to test whether the maximum MTU has changed, this guarantees a small but steady level of dropped packets for the FCIP traffic.
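A hedged illustration of the mismatch, with the interface name and values invented for the example: with a 9000-byte interface MTU but “mpls mtu 1524” configured, labeled packets are limited to 1524 bytes, and you can see what MPLS is actually using with a show command:

    interface TenGigabitEthernet1/1
      mtu 9000
      mpls ip
      mpls mtu 1524     ! labeled packets now limited to 1524 bytes despite the 9000-byte interface MTU

    ! verify what MPLS is actually using:
    show mpls interfaces TenGigabitEthernet1/1 detail
      ! look for the "MTU = 1524" line in the output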

Two fixes come to mind:

  1. Don’t specify the MPLS MTU at all (there is no need to), or specify it to match the interface MTU. (The command above is for use where you want to allow baby giants, where the Ethernet MTU is 1500 but the hardware will let you get away with sending slightly larger frames.) See the sketch after this list.
  2. Be careful with the mpls ldp advertise-labels command, and note that older examples of how to configure it are probably now wrong.
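A hedged sketch of both fixes; the ACL names, prefixes, and interface are invented for illustration, and the exact advertise-labels syntax is worth verifying against your IOS release:

    ! Fix 1: let labeled packets use the full interface MTU
    interface TenGigabitEthernet1/1
      mtu 9000
      no mpls mtu                 ! or: mpls mtu 9000, to match the interface

    ! Fix 2: advertise labels only for loopback /32s (note the newer for/to form)
    ip access-list standard LOOPBACKS
      permit 192.168.255.0 0.0.0.255
    ip access-list standard ALL-LDP-PEERS
      permit any
    no mpls ldp advertise-labels
    mpls ldp advertise-labels for LOOPBACKS to ALL-LDP-PEERS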

I am telling this story here since this set of interactions is fairly subtle, and is easily something one might do when adding MPLS to a network core. If you have FCIP or other jumbo traffic, it might then mysteriously suffer from degraded performance.

Let me repeat my one piece of wisdom re enabling jumbos: do them thoroughly in some well-defined part of the network, and take alternative routing paths into account. You do NOT want to ever have to do link by link checking that the proper MTU is set.

And by the way, inconsistent MTU has interesting effects on OSPF. OSPF does check MTU when forming an adjacency. However, I hear that if you change MTU, OSPF does not re-check. Thus if you change MTU on one end of a link, sometime later when the link bounces OSPF will not be able to restore the former OSPF adjacency. Nasty! I just tested this behavior using GNS3 / Dynagen, and can now confirm that it does in fact happen that way!
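If you do get bitten by this, matching the MTUs on both ends is the proper fix. As a hedged stopgap (interface name illustrative), IOS also lets you tell OSPF to skip the MTU check on the interface:

    interface GigabitEthernet0/1
      ip ospf mtu-ignore    ! skip the DBD MTU check; use only as a temporary workaround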
