Fifty Shades of Cloud

Networking professionals need to become aware of networking within the cloud(s), and from cloud to users and customers.

Cloud here could mean public cloud or SaaS.
“Within the cloud(s)” might be between different cloud providers, or between VPCs (Virtual Private Clouds) on one provider.
We need to provide good performance and reliability to internal and external users of our cloud-based applications and services.

Face it, we’re going to be involved in designing networking support for cloud-based applications. That may be optimistic — “should be involved in designing” may be more like it — but we all know who is going to get called when there’s a performance problem.

So, we’d best have some idea about how various forms of cloud go about networking, if only to be able to discuss, understand, and troubleshoot such applications.

We now have lots of choices concerning compute resources and overall approaches for delivering applications.

Cloud compute and storage (Amazon, Azure, Google, etc.) — virtual private clouds (VPCs)
SaaS, which is basically a managed service delivered via the Internet rather than the WAN
Containers (and various orchestration and service frameworks for them, e.g. service meshes)
More specialized cloud services (checked out the Amazon Overview or documentation lately?)
Serverless, historically based on per-customer VM instances
AWS Firecracker, serverless via much lighter-weight compute resources running securely on a VM (micro-VMs), somewhat container-like. It is open source and KVM-based, and can be run on bare metal.
Edge Computing

That’s a lot of technology. Clearly, I am not going to get very detailed about any of the above in this blog.

Which Cloud Choice to Use?

The initial driver for cloud was perhaps agility, scale-out, automated scale-up, and high availability. The more recent drivers perhaps add things like matching micro-services coding styles, leveraging platforms or pre-packaged services for faster Time-to-Market, and lowering costs.

I think of this as waves of change, coming faster and faster, solving new problems, or decreasing costs.

Which wave should a technology surfer try to catch? Trying to catch the latest may not make sense — there will always be yet another wave, and it can be easier to pick up new technology if you follow its historical path, since complexity gets added over time.

It’s becoming clearer that there are factors limiting adoption of these cloud variations.

Start-ups are one thing: no history, can move quickly. Existing companies, not so much.

Technology adoption rate / learning is one factor limiting change for existing companies and code. Another is installed code base and budget. Existing companies can only fund and staff a limited amount of change at one time and are consequently forced to prioritize. That does put them at risk of disruption if they don’t prioritize and manage that well.

My sense is that right now, many or most existing code bases are moving to basic cloud. A VPC is not too much of a paradigm shift, whereas leveraging containers well may require more code rewriting. Serverless requires an even bigger shift, which might be more likely in a new micro-services-based application (or mobile application) than in a more classic application.

Why Should You Care?

This blog started with designing or troubleshooting a cloud app. Let’s elaborate on that a bit.

Developers operate at different skill levels. Some (many?) can be somewhat vague about the impact of physical networks, e.g. packet loss, link failure, latency. Cloud brings increased risk regarding those very factors. That can cost a development project a lot of money if it charges off in an ill-conceived direction.

The second reason is that a lot of current cloud work is early-stage “lift and shift”. Most of it doesn’t worry about scaling up. So duplicate addresses, NAT, static routes, and other “quick” networking elements may creep in. Do you know what the alternatives are? Do you know the quirks of the various cloud providers?
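One concrete check behind the duplicate-address concern: before connecting networks, confirm that their address blocks don't overlap. Here is a minimal sketch using Python's standard `ipaddress` module. The network names and CIDR blocks are made up for illustration, and `find_overlaps` is a hypothetical helper, not part of any cloud provider's tooling.

```python
import ipaddress
from itertools import combinations


def find_overlaps(blocks):
    """Return sorted pairs of network names whose CIDR blocks overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in blocks.items()}
    return sorted(
        (a, b)
        for a, b in combinations(sorted(nets), 2)
        if nets[a].overlaps(nets[b])
    )


# Hypothetical address plan, not taken from any real deployment.
plan = {
    "on-prem": "10.0.0.0/16",
    "vpc-app": "10.0.128.0/20",  # carved out of the on-prem range by mistake
    "vpc-data": "10.1.0.0/16",
}

overlaps = find_overlaps(plan)
print(overlaps)
```

Running something like this against the address plan early on is cheaper than discovering the overlap later and papering over it with NAT.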

Yet another reason: the IT world is evolving different ways of doing things, methods not on traditional networking staff’s “radar”.

For example, app developers can do a lot of what I’d call “server-based networking”. As in routing via Vyatta router code or other server-based code, or security via iptables or other access lists. We need to be able to discuss pros and cons.

How can you discuss the trade-offs between that and network physical or virtual appliances if you only know the latter?
How about comparing a container service mesh’s load balancer to a physical load balancer? Or discussing where leveraging both might be the best design?

Another example: I’ve been reading up and dabbling with containers on my Mac. Reading about service mesh frameworks introduced me to container sidecars. Wow!

Instead of battling to install agents on each app server, and dealing with odd interactions and performance hits, we might just do the container equivalent of network service chaining. The idea of sidecars is to spin up containers that do things like flow export, logging, packet analysis, security, or whatever. The sidecar front-ends the application or service container, i.e. all traffic has to pass through it. One benefit is modularity, the isolation of unexpected interactions. Also pay-as-you-go concerning performance.
Another big deal is the whole service mesh concept, which matters especially in the container world. Instead of building distributed computing into the app, perhaps the framework can add it as part of deployment logic. This frees up the developer to focus on the application logic, without having to code the complexity of fault tolerance, etc. It also avoids having to support each developer’s own code (and bugs) concerning resiliency.
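To make the sidecar and mesh ideas a bit more concrete, here is a toy sketch in plain Python. It is not any real service mesh API: the `Sidecar` class and `make_flaky_app` helper are hypothetical names, and the "containers" are just function calls. The point is only that logging and retry logic sit in front of the application rather than inside it.

```python
class Sidecar:
    """Toy stand-in for a sidecar proxy: every call to the app passes through it."""

    def __init__(self, app, max_attempts=3):
        self.app = app                    # the application "container" (a callable here)
        self.max_attempts = max_attempts
        self.log = []                     # stand-in for flow export / logging

    def handle(self, request):
        for attempt in range(1, self.max_attempts + 1):
            try:
                response = self.app(request)
            except ConnectionError as exc:
                self.log.append((request, attempt, f"error: {exc}"))
                continue                  # retry logic lives here, not in the app
            self.log.append((request, attempt, "ok"))
            return response
        raise ConnectionError(f"gave up after {self.max_attempts} attempts")


def make_flaky_app(failures):
    """Hypothetical app that fails `failures` times, then answers."""
    state = {"left": failures}

    def app(request):
        if state["left"] > 0:
            state["left"] -= 1
            raise ConnectionError("backend unavailable")
        return f"hello {request}"

    return app


proxy = Sidecar(make_flaky_app(failures=2))
result = proxy.handle("cart")
print(result, len(proxy.log))
```

The application function knows nothing about retries or logging. Swapping in a different sidecar changes that behavior without touching the app, which is the modularity argument in miniature.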

All that has an architectural level impact, possibly changing how networking + dev teams build things.

We’ve heard that “networking people should learn to code”.

Coding to automate the provisioning and operational aspects of cloud compute and networking is one good reason to know how to code. But more than that, being able to communicate with coders and application architects / developers and have some idea what they’re talking about is likely a fairly crucial skillset as well. It’s not just network tool coders we’ll need to talk to! We need to broaden that to app dev teams!

Where to Start

Specifically, which of the cloud technologies should a networking person become more familiar with? What should one’s learning / hands-on / experience priority be?

I can’t really answer that, other than with the classic consultant response: “it depends”. Instead of analysis paralysis, do something.

Cloud providers have free trial periods. Used with a little care, they’re a great way to learn.
You can run containers on a laptop computer. There are various tutorials available, also sample mini-applications. For instance, Sock Shop.

Starting with classic VPCs may be the simplest, providing a good foundation for learning about the others.

The obvious answer is: what your employer or customer needs, or what they are doing.

Getting Edgy

I couldn’t decide where to put Edge Computing in the above list, so I put it at the end. For the items above it, the trend seems to be smaller, faster, cheaper. Edge is more about low latency and proximity to IoT devices doing real-time calculations. That’s qualitatively different.

I currently view Edge as “some of the same stuff, different place”. And evolving. Probably different tools and needs.

Networking and Cloud

I’ve heard the comment that networking for the cloud technologies is pretty much the same. That’s partially true but an oversimplification. The network has to provide connectivity to the cloud. And within the cloud, there is virtual networking that acts somewhat like what we’re used to.

Cloud compute and containers do have routing in some form, access list capabilities, load balancing, and usually some NAT, for outbound Internet access and perhaps for other uses. They also have public IPs for inbound Internet access to applications.

However, there can be hidden surprises. One of them is the AWS limit on VPC (Virtual Private Cloud) to VPC routing: you can write rules to route from VPC A to B, and B to C, but your traffic will not be able to transit from A to C via B. AWS reportedly now addresses this issue, albeit with some constraints.
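The non-transitive behavior is easy to state as a toy model. The sketch below is a simplification of the A / B / C example above, not an AWS API; `PeeringModel` is a hypothetical name. Reachability requires a direct peering, and a packet never crosses two peering links.

```python
class PeeringModel:
    """Toy model of non-transitive peering: one peering hop at most."""

    def __init__(self):
        self.peerings = set()

    def peer(self, a, b):
        # A peering is symmetric, so store it as an unordered pair.
        self.peerings.add(frozenset((a, b)))

    def can_reach(self, src, dst):
        # No transit via a third VPC: only a direct peering counts.
        return src == dst or frozenset((src, dst)) in self.peerings


model = PeeringModel()
model.peer("A", "B")
model.peer("B", "C")

print(model.can_reach("A", "B"))  # direct peering
print(model.can_reach("A", "C"))  # would need to transit B
```

In this model, the only way to get A talking to C is to add an A-to-C peering, which is why full-mesh peering counts grow quickly as VPCs are added.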

Microsoft has per-interface routing tables. Put differently, their virtual router behaves a bit differently than we might expect.

Another difference is that broadcast or multicast may well not be emulated, meaning EIGRP or OSPF hellos don’t work with virtual routers. Since you won’t get link down conditions, you may need BFD to quickly detect loss of a neighbor.
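For a rough sense of why BFD helps, here is the detection-time arithmetic for BFD asynchronous mode as I understand it from RFC 5880: the negotiated interval is the larger of what the local side is willing to receive and what the remote side wants to send, multiplied by the remote detect multiplier. The timer values below are illustrative assumptions only, and what a given cloud or virtual router actually supports may differ.

```python
def bfd_detection_time_ms(local_required_min_rx_ms,
                          remote_desired_min_tx_ms,
                          remote_detect_mult):
    """Approximate BFD async-mode detection time, in milliseconds."""
    # The remote system transmits no faster than the slower of the two timers.
    interval_ms = max(local_required_min_rx_ms, remote_desired_min_tx_ms)
    return interval_ms * remote_detect_mult


# Illustrative values, not recommendations.
bfd_ms = bfd_detection_time_ms(300, 300, 3)

# A commonly seen OSPF default dead interval (40 seconds), for comparison.
OSPF_DEFAULT_DEAD_INTERVAL_MS = 40_000

print(f"BFD detects neighbor loss in about {bfd_ms} ms")
print(f"OSPF dead timer alone: {OSPF_DEFAULT_DEAD_INTERVAL_MS} ms")
```

With no link-down signal to lean on, the difference between sub-second and tens of seconds is the difference between a blip and an outage.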

Update: While this blog was sitting in our publication queue, Ivan Pepelnjak and Daniel Dib have clearly been digging into what happens “under the hood” in various clouds, based on some tweets and blogs. Ivan has webinars posted on AWS and Azure networking. Daniel has started a Slack channel on the topic. I’m glad they were able to explore in depth and share their findings, especially regarding things we are used to doing that may not work as expected in the cloud.

A different thought is that we networking people have to know enough about how things work to ask the right questions. To me, application flows are key. Who talks to whom? Where are the resources in question located? Does the service or micro-service auto-scale up and down? How is it monitored? Etc.

Here’s another thing you might find somewhat unexpected: in VXLAN we might run BGP to the Top of Rack (TOR) switches. Cumulus and Dinesh Dutt’s book discuss that, along with some interesting enhancements to BGP, and large data centers running BGP to the servers themselves. In a recent Arista webinar, I became aware of Tigera Calico. Each compute node acts as a router for the prefixes / endpoints on that server. Similar idea, different context.

Challenge: Pros, cons, what’s your reaction upon reading this? Quick, your manager just called you in and asked for your opinion!

Service Managers

Another item that might be novel to a networking person is the idea of ephemeral services, e.g. with containers that spin up and down. IP addresses are automatically assigned, and instances come and go. That is usually front-ended by a services manager, tracking which containers are running and what services they can provide, so that one service can find another. Upon spinning up, a service registers with the manager. Consumers of the service do service discovery to the service manager, to be informed how to contact a currently active provider of the service.

I’ve been thinking of the services manager as sort of an automated service load balancer, based on service names rather than virtual IPs. It tracks containers that can provide a given service and answers DNS queries to steer consumers of services to the services they need. There may well be NAT involved, depending on the features in use.
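The register / discover cycle can be sketched in a few lines. This is a toy in-memory registry, assuming simple round-robin selection. `ServiceRegistry`, the service name, and the endpoints are all hypothetical, and real service managers answer via DNS or an API and add health checking, which is omitted here.

```python
class ServiceRegistry:
    """Toy service manager: tracks live instances per service name."""

    def __init__(self):
        self._instances = {}  # service name -> list of "ip:port" endpoints
        self._cursors = {}    # service name -> next round-robin position

    def register(self, service, endpoint):
        # Called by an instance when it spins up.
        self._instances.setdefault(service, []).append(endpoint)

    def deregister(self, service, endpoint):
        # Called when an instance goes away (or fails a health check).
        endpoints = self._instances.get(service, [])
        if endpoint in endpoints:
            endpoints.remove(endpoint)

    def discover(self, service):
        # Called by a consumer: returns one currently active provider.
        endpoints = self._instances.get(service)
        if not endpoints:
            raise LookupError(f"no active instance of {service!r}")
        index = self._cursors.get(service, 0) % len(endpoints)
        self._cursors[service] = index + 1
        return endpoints[index]


registry = ServiceRegistry()
registry.register("catalogue", "10.0.1.5:8080")
registry.register("catalogue", "10.0.2.9:8080")

picks = [registry.discover("catalogue") for _ in range(3)]
print(picks)
```

Consumers ask for "catalogue" by name and never hard-code an address, so instances can come and go with new IPs without anyone reconfiguring anything.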

Virtual Network Devices

Networking people also need to be prepared to use virtual network devices. We might want fancier routing, VPN, or firewall functions than the cloud provider supplies, for instance.

Change seems to be happening fast in terms of connecting to cloud. Some sites have been putting in circuits to connect into various cloud networks. That takes time. It may have cost / bandwidth advantages, especially at higher speeds. Leveraging services like Equinix Cloud Exchange can be more agile.

Note: As of April / May 2019, Equinix has some very interesting and much more agile virtual networking offerings in progress. See my recent blog about Network Edge.

The CSPs (Cloud Service Providers) provide basic VPN access (and may well generate sample configurations for your Cisco or Juniper router). Putting a virtual router or SD-WAN device into the cloud lets you do more sophisticated routing and VPN access, supporting both agility and hybrid cloud. For what it’s worth, Cisco Live had several sessions with slide decks including sample CSRv configurations.

Today’s Reading Assignment

Fallacies of Distributed Computing — I’ve seen this before, but thanks to Ivan Pepelnjak for the reminder. Since I saw it in his blog, it seems to have popped up all over the container reading I’ve since been doing.

Comments

Comments are welcome, whether in agreement or constructive disagreement about the above. I enjoy hearing from readers and carrying on deeper discussion via comments. Thanks in advance!

—————-

Hashtags: #CiscoChampion #TechFieldDay #TheNetCraftsmenWay #Cloud #SaaS #VPC

Twitter: @pjwelcher
