NFD16: Automating Arista Networks

Author
Peter Welcher
Architect, Operations Technical Advisor

This blog post contains my impressions and comments about Arista’s presentations at #NFD16. Links to the recorded videos of the presentations can be found here.

My Take on Arista

I mentally associate Arista with New York City, because that’s where I’ve encountered Arista switches most often, in financial and advertising firms. That seems to be due to New York aggressiveness and openness regarding vendor mix, or perhaps cost sensitivity.

I’ve liked what I’ve seen of Arista: an instant CLI comfort zone for Cisco folks, with selective, thoughtfully chosen features added to the standard L2 and L3 ones. What’s not to like?

For years I’ve been a huge Cisco fan: huge market share, and it does a lot of things well (and some, not). Lately, I’ve been feeling pain from complexity (OTV, First Hop localization, LISP with VRFs), and I tend to shy away from new features that add complexity, are fragile, or that smaller organizations just cannot sustain.

Architectural solutions that tie together network management, hardware, and authentication inherently have complexity — and that’s where Cisco has been headed lately. That’s fine; it’s useful to large companies, as long as it works reasonably well. For smaller companies, the knowledge needed means cost: Either you need enough heads internally, or you need consultants (NetCraftsmen!) to design it right so that it works, or to advise on how to keep it working.

A good part of design is finding the right solution for the customer in the first place, which brings us back to Arista. If you look at some of my prior blog posts, like Designing for Cisco Nexus 9K, you’ll note several alternatives. As Ivan Pepelnjak and I have noted, all your datacenter may need is two switches, or a small spine and leaf fabric. That’s what I’ve seen people doing with Arista.

Multi-Chassis Link Aggregation (MLAG) technology allows you to do a heck of a lot in a design, both for Campus and for moderately sized datacenters. If the MLAG technology is robust, then not-overly-large VLANs are simple to build and support. The Arista equipment I’ve seen was often used in just that way.

Arista also supports VXLAN with its own control plane, CVX, and more recently, it appears, with EVPN. I’m liking EVPN (and its possible future standardization) for larger datacenters that want robustness plus an L2 overlay without vendor lock-in (NSX or ACI, for example). It’s still early days, and there are apparently some interoperability issues between vendors.
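Whichever control plane teaches the switches their MAC-to-VTEP mappings (CVX or EVPN), the data plane is the same VXLAN encapsulation from RFC 7348: the original Ethernet frame rides in UDP (destination port 4789) behind an 8-byte header whose key field is a 24-bit VXLAN Network Identifier (VNI). The Python sketch below only builds and parses that header to make the format concrete; the function names are my own, and a real VTEP does this in switch hardware.

```python
import struct

VXLAN_UDP_PORT = 4789  # IANA-assigned destination port (RFC 7348)
I_FLAG = 0x08          # "VNI is valid" flag in the first header byte

def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte header: flags, 24 reserved bits, 24-bit VNI, 8 reserved bits."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # First 32-bit word: flags in the top byte; second word: VNI in the top 24 bits.
    return struct.pack("!II", I_FLAG << 24, vni << 8)

def parse_vni(header: bytes) -> int:
    """Extract the VNI, rejecting headers without the I flag set."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & I_FLAG:
        raise ValueError("I flag not set; VNI is not valid")
    return vni_word >> 8

hdr = vxlan_header(10010)
print(hdr.hex())       # 0800000000271a00
print(parse_vni(hdr))  # 10010
```

The 24-bit VNI is why VXLAN offers about 16 million segments versus 4,094 usable VLAN IDs.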

So much for my reactions when someone says “Arista.” Now on to the #NFD16 part.

NFD16 and Arista

This blog post cannot possibly cover over two hours of NFD16 presentations in any detail, so I’ll hit high points and point you at the recorded videos. Arista gave five presentations:

  • Arista Overview
  • 400G Landscape
  • EOS Programmability
  • Network Automation and Telemetry
  • Routing Architecture Transformations

The first talk (Arista Overview) was a useful overview of Arista capabilities. My notes include leaf-spine, VXLAN, on-prem storage trending to IP-based (I tend to agree), and content driving increasing bandwidth. Arista runs one OS, EOS, across all platforms. New items: 128-way ECMP, 15,000 Tbps/switch, over 1 million servers in one cluster, and MACsec with DWDM as a DCI option. Increased market share. OpenConfig support. The 7500 model provides 100G with 16 slots, and merchant silicon now holds 2 million routes on chip. CloudVision provides ARP relay for DCI (I myself prefer MP-BGP EVPN for VXLAN to controller solutions). MPLS segment routing. Native Docker support.

I’m not going to comment much on the second talk (400 Gig): Andy Bechtolsheim gave a deep talk about where network speeds and optics are going, and how soon. For those deeply into that sort of thing, see the posted video. Short version: more speed, coming soon. (Does your enterprise need 400 Gbps?)

According to the third talk (Programmability), EOS is architected around state: processes put their state into a central database as atomic operations. That decouples features from one another, and the same model extends to a whole fabric via NetDB.
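To make the state-oriented idea concrete, here is a toy Python model, not Arista’s implementation: agents publish state into a central store as one atomic batch, and other agents subscribe to path prefixes and react, so no feature calls another directly. The class name, the path strings, and the “routing agent” are all hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from threading import Lock
from typing import Any, Callable

Callback = Callable[[str, Any], None]

class StateDB:
    """Toy central state store: writers publish state, subscribers react to it."""

    def __init__(self) -> None:
        self._state: dict[str, Any] = {}
        self._subs: defaultdict[str, list[Callback]] = defaultdict(list)
        self._lock = Lock()

    def subscribe(self, prefix: str, callback: Callback) -> None:
        self._subs[prefix].append(callback)

    def write(self, updates: dict[str, Any]) -> None:
        # Apply the whole batch under one lock so readers never see half of it.
        with self._lock:
            self._state.update(updates)
        # Notify after the lock is released so callbacks may write too.
        for path, value in updates.items():
            for prefix, callbacks in list(self._subs.items()):
                if path.startswith(prefix):
                    for cb in callbacks:
                        cb(path, value)

    def read(self, path: str) -> Any:
        with self._lock:
            return self._state.get(path)

db = StateDB()
events: list[str] = []

def routing_agent(path: str, value: Any) -> None:
    # Hypothetical consumer: reacts to link state without knowing who wrote it.
    if path.endswith("/oper") and value == "down":
        events.append(f"withdraw routes via {path.split('/')[1]}")

db.subscribe("interface/", routing_agent)
# Hypothetical interface agent publishes two related facts atomically.
db.write({"interface/Ethernet1/oper": "down", "interface/Ethernet1/speed": "100G"})
print(events)  # ['withdraw routes via Ethernet1']
```

Because the interface agent and the routing agent share only the database, either one can be restarted or replaced without the other knowing, which is the decoupling benefit described above.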

There are five ways to program EOS: eAPI (external APIs), OpenConfig, NetDB Streaming, Turbines, and EOS SDK. From my notes:

  • eAPI: run a command, get JSON back. (Cisco did this in NX-OS a while back; it’s a first step.)
  • OpenConfig: model-based (YANG), gRPC-based, with streaming status. Google and AT&T are providing impetus. Potential issues: completeness for vendor features, and a lowest-common-denominator effect.
  • NetDB Streaming: streams state using Arista’s gRPC-based protocol. Complete, but requires customer sophistication. “No more SNMP for big operators.”
  • Turbines: CloudVision as a backend for NetDB streaming. It can store all state over time, using Arista’s own API with telemetry apps. The Turbines API is like Apache Spark for stream processing: it can generate derived state and write that back to the DB, and clean up and reformat data. A WebSockets GUI can visualize the data. If I understood correctly, this is being built now and is not yet in common use.
  • EOS SDK: the most sophisticated option, C++ code on the box with full SysDB access. It can update the control plane via NetDB, enabling custom control planes, e.g., traffic engineering or segment routing applications.
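As an example of the simplest option, eAPI is JSON-RPC 2.0 over HTTPS: you POST a `runCmds` request listing CLI commands to the switch’s `/command-api` endpoint and get structured JSON back, one result per command. The minimal standard-library Python sketch below assumes eAPI has been enabled on the switch (`management api http-commands`); the host name and credentials in the commented call are placeholders.

```python
from __future__ import annotations

import base64
import json
import ssl
import urllib.request

def build_eapi_request(cmds: list[str], req_id: str = "1") -> dict:
    """JSON-RPC 2.0 body asking eAPI to run CLI commands and return JSON."""
    return {
        "jsonrpc": "2.0",
        "method": "runCmds",
        "params": {"version": 1, "cmds": cmds, "format": "json"},
        "id": req_id,
    }

def run_cmds(host: str, user: str, password: str, cmds: list[str],
             context: ssl.SSLContext | None = None) -> list[dict]:
    """POST the commands to https://<host>/command-api; return one result per command."""
    body = json.dumps(build_eapi_request(cmds)).encode()
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(
        f"https://{host}/command-api",
        data=body,
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {token}"},
        method="POST",
    )
    with urllib.request.urlopen(req, context=context, timeout=10) as resp:
        reply = json.load(resp)
    if "error" in reply:
        # JSON-RPC error objects carry a code and a message.
        raise RuntimeError(reply["error"])
    return reply["result"]

# Show the request body; actually sending it needs a reachable switch.
print(json.dumps(build_eapi_request(["show version"]), indent=2))
# results = run_cmds("leaf1.example.net", "admin", "password", ["show version"])
```

Switches often use self-signed certificates; passing an `ssl.SSLContext` that trusts your switch’s certificate is safer than turning verification off.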

While mulling over what I wanted to say in this blog post, I was amused to find, as often happens, that Ivan Pepelnjak had already said it for me, some time ago.

“Arista initially focused on DIY people and those people loved the tools Arista EOS gave them: Linux on the box, programmability, APIs…. However, if you want to enter the traditional enterprise segments where people don’t want to build their own tools or play a system integrator, there’s an enormous amount of work to be done….”

I have to agree. End-user automation and monitoring solutions are not something you, or even a vendor, can build quickly, so providing well-planned tools and then building on them would be the smart way for a vendor to tackle enterprise solutions incrementally.

The fourth presentation (Automation and Telemetry) addressed “why streaming telemetry” and real-time, not near-real-time, monitoring, so that events such as Spanning Tree (STP) changes aren’t missed. Good stuff! The presentation included demos and covered DANZ, Turbines, and anomaly detection. See the video for details. It would be nice to have standards around streaming telemetry (dare I say “MIB-II done right”?), so coders don’t have to waste time dealing with per-vendor variations. Not holding my breath.
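A toy simulation with a made-up timeline shows why polling misses events that streaming catches: a port flaps between two 30-second polls and the poller never sees it, while a later change is noticed only at the next poll. This is a conceptual Python sketch, not a model of any vendor’s telemetry pipeline; the timeline and interval are hypothetical.

```python
from __future__ import annotations

# Hypothetical timeline of (time in seconds, port state) for one switch port.
# The short flap at t=12..14 stands in for a transient event such as an STP change.
CHANGES = [(0, "forwarding"), (12, "blocking"), (14, "forwarding"), (95, "blocking")]

def state_at(t: int) -> str:
    """Return the port state in effect at time t."""
    current = CHANGES[0][1]
    for ts, state in CHANGES:
        if ts > t:
            break
        current = state
    return current

def polled_transitions(interval: int, until: int) -> list[tuple[int, str]]:
    """What a poller sampling every `interval` seconds would record."""
    seen: list[tuple[int, str]] = []
    last = None
    for t in range(0, until + 1, interval):
        state = state_at(t)
        if state != last:
            seen.append((t, state))
            last = state
    return seen

def streamed_transitions(until: int) -> list[tuple[int, str]]:
    """What a streaming subscriber would receive: every change, as it happens."""
    return [(t, s) for t, s in CHANGES if t <= until]

print(polled_transitions(30, 120))  # [(0, 'forwarding'), (120, 'blocking')]
print(streamed_transitions(120))    # all four changes, including the flap
```

The poller never sees the flap and learns of the t=95 change only at t=120. Shortening the interval narrows those gaps but never closes them, which is the case for streaming.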

The fifth presentation, Routing Architecture Transformations, provided an overall summary. It also covered scaling and performance, “Cloud principles now driving routing,” simplification, and spine-leaf fabrics with ECMP, which are hard to break.
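Why is an ECMP spine-leaf fabric hard to break? A leaf hashes each flow’s 5-tuple onto one of its equal-cost uplinks, so packets within a flow stay in order while flows spread across spines, and losing a spine simply shrinks the next-hop set. The Python sketch below illustrates that with SHA-256 as a stand-in hash; real switches use hardware hash functions, and the addresses and ports are hypothetical.

```python
from __future__ import annotations

import hashlib
from typing import NamedTuple

class Flow(NamedTuple):
    src_ip: str
    dst_ip: str
    proto: int
    src_port: int
    dst_port: int

def pick_next_hop(flow: Flow, next_hops: list[str]) -> str:
    """Hash the 5-tuple and choose one equal-cost next hop (SHA-256 as a stand-in)."""
    key = "|".join(map(str, flow)).encode()
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return next_hops[digest % len(next_hops)]

spines = ["spine1", "spine2", "spine3", "spine4"]
# Hypothetical flows: same two hosts, different client source ports.
flows = [Flow("10.0.1.10", "10.0.2.20", 6, 49152 + i, 443) for i in range(1000)]

load = {s: 0 for s in spines}
for f in flows:
    load[pick_next_hop(f, spines)] += 1
print(load)  # roughly even, around 250 flows per spine

# A given flow always hashes to the same spine, so its packets stay in order.
assert pick_next_hop(flows[0], spines) == pick_next_hop(flows[0], spines)

# Lose spine4: every flow still maps to one of the surviving spines.
survivors = spines[:3]
assert all(pick_next_hop(f, survivors) in survivors for f in flows)
```

Plain modulo, as used here, remaps many flows when the next-hop set changes; switches commonly offer “resilient” hashing options to limit that churn.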

Other Thoughts

A recent Arista webinar touted L3 switches as routers. With ubiquitous Ethernet, that’s happening, although, as always, there are trade-offs (QoS, VPN support, and other CPU-intensive tasks that can’t yet be done in hardware). That’s the topic of an upcoming blog post.


Comments

Comments are welcome, whether in agreement or constructive disagreement with the above. I enjoy hearing from readers and carrying on deeper discussion via comments. Thanks in advance!
