I just became aware of AWS Transit Gateway Network Manager (“AWSTGNM”). I’m not sure how I missed it, seeing as I’ve become a fan of transit gateways. It was announced in December of 2019.
I’m blogging about it here in case you missed it or didn’t explore what it does.
TL;DR: AWS makes it sound like the perfect tool. After digging into their (pretty good) documentation, I ended up with a somewhat more “meh” reaction. Yes, it provides some visibility and centralized logging, and it provides central control for configuring regional VPN connections. But the performance data feels like version 1.0 and could do more.
This blog also somewhat updates my previous NaaS blog.
[Apologies: I’d love to show you screen captures, but time/cost or copyright on the AWS videos precludes that.]
What is AWS Transit Gateway Network Manager?
You can watch the video sales pitch (short = 1 minute 48 seconds, currently).
Here’s my summary from that and the demo (see below):
AWSTGNM gives you one tool, an “operational dashboard,” that can monitor basic performance and logging data for your interconnected AWS Transit Gateways. At no extra cost!
It assumes you have some number of global Transit Gateways interconnected within AWS, using Inter-Region Peering. You can set up a full mesh of inter-site connections, and traffic between regions is encrypted. You can also connect various non-AWS sites via VPN, presumably to the nearest transit gateway. There is also now VPN acceleration available (for a fee).
You might use Transit Gateways to form your global corporate backbone. And yes, if you’ve built that, then you will likely want to manage and monitor it from one central manager. That’s the niche where AWSTGNM really might help.
(By the way, AWS gets awarded points for NOT saying “Single Pane of Glass”!)
AWSTGNM is based on registering your AWS Transit Gateways and your on-premises “resources.” (And for the picky, yes, they did say “premises.”) That gets you an interactive topology and usage metrics. Also, logging of events such as connection and routing changes.
AWSTGNM also lets you set up AWS site-to-site VPN connections. AWSTGNM integrates with SD-WAN equipment from several vendors (at the time of the video recording, Cisco, Silver Peak, Aruba, and Aviatrix). A Site to Site tool lets you create and manage VPNs, showing the ones you’ve created. For the SD-WAN vendors just listed, the remote SD-WAN device will be configured by AWSTGNM. Checking online, I see that Versa is now also supported.
AWSTGNM apparently also works with Direct Connect links: “When you register a transit gateway, the following transit gateway attachments are automatically included in your global network: VPCs, Site-to-Site VPN connections, AWS Direct Connect gateways, Transit Gateway Connect, Transit gateway peering connections.” It also lets you create connections between two devices it manages.
There is an early demo from 2019 with more detail (36 minutes!). The above sales video chains into it: just wait, and it’ll start playing. It gives a good tour of the product.
The setup appears simple: you register your transit gateways by checking boxes in the displayed list of all your transit gateways. Any connections to the checked gateways are then learned automatically.
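The same registration can presumably be scripted rather than clicked. Here is a minimal sketch, assuming the boto3 `networkmanager` client and its `register_transit_gateway` call; the helper names, account number, and IDs are made up for illustration.

```python
def transit_gateway_arn(region, account_id, tgw_id):
    """Build a transit gateway ARN from its parts (hypothetical helper)."""
    return f"arn:aws:ec2:{region}:{account_id}:transit-gateway/{tgw_id}"


def register_transit_gateways(client, global_network_id, arns):
    """Register each transit gateway ARN with one global network.

    `client` is expected to behave like boto3.client("networkmanager").
    Returns the list of ARNs for which a registration call was made.
    """
    registered = []
    for arn in arns:
        client.register_transit_gateway(
            GlobalNetworkId=global_network_id,
            TransitGatewayArn=arn,
        )
        registered.append(arn)
    return registered
```

With real credentials, you would pass in `boto3.client("networkmanager")` plus the ID of a global network you have already created. Passing the client as a parameter also makes it easy to try the logic against a stub first.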
The Topology tab shows you a diagram of your AWS global network. That includes VPCs connected to each transit gateway. A geographic view is also provided, showing status and providing per-site drill down to details. Up/down status is tracked.
On-premises (sites) can be added manually, as can devices (routers) at such sites.
Changes are detected via CloudWatch. The dashboard lets you view:
- Real-time events for the AWS TG-based global network
- Topology, routing, and status changes
- Drill-down to see all the event details
Because the events flow through CloudWatch, you can also feed them to Lambda functions for other uses or customization.
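To make the Lambda idea concrete, here is a rough sketch of a handler that boils one event down to a short summary. The field names (`detail-type`, `changeType`, `changeDescription`, `region`) are my assumptions about the event layout, not something shown in the demo, so check them against a real event before relying on them.

```python
def handler(event, context):
    """Summarize one Network Manager event delivered to a Lambda function.

    Field names below are assumed, not confirmed; missing fields fall
    back to placeholder values instead of raising.
    """
    detail = event.get("detail", {})
    summary = {
        "kind": event.get("detail-type", "unknown"),
        "change": detail.get("changeType", "unknown"),
        "description": detail.get("changeDescription", ""),
        "region": detail.get("region", event.get("region", "")),
    }
    # Printed output from a Lambda function ends up in its log stream.
    line = f"{summary['kind']}: {summary['change']} {summary['description']}"
    print(line.strip())
    return summary
```

From here you could forward the summary to chat, a ticketing system, or whatever else your operations team watches.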
You can query and visualize (graph) event data from CloudWatch if you wish.
I have strong opinions on performance statistics. You are about to see them in action!
AWSTGNM reports on bytes and packets.
Good: data is always helpful!
Good: in and out direction statistics are kept separate.
Good: drops are also reported, as well as black hole/no route counters.
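If you would rather pull these counters yourself than read them off the dashboard, something like the following sketch should work. It only builds the request parameters. The namespace `AWS/TransitGateway`, the metric names, and the `TransitGateway` dimension are what I believe CloudWatch uses, so treat them as assumptions; the transit gateway ID is made up.

```python
from datetime import datetime, timedelta, timezone

# Metric names assumed to match the counters described above.
TGW_METRICS = (
    "BytesIn", "BytesOut", "PacketsIn", "PacketsOut",
    "PacketDropCountBlackhole", "PacketDropCountNoRoute",
)


def metric_request(tgw_id, metric, hours=1, period=60, now=None):
    """Build keyword arguments for CloudWatch get_metric_statistics."""
    if metric not in TGW_METRICS:
        raise ValueError(f"unexpected metric: {metric}")
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/TransitGateway",
        "MetricName": metric,
        "Dimensions": [{"Name": "TransitGateway", "Value": tgw_id}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": period,          # seconds per data point
        "Statistics": ["Sum"],     # total bytes/packets per period
    }
```

Usage would look like `boto3.client("cloudwatch").get_metric_statistics(**metric_request("tgw-0abc", "BytesIn"))`, which returns a list of data points you can graph or post-process.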
Not so good: the stats are data in and out. Is that aggregate in and out for the device? Or just in and out of AWS from the edge or something? I’d hope the latter. Put differently: are they in and out sums across all (virtual) interfaces or what?
The demo presenter said, “aggregated.” So I’m going with summed data. The documentation confirms that.
So those metrics allow us to see changes in behavior but provide less detail than a network per-interface type of view might. Or so it seems to me! I’d think most people would like to, e.g., be able to tell how much of their Internet bandwidth each VPN tunnel is using.
(Related pet peeve: graphs that lack contextual information, information telling you exactly what you’re seeing. My favorite mistake in other products is a graph with no indication whether it is showing 1 hour or 24 hours or what. AWS fails in that the polling/reporting interval is not stated. AWS does label the time axis in its graphs hourly. The data points shown are more frequent. How frequent?)
Not so good: network people think in bits, not bytes. This is clearly a server-style report! (I’m mildly joking.) Also, networking people want bits per second. Just do the simple math for me, so I don’t have to look up the polling interval and do the math in my head.
For what it’s worth, the VPC polling measurement interval is 60 seconds. I looked for more info, but that’s all I could find.
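The math in question is trivial, which is rather the point. Here is a small sketch, assuming each data point is a byte count summed over one 60-second interval; the function names are mine.

```python
def bits_per_second(byte_count, interval_seconds=60):
    """Convert a byte count over one interval into an average bit rate."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return byte_count * 8 / interval_seconds


def humanize(bps):
    """Format a bit rate using decimal network units (1 Kbps = 1000 bps)."""
    for unit in ("bps", "Kbps", "Mbps", "Gbps"):
        if bps < 1000 or unit == "Gbps":
            return f"{bps:.1f} {unit}"
        bps /= 1000


# 750 MB moved in one 60-second interval is an average of 100 Mbps.
print(humanize(bits_per_second(750_000_000)))  # 100.0 Mbps
```

Note that this is an average over the interval. Any bursts shorter than 60 seconds are smoothed away, which is one more reason to want to know the interval.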
Yes, you can see if your raw bytes drop off or climb through the ceiling. And for internal AWS links, bandwidth is likely somewhat irrelevant (how BIG is the pipe?). I’d prefer percentages, but if there’s no clear max value, percentages can be a bit hard to calculate.
If you have multiple AWS Transit Gateways, AWSTGNM might well be worth checking out.
If you’re planning on an AWS-centric NaaS buildout, AWSTGNM might well be helpful in building it, especially if you’re doing regional SD-WAN with one of the supported vendors.
You’re not going to be able to do SNMP to AWS objects, so getting some performance data is a lot better than nothing. I could wish for finer-grained data.