Getting to AWS S3

Author
Peter Welcher
Architect, Operations Technical Advisor

I am impressed with the Amazon Web Services (AWS) documentation. Pretty high quality, and a lot of it.

A recent customer need caused me to realize there was a gap in what I know, and as it turned out, it wasn’t just me.

So, I propose to cover an area where the AWS documentation could be a little better, or at least less scattered. Ideally, I’d like a single page covering all the options, costs, and pros/cons.

I’ll take off my ‘expert’ hat and do my best to explain what I found, hopefully saving you some time. Please do let me know if I got something wrong or there’s another possible approach to the following.

Caveat: I have not field tested this. (Time, lack of cloud lab budget.)

The specific issue that initiated discussion was that the customer was uploading a lot of documents to AWS S3, probably as a step towards providing web services from there rather than on-premises.

AWS S3 is storage that an organization’s apps may need to access, or data may need to be transferred to or from it. Your use of S3 consists of one or more buckets: separate containers of storage, each with its own permissions, owner, and so on. Buckets can hold different kinds of storage. From a network design perspective, all of that is in “app land.”

Your concern as the reader, and mine, is more the networking aspect: getting data back and forth to AWS S3.

The data might be coming from AWS VPC VM instances. They’ve got that documented pretty well. In this case, the relevant VPC had a private connection to S3.

To send data to AWS S3, the customer’s on-premises server(s) connect over HTTPS to an S3 DNS name. That name resolves to a public IP address, i.e., a public interface to S3.

The problem was traffic competition on the Internet (Internet2) links the traffic traverses: overall capacity was slowing progress to the point where the copying process would likely miss a deadline. Since there would also be ongoing synchronization of data, ongoing utilization and timeliness mattered too. Who pays for the bandwidth was another concern.

We discussed some quick fixes, one of which was to add some /32 routes so that the Amazon traffic would use one or two other Internet links. Then the app team brought up the idea of setting up AWS Direct Connect links. We’re still trying to nail down the details of what’s being proposed: dedicated links just for the AWS S3 traffic, or Direct Connects that also carry traffic for their Amazon VPC instances? (Think ‘virtual small data center’ if you’re not familiar with VPCs.) That is more of a management / budgetary (who pays) decision. The rest is details of how the Direct Connect and the related routing are set up, all well documented.

It turns out there is one snag, however. If you search, you’ll find that you apparently cannot get from outside through a VPC to AWS S3. At least, not without games with NAT.

This is where I knew enough to note that AWS S3 is not a VPC and that I didn’t really know what the full set of design choices are for how S3 can be accessed. Research time!

AWS S3 Access Points

Doing some RTFM (Reading The Fine Manual), I learned that one can create S3 Access Points. (Poor choice of name? Too much like Wireless Access Point.) They are used to control access to an S3 bucket: each one is a unique DNS name tied to a set of access permissions. They can control access to part or all of a massive set of shared, stored data, if that is desired, including which VPC(s) can access data in the S3 bucket in question. OK, so an S3 Access Point is not HOW you access (send or receive) data, but WHO or WHAT is allowed to do so over a connection. Good to know, but it doesn’t help with the connection itself.
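To make the “unique DNS name” point concrete, here is a minimal Python sketch that builds the hostname an app would use for an access point. It assumes the virtual-hosted-style pattern AWS documents for access points, `<name>-<account-id>.s3-accesspoint.<region>.amazonaws.com` (check it against the current docs). The access point name, account ID, and region are placeholders, and `access_point_hostname` is a hypothetical helper, not an AWS API.

```python
import re


def access_point_hostname(name: str, account_id: str, region: str) -> str:
    """Build the DNS name for an S3 Access Point (assumed documented pattern)."""
    # AWS account IDs are 12 digits; catch obvious typos early
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError("AWS account IDs are 12 digits")
    return f"{name}-{account_id}.s3-accesspoint.{region}.amazonaws.com"


if __name__ == "__main__":
    # Placeholder values, not real resources
    print(access_point_hostname("finance-reports", "111122223333", "us-east-1"))
```

Note that the name still resolves like any other S3 name; the access point changes who may connect, not the path the packets take.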

Link: AWS S3 Access Points

VPC Endpoint for AWS S3

It turns out you can create a VPC Endpoint for Amazon S3. Funny thing, there’s even a HowTo document for that.

Link: VPC Endpoint for AWS S3

The short version on endpoints: a VPC endpoint appears to be a fine way to add storage access to a VPC, which is what you want for apps running in AWS.

Assuming the VPC exists, you create the endpoint and then add an S3 bucket policy allowing access from the VPC. Pay attention to the details, since the example policy does not block external access to the endpoint for an authenticated user.
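For illustration, here is a sketch of a bucket policy that denies all S3 actions unless the request arrives through one specific VPC endpoint, using the `aws:SourceVpce` condition key. The bucket name and endpoint ID are placeholders, and `vpce_only_policy` is a hypothetical helper. Be careful: a deny like this also locks out everything that does not come through the endpoint, including console and on-premises users, and it grants nothing by itself; IAM permissions still have to allow the access.

```python
import json


def vpce_only_policy(bucket: str, vpce_id: str) -> dict:
    """Deny every S3 action on bucket unless the request comes via vpce_id."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnlessFromVpcEndpoint",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both the bucket itself and every object in it
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
                "Condition": {"StringNotEquals": {"aws:SourceVpce": vpce_id}},
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder bucket and endpoint ID
    policy = vpce_only_policy("example-bucket", "vpce-0123456789abcdef0")
    print(json.dumps(policy, indent=2))
```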

Link: Endpoints for Amazon S3

Be sure to read that web page, since creating a VPC endpoint can disrupt existing traffic from the VPC (or from services) to AWS S3.

Another article makes it clear that a VPC endpoint can only be reached from addresses within the VPC, and discusses other limitations. It also covers services the VPC uses and making sure your policy allows them access.

Link: Gateway Endpoint Limitations

Various articles note that S3 hostnames normally resolve to public IP addresses. A quick search did not turn up anything on whether, or when, they don’t.
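If you want to see what an S3 hostname resolves to from a particular location (on-premises, or inside a VPC), a quick check is easy. This sketch resolves a hostname with the standard library and labels each address as public or not, using the `is_global` flag from Python’s `ipaddress` module. The bucket hostname is a placeholder, and `resolve` and `classify` are illustrative helpers.

```python
import ipaddress
import socket


def resolve(hostname: str) -> list[str]:
    """Return the unique IP addresses hostname resolves to from this machine."""
    infos = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    # Each result is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP
    return sorted({info[4][0] for info in infos})


def classify(addresses: list[str]) -> dict[str, str]:
    """Label each address 'public' if globally routable, else 'non-public'."""
    return {
        a: ("public" if ipaddress.ip_address(a).is_global else "non-public")
        for a in addresses
    }


if __name__ == "__main__":
    # Placeholder bucket hostname; substitute your own bucket and region
    host = "example-bucket.s3.us-east-1.amazonaws.com"
    try:
        print(classify(resolve(host)))
    except socket.gaierror as err:
        print(f"Could not resolve {host}: {err}")
```

Running it from different places (a laptop, an on-premises server, an EC2 instance) is a cheap way to see whether the answer changes.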

Private Virtual Interfaces

Searching leads to a documentation page on Direct Connect and S3:

Link: Access S3 over Direct Connect

It talks about private virtual interfaces, and also about VPC endpoints not extending outside the VPC. It appears to imply that a VPC endpoint is a private virtual interface (“VIF”) but does not say so explicitly.

It does, however, provide the solution I was looking for: set up the Direct Connect (dedicated or hosted) with a public virtual interface (public VIF) to reach the S3 bucket(s).

Gateways and Transit Gateways

If we’re networking, we might well be using Gateways or Transit Gateways to connect our Direct Connect link to AWS VPCs and services. I prefer this approach, which provides for one (or more) cloud entities that your locations and VPCs can all connect to and be routed via. If a public VIF is tied to the Direct Connect, I expect it can participate in the routing for a Transit Gateway the Direct Connect ties into.

This is where your choice of connection method comes into play. It seems that if you use private VIFs into VPCs to reach S3, you get tight per-VPC control over S3 access, but the control is scattered. With the public VIF, you can use routing to send S3 traffic over the Direct Connect rather than the Internet, if you want. Security then probably looks more like what you’d do for Internet access: at least something acting like ACLs for which IPs can access it, and probably which users/identities.
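As a sketch of the “ACLs for which IPs can access it” idea, the following builds a bucket policy that denies requests from outside a list of source prefixes, using the `aws:SourceIp` condition key. The bucket name and prefix (a documentation range) are placeholders, and `source_ip_policy` is a hypothetical helper. Note that `aws:SourceIp` matches the public source address of the request, so it does not apply to requests arriving through a VPC endpoint; `aws:SourceVpce` is the key for those.

```python
import ipaddress
import json


def source_ip_policy(bucket: str, allowed_cidrs: list[str]) -> dict:
    """Deny S3 actions on bucket unless the request comes from allowed_cidrs."""
    # Validate the prefixes up front; ip_network raises ValueError on a typo
    cidrs = [str(ipaddress.ip_network(c)) for c in allowed_cidrs]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyOutsideAllowedPrefixes",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
                "Condition": {"NotIpAddress": {"aws:SourceIp": cidrs}},
            }
        ],
    }


if __name__ == "__main__":
    # 203.0.113.0/24 is a documentation range standing in for your public prefix
    print(json.dumps(source_ip_policy("example-bucket", ["203.0.113.0/24"]), indent=2))
```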

Securing AWS S3

This is a big subject, and I know enough to know not to attempt including a book-sized topic here.

From what I read in various places, it is also easy to get AWS S3 security wrong, so you really do not want to leave S3 buckets exposed.

Also, I can imagine getting it wrong because it isn’t clear who is responsible for making sure it happens and is done well. Did the app folks bring in the Security Team to review their access to storage? Not something you’d do in a physical data center with FC zoning, probably?

Here’s a link to get you started: Blocking Public Access to Your AWS S3 Storage.
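As a starting point, S3 Block Public Access can be switched on per bucket. The sketch below passes the four Block Public Access settings to boto3’s `put_public_access_block` call. It assumes boto3 is installed and AWS credentials are configured; `block_public_access` is a hypothetical helper, and the script only talks to AWS when you give it a bucket name on the command line.

```python
import sys

# The four S3 Block Public Access settings, all enabled
PUBLIC_ACCESS_BLOCK = {
    "BlockPublicAcls": True,
    "IgnorePublicAcls": True,
    "BlockPublicPolicy": True,
    "RestrictPublicBuckets": True,
}


def block_public_access(s3_client, bucket: str) -> None:
    """Turn on all Block Public Access settings for one bucket (boto3-style client)."""
    s3_client.put_public_access_block(
        Bucket=bucket,
        PublicAccessBlockConfiguration=PUBLIC_ACCESS_BLOCK,
    )


if __name__ == "__main__" and len(sys.argv) > 1:
    # Needs boto3 and credentials; imported here so the helper works without it
    import boto3

    block_public_access(boto3.client("s3"), sys.argv[1])
    print(f"Block Public Access enabled for {sys.argv[1]}")
```

Passing the client in, rather than creating it inside the helper, keeps it easy to test with a stand-in object.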

Routing to AWS / AWS S3

Let’s get back to networking now.

You set up a Direct Connect. You know the prefixes in your VPC(s) because you assigned them. So routing to VPC(s) should be straightforward.

I’ve seen with VPN to a VPC that AWS will give you 169.254.x.x/30 addressing for the VPN link if you don’t specify addressing. I’ll note in passing that you can ping the other end from the router endpoint, but not from elsewhere: traffic to or from that subnet isn’t routable. To avoid someone thinking, “I can’t ping it, so it must be down” (network management tools?), use routable addressing.
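One quick way to spot link-local tunnel addressing before it confuses monitoring is Python’s `ipaddress` module. This sketch reports whether a tunnel /30 falls in the link-local range (169.254.0.0/16) and lists its two usable host addresses. The subnets are made-up examples and `describe_tunnel_subnet` is a hypothetical helper.

```python
import ipaddress


def describe_tunnel_subnet(cidr: str) -> tuple[bool, list[str]]:
    """Return (is_link_local, usable host addresses) for a point-to-point subnet."""
    net = ipaddress.ip_network(cidr)
    # hosts() skips the network and broadcast addresses, leaving two hosts in a /30
    return net.is_link_local, [str(h) for h in net.hosts()]


if __name__ == "__main__":
    # Example subnets only: one link-local, one from RFC 1918 private space
    for cidr in ("169.254.44.0/30", "10.255.0.0/30"):
        link_local, hosts = describe_tunnel_subnet(cidr)
        status = "link-local, not routable" if link_local else "routable"
        print(f"{cidr}: {status}; hosts {', '.join(hosts)}")
```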

So, if you’re trying to route to S3, you could set up apps to use the public virtual interface /32 and route to that. The more general alternative is to learn the IP ranges that the S3 DNS names might resolve to, and route to those. That could be handy for other reasons, so let’s take a look at it.

It helps that AWS provides that information in a fairly easily consumed form (see below). Appreciated!

Link: IP Address Ranges Used by AWS S3
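Once you have that data, you can check which S3 prefix covers a given address (say, one an S3 hostname resolved to), and therefore what you would need to route. A minimal sketch, assuming the `prefixes` / `ipv6_prefixes` layout of ip-ranges.json with the fields `ip_prefix`, `ipv6_prefix`, `region`, and `service`. It filters on `service == "S3"` because the same address space also appears under the broader `AMAZON` service. `covering_s3_prefixes` is a hypothetical helper, and the sample data uses documentation address ranges rather than real AWS prefixes.

```python
import ipaddress
from typing import Optional


def covering_s3_prefixes(ip: str, ranges: dict, region: Optional[str] = None) -> list[str]:
    """Return S3 prefixes from ip-ranges.json data that contain ip, most specific first."""
    addr = ipaddress.ip_address(ip)
    if addr.version == 4:
        entries, field = ranges.get("prefixes", []), "ip_prefix"
    else:
        entries, field = ranges.get("ipv6_prefixes", []), "ipv6_prefix"
    matches = []
    for item in entries:
        if item["service"] != "S3":
            continue  # skip AMAZON, EC2, and other services
        if region is not None and item["region"] != region:
            continue
        if addr in ipaddress.ip_network(item[field]):
            matches.append(item[field])
    # Longest prefix first, mirroring how a router picks the most specific route
    return sorted(matches, key=lambda p: ipaddress.ip_network(p).prefixlen, reverse=True)


if __name__ == "__main__":
    # Illustrative data in the ip-ranges.json shape, using documentation ranges
    sample = {
        "prefixes": [
            {"ip_prefix": "198.51.100.0/24", "region": "us-east-1",
             "service": "S3", "network_border_group": "us-east-1"},
            {"ip_prefix": "198.51.0.0/16", "region": "us-east-1",
             "service": "AMAZON", "network_border_group": "us-east-1"},
        ],
        "ipv6_prefixes": [],
    }
    print(covering_s3_prefixes("198.51.100.7", sample, "us-east-1"))
```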

The above link describes some tools for working with the JSON. I couldn’t resist doing some Python instead. Start with the JSON file itself:

Link: JSON file download

A Little Python

I’ve been having too much fun with Python lately, as in using it for all sorts of tasks.

My brain runs Mac / Linux apparently, which means I think in terms of combining Linux tools with little scripts (or Excel spreadsheets) to get me what I need. Like Python processing a routing table into a CSV form that I can ingest into Excel to use a lookup function to identify and colorize routing next hops. Yeah, a lookup table in Python would have worked too, but it would not have been colorful!

In this case, I wrote a bit of hasty Python to flatten the AWS JSON, putting each entry onto one line.

#!/usr/bin/env python3
#
# This flattens the JSON download of AWS addresses
# from https://ip-ranges.amazonaws.com/ip-ranges.json
# so that each IPv4 prefix entry prints on one line.
#
# USAGE: python3 ipranges.py <json-file-name>
#

import json
import sys

# Four left-aligned columns, 20 characters wide
fmtStr = '{0:20} {1:20} {2:20} {3:20}'

if __name__ == '__main__':

    if len(sys.argv) < 2:
        print("ARGV: ", sys.argv)
        print("ERROR: Please provide the IP range JSON filename.")
        sys.exit(1)
    filename = sys.argv[1]

    # Parse the whole JSON document into Python dicts and lists
    with open(filename, 'r') as file:
        info = json.load(file)

    print(fmtStr.format('Prefix', 'Region', 'Service', 'Border'))
    # 'prefixes' holds the IPv4 entries (IPv6 entries live in 'ipv6_prefixes')
    for item in info['prefixes']:
        prefix = item['ip_prefix']
        region = item['region']
        service = item['service']
        border = item['network_border_group']
        print(fmtStr.format(prefix, region, service, border))

Sample output:

Prefix             Region               Service            Border              
3.5.140.0/22       ap-northeast-2       AMAZON             ap-northeast-2      
13.34.37.64/27     ap-southeast-4       AMAZON             ap-southeast-4      
15.230.56.104/31   us-east-1            AMAZON             us-east-1          

That way, I can pipe the output (or ‘cat’ saved output) into ‘grep’ twice to filter on search criteria, e.g., S3 and us-east-1:

grep S3 aws-output.txt | grep us-east-1          
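The two greps can also be done in Python, which makes it easy to go one step further. This sketch pulls the S3 entries for one region from both `prefixes` and `ipv6_prefixes` (the script above only reads the IPv4 list) and merges adjacent prefixes with `ipaddress.collapse_addresses`, which may shrink the number of routes you would need. It assumes the ip-ranges.json field names used above; `s3_networks` and the script name `s3routes.py` are hypothetical.

```python
import ipaddress
import json
import sys


def s3_networks(ranges: dict, region: str) -> list:
    """Collect S3 prefixes (IPv4 and IPv6) for one region and merge adjacent ones."""
    v4 = [ipaddress.ip_network(p["ip_prefix"])
          for p in ranges.get("prefixes", [])
          if p["service"] == "S3" and p["region"] == region]
    v6 = [ipaddress.ip_network(p["ipv6_prefix"])
          for p in ranges.get("ipv6_prefixes", [])
          if p["service"] == "S3" and p["region"] == region]
    # collapse_addresses() needs a single IP version per call
    return list(ipaddress.collapse_addresses(v4)) + list(ipaddress.collapse_addresses(v6))


if __name__ == "__main__":
    # USAGE: python3 s3routes.py ip-ranges.json [region]
    if len(sys.argv) >= 2:
        region = sys.argv[2] if len(sys.argv) >= 3 else "us-east-1"
        with open(sys.argv[1]) as f:
            data = json.load(f)
        for net in s3_networks(data, region):
            print(net)
```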

Conclusions

We’ve toured several ways of connecting to AWS S3. I doubt I’ve exhausted the subject, and I’m certainly not ready to provide a pros/cons discussion. But I hope there’s enough above to get you started.

VPCs are network-like / routing-focused entities. They have their quirks, e.g., built-in rules about what can route to what through a VPC.

It may help to think of AWS S3 as a disk drive. The virtual connections are in effect cabling connecting the disk drive to a VPC or a TGW or a Direct Connect. Does that help?