Network Stability Through Resilience Engineering
Who owns Cloud Security in your organization? Or what is the division of responsibility? Are they treating cloud as just another part of your network, or are they considering risks unique to the cloud?
These can be good questions to ask. I’ve been at sites where the security and network guys pointed fingers at each other. Oops! I can imagine some sites where cloud is “owned” by server / app / dev teams. Do they also own cloud security? If not, are they communicating well with whoever does own cloud security?
Even if you’re a network person and not responsible for cloud security, it may be helpful for you to be aware of potential issues.
This blog will look at some of the issues that come up. Some themes:
What’s different between internal networks and the cloud?
What actually got me started on this blog was a customer doing an MPLS risk assessment, in conjunction with another that needs cloud security. It can be useful to think about what changes, in terms of control, and what might go wrong or be subverted.
With cloud, you’ll be the one providing cloud access. What is different is the access controls on the cloud presence; think of it as the cloud data center.
For the internal network or data center, one generally has to be onsite to have access, or use VPN or VDI for offsite access for admins. All those are usually fairly tightly controlled, and under local control, with local standards of rigor. If a hacker does gain remote access to the CLI, they might reconfigure a router to allow them broader VPN access and further exploit from there. Would your network management tools alert you to the configuration change?
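One way to picture the alerting question above is a simple baseline comparison. The following is a minimal sketch, not a description of any particular network management product: it assumes you already have the saved baseline and the current running configuration as text, and the `vpn-allow` config lines are made-up syntax for illustration only.

```python
import difflib
import hashlib


def config_fingerprint(config_text: str) -> str:
    """Return a SHA-256 digest of a config, ignoring trailing whitespace."""
    lines = [line.rstrip() for line in config_text.strip().splitlines()]
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


def config_changes(baseline: str, current: str) -> list[str]:
    """Return the added ('+') and removed ('-') lines between two configs."""
    diff = list(
        difflib.unified_diff(
            baseline.splitlines(), current.splitlines(), lineterm="", n=0
        )
    )
    # The first two diff lines are file headers; hunk headers start with "@@".
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


# Hypothetical config syntax, for illustration only.
baseline = "hostname edge1\nvpn-allow 10.0.0.0/24"
current = "hostname edge1\nvpn-allow 10.0.0.0/24\nvpn-allow 0.0.0.0/0"

if config_fingerprint(baseline) != config_fingerprint(current):
    for change in config_changes(baseline, current):
        print("ALERT config change:", change)
```

The fingerprint gives a cheap "did anything change" check, and the diff tells whoever receives the alert what changed, which is the part that helps spot a broadened VPN rule.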
One plus to onsite: you do control user authentication and authorization permissions.
For cloud, if your admin credentials are compromised, the hacker can then do as they will, subject to any limitations on those credentials. Cloud root / account admin access owns it all, as far as that account is concerned. Multi-factor authentication helps. Some feel that non-cloud key management has value, retaining control over such an important aspect of operation.
Worse, suppose it is not your admin credentials, but the cloud provider’s credentials that get compromised… “keys to the kingdom”!
From one perspective, that’s not that much different than with a physical data center and VPN admin access. You have to protect your credentials. You have to trust that the provider’s staff cannot easily access or change settings on your account or cloud VPC / vNet. But in addition, you have to trust that they guarded their credentials, key store, etc.
If you use a cloud automation product that is cloud hosted, the same applies to your credentials for access to their product, and also to their own access credentials to their product. It applies as well to how securely they store your credentials for accessing the various other supported tools and cloud instances. In short, someone else is responsible for the security of your entire key ring, so to speak!
There was a news story a few years back on a Silicon Valley startup where a disgruntled admin wiped out the Amazon VPC / server instances, which wiped out the backup data in AWS Glacier. Game over!
Just as a nuclear launch reportedly requires dual keys, you might want any drastic cloud actions to require two people: “split key” or “dual control” and “split knowledge”.
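As a rough illustration of dual control, here is a minimal sketch in which a drastic action needs the requester plus at least one other person before it may run. The class and names (`DrasticAction`, `alice`, `bob`) are hypothetical; real cloud platforms have their own approval and policy mechanisms, which this does not model.

```python
from dataclasses import dataclass, field


@dataclass
class DrasticAction:
    """A destructive request that needs a second person to approve it."""

    description: str
    requested_by: str
    approvals: set[str] = field(default_factory=set)

    def approve(self, approver: str) -> None:
        # Dual control: the requester can never be their own second key.
        if approver == self.requested_by:
            raise PermissionError("requester cannot approve their own action")
        self.approvals.add(approver)

    def may_execute(self, required_approvals: int = 1) -> bool:
        return len(self.approvals) >= required_approvals


action = DrasticAction("delete production VPC", requested_by="alice")
print(action.may_execute())  # False: only one person involved so far
action.approve("bob")
print(action.may_execute())  # True: two distinct people have agreed
```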
You probably will want the master account keys known to at most 1 or 2 people and stored in a secure form somewhere in case something happens to those people. You don’t want to end up locked out of your cloud instance!
Aside from root keys for an account, you can also limit admin control scopes, so that different teams administer different parts of the cloud footprint. Or tie different teams to different accounts or sub-accounts. The point is to limit what any one person or group can impact. The thinking here is like designing networks with ‘blast radius’ or ‘failure domain size’ in mind: if or when something goes wrong, one wants to limit the damage.
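To make the blast radius idea concrete, here is a small sketch that answers "what could this team damage?" from a scope map. The team names, resource types, and inventory are all invented for illustration; in practice the scopes would come from your cloud provider's access policies.

```python
# Hypothetical mapping of admin teams to the resource types they may change.
TEAM_SCOPES = {
    "network-team": {"vpc", "route-table", "vpn-gateway"},
    "app-team": {"vm", "storage-bucket"},
}


def can_administer(team: str, resource_type: str, scopes=TEAM_SCOPES) -> bool:
    """Unknown teams get an empty scope, so the default is deny."""
    return resource_type in scopes.get(team, set())


def blast_radius(team: str, inventory: dict[str, str], scopes=TEAM_SCOPES) -> list[str]:
    """List the resources a compromised or careless team account could affect."""
    return sorted(
        name for name, rtype in inventory.items()
        if can_administer(team, rtype, scopes)
    )


# Hypothetical inventory: resource name -> resource type.
inventory = {"prod-vpc": "vpc", "web-01": "vm", "backups": "storage-bucket"}
print(blast_radius("app-team", inventory))
```

If the list for any one team covers everything, including the backups, that is a hint the scopes are too broad.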
Another factor to consider is billing notification. If the spending authority, credit card or whatever expires and a notification goes to someone who left the company, that might be bad. Similarly, if your cloud automation has a mistake that just multiplied your daily costs by 10 or 100, you want to know about that ASAP, not when the bill comes in and is brought to your attention 1-2 months later. If you’re using API or third-party automation tools, there might be no one looking at a web dashboard that might clue you in.
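A cost spike check does not need to be elaborate. The sketch below assumes you can obtain recent daily cost figures from your provider's billing export (how you fetch them is not shown), and flags a day that exceeds a multiple of the recent median. The function name and the threshold factor of 3 are assumptions to tune, not recommendations.

```python
from statistics import median


def cost_spike(recent_daily_costs: list[float], today: float, factor: float = 3.0) -> bool:
    """Flag today's cost if it exceeds `factor` times the recent median."""
    if not recent_daily_costs:
        return False  # no baseline yet, nothing to compare against
    baseline = median(recent_daily_costs)
    return baseline > 0 and today > factor * baseline


history = [100.0, 110.0, 95.0, 105.0]  # illustrative daily costs
if cost_spike(history, today=1000.0):
    print("ALERT: daily cloud spend is far above the recent norm")
```

The median is used rather than the mean so that one earlier bad day does not raise the baseline and hide the next one. Send the alert to a group address, not to one person who might leave the company.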
Another concern is access keys. Micro-services are subject to access controls. It is wise to tightly limit what each micro-service can access. If the keys are left exposed, your data and perhaps more are at risk. From what I have read, a common mistake is having keys visible in either deployment scripts or API code, or in publicly readable log files or storage buckets. Reportedly, there are tools which will alert hackers to unsecured Amazon S3 buckets being accessible, perhaps rather quickly. A Google search suggests that claim is accurate. I’ve included one link in the References section with security suggestions.
Conclusion: your DevOps team needs to work with your security team to make sure keys are properly secured, and that consumed services are properly configured!
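As one example of a check the DevOps and security teams could agree on, here is a minimal sketch of scanning script or log text for exposed keys. It looks for the well-known shape of AWS long-term access key IDs ("AKIA" plus 16 uppercase letters or digits) and for a generic "secret = 'value'" pattern that is purely my own heuristic. It will miss plenty and is no substitute for a dedicated secret-scanning tool. The sample uses the placeholder key ID from AWS documentation.

```python
import re

# AWS long-term access key IDs: "AKIA" followed by 16 uppercase letters/digits.
AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")
# Heuristic (assumption): a sensitive-looking name assigned a quoted literal.
HARDCODED = re.compile(
    r"\b(secret|password|api[_-]?key|token)\b\s*[:=]\s*['\"][^'\"]{8,}['\"]",
    re.IGNORECASE,
)


def find_exposed_keys(text: str) -> list[tuple[int, str]]:
    """Return (line_number, finding_type) pairs for suspicious lines."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        if AWS_KEY_ID.search(line):
            findings.append((number, "aws-access-key-id"))
        if HARDCODED.search(line):
            findings.append((number, "hardcoded-secret"))
    return findings


sample = (
    'region = "us-east-1"\n'
    'aws_access_key_id = "AKIAIOSFODNN7EXAMPLE"\n'
    'password = "hunter2hunter2"\n'
)
for line_number, kind in find_exposed_keys(sample):
    print(f"line {line_number}: possible {kind}")
```

Running something like this before code is committed or deployed is cheaper than finding out from someone else that a key was public.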
If you think of the cloud as an extension of your corporate network and / or data centers, you extended connectivity to it. Did you firewall the connection from your internal network to the cloud?
If you have an internal firewall with access lists, that’s good.
If you’re using ACL rules in the cloud router, not so much. If someone gets admin access to your cloud account, they could change those rules. The risk may be mildly greater than on-premises, since you’re trusting your cloud vendor’s security concerning logins and access permissions.
Let’s follow that to its logical conclusion. If your admin credentials are exposed, a hacker could create new VM instances, or alter those that are in place, perhaps installing hacker tools. So, if you have an onsite firewall and it is only limiting internal access to known VM IPs, well, now the hacker can probe your internal network. That means tight IP and port-based rules for cloud to onsite traffic might be a good idea.
As far as whether to use a firewall, router, or switch to enforce those rules, that’s your choice.
The drawback: your DevOps team is no longer as agile if you lock it down to the IP and port level. IP subnet and ports would be a little more agile… It’s not an agility issue if you pre-negotiate what IPs and ports need to be reached from the cloud.
Having said that, you can perhaps think of it as a security contract around what ports will be exposed.
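The "security contract" idea can be sketched as an explicit allowlist of cloud source subnets, onsite destinations, and ports, with everything else denied. The addresses and ports below are invented for illustration, and this models only the rule logic, not any particular firewall or router syntax.

```python
from ipaddress import ip_address, ip_network

# Hypothetical contract: (cloud source subnet, onsite destination, TCP port).
ALLOWED = [
    ("10.50.1.0/24", "192.168.10.20/32", 443),   # cloud web tier -> onsite API
    ("10.50.2.0/24", "192.168.20.0/28", 5432),   # cloud app tier -> onsite DB
]


def is_allowed(src: str, dst: str, port: int, rules=ALLOWED) -> bool:
    """Default deny: traffic passes only if it matches a pre-negotiated rule."""
    source, destination = ip_address(src), ip_address(dst)
    return any(
        source in ip_network(src_net)
        and destination in ip_network(dst_net)
        and port == allowed_port
        for src_net, dst_net, allowed_port in rules
    )


print(is_allowed("10.50.1.7", "192.168.10.20", 443))  # True: matches rule 1
print(is_allowed("10.50.1.7", "192.168.10.21", 443))  # False: other host
print(is_allowed("10.50.9.9", "192.168.10.20", 443))  # False: unknown subnet
```

Using source subnets rather than single IPs gives DevOps room to add instances without a rule change, while a VM the attacker creates in an unlisted subnet still cannot reach anything onsite.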
If you’ve outsourced your server management, are the risks much different?
If you’re doing SaaS, it would be good to understand the SaaS provider’s RPO and RTO, and some of their DR / redundancy features. As I write this, Salesforce’s Pardot operation just had a big multi-day outage. Did that impact your employer?
What I’ve seen elsewhere is that the business may critically depend on a provider who is running a monolithic application out of a single data center. If that’s down, your business is down. If the outage is days (like Pardot), that could really impact the bottom line.
Having a candid discussion with each SaaS provider might be a good starting point, if you can get them to tell you anything. What are their DR / COOP provisions, and what is their high-level application architecture? How fast can they recover if a key database is corrupted or fails? If you don’t like what you hear, there might be some defensive measures you could take (like an internal replica of key data).
And if you think about it, if there’s a problem, which is going to be faster to recover: your company’s data, or data from 10s or 100s of companies like yours?
The above was somewhat focused on the negative and fear side of things. My intent was to identify some of the potential exposures, precisely with the thought that if you have some idea of what can go wrong, then you can start doing something about it.
What are the positive actions one might take?
NIST has good security / risk management documents. SANS as well. They can help you determine what the risks are, which ones your organization feels the need to do something about, establishing controls, and verifying they are working. There are other sources of such information.
For cloud (and IoT, and SaaS), you might want to start over and think outside the box. There may be risks not (previously) present in your internal network. The prior portion of this blog went into some of those.
The best-known cloud and SaaS providers are huge. If they’re good, they may well have deeper skills and better-defined processes than you might have in-house, especially small to medium-sized organizations. If they’re smaller or lean on budget, maybe not so much.
Suggestion: do some due diligence around how robust your cloud apps are, or how robust your SaaS providers’ operations are. A discussion of internal security controls and processes with each provider may also help. How would they notice that something had changed? What kind of changes would they notice? What keeps the provider’s staff from messing with your cloud networks or VM instances (the answer will probably include the word “keys”)?
Cloud (and other) security can start with securing your data at rest and in motion, and a secure key store.
Beyond that, thinking about risks and security controls may help you identify concerns and do something about them. Good communication with providers may help, if they’ll tell you some of what they do to prevent and detect unauthorized changes, provide backup, handle DR, etc. Consider doing some “due diligence” and see what your providers can and will tell you.
The same applies to High Availability features. We just encountered a well-known CoLo provider with sloppy practices around diverse electrical power supply to the cage. Process maturity matters!
Establishing security checks for DevOps teams, whether in-house or consultants, along with code and deployment audits, is also advisable.
Here are some reference links that might help get you started:
(And many other books on the topic)
Comments are welcome, whether in agreement or constructive disagreement about the above. I enjoy hearing from readers and carrying on deeper discussion via comments. Thanks in advance!