NetCraftsmen’s online retailer client had failed both an internal and an external PCI DSS audit and was paying fines. One more external audit failure would cost the client the ability to accept credit cards on its highly profitable eCommerce portal. The project had the attention of the CIO and other members of the C-suite because the business was at significant risk if it failed the next audit.
What exactly is PCI DSS?
The Payment Card Industry (PCI) Data Security Standard (DSS) identifies Card Holder Data (CHD) and defines how to protect it. The standard defines three categories of systems with regard to CHD:
- Category 1: any system that handles or stores CHD (1a), or a system so tightly linked with a 1a system that it cannot be separated (1b)
- Category 2: systems that manage Category 1 systems or send data to or receive data from them. This includes systems management, logging, NOC and SOC access, and the like.
- Category 3: systems with no access to CHD and which cannot access Category 1 systems
It also defines the communications between these categories:
Figure 1: PCI DSS Communications. Source https://www.pcisecuritystandards.org
Summary: NetCraftsmen started with a review of the failed audits, which led to an assessment of the environment. The assessment found that several hundred systems were involved and that much of the environment was still in transition from bare metal to VMware. The overall project involved network, security, server and application teams that were operating with no clear, coordinated direction from management.
There were multiple paths to a solution, so NetCraftsmen presented a summary of the workload associated with each and proposed leading the effort on an integrated solution.
Audit Failure Analysis
The audit failures looked random at first, but analysis showed a pattern. After change windows associated with data center security, well-known PCI Category 1 systems were often reachable by Category 3 systems or unreachable from other Category 1 systems. Adding or changing any Category 1 or 2 system required work across up to a dozen firewall pairs, and the resulting errors opened many security holes.
Scope of Required Firewall Rules
We quickly estimated that they were using over 500 PCI Category 1 (a & b) application systems that communicated among themselves and with over 100 PCI Category 2 systems. Compliance required the creation and testing of well over 100,000 IP address-based firewall rules using their existing firewall systems.
Further complicating the task, adding or changing any Category 1 and 2 systems was difficult to automate since each subsystem was different (some had only three firewall pairs while others used as many as 12).
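The rule-count estimate above can be sanity-checked with a little arithmetic. The sketch below is only an illustration, not the method NetCraftsmen used: it assumes one IP address-based rule per communicating host pair and treats every Category 1 system as a potential peer of every other Category 1 and Category 2 system. The function name and the `bidirectional` option are hypothetical.

```python
from math import comb


def estimate_pairwise_rules(cat1: int, cat2: int, bidirectional: bool = False) -> int:
    """Upper-bound count of host-pair rules under the stated assumptions.

    Assumption: one rule per host pair (two if rules are directional),
    with every Category 1 system paired with every other Category 1
    system and with every Category 2 system.
    """
    cat1_to_cat1 = comb(cat1, 2)   # unordered pairs among Category 1 systems
    cat1_to_cat2 = cat1 * cat2     # each Category 1 system with each Category 2 system
    total = cat1_to_cat1 + cat1_to_cat2
    return total * 2 if bidirectional else total


if __name__ == "__main__":
    # Figures from the estimate above: over 500 Category 1 and over 100 Category 2 systems.
    print(estimate_pairwise_rules(500, 100))
    print(estimate_pairwise_rules(500, 100, bidirectional=True))
```

Even this simplified count lands well above 100,000, before accounting for the fact that a single flow may have to be configured on several firewall pairs.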
There were additional complications surrounding the overall eCommerce environment.
- Applications – The Applications team was in the process of migrating from a waterfall development paradigm to agile, but this work had been very slow. The PCI DSS security issues were highly problematic for them and they did not like having their process burdened by the requirements imposed on Category 1 systems.
- Systems – The Systems team dealt with servers and operating systems and worked independently from the Applications and Network teams. They had selected VMware and partially deployed it, and had purchased VMware NSX but had not yet been trained on it when this project kicked off.
- Network – The data centers were very large, had been built over a period of 10 years, and contained three distinct generations of switching equipment. Applications were placed on systems based on available rack space and power instead of security and connectivity requirements, greatly complicating firewall configurations.
- Security – There was no cohesive strategy; security rules were added and deleted on request, which left unqualified application owners requesting the rules piecemeal.
A Path Forward
A consensus emerged to hold a PCI team meeting with leads from each organization, and NetCraftsmen was able to facilitate cooperative work:
- Applications – After some simple discussion (and some pressure from their CIO), the team realized the systems could be modularized and the number of Category 1 systems could be reduced dramatically without significant impact to their schedules. The final count of Category 1 systems was under 100.
- Systems – NetCraftsmen engineers assisted with spinning up an NSX demonstration and a decision was made to prioritize and migrate the eCommerce portal in its entirety to VMware.
- Network – NSX required changes to the data center fabric, so steps were taken to reconfigure the data centers to support the VXLAN protocol it needed.
- Security – NetCraftsmen developed a security strategy to align with the compliance model.
Summary: It was our belief that no single IT group could solve the issues. The solution was to engage all of the teams in a coordinated, all-out effort to meet the deadlines. This involved having the systems team accelerate the VMware conversion and bringing the network and security operations teams up to speed on the technology. In addition, NetCraftsmen worked with the compliance and applications teams on the importance of clearly identifying PCI-impacted systems.
The PCI standard documents the protected data and defines the communications permitted between categories (as shown in Figure 1 above).
To keep this simple, we created a new security strategy based on a network overlay, with the seven PCI DSS categories and subcategories (1a, 1b, 2a, 2b, 2c, 2x and 3) as our guide. All of the rules associated with PCI DSS compliance could now be enforced with firewall rules governing communications between the segments.
This reduced the solution from over 100,000 IP address-based firewall rules to fewer than 1,000 rules, with the bulk of the security handled by just a few dozen rules within VMware NSX.
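The reason the rule count collapses is that policy is written between categories rather than between hosts. The following is a minimal sketch of that idea in plain Python, not NSX configuration or API calls. The host names are hypothetical, only the three top-level categories are modeled, and the entries that do not involve Category 1 are assumptions, since the article only states that Category 1 systems talk to each other and to Category 2 systems and must not be reachable from Category 3.

```python
# Category-to-category policy (default deny: anything not listed is blocked).
ALLOWED_CATEGORY_FLOWS = {
    ("1", "1"),               # Category 1 systems communicate among themselves
    ("1", "2"), ("2", "1"),   # Category 2 systems manage / exchange data with Category 1
    # Assumption: flows that never touch Category 1 are left open in this sketch.
    ("2", "2"), ("2", "3"), ("3", "2"), ("3", "3"),
}

# Hypothetical inventory: each system is tagged once with its PCI category.
CATEGORY_OF = {
    "payment-db": "1",
    "checkout-app": "1",
    "log-collector": "2",
    "hr-portal": "3",
}


def is_permitted(src: str, dst: str) -> bool:
    """Decide a flow from the categories of its endpoints, not their IP addresses."""
    return (CATEGORY_OF[src], CATEGORY_OF[dst]) in ALLOWED_CATEGORY_FLOWS


if __name__ == "__main__":
    print(is_permitted("log-collector", "payment-db"))
    print(is_permitted("hr-portal", "payment-db"))
    # The policy size depends on the number of categories, not the number of hosts.
    print(len(ALLOWED_CATEGORY_FLOWS))
```

Adding a new system in this model means assigning it a category; the policy table itself does not grow.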
Furthermore, the attestation and audit processes were also greatly simplified:
Figure 2: PCI DSS Requirements and Security Assessment Procedures. Source https://www.pcisecuritystandards.org
There were no regulatory requirements regarding micro segmentation, but as a bonus the NSX Security Group features used for security segmentation also made it possible to limit lateral spread within segments.
Why do this? In modern eCommerce environments, multiple VMs exist for a given function, and resources can even spin up (and down) in response to load. Lateral spread between networks is well understood; lateral spread within a functional subnetwork is less so. Micro segmentation can prevent systems that run in parallel from infecting each other.
The customer passed their PCI audit and created systems, procedures and processes to maintain compliance.
The security segmentation strategy permitted easy attestation by the external auditor that no PCI Category 1 systems could be reached by any Category 3 systems (and vice versa). Furthermore, the varying rules associated with PCI Category 2 (a, b, c and x) were also greatly simplified.
While micro segmentation within each virtual network was not required by PCI DSS, the auditor noted that it also ensured protection from lateral spread.
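The attestation described here amounts to showing that no flow connects a Category 1 system and a Category 3 system in either direction. A hedged sketch of such a check follows; the flow list, host names and function name are hypothetical, and in practice the observed flows would come from whatever flow or firewall logs the environment provides.

```python
from typing import Dict, Iterable, List, Tuple

Flow = Tuple[str, str]  # (source system, destination system)


def find_category_violations(flows: Iterable[Flow], category_of: Dict[str, str]) -> List[Flow]:
    """Return every flow that links a Category 1 and a Category 3 system."""
    violations = []
    for src, dst in flows:
        # Direction does not matter: 1 -> 3 and 3 -> 1 are both violations.
        if {category_of[src], category_of[dst]} == {"1", "3"}:
            violations.append((src, dst))
    return violations


if __name__ == "__main__":
    # Hypothetical inventory and observed flows, for illustration only.
    categories = {"payment-db": "1", "log-collector": "2", "hr-portal": "3"}
    observed = [
        ("log-collector", "payment-db"),
        ("hr-portal", "payment-db"),
        ("hr-portal", "log-collector"),
    ]
    for src, dst in find_category_violations(observed, categories):
        print(f"violation: {src} -> {dst}")
```

An empty result over a representative set of flows is the kind of evidence that supports the attestation; it does not replace the auditor's own testing.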