
AWS EMR Cluster In VPC


Security

Risk level: Medium (should be achieved)

Ensure that your Amazon Elastic MapReduce (EMR) clusters are provisioned using the AWS EC2-VPC platform instead of the legacy EC2-Classic platform (superseded for new accounts on 2013.12.04) for better flexibility, tighter control over security, and better traffic routing and availability.

Launching and managing AWS EMR clusters on the EC2-VPC platform instead of EC2-Classic brings several advantages: a better networking infrastructure (network isolation, private subnets and private IP addresses), more flexible access control (network ACLs and outbound/egress traffic filtering with security groups) and access to newer, more powerful EC2 instance types (C4, M4, R4, etc.) for your clusters. If you process sensitive data within your EMR clusters, you may also want the additional access control that the EC2-VPC platform provides, which you get by launching your clusters into a VPC. Note: If your AWS account was created after 2013.12.04, it supports EC2-VPC only.

Audit

To determine the EC2 platform (EC2-Classic or EC2-VPC) used to launch your Amazon EMR clusters, perform the following:

Using AWS Console

01 Sign in to the AWS Management Console.

02 Navigate to the EC2 dashboard at https://console.aws.amazon.com/ec2/.

03 On the EC2 console dashboard, check the Supported Platforms value for your AWS account in the Account Attributes section (upper-right corner):

  1. If the Supported Platforms value is VPC, your account supports only the EC2-VPC platform and all your cluster instances are launched within a Virtual Private Cloud (VPC) environment, so no further checks are needed.
  2. If the Supported Platforms value is EC2 and VPC, your account supports both the EC2-Classic and EC2-VPC platforms. To identify any AWS EMR clusters launched using EC2-Classic, continue with the next step.

04 Navigate to the EMR dashboard at https://console.aws.amazon.com/elasticmapreduce/.

05 In the left navigation panel, under Amazon EMR, click Clusters to access your AWS EMR clusters page.

06 Select the EMR cluster that you want to examine, then click View details in the dashboard top menu.

07 On the selected cluster's details page, click the Summary tab to access the EMR cluster configuration details.

08 On the Summary panel, search for the Subnet ID configuration attribute. The Subnet ID attribute value references the identifier of the VPC subnet where the EMR cluster instances have been provisioned. If there is no Subnet ID attribute listed on the Summary panel, the selected Amazon Elastic MapReduce (EMR) cluster was launched using the EC2-Classic platform and needs to be migrated to the EC2-VPC platform (see Remediation/Resolution section).

09 Repeat steps no. 6 – 8 to verify the platform used by other EMR clusters available in the current region.

10 Change the AWS region from the navigation bar and repeat the audit process for other regions.

Using AWS CLI

01 Run the describe-account-attributes command (OSX/Linux/UNIX) to list the platform type(s) currently supported by your AWS account:

aws ec2 describe-account-attributes \
    --region us-east-1 \
    --attribute-names supported-platforms

02 The command output should return the platform type(s) supported by the current AWS account:

{
    "AccountAttributes": [
        {
            "AttributeName": "supported-platforms",
            "AttributeValues": [
                {
                    "AttributeValue": "EC2"
                },
                {
                    "AttributeValue": "VPC"
                }
            ]
        }
    ]
}

If the AttributeValues array returns only the VPC value, your AWS account supports only the EC2-VPC platform and all your EMR clusters are launched within a Virtual Private Cloud (VPC). If the AttributeValues array returns both the EC2 and VPC values (as shown in the output example above), your AWS account supports both the EC2-Classic and EC2-VPC platforms. To identify any EC2-Classic based EMR clusters, continue with the next step.
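The check above can be scripted. The following is a minimal sketch, assuming a POSIX shell and an AWS CLI already configured with credentials; the function name `supports_ec2_classic` is hypothetical. It runs the same describe-account-attributes call with a `--query` that keeps only the attribute values and `--output text`, which prints them tab-separated (for example `EC2	VPC`).

```shell
# Hypothetical helper: prints "yes" if the account still supports
# EC2-Classic in the given region, "no" if it is EC2-VPC only.
supports_ec2_classic() {
    local region="$1" platforms p result
    platforms=$(aws ec2 describe-account-attributes \
        --region "$region" \
        --attribute-names supported-platforms \
        --query 'AccountAttributes[0].AttributeValues[].AttributeValue' \
        --output text) || return 1
    result=no
    # Default IFS splits the tab-separated values into single words
    for p in $platforms; do
        if [ "$p" = "EC2" ]; then
            result=yes
        fi
    done
    echo "$result"
}
```

Example: `supports_ec2_classic us-east-1` prints `yes` for an account whose output matches the example above.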

03 Run the list-clusters command (OSX/Linux/UNIX) with custom query filters to list the identifiers (IDs) of all the active Amazon EMR clusters available in the selected region:

aws emr list-clusters \
    --region us-east-1 \
    --active \
    --output table \
    --query 'Clusters[*].Id'

04 The command output should return a table with the requested cluster IDs:

---------------------
|   ListClusters    |
+-------------------+
|  j-AAAABBBBCCCCD  |
|  j-BBBBCCCCDDDDE  |
+-------------------+ 

05 Run the describe-cluster command (OSX/Linux/UNIX) with the ID of the cluster that you want to examine, returned at the previous step, and custom query filters to expose the ID of the VPC subnet where the EMR cluster instances have been deployed:

aws emr describe-cluster \
    --region us-east-1 \
    --cluster-id j-AAAABBBBCCCCD \
    --query 'Cluster.Ec2InstanceAttributes.Ec2SubnetId'

06 If the selected EMR cluster was provisioned within a VPC, the command output returns the requested subnet ID, e.g. "subnet-aaaabbbb". If the describe-cluster command does not return a subnet ID, the cluster instances do not belong to a subnet, which means the selected Amazon Elastic MapReduce (EMR) cluster was not launched within a VPC using the EC2-VPC platform but was created using the EC2-Classic platform.
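Steps 05 and 06 can be wrapped in a small function. This is a sketch under stated assumptions: the function name `cluster_platform` is hypothetical, and because a missing attribute may be printed as `None` (or come back empty) in text output, all of those cases are treated as "no subnet".

```shell
# Hypothetical helper: classify one EMR cluster by its Ec2SubnetId.
# Prints "<cluster-id> EC2-VPC <subnet-id>" or "<cluster-id> EC2-Classic".
cluster_platform() {
    local region="$1" cluster_id="$2" subnet
    subnet=$(aws emr describe-cluster \
        --region "$region" \
        --cluster-id "$cluster_id" \
        --query 'Cluster.Ec2InstanceAttributes.Ec2SubnetId' \
        --output text) || return 1
    case "$subnet" in
        ""|None|null) echo "$cluster_id EC2-Classic" ;;
        *) echo "$cluster_id EC2-VPC $subnet" ;;
    esac
}
```

Example: `cluster_platform us-east-1 j-AAAABBBBCCCCD`.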

07 Repeat steps no. 5 and 6 to verify the platform used by other EMR clusters available in the current region.

08 Change the AWS region by updating the --region command parameter value and repeat steps no. 1 - 7 to perform the audit process for other regions.
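Instead of editing `--region` by hand, the per-region loop can be automated. The sketch below assumes a POSIX shell and a configured AWS CLI; the function name `audit_emr_platforms` is hypothetical, and it uses `aws ec2 describe-regions`, which by default lists only the regions enabled for your account. It prints one line per active cluster, flagging those without a subnet as EC2-Classic.

```shell
# Hypothetical helper: audit every active EMR cluster in every enabled region.
audit_emr_platforms() {
    local regions region cluster_id subnet
    # Any enabled region can answer describe-regions; us-east-1 is an assumption
    regions=$(aws ec2 describe-regions \
        --region us-east-1 \
        --query 'Regions[].RegionName' \
        --output text) || return 1
    for region in $regions; do
        # Same query as step 03, in text form so the IDs can be looped over
        for cluster_id in $(aws emr list-clusters --region "$region" --active \
                --query 'Clusters[].Id' --output text); do
            [ "$cluster_id" = "None" ] && continue
            subnet=$(aws emr describe-cluster --region "$region" \
                --cluster-id "$cluster_id" \
                --query 'Cluster.Ec2InstanceAttributes.Ec2SubnetId' \
                --output text)
            case "$subnet" in
                ""|None|null) echo "$region $cluster_id EC2-Classic" ;;
                *) echo "$region $cluster_id EC2-VPC $subnet" ;;
            esac
        done
    done
}
```

Piping the result through `grep EC2-Classic` leaves only the clusters that need to be migrated.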

Remediation / Resolution

To migrate your AWS EMR clusters from EC2-Classic platform to EC2-VPC platform, you must re-create your clusters within a Virtual Private Cloud (VPC). To relaunch and configure your EMR clusters in an AWS VPC, perform the following actions:

Using AWS Console

01 Log in to the AWS Management Console.

02 Navigate to the EMR dashboard at https://console.aws.amazon.com/elasticmapreduce/.

03 In the navigation panel, under Amazon EMR, click Clusters to access your AWS EMR clusters page.

04 Select the EMR cluster that you want to relaunch into a VPC (see the Audit section, Using AWS Console, to identify the right resource), then click Clone in the dashboard top menu.

05 Inside the Cloning <your-cluster-ID> dialog box, choose Yes to include the steps from the original cluster in the cloned cluster or No to clone the original cluster's configuration without including any of the existing steps. Click Clone to start the cloning process.

06 On the Create Cluster page, select Step 1: Software and Steps from the left navigation panel and configure the software that will be installed on the new cluster. Click Next to continue the setup process.

07 On the Hardware Configuration panel, select the VPC network and the EC2 subnet where the new EMR cluster instances will be provisioned, set the EBS volume size for the root device and configure the cluster nodes (instances) as needed. Click Next until you reach the Step 4: Security page, without changing any other configuration attributes.

08 Review the security options, then click Create Cluster to provision your new Amazon EMR cluster.

09 Once you have moved the existing data and verified that your new EMR cluster works as expected within the selected VPC network, terminate the original cluster to stop incurring charges for it. To terminate the original EMR cluster, launched with the EC2-Classic platform, perform the following:

  1. Go back to the navigation panel and under Amazon EMR choose Cluster list.
  2. Select the AWS EMR cluster that you want to shut down.
  3. Click on the Terminate button from the dashboard top menu.
  4. In the Terminate clusters confirmation box, review the original cluster details then click Terminate.

10 Repeat steps no. 4 - 9 to migrate other AWS EMR clusters, provisioned in the current region, from EC2-Classic platform to EC2-VPC platform.

11 Change the AWS region from the navigation bar and repeat the entire process for other regions.

Using AWS CLI

01 Get the configuration details of the original (EC2-Classic) EMR cluster, required for the next step. Run the describe-cluster command (OSX/Linux/UNIX) with the ID of the cluster that you want to re-create (see the Audit section, Using AWS CLI, to identify the right resource) to describe its configuration:

aws emr describe-cluster \
    --region us-east-1 \
    --cluster-id j-AAAABBBBCCCCD

02 The command output should return the running EMR cluster configuration information:

{
   "Cluster": {
     "Name": "cc-hadoop1-cluster",
     "ServiceRole": "EMR_DefaultRole",
     "Tags": [],
     "TerminationProtected": false,
     "NormalizedInstanceHours": 4,
 
     ...
 
     "ScaleDownBehavior": "TERMINATE_AT_INSTANCE_HOUR",
     "VisibleToAllUsers": true,
     "BootstrapActions": [],
     "LogUri": "s3n://aws-logs-123456789012-us-east-1/elasticmapreduce/",
     "AutoTerminate": false,
     "Id": "j-AAAABBBBCCCCD"
   }
}
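The instance layout of the original cluster is also needed for the --instance-groups parameter of create-cluster. The following is a minimal sketch, assuming a POSIX shell; the function name `instance_groups_arg` is hypothetical, and it reads the groups with `aws emr list-instance-groups`, printing them in the `InstanceGroupType=...,InstanceCount=...,InstanceType=...` form used in the next step. It copies only type, count and instance type; other group settings (EBS, market, etc.) are not carried over.

```shell
# Hypothetical helper: rebuild the --instance-groups value from an
# existing cluster (type, requested count and instance type only).
instance_groups_arg() {
    local region="$1" cluster_id="$2" groups
    groups=$(aws emr list-instance-groups \
        --region "$region" \
        --cluster-id "$cluster_id" \
        --query 'InstanceGroups[].[InstanceGroupType,RequestedInstanceCount,InstanceType]' \
        --output text) || return 1
    # One tab-separated line per group, e.g. "MASTER	1	c4.xlarge"
    printf '%s\n' "$groups" | while read -r type count itype; do
        if [ -n "$type" ]; then
            printf '%s ' "InstanceGroupType=$type,InstanceCount=$count,InstanceType=$itype"
        fi
    done | sed 's/ $//'
}
```

The output can be passed unquoted, e.g. `--instance-groups $(instance_groups_arg us-east-1 j-AAAABBBBCCCCD)`, so that each group becomes a separate argument.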

03 Run the create-cluster command (OSX/Linux/UNIX), using the configuration details returned at the previous step as values for the necessary parameters, to re-create the running (EC2-Classic) EMR cluster within the selected VPC network. The following command example creates an AWS Elastic MapReduce cluster named "cc-vpc-emr-cluster", with one c4.xlarge master instance and two c4.xlarge core instances, inside the VPC subnet identified by the ID "subnet-aaaabbbb":

aws emr create-cluster \
    --region us-east-1 \
    --name cc-vpc-emr-cluster \
    --release-label emr-4.0.0 \
    --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=c4.xlarge InstanceGroupType=CORE,InstanceCount=2,InstanceType=c4.xlarge \
    --service-role EMR_DefaultRole \
    --log-uri s3n://aws-logs-123456789012-us-east-1/elasticmapreduce/ \
    --ec2-attributes KeyName=SSHAccessKey,InstanceProfile=EMR_EC2_DefaultRole,EmrManagedMasterSecurityGroup=sg-aaaabbbb,EmrManagedSlaveSecurityGroup=sg-ddddeeee,AvailabilityZone=us-east-1a,SubnetId=subnet-aaaabbbb \
    --visible-to-all-users \
    --no-auto-terminate \
    --no-termination-protected

04 The command output should return the new EMR cluster ID:

{
    "ClusterId": "j-BBBBCCCCDDDDE"
}
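Before the original cluster is terminated, it helps to confirm that the new one actually started inside the intended subnet. This sketch assumes a POSIX shell; the function name `verify_vpc_cluster` is hypothetical. It uses the CLI waiter `aws emr wait cluster-running`, which polls the cluster and returns non-zero if the cluster terminates or the waiter gives up, and then compares the cluster's Ec2SubnetId with the subnet you expect.

```shell
# Hypothetical helper: wait for a new cluster to come up, then check
# that it runs in the expected VPC subnet.
verify_vpc_cluster() {
    local region="$1" cluster_id="$2" expected_subnet="$3" subnet
    aws emr wait cluster-running --region "$region" --cluster-id "$cluster_id" || return 1
    subnet=$(aws emr describe-cluster \
        --region "$region" \
        --cluster-id "$cluster_id" \
        --query 'Cluster.Ec2InstanceAttributes.Ec2SubnetId' \
        --output text) || return 1
    if [ "$subnet" = "$expected_subnet" ]; then
        echo "OK: $cluster_id is running in $subnet"
    else
        echo "MISMATCH: $cluster_id is in '$subnet', expected $expected_subnet" >&2
        return 1
    fi
}
```

To capture the new ID directly, append `--query ClusterId --output text` to the create-cluster command, e.g. `new_id=$(aws emr create-cluster ... --query ClusterId --output text)`, then run `verify_vpc_cluster us-east-1 "$new_id" subnet-aaaabbbb`.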

05 Once the original cluster data has been migrated and you have verified that your new EMR cluster works as expected within the selected VPC network, terminate the original cluster to stop incurring charges for it. To shut down the original EMR cluster, launched with the EC2-Classic platform, run the terminate-clusters command (OSX/Linux/UNIX) with its ID as identifier (the command does not produce an output):

aws emr terminate-clusters \
    --region us-east-1 \
    --cluster-ids j-AAAABBBBCCCCD
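If the original cluster has termination protection enabled (the describe-cluster output above shows "TerminationProtected": false, but other clusters may differ), it must be disabled before terminate-clusters can shut it down. The following is a minimal sketch, assuming a POSIX shell; the function name `terminate_original_cluster` is hypothetical. It checks the attribute, turns protection off with `aws emr modify-cluster-attributes --no-termination-protected` if needed, terminates the cluster and waits with `aws emr wait cluster-terminated`. Run it only after the new cluster has been verified.

```shell
# Hypothetical helper: terminate an EMR cluster, disabling termination
# protection first if it is enabled, then wait until it is gone.
terminate_original_cluster() {
    local region="$1" cluster_id="$2" protected
    protected=$(aws emr describe-cluster \
        --region "$region" \
        --cluster-id "$cluster_id" \
        --query 'Cluster.TerminationProtected' \
        --output text) || return 1
    case "$protected" in
        True|true)
            # Protection is on, so terminate-clusters would not shut it down
            aws emr modify-cluster-attributes --region "$region" \
                --cluster-id "$cluster_id" --no-termination-protected || return 1 ;;
    esac
    aws emr terminate-clusters --region "$region" --cluster-ids "$cluster_id" || return 1
    aws emr wait cluster-terminated --region "$region" --cluster-id "$cluster_id"
}
```

Example: `terminate_original_cluster us-east-1 j-AAAABBBBCCCCD`.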

06 Repeat steps no. 1 – 5 to migrate other AWS EMR clusters, provisioned in the current region, from EC2-Classic platform to EC2-VPC platform.

07 Change the AWS region by updating the --region command parameter value and repeat the entire process for other regions.


Publication date Dec 19, 2017