
AWS EMR Instance Type Generation

Cost optimisation | Performance efficiency

Risk level: Medium (should be achieved)

Ensure that all Amazon Elastic MapReduce (EMR) clusters provisioned within your AWS account are using the latest generation of instances in order to get better performance at lower cost.

By using the latest generation of Amazon Elastic MapReduce instances instead of the previous generation, you can upgrade your EMR clusters to better hardware (faster CPUs, additional memory, superior I/O and higher network throughput) at lower cost. For example, the new generation memory-optimized (R3) instances are 9% faster than the previous ones and the compute-optimized (C3 and C4) instances are 37% faster than the old generation (C1) instances. On top of these performance improvements, the latest generation instances are cheaper than the old ones, e.g. a c3.xlarge instance provisioned by AWS EMR in the US East region costs $0.263/hour whereas an old generation c1.xlarge instance costs $0.640/hour.
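To put the price difference above into proportion, the following minimal sketch computes the percentage saved per instance-hour. The function name savings_pct is hypothetical, and the two prices are the US East figures quoted in this article, which may no longer be current.

```shell
#!/bin/sh
# Percentage saved per instance-hour when moving from one hourly price to another.
# savings_pct is a hypothetical helper name used only for this illustration.
savings_pct() {
    # $1 = previous generation hourly price, $2 = current generation hourly price
    awk -v prev="$1" -v cur="$2" 'BEGIN { printf "%.1f\n", (prev - cur) / prev * 100 }'
}

# c1.xlarge ($0.640/hour) versus c3.xlarge ($0.263/hour), as quoted in this article
savings_pct 0.640 0.263   # prints 58.9
```

With the quoted prices, each c1.xlarge replaced by a c3.xlarge costs roughly 59% less per hour.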

Audit

The following table (named EMR Previous Generation Instance Types) will help you to identify any previous generation EMR instance types in use:

Instance Family      Previous Generation Instance Types
General Purpose      m1.small | m1.medium | m1.large | m1.xlarge
Memory Optimized     m2.xlarge | m2.2xlarge | m2.4xlarge | cr1.8xlarge
Compute Optimized    c1.medium | c1.xlarge | cc2.8xlarge
Storage Optimized    hi1.4xlarge | hs1.8xlarge
GPU Instances        cg1.4xlarge
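The table lookup can also be expressed as a small shell function, which is convenient when checking many instance types. This is a sketch only: the function name is_previous_generation is hypothetical, and the list simply mirrors the table in this article.

```shell
#!/bin/sh
# Return 0 if the given instance type appears in the
# EMR Previous Generation Instance Types table, 1 otherwise.
# is_previous_generation is a hypothetical helper name.
is_previous_generation() {
    case "$1" in
        m1.small|m1.medium|m1.large|m1.xlarge)       return 0 ;;  # General Purpose
        m2.xlarge|m2.2xlarge|m2.4xlarge|cr1.8xlarge) return 0 ;;  # Memory Optimized
        c1.medium|c1.xlarge|cc2.8xlarge)             return 0 ;;  # Compute Optimized
        hi1.4xlarge|hs1.8xlarge)                     return 0 ;;  # Storage Optimized
        cg1.4xlarge)                                 return 0 ;;  # GPU Instances
        *)                                           return 1 ;;  # not in the table
    esac
}

if is_previous_generation c1.xlarge; then
    echo "c1.xlarge is a previous generation instance type"
fi
```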

To determine if your Amazon Elastic MapReduce (EMR) clusters are using instance types from the previous generation, perform the following:

Using AWS Console

01 Log in to the AWS Management Console.

02 Navigate to the EMR dashboard at https://console.aws.amazon.com/elasticmapreduce/.

03 In the left navigation panel, under Amazon EMR, click Cluster list to access your AWS EMR clusters page.

04 Select the EMR cluster that you want to examine then click on the View details button from the dashboard top menu.

05 On the selected cluster configuration details page, click on the Hardware tab to expand the EMR cluster hardware panel.

06 Inside the Instance Groups section, check the instance type displayed in the Instance Type column for each instance group.

If the Instance Type value for any master, core or task instance group is listed in the EMR Previous Generation Instance Types table (Audit section), the selected Amazon Elastic MapReduce cluster is using instances from a previous generation, therefore a hardware upgrade is highly recommended (see Remediation/Resolution section for the upgrade process).

07 Repeat steps no. 4 – 6 to determine the instance type for other AWS EMR clusters provisioned in the current region.

08 Change the AWS region from the navigation bar and repeat the entire audit process for other regions.

Using AWS CLI

01 Run the list-clusters command (OSX/Linux/UNIX) using custom query filters to list the identifiers (IDs) of all the active Amazon EMR clusters available in the selected region:

aws emr list-clusters \
    --region us-east-1 \
    --active \
    --output table \
    --query 'Clusters[*].Id'

02 The command output should return a table with the requested cluster IDs:

---------------------
|   ListClusters    |
+-------------------+
|  j-2U0II6K04FDNX  |
|  j-2PD91U2E1F3MX  |
+-------------------+

03 Run the describe-cluster command (OSX/Linux/UNIX) using the ID of the cluster that you want to examine, returned at the previous step, and custom query filters to expose the instance type(s) used by the selected Amazon EMR cluster:

aws emr describe-cluster \
    --region us-east-1 \
    --cluster-id j-2U0II6K04FDNX \
    --query 'Cluster.InstanceGroups[*].InstanceType'

04 The command output should return the cluster instance(s) type:

[
    "c1.xlarge"
]

If the value returned by the command output is listed in the EMR Previous Generation Instance Types table, the selected Amazon Elastic MapReduce cluster is using instances from a previous generation, therefore an upgrade to the current generation is highly recommended.

05 Repeat steps no. 3 and 4 for each Amazon EMR cluster available in the current region.

06 Change the AWS region by updating the --region command parameter value and repeat steps no. 1 - 5 to perform the audit process for other regions.
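The CLI audit steps above can be combined into one loop per region. The following is a minimal sketch, assuming the AWS CLI is installed and configured and that the clusters use instance groups (as in the describe-cluster query above). The names PREVIOUS_GEN and audit_region are hypothetical; the list of types is copied from the EMR Previous Generation Instance Types table.

```shell
#!/bin/sh
# Previous generation types, copied from the table in the Audit section.
PREVIOUS_GEN="m1.small m1.medium m1.large m1.xlarge m2.xlarge m2.2xlarge m2.4xlarge cr1.8xlarge c1.medium c1.xlarge cc2.8xlarge hi1.4xlarge hs1.8xlarge cg1.4xlarge"

# Print "<cluster-id> <instance-type>" for every instance group of an active
# cluster in the given region that uses a previous generation instance type.
# audit_region is a hypothetical helper name.
audit_region() {
    region="$1"
    # Step 01: IDs of all active clusters (tab-separated with --output text)
    for id in $(aws emr list-clusters --region "$region" --active \
                    --query 'Clusters[*].Id' --output text); do
        # Step 03: instance types used by the cluster's instance groups
        for itype in $(aws emr describe-cluster --region "$region" --cluster-id "$id" \
                           --query 'Cluster.InstanceGroups[*].InstanceType' --output text); do
            # Step 04: flag the type if it appears in the previous generation list
            case " $PREVIOUS_GEN " in
                *" $itype "*) echo "$id $itype" ;;
            esac
        done
    done
}
```

To cover several regions, call the function once per region, for example `for r in us-east-1 eu-west-1; do audit_region "$r"; done`. An empty result means no previous generation instance types were found among the instance groups inspected.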

Remediation / Resolution

The following table will help you choose the equivalent current generation instance type required for the AWS EMR hardware upgrade process:

EMR Previous Generation Instance Types               EMR Current Generation Instance Types
m1.small | m1.medium | m1.large | m1.xlarge          t2.small | m3.medium | m3.large | m3.xlarge
c1.medium | c1.xlarge | cc2.8xlarge                  c3.large | c3.xlarge | c3.2xlarge
m2.xlarge | m2.2xlarge | m2.4xlarge | cr1.8xlarge    r3.large | r3.xlarge | r3.2xlarge | r3.8xlarge
hi1.4xlarge | hs1.8xlarge                            i2.4xlarge | d2.4xlarge
cg1.4xlarge                                          g2.8xlarge
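For scripted upgrades, the table can be turned into a lookup function. The sketch below assumes that the entries in each row pair up by position (first with first, second with second, and so on); the table does not state this explicitly, so compare vCPU and memory sizes before adopting a suggestion. The function name suggest_current_generation is hypothetical.

```shell
#!/bin/sh
# Suggest a current generation instance type for a previous generation one.
# ASSUMPTION: entries in each table row are paired by position; the table
# itself does not say so. suggest_current_generation is a hypothetical name.
suggest_current_generation() {
    case "$1" in
        m1.small)    echo t2.small ;;
        m1.medium)   echo m3.medium ;;
        m1.large)    echo m3.large ;;
        m1.xlarge)   echo m3.xlarge ;;
        c1.medium)   echo c3.large ;;
        c1.xlarge)   echo c3.xlarge ;;
        cc2.8xlarge) echo c3.2xlarge ;;
        m2.xlarge)   echo r3.large ;;
        m2.2xlarge)  echo r3.xlarge ;;
        m2.4xlarge)  echo r3.2xlarge ;;
        cr1.8xlarge) echo r3.8xlarge ;;
        hi1.4xlarge) echo i2.4xlarge ;;
        hs1.8xlarge) echo d2.4xlarge ;;
        cg1.4xlarge) echo g2.8xlarge ;;
        *)           echo "$1" ;;   # not in the table: return the type unchanged
    esac
}

suggest_current_generation c1.xlarge   # prints c3.xlarge
```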

To upgrade your previous generation EMR instances to their latest generation equivalents you need to clone the required clusters and change their instances type by performing the following actions:

Using AWS Console

01 Log in to the AWS Management Console.

02 Navigate to the EMR dashboard at https://console.aws.amazon.com/elasticmapreduce/.

03 In the navigation panel, under Amazon EMR, click Cluster list to access your AWS EMR clusters page.

04 Select the EMR cluster that you want to upgrade (see Audit section part I to identify the right resource) then click on the Clone button from the dashboard top menu.

05 Inside the Cloning <your-cluster-ID> dialog box, choose Yes to include the steps from the original cluster in the cloned cluster or No to clone the original cluster's configuration without including any of the existing steps. Click Clone to start the cloning process.

06 On the Create Cluster page, select Step 2: Hardware from the left navigation panel to access the cloned cluster instances configuration.

07 On the Hardware Configuration panel, select the equivalent current generation instance type from the EC2 instance type dropdown list for every instance group, regardless of its role (i.e. master, core or task). Consult the table at the top of the Remediation / Resolution section to find the equivalent instance type required.

08 Click the Next button until you reach the Step 4: Security page, without changing any other configuration attributes.

09 Now click Create Cluster to provision your new (cloned) AWS EMR cluster.

10 Once you have moved the existing cluster data and verified that your new EMR cluster works as expected with the current generation instance type, terminate the original cluster to stop incurring charges for it. To terminate the old EMR cluster, perform the following:

  1. Go back to the navigation panel and under Amazon EMR choose Cluster list.
  2. Select the AWS EMR cluster that you want to shut down.
  3. Click on the Terminate button from the dashboard top menu.
  4. In the Terminate clusters confirmation box, review the original cluster details then click Terminate.

11 Repeat steps no. 4 - 10 to upgrade the instance type for other Amazon EMR clusters provisioned in the current region.

12 Change the AWS region from the navigation bar and repeat the remediation process for other regions.

Using AWS CLI

01 Get the configuration details of the running (original) EMR cluster, required for the next step. Run the describe-cluster command (OSX/Linux/UNIX) using the ID of the cluster that you want to re-create (see Audit section part II to identify the right resource), to describe all its configuration details:

aws emr describe-cluster \
    --region us-east-1 \
    --cluster-id j-2U0II6K04FDNX

02 The command output should return the running EMR cluster configuration information:

{
   "Cluster": {
     "Name": "HadoopEMRCluster",
     "ServiceRole": "EMR_DefaultRole",
     "Tags": [],
     "TerminationProtected": false,
     "ReleaseLabel": "emr-4.6.0",
     "NormalizedInstanceHours": 4,

     ...

     "MasterPublicDnsName": "ec2-183-53-85-245.compute-1.amazonaws.com",
     "ScaleDownBehavior": "TERMINATE_AT_INSTANCE_HOUR",
     "VisibleToAllUsers": true,
     "BootstrapActions": [],
     "LogUri": "s3n://aws-logs-123456789012-us-east-1/elasticmapreduce/",
     "AutoTerminate": false,
     "Id": "j-2U0II6K04FDNX",
     "Configurations": []
   }
}

03 Run the create-cluster command (OSX/Linux/UNIX), using the configuration details returned at the previous step as values for the necessary parameters, to re-create the running EMR cluster with the equivalent instance type from the current generation. The following command example creates an AWS Elastic MapReduce cluster named NewHadoopEMRCluster with one c3.xlarge master instance and two c3.xlarge core instances:

aws emr create-cluster \
    --region us-east-1 \
    --name NewHadoopEMRCluster \
    --release-label emr-4.6.0 \
    --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=c3.xlarge InstanceGroupType=CORE,InstanceCount=2,InstanceType=c3.xlarge \
    --service-role EMR_DefaultRole \
    --log-uri s3n://aws-logs-123456789012-us-east-1/elasticmapreduce/ \
    --ec2-attributes KeyName=SSHAccessKey,InstanceProfile=EMR_EC2_DefaultRole,EmrManagedMasterSecurityGroup=sg-5a4f68e0,EmrManagedSlaveSecurityGroup=sg-576f63e9,AvailabilityZone=us-east-1b \
    --visible-to-all-users \
    --no-auto-terminate \
    --no-termination-protected

04 The command output should return the new EMR cluster ID:

{
    "ClusterId": "j-1LXVEF6NDU9I2"
}

05 Once the original cluster data is migrated and you have verified that your new EMR cluster works as expected with the current generation instance type, terminate the original cluster to stop incurring charges for it. To shut down the old EMR cluster, run the terminate-clusters command (OSX/Linux/UNIX) using its ID as identifier (the command does not produce an output):

aws emr terminate-clusters \
    --region us-east-1 \
    --cluster-ids j-2U0II6K04FDNX

06 Repeat steps no. 1 – 5 for each Amazon EMR cluster that requires instance type upgrades, available in the current region.

07 Change the AWS region by updating the --region command parameter value and repeat the entire process for other regions.
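When re-creating a cluster as described in steps 01 to 03 of this procedure, the --instance-groups value can be derived from the original cluster instead of being typed by hand. The following is a minimal sketch, assuming the AWS CLI is configured, the original cluster uses instance groups, and the same new instance type is applied to every group. The function name build_instance_groups is hypothetical.

```shell
#!/bin/sh
# Print one --instance-groups entry per instance group of the original cluster,
# keeping each group's type and requested count but swapping in a new instance type.
# build_instance_groups is a hypothetical helper name.
build_instance_groups() {
    region="$1"; cluster_id="$2"; new_type="$3"
    # With --output text, each instance group is printed on its own line as
    # "<InstanceGroupType><TAB><RequestedInstanceCount>"
    aws emr describe-cluster --region "$region" --cluster-id "$cluster_id" \
        --query 'Cluster.InstanceGroups[*].[InstanceGroupType,RequestedInstanceCount]' \
        --output text |
    while read -r group count; do
        printf 'InstanceGroupType=%s,InstanceCount=%s,InstanceType=%s\n' \
            "$group" "$count" "$new_type"
    done
}
```

The output can then be passed, unquoted so that each line becomes a separate argument, to create-cluster, for example `--instance-groups $(build_instance_groups us-east-1 j-2U0II6K04FDNX c3.xlarge)`. Review the generated values before creating the cluster, since the remaining parameters still have to be copied from the describe-cluster output manually.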

Publication date Feb 24, 2017