AWS ElasticSearch Cluster Status

Performance efficiency

Risk level: High (not acceptable risk)

Ensure that your AWS ElasticSearch (ES) clusters are healthy, i.e. they all have the shard allocation status set to "Green". When an Amazon ES cluster is unhealthy, the shard allocation status is set to "Red", which means that at least one primary shard and its replicas are not allocated to a node. The most common causes of a "Red" status are failed cluster nodes and ElasticSearch process crashes due to a continuous heavy processing load. To get notified when your Amazon ES clusters become unhealthy, so that you can implement a plan to recover them, Cloud Conformity recommends creating AWS CloudWatch alarms that are triggered whenever a cluster's health status remains "Red" for longer than one minute.
The AWS CloudWatch metric used to detect unhealthy ElasticSearch clusters (Red) is:

ClusterStatus.red – which indicates that the primary and replica shards of at least one index are not allocated to nodes within an ES cluster. Relevant statistic: Maximum. Units: Count.

Detecting unhealthy Amazon ES clusters with the status set to "Red" is imperative for the availability of your ElasticSearch applications. Also, the AWS ElasticSearch service stops taking automatic snapshots while the cluster status is set to "Red", and when this status persists for more than 16 days, permanent data loss can occur.

Audit

To identify unhealthy Amazon ElasticSearch (ES) clusters, perform the following actions:

Using AWS Console

01 Sign in to the AWS Management Console.

02 Navigate to the ElasticSearch (ES) dashboard at https://console.aws.amazon.com/es/.

03 Choose the ES cluster (domain) that you want to examine and click on the domain name (link) to access its configuration page.

04 Select the Cluster health tab from the dashboard top panel, then check the Status attribute value available in the Summary section. If the attribute value is set to Red, the selected Amazon ElasticSearch cluster is unhealthy and needs to be recovered.

05 Repeat steps no. 3 and 4 to determine if there are other unhealthy Amazon ES clusters provisioned within the current region.

06 Change the AWS region from the navigation bar and repeat the process for the other regions.

Using AWS CLI

01 Run the list-domain-names command (OSX/Linux/UNIX) using custom query filters to list the names of the AWS ElasticSearch (ES) clusters available in the selected region:

aws es list-domain-names \
	--region us-east-1 \
	--query 'DomainNames[*].DomainName'

02 The command output should return the requested ES cluster names:

[
    "cc-project5-cluster",
    "cc-prod-es-cluster"
]

03 Run the get-metric-statistics command (OSX/Linux/UNIX) to get the statistics recorded by AWS CloudWatch for the ClusterStatus.red metric. The following command example returns values greater than or equal to 1 if the Amazon ElasticSearch cluster identified by the name "cc-project5-cluster" had the shard allocation status set to "Red" during the requested time frame, i.e. when at least one primary shard and its replicas were not allocated to a node within the cluster. Amazon ES metrics are published with both the DomainName and ClientId dimensions, so replace 123456789012 with your own AWS account ID:

aws cloudwatch get-metric-statistics \
	--region us-east-1 \
	--metric-name ClusterStatus.red \
	--start-time 2018-12-16T17:03:10Z \
	--end-time 2018-12-17T17:03:10Z \
	--period 3600 \
	--namespace AWS/ES \
	--statistics Maximum \
	--dimensions Name=DomainName,Value=cc-project5-cluster Name=ClientId,Value=123456789012

04 The command output should return the requested ElasticSearch metric data:

{
    "Datapoints": [
        {
            "Timestamp": "2018-12-16T17:03:10Z",
            "Maximum": 1.333,
            "Unit": "Count"
        },
        {
            "Timestamp": "2018-12-16T18:03:10Z",
            "Maximum": 1.333,
            "Unit": "Count"
        },
        {
            "Timestamp": "2018-12-16T19:03:10Z",
            "Maximum": 1.333,
            "Unit": "Count"
        },
 
        ...
 
        {
            "Timestamp": "2018-12-17T15:03:10Z",
            "Maximum": 1.333,
            "Unit": "Count"
        },
        {
            "Timestamp": "2018-12-17T16:03:10Z",
            "Maximum": 1.333,
            "Unit": "Count"
        },
        {
            "Timestamp": "2018-12-17T17:03:10Z",
            "Maximum": 1.333,
            "Unit": "Count"
        }
    ],
    "Label": "ClusterStatus.red"
}

If the "Maximum" (statistic) attribute value is greater than or equal to 1, as shown in the example above, the ES cluster has the shard allocation status set to "Red", therefore the selected Amazon ElasticSearch cluster is unhealthy.

05 Repeat steps no. 3 and 4 to determine if there are other unhealthy Amazon ES clusters available in the selected region.

06 Change the AWS region by updating the --region command parameter value and repeat steps no. 1 – 5 to perform the audit process for other regions.
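
The audit steps above can be combined into a small script. The following is a minimal sketch, not a definitive implementation: it assumes the AWS CLI is installed and configured, that the caller is allowed to run ec2 describe-regions, and that Amazon ES metrics are published with both the DomainName and ClientId (AWS account ID) dimensions. The function names count_red_datapoints and list_red_domains are illustrative and are not part of the AWS CLI.

```shell
# Count the hourly datapoints in which ClusterStatus.red reached 1.
# Args: region, domain name, AWS account ID, start time, end time (ISO 8601).
count_red_datapoints() {
    aws cloudwatch get-metric-statistics \
        --region "$1" \
        --namespace AWS/ES \
        --metric-name ClusterStatus.red \
        --dimensions "Name=DomainName,Value=$2" "Name=ClientId,Value=$3" \
        --start-time "$4" \
        --end-time "$5" \
        --period 3600 \
        --statistics Maximum \
        --query 'length(Datapoints[?Maximum >= `1`])' \
        --output text
}

# Print "region<TAB>domain" for every ES domain, in every region, that
# reported a "Red" status during the requested time frame.
# Args: AWS account ID, start time, end time (ISO 8601).
list_red_domains() {
    for region in $(aws ec2 describe-regions \
            --query 'Regions[].RegionName' --output text); do
        for domain in $(aws es list-domain-names --region "$region" \
                --query 'DomainNames[].DomainName' --output text); do
            red_count=$(count_red_datapoints "$region" "$domain" "$1" "$2" "$3")
            # A non-numeric or empty count is treated as "not Red".
            if [ "${red_count:-0}" -ge 1 ] 2>/dev/null; then
                printf '%s\t%s\n' "$region" "$domain"
            fi
        done
    done
}
```

For example, calling list_red_domains 123456789012 2018-12-16T17:03:10Z 2018-12-17T17:03:10Z prints one line per unhealthy cluster. The time frame is passed in explicitly because the options of the date command differ between OSX and Linux.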

Remediation / Resolution

Step 1: Create and configure the Amazon CloudWatch alarm required to send alert notifications whenever your ElasticSearch cluster health status becomes "Red" for more than one minute:

Using AWS Console

01 Sign in to the AWS Management Console.

02 Navigate to SNS dashboard at https://console.aws.amazon.com/sns/v2/.

03 In the navigation panel, select Topics and click the Create new topic button.

04 In the Create new topic dialog box, enter a name and a display name for your new SNS topic, then click Create Topic.

05 Open the newly created SNS topic configuration page by clicking on its Amazon Resource Name (ARN) link.

06 Under Subscription section, click Create Subscription.

07 Select Email as subscription protocol from the Protocol dropdown list.

08 In the Endpoint box, enter the email address where you want to receive the AWS CloudWatch alarm notifications, then click Create Subscription to create the required subscription.

09 Use your preferred email client application to open the message received from AWS Notifications, then click on the appropriate link to confirm your new email subscription.

10 Navigate to CloudWatch dashboard at https://console.aws.amazon.com/cloudwatch/.

11 In the left navigation panel, click Alarms.

12 Click Create Alarm button from the dashboard top menu to initiate the alarm setup process.

13 On the Create new alarm page, provide the following information:

  1. Within the Metric section, click the Select metric button and select the ClusterStatus.red metric from the list of metrics available for the AWS ElasticSearch cluster that you want to monitor.
  2. Inside the Alarm details section, in the Name and Description boxes, provide a unique name and a short description for your new CloudWatch alarm.
  3. Under Whenever: ClusterStatus.red, select >= (greater than or equal to) from the is dropdown list and enter 1 as the threshold value in the box next to the dropdown list to trigger the alarm every time the cluster status gets set to "Red". Type 1 in the datapoints box so that the alarm fires as soon as a single one-minute data point reaches the threshold.
  4. In the Actions section, click the + Notification button, select State is ALARM from the Whenever this alarm dropdown menu and choose the name of the AWS SNS topic created earlier from the Send notification to dropdown list.
  5. Review the alarm configuration details, then click Create Alarm. Once created, the new CloudWatch alarm will be listed on the Alarms page. After the monitoring data is loaded, the State (status) of the new CloudWatch alarm will change from INSUFFICIENT_DATA to OK.
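
The state change described in the last step can also be checked from the command line. The sketch below is a minimal example, assuming the AWS CLI is installed and configured; the function names alarm_state and wait_for_alarm_data, and the polling approach itself, are illustrative choices rather than a built-in AWS CLI feature.

```shell
# Print the current state of a CloudWatch alarm
# (OK, ALARM or INSUFFICIENT_DATA; "None" if the alarm is not found).
# Args: region, alarm name.
alarm_state() {
    aws cloudwatch describe-alarms \
        --region "$1" \
        --alarm-names "$2" \
        --query 'MetricAlarms[0].StateValue' \
        --output text
}

# Poll until the alarm leaves INSUFFICIENT_DATA, then print its state.
# Returns 1 if the state has not changed after the given number of attempts.
# Args: region, alarm name, max attempts, seconds between attempts.
wait_for_alarm_data() {
    attempt=0
    while [ "$attempt" -lt "$3" ]; do
        state=$(alarm_state "$1" "$2") || return 2
        if [ "$state" != "INSUFFICIENT_DATA" ] && [ "$state" != "None" ]; then
            printf '%s\n' "$state"
            return 0
        fi
        attempt=$((attempt + 1))
        sleep "$4"
    done
    return 1
}
```

For example, wait_for_alarm_data us-east-1 my-alarm-name 10 30 checks a hypothetical alarm named my-alarm-name every 30 seconds, up to 10 times.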

Using AWS CLI

01 First, run the create-topic command (OSX/Linux/UNIX) to create a new SNS topic for sending email notifications whenever the required AWS CloudWatch alarm is triggered:

aws sns create-topic \
	--name cc-unhealthy-cluster-notifications

02 The command output should return the ARN for the newly created AWS SNS topic:

{
   "TopicArn": "arn:aws:sns:us-east-1:123456789012:cc-unhealthy-cluster-notifications"
}

03 Run the subscribe command (OSX/Linux/UNIX) to send the subscription confirmation message to the notification endpoint (the email address provided as endpoint):

aws sns subscribe \
	--topic-arn arn:aws:sns:us-east-1:123456789012:cc-unhealthy-cluster-notifications \
	--protocol email \
	--notification-endpoint notifications@cloudconformity.com

04 Run the confirm-subscription command (OSX/Linux/UNIX) to confirm the email subscription by validating the token sent to the selected notification endpoint (if successful, the command returns the ARN of the confirmed subscription):

aws sns confirm-subscription \
	--topic-arn arn:aws:sns:us-east-1:123456789012:cc-unhealthy-cluster-notifications \
	--token du25e15f37fb687f5d51e6e241d7700ae02f7124d8268910b858cb4db727cesb2474bb937929d3bdd7ce5d0cce19325d036bc498d3c217426bcafa9c501a2cace93b83f1dd3797627467553dc438a8c974119496fc3eff026eaa5d14472ded6f9a5c43aec62d83ef5f49109da71efb7g664

05 Run the put-metric-alarm command (OSX/Linux/UNIX) to create the AWS CloudWatch alarm that fires every time the status of the specified Amazon ElasticSearch cluster gets set to "Red" (i.e. the cluster becomes unhealthy). The following command example creates an AWS CloudWatch alarm named "cc-unhealthy-es-cluster-alarm", within the US East (N. Virginia) region, for the "ClusterStatus.red" metric of the "cc-project5-cluster" cluster. The alarm sends notifications to the SNS topic named "cc-unhealthy-cluster-notifications" when the cluster becomes unhealthy. The --dimensions parameter identifies the monitored cluster by its domain name and by the ID of the AWS account that owns it, so replace both values with your own (if successful, the command does not produce an output):

aws cloudwatch put-metric-alarm \
	--region us-east-1 \
	--alarm-name cc-unhealthy-es-cluster-alarm \
	--alarm-description "Triggered by the 'Red' health status." \
	--metric-name ClusterStatus.red \
	--namespace AWS/ES \
	--dimensions Name=DomainName,Value=cc-project5-cluster Name=ClientId,Value=123456789012 \
	--statistic Maximum \
	--comparison-operator GreaterThanOrEqualToThreshold \
	--evaluation-periods 1 \
	--period 60 \
	--threshold 1 \
	--actions-enabled \
	--alarm-actions arn:aws:sns:us-east-1:123456789012:cc-unhealthy-cluster-notifications
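
The CLI steps above can be chained so that the topic ARN returned by create-topic is passed straight to the alarm instead of being copied by hand. The following is a minimal sketch under stated assumptions: the AWS CLI is installed and configured, Amazon ES metrics carry both the DomainName and ClientId (AWS account ID) dimensions, and the function name create_red_status_alarm is illustrative. The email subscription still has to be confirmed by the recipient before notifications are delivered.

```shell
# Create the SNS topic, subscribe an email address and create the alarm.
# Prints the topic ARN on success.
# Args: region, ES domain name, AWS account ID, email address.
create_red_status_alarm() {
    # create-topic returns the ARN of an existing topic with the same name,
    # so running the function again does not create a duplicate topic.
    topic_arn=$(aws sns create-topic \
        --region "$1" \
        --name cc-unhealthy-cluster-notifications \
        --query TopicArn --output text) || return 1

    # The subscription stays pending until it is confirmed by email.
    aws sns subscribe \
        --region "$1" \
        --topic-arn "$topic_arn" \
        --protocol email \
        --notification-endpoint "$4" >/dev/null || return 1

    aws cloudwatch put-metric-alarm \
        --region "$1" \
        --alarm-name cc-unhealthy-es-cluster-alarm \
        --alarm-description "Triggered by the 'Red' health status." \
        --namespace AWS/ES \
        --metric-name ClusterStatus.red \
        --dimensions "Name=DomainName,Value=$2" "Name=ClientId,Value=$3" \
        --statistic Maximum \
        --comparison-operator GreaterThanOrEqualToThreshold \
        --evaluation-periods 1 \
        --period 60 \
        --threshold 1 \
        --actions-enabled \
        --alarm-actions "$topic_arn" || return 1

    printf '%s\n' "$topic_arn"
}
```

For example: create_red_status_alarm us-east-1 cc-project5-cluster 123456789012 notifications@cloudconformity.com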

Step 2: Recovering unhealthy Amazon ElasticSearch clusters can be a complex task so you may want the AWS support team to assist. To ask AWS for assistance, create a support case using the Support Center console, as shown in the example below:

Using AWS Console

01 Sign in to the AWS Management Console.

02 Navigate to the AWS Support Center page at https://console.aws.amazon.com/support/.

03 On the Support Center page, select the My support cases tab to access your support cases.

05 On the Create Case page, perform the following actions:

  1. Under Regarding, select Technical Support option.
  2. Choose ElasticSearch Service from the Service dropdown list.
  3. Select Cluster Issue from the Category dropdown list.
  4. Inside the Subject box, enter a subject for your request such as "Recover unhealthy Amazon ElasticSearch cluster".
  5. In the Description textbox, enter a short description of the problem so that the AWS support team can evaluate your request.
  6. Under Contact method, select the contact method that the AWS support team should use to respond to your request.
  7. Click Submit to send the technical assistance request for your unhealthy ElasticSearch cluster to Amazon Web Services.

Publication date Oct 12, 2018