Ensure that your Amazon Elasticsearch Service (ES) clusters are healthy, i.e. that their shard allocation status is "Green". When an Amazon ES cluster is unhealthy, its shard allocation status is set to "Red", which means that at least one primary shard and its replicas are not allocated to a node. The most common causes of a "Red" status are failed cluster nodes and Elasticsearch process crashes caused by a sustained heavy processing load. To be notified when your Amazon ES clusters become unhealthy, so that you can put a recovery plan into action, Cloud Conformity recommends creating Amazon CloudWatch alarms that are triggered whenever a cluster's health status stays "Red" for longer than one minute.
The Amazon CloudWatch metric used to detect unhealthy ("Red") Elasticsearch clusters is:
ClusterStatus.red – which indicates that the primary and replica shards of at least one index are not allocated to nodes within an ES cluster. Relevant statistic: Maximum. Units: Count.
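As a rough illustration of how this metric can be read, the sketch below builds a CloudWatch query for ClusterStatus.red with the Maximum statistic at one-minute granularity. It assumes the boto3 SDK and the standard Amazon ES metric namespace (AWS/ES) with its DomainName and ClientId dimensions; the function names, the default region and the five-minute lookback window are illustrative choices, not part of the rule itself.

```python
from datetime import datetime, timedelta, timezone


def red_status_query(domain_name, account_id, minutes=5, now=None):
    """Build get_metric_statistics arguments for ClusterStatus.red.

    domain_name and account_id are placeholders supplied by the caller;
    Amazon ES metrics are keyed by DomainName and ClientId (the account ID).
    """
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/ES",
        "MetricName": "ClusterStatus.red",
        "Dimensions": [
            {"Name": "DomainName", "Value": domain_name},
            {"Name": "ClientId", "Value": account_id},
        ],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 60,  # one datapoint per minute
        "Statistics": ["Maximum"],  # the relevant statistic for this metric
    }


def fetch_red_datapoints(domain_name, account_id, region="us-east-1"):
    """Fetch recent ClusterStatus.red datapoints, oldest first."""
    import boto3  # imported lazily so the query builder works without the SDK

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    response = cloudwatch.get_metric_statistics(
        **red_status_query(domain_name, account_id)
    )
    return sorted(response["Datapoints"], key=lambda point: point["Timestamp"])
```

A datapoint whose Maximum value is 1 means the cluster reported a "Red" status at some point during that minute.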
This rule can help you with the following compliance standards:
This rule can help you work with the AWS Well-Architected Framework
This rule resolution is part of the Cloud Conformity Security & Compliance tool for AWS
Detecting unhealthy Amazon ES clusters with the status set to "Red" is imperative for the availability of your Elasticsearch applications. In addition, Amazon Elasticsearch Service stops taking automatic snapshots while the cluster status is "Red", and if this status persists for more than 16 days, permanent data loss can occur.
To identify unhealthy Amazon Elasticsearch (ES) clusters, perform the following actions:
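One possible way to script this check, offered as a sketch rather than the audit procedure itself, is to list the Amazon ES domains in a region, pull their recent ClusterStatus.red datapoints and flag every domain that reported a value of 1 or more. The example assumes boto3 and credentials that allow es:ListDomainNames, cloudwatch:GetMetricStatistics and sts:GetCallerIdentity; the function names, region and lookback window are hypothetical.

```python
def is_red(datapoints):
    """Return True if any datapoint reports a Maximum of 1 or more."""
    return any(point.get("Maximum", 0) >= 1 for point in datapoints)


def red_domains(datapoints_by_domain):
    """Return the sorted names of domains with at least one Red datapoint."""
    return sorted(
        name for name, points in datapoints_by_domain.items() if is_red(points)
    )


def collect_datapoints(region="us-east-1", minutes=5):
    """Gather ClusterStatus.red datapoints for every ES domain in a region."""
    from datetime import datetime, timedelta, timezone

    import boto3  # imported lazily; only this function needs AWS access

    account_id = boto3.client("sts").get_caller_identity()["Account"]
    es = boto3.client("es", region_name=region)
    cloudwatch = boto3.client("cloudwatch", region_name=region)
    end = datetime.now(timezone.utc)

    result = {}
    for entry in es.list_domain_names()["DomainNames"]:
        name = entry["DomainName"]
        response = cloudwatch.get_metric_statistics(
            Namespace="AWS/ES",
            MetricName="ClusterStatus.red",
            Dimensions=[
                {"Name": "DomainName", "Value": name},
                {"Name": "ClientId", "Value": account_id},
            ],
            StartTime=end - timedelta(minutes=minutes),
            EndTime=end,
            Period=60,
            Statistics=["Maximum"],
        )
        result[name] = response["Datapoints"]
    return result
```

Calling red_domains(collect_datapoints()) would then return the names of the unhealthy domains. Note that a domain with no datapoints in the window is not flagged by this sketch, so missing data may deserve a separate check.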
Remediation / Resolution
Step 1: Create and configure the Amazon CloudWatch alarm required to send alert notifications whenever your Elasticsearch cluster health status stays "Red" for more than one minute:
Step 2: Recovering unhealthy Amazon Elasticsearch clusters can be a complex task, so you may want the AWS Support team to assist. To ask AWS for assistance, create a support case using the Support Center console.
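A minimal sketch of such an alarm, assuming boto3 and an existing Amazon SNS topic for notifications, could look like the following. The alarm fires when the Maximum of ClusterStatus.red is greater than or equal to 1 for one 60-second evaluation period. The alarm name, description, function names and default region are placeholders chosen for illustration.

```python
def red_alarm_params(domain_name, account_id, sns_topic_arn):
    """Build put_metric_alarm arguments for a ClusterStatus.red alarm.

    All three arguments are caller-supplied placeholders; sns_topic_arn is
    the topic that should receive the alert notifications.
    """
    return {
        "AlarmName": f"{domain_name}-ClusterStatus-red",  # hypothetical naming
        "AlarmDescription": "Cluster status has been Red for one minute",
        "Namespace": "AWS/ES",
        "MetricName": "ClusterStatus.red",
        "Dimensions": [
            {"Name": "DomainName", "Value": domain_name},
            {"Name": "ClientId", "Value": account_id},
        ],
        "Statistic": "Maximum",
        "Period": 60,  # seconds
        "EvaluationPeriods": 1,  # one consecutive one-minute period
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [sns_topic_arn],
    }


def create_red_alarm(domain_name, account_id, sns_topic_arn, region="us-east-1"):
    """Create or update the alarm in the given region."""
    import boto3  # imported lazily so the builder above works without the SDK

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    cloudwatch.put_metric_alarm(
        **red_alarm_params(domain_name, account_id, sns_topic_arn)
    )
```

Because put_metric_alarm overwrites an existing alarm with the same name, rerunning the function with the same domain name updates the alarm instead of creating a duplicate.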
Risk level: High