Ensure that your AWS ElasticSearch (ES) clusters are healthy, i.e. that each one has its shard allocation status set to "Green". When an Amazon ES cluster is unhealthy, the shard allocation status is set to "Red", which means that at least one primary shard and its replicas are not allocated to a node. The most common causes of a "Red" status are failed cluster nodes and ElasticSearch process crashes caused by a sustained heavy processing load. To be notified when your Amazon ES clusters become unhealthy, so that you can put a recovery plan into action, Cloud Conformity recommends creating AWS CloudWatch alarms that are triggered whenever a cluster's health status stays "Red" for longer than one minute.
The AWS CloudWatch metric used to detect unhealthy ElasticSearch clusters (Red) is:
ClusterStatus.red – which indicates that the primary and replica shards of at least one index are not allocated to nodes within an ES cluster. Relevant statistic: Maximum. Units: Count.
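As an illustration of how this metric can be read, the following is a minimal sketch that queries ClusterStatus.red with the Maximum statistic and reports whether any recent one-minute datapoint is 1 or higher. It assumes the boto3 library and valid AWS credentials; the function names, the five-minute lookback window and the default region are hypothetical choices made for the example, not part of the original guidance.

```python
from datetime import datetime, timedelta, timezone


def build_metric_query(domain_name, account_id, minutes=5, now=None):
    """Build keyword arguments for CloudWatch get_metric_statistics.

    The lookback window (minutes) is an assumption made for this sketch.
    """
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/ES",
        "MetricName": "ClusterStatus.red",
        # Amazon ES metrics are published per domain and per account ID.
        "Dimensions": [
            {"Name": "DomainName", "Value": domain_name},
            {"Name": "ClientId", "Value": account_id},
        ],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 60,                # one-minute granularity
        "Statistics": ["Maximum"],   # relevant statistic for this metric
    }


def is_cluster_red(datapoints):
    """Return True if any datapoint reports a Maximum of 1 or more."""
    return any(dp.get("Maximum", 0) >= 1 for dp in datapoints)


def check_domain(domain_name, account_id, region="us-east-1"):
    """Query CloudWatch for the domain; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the helpers above work without it

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    response = cloudwatch.get_metric_statistics(
        **build_metric_query(domain_name, account_id)
    )
    return is_cluster_red(response["Datapoints"])
```

Calling check_domain with a domain name and the 12-digit account ID returns True when the cluster reported a "Red" status in the queried window.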
Detecting unhealthy Amazon ES clusters with the status set to "Red" is imperative for the availability of your ElasticSearch applications. In addition, the Amazon ES service stops taking automatic snapshots while the cluster status is "Red", and if this status persists for more than 16 days, permanent data loss can occur.
To identify unhealthy Amazon ElasticSearch (ES) clusters, perform the following actions:
Step 1: Create and configure the Amazon CloudWatch alarm required to send alert notifications whenever your ElasticSearch cluster health status becomes "Red" for more than one minute:
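The alarm described in this step can also be created programmatically. The sketch below is one possible way to do so with boto3, assuming an existing SNS topic for the alert notifications. The function names, the alarm naming scheme and the default region are hypothetical. The threshold (Maximum of ClusterStatus.red greater than or equal to 1 over a single 60-second period) is this example's reading of "Red for more than one minute".

```python
def build_red_status_alarm(domain_name, account_id, sns_topic_arn):
    """Build keyword arguments for CloudWatch put_metric_alarm."""
    return {
        # Hypothetical naming scheme, chosen for this example only.
        "AlarmName": f"{domain_name}-cluster-status-red",
        "AlarmDescription": "Amazon ES cluster status is Red",
        "Namespace": "AWS/ES",
        "MetricName": "ClusterStatus.red",
        "Dimensions": [
            {"Name": "DomainName", "Value": domain_name},
            {"Name": "ClientId", "Value": account_id},
        ],
        "Statistic": "Maximum",
        "Period": 60,             # evaluate one-minute periods
        "EvaluationPeriods": 1,   # a single breaching period triggers the alarm
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # Notifications are delivered to the given SNS topic.
        "AlarmActions": [sns_topic_arn],
    }


def create_red_status_alarm(domain_name, account_id, sns_topic_arn,
                            region="us-east-1"):
    """Create or update the alarm; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above works without it

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    cloudwatch.put_metric_alarm(
        **build_red_status_alarm(domain_name, account_id, sns_topic_arn)
    )
```

Keeping the parameter builder separate from the API call makes it possible to review or unit-test the alarm definition without touching an AWS account.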
Step 2: Recovering unhealthy Amazon ElasticSearch clusters can be a complex task, so you may want the AWS support team to assist you. To ask AWS for assistance, create a support case using the Support Center console, as shown in the example below: