Ensure that all Amazon EMR cluster log files are periodically archived and uploaded to Amazon S3, so that the logging data is kept for historical purposes and the behavior of the EMR clusters can be tracked and analyzed over a long period of time.
By default, all EMR log files are automatically deleted from the clusters after the retention period ends. With this feature enabled, Elastic MapReduce uploads the log files from the cluster master instance(s) to Amazon S3, so the logging data (step logs, Hadoop logs, instance state logs, etc.) can be used later for troubleshooting or compliance purposes. Once active, the EMR service archives and sends the log files to Amazon S3 at 5-minute intervals.
To determine if your Amazon EMR clusters capture log data to S3, perform the following:
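As an illustration of the check described above, the following is a minimal sketch that uses boto3 (the AWS SDK for Python). It assumes that a cluster without S3 logging has no `LogUri` value in its `describe_cluster` response. The function names `log_uri_of` and `clusters_without_s3_logging` are hypothetical helpers introduced for this example, and limiting the scan to the `RUNNING` and `WAITING` states is an assumption, not a requirement of the rule.

```python
from typing import List, Optional


def log_uri_of(describe_cluster_response: dict) -> Optional[str]:
    """Return the cluster's S3 log URI, or None when no log URI is configured.

    Expects the dict returned by the EMR DescribeCluster API.
    """
    cluster = describe_cluster_response.get("Cluster", {})
    # A missing or empty LogUri is treated as "logging to S3 not enabled".
    return cluster.get("LogUri") or None


def clusters_without_s3_logging(region: str) -> List[str]:
    """List IDs of active clusters in `region` that have no LogUri set."""
    import boto3  # imported lazily so log_uri_of() works without the AWS SDK

    emr = boto3.client("emr", region_name=region)
    missing = []
    paginator = emr.get_paginator("list_clusters")
    # Assumption: only clusters that are currently active are of interest.
    for page in paginator.paginate(ClusterStates=["RUNNING", "WAITING"]):
        for summary in page["Clusters"]:
            response = emr.describe_cluster(ClusterId=summary["Id"])
            if log_uri_of(response) is None:
                missing.append(summary["Id"])
    return missing
```

Calling `clusters_without_s3_logging("us-east-1")` with valid AWS credentials would return the IDs of active clusters in that region that do not archive their logs to S3; repeat the call for each region in use.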
To enable Amazon EMR cluster logging to S3, you need to clone the required cluster and change its logging configuration by running the following commands:
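The clone-and-reconfigure approach can be sketched with boto3 as follows. This is a partial illustration under stated assumptions, not a complete cloning procedure: the helper `build_clone_request` is hypothetical, it copies only a handful of settings from the source cluster, and the caller must supply the `Instances` configuration. Tags, bootstrap actions, configurations, steps, and security settings are not copied here and would need to be carried over separately.

```python
def build_clone_request(describe_cluster_response: dict,
                        log_uri: str,
                        instances: dict) -> dict:
    """Build keyword arguments for EMR RunJobFlow with S3 logging enabled.

    `describe_cluster_response` is the dict returned by DescribeCluster for
    the source cluster. `instances` is the caller-supplied instance
    configuration (assumption: it is not derived from the source cluster).
    """
    cluster = describe_cluster_response["Cluster"]
    return {
        "Name": cluster["Name"],
        # The only intended change relative to the source cluster.
        "LogUri": log_uri,
        "ReleaseLabel": cluster["ReleaseLabel"],
        # RunJobFlow needs application names; versions follow the release label.
        "Applications": [
            {"Name": app["Name"]} for app in cluster.get("Applications", [])
        ],
        "ServiceRole": cluster["ServiceRole"],
        "JobFlowRole": cluster["Ec2InstanceAttributes"]["IamInstanceProfile"],
        "Instances": instances,
    }
```

The resulting dict could then be passed to `boto3.client("emr").run_job_flow(**request)` to launch the replacement cluster. Review the request against the source cluster's full configuration before launching it.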