About Amazon EMR Releases
This document provides information about Amazon EMR 4.x software releases. A release is a set of software applications and components that can be installed and configured on an Amazon EMR cluster. Amazon EMR releases are packaged using a system based on Apache Bigtop, an open source project associated with the Hadoop ecosystem. In addition to Hadoop and Spark ecosystem projects, each Amazon EMR release provides components that enable cluster and resource management, interoperability with other AWS services, and additional configuration optimizations for installed software.
Applications
Each Amazon EMR release contains several distributed applications available for installation on your cluster. Amazon EMR defines each application as not only the set of components that make up the open source project but also the associated components that the application requires to function. When you choose to install an application using the console, API, or CLI, Amazon EMR installs and configures this set of components across nodes in your cluster. The following applications are supported for this release: Ganglia, Hadoop, HBase, HCatalog, Hive, Hue, Mahout, Oozie-Sandbox, Pig, Presto-Sandbox, Spark, Sqoop-Sandbox, Zeppelin-Sandbox, and ZooKeeper-Sandbox.
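As a rough illustration of choosing applications through the API, the following Python sketch builds the application-related part of a cluster request. The helper name `build_cluster_request`, the local name check, and the placeholder release label `emr-4.x.x` are assumptions made for this example and are not part of Amazon EMR. The resulting keys (`Name`, `ReleaseLabel`, `Applications`) match parameters of the boto3 `run_job_flow` call, which also requires other settings, such as `Instances`, that are omitted here.

```python
# Application names listed for this release (copied from the paragraph above).
SUPPORTED_APPLICATIONS = {
    "Ganglia", "Hadoop", "HBase", "HCatalog", "Hive", "Hue", "Mahout",
    "Oozie-Sandbox", "Pig", "Presto-Sandbox", "Spark", "Sqoop-Sandbox",
    "Zeppelin-Sandbox", "ZooKeeper-Sandbox",
}


def build_cluster_request(name, release_label, applications):
    """Return a partial request dict naming the applications to install.

    Hypothetical helper: the exact-match name check is a local convenience
    for this sketch, not a rule enforced this way by the service.
    """
    unknown = sorted(set(applications) - SUPPORTED_APPLICATIONS)
    if unknown:
        raise ValueError("Unsupported applications: " + ", ".join(unknown))
    return {
        "Name": name,
        "ReleaseLabel": release_label,  # placeholder; use a real release label
        "Applications": [{"Name": app} for app in applications],
    }


request = build_cluster_request("example-cluster", "emr-4.x.x", ["Hadoop", "Spark"])
# With boto3 installed and credentials configured, the dict could be merged
# with the remaining required arguments and passed along, for example:
#   boto3.client("emr").run_job_flow(**request, Instances=..., ...)
```

Amazon EMR then installs each named application together with its associated components, so the request lists applications only, not individual components.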

For information about applications and their associated components, see the following sections:
Sandbox Applications
Sandbox applications provide early access to software that is not yet ready for a generally available release in Amazon EMR. You configure sandbox applications through the configuration API provided with Amazon EMR. For more information, see Amazon EMR Sandbox Applications.
Components
Amazon EMR releases include various components, which are installed when you specify an application that uses them. Component versions typically match the versions released by the community, and Amazon EMR makes an effort to make community releases available in a timely fashion. However, Amazon EMR sometimes needs to change specific components. A modified component has a release version of the following form:
communityVersion-amzn-emrReleaseVersion
As an example, assume that the component ExampleComponent1 has
not been modified by Amazon EMR; its version is 1.0, which is the
community version. Another component, ExampleComponent2, has been
modified, and its Amazon EMR release version is 1.0.0-amzn-0.
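The version scheme above can be sketched in Python as a small parser. The names `ComponentVersion` and `parse_component_version` are hypothetical, and treating the part after `-amzn-` as an integer is an assumption based on the versions shown in this document (for example, 1.0.0-amzn-0).

```python
import re
from typing import NamedTuple, Optional


class ComponentVersion(NamedTuple):
    community_version: str
    emr_revision: Optional[int]  # None when the version has no -amzn- suffix


# Assumption: the Amazon EMR suffix is "-amzn-" followed by digits.
_AMZN_SUFFIX = re.compile(r"^(?P<community>.+)-amzn-(?P<revision>\d+)$")


def parse_component_version(version: str) -> ComponentVersion:
    """Split a version string into its community part and EMR revision."""
    match = _AMZN_SUFFIX.match(version)
    if match is None:
        # No suffix: return the version string unchanged.
        return ComponentVersion(version, None)
    return ComponentVersion(match.group("community"), int(match.group("revision")))


modified = parse_component_version("1.0.0-amzn-0")
unmodified = parse_component_version("1.0")
```

Here `modified.emr_revision` is 0 and `unmodified.emr_revision` is `None`. A missing suffix only means that the string carries no Amazon EMR revision; it does not by itself say where the component comes from.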
There are also components provided exclusively by Amazon EMR. For example, the DynamoDB connector
component, emr-ddb, is provided by Amazon EMR for use with applications running on Amazon EMR
clusters. Amazon components have just one version number. For example, an emr-ddb
version is 2.1.0. For more information about using Hive to query
DynamoDB and an example, see Amazon EMR Hive Queries to Accommodate Partial DynamoDB
Schemas.
The following components are included with Amazon EMR:
| Component | Version | Description |
|---|---|---|
| emr-ddb | 3.0.0 | Amazon DynamoDB connector for Hadoop ecosystem applications. |
| emr-goodies | 2.0.0 | Extra convenience libraries for the Hadoop ecosystem. |
| emr-kinesis | 3.1.0 | Amazon Kinesis connector for Hadoop ecosystem applications. |
| emr-s3-dist-cp | 2.3.0 | Distributed copy application optimized for Amazon S3. |
| emrfs | 2.6.0 | Amazon S3 connector for Hadoop ecosystem applications. |
| ganglia-monitor | 3.7.2 | Embedded Ganglia agent for Hadoop ecosystem applications along with the Ganglia monitoring agent. |
| ganglia-metadata-collector | 3.7.2 | Ganglia metadata collector for aggregating metrics from Ganglia monitoring agents. |
| ganglia-web | 3.7.1 | Web application for viewing metrics collected by the Ganglia metadata collector. |
| hadoop-client | 2.7.2-amzn-0 | Hadoop command-line clients such as 'hdfs', 'hadoop', or 'yarn'. |
| hadoop-hdfs-datanode | 2.7.2-amzn-0 | HDFS node-level service for storing blocks. |
| hadoop-hdfs-library | 2.7.2-amzn-0 | HDFS command-line client and library. |
| hadoop-hdfs-namenode | 2.7.2-amzn-0 | HDFS service for tracking file names and block locations. |
| hadoop-httpfs-server | 2.7.2-amzn-0 | HTTP endpoint for HDFS operations. |
| hadoop-kms-server | 2.7.2-amzn-0 | Cryptographic key management server based on Hadoop's KeyProvider API. |
| hadoop-mapred | 2.7.2-amzn-0 | MapReduce execution engine libraries for running a MapReduce application. |
| hadoop-yarn-nodemanager | 2.7.2-amzn-0 | YARN service for managing containers on an individual node. |
| hadoop-yarn-resourcemanager | 2.7.2-amzn-0 | YARN service for allocating and managing cluster resources and distributed applications. |
| hbase-hmaster | 1.2.0 | Service for an HBase cluster responsible for coordination of Regions and execution of administrative commands. |
| hbase-region-server | 1.2.0 | Service for serving one or more HBase regions. |
| hbase-client | 1.2.0 | HBase command-line client. |
| hbase-rest-server | 1.2.0 | Service providing a RESTful HTTP endpoint for HBase. |
| hbase-thrift-server | 1.2.0 | Service providing a Thrift endpoint to HBase. |
| hcatalog-client | 1.0.0-amzn-4 | The 'hcat' command-line client for manipulating hcatalog-server. |
| hcatalog-server | 1.0.0-amzn-4 | Service providing HCatalog, a table and storage management layer for distributed applications. |
| hcatalog-webhcat-server | 1.0.0-amzn-4 | HTTP endpoint providing a REST interface to HCatalog. |
| hive-client | 1.0.0-amzn-4 | Hive command-line client. |
| hive-metastore-server | 1.0.0-amzn-4 | Service for accessing the Hive metastore, a semantic repository storing metadata for SQL on Hadoop operations. |
| hive-server | 1.0.0-amzn-4 | Service for accepting Hive queries as web requests. |
| hue-server | 3.7.1-amzn-6 | Web application for analyzing data using Hadoop ecosystem applications. |
| mahout-client | 0.11.1 | Library for machine learning. |
| mysql-server | 5.5 | MySQL database server. |
| oozie-client | 4.2.0 | Oozie command-line client. |
| oozie-server | 4.2.0 | Service for accepting Oozie workflow requests. |
| presto-coordinator | 0.143 | Service for accepting queries and managing query execution among presto-workers. |
| presto-worker | 0.143 | Service for executing pieces of a query. |
| pig-client | 0.14.0-amzn-0 | Pig command-line client. |
| spark-client | 1.6.1 | Spark command-line clients. |
| spark-history-server | 1.6.1 | Web UI for viewing logged events for the lifetime of a completed Spark application. |
| spark-on-yarn | 1.6.1 | In-memory execution engine for YARN. |
| spark-yarn-slave | 1.6.1 | Apache Spark libraries needed by YARN slaves. |
| sqoop-client | 1.4.6 | Apache Sqoop command-line client. |
| webserver | 2.4 | Apache HTTP server. |
| zeppelin-server | 0.5.6-incubating | Web-based notebook that enables interactive data analytics. |
| zookeeper-server | 3.4.8 | Centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. |
| zookeeper-client | 3.4.8 | ZooKeeper command-line client. |
Learn More
If you are looking for additional information, see the following guides and sites:
Information about the Amazon EMR service, getting started, and how to launch or manage clusters, specifically for emr-4.0.0 or greater — Amazon EMR Management Guide
Information about Amazon EMR AMI versions 2.x and 3.x — Amazon Elastic MapReduce Developer Guide

