Using Amazon S3 Client-Side Encryption in EMRFS
EMRFS support for Amazon S3 client-side encryption enables your EMR cluster to work with S3 objects that were previously encrypted using an Amazon S3 encryption client. When Amazon S3 client-side encryption is enabled, EMRFS supports the decryption of objects encrypted using keys in AWS KMS or from your own key management system. Amazon S3 client-side encryption in EMRFS also supports re-encrypting the output from your EMR cluster using keys from either AWS KMS or your own key management system.
Note
EMRFS client-side encryption ensures only that output written from an enabled cluster to Amazon S3 is encrypted. Data written to the local file systems and HDFS on the cluster is not encrypted. Furthermore, because Hue does not use EMRFS, objects written to Amazon S3 using the Hue S3 File Browser are not encrypted. For more information about security controls available for applications running on EC2 instances, see the “Overview of Security Processes” white paper.
EMRFS support for Amazon S3 client-side encryption uses a process called envelope encryption, with keys stored in a location of your choosing, to encrypt and decrypt data stored in Amazon S3. In contrast to Amazon S3 server-side encryption, the decryption and encryption actions in Amazon S3 client-side encryption take place in the EMRFS client on your EMR cluster: the object streams from Amazon S3 to your EMR cluster in encrypted form and is decrypted by the client on the cluster. Output from the cluster is then encrypted by the client before being written to Amazon S3.
The envelope encryption process uses a one-time symmetric data key, generated by the encryption client and unique to each object, to encrypt data. The data key is then encrypted by your master key (stored in AWS KMS or your custom provider) and stored with the associated object in Amazon S3. When decrypting data on the client (e.g., an EMRFS client or your own Amazon S3 encryption client retrieving data for post-processing), the reverse process occurs: the encrypted data key is retrieved from the metadata of the object in Amazon S3 and decrypted using the master key, and the client then uses the data key to decrypt the object data. When Amazon S3 client-side encryption is enabled, the EMRFS client on the cluster can read either encrypted or unencrypted objects in Amazon S3.
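The envelope encryption steps described above can be illustrated with a minimal, self-contained Java sketch that uses only the standard javax.crypto classes. It is not the EMRFS or Amazon S3 encryption client implementation. The class and method names (EnvelopeEncryptionSketch, Envelope, newAesKey) are hypothetical, the choice of AES-GCM for the object data and AES key wrap for the data key is an assumption made for illustration, and the master key is held locally here rather than in AWS KMS or a custom provider.

```java
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

class EnvelopeEncryptionSketch {

    /** Hypothetical stand-in for an S3 object: ciphertext plus the values kept in its metadata. */
    static final class Envelope {
        final byte[] wrappedDataKey; // data key encrypted under the master key
        final byte[] iv;             // per-object initialization vector
        final byte[] ciphertext;     // object data encrypted under the data key

        Envelope(byte[] wrappedDataKey, byte[] iv, byte[] ciphertext) {
            this.wrappedDataKey = wrappedDataKey;
            this.iv = iv;
            this.ciphertext = ciphertext;
        }
    }

    /** Generates a random 128-bit AES key (used for both master and data keys in this sketch). */
    static SecretKey newAesKey() throws Exception {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(128);
        return generator.generateKey();
    }

    /** Encrypts the data with a fresh one-time data key, then wraps that key with the master key. */
    static Envelope encrypt(byte[] plaintext, SecretKey masterKey) throws Exception {
        SecretKey dataKey = newAesKey(); // unique to this object
        byte[] iv = new byte[12];
        new SecureRandom().nextBytes(iv);

        Cipher dataCipher = Cipher.getInstance("AES/GCM/NoPadding");
        dataCipher.init(Cipher.ENCRYPT_MODE, dataKey, new GCMParameterSpec(128, iv));
        byte[] ciphertext = dataCipher.doFinal(plaintext);

        Cipher keyCipher = Cipher.getInstance("AESWrap");
        keyCipher.init(Cipher.WRAP_MODE, masterKey);
        byte[] wrappedDataKey = keyCipher.wrap(dataKey);

        return new Envelope(wrappedDataKey, iv, ciphertext);
    }

    /** Reverse process: unwrap the data key with the master key, then decrypt the data. */
    static byte[] decrypt(Envelope envelope, SecretKey masterKey) throws Exception {
        Cipher keyCipher = Cipher.getInstance("AESWrap");
        keyCipher.init(Cipher.UNWRAP_MODE, masterKey);
        SecretKey dataKey =
                (SecretKey) keyCipher.unwrap(envelope.wrappedDataKey, "AES", Cipher.SECRET_KEY);

        Cipher dataCipher = Cipher.getInstance("AES/GCM/NoPadding");
        dataCipher.init(Cipher.DECRYPT_MODE, dataKey, new GCMParameterSpec(128, envelope.iv));
        return dataCipher.doFinal(envelope.ciphertext);
    }

    public static void main(String[] args) throws Exception {
        SecretKey masterKey = newAesKey();
        byte[] original = "cluster output".getBytes(StandardCharsets.UTF_8);
        Envelope stored = encrypt(original, masterKey);
        byte[] recovered = decrypt(stored, masterKey);
        System.out.println(Arrays.equals(original, recovered) ? "round trip ok" : "round trip failed");
    }
}
```

In this sketch the master key never touches the object data directly. Only the small data key is encrypted under it, which is what allows the master key to stay in a separate key store.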
When Amazon S3 client-side encryption in EMRFS is enabled, the behavior of the encryption client depends on the provider specified and the metadata of the object being decrypted or encrypted. When EMRFS encrypts an object before writing it to Amazon S3, the provider (e.g., AWS KMS or your custom provider) that you specified at cluster creation time is always used to supply the encryption key. When EMRFS reads an object from Amazon S3, it checks the object metadata for information about the master key used to encrypt the data key. If there is an AWS KMS key ID, EMRFS attempts to decrypt the object using AWS KMS. If there is metadata containing an EncryptionMaterialsDescription instance, EMRFS tries to fetch the key using the EncryptionMaterialsProvider instance. The provider uses this description to determine which key should be used and to retrieve it. If you do not have access to the required key, an exception is raised and the operation fails. If there is no EncryptionMaterialsDescription instance in the Amazon S3 object metadata, EMRFS assumes that the object is unencrypted.
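The read-path decision described above can be sketched as a small lookup over the object metadata. This is a simplified illustration, not EMRFS code. The class name ReadPathSketch, the KeySource enum, and the two metadata field names are hypothetical placeholders, and the real metadata layout used by the Amazon S3 encryption client may differ.

```java
import java.util.HashMap;
import java.util.Map;

class ReadPathSketch {

    /** Where the master key for an object would be looked up when reading. */
    enum KeySource { AWS_KMS, CUSTOM_PROVIDER, UNENCRYPTED }

    // Hypothetical metadata field names, used only for this sketch.
    static final String KMS_KEY_ID = "kms-key-id";
    static final String MATERIALS_DESCRIPTION = "materials-description";

    /** Applies the checks in the order the text describes them. */
    static KeySource resolve(Map<String, String> objectMetadata) {
        if (objectMetadata.containsKey(KMS_KEY_ID)) {
            return KeySource.AWS_KMS; // decrypt the data key through AWS KMS
        }
        if (objectMetadata.containsKey(MATERIALS_DESCRIPTION)) {
            return KeySource.CUSTOM_PROVIDER; // ask the materials provider for the key
        }
        return KeySource.UNENCRYPTED; // no encryption metadata: treat as plain data
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new HashMap<>();
        System.out.println(resolve(metadata));
        metadata.put(MATERIALS_DESCRIPTION, "{}");
        System.out.println(resolve(metadata));
    }
}
```

The sketch covers only the choice of key source. Fetching the key and handling a failed lookup (for example, when the cluster lacks permission to use the key) are left out.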
Amazon S3 client-side encryption in EMRFS provides two methods to supply the master keys for decryption when reading from Amazon S3 and encryption when writing to Amazon S3:
With a built-in AWS KMS provider, which can use a master key stored in AWS KMS. You specify the key to use for encryption, but EMRFS can use any AWS KMS key for decryption, assuming your cluster has permission to access it. AWS KMS charges apply for the storage and use of encryption keys.
With a custom Java class implementing both the Amazon S3 EncryptionMaterialsProvider and Hadoop Configurable interfaces. The EncryptionMaterialsProvider interface is used to provide the materials description, detailing how and where to get the master keys.
For more information about Amazon S3 client-side encryption, see Protecting Data Using Client-Side Encryption. For more information about how to use the AWS SDK for Java with Amazon S3 client-side encryption, see the article Client-Side Data Encryption with the AWS SDK for Java and Amazon S3.
For information about how to create and manage keys in AWS KMS and associated pricing, see AWS KMS Frequently Asked Questions and the AWS Key Management Service Developer Guide.

