AWS Blog

AWS Storage Update – S3 & Glacier Price Reductions + Additional Retrieval Options for Glacier

by Jeff Barr | on 21 NOV 2016 | in Amazon Glacier, Amazon S3, Price Reduction | Permalink | Comments

Back in 2006, we launched S3 with a revolutionary pay-as-you-go pricing model, with an initial price of 15 cents per GB per month. Over the intervening decade, we reduced the price per GB by 80%, launched S3 in every AWS Region, and enhanced the original one-size-fits-all model with user-driven features such as web site hosting, VPC integration, and IPv6 support, while adding new storage options including S3 Infrequent Access.

Because many AWS customers archive important data for legal, compliance, or other reasons and reference it only infrequently, we launched Glacier in 2012, and then gave you the ability to transition data between S3, S3 Infrequent Access, and Glacier by using lifecycle rules.

Today I have two big pieces of news for you: we are reducing the prices for S3 Standard Storage and for Glacier storage. We are also introducing additional retrieval options for Glacier.

S3 & Glacier Price Reduction
As long-time AWS customers already know, we work relentlessly to reduce our own costs, and to pass the resulting savings along in the form of a steady stream of AWS Price Reductions.

We are reducing the per-GB price for S3 Standard Storage in most AWS regions, effective December 1, 2016. The bill for your December usage will automatically reflect the new, lower prices. Here are the new prices for Standard Storage:

Regions	0-50 TB ($ / GB / Month)	51 – 500 TB ($ / GB / Month)	500+ TB ($ / GB / Month)
US East (Northern Virginia) US East (Ohio) US West (Oregon) EU (Ireland) (Reductions range from 23.33% to 23.64%)	$0.0230	$0.0220	$0.0210
US West (Northern California) (Reductions range from 20.53% to 21.21%)	$0.0260	$0.0250	$0.0240
EU (Frankfurt) (Reductions range from 24.24% to 24.38%)	$0.0245	$0.0235	$0.0225
Asia Pacific (Singapore) Asia Pacific (Tokyo) Asia Pacific (Sydney) Asia Pacific (Seoul) Asia Pacific (Bombay) (Reductions range from 16.36% to 28.13%)	$0.0250	$0.0240	$0.0230

As you can see from the table above, we are also simplifying the pricing model by consolidating six pricing tiers into three new tiers.

We are also reducing the price of Glacier storage in most AWS Regions. For example, you can now store 1 GB for 1 month in the US East (Northern Virginia), US West (Oregon), or EU (Ireland) Regions for just $0.004 (less than half a cent) per month, a 43% decrease. For reference purposes, this amount of storage cost $0.010 when we launched Glacier in 2012, and $0.007 after our last Glacier price reduction (a 30% decrease).

The lower pricing is a direct result of the scale that comes about when our customers trust us with trillions of objects, but it is just one of the benefits. Based on the feedback that I get when we add new features, the real value of a cloud storage platform is the rapid, steady evolution. Our customers often tell me that they love the fact that we anticipate their needs and respond with new features accordingly.

New Glacier Retrieval Options
Many AWS customers use Amazon Glacier as the archival component of their tiered storage architecture. Glacier allows them to meet compliance requirements (either organizational or regulatory) while allowing them to use any desired amount of cloud-based compute power to process and extract value from the data.

Today we are enhancing Glacier with two new retrieval options for your Glacier data. You can now pay a little bit more to expedite your data retrieval. Alternatively, you can indicate that speed is not of the essence and pay a lower price for retrieval.

We launched Glacier with a pricing model for data retrieval that was based on the amount of data that you had stored in Glacier and the rate at which you retrieved it. While this was an accurate reflection of our own costs to provide the service, it was somewhat difficult to explain. Today we are replacing the rate-based retrieval fees with simpler per-GB pricing.

Our customers in the Media and Entertainment industry archive their TV footage to Glacier. When an emergent situation calls for them to retrieve a specific piece of footage, minutes count and they want fast, cost-effective access to the footage. Healthcare customers are looking for rapid, “while you wait” access to archived medical imagery and genome data; photo archives and companies selling satellite data turn out to have similar requirements. On the other hand, some customers have the ability to plan their retrievals ahead of time, and are perfectly happy to get their data in 5 to 12 hours.

Taking all of this in to account, you can now select one of the following options for retrieving your data from Glacier (The original rate-based retrieval model is no longer applicable):

Standard retrieval is the new name for what Glacier already provides, and is the default for all API-driven retrieval requests. You get your data back in a matter of hours (typically 3 to 5), and pay $0.01 per GB along with $0.05 for every 1,000 requests.

Expedited retrieval addresses the need for “while you wait access.” You can get your data back quickly, with retrieval typically taking 1 to 5 minutes. If you store (or plan to store) more than 100 TB of data in Glacier and need to make infrequent, yet urgent requests for subsets of your data, this is a great model for you (if you have less data, S3’s Infrequent Access storage class can be a better value). Retrievals cost $0.03 per GB and $0.01 per request.

Retrieval generally takes between 1 and 5 minutes, depending on overall demand. If you need to get your data back in this time frame even in rare situations where demand is exceptionally high, you can provision retrieval capacity. Once you have done this, all Expedited retrievals will automatically be served via your Provisioned capacity. Each unit of Provisioned capacity costs $100 per month and ensures that you can perform at least 3 Expedited Retrievals every 5 minutes, with up to 150 MB/second of retrieval throughput.

Bulk retrieval is a great fit for planned or non-urgent use cases, with retrieval typically taking 5 to 12 hours at a cost of $0.0025 per GB (75% less than for Standard Retrieval) along with $0.025 for every 1,000 requests. Bulk retrievals are perfect when you need to retrieve large amounts of data within a day, and are willing to wait a few extra hours in exchange for a very significant discount.

If you do not specify a retrieval option when you call InitiateJob to retrieve an archive, a Standard Retrieval will be initiated. Your existing jobs will continue to work as expected, and will be charged at the new rate.

To learn more, read about Data Retrieval in the Glacier FAQ.

As always, I am thrilled to be able to share this news with you, and I hope that you are equally excited!

— Jeff;

CloudTrail Update – Capture and Process Amazon S3 Object-Level API Activity

by Jeff Barr | on 21 NOV 2016 | in Amazon CloudTrail, Amazon CloudWatch, Amazon S3, AWS Lambda | Permalink | Comments

I would like to show you how several different AWS services can be used together to address a challenge faced by many of our customers. Along the way I will introduce you to a new AWS CloudTrail feature that launches today and show you how you can use it in conjunction with CloudWatch Events.

The Challenge
Our customers store many different types of mission-critical data in Amazon Simple Storage Service (S3) and want to be able to track object-level activity on their data. While some of this activity is captured and stored in the S3 access logs, the level of detail is limited and log delivery can take several hours. Customers, particularly in financial services and other regulated industries, are asking for additional detail, delivered on a more timely basis. For example, they would like to be able to know when a particular IAM user accesses sensitive information stored in a specific part of an S3 bucket.

In order to meet the needs of these customers, we are now giving CloudTrail the power to capture object-level API activity on S3 objects, which we call Data events (the original CloudTrail events are now called Management events). Data events include “read” operations such as GET, HEAD, and Get Object ACL as well as “write” operations such as PUT and POST. The level of detail captured for these operations is intended to provide support for many types of security, auditing, governance, and compliance use cases. For example, it can be used to scan newly uploaded data for Personally Identifiable Information (PII), audit attempts to access data in a protected bucket, or to verify that the desired access policies are in effect.

Processing Object-Level API Activity
Putting this all together, we can easily set up a Lambda function that will take a custom action whenever an S3 operation takes place on any object within a selected bucket or a selected folder within a bucket.

Before starting on this post, I created a new CloudTrail trail called jbarr-s3-trail:

I want to use this trail to log object-level activity on one of my S3 buckets (jbarr-s3-trail-demo). In order to do this I need to add an event selector to the trail. The selector is specific to S3, and allows me to focus on logging the events that are of interest to me. Event selectors are a new CloudTrail feature and are being introduced as part of today’s launch, in case you were wondering.

I indicate that I want to log both read and write events, and specify the bucket of interest. I can limit the events to part of the bucket by specifying a prefix, and I can also specify multiple buckets. I can also control the logging of Management events:

CloudTrail supports up to 5 event selectors per trail. Each event selector can specify up to 50 S3 buckets and optional bucket prefixes.

I set this up, opened my bucket in the S3 Console, uploaded a file, and took a look at one of the entries in the trail. Here’s what it looked like:

{
  "eventVersion": "1.05",
  "userIdentity": {
    "type": "Root",
    "principalId": "99999999999",
    "arn": "arn:aws:iam::99999999999:root",
    "accountId": "99999999999",
    "username": "jbarr",
    "sessionContext": {
      "attributes": {
        "creationDate": "2016-11-15T17:55:17Z",
        "mfaAuthenticated": "false"
      }
    }
  },
  "eventTime": "2016-11-15T23:02:12Z",
  "eventSource": "s3.amazonaws.com",
  "eventName": "PutObject",
  "awsRegion": "us-east-1",
  "sourceIPAddress": "72.21.196.67",
  "userAgent": "[S3Console/0.4]",
  "requestParameters": {
    "X-Amz-Date": "20161115T230211Z",
    "bucketName": "jbarr-s3-trail-demo",
    "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
    "storageClass": "STANDARD",
    "cannedAcl": "private",
    "X-Amz-SignedHeaders": "Content-Type;Host;x-amz-acl;x-amz-storage-class",
    "X-Amz-Expires": "300",
    "key": "ie_sb_device_4.png"
  }

Then I create a simple Lambda function:

Next, I create a CloudWatch Events rule that matches the function name of interest (PutObject) and invokes my Lambda function (S3Watcher):

Now I upload some files to my bucket and check to see that my Lambda function has been invoked as expected:

I can also find the CloudWatch entry that contains the output from my Lambda function:

Pricing and Availability
Data events are recorded only for the S3 buckets that you specify, and are charged at the rate of $0.10 per 100,000 events. This feature is available in all commercial AWS Regions.

— Jeff;

AWS Price Reduction – CloudWatch Metrics

by Jeff Barr | on 21 NOV 2016 | in Amazon CloudWatch, Price Reduction | Permalink | Comments

Back in 2011 I introduced you to Custom Metrics for CloudWatch and showed you how to publish them from your applications and scripts. At that time, the first ten custom metrics were free of charge and additional metrics were $0.50 per metric per month, regardless of the number of metrics that you published.

Today, I am happy to announce a price change and a quantity discount for CloudWatch metrics. Based on the number of metrics that you publish every month, you can realize savings of up to 96%. Here is the new pricing for the US East (Northern Virginia) Region (the first ten metrics are still free of charge):

Tier	From	To	Price Per Metric Per Month	Discount Over Current Price
First 10,000 Metrics	0	10,000	$0.30	40%
Next 240,000 Metrics	10,001	250,000	$0.10	80%
Next 750,000 Metrics	250,001	1,000,000	$0.05	90%
All Remaining Metrics	1,000,001	–	$0.02	96%

If you have EC2 Detailed Monitoring enabled you will also see a price reduction with per-month charges reduced from $3.50 per instance per month to $2.10 or lower based on the volume tier. The new prices will take effect on December 1, 2016 with no effort on your part. At that time, the updated prices will be published on the CloudWatch Pricing page.

By the way, if you are using CloudWatch Metrics, be sure to take advantage of other recently announced features such as Extended Metrics Retention, the CloudWatch Plugin for Collectd, CloudWatch Dashboards, and the new Metrics-to-Logs navigation feature.

— Jeff;

New – Web Access for Amazon WorkSpaces

by Jeff Barr | on 18 NOV 2016 | in Amazon WorkSpaces | Permalink | Comments

We launched WorkSpaces in late 2013 (Amazon WorkSpaces – Desktop Computing in the Cloud) and have been adding new features at a rapid clip. Here are some highlights from 2016:

November 2016 – WorkSpaces adds GPU-Powered Graphics Bundles.
October 2016 – WorkSpaces becomes available in the EU (Frankfurt) Region.
August 2016 – WorkSpaces offers hourly pricing for all WorkSpaces bundles & AWS Marketplace for Desktop Apps in the Asia Pacific (Singapore) Region.
July 2016 – WorkSpaces allows you to bring your own Windows 10 desktop licenses.
June 2016 – WorkSpaces now come with larger root volumes.
May 2016 – WorkSpaces support tagging.
April 2016 – AWS Marketplace for Desktop Apps in the EU (Ireland) Region.
February 2016 – WorkSpaces Application Manager now available in the Asia Pacific (Sydney) and Asia Pacific (Singapore) Regions.
January 2016 – Support for audio-in, high-DPI devices, and saved registrations.

Today we are adding to this list with the addition of Amazon WorkSpaces Web Access. You can now access your WorkSpace from recent versions of Chrome or Firefox running on Windows, Mac OS X, or Linux. You can now be productive on heavily restricted networks and in situations where installing a WorkSpaces client is not an option. You don’t have to download or install anything, and you can use this from a public computer without leaving any private or cached data behind.

To use Amazon WorkSpaces Web Access, simply visit the registration page using a supported browser and enter the registration code for your WorkSpace:

Then log in with your user name and password:

And here you go (yes, this is IE and Firefox running on WorkSpaces, displayed in Chrome):

This feature is available for all new WorkSpaces that are running the Value, Standard, or Performance bundles or their Plus counterparts. You can access it at no additional charge after your administrator enables it:

Existing WorkSpaces must be rebuilt and custom images must be refreshed in order to take advantage of Web Access.

— Jeff;

New for AWS Lambda – Environment Variables and Serverless Application Model (SAM)

by Jeff Barr | on 18 NOV 2016 | in AWS Lambda | Permalink | Comments

I am thrilled by all of the excitement that I see around AWS Lambda and serverless application development. I have shared many serverless success stories, tools, and open source projects in the AWS Week in Review over the last year or two.

Today I would like to tell you about two important additions to Lambda: environment variables and the new Serverless Application Model.

Environment Variables
Every developer likes to build code that can be used in more than one environment. In order to do this in a clean and reusable fashion, the code should be able to accept configuration values at run time. The configuration values customize the environment for the code: table names, device names, file paths, and so forth. For example, many projects have distinct configurations for their development, test, and production environments.

You can now supply environment variables to your Lambda functions. This allows you to effect configuration changes without modifying or redeploying your code, and should make your serverless application development even more efficient. Each environment variable is a key/value pair. The keys and the values are encrypted using AWS Key Management Service (KMS) and decrypted on an as-needed basis. There’s no per-function limit on the number of environment variables, but the total size can be no more than 4 kb.

When you create a new version of a Lambda function, you also set the environment variables for that version of the function. You can modify the values for the latest version of the function, but not for older versions. Here’s how I would create a simple Python function, set some environment variables, and then reference them from my code (note that I had to import the os library):

There’s no charge for this feature if you use the default service key provided by Lambda (the usual per-request KMS charges apply if you choose to use your own key).

To learn more and to get some ideas for other ways to make use of this new feature, read Simplify Serverless Applications With Lambda Environment Variables on the AWS Compute Blog.

AWS Serverless Application Model
Lambda functions, Amazon API Gateway resources, and Amazon DynamoDB tables are often used together to build serverless applications. The new AWS Serverless Application Model (AWS SAM) allows you describe all of these components using a simplified syntax that is natively supported by AWS CloudFormation. In order to use this syntax, your CloudFormation template must include a Transform section (this is a new aspect of CloudFormation) that looks like this:

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31

The remainder of the template is used to specify the Lambda functions, API Gateway endpoints & resources, and DynamoDB tables. Each function declaration specifies a handler, a runtime, and a URI to a ZIP file that contains the code for the function.

APIs can be declared implicitly by defining events, or explicitly, by providing a Swagger file.

DynamoDB tables are declared using a simplified syntax that requires just a table name, a primary key (name and type), and the provisioned throughput. The full range of options is also available for you to use if necessary.

You can now generate AWS SAM files and deployment packages for your Lamba functions using a new Export operation in the Lambda Console. Simply click on the Actions menu and select Export function:

Then click on Download AWS SAM file or Download deployment package:

Here is the AWS SAM file for my function:

AWSTemplateFormatVersion: '2010-09-09'
Transform: 'AWS::Serverless-2016-10-31'
Description: A starter AWS Lambda function.
Resources:
  ShowEnv:
    Type: 'AWS::Serverless::Function'
    Properties:
      Handler: lambda_function.lambda_handler
      Runtime: python2.7
      CodeUri: .
      Description: A starter AWS Lambda function.
      MemorySize: 128
      Timeout: 3
      Role: 'arn:aws:iam::99999999999:role/LambdaGeneralRole'

The deployment package is a ZIP file with the code for my function inside. I would simply upload the file to S3 and update the CodeUri in the SAM file in order to use it as part of my serverless application. You can do this manually or you can use a pair of new CLI commands (aws cloudformation package and aws cloudformation deploy) to automate it. To learn more about this option, read the section on Deploying a Serverless app in the new Introducing Simplified Serverless Application Management and Deployment post.

You can also export Lambda function blueprints. Simply click on the download link in the corner:

And click on Download blueprint:

The ZIP file contains the AWS SAM file and the code:

To learn more and to see this new specification in action, read Introducing Simplified Serverless Application Management and Deployment on the AWS Compute Blog.

— Jeff;

New – Auto Scaling for EMR Clusters

by Jeff Barr | on 18 NOV 2016 | in Amazon Elastic MapReduce | Permalink | Comments

The Amazon EMR team is cranking out new features at an impressive pace (guess they have lots of worker nodes)! So far this quarter they have added all of these features:

September – Data Encryption for Apache Spark, Tez, and Hadoop MapReduce.
September – Open-sourced EMR-DynamoDB Connector for Apache Hive.
November – Stream Processing at Scale with Apache Flink.
November – Fine-grained Access Control Using Cluster Tags.

Today we are adding to this list with the addition of automatic scaling for EMR clusters. You can now use scale out and scale in policies to adjust the number of core and task nodes in your clusters in response to changing workloads and to optimize your resource usage:

Scale out Policies add additional capacity and allow you to tackle bigger problems. Applications like Apache Spark and Apache Hive will automatically take advantage of the increased processing power as it comes online.

Scale in Policies remove capacity, either at the end of an instance billing hour or as tasks complete. If a node is removed while it is running a YARN container, YARN will rerun that container on another node (read Configure Cluster Scale-Down Behavior for more info).

Using Auto Scaling
In order to make use of Auto Scaling, an IAM role that give Auto Scaling permission to launch and terminate EC2 instances must be associated with your cluster. If you create a cluster from the EMR Console, it will create the EMR_AutoScaling_DefaultRole for you. You can use it as-is or customize it as needed. If you create a cluster programmatically or via the command-line, you will need to create it yourself. You can also create the default roles from the command line like this:

$ aws emr create-default-roles

From the console, you can edit the Auto Scaling policies by clicking on the Advanced Options when you create your cluster:

Simply click on the pencil icon to begin editing your policy. Here’s my Scale out policy:

Because this policy is driven by YARNMemoryAvailablePercentage, it will be activated under low-memory conditions when I am running a YARN-based framework such as Spark, Tez, or Hadoop MapReduce. I can choose many other metrics as well; here are some of my options:

And here’s my Scale in policy:

I can choose from the same set of metrics, and I can set a Cooldown period for each policy. This value sets the minimum amount of time between scaling activities, and allows the metrics to stabilize as the changes take effect.

Default policies (driven by YARNMemoryAvailablePercentage and ContainerPendingRatio) are also available in the console.

Available Now
To learn more about Auto Scaling, read about Scaling Cluster Resources in the EMR Management Guide.

This feature is available now and you start using it today. Simply select emr-5.1.0 from the Release menu to get started!

— Jeff;

Human Longevity, Inc. – Changing Medicine Through Genomics Research

by Jeff Barr | on 18 NOV 2016 | in Amazon EC2, Amazon EMR, Amazon S3, AWS OpsWorks, Customer Success | Permalink | Comments

Human Longevity, Inc. (HLI) is at the forefront of genomics research and wants to build the world’s largest database of human genomes along with related phenotype and clinical data, all in support of preventive healthcare. In today’s guest post, Yaron Turpaz, Bryan Coon, and Ashley Van Zeeland talk about how they are using AWS to store the massive amount of data that is being generated as part of this effort to revolutionize medicine.

— Jeff;

When Human Longevity, Inc. launched in 2013, our founders recognized the challenges ahead. A genome contains all the information needed to build and maintain an organism; in humans, a copy of the entire genome, which contains more than three billion DNA base pairs, is contained in all cells that have a nucleus. Our goal is to sequence one million genomes and deliver that information—along with integrated health records and disease-risk models—to researchers and physicians. They, in turn, can interpret the data to provide targeted, personalized health plans and identify the optimal treatment for cancer and other serious health risks far earlier than has been possible in the past. The intent is to transform medicine by fostering preventive healthcare and risk prevention in place of the traditional “sick care” model, when people wind up seeing their doctors only after symptoms manifest.

Our work in developing and applying large-scale computing and machine learning to genomics research entails the collection, analysis, and storage of immense amounts of data from DNA-sequencing technology provided by companies like Illumina. Raw data from a single genome consumes about 100 gigabytes; that number increases as we align the genomic information with annotation and phenotype sources and analyze it for health insights.

From the beginning, we knew our choice of compute and storage technology would have a direct impact on the success of the company. Using the cloud was clearly the best option. We’re experts in genomics, and don’t want to spend resources building and maintaining an IT infrastructure. We chose to go all in on AWS for the breadth of the platform, the critical scalability we need, and the expertise AWS has developed in big data. We also saw that the pace of innovation at AWS—and its deliberate strategy of keeping costs as low as possible for customers—would be critical in enabling our vision.

Leveraging the Range of AWS Services

Spectral karyotype analysis / Image courtesy of Human Longevity, Inc.

Today, we’re using a broad range of AWS services for all kinds of compute and storage tasks. For example, the HLI Knowledgebase leverages a distributed system infrastructure comprised of Amazon S3 storage and a large number of Amazon EC2 nodes. This helps us achieve resource isolation, scalability, speed of provisioning, and near real-time response time for our petabyte-scale database queries and dynamic cohort builder. The flexibility of AWS services makes it possible for our customized Amazon Machine Images and pre-built, BTRFS-partitioned Amazon EBS volumes to achieve turn-up time in seconds instead of minutes. We use Amazon EMR for executing Spark queries against our data lake at the scale we need. AWS Lambda is a fantastic tool for hooking into Amazon S3 events and communicating with apps, allowing us to simply drop in code with the business logic already taken care of. We use Auto Scaling based on demand, and AWS OpsWorks for managing a Docker pipeline.

We also leverage the cost controls provided by Amazon EC2 Spot and Reserved Instance types. When we first started, we used on-demand instances, but the costs started to grow significantly. With Spot and Reserved Instances, we can allocate compute resources based on specific needs and workflows. The flexibility of AWS services enables us to make extensive use of dockerized containers through the resource-management services provided by Apache Mesos. Hundreds of dynamic Amazon EC2 nodes in both our persistent and spot abstraction layers are dynamically adjusted to scale up or down based on usage demand and the latest AWS pricing information. We achieve substantial savings by sharing this dynamically scaled compute cluster with our Knowledgebase service and the internal genomic and oncology computation pipelines. This flexibility gives us the compute power we need while keeping costs down. We estimate these choices have helped us reduce our compute costs by up to 50 percent from the on-demand model.

We’ve also worked with AWS Professional Services to address a particularly hard data-storage challenge. We have genomics data in hundreds of Amazon S3 buckets, many of them in the petabyte range and containing billions of objects. Within these collections are millions of objects that are unused, or used once or twice and never to be used again. It can be overwhelming to sift through these billions of objects in search of one in particular. It presents an additional challenge when trying to identify what files or file types are candidates for the Amazon S3-Infrequent Access storage class. Professional Services helped us with a solution for indexing Amazon S3 objects that saves us time and money.

Moving Faster at Lower Cost
Our decision to use AWS came at the right time, occurring at the inflection point of two significant technologies: gene sequencing and cloud computing. Not long ago, it took a full year and cost about $100 million to sequence a single genome. Today we can sequence a genome in about three days for a few thousand dollars. This dramatic improvement in speed and lower cost, along with rapidly advancing visualization and analytics tools, allows us to collect and analyze vast amounts of data in close to real time. Users can take that data and test a hypothesis on a disease in a matter of days or hours, compared to months or years. That ultimately benefits patients.

Our business includes HLI Health Nucleus, a genomics-powered clinical research program that uses whole-genome sequence analysis, advanced clinical imaging, machine learning, and curated personal health information to deliver the most complete picture of individual health. We believe this will dramatically enhance the practice of medicine as physicians identify, treat, and prevent diseases, allowing their patients to live longer, healthier lives.

— Yaron Turpaz (Chief Information Officer), Bryan Coon (Head of Enterprise Services), and Ashley Van Zeeland (Chief Technology Officer).

Learn More
Learn more about how AWS supports genomics in the cloud, and see how genomics innovator Illumina uses AWS for accelerated, cost-effective gene sequencing.

Attention Developers – Public Preview of Amazon WorkDocs SDK Now Available

by Jeff Barr | on 17 NOV 2016 | in Amazon WorkDocs, Developers | Permalink | Comments

I am a heavy-duty user and a big fan of Amazon WorkDocs. With AWS re:Invent just days away, I have nearly two dozen draft blog posts underway. I use WorkDocs to make sure that all of the interested parties are reviewing and commenting on the most recent version of each draft.

Today I am happy announce that we are launching a public preview of an Administrative SDK for WorkDocs. I have been looking forward to this announcement and can’t wait to build some tools to streamline my blogging and reviewing workflow. This SDK opens the doors to many types of value-added integration including advanced content management, document migration, virus scanning, data-loss prevention, and ediscovery.

The SDK provides full, administrator-level access to the resources contained within a WorkDocs site. You can build applications that manage users, content, and permissions and sell them on AWS Marketplace for deployment through the WorkDocs administrator console.

Resources and Actions
The Administrative SDK gives you Create, Read, Update, and Delete actions on WorkDocs users, folders, files, and permissions along with the ability to subscribe to notifications that are sent when an action is performed on them. Permission to access specific functions and resources is granted by AWS Identity and Access Management (IAM).

Here’s an overview of the functions provided by the SDK:

Users	Folders	Documents	Permissions	Notifications
Create User Activate User Describe Users Update User Delete User	Create Folder Get Folder Get Folder Path Update Folder Delete Folder Describe Folder Delete Folder Contents	Get Document Delete Document Get Document Path Get Document Version Describe Document Versions Initiate Document Version Upload Abort Document Version Upload Update Document Version	Add Resource Permissions Describe Resource Permissions Remove Resource Permission Remove All Resource Permissions	Subscribe to Notifications Unsubscribe from Notifications

The SDK is available for Java and Python developers and works in all six AWS Regions where WorkDocs is available. The download is free and there is no charge for calls to the API during the Public Preview period.

Developers Wanted
During the Public Preview, we are looking for developers who are ready to commit engineering resources to the construction of a Proof of Concept application that uses the SDK, and who are willing to meet with the WorkDocs team to provide status updates and share feedback.

If you have an idea for a great application and would like to apply for the Public Preview, sign up today.

— Jeff;

Amazon CloudWatch Update – Percentile Statistics and New Dashboard Widgets

by Jeff Barr | on 17 NOV 2016 | in Amazon CloudWatch | Permalink | Comments

There sure is a lot going on with Amazon CloudWatch these days! Earlier this month I showed you how to Jump From Metrics to Associated Logs and told you about Extended Metrics Retention and the User Interface Update.

Today we are improving CloudWatch yet again, adding percentile statistics and two new dashboard widgets. Time is super tight due to AWS re:Invent, so I’ll be brief!

Percentile Statistics
When you run a web site or a cloud application at scale, you need to make sure that you are delivering the expected level of performance to the vast majority of your customers. While it is always a good idea to watch the numerical averages, you may not be getting the whole picture. The average may mask some performance outliers and you might not be able to see, for example, that 1% of your customers are not having a good experience.

In order to understand and visualize performance and behavior in a way that properly conveys the customer experience, percentiles are a useful tool. For example, you can use percentiles to know that 99% of the requests to your web site are being satisfied within 1 second. At Amazon, we use percentiles extensively and now you can do the same. We prefix them with a “p” and express our goals and observed performance in terms of the p90, p99, and p100 (worst case) response times for sites and services. Over the years we have found that responses in the long tail (p99 and above) can be used to detect database hot spots and other trouble spots.

Percentiles are supported for EC2, RDS, and Kinesis as well as for newly created Elastic Load Balancers and Application Load Balancers. They are also available for custom metrics. You can display the percentiles in CloudWatch (including Custom Dashboards) and you can also set alarms.

Percentiles can be displayed in conjunction with other metrics. For example, the orange and green lines indicate p90 and p95 CPU Utilization:

You can set any desired percentile in the CloudWatch Console:

Read Elastic Load Balancing: Support for CloudWatch Percentile Metrics to learn more about how to use the new percentile metrics to gain additional visibility into the performance of your applications.

New Dashboard Widgets
You can now add Stacked Area and Number widgets to your CloudWatch Custom Dashboards:

Here’s a Stacked Area widget with my network traffic:

And here’s a Number widget with some EC2 and EBS metrics:

Available Now
These new features are now available in all AWS Regions and you can start using them today!

— Jeff;

New for Amazon Simple Queue Service – FIFO Queues with Exactly-Once Processing & Deduplication

by Jeff Barr | on 17 NOV 2016 | in Amazon Simple Queue Service (SQS), Price Reduction | Permalink | Comments

As the very first member of the AWS family of services, Amazon Simple Queue Service (SQS) has certainly withstood the test of time! Back in 2004, we described it as a “reliable, highly scalable hosted queue for buffering messages between distributed application components.” Over the years, we have added many features including a dead letter queue, 256 KB payloads, SNS integration, long polling, batch operations, a delay queue, timers, CloudWatch metrics, and message attributes.

New FIFO Queues
Today we are making SQS even more powerful and flexible with support for FIFO (first-in, first-out) queues. We are rolling out this new type of queue in two regions now, and plan to make it available in many others in early 2017.

These queues are designed to guarantee that messages are processed exactly once, in the order that they are sent, and without duplicates. We expect that FIFO queues will be of particular value to our financial services and e-commerce customers, and to those who use messages to update database tables. Many of these customers have systems that depend on receiving messages in the order that they were sent.

FIFO ordering means that, if you send message A, wait for a successful response, and then send message B, message B will be enqueued after message A, and then delivered accordingly. This ordering does not apply if you make multiple SendMessage calls in parallel. It does apply to the individual messages within a call to SendMessageBatch, and across multiple consecutive calls to SendMessageBatch.

Exactly-once processing applies to both single-consumer and multiple-consumer scenarios. If you use FIFO queues in a multiple-consumer environment, you can configure your queue to make messages visible to other consumers only after the current message has been deleted or the visibility timeout expires. In this scenario, at most one consumer will actively process messages; the other consumers will be waiting until the first consumer finishes or fails.

Duplicate messages can sometimes occur when a networking issue outside of SQS prevents the message sender from learning the status of an action and causes the sender to retry the call. FIFO queues use multiple strategies to detect and eliminate duplicate messages. In addition to content-based deduplication, you can include a MessageDeduplicationId when you call SendMessage for a FIFO queue. The ID can be up to 128 characters long, and, if present, takes higher precedence than content-based deduplication.

When you call SendMessage for a FIFO queue, you can now include a MessageGroupId. Messages that belong to the same group (as indicated by the ID) are processed in order, allowing you to create and process multiple, ordered streams within a single queue and to use multiple consumers while keeping data from multiple groups distinct and ordered.

You can create standard queues (the original queue type) or the new FIFO queues using the CreateQueue function, the create-queue command, or the AWS Management Console. The same API functions apply to both types of queues, but you cannot convert one queue type into the other.

Although the same API calls apply to both queue types, the newest AWS SDKs and SQS clients provide some additional functionality. This includes automatic, idempotent retries of failed ReceiveMessage calls.

Individual FIFO queues can handle up to 300 send, receive, or delete requests per second.

Some SQS Resources
Here are some resources to help you to learn more about SQS and the new FIFO queues:

If you’re coming to Las Vegas for AWS re:Invent and would like to hear more about how AWS customer Capital One is making use of SQS and FIFO queues, register and plan to attend ENT-217, Migrating Enterprise Messaging to the Cloud on Wednesday, November 30 at 3:30 PM.

Available Now
FIFO queues are available now in the US East (Ohio) and US West (Oregon) regions and you can start using them today. If you are running in US East (Northern Virginia) and want to give them a try, you can create them in US East (Ohio) and take advantage of the low-cost, low-latency connectivity between the regions.

As part of today’s launch, we are also reducing the price for standard queues by 20%. For the updated pricing, take a look at the SQS Pricing page.

— Jeff;