AWS Database Blog

Using Amazon Redshift for Fast Analytical Reports

by Vivek R and Thiyagarajan Arumugam | on | in Amazon Redshift, AWS Database Migration Service, Database, DMS, Migration, Redshift, Schema Conversion Tool (SCT)

With digital data growing at an unprecedented rate, enterprises are finding it difficult to ingest, store, and analyze data quickly while keeping costs low. Traditional data warehouse systems need constant upgrades in compute and storage to meet these challenges.

In this post, we provide insights into how AWS Premier Partner Wipro helped their customer (a leading US business services company) move their workload from SAP Sybase IQ to Amazon Redshift. This move enabled the company to reap the benefits of scalability and performance without affecting how their end users consume reports.

Current data warehouse environment and challenges faced
The customer was running SAP Sybase IQ for an enterprise data warehouse that contained aggregated data from different internal reporting systems across geographies. Their primary goal was to provide quick and accurate analytics to drive faster business decisions. The user base was distributed globally. Meeting this goal was difficult due to the following challenges:

  • The data warehouse (5 TB) was growing at over 20 percent year over year (YoY), with higher growth expected in the future. This growth required them to keep upgrading the hardware to meet the storage and compute needs, which was expensive.
  • The analytical dashboard experienced performance issues because of the growing data volume and user base.
  • Licensing was based on CPU cores, so adding hardware to support the growth also required additional investment in licenses, further driving up costs.

Migrating to Amazon Redshift
Wipro used their Cloud Data Warehouse Readiness Solution (CDRS) strategy to migrate data to Amazon Redshift. Using CDRS, they migrated 4.x billion records to Amazon Redshift. CDRS includes a Source Analyzer that creates the target data model and generates the data definition language (DDL) for the tables to be migrated to Amazon Redshift. The architecture included Talend (a data integration platform) running on Amazon EC2 to extract data from various source systems for ongoing change data capture (CDC) and load it into Amazon Redshift. Talend has several built-in connectors for connecting to various sources and extracting the data.

The following diagram shows the architecture of this migration setup:

(more…)

Testing Amazon RDS for Oracle: Plotting Latency and IOPS for OLTP I/O Pattern

by Kevin Closson | on | in Amazon RDS, Database, RDS Oracle

Kevin Closson is a principal specialist solutions architect at Amazon Web Services.

At Amazon Web Services (AWS), we take the Amazon Leadership Principles to heart. One such principle is that leaders Earn Trust. Although this principle guides how leaders should act, I like to extend the idea to how we can help customers gain more trust and assurance in their deployment choices as they move to cloud computing with Oracle Database in the AWS Cloud.

Amazon Relational Database Service (Amazon RDS) is a managed database service that currently offers a choice of six database engines. Amazon RDS for Oracle is an Oracle Database managed service. It lets Oracle practitioners relinquish many of the administrative tasks related to the “care and feeding” of an Oracle Database deployment, to free up more time to focus on adding value in the application space. At AWS, we refer to this “care and feeding” as undifferentiated heavy lifting.

All Oracle DBAs know how much heavy lifting is involved in the day-to-day administration of their Oracle Database deployments. Moreover, they know the heavy lifting that’s involved with maintenance tasks such as patching, backup and restore, and storage management. Time spent on these DBA tasks takes away from time they can spend focusing on applications. And, after all, it’s applications that help differentiate an enterprise from the competition.

Amazon RDS for Oracle offers a large variety of deployment options, ranging from the underlying EC2 instance and storage types to availability and security options. This post focuses on storage, so we will introduce a method for testing the underlying storage of an Amazon RDS for Oracle instance.

Which I/O testing tool is best?
The underlying storage for RDS instances is Amazon Elastic Block Store (Amazon EBS). The User Guide for Amazon EBS offers a helpful section on benchmarking Amazon EBS volumes. The tools mentioned in the EBS User Guide are fio and Orion. Both of these tools are useful for testing. However, neither is functional for testing I/O on RDS instances, because both require direct file system access. So, what is the best option for testing physical I/O with Amazon RDS for Oracle? Although it depends on what sort of testing you want to conduct, something as simple as the DBMS_RESOURCE_MANAGER.CALIBRATE_IO procedure might suffice. However, there are considerations where CALIBRATE_IO is concerned.

The CALIBRATE_IO procedure of Oracle Database is an essential tool for DBAs. The data gathered and stored internally by the CALIBRATE_IO procedure affects features such as Automatic Degree of Parallelism (Auto DOP) when you are using the Parallel Query Option. That said, CALIBRATE_IO is often referred to as an I/O “bounds test.” When CALIBRATE_IO probes for maximum random single-block read throughput and latency, it does so at full throttle. That is, CALIBRATE_IO can’t help you detect how predictable I/O latencies are as the I/O demand ramps up. Consider, for example, Figure 1, which shows example output from an execution of the CALIBRATE_IO procedure while attached to an Amazon RDS for Oracle instance. The instance uses the db.m4.10xlarge instance type and is backed by a 6,144 GB Provisioned IOPS (input/output operations per second) volume with 30,000 IOPS provisioned.

Figure 1: Example CALIBRATE_IO output
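
If you want to run the same bounds test yourself, the following is a minimal sketch of invoking CALIBRATE_IO from Python with the python-oracledb driver. The connection details and input values are placeholders, and the session needs the appropriate privileges to run DBMS_RESOURCE_MANAGER.CALIBRATE_IO.

```python
# Minimal sketch: run DBMS_RESOURCE_MANAGER.CALIBRATE_IO against an
# Amazon RDS for Oracle instance. Connection string and input values
# (disks, target latency) are placeholders.
import oracledb

conn = oracledb.connect(
    user="admin",
    password="example-password",
    dsn="myinstance.xxxxxxxx.us-east-1.rds.amazonaws.com:1521/ORCL",  # hypothetical endpoint
)

plsql = """
BEGIN
  DBMS_RESOURCE_MANAGER.CALIBRATE_IO(
    num_physical_disks => :disks,
    max_latency        => :latency,
    max_iops           => :iops,
    max_mbps           => :mbps,
    actual_latency     => :actual);
END;
"""

with conn.cursor() as cur:
    iops = cur.var(int)
    mbps = cur.var(int)
    actual = cur.var(int)
    cur.execute(plsql, disks=1, latency=20, iops=iops, mbps=mbps, actual=actual)
    print("max_iops:", iops.getvalue())
    print("max_mbps:", mbps.getvalue())
    print("actual_latency_ms:", actual.getvalue())
```

Because the procedure runs at full throttle, the numbers it reports tell you the ceiling, not how latency behaves as load ramps up, which is the gap this post addresses.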

(more…)

Get Started with Amazon Elasticsearch Service: Filter Aggregations in Kibana

by Jon Handler | on | in Amazon Elasticsearch Service, Elasticsearch

You figured out how cool Elasticsearch is, and now you’re sending your Apache web logs to Amazon Elasticsearch Service. You pop open Kibana to visualize some data—now what? One thing you can do is use filter aggregations to visualize different portions of your data on the same chart. Doing that helps you detect changes in the way your consumers are interacting with your website.

For the purposes of this post, I’ve loaded sample Apache web log data from NASA from July 1995. If you want to follow along, you can learn how to load this dataset into Amazon Elasticsearch Service by following the steps in this AWS Database blog post.
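
To give a concrete sense of what a filter aggregation looks like under the hood, here is a minimal sketch of a filters aggregation sent directly to the domain's search endpoint from Python. The endpoint, index name (logs), and field names are assumptions based on the sample dataset; a Kibana visualization that uses filters builds an equivalent request for you.

```python
# Minimal sketch: split web log traffic into two buckets (HTTP 200 vs. 404)
# and count each bucket per day. Endpoint, index, and field names are
# assumptions; adjust them to match your own mapping.
import json
import requests

endpoint = "https://search-mydomain-xxxxxxxx.us-east-1.es.amazonaws.com"  # hypothetical domain

query = {
    "size": 0,
    "aggs": {
        "by_status": {
            "filters": {
                "filters": {
                    "ok":        {"term": {"response": "200"}},
                    "not_found": {"term": {"response": "404"}},
                }
            },
            "aggs": {
                "per_day": {
                    "date_histogram": {"field": "@timestamp", "interval": "1d"}
                }
            },
        }
    },
}

resp = requests.post(
    f"{endpoint}/logs/_search",
    data=json.dumps(query),
    headers={"Content-Type": "application/json"},
)
print(json.dumps(resp.json()["aggregations"], indent=2))
```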

During July 1995, there were two shuttle missions: STS-71, which landed July 7, and STS-70, which took off July 13 and landed July 22. (Read this article on Wikipedia to find out how woodpeckers delayed the launch of STS-70 until after the landing of STS-71!) We would expect NASA website users’ interest to peak for each of these missions on each of these days. And, the traffic to the website reflects that interest:

(more…)

Using AWS Database Migration Service and Amazon Athena to Replicate and Run Ad Hoc Queries on a SQL Server Database

by Prahlad Rao | on | in Amazon Athena, AWS Database Migration Service, Database, DMS, Migration

Prahlad Rao is a solutions architect at Amazon Web Services.

When you replicate a relational database to the cloud, one of the common use cases is to enable additional insights on the replicated data by applying the analytics and query-processing capabilities available in the AWS Cloud. To replicate the database, you normally need to set up and configure a target database on Amazon EC2 or Amazon Relational Database Service (Amazon RDS). This can take additional time and configuration, especially if you're looking to query the data interactively and aggregate data from multiple database sources into a common data store like Amazon S3 for ad hoc queries. This capability is especially useful if you're building a data lake architecture that uses Amazon S3 as a central data store, and you want to extract specific datasets from multiple database sources and use them for downstream applications and analytics.

This post demonstrates an easy way to replicate a SQL Server database that's hosted on an Amazon EC2 instance to an Amazon S3 storage target using AWS Database Migration Service (AWS DMS), and then to interactively query the data stored on Amazon S3 using Amazon Athena, without having to set up a target database instance. You can apply the same procedure when you are replicating a database from an on-premises instance.
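
As a rough illustration of the AWS DMS piece, here is a minimal boto3 sketch that creates and starts a replication task, assuming the SQL Server source endpoint, the S3 target endpoint, and a replication instance already exist. All ARNs, names, and the schema selection are placeholders.

```python
# Minimal sketch: create and start a DMS replication task that copies the
# dbo schema from a SQL Server source endpoint to an S3 target endpoint.
# All ARNs and identifiers are placeholders.
import json
import boto3

dms = boto3.client("dms", region_name="us-east-1")

table_mappings = {
    "rules": [{
        "rule-type": "selection",
        "rule-id": "1",
        "rule-name": "include-dbo-schema",
        "object-locator": {"schema-name": "dbo", "table-name": "%"},
        "rule-action": "include",
    }]
}

task = dms.create_replication_task(
    ReplicationTaskIdentifier="sqlserver-to-s3",
    SourceEndpointArn="arn:aws:dms:us-east-1:123456789012:endpoint:SOURCE",
    TargetEndpointArn="arn:aws:dms:us-east-1:123456789012:endpoint:TARGET",
    ReplicationInstanceArn="arn:aws:dms:us-east-1:123456789012:rep:INSTANCE",
    MigrationType="full-load",  # use "full-load-and-cdc" for ongoing replication
    TableMappings=json.dumps(table_mappings),
)

dms.start_replication_task(
    ReplicationTaskArn=task["ReplicationTask"]["ReplicationTaskArn"],
    StartReplicationTaskType="start-replication",
)
```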

Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run. Although you can use Athena for many different use cases, it’s important to understand that Athena is not a relational database engine and is not meant as a replacement for relational databases. You can use Athena to process logs and unstructured, semi-structured, and structured datasets. You can also use it to perform ad hoc analysis and run interactive queries for data that’s stored on Amazon S3.
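
Once the replicated files are on Amazon S3 and a table has been defined over them, you can run ad hoc queries programmatically as well as from the console. The following is a minimal boto3 sketch; the database, table, and result bucket names are hypothetical.

```python
# Minimal sketch: run an ad hoc Athena query over the replicated data on
# Amazon S3 and print the result rows. Database, table, and result bucket
# names are hypothetical; the table is assumed to be defined over the DMS
# output prefix.
import time
import boto3

athena = boto3.client("athena", region_name="us-east-1")

start = athena.start_query_execution(
    QueryString="SELECT category, COUNT(*) AS orders FROM sales GROUP BY category",
    QueryExecutionContext={"Database": "sqlserver_replica"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results-bucket/"},
)
query_id = start["QueryExecutionId"]

# Wait for the query to finish.
while True:
    state = athena.get_query_execution(
        QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    for row in rows:
        print([col.get("VarCharValue") for col in row["Data"]])
```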

This post breaks down the process into two high-level steps:

  1. Replicate data from a SQL Server database that is stored on an Amazon EC2 instance to an Amazon S3 target using AWS DMS.
  2. Use Amazon Athena to run interactive queries for data that is stored on Amazon S3.

(more…)

Categorizing and Prioritizing a Large-Scale Move to an Open Source Database

by Wendy Neu | on | in Aurora, AWS Database Migration Service, DMS, Migration, Schema Conversion Tool (SCT)

The AWS Schema Conversion Tool (AWS SCT) and AWS Database Migration Service (AWS DMS) help simplify migrating your commercial database to a variety of engines on Amazon RDS. AWS SCT specifically helps simplify proprietary database migrations to open source. It does this by automatically converting your source database schema and a majority of the custom code to a format that is compatible with your new target database. The custom code that the tool converts includes views, stored procedures, and functions. Any code that the tool cannot convert automatically is clearly marked so that you can convert it yourself. AWS DMS helps you migrate your data easily and securely with minimal downtime.

Changing database engines can be daunting. However, the value proposition of a highly scalable, cost-effective, fully managed service such as Amazon Aurora can make the challenge worth it. Assessing and planning for a single database migration to open source is straightforward with AWS SCT and AWS DMS. You can generate an AWS SCT assessment report and get answers about how easy it is to use these tools to migrate your databases.

What if you have hundreds or even thousands of databases?
Building an assessment report and planning a move for one, two, or even ten databases to open source is a straightforward process. You probably have enough familiarity with applications that attach to those databases that you can identify which ones to move first. What do you do if you have hundreds or thousands of database instances deployed in and around your organization that you don’t have intimate application knowledge about?

If you’re in a centralized IT department, you might know how many databases you manage, have backups scheduled for them, and be tracking them in your inventory. But you might not have enough detail to plan a large-scale move or prioritize them for analysis. This post talks about how to assess your database portfolio and break it down into a manageable pipeline of migrations to make the process go smoothly.

A collaboration of smaller projects
A detailed project plan for a single heterogeneous database migration consists of 12 steps, as detailed in the following table.

Phase  Description
1      Assessment
2      Database schema conversion
3      Application conversion/remediation
4      Scripts conversion
5      Integration with third-party applications
6      Data migration
7      Functional testing of the entire system
8      Performance tuning
9      Integration and deployment
10     Training and knowledge transfer
11     Documentation and version control
12     Post-production support

At a high level, and for the purposes of this blog, the process for planning a migration project can be grouped into five areas of work: Analysis and Planning, Schema Migration, Data Migration, Application Migration, and Cutover. How long your migration takes generally depends on how long you spend iterating over the migration and testing phases. If you are planning several migrations in tandem, you should understand when you start which ones are likely to take the most time and which ones can be tackled first or in rapid succession.

Figure: SCT workflow

Prioritizing which databases to analyze first when moving to open source often hinges on how much proprietary database code they use. The volume of proprietary code in use affects the amount of time it takes to translate and test it during the migration phases of your project. After you establish the categories and criteria, you can categorize your databases by workload using dictionary queries.
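
As an illustration of what such a dictionary query might look like on an Oracle source, the following Python sketch counts stored PL/SQL objects and lines per schema and applies a simple size threshold. The connection details and the threshold are assumptions; the categories and criteria are yours to define for your own portfolio.

```python
# Minimal sketch: gauge how much proprietary code each schema carries by
# counting PL/SQL objects and lines in DBA_SOURCE on an Oracle source.
# Connection details and the "heavy" threshold are placeholders.
import oracledb

conn = oracledb.connect(
    user="assess_user",
    password="example-password",
    dsn="source-db.example.com:1521/ORCL",  # hypothetical source database
)

sql = """
SELECT owner,
       COUNT(DISTINCT name) AS code_objects,
       COUNT(*)             AS code_lines
FROM   dba_source
WHERE  type IN ('PROCEDURE', 'FUNCTION', 'PACKAGE BODY', 'TRIGGER')
GROUP  BY owner
ORDER  BY code_lines DESC
"""

with conn.cursor() as cur:
    for owner, objects, lines in cur.execute(sql):
        category = "heavy" if lines > 50000 else "light"
        print(f"{owner}: {objects} objects, {lines} lines -> {category}")
```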

(more…)

Monitoring Amazon Aurora Audit Events with Amazon CloudWatch

by Chayan Biswas | on | in Amazon Aurora, Aurora, Database

Chayan Biswas is a product manager at Amazon Web Services.

The Advanced Auditing capability in Amazon Aurora allows you to capture detailed logs of database (DB) activity. You can choose what kind of information is collected: connections, queries, and the tables affected by those queries, or any combination of these. You can review these audit logs to ensure the correctness of changes to your databases and to detect any unwanted activity.

Aurora now enables you to continuously monitor activity in your DB clusters by sending these audit logs to Amazon CloudWatch Logs. In this post, you set up Aurora to send audit logs to CloudWatch Logs and create CloudWatch metrics and alarms to continuously monitor activity in Aurora DB clusters.

Let’s get started. For Aurora to start delivering audit logs to CloudWatch Logs, you need to complete the following steps in order:

  1. Turn on Advanced Auditing in Amazon Aurora.
  2. Give Aurora permissions to send audit logs to CloudWatch Logs.
  3. Enable Aurora to deliver audit logs to CloudWatch Logs.

As a prerequisite, you need a running Aurora DB cluster. To create an Aurora DB cluster, follow the steps in the Amazon Aurora documentation.

Step 1: Turn on Advanced Auditing in Amazon Aurora
If you already have Advanced Auditing turned on for your Aurora cluster, you can skip to Step 2. To turn on Advanced Auditing for your Aurora DB cluster, sign in to the AWS Management Console and open the Amazon RDS console. In the left navigation pane, choose Parameter Groups. Then select the DB Cluster Parameter Group that is associated with your Aurora cluster, and choose Edit Parameters.

Search for “server_audit”—this will list all parameters that contain the search string. In the drop-down list next to server_audit_logging, choose 1. For server_audit_events, enter CONNECT, QUERY, TABLE, and then choose Save Changes.

This turns on Advanced Auditing in your Aurora DB cluster and starts recording all supported audit events. You can change the server_audit_events parameter to select only a subset of audit events.
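
If you prefer to script this step instead of using the console, the following is a minimal boto3 sketch that applies the same two parameter changes. The parameter group name is a placeholder.

```python
# Minimal sketch: turn on Advanced Auditing by setting the audit parameters
# on the DB cluster parameter group. The group name is a placeholder.
import boto3

rds = boto3.client("rds", region_name="us-east-1")

rds.modify_db_cluster_parameter_group(
    DBClusterParameterGroupName="my-aurora-cluster-params",
    Parameters=[
        {
            "ParameterName": "server_audit_logging",
            "ParameterValue": "1",
            "ApplyMethod": "immediate",
        },
        {
            "ParameterName": "server_audit_events",
            "ParameterValue": "CONNECT,QUERY,TABLE",
            "ApplyMethod": "immediate",
        },
    ],
)
```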

(more…)

Integrating Teradata with Amazon Redshift Using the AWS Schema Conversion Tool

by Pratim Das and David Gardner | on | in Amazon Redshift, AWS Database Migration Service, Database, DMS, Migration, Redshift

David Gardner is a solutions architect and Pratim Das is a specialist solutions architect for Analytics at Amazon Web Services.

Teradata provides long-standing data warehouse solutions, with many customers and applications running on its platforms. As companies move to the cloud, many are adopting Amazon Redshift as part of that transition. Recently, AWS announced support for Teradata as a source for the AWS Schema Conversion Tool (AWS SCT). With this capability, you can easily integrate Teradata schemas into an Amazon Redshift model. After both source and target schemas are in place, you can use AWS SCT to set up agents that collect and migrate data to the target Amazon Redshift schema. In this post, we provide an example of how to integrate these two technologies easily and securely.

Introduction to the use case
At a high level, the architecture looks like the following diagram:

Figure 1: Teradata to Amazon Redshift migration using AWS SCT agents

(more…)

How to Use SSL with the AWS Schema Conversion Tool Data Extractors

by Michael Soo | on | in AWS Database Migration Service, Migration, Schema Conversion Tool (SCT)

In another blog post, we described how to use the AWS Schema Conversion Tool data extractors (AWS SCT data extractors) to migrate your legacy data warehouse to Amazon Redshift. Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service that makes it simple and cost-effective to analyze all your data using your existing business intelligence tools. Migrating your data warehouse to Amazon Redshift can substantially improve query and data load performance, increase scalability, and save costs. The Schema Conversion Tool is provided free of charge by AWS to help you migrate your legacy data warehouse to the cloud.

This blog post gives you a quick overview of how you can use Secure Sockets Layer (SSL) encryption with the SCT data extractors. SSL is the standard security technology for establishing an encrypted link between applications. With this link in place, all data passed between the applications remains private and retains its integrity.

If you’re not familiar with the overall process of how to install and configure the SCT extractors, read the blog post How to Migrate Your Data Warehouse to Amazon Redshift Using the AWS Schema Conversion Tool Data Extractors before digging into the SSL setup.

Architecture
You can install the SCT extractors on any server with connectivity to your source data warehouse. The SCT client application issues commands to the extractors to pull data from the source system, upload the data to Amazon S3, and copy the data into Amazon Redshift. You can install multiple extractors on the same machine, or you can separate them as needed across many machines. Which approach is better depends on your available server resources and network capacity. In general, a best practice is to locate the data extractors close to your data warehouse to reduce network traffic and latency.

The extract files produced by the extractors are compressed using lossless LZO compression and then copied to an S3 bucket. The files are compressed to reduce network transfer time and storage costs on Amazon S3. After the files are uploaded to S3, the extractor triggers Amazon Redshift to copy the data into the target tables using the standard Amazon Redshift COPY command.
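
To make that last step concrete, the COPY that gets triggered is conceptually similar to the following Python sketch using psycopg2. The cluster endpoint, table, bucket, IAM role, and file format options are hypothetical, and the exact COPY options that the extractors use may differ.

```python
# Minimal sketch: load an LZO-compressed extract from S3 into a Redshift
# table with the standard COPY command. Endpoint, credentials, bucket,
# table, and IAM role are placeholders.
import psycopg2

conn = psycopg2.connect(
    host="my-cluster.xxxxxxxx.us-east-1.redshift.amazonaws.com",  # hypothetical cluster
    port=5439,
    dbname="dw",
    user="admin",
    password="example-password",
)

copy_sql = """
COPY sales_history
FROM 's3://my-sct-extracts/sales_history/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
LZOP
DELIMITER '|';
"""

# Run the COPY inside a transaction; the context manager commits on success.
with conn, conn.cursor() as cur:
    cur.execute(copy_sql)
```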

Now that we know what the architecture looks like, let’s take a look at the steps involved in setting up SSL for SCT and the data extractors.

In the following example, we use Amazon EC2 running Red Hat Linux to host the extractors. However, you can install the extractors on premises and on many platforms, including Microsoft Windows, Ubuntu, macOS, and Red Hat Linux.

The SCT client in this example runs on a Mac laptop. Because the client offloads data processing chores to the extractors, a small desktop or laptop is perfectly suited to run the client.

(more…)

Database Migration Gaining Momentum

by Ilia Gilderman | on | in AWS Database Migration Service, Database, DMS, Migration

Ilia Gilderman is a senior software development manager at Amazon Web Services.

Only five months have passed since our most recent post, in March 2017, about the momentum of AWS Database Migration Service (AWS DMS). So far, we've watched our customers migrate over 35,000 unique databases to AWS using AWS DMS.

Here is a quick recap for anyone who is reading about AWS DMS for the first time.

AWS Database Migration Service helps anyone migrate databases to AWS easily and securely. The source database remains fully operational during the migration, minimizing downtime to applications that rely on the database. AWS DMS can migrate your data to and from most widely used commercial and open-source databases and data warehouses.

In addition to AWS DMS, we developed the AWS Schema Conversion Tool (AWS SCT). AWS SCT makes heterogeneous database migrations easy by automatically converting the source database schema and a majority of the custom code—including views, stored procedures, and functions—to a format that is compatible with the target database. Any code that can’t be automatically converted is clearly marked so that it can be manually converted.

AWS SCT can also scan your application source code for embedded SQL statements and convert them as part of a database schema conversion project. During this process, AWS SCT performs cloud-native code optimization by converting legacy Oracle and SQL Server functions to their equivalent AWS service, thus helping you modernize the applications during the migration. AWS SCT can also help convert schema for a range of data warehouses into Amazon Redshift. When schema conversion is complete, you can migrate the data to Amazon Redshift using built-in data migration agents.

Expanded migration capabilities
We also significantly expanded our data warehouse migration capabilities and now support schema conversion and data extraction for six leading commercial platforms: Teradata, IBM Netezza, Microsoft SQL Server, Oracle, Micro Focus Vertica, and Greenplum. You can now move workloads from these systems into Amazon Redshift using AWS SCT and AWS SCT migration agents. For more details, see Working with Data Warehouses.

Since the March post, we've expanded AWS DMS migration and replication capabilities significantly by adding NoSQL sources and targets, so you can migrate your databases to Amazon DynamoDB (among other options).

These new NoSQL connectors and migration agents enable a variety of interesting use cases. Most notably:

(more…)

Capturing Data Changes in Amazon Aurora Using AWS Lambda

by Re Alvarez-Parmar | on | in Amazon Aurora, Aurora, AWS Lambda, Database

Re Alvarez-Parmar is a solutions architect at Amazon Web Services. He helps enterprises achieve success through technical guidance and thought leadership. In his spare time, he enjoys spending time with his two kids and exploring outdoors.

At Amazon, we are constantly improving our services to help customers solve their operational problems. One of the features we announced last year enabled integration between Amazon Aurora and other AWS services through AWS Lambda functions and Amazon S3. In this post, we explore how this integration feature helps extend the functionality of Amazon Aurora.

This post shows how to do the following:

  • Use Lambda functions with Amazon Aurora to capture data changes in a table.
  • Export this data to Amazon Athena.
  • Visualize the data using Amazon QuickSight.

In this post, we build a serverless architecture by using the following services to collect, store, query, and visualize data in Amazon Aurora:

  • Amazon Aurora
  • AWS Lambda
  • Amazon Kinesis Firehose
  • Amazon S3
  • Amazon Athena
  • Amazon QuickSight

Serverless architecture for capturing and analyzing Aurora data changes
Consider a scenario in which an ecommerce web application uses Amazon Aurora for its database layer. The company has a sales table that captures every sale, along with a few corresponding data items. This information is stored as immutable data in a table. Business users want to monitor the sales data and then analyze and visualize it. In this example, you take the changes to data in an Aurora database table and send them to Amazon QuickSight for real-time dashboard visualization.

By the end of this post, you will understand how to capture data events in an Aurora table and push them out to other AWS services using Lambda.

The following diagram shows the flow of data as it occurs in this tutorial:

The starting point in this architecture is a database insert operation in Amazon Aurora. When the insert statement runs, a custom trigger calls a Lambda function and forwards the inserted data. Lambda writes the data that it receives from Amazon Aurora to an Amazon Kinesis Firehose delivery stream. Kinesis Firehose writes the data to an Amazon S3 bucket. Once the data is in the Amazon S3 bucket, it is queried in place using Amazon Athena. Finally, Amazon QuickSight uses Athena as a data source and provides a visualization of the data in Amazon S3.
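
To illustrate the Lambda piece of this flow, here is a minimal sketch of a handler that takes the record forwarded by the Aurora trigger and writes it to a Kinesis Firehose delivery stream. The event shape, field names, and stream name are assumptions for this example.

```python
# Minimal sketch: Lambda handler that forwards an inserted sales record
# (passed by the Aurora trigger) to a Kinesis Firehose delivery stream.
# The stream name and the incoming event fields are placeholders.
import json
import boto3

firehose = boto3.client("firehose")

def lambda_handler(event, context):
    # The Aurora trigger is assumed to pass the inserted row as the payload.
    record = {
        "order_id": event.get("order_id"),
        "item_id": event.get("item_id"),
        "quantity": event.get("quantity"),
        "amount": event.get("amount"),
        "created_at": event.get("created_at"),
    }
    firehose.put_record(
        DeliveryStreamName="aurora-sales-stream",
        Record={"Data": (json.dumps(record) + "\n").encode("utf-8")},
    )
    return {"status": "delivered"}
```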

(more…)