Data is the cornerstone of successful cloud application deployments. Your evaluation and planning process may highlight the physical limitations inherent to migrating data from on-premises locations into the cloud. Amazon offers a suite of tools to help you move data via networks, roads and technology partners.
The daunting realities of data transport apply to most projects. How do you gracefully move from your current location to your new cloud, with minimal disruption, cost and time? What is the smartest way to actually move your GB, TB or PB of data?
It's a basic underlying problem: how much data can you move, how far, and how fast? For a best-case scenario, use this formula:
Number of Days = (Total Bytes)/(Megabits per second * 125 * 1000 * Network Utilization * 60 seconds * 60 minutes * 24 hours)
For example, if you have a T1 connection (1.544 Mbps) and 1 TB (1024 * 1024 * 1024 * 1024 bytes) to move in or out of AWS, the theoretical minimum time to load it over that connection at 80% network utilization is 82 days.
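To make the arithmetic concrete, here is a minimal Python sketch of the formula above (the function name and the 80% default are illustrative, not part of any AWS tool):

```python
def transfer_days(total_bytes: float, mbps: float, utilization: float = 0.8) -> float:
    """Theoretical best-case transfer time in days.

    mbps * 125 * 1000 converts megabits/second to bytes/second;
    60 * 60 * 24 converts seconds to days.
    """
    bytes_per_day = mbps * 125 * 1000 * utilization * 60 * 60 * 24
    return total_bytes / bytes_per_day

# 1 TB over a 1.544 Mbps T1 line at 80% utilization: about 82 days
print(round(transfer_days(1024**4, 1.544)))
```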
Relax. We've done this before. We've found that customers approach this in two ways: they use very basic unmanaged migration tools to move their data, or they select from Amazon's suite of managed migration services.
As a general rule of thumb, for best results we suggest:
| Connection | Data Scale | Method |
|---|---|---|
| Less than 10 Mbps | Less than 500 GB | Unmanaged |
| More than 10 Mbps | More than 500 GB | Managed |
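To make the rule of thumb concrete, here is a tiny helper encoding the thresholds from the table above (the name is illustrative, and cases that straddle the thresholds deserve case-by-case judgment):

```python
def suggested_method(link_mbps: float, data_gb: float) -> str:
    """Rule of thumb: a small link and a small dataset suit unmanaged
    tools; anything larger suits one of the managed services."""
    if link_mbps < 10 and data_gb < 500:
        return "Unmanaged"
    return "Managed"

print(suggested_method(1.544, 100))  # Unmanaged
print(suggested_method(100, 5000))   # Managed
```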
There are easy, one-and-done methods to move data at small scales from your site into Amazon's cloud storage (a scripted sketch follows this list):
- rsync. Customers use this open-source tool along with third-party file system tools to copy data directly into S3 buckets.
- S3 command line interface. Customers use the Amazon S3 CLI to write commands to move data directly into S3 buckets.
- Glacier command line interface. Customers use the Amazon Glacier CLI to move data into Glacier vaults.
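If you would rather script these one-off copies than run them by hand, the same uploads the CLIs perform can be done with the boto3 SDK. A minimal sketch, assuming AWS credentials are already configured; the bucket and file names are placeholders:

```python
import boto3

s3 = boto3.client("s3")

# Upload one local file into an S3 bucket. For a directory tree, walk
# the tree and call upload_file per object (or use `aws s3 sync`).
s3.upload_file("backup.tar.gz", "my-example-bucket", "archives/backup.tar.gz")
```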
The suite of migration services created by Amazon includes multiple methods that help you manage this task more efficiently. Think about them in two categories:
- Optimizing or Replacing the Internet. One should never underestimate the bandwidth of a semi truck filled with disks hurtling down the highway. These methods are ideal for moving large archives and data lakes, or for situations where available bandwidth simply can't carry the data volume in realistic time.
- Friendly interfaces to S3. These methods make it simple to use S3 with your existing native applications. Rather than lifting and shifting large datasets at once, these help you integrate existing process flows like backup and recovery or continuous Internet of Things streams directly with cloud storage.
| If you need: | Consider: |
|---|---|
| An optimized or replacement Internet connection to: | |
| Connect directly into an AWS regional datacenter | AWS Direct Connect |
| Migrate petabytes of data in batches to the cloud | AWS Import/Export Snowball |
| Migrate recurring jobs with incremental changes over long distances | Amazon S3 Transfer Acceleration |
| A friendly interface directly into S3 to: | |
| Cache data locally in a hybrid model (for performance reasons) | Gateways (AWS or Partner) |
| Push backups or archives to the cloud with minimal disruption | Technology Partnerships |
| Collect and ingest multiple streaming data sources | Amazon Kinesis Firehose |
You can also combine services for optimal results. Consider these examples:
- Multiple Snowballs running in parallel
- Direct Connect and a Technology Partnership
- Direct Connect and Transfer Acceleration
- Transfer Acceleration and a Storage Gateway or a Technology Partnership
Need a hand with your specific case? Let us help.
These migration methods enhance or replace the Internet to lift and shift data from your current location straight into Amazon's datacenters. They require no development work or API integration, though transfers may impact your existing on-premises processes.
Explore our Direct Connect Partner Bundles that help extend on-premises technologies to the cloud.
Customers select a Direct Connect dedicated physical connection to accelerate network transfers between their datacenters and ours.
AWS Direct Connect lets you establish a dedicated network connection between your network and one of the AWS Direct Connect locations. Using industry standard 802.1q VLANs, this dedicated connection can be partitioned into multiple virtual interfaces. This allows you to use the same connection to access public resources such as objects stored in Amazon S3 using public IP address space, and private resources such as Amazon EC2 instances running within an Amazon Virtual Private Cloud (VPC) using private IP space, while maintaining network separation between the public and private environments. Virtual interfaces can be reconfigured at any time to meet your changing needs.
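As an illustrative sketch only, here is how carving a private virtual interface out of an existing dedicated connection might look with the boto3 Direct Connect API; the connection ID, gateway ID, VLAN, and ASN are placeholders you would replace with your own values:

```python
import boto3

dx = boto3.client("directconnect")

# Partition the dedicated connection: create a private virtual
# interface (an 802.1q VLAN) that reaches a VPC through a virtual
# private gateway. All identifiers below are placeholders.
vif = dx.create_private_virtual_interface(
    connectionId="dxcon-EXAMPLE",
    newPrivateVirtualInterface={
        "virtualInterfaceName": "private-vif-to-vpc",
        "vlan": 101,
        "asn": 65000,
        "virtualGatewayId": "vgw-EXAMPLE",
    },
)
print(vif["virtualInterfaceState"])
```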
Learn more about the Direct Connect service.
Snowball is a petabyte-scale data transport solution that uses secure appliances to transfer large amounts of data into and out of AWS. Using Snowball addresses common challenges with large-scale data transfers including high network costs, long transfer times, and security concerns. Transferring data with Snowball is simple, fast, secure, and can be as little as one-fifth the cost of high-speed Internet.
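Snowball jobs are usually created from the console, but the same request can be scripted. A hedged sketch using the boto3 Snowball API; the bucket ARN, address ID, and role ARN are placeholders you register with AWS ahead of time:

```python
import boto3

snowball = boto3.client("snowball")

# Request an import job: AWS ships an appliance to the address on
# file, you load data on-site, and the contents land in the named
# S3 bucket once the appliance is returned.
job = snowball.create_job(
    JobType="IMPORT",
    Resources={"S3Resources": [{"BucketArn": "arn:aws:s3:::my-example-bucket"}]},
    AddressId="ADID-EXAMPLE",
    RoleARN="arn:aws:iam::123456789012:role/snowball-import-role",
)
print(job["JobId"])
```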
Learn more about the Snowball service.
Amazon S3 Transfer Acceleration makes public Internet transfers to Amazon S3 faster. You can maximize your available bandwidth regardless of distance or varying Internet weather conditions, and there are no special clients or proprietary network protocols. Simply change the endpoint you use with your S3 bucket, and acceleration is automatically applied.
This is ideal for recurring jobs that travel across the globe, such as media uploads, backups, and local data processing tasks that are regularly sent to a central location.
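The endpoint change really is the whole integration. A minimal boto3 sketch with a placeholder bucket name: enable acceleration once on the bucket, then route uploads through the accelerate endpoint:

```python
import boto3
from botocore.config import Config

# One-time setup: enable Transfer Acceleration on the bucket.
boto3.client("s3").put_bucket_accelerate_configuration(
    Bucket="my-example-bucket",
    AccelerateConfiguration={"Status": "Enabled"},
)

# Point the client at the accelerate endpoint; uploads now enter AWS
# at the nearest edge location rather than crossing the public
# Internet all the way to the bucket's region.
s3 = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))
s3.upload_file("video.mp4", "my-example-bucket", "uploads/video.mp4")
```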
Learn more about Transfer Acceleration.
Sometimes disruption isn't an option. When legacy data stores can migrate gradually over time, or when new data is aggregating from many non-cloud sources, these services are a good fit. They may leverage or complement existing installations like backup and recovery software or a SAN, and you can also program the Amazon Kinesis Firehose service into your own applications.
A gateway sits on-premises and links your environment to the AWS cloud. It's an ideal solution for hybrid scenarios where some storage is needed locally for performance or compliance reasons, but some may be offloaded to S3.
Consider combining the AWS Direct Connect service with your gateway to ensure optimal performance.
The AWS Storage Gateway service simplifies on-premises adoption of AWS storage. Your existing applications use industry-standard storage protocols to connect to a software appliance which stores your data in Amazon S3 and Amazon Glacier.
- Data is compressed and securely transferred to AWS.
- Storage Area Network (SAN) configurations offer stored or cached devices with point-in-time backups as Amazon EBS snapshots.
- Virtual Tape Library (VTL) configuration works with your existing backup software for cost effective backup in Amazon S3 and long term archival in Amazon Glacier.
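Once a gateway is activated, its health and the volumes it exposes can be checked from code as well as from the console. A minimal monitoring sketch with the boto3 Storage Gateway API, assuming at least one gateway is already activated:

```python
import boto3

sgw = boto3.client("storagegateway")

# Enumerate activated gateways, then the iSCSI volumes each one
# exposes to on-premises applications.
for gw in sgw.list_gateways()["Gateways"]:
    print(gw["GatewayARN"])
    for vol in sgw.list_volumes(GatewayARN=gw["GatewayARN"])["VolumeInfos"]:
        print("  ", vol["VolumeARN"], vol["VolumeType"])
```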
Learn more about the AWS Storage Gateway.
AWS has partnered with a number of industry vendors on physical gateway appliances that bridge the gap between traditional backup and the cloud. Link existing on-premises data to Amazon's cloud without impacting performance, while preserving existing backup catalogs.
- Seamlessly integrates into existing infrastructure
- May offer deduplication, compression, encryption or WAN acceleration
- Cache recent backups locally, vault everything to the AWS cloud
Learn more about Gateway Partnerships.
Amazon has partnered with industry vendors to make it very easy to bring your backups and archives into the cloud. The simplest way to move your data may be via an S3 connector embedded in your existing backup software. The clear advantage to this approach is that the backup catalog stays consistent, so you maintain visibility and control across jobs that span disk, tape and cloud.
Learn more about technology partnerships and embedded connectivity.
Amazon Kinesis Firehose is the easiest way to load streaming data into AWS. It can capture and automatically load streaming data into Amazon S3 and Amazon Redshift, enabling near real-time analytics with existing business intelligence tools and dashboards you’re already using today. It is a fully managed service that automatically scales to match the throughput of your data and requires no ongoing administration. It can also batch, compress, and encrypt the data before loading it, minimizing the amount of storage used at the destination and increasing security. You can easily create a Firehose delivery stream from the AWS Management Console, configure it with a few clicks, and start sending data to the stream from hundreds of thousands of data sources to be loaded continuously to AWS – all in just a few minutes.
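Sending data into an existing delivery stream is a single API call. A minimal sketch with the boto3 Firehose API; the stream name and record contents are placeholders:

```python
import boto3
import json

firehose = boto3.client("firehose")

# Push one record into an existing delivery stream; Firehose batches
# records and delivers them to the configured S3 bucket or Redshift
# cluster, optionally compressing and encrypting along the way.
firehose.put_record(
    DeliveryStreamName="example-iot-stream",
    Record={"Data": (json.dumps({"sensor": "t-101", "temp_c": 21.7}) + "\n").encode("utf-8")},
)
```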
Learn more about Amazon Kinesis Firehose.