A file server, also called a storage filer, provides a way for applications to read and update files that are shared across machines. Some file-serving solutions are scale-up and consist of storage attached to a single VM. Others are scale-out and consist of a cluster of VMs, each with attached storage, that presents a single filesystem namespace to applications.
Although some file systems use a native POSIX client, many file servers use a protocol that enables client machines to mount a filesystem and access the files as if they were hosted locally. The most common protocols for exporting file shares are Network File System (NFS) for Linux and the Common Internet File System (CIFS) or Server Message Block (SMB) for Windows.
This solution describes several options for sharing files on Compute Engine, including persistent disks, Elastifile, Quobyte, Avere vFXT, and Single Node File Server.
An underlying factor in the performance and predictability of all Google Cloud Platform services is the network stack that Google has evolved over many years. With the Jupiter Fabric, Google has built a robust, scalable, and stable networking stack that can continue to evolve without affecting your workloads. As Google improves and bolsters its network capabilities internally, your file-sharing solution benefits from the added performance. For more details on the Jupiter Fabric, see the 2015 paper that describes its evolution.
One feature of GCP that can help you get the most out of your investment is the ability to specify custom machine types. When choosing the size of your filer, you can pick exactly the right mix of memory and vCPUs, so that your filer operates at optimal performance without being oversubscribed.
Further, it is important to choose the correct Compute Engine persistent disk capacity and number of vCPUs to ensure that your file server receives the required storage bandwidth and IOPS, as well as sufficient network bandwidth. A VM receives 2 Gb/s of network throughput for every vCPU, up to the maximum for its machine type. For tuning persistent disk, see Optimizing Persistent Disk and Local SSD Performance.
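As a rough illustration of that sizing rule, here is a minimal Python sketch based on the 2 Gb/s-per-vCPU figure above. The per-VM egress cap used below is only an assumed placeholder; the actual cap depends on the machine type, so check the current Compute Engine documentation.

```python
import math

# Back-of-the-envelope sizing helper based on the 2 Gb/s-per-vCPU figure above.
# The per-VM egress cap varies by machine type, so the value below is only an
# assumed placeholder.
GBPS_PER_VCPU = 2.0
ASSUMED_EGRESS_CAP_GBPS = 16.0

def min_vcpus_for_network(target_gbps: float) -> int:
    """Smallest vCPU count whose network allowance covers the target throughput."""
    capped = min(target_gbps, ASSUMED_EGRESS_CAP_GBPS)
    return max(1, math.ceil(capped / GBPS_PER_VCPU))

# Example: serving about 10 Gb/s of file traffic needs at least 5 vCPUs for
# network bandwidth alone, before accounting for persistent disk limits.
print(min_vcpus_for_network(10))  # -> 5
```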
Note that Cloud Storage is also a great way to store petabytes of data with high levels of redundancy at a low cost, but Cloud Storage has a different performance profile and API than the file servers discussed here.
Compute Engine persistent disks
If you have data that only needs to be accessed by a single VM or doesn't change over time, you might be able to use Compute Engine's persistent disks and avoid a file server altogether. You can format a persistent disk with a file system such as ext4 or XFS and attach it in either read-write or read-only mode. This means that you can first attach a volume to an instance, load it with the data you need, and then attach it as a read-only disk to hundreds of virtual machines simultaneously. Employing read-only persistent disks does not work for all use cases, but it can greatly reduce complexity compared to using a file server.
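For illustration, the following is a minimal sketch of the last step, attaching an already-populated disk in read-only mode with the google-cloud-compute Python client. The project, zone, disk, and instance names are placeholders; you would repeat the call for every VM that needs the data.

```python
from google.cloud import compute_v1  # pip install google-cloud-compute

# Placeholder names; substitute your own project, zone, disk, and instances.
PROJECT, ZONE = "my-project", "us-central1-b"
DISK_NAME = "shared-data-disk"

instances_client = compute_v1.InstancesClient()
read_only_disk = compute_v1.AttachedDisk(
    source=f"projects/{PROJECT}/zones/{ZONE}/disks/{DISK_NAME}",
    mode="READ_ONLY",   # the same disk can be attached read-only to many VMs
    auto_delete=False,  # keep the disk when an instance is deleted
)

for instance_name in ["worker-1", "worker-2", "worker-3"]:
    operation = instances_client.attach_disk(
        project=PROJECT,
        zone=ZONE,
        instance=instance_name,
        attached_disk_resource=read_only_disk,
    )
    operation.result()  # wait for each attach operation to complete
```

Inside each guest, you would then mount the filesystem with the `ro` option so the kernel also treats it as read-only.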
Compute Engine's persistent disks are a great way to store data in Google Cloud Platform (GCP), because they give you flexibility in balancing scale and performance against cost. Persistent disks can also be resized on the fly, so you can start with a low-cost, low-capacity volume and scale capacity without spinning up additional instances or disks. Persistent disk throughput and IOPS scale linearly with disk capacity (and with vCPUs for SSD persistent disks), which means you can scale performance simply by resizing, with little to no downtime.
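As a sketch of such a resize, again assuming the google-cloud-compute Python client and placeholder names, you can grow the disk with one API call and then expand the filesystem inside the guest:

```python
from google.cloud import compute_v1  # pip install google-cloud-compute

PROJECT, ZONE, DISK_NAME = "my-project", "us-central1-b", "filer-data-disk"

disks_client = compute_v1.DisksClient()
operation = disks_client.resize(
    project=PROJECT,
    zone=ZONE,
    disk=DISK_NAME,
    disks_resize_request_resource=compute_v1.DisksResizeRequest(size_gb=2048),
)
operation.result()  # persistent disks can only grow, never shrink

# Afterwards, grow the filesystem from inside the VM, for example with
# resize2fs (ext4) or xfs_growfs (XFS).
```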
Another advantage of persistent disks is their consistent performance. All disks of the same size (and same number of vCPUs for SSD persistent disks) that you attach to your instance have the same performance characteristics. You don't need to pre-warm or test your persistent disks before using them in production.
Performance is not the only thing that is easy to predict: the cost of persistent disks is easy to determine, because there are no I/O costs to consider after provisioning your volume. You can easily balance cost and performance, because you have the option of using three different types of disks with varying costs and performance characteristics.
If total capacity is the main requirement, you can use low-cost standard persistent disks. For the best performance while continuing to be durable, you can use the SSD persistent disks.
If your data is ephemeral and requires sub-millisecond latency and high IOPS, you can leverage up to 3 TB of local SSDs for extreme performance. Local SSDs allow for up to ~700k IOPS with speeds similar to DDR2 RAM, all while not using up your instances’ allotted network capacity.
For a comparison of the many disk types available to Compute Engine instances, see the documentation for block storage.
Considerations when choosing a filer solution
Choosing a filer solution requires you to make tradeoffs regarding cost, performance, and scalability. The decision is easier if you have a well-defined workload, but unfortunately that often isn't the case. Where workloads evolve over time or are highly variable, it is prudent to trade cost savings for flexibility and elasticity, so you can grow into your solution. On the other hand, if you have a workload that is temporary and well known, you can create a purpose-built filer architecture that can easily be torn down and rebuilt to meet your immediate storage needs.
One of the first decisions to make is whether you want to pay for a Google-managed filer solution, a supported partner filer solution, or an unsupported solution. Note that all unmanaged solutions require staff to maintain them in the long run, but the resources required for a supported solution are considerably less than for an unsupported solution.
The next decision involves figuring out the filer's durability and availability requirements. Most filer solutions are zonal and do not, by default, provide protection if the zone fails, so it is important to consider whether a disaster recovery solution that protects against zonal failures is required. Further, it's important to understand your application's requirements for durability and availability. The choice of local SSDs and/or persistent disks in your deployment has a big impact on durability and availability, as does the configuration of your filer solution's software. Each solution requires careful planning to achieve high durability, availability, and even protection against zonal and regional failures.
Finally, consider the locations (that is, zones, regions, and on-premises data centers) from which you need to access the data. The locations of the compute farms that access your data will influence your choice of filer solution, because only some solutions allow hybrid on-premises and in-cloud access.
Filer options
Elastifile
Elastifile simplifies enterprise storage and data management on GCP and across hybrid clouds. Elastifile delivers cost-effective, high-performance parallel access to global data while maintaining strict consistency powered by a dynamically scalable, distributed file system with intelligent object tiering. With Elastifile, existing NFS applications and NAS workflows can run in the cloud without requiring refactoring, yet retain the benefits of enterprise data services (high availability, compression, deduplication, replication, and so on). Native integration with Google Kubernetes Engine allows seamless data persistence, portability, and sharing for containerized workloads.
Elastifile is deployable and scalable at the push of a button. It lets you create and expand file system infrastructure easily and on-demand, ensuring that storage performance and capacity always align with your dynamic workflow requirements. As an Elastifile cluster expands, both metadata and I/O performance scale linearly. This scaling allows you to enhance and accelerate a broad range of data-intensive workflows, including high-performance computing, analytics, cross-site data aggregation, DevOps, and many more. As a result, Elastifile is a great fit for use in data-centric industries such as life sciences, electronic design automation (EDA), oil and gas, financial services, and media and entertainment.
Elastifile’s CloudConnect capability enables granular, bidirectional data transfer between any POSIX file system and Cloud Storage. To optimize performance and minimize costs, CloudConnect ensures that data is compressed and deduplicated before transfer and sends changes only after the initial data synchronization. When leveraged for hybrid cloud deployments, CloudConnect allows you to efficiently load data into Cloud Storage from any on-premises NFS file system, delivering a cost-effective way to bring data to the cloud. When leveraged in the cloud, CloudConnect enables cost-optimized data tiering between an Elastifile file system and Cloud Storage.

For more information, follow these links:
- Deployment whitepaper - Elastifile on Google Cloud Platform
- Solutions brief - Elastifile on Google Cloud Platform
- Elastifile Overview on GCP Live - NFS and NAS on GCP (video)
- Demo - Scalable Molecular Dynamics Simulations on Google Cloud Platform (video)
- Contact Elastifile or deploy Elastifile by using Cloud Marketplace.
Quobyte
Quobyte is a parallel, distributed, POSIX-compatible file system that runs in the cloud and on-premises to provide petabytes of storage and millions of IOPS. The company was founded by ex-Google engineers who designed and architected the Quobyte file system by drawing on their deep technical understanding of the cloud.
Customers use Quobyte in demanding, large-scale production environments in industries ranging from life sciences, financial services, aerospace engineering, broadcasting and digital production, and electronic design automation (EDA) to traditional high-performance computing (HPC) research projects.
Quobyte natively supports all Linux, Windows, and NFS applications. Existing applications, newly implemented ones, and developers can work in the same environment whether in the cloud or on-premises. Quobyte offers optional cache-consistency for applications that need stronger guarantees than NFS or have not been designed for distributed setups. And HPC applications can take advantage of the fact that Quobyte is a parallel file system supporting concurrent reads and writes from multiple clients at high speed.
As a distributed file system, Quobyte scales IOPS and throughput linearly with the number of nodes—avoiding the performance bottlenecks of clustered or single filer solutions. Quobyte provides thousands of Linux and Windows client virtual machines (VMs) or containerized applications access to high IOPS, low latency, and several GB/s of throughput through its native client software. This native client directly communicates with all storage VMs and can even read from multiple replicas of the data, avoiding the additional latencies and performance bottlenecks of NFS gateways.
Quobyte clusters on Compute Engine can be created and extended in a matter of minutes, allowing admins to run entire workloads in the cloud or to burst peak workloads. You can start with a single storage VM, add capacity and VMs on the fly, and dynamically downsize the deployment when resources are no longer needed.
Standard Linux VMs are the foundation for a Quobyte cluster on Compute Engine. The interactive installer makes for a quick and effortless setup. Data is stored on the attached persistent disks, which can be HDD or SSD based. You can use both types in a single installation, for example, as different performance tiers. The volume mirroring feature enables georeplicated disaster recovery (DR) copies of volumes, which you can also use for read-only access in the remote region.
Monitoring and automation are built into Quobyte, making it easy to maintain a cluster of several hundred storage VMs. With a single click, you can add or remove VMs and disks, and new resources are available in less than a minute. Built-in real-time analytics help to identify the top storage consumers and the application's access patterns.
Quobyte is available as a 45-day test license at no cost directly from www.quobyte.com/get-quobyte.
Quobyte supports thousands of clients communicating directly with all storage VMs without any performance bottlenecks. By using optional volume mirroring between different availability zones or on-premises clusters, you can asynchronously replicate volumes between multiple sites for read-only data access, for example as part of a disaster recovery setup.
Avere vFXT
For workloads that require the utmost read performance, Avere Systems provides a best-of-breed solution. With Avere's cloud-based vFXT clustered filesystem, you can provide your users with petabytes of storage and millions of IOPS.
The Avere vFXT is not only a filer, but also a read/write cache that allows for minimal changes to your existing workflow by putting working data sets as close to your compute cluster as possible. With Avere, you can employ the cost-effectiveness of Cloud Storage as a backing store, along with the performance, scalability, and per-second pricing of Compute Engine.
Avere also allows you to make the most of your current on-premises footprint. In addition to being able to leverage GCP with the vFXT, you can use Avere’s on-premises FXT series to unify the storage of your legacy devices and storage arrays into an extensible filer with a single namespace.
If you are considering a transition away from your on-premises storage footprint, you can use Avere's FlashMove technology to migrate to Cloud Storage with zero downtime to your clients. If you want to provide a disaster recovery mechanism for your on-premises data, you can use the FlashMirror feature to replicate your on-premises storage in Cloud Storage. If you find yourself in need of a large amount of storage for a brief period of time, you can use Cloud Storage to burst your workload into the cloud. You can use as much storage and compute as you need, and then deprovision it without paying any ongoing costs.
Avere uses fast local devices, like SSDs and RAM, to cache the currently active data set as close to your compute devices as possible. With the vFXT, you can use the global redundancy and immense scale of Cloud Storage, while still providing your users with the illusion that their data is local to their compute cluster.
To get Avere up and running as your filer solution, contact Avere directly. For more information about Avere, see Google Cloud Platform Integration Overview.
Single Node File Server
The Single Node File Server can be deployed by using Cloud Marketplace. The deployment includes monitoring through Grafana.
When you use Cloud Marketplace to deploy Single Node File Server, you can configure the type of backing disk you'd like (standard or SSD), as well as the instance type and total data disk size. Keep in mind that the performance of your filer depends on the instance type as well as the type and size of the data disk, which determine the total disk throughput.
After your filer is fully deployed, you can mount your shares by using NFS or SMB mounts from any host on the local subnet. Keep in mind that you can start with smaller disks and then resize them as necessary to scale with your performance or capacity needs.
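As an illustration, a client VM on the same subnet could mount an NFS export roughly as in the following Python sketch. The filer address and export path are placeholder values, so use whatever your deployment reports, and run the commands as root on the client:

```python
import subprocess

# Placeholder values; replace with the internal IP (or hostname) and export
# path reported by your Single Node File Server deployment.
FILER = "10.128.0.10"
EXPORT = "/data"
MOUNT_POINT = "/mnt/filer"

subprocess.run(["mkdir", "-p", MOUNT_POINT], check=True)
subprocess.run(
    ["mount", "-t", "nfs", f"{FILER}:{EXPORT}", MOUNT_POINT],
    check=True,  # raises CalledProcessError if the mount fails
)
```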
If you can tolerate downtime, you can also scale up your filer by stopping the instance, changing the instance type, and then starting it again. Although Single Node File Server cannot scale horizontally to provide a shared pool of disks, you can create as many of the individual filers as you need. This approach could be useful if you are doing development or testing against a shared filesystem back end.
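A sketch of that stop, resize, and start sequence with the google-cloud-compute Python client might look like the following; the project, zone, instance, and target machine type are placeholders:

```python
from google.cloud import compute_v1  # pip install google-cloud-compute

PROJECT, ZONE, INSTANCE = "my-project", "us-central1-b", "single-node-filer"
NEW_MACHINE_TYPE = f"zones/{ZONE}/machineTypes/n1-standard-16"  # example target

instances_client = compute_v1.InstancesClient()

# The instance must be stopped before its machine type can be changed.
instances_client.stop(project=PROJECT, zone=ZONE, instance=INSTANCE).result()
instances_client.set_machine_type(
    project=PROJECT,
    zone=ZONE,
    instance=INSTANCE,
    instances_set_machine_type_request_resource=compute_v1.InstancesSetMachineTypeRequest(
        machine_type=NEW_MACHINE_TYPE
    ),
).result()
instances_client.start(project=PROJECT, zone=ZONE, instance=INSTANCE).result()
```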
Although Single Node File Server does not automatically replicate data across zones or regions, you can create snapshots of your data disk in order to take periodic backups. There is no official paid support for Single Node File Server, so the costs of running it are tied directly to the instance, disk, and network costs. In general, this option should be low maintenance.
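For such periodic backups, one approach is a scheduled job that snapshots the data disk. Here is a minimal Python sketch with placeholder names; a snapshot schedule attached to the disk would work as well.

```python
from datetime import datetime, timezone
from google.cloud import compute_v1  # pip install google-cloud-compute

PROJECT, ZONE, DISK_NAME = "my-project", "us-central1-b", "filer-data-disk"

disks_client = compute_v1.DisksClient()
snapshot = compute_v1.Snapshot(
    # Snapshot names must be unique; a timestamp suffix is one simple scheme.
    name=f"{DISK_NAME}-{datetime.now(timezone.utc):%Y%m%d-%H%M%S}",
)
disks_client.create_snapshot(
    project=PROJECT,
    zone=ZONE,
    disk=DISK_NAME,
    snapshot_resource=snapshot,
).result()
```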
Summary of file server options
The following table summarizes the features of persistent disks and three file server options:
| Filer solution | Optimal data set | Throughput | Managed support | Export protocols | Highly available | Hybrid |
|---|---|---|---|---|---|---|
| Elastifile | 10s of TB to > 1 PB | 10s to 100s of Gb/s | Elastifile | NFSv3 | Yes | Yes |
| Quobyte | 10s of TB to > 1 PB | 100s to 1000s of Gb/s | Quobyte | Native Linux and Windows clients, Amazon S3, HDFS, NFSv4/3, SMB | Yes | Yes |
| Avere | 10s to 100s of TB | 10s to 100s of Gb/s | Avere | NFSv3, SMB2 | Yes | Yes |
| Read-only PD | < 64 TB | 180 to 1,200 MB/s | No | Direct attachment | No | No |
| Single Node File Server | < 64 TB | Up to 16 Gb/s | No | NFSv3, SMB3 | No | No |