Amazon FSx for NetApp ONTAP performance
Following is an overview of Amazon FSx for NetApp ONTAP file system performance, with a discussion of the available performance and throughput options and useful performance tips.
Topics
How performance is measured for FSx for ONTAP file systems
File system performance is measured by its latency, throughput, and I/O operations per second (IOPS).
Latency
Amazon FSx for NetApp ONTAP provides sub-millisecond file operation latencies with solid state drive (SSD) storage, and tens of milliseconds of latency for capacity pool storage. Above that, Amazon FSx has two layers of read caching on each file server—NVMe (non-volatile memory express) drives and in-memory—to provide even lower latencies when you access your most frequently-read data.
Throughput and IOPS
Each Amazon FSx file system provides up to tens of GB/s of throughput and millions of IOPS. The specific amount of throughput and IOPS that your workload can drive on your file system depends on the total throughput capacity and storage capacity configuration of your file system, along with the nature of your workload, including the size of the active working set.
SMB Multichannel and NFS nconnect support
With Amazon FSx, you can configure SMB Multichannel to provide multiple connections
between ONTAP and clients in a single SMB session. SMB Multichannel uses multiple network
connections between the client and server simultaneously to aggregate network bandwidth for
maximal utilization. For information on using the NetApp ONTAP CLI to configure SMB
Multichannel, see Configuring SMB Multichannel for performance and redundancy
NFS clients can use the nconnect
mount option to have multiple TCP
connections (up to 16) associated with a single NFS mount. Such an NFS client multiplexes
file operations onto multiple TCP connections in a round-robin fashion and thus obtains
higher throughput from the available network bandwidth. NFSv3 and NFSv4.1+ support
nconnect
.
Amazon EC2 instance network bandwidth describes the full duplex 5 Gbps per network flow
bandwidth limit. You can overcome this limit by using multiple network flows with nconnect
or SMB multichannel.
See your NFS client documentation to confirm whether nconnect
is supported in
your client version. For more information about NetApp ONTAP support for nconnect
,
see ONTAP support for NFSv4.1
Performance details
To understand the Amazon FSx for NetApp ONTAP performance model in detail, you can examine the architectural components of an Amazon FSx file system. Your client compute instances, whether they exist in Amazon or on-premises, access your file system through one or multiple elastic network interfaces (ENI). These network interfaces reside in the Amazon VPC that you associate with your file system. Behind each file system ENI is a NetApp ONTAP file server that is serving data over the network to the clients accessing the file system. Amazon FSx provides a fast in-memory cache and NVMe cache on each file server to enhance performance for the most frequently accessed data. Attached to each file server are the SSD disks hosting your file system data.
These components are illustrated in the following diagram.
Corresponding with these architectural components–network interface, in-memory cache, NVMe cache, and storage volumes–are the primary performance characteristics of an Amazon FSx for NetApp ONTAP file system that determine the overall throughput and IOPS performance.
-
Network I/O performance: throughput/IOPS of requests between the clients and the file server (in aggregate)
-
In-memory and NVMe cache size on the file server: size of active working set that can be accommodated for caching
-
Disk I/O performance: throughput/IOPS of requests between the file server and the storage disks
There are two factors that determine these performance characteristics for your file system: the total amount of SSD IOPS and throughput capacity that you configure for it. The first two performance characteristics – network I/O performance and in-memory and NVMe cache size – are solely determined by throughput capacity, while the third one – disk I/O performance – is determined by a combination of throughput capacity and SSD IOPS.
File-based workloads are typically spiky, characterized by short, intense periods of high I/O with plenty of idle time between bursts. To support spiky workloads, in addition to the baseline speeds that a file system can sustain 24/7, Amazon FSx provides the capability to burst to higher speeds for periods of time for both network I/O and disk I/O operations. Amazon FSx uses a network I/O credit mechanism to allocate throughput and IOPS based on average utilization — file systems accrue credits when their throughput and IOPS usage is below their baseline limits, and can use these credits when they perform I/O operations.
Write operations use twice as much network bandwidth as read operations. A write operation has to be replicated on the secondary file server, so a single write operation results in twice the amount of network throughput.
Impact of deployment type on performance
You can create Single-AZ and Multi-AZ file systems with FSx for ONTAP. First-generation file systems (both Single-AZ and Multi-AZ) and second-generation Multi-AZ file systems are powered by one high-availability (HA) pair. Second-generation Single-AZ file systems are powered by up to 12 HA pairs. For more information, see Managing high-availability (HA) pairs.
FSx for ONTAP Multi-AZ and Single-AZ file systems provide consistent sub-millisecond file operation latencies with SSD storage and tens of milliseconds of latency with capacity pool storage. Additionally, file systems that meet the following requirements provide an NVMe read cache to reduce read latencies and increase IOPS for frequently-read data:
Multi-AZ 1 and Multi-AZ 2 file systems
Single-AZ 1 file systems created after November 28, 2022 with at least 2 GBps of throughput capacity
Single-AZ 2 file systems with at least 6 GBps of throughput capacity per pair
Note
For second-generation file systems (Single-AZ 2 and Multi-AZ 2), using an NVMe cache can result in your workload achieving less total throughput for high-throughput or large I/O workloads. If you have a throughput-bound workload, we recommend disabling the NVMe cache. For more information, see Managing the NVMe cache.
The following tables show the amount of throughput capacity that file systems can scale up to depending on factors such as the number of high availability (HA) pairs and Amazon Web Services Regions availability.
Impact of storage capacity on performance
The maximum disk throughput and IOPS levels your file system can achieve is the lower of:
-
the disk performance level provided by your file servers, based on the throughput capacity you select for your file system
-
the disk performance level provided by the number of SSD IOPS you provision for your file system
By default, your file system's SSD storage provides up to the following levels of disk throughput and IOPS:
-
Disk throughput (MBps per TiB of storage): 768
-
Disk IOPS (IOPs per TiB of storage): 3,072
Impact of throughput capacity on performance
Every Amazon FSx file system has a throughput capacity that you configure when the file system is created. Your file system's throughput capacity determines the level of network I/O performance, or the speed at which each of the file servers that are hosting your file system can serve file data over the network to clients accessing it. Higher levels of throughput capacity come with more memory and non-volatile memory express (NVMe) storage for caching data on each file server, and higher levels of disk I/O performance supported by each file server.
You can optionally provision a higher level of SSD IOPS when creating your file system. The maximum level of SSD IOPS that your file system can achieve is also dictated by your file system's throughput capacity, even when provisioning additional SSD IOPS.
The following tables show the full set of specifications for throughput capacity, along with baseline and burst levels, and amount of memory for caching on the file server in the corresponding Amazon Web Services Regions.
Example: storage capacity and throughput capacity
The following example illustrates how storage capacity and throughput capacity impact file system performance.
A first-generation file system that is configured with 2 TiB of SSD storage capacity and 512 MBps of throughput capacity has the following throughput levels:
-
Network throughput – 625 MBps baseline and 1,250 MBps burst (see throughput capacity table)
-
Disk throughput – 512 MBps baseline and 600 MBps burst.
Your workload accessing the file system will therefore be able to drive up to 625 MBps baseline and 1,250 MBps burst throughput for file operations performed on actively accessed data cached in the file server in-memory cache and NVMe cache.