
Managing storage on FSx for Windows File Server

Your file system's storage configuration includes the amount of provisioned storage capacity, the storage type, and if the storage type is solid state drive (SSD), the amount of SSD IOPS. You can configure these resources, along with the file system's throughput capacity, when creating a file system and after it's created, to achieve the desired performance for your workload. Learn how to manage your file system's storage and storage-related performance using the Amazon Web Services Management Console, Amazon CLI, and the Amazon FSx CLI for remote management on PowerShell by exploring the following topics.

Optimizing storage costs

You can optimize your storage costs using the storage configuration options available in FSx for Windows.

Storage type options—FSx for Windows File Server provides two storage types, hard disk drive (HDD) and solid state drive (SSD), to enable you to optimize cost and performance for your workload needs. HDD storage is designed for a broad spectrum of workloads, including home directories, user and departmental shares, and content management systems. SSD storage is designed for the highest-performance and most latency-sensitive workloads, including databases, media processing workloads, and data analytics applications. For more information about storage types and file system performance, see FSx for Windows File Server performance.

Data deduplication—Large datasets often have redundant data, which increases data storage costs. For example, user file shares can have multiple copies of the same file, stored by multiple users. Software development shares can contain many binaries that remain unchanged from build to build. You can reduce your data storage costs by turning on data deduplication for your file system. When it's turned on, data deduplication automatically reduces or eliminates redundant data by storing duplicated portions of the dataset only once. For more information about data deduplication, and how to easily turn it on for your Amazon FSx file system, see Reducing storage costs with Data Deduplication.

Managing storage capacity

You can increase your FSx for Windows file system's storage capacity as your storage requirements change. You can do so using the Amazon FSx console, the Amazon FSx API, or the Amazon Command Line Interface (Amazon CLI). Factors to consider when planning a storage capacity increase include knowing when you need to increase storage capacity, understanding how Amazon FSx processes storage capacity increases, and tracking the progress of a storage increase request. You can only increase a file system's storage capacity; you cannot decrease storage capacity.
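
As a minimal sketch, the following Amazon CLI call requests a storage capacity increase; the file system ID and the target capacity of 2,400 GiB are placeholder values, and the request must be at least 10 percent above the file system's current capacity.

# Increase storage capacity (placeholder file system ID and size in GiB)
aws fsx update-file-system --file-system-id fs-0123456789abcdef0 --storage-capacity 2400

The update itself runs asynchronously; the capacity increase and the subsequent storage optimization happen in the background, as described below.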

Note

You can't increase storage capacity for file systems created before June 23, 2019 or file systems restored from a backup belonging to a file system that was created before June 23, 2019.

When you increase the storage capacity of your Amazon FSx file system, Amazon FSx adds a new, larger set of disks to your file system behind the scenes. Amazon FSx then runs a storage optimization process in the background to transparently migrate data from the old disks to the new disks. Storage optimization can take between a few hours and a few days, with minimal noticeable impact on the workload performance. During this optimization, backup usage is temporarily higher, because both the old and new storage volumes are included in the file system-level backups. Both sets of storage volumes are included to ensure that Amazon FSx can successfully take and restore from backups even during storage scaling activity. The backup usage reverts to its previous baseline level after the old storage volumes are no longer included in the backup history. When the new storage capacity becomes available, you are billed only for the new storage capacity.

The following illustration shows the four main steps of the process that Amazon FSx uses when increasing a file system's storage capacity.

Diagram showing 4 steps: 1. Storage capacity increase request, 2. FSx adds new larger disks, 3. FSx migrates data, and 4. FSx removes old disks.

You can track the progress of storage optimization, SSD storage capacity increases, or SSD IOPS updates at any time using the Amazon FSx console, CLI, or API. For more information, see Monitoring storage capacity increases.
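
For example, one way to check progress from the Amazon CLI is to list the administrative actions on the file system; the file system ID below is a placeholder, and the --query expression simply narrows the output to the actions themselves.

# Show in-progress and completed administrative actions, including STORAGE_OPTIMIZATION
aws fsx describe-file-systems --file-system-ids fs-0123456789abcdef0 --query "FileSystems[0].AdministrativeActions"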

What to know about increasing a file system's storage capacity

Here are a few important items to consider when increasing storage capacity:

  • Increase only – You can only increase the amount of storage capacity for a file system; you can't decrease storage capacity.

  • Minimum increase – Each storage capacity increase must be a minimum of 10 percent of the file system's current storage capacity, up to the maximum allowed value of 65,536 GiB.

  • Minimum throughput capacity – To increase storage capacity, a file system must have a minimum throughput capacity of 16 MBps. This is because the storage optimization step is a throughput-intensive process.

  • Time between increases – You can't make further storage capacity increases on a file system until 6 hours after the last increase was requested, or until the storage optimization process has completed, whichever time is longer. Storage optimization can take from a few hours up to a few days to complete. To minimize the time it takes for storage optimization to complete, we recommend increasing your file system's throughput capacity before increasing storage capacity (the throughput capacity can be scaled back down after storage scaling completes), and increasing storage capacity when there is minimal traffic on the file system.

Note

Certain file system events can consume disk I/O performance resources. For example:

The optimization phase of storage capacity scaling can generate increased disk throughput, and potentially cause performance warnings. For more information, see Performance warnings and recommendations.

Knowing when to increase storage capacity

Increase your file system's storage capacity when it's running low on free storage capacity. Use the FreeStorageCapacity CloudWatch metric to monitor the amount of free storage available on the file system. You can create an Amazon CloudWatch alarm on this metric and get notified when it drops below a specific threshold. For more information, see Monitoring with Amazon CloudWatch.

We recommend maintaining at least 10% of free storage capacity at all times on your file system. Using all of your storage capacity can negatively impact your performance and might introduce data inconsistencies.
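
As a hedged example, the following Amazon CLI command creates a CloudWatch alarm that fires when free storage drops below about 10 percent of a 2,400 GiB file system (roughly 240 GiB, expressed in bytes because the FreeStorageCapacity metric is reported in bytes); the file system ID, threshold, and SNS topic ARN are placeholder values.

# Alarm when FreeStorageCapacity falls below ~240 GiB (placeholder file system ID, threshold, and topic ARN)
aws cloudwatch put-metric-alarm --alarm-name fsx-low-free-storage --namespace AWS/FSx --metric-name FreeStorageCapacity --dimensions "Name=FileSystemId,Value=fs-0123456789abcdef0" --statistic Minimum --period 300 --evaluation-periods 1 --comparison-operator LessThanThreshold --threshold 257698037760 --alarm-actions arn:aws:sns:us-east-1:111122223333:fsx-low-storage-topic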

You can automatically increase your file system's storage capacity when the amount of free storage capacity falls below a threshold that you specify. Use the Amazon-developed custom Amazon CloudFormation template to deploy all of the components required to implement the automated solution. For more information, see Increasing storage capacity dynamically.

Storage capacity increases and file system performance

Most workloads experience minimal performance impact while Amazon FSx runs the storage optimization process in the background after the new storage capacity is available. Write-heavy applications with large active datasets could temporarily experience up to a one-half reduction in write performance. For these cases, you can increase your file system's throughput capacity before increasing storage capacity. This enables you to continue providing the same level of throughput to meet your application's performance needs. For more information, see Managing throughput capacity on FSx for Windows File Server file systems.
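
As a sketch of that sequence, the following Amazon CLI call raises throughput capacity ahead of a storage increase; the file system ID and the 512 MBps value are placeholders, and the structure uses the Amazon CLI shorthand syntax for --windows-configuration.

# Temporarily raise throughput capacity before scaling storage (placeholder values)
aws fsx update-file-system --file-system-id fs-0123456789abcdef0 --windows-configuration ThroughputCapacity=512

You can scale the throughput capacity back down after the storage optimization completes.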

Managing your FSx for Windows file system's storage type

You can change your file system storage type from HDD to SSD using the Amazon Web Services Management Console and the Amazon CLI. When you change the storage type to SSD, keep in mind that you can't update your file system configuration again until 6 hours after the last update was requested, or until the storage optimization process is complete—whichever time is longer. Storage optimization can take between a few hours and a few days to complete. To minimize this time, we recommend updating your storage type when there is minimal traffic on your file system. For more information, see Updating the storage type of an FSx for Windows file system.
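
For illustration only, an HDD to SSD change from the Amazon CLI might look like the following; the file system ID is a placeholder, and this assumes your Amazon CLI version supports the storage type parameter on update-file-system.

# Update the storage type from HDD to SSD (placeholder file system ID)
aws fsx update-file-system --file-system-id fs-0123456789abcdef0 --storage-type SSD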

You can't change your file system storage type from SSD to HDD. If you want to move a file system from SSD to HDD storage, you will need to restore a backup of the file system to a new file system that you configure to use HDD storage. For more information, see Restoring backups to new file system.

About storage types

You can configure your FSx for Windows File Server file system to use either the solid state drive (SSD) or the magnetic hard disk drive (HDD) storage type.

SSD storage is appropriate for most production workloads that have high performance requirements and latency sensitivity. Examples of these workloads include databases, data analytics, media processing, and business applications. We also recommend SSD for use cases involving large numbers of end users, high levels of I/O, or datasets that have large numbers of small files. Lastly, we recommend using SSD storage if you plan to enable shadow copies. You can configure and scale SSD IOPS for file systems with SSD storage, but not HDD storage.

HDD storage is designed for a broad range of workloads—including home directories, user and departmental file shares, and content management systems. HDD storage comes at a lower cost relative to SSD storage, but with higher latencies and lower levels of disk throughput and disk IOPS per unit of storage. It might be suitable for general-purpose user shares and home directories with low I/O requirements, large content management systems (CMS) where data is retrieved infrequently, or datasets with small numbers of large files.

For more information, see Storage configuration & performance.

Managing SSD IOPS

For file systems configured with SSD storage, the amount of SSD IOPS determines the amount of disk I/O available when your file system has to read data from and write data to disk, as opposed to data that is in cache. You can select and scale the amount of SSD IOPS independently of storage capacity. The maximum SSD IOPS that you can provision is dependent on the amount of storage capacity and throughput capacity you select for your file system. If you attempt to increase your SSD IOPS above the limit that's supported by your throughput capacity, you might need to increase your throughput capacity to get that level of SSD IOPS. For more information, see FSx for Windows File Server performance and Managing throughput capacity on FSx for Windows File Server file systems.

Here are a few important items to know about updating a file system's provisioned SSD IOPS:

  • Choosing an IOPS mode – there are two IOPS modes to choose from:

    • Automatic – choose this mode and Amazon FSx will automatically scale your SSD IOPS to maintain 3 SSD IOPS per GiB of storage capacity, up to 400,000 SSD IOPS per file system.

    • User-provisioned – choose this mode so that you can specify the number of SSD IOPS within the range of 96–400,000. Specify a number between 3–50 IOPS per GiB of storage capacity for all Amazon Web Services Regions where Amazon FSx is available, or between 3–500 IOPS per GiB of storage capacity in US East (N. Virginia), US West (Oregon), US East (Ohio), Europe (Ireland), Asia Pacific (Tokyo), and Asia Pacific (Singapore). If you choose user-provisioned mode and the amount of SSD IOPS you specify is not at least 3 IOPS per GiB, the request fails. For higher levels of provisioned SSD IOPS, you pay for the average IOPS above 3 IOPS per GiB per file system.

  • Storage capacity updates – If you increase your file system's storage capacity, and the new capacity requires by default a level of SSD IOPS that is greater than your current user-provisioned SSD IOPS level, Amazon FSx automatically switches your file system to Automatic mode, and your file system will have a minimum of 3 SSD IOPS per GiB of storage capacity.

  • Throughput capacity updates – If you increase your throughput capacity, and the maximum SSD IOPS supported by your new throughput capacity is higher than your user-provisioned SSD IOPS level, Amazon FSx automatically switches your file system to Automatic mode.

  • Frequency of SSD IOPS increases – You can't make further SSD IOPS increases, throughput capacity increases, or storage type updates on a file system until 6 hours after the last increase was requested, or until the storage optimization process has completed—whichever time is longer. Storage optimization can take from a few hours up to a few days to complete. To minimize the time it takes for storage optimization to complete, we recommend scaling SSD IOPS when there is minimal traffic on the file system.

Note

Throughput capacity levels of 4,608 MBps and higher are supported only in the following Amazon Web Services Regions: US East (N. Virginia), US West (Oregon), US East (Ohio), Europe (Ireland), Asia Pacific (Tokyo), and Asia Pacific (Singapore).

For more information about how to update the amount of provisioned SSD IOPS for your FSx for Windows File Server file system, see Updating a file system's SSD IOPS.
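
As a hedged sketch, switching to user-provisioned SSD IOPS from the Amazon CLI might look like the following; the file system ID and the 10,000 IOPS value are placeholders, and the DiskIopsConfiguration shorthand is assumed to match your Amazon CLI version.

# Provision 10,000 SSD IOPS in user-provisioned mode (placeholder values)
aws fsx update-file-system --file-system-id fs-0123456789abcdef0 --windows-configuration "DiskIopsConfiguration={Mode=USER_PROVISIONED,Iops=10000}"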

Reducing storage costs with Data Deduplication

Data Deduplication, often referred to as Dedup for short, helps storage administrators reduce costs that are associated with duplicated data. With FSx for Windows File Server, you can use Microsoft Data Deduplication to identify and eliminate redundant data. Large datasets often have redundant data, which increases the data storage costs. For example:

  • User file shares may have many copies of the same or similar files.

  • Software development shares can have many binaries that remain unchanged from build to build.

You can reduce your data storage costs by enabling data deduplication for your file system. Data deduplication reduces or eliminates redundant data by storing duplicated portions of the dataset only once. When you enable data deduplication, data compression is enabled by default, compressing the data after deduplication for additional savings. Data deduplication optimizes redundancies without compromising data fidelity or integrity. It runs as a background process that continually and automatically scans and optimizes your file system, and it is transparent to your users and connected clients.

The storage savings that you can achieve with data deduplication depends on the nature of your dataset, including how much duplication exists across files. Typical savings average 50–60 percent for general-purpose file shares. Within shares, savings range from 30–50 percent for user documents to 70–80 percent for software development datasets. You can measure potential deduplication savings using the Measure-FSxDedupFileMetadata remote PowerShell command described below.
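
As a minimal sketch of that measurement, you might run the following through the Amazon FSx remote PowerShell endpoint; the endpoint name and the folder path are placeholders, and the -Path parameter is assumed to behave like the underlying Windows Measure-DedupFileMetadata cmdlet.

# Estimate reclaimable space for a folder you are considering deleting (placeholder endpoint and path)
Invoke-Command -ComputerName amznfsxzzzzzzzz.corp.example.com -ConfigurationName FSxRemoteAdmin -ScriptBlock { Measure-FSxDedupFileMetadata -Path "D:\share\ProjectArchive" }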

You can also customize data deduplication to meet your specific storage needs. For example, you can configure deduplication to run only on certain file types, or you can create a custom job schedule. Because deduplication jobs can consume file server resources, we recommend monitoring the status of your deduplication jobs using the Get-FSxDedupStatus command described below.
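
For example, you can list any running or queued deduplication jobs with the Get-FSxDedupJob command; the endpoint name below is a placeholder.

# List currently running or queued deduplication jobs (placeholder endpoint)
Invoke-Command -ComputerName amznfsxzzzzzzzz.corp.example.com -ConfigurationName FSxRemoteAdmin -ScriptBlock { Get-FSxDedupJob }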

For more information about data deduplication, see the Microsoft Understanding Data Deduplication documentation.

Note

For best practices, see Best practices when using data deduplication. If you encounter issues with getting data deduplication jobs to run successfully, see Troubleshooting data deduplication.

Warning

It is not recommended to run certain Robocopy commands with data deduplication because these commands can impact the data integrity of the Chunk Store. For more information, see the Microsoft Data Deduplication interoperability documentation.

Best practices when using data deduplication

Here are some best practices for using Data Deduplication:

  • Schedule Data Deduplication jobs to run when your file system is idle: The default schedule includes a weekly GarbageCollection job at 2:45 UTC on Saturdays. It can take multiple hours to complete if you have a large amount of data churn on your file system. If this time isn't ideal for your workload, schedule this job to run at a time when you expect low traffic on your file system.

  • Configure sufficient throughput capacity for Data Deduplication to complete: Higher throughput capacities provide higher levels of memory. Microsoft recommends having 1 GB of memory per 1 TB of logical data to run Data Deduplication. Use the Amazon FSx performance table to determine the memory that's associated with your file system's throughput capacity and ensure that the memory resources are sufficient for the size of your data.

  • Customize Data Deduplication settings to meet your specific storage needs and reduce performance requirements: You can constrain the optimization to run on specific file types or folders, or set a minimum file size and age for optimization. To learn more, see Reducing storage costs with Data Deduplication.

Managing data deduplication

You can manage data deduplication on your file system using the Amazon FSx CLI for remote management on PowerShell. For more information about using the Amazon FSx CLI for remote management on PowerShell, see Using the Amazon FSx CLI for PowerShell.

Following are the commands that you can use for data deduplication, each paired with a description of what it does.

Enable-FSxDedup – Enables data deduplication on the file share. Data compression after deduplication is enabled by default when you enable data deduplication.

Disable-FSxDedup – Disables data deduplication on the file share.

Get-FSxDedupConfiguration – Retrieves deduplication configuration information, including the minimum file size and age for optimization, compression settings, and excluded file types and folders.

Set-FSxDedupConfiguration – Changes the deduplication configuration settings, including the minimum file size and age for optimization, compression settings, and excluded file types and folders.

Get-FSxDedupStatus – Retrieves the deduplication status, including read-only properties that describe the optimization savings and status on the file system and the times and completion status of the most recent deduplication jobs on the file system.

Get-FSxDedupMetadata – Retrieves deduplication optimization metadata.

Update-FSxDedupStatus – Computes and retrieves updated data deduplication savings information.

Measure-FSxDedupFileMetadata – Measures and retrieves the potential storage space that you can reclaim on your file system if you delete a group of folders. Files often have chunks that are shared across other folders, and the deduplication engine calculates which chunks are unique and would be deleted.

Get-FSxDedupSchedule – Retrieves the deduplication schedules that are currently defined.

New-FSxDedupSchedule – Creates and customizes a data deduplication schedule.

Set-FSxDedupSchedule – Changes the configuration settings of an existing data deduplication schedule.

Remove-FSxDedupSchedule – Deletes a deduplication schedule.

Get-FSxDedupJob – Gets the status and information for all currently running or queued deduplication jobs.

Stop-FSxDedupJob – Cancels one or more specified data deduplication jobs.

The online help for each command provides a reference of all command options. To access this help, run the command with -?, for example Enable-FSxDedup -?.

Enabling data deduplication

You enable data deduplication on an Amazon FSx for Windows File Server file share using the Enable-FSxDedup command, as follows.

PS C:\Users\Admin> Invoke-Command -ComputerName amznfsxzzzzzzzz.corp.example.com -ConfigurationName FSxRemoteAdmin -ScriptBlock { Enable-FSxDedup }

When you enable data deduplication, a default schedule and configuration are created. You can create, modify, and remove schedules and configurations using the commands below.

You can use the Disable-FSxDedup command to disable data deduplication entirely on your file system.

Creating a data deduplication schedule

Even though the default schedule works well in most cases, you can create a new deduplication schedule by using the New-FsxDedupSchedule command, shown as follows. Data deduplication schedules use UTC time.

PS C:\Users\Admin> Invoke-Command -ComputerName amznfsxzzzzzzzz.corp.example.com -ConfigurationName FSxRemoteAdmin -ScriptBlock { New-FSxDedupSchedule -Name "CustomOptimization" -Type Optimization -Days Mon,Wed,Sat -Start 08:00 -DurationHours 7 }

This command creates a schedule named CustomOptimization that runs on days Monday, Wednesday, and Saturday, starting the job at 8:00 am (UTC) each day, with a maximum duration of 7 hours, after which the job stops if it is still running.

Note that creating new, custom deduplication job schedules does not override or remove the existing default schedule. Before creating a custom deduplication job, you may want to disable the default job if you don’t need it.

You can disable the default deduplication schedule by using the Set-FsxDedupSchedule command, shown as follows.

PS C:\Users\Admin> Invoke-Command -ComputerName amznfsxzzzzzzzz.corp.example.com -ConfigurationName FSxRemoteAdmin -ScriptBlock {Set-FSxDedupSchedule -Name "BackgroundOptimization" -Enabled $false}

You can remove a deduplication schedule by using the Remove-FSxDedupSchedule -Name "ScheduleName" command. Note that the default BackgroundOptimization deduplication schedule cannot be modified or removed and will need to be disabled instead.

Modifying a data deduplication schedule

You can modify an existing deduplication schedule by using the Set-FsxDedupSchedule command, shown as follows.

PS C:\Users\Admin> Invoke-Command -ComputerName amznfsxzzzzzzzz.corp.example.com -ConfigurationName FSxRemoteAdmin -ScriptBlock { Set-FSxDedupSchedule -Name "CustomOptimization" -Type Optimization -Days Mon,Tues,Wed,Sat -Start 09:00 -DurationHours 9 }

This command modifies the existing CustomOptimization schedule to run on days Monday to Wednesday and Saturday, starting the job at 9:00 am (UTC) each day, with a maximum duration of 9 hours, after which the job stops if it is still running.

To modify settings such as the minimum file age before optimization, use the Set-FSxDedupConfiguration command.
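
As a hedged sketch, the following command sets the minimum file age to 3 days and excludes a folder from optimization; the endpoint name and folder path are placeholders, and the parameter names are assumed to mirror the underlying Windows deduplication configuration cmdlet.

# Tune the deduplication configuration (placeholder endpoint and folder)
Invoke-Command -ComputerName amznfsxzzzzzzzz.corp.example.com -ConfigurationName FSxRemoteAdmin -ScriptBlock { Set-FSxDedupConfiguration -MinimumFileAgeDays 3 -ExcludeFolder "D:\share\scratch" }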

Viewing the amount of saved space

To view the amount of disk space you are saving from running data deduplication, use the Get-FSxDedupStatus command, as follows.

PS C:\Users\Admin> Invoke-Command -ComputerName amznfsxzzzzzzzz.corp.example.com -ConfigurationName FSxRemoteAdmin -ScriptBlock { Get-FSxDedupStatus } | select OptimizedFilesCount,OptimizedFilesSize,SavedSpace,OptimizedFilesSavingsRate

OptimizedFilesCount OptimizedFilesSize SavedSpace OptimizedFilesSavingsRate
------------------- ------------------ ---------- -------------------------
              12587           31163594   25944826                        83

Note

The values shown in the command response for the following parameters are not reliable, and you should not use these values: Capacity, FreeSpace, UsedSpace, UnoptimizedSize, and SavingsRate.

Troubleshooting data deduplication

There are a number of potential causes for data deduplication issues, as described in the following section.

Data deduplication is not working

To see the current status of data deduplication, run the Get-FSxDedupStatus PowerShell command to view the completion status for the most recent deduplication jobs. If one or more jobs is failing, you may not see an increase in free storage capacity on your file system.
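
For example, a hedged way to check the most recent job outcomes is to select the last-run properties from Get-FSxDedupStatus; the endpoint name is a placeholder, and the property names shown are assumed to follow the underlying Windows deduplication status object.

# Review the result of the most recent optimization and garbage collection jobs (placeholder endpoint)
Invoke-Command -ComputerName amznfsxzzzzzzzz.corp.example.com -ConfigurationName FSxRemoteAdmin -ScriptBlock { Get-FSxDedupStatus } | Select-Object LastOptimizationTime,LastOptimizationResult,LastGarbageCollectionTime,LastGarbageCollectionResult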

The most common reason for deduplication jobs failing is insufficient memory.

  • Microsoft recommends optimally having 1 GB of memory per 1 TB of logical data (or at a minimum 350 MB per 1 TB of logical data). Use the Amazon FSx performance table to determine the memory associated with your file system's throughput capacity and ensure the memory resources are sufficient for the size of your data. If it is not, you need to increase the file system's throughput capacity to the level that meets the memory requirements of 1 GB per 1 TB of logical data.

  • Deduplication jobs are configured with the Windows recommended default of 25% memory allocation, which means that for a file system with 32 GB of memory, 8 GB will be available for deduplication. The memory allocation is configurable (using the Set-FSxDedupSchedule command with the -Memory parameter, as shown in the sketch after this list). Be aware that using a higher memory allocation for deduplication may impact file system performance.

  • You can modify the configuration of deduplication jobs to reduce the amount of memory required. For example, you can constrain the optimization to run on specific file types or folders, or set a minimum file size and age for optimization. We also recommend configuring deduplication jobs to run during idle periods when there is minimal load on your file system.
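
Here is a minimal sketch of raising the memory allocation for the CustomOptimization schedule created earlier to 60 percent; the endpoint name is a placeholder, and the -Memory value is assumed to be a percentage, as it is for the underlying Windows scheduling cmdlet.

# Allow a deduplication schedule to use up to 60% of file system memory (placeholder endpoint)
Invoke-Command -ComputerName amznfsxzzzzzzzz.corp.example.com -ConfigurationName FSxRemoteAdmin -ScriptBlock { Set-FSxDedupSchedule -Name "CustomOptimization" -Memory 60 }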

You may also see errors if deduplication jobs have insufficient time to complete. You may need to change the maximum duration of jobs, as described in Modifying a data deduplication schedule.

If deduplication jobs have been failing for a long period of time, and there have been changes to the data on the file system during this period, subsequent deduplication jobs may require more resources to complete successfully for the first time.

Deduplication values are unexpectedly set to 0

The values for SavedSpace and OptimizedFilesSavingsRate are unexpectedly 0 for a file system on which you have configured data deduplication.

This can occur during the storage optimization process when you increase the file system's storage capacity. When you increase a file system's storage capacity, Amazon FSx cancels existing data deduplication jobs during the storage optimization process, which migrates data from the old disks to the new, larger disks. Amazon FSx resumes data deduplication on the file system once the storage optimization job completes. For more information about increasing storage capacity and storage optimization, see Managing storage capacity.

Space is not freed up on file system after deleting files

This is expected behavior for data deduplication: if deduplication had saved space on the data that you deleted, the space is not actually freed up on your file system until the garbage collection job runs.

A practice you may find helpful is to temporarily set the schedule to run the garbage collection job right after you delete a large number of files. After the garbage collection job finishes, set the garbage collection schedule back to its original settings. This way, you see the space from your deletions freed up right away.

Use the following procedure to set the garbage collection job to run in 5 minutes.

  1. To verify that data deduplication is enabled, use the Get-FSxDedupStatus command. For more information on the command and its expected output, see Viewing the amount of saved space.

  2. Use the following to set the schedule to run the garbage collection job 5 minutes from now.

    $FiveMinutesFromNowUTC = ((Get-Date).AddMinutes(5)).ToUniversalTime()
    $DayOfWeek = $FiveMinutesFromNowUTC.DayOfWeek
    $Time = $FiveMinutesFromNowUTC.ToString("HH:mm")
    Invoke-Command -ComputerName ${RPS_ENDPOINT} -ConfigurationName FSxRemoteAdmin -ScriptBlock {
        Set-FSxDedupSchedule -Name "WeeklyGarbageCollection" -Days $Using:DayOfWeek -Start $Using:Time -DurationHours 9
    }
  3. After the garbage collection job has run and the space has been freed up, set the schedule back to its original settings.