Managing data deduplication - Amazon FSx for Windows File Server

Managing data deduplication

You can manage your file system's data deduplication settings using the Amazon FSx CLI for remote management on PowerShell. For more information about using the Amazon FSx CLI for remote management on PowerShell, see Using the Amazon FSx CLI for PowerShell.

Following are commands that you can use for data deduplication.

| Data deduplication command | Description |
| --- | --- |
| Enable-FSxDedup | Enables data deduplication on the file share. Data compression after deduplication is enabled by default when you enable data deduplication. |
| Disable-FSxDedup | Disables data deduplication on the file share. |
| Get-FSxDedupConfiguration | Retrieves deduplication configuration information, including minimum file size and age for optimization, compression settings, and excluded file types and folders. |
| Set-FSxDedupConfiguration | Changes the deduplication configuration settings, including minimum file size and age for optimization, compression settings, and excluded file types and folders. |
| Get-FSxDedupStatus | Retrieves the deduplication status, including read-only properties that describe optimization savings and status on the file system, along with times and completion status for the last deduplication jobs on the file system. |
| Get-FSxDedupMetadata | Retrieves deduplication optimization metadata. |
| Update-FSxDedupStatus | Computes and retrieves updated data deduplication savings information. |
| Measure-FSxDedupFileMetadata | Measures and retrieves the potential storage space that you can reclaim on your file system if you delete a group of folders. Files often have chunks that are shared across other folders, and the deduplication engine calculates which chunks are unique and would be deleted. |
| Get-FSxDedupSchedule | Retrieves deduplication schedules that are currently defined. |
| New-FSxDedupSchedule | Creates and customizes a data deduplication schedule. |
| Set-FSxDedupSchedule | Changes configuration settings for existing data deduplication schedules. |
| Remove-FSxDedupSchedule | Deletes a deduplication schedule. |
| Get-FSxDedupJob | Gets status and information for all currently running or queued deduplication jobs. |
| Stop-FSxDedupJob | Cancels one or more specified data deduplication jobs. |

The online help for each command provides a reference of all command options. To access this help, run the command with -?, for example Enable-FSxDedup -?.
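Because these commands run inside the remote management session, a help request is sent the same way as any other command. The following is a minimal sketch that assumes the placeholder endpoint name used in the examples in this topic; replace it with your own file system's remote management endpoint.

```powershell
# Placeholder endpoint from the examples in this topic; replace with your own.
$endpoint = 'amznfsxzzzzzzzz.corp.example.com'

# Request the built-in help for one command through the remote management session.
Invoke-Command -ComputerName $endpoint -ConfigurationName FSxRemoteAdmin -ScriptBlock {
    Enable-FSxDedup -?
}
```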

Enabling data deduplication

You enable data deduplication on an Amazon FSx for Windows File Server file share using the Enable-FSxDedup command, as follows.

PS C:\Users\Admin> Invoke-Command -ComputerName amznfsxzzzzzzzz.corp.example.com -ConfigurationName FSxRemoteAdmin -ScriptBlock { Enable-FSxDedup }

When you enable data deduplication, a default schedule and configuration are created. You can create, modify, and remove schedules and configurations using the commands below.

You can use the Disable-FSxDedup command to disable data deduplication entirely on your file system.
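As a sketch of the disable step, the following runs Disable-FSxDedup remotely and then retrieves the current configuration so that you can inspect the result. It assumes the same placeholder endpoint as the other examples and that no additional parameters are needed; check Disable-FSxDedup -? for the options available on your file system.

```powershell
# Placeholder endpoint from the examples in this topic; replace with your own.
$endpoint = 'amznfsxzzzzzzzz.corp.example.com'

# Turn off data deduplication on the file system.
Invoke-Command -ComputerName $endpoint -ConfigurationName FSxRemoteAdmin -ScriptBlock {
    Disable-FSxDedup
}

# Retrieve the deduplication configuration afterwards to review the current settings.
Invoke-Command -ComputerName $endpoint -ConfigurationName FSxRemoteAdmin -ScriptBlock {
    Get-FSxDedupConfiguration
}
```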

Creating a data deduplication schedule

Although the default schedule works well in most cases, you can create a new deduplication schedule by using the New-FSxDedupSchedule command, shown as follows. Data deduplication schedules use UTC time.

PS C:\Users\Admin> Invoke-Command -ComputerName amznfsxzzzzzzzz.corp.example.com -ConfigurationName FSxRemoteAdmin -ScriptBlock { New-FSxDedupSchedule -Name "CustomOptimization" -Type Optimization -Days Mon,Wed,Sat -Start 08:00 -DurationHours 7 }

This command creates a schedule named CustomOptimization that runs on days Monday, Wednesday, and Saturday, starting the job at 8:00 am (UTC) each day, with a maximum duration of 7 hours, after which the job stops if it is still running.

Note that creating new, custom deduplication job schedules does not override or remove the existing default schedule. Before creating a custom deduplication job, you may want to disable the default job if you don’t need it.

You can disable the default deduplication schedule by using the Set-FSxDedupSchedule command, shown as follows.

PS C:\Users\Admin> Invoke-Command -ComputerName amznfsxzzzzzzzz.corp.example.com -ConfigurationName FSxRemoteAdmin -ScriptBlock { Set-FSxDedupSchedule -Name "BackgroundOptimization" -Enabled $false }

You can remove a deduplication schedule by using the Remove-FSxDedupSchedule -Name "ScheduleName" command. Note that the default BackgroundOptimization deduplication schedule cannot be modified or removed and will need to be disabled instead.
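The following sketch lists the schedules that are currently defined and then removes a custom one. It reuses the CustomOptimization schedule name and the placeholder endpoint from the earlier examples; both are illustrative values, not names that exist on your file system by default.

```powershell
# Placeholder endpoint from the examples in this topic; replace with your own.
$endpoint = 'amznfsxzzzzzzzz.corp.example.com'

# List the currently defined schedules to confirm the name of the one to remove.
Invoke-Command -ComputerName $endpoint -ConfigurationName FSxRemoteAdmin -ScriptBlock {
    Get-FSxDedupSchedule
}

# Remove the custom schedule created earlier (illustrative name).
Invoke-Command -ComputerName $endpoint -ConfigurationName FSxRemoteAdmin -ScriptBlock {
    Remove-FSxDedupSchedule -Name "CustomOptimization"
}
```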

Modifying a data deduplication schedule

You can modify an existing deduplication schedule by using the Set-FSxDedupSchedule command, shown as follows.

PS C:\Users\Admin> Invoke-Command -ComputerName amznfsxzzzzzzzz.corp.example.com -ConfigurationName FSxRemoteAdmin -ScriptBlock { Set-FSxDedupSchedule -Name "CustomOptimization" -Type Optimization -Days Mon,Tues,Wed,Sat -Start 09:00 -DurationHours 9 }

This command modifies the existing CustomOptimization schedule to run on days Monday to Wednesday and Saturday, starting the job at 9:00 am (UTC) each day, with a maximum duration of 9 hours, after which the job stops if it is still running.

To change the minimum age that a file must reach before it is optimized, use the Set-FSxDedupConfiguration command.
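As a hedged sketch of that change, the example below sets the minimum file age to 5 days. The parameter name -MinimumFileAgeDays is an assumption, chosen to mirror the corresponding parameter of the Windows Set-DedupVolume cmdlet, and the value 5 is arbitrary; run Set-FSxDedupConfiguration -? to confirm the parameter names that your file system accepts.

```powershell
# Placeholder endpoint from the examples in this topic; replace with your own.
$endpoint = 'amznfsxzzzzzzzz.corp.example.com'

# ASSUMPTION: -MinimumFileAgeDays is not confirmed by this topic; verify it with
# "Set-FSxDedupConfiguration -?" before relying on it. The value 5 is illustrative.
Invoke-Command -ComputerName $endpoint -ConfigurationName FSxRemoteAdmin -ScriptBlock {
    Set-FSxDedupConfiguration -MinimumFileAgeDays 5
}

# Read the configuration back to check that the setting was applied.
Invoke-Command -ComputerName $endpoint -ConfigurationName FSxRemoteAdmin -ScriptBlock {
    Get-FSxDedupConfiguration
}
```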

Viewing the amount of saved space

To view the amount of disk space you are saving from running data deduplication, use the Get-FSxDedupStatus command, as follows.

PS C:\Users\Admin> Invoke-Command -ComputerName amznfsxzzzzzzzz.corp.example.com -ConfigurationName FSxRemoteAdmin -ScriptBlock { Get-FSxDedupStatus } | select OptimizedFilesCount,OptimizedFilesSize,SavedSpace,OptimizedFilesSavingsRate

OptimizedFilesCount OptimizedFilesSize SavedSpace OptimizedFilesSavingsRate
------------------- ------------------ ---------- -------------------------
              12587           31163594   25944826                        83

Note

The values shown in the command response for the following properties are not reliable, and you should not use them: Capacity, FreeSpace, UsedSpace, UnoptimizedSize, and SavingsRate.
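The sample figures in the Get-FSxDedupStatus output are consistent with OptimizedFilesSavingsRate being SavedSpace as a percentage of OptimizedFilesSize. The following local sketch reproduces that arithmetic from the sample values. It assumes the sizes are reported in bytes and that the rate is rounded to a whole percentage; neither is stated in this topic. The $status object is a hypothetical stand-in for real command output.

```powershell
# Sample values copied from the Get-FSxDedupStatus output in this topic.
# ASSUMPTION: sizes are in bytes.
$status = [pscustomobject]@{
    OptimizedFilesCount = 12587
    OptimizedFilesSize  = 31163594
    SavedSpace          = 25944826
}

# Saved space as a whole-number percentage of the original size of the optimized files.
$rate = [math]::Round(100 * $status.SavedSpace / $status.OptimizedFilesSize)

# Express the saved space in mebibytes (PowerShell's 1MB literal is 1,048,576 bytes).
$savedMiB = [math]::Round($status.SavedSpace / 1MB, 2)

"{0}% saved, {1} MiB" -f $rate, $savedMiB
```

With the sample values, $rate evaluates to 83, which matches the OptimizedFilesSavingsRate column in the output.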