Using modular code with the @remote decorator - Amazon SageMaker
Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

Using modular code with the @remote decorator

You can organize your code into modules for easier workspace management during development and still use the @remote decorator to invoke a function. You can also replicate the local modules from your development environment to the remote job environment. To do so, set the include_local_workdir parameter to True, as shown in the following code example.

@remote(
    include_local_workdir=True,
)
Note

The @remote decorator and its parameters must appear in the main file, not in any of the dependent files.

When include_local_workdir is set to True, SageMaker packages all of the Python scripts in the process's current working directory, preserving their directory structure. It also makes these dependencies available in the job's working directory.
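To make "packages all of the Python scripts while preserving the directory structure" concrete, the following sketch copies only the .py files from a working directory into a staging directory and keeps their relative paths, using the standard library's shutil.copytree. It is an illustrative model, not SageMaker's actual packaging code. The function names python_only and stage_workdir are hypothetical, and skipping __pycache__ folders is an assumption of this sketch.

```python
import os
import shutil
from typing import List


def python_only(directory: str, names: List[str]) -> List[str]:
    """Ignore callback for shutil.copytree: keep .py files and folders only."""
    ignored = []
    for name in names:
        if os.path.isdir(os.path.join(directory, name)):
            # Keep folders so nested modules survive, but drop bytecode caches
            # (an assumption of this sketch).
            if name == "__pycache__":
                ignored.append(name)
        elif not name.endswith(".py"):
            ignored.append(name)
    return ignored


def stage_workdir(src: str, dst: str) -> List[str]:
    """Copy the .py files under src into dst (which must not exist yet),
    preserving relative paths. Returns the staged files, sorted, "/"-separated."""
    shutil.copytree(src, dst, ignore=python_only)
    staged = []
    for root, _dirs, files in os.walk(dst):
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), dst)
            staged.append(rel.replace(os.sep, "/"))
    return sorted(staged)
```

For a workspace containing main.py, config.yaml, and mnist_impl/pytorch_mnist.py, stage_workdir returns ['main.py', 'mnist_impl/pytorch_mnist.py']: the non-Python file is dropped, and the module keeps its mnist_impl/ prefix, so an import such as from mnist_impl.pytorch_mnist import ... can still resolve from the staged root.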

For example, suppose that your Python code for processing the MNIST dataset is split into a main.py script and a dependent pytorch_mnist.py script in the mnist_impl directory. main.py calls the dependent script, so it contains an import statement like the following.

from mnist_impl.pytorch_mnist import ...

The main.py file must also contain the @remote decorator, and it must set the include_local_workdir parameter to True.

By default, include_local_workdir includes all of the Python scripts in the directory. To customize which files are uploaded to the job, use it together with the custom_file_filter parameter. You can pass either a function that filters the job dependencies to be uploaded to Amazon S3, or a CustomFileFilter object that specifies the local directories and files to ignore in the remote function. custom_file_filter takes effect only if include_local_workdir is set to True; otherwise, it is ignored.
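The rules in the preceding paragraph can be summarized as a small decision function. The following sketch is a hypothetical model, not the SageMaker Python SDK: PatternFilter stands in for CustomFileFilter, resolve_filter and python_files_only are invented names, and the sketch assumes that a filter acts as an ignore callback of the form (directory, names) -> names to skip, the same contract as the ignore argument of Python's shutil.copytree.

```python
import os
import shutil
from dataclasses import dataclass, field
from typing import Callable, Collection, List, Optional, Union

# Hypothetical ignore callback: (directory, names in it) -> names to skip.
IgnoreFn = Callable[[str, List[str]], Collection[str]]


@dataclass
class PatternFilter:
    """Hypothetical stand-in for CustomFileFilter: glob patterns of names to skip."""
    ignore_pattern_names: List[str] = field(default_factory=list)


def python_files_only(directory: str, names: List[str]) -> List[str]:
    """Modeled default: skip every file that is not a .py file; keep subdirectories."""
    return [
        name for name in names
        if not os.path.isdir(os.path.join(directory, name)) and not name.endswith(".py")
    ]


def resolve_filter(
    include_local_workdir: bool,
    custom_file_filter: Union[IgnoreFn, PatternFilter, None] = None,
) -> Optional[IgnoreFn]:
    """Return the ignore callback to apply, or None when no local files are uploaded."""
    if not include_local_workdir:
        # custom_file_filter has no effect in this case.
        return None
    if custom_file_filter is None:
        # No filter given: only Python scripts are packaged.
        return python_files_only
    if callable(custom_file_filter):
        # A user-supplied function is used unchanged.
        return custom_file_filter
    # A pattern object: skip names that match any of its glob patterns.
    return shutil.ignore_patterns(*custom_file_filter.ignore_pattern_names)
```

In this model, an empty pattern list skips nothing, which corresponds to packaging the whole workspace rather than only the Python scripts.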

The following example uses CustomFileFilter to ignore all notebook files, as well as any folder or file named data, when uploading files to S3.

@remote(
    include_local_workdir=True,
    custom_file_filter=CustomFileFilter(
        ignore_pattern_names=[ # files or directories to ignore
            "*.ipynb", # all notebook files
            "data", # folder or file named data
        ]
    )
)
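The entries in the CustomFileFilter example above are glob-style name patterns. Assuming that they are matched against each file or folder name the way Python's shutil.ignore_patterns matches them (an assumption, not confirmed by this page), the following sketch shows which names in a hypothetical directory listing those two patterns would skip. Note that "data" matches only a name that is exactly data, not data.csv or mydata.

```python
import shutil

# Same patterns as the CustomFileFilter example above.
ignore = shutil.ignore_patterns("*.ipynb", "data")

# A hypothetical directory listing.
listing = ["main.py", "eda.ipynb", "data", "data.csv", "mydata", "config.yaml"]
skipped = sorted(ignore(".", listing))
print(skipped)  # ['data', 'eda.ipynb']
```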

The following example demonstrates how you can package an entire workspace.

@remote(
    include_local_workdir=True,
    custom_file_filter=CustomFileFilter(
        ignore_pattern_names=[] # package whole workspace
    )
)

The following example shows how you can use a function to filter files.

from typing import List

def my_filter(path: str, files: List[str]) -> List[str]:
    # Return the names in `files` that should not be uploaded to S3
    to_ignore = []
    for file in files:
        if file.endswith(".txt") or file.endswith(".ipynb"):
            to_ignore.append(file)
    return to_ignore

@remote(
    include_local_workdir=True,
    custom_file_filter=my_filter
)
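Because a filter function is plain Python, you can check what it returns before attaching it to @remote. The following sketch calls my_filter with a hypothetical directory path and listing; it assumes the function receives a directory and the names it contains and returns the names to skip, as its signature suggests.

```python
from typing import List


def my_filter(path: str, files: List[str]) -> List[str]:
    # Same logic as the example above: skip .txt and .ipynb files.
    to_ignore = []
    for file in files:
        if file.endswith(".txt") or file.endswith(".ipynb"):
            to_ignore.append(file)
    return to_ignore


# A hypothetical listing that the filter might receive for one directory.
listing = ["main.py", "requirements.txt", "eda.ipynb", "config.yaml"]
print(my_filter("/workspace", listing))  # ['requirements.txt', 'eda.ipynb']
```

Note that the .txt rule also matches requirements.txt, so confirm that excluding it is what you intend.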

Best practices in structuring your working directory

The following best practices suggest how you can organize your directory structure while using the @remote decorator in your modular code.

  • Put the @remote decorator in a file that resides at the root level directory of the workspace.

  • Place your local modules at the root level as well.

The following example shows the recommended directory structure. In this structure, the main.py script is located at the root level of the workspace.

.
├── config.yaml
├── data/
├── main.py <----------------- @remote used here
├── mnist_impl
│   ├── __pycache__/
│   │   └── pytorch_mnist.cpython-310.pyc
│   └── pytorch_mnist.py <-------- dependency of main.py
└── requirements.txt

The following example shows a directory structure that results in inconsistent behavior when you annotate your code with the @remote decorator.

In this example structure, the main.py script that contains the @remote decorator is not located at the root level directory. The following structure is NOT recommended.

.
├── config.yaml
├── entrypoint
│   ├── data
│   └── main.py <----------------- @remote used here
├── mnist_impl
│   ├── __pycache__
│   │   └── pytorch_mnist.cpython-310.pyc
│   └── pytorch_mnist.py <-------- dependency of main.py
└── requirements.txt
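This page does not state why a nested entry point behaves inconsistently. One plain-Python factor that may contribute: when a script is run directly, Python puts the script's own directory, not the current working directory, at the front of sys.path. The following local sketch, which does not involve SageMaker, builds both layouts in a temporary directory (the helper names write and import_succeeds are hypothetical) and shows that the same import of mnist_impl.pytorch_mnist succeeds from a root-level main.py but fails from entrypoint/main.py.

```python
import os
import subprocess
import sys
import tempfile


def write(path: str, text: str = "") -> None:
    """Create a file, including any missing parent folders."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as fh:
        fh.write(text)


def import_succeeds(main_relpath: str) -> bool:
    """Build a throwaway workspace and run main.py from the workspace root.

    Returns True if `import mnist_impl.pytorch_mnist` works inside main.py.
    """
    with tempfile.TemporaryDirectory() as root:
        write(os.path.join(root, "mnist_impl", "pytorch_mnist.py"), "VALUE = 1\n")
        write(os.path.join(root, main_relpath), "import mnist_impl.pytorch_mnist\n")
        # Drop PYTHONPATH so only Python's default sys.path rules apply.
        env = {k: v for k, v in os.environ.items() if k != "PYTHONPATH"}
        result = subprocess.run(
            [sys.executable, main_relpath], cwd=root, env=env, capture_output=True
        )
        return result.returncode == 0


print(import_succeeds("main.py"))             # True
print(import_succeeds("entrypoint/main.py"))  # False
```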