Using a custom container for analysis
This section includes information about how to build a Docker container using a Jupyter notebook. There is a security risk if you reuse notebooks built by third parties: included containers can execute arbitrary code with your user permissions. In addition, the HTML generated by the notebook can be displayed in the Amazon IoT Analytics console, providing a potential attack vector on the computer displaying the HTML. Make sure that you trust the author of any third-party notebook before using it.
You can create your own custom container and run it with the Amazon IoT Analytics service. To do so, you set up a Docker image, upload it to Amazon ECR, and then set up a dataset to run a container action. This section gives an example of the process using Octave.
This tutorial assumes that you have:

- Octave installed on your local computer
- A Docker account set up on your local computer
- An Amazon account with Amazon ECR and Amazon IoT Analytics access
Step 1: Set up a Docker image
There are three main files you need for this tutorial. Their names and contents are as follows:
- Dockerfile – The initial setup for Docker's containerization process.

    FROM ubuntu:16.04

    # Get required set of software
    RUN apt-get update
    RUN apt-get install -y software-properties-common
    RUN apt-get install -y octave
    RUN apt-get install -y python3-pip

    # Get boto3 for S3 and other libraries
    RUN pip3 install --upgrade pip
    RUN pip3 install boto3
    RUN pip3 install urllib3

    # Move scripts over
    ADD moment moment
    ADD run-octave.py run-octave.py

    # Start python script
    ENTRYPOINT ["python3", "run-octave.py"]
- run-octave.py – Parses the JSON from Amazon IoT Analytics, runs the Octave script, and uploads artifacts to Amazon S3 (see the sample params file after this list).

    import boto3
    import json
    import os
    import sys
    from urllib.parse import urlparse

    # Parse the JSON from IoT Analytics
    with open('/opt/ml/input/data/iotanalytics/params') as params_file:
        params = json.load(params_file)

    variables = params['Variables']

    order = variables['order']
    input_s3_bucket = variables['inputDataS3BucketName']
    input_s3_key = variables['inputDataS3Key']
    output_s3_uri = variables['octaveResultS3URI']

    local_input_filename = "input.txt"
    local_output_filename = "output.mat"

    # Pull input data from S3...
    s3 = boto3.resource('s3')
    s3.Bucket(input_s3_bucket).download_file(input_s3_key, local_input_filename)

    # Run Octave Script
    os.system("octave moment {} {} {}".format(local_input_filename, local_output_filename, order))

    # Upload the artifacts to S3
    output_s3_url = urlparse(output_s3_uri)
    output_s3_bucket = output_s3_url.netloc
    output_s3_key = output_s3_url.path[1:]

    s3.Object(output_s3_bucket, output_s3_key).put(Body=open(local_output_filename, 'rb'),
                                                   ACL='bucket-owner-full-control')
- moment – A simple Octave script that reads an input file, calculates the moment of the data at a specified order, and saves the result to an output file.

    #!/usr/bin/octave -qf
    arg_list = argv ();

    input_filename = arg_list{1};
    output_filename = arg_list{2};
    order = str2num(arg_list{3});

    [D,delimiterOut]=importdata(input_filename)
    M = moment(D, order)
    save(output_filename,'M')
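When your dataset runs, Amazon IoT Analytics places a params file at /opt/ml/input/data/iotanalytics/params containing the variables you define in Step 5; this is the file run-octave.py parses. The following is an illustrative sketch of what that file might contain for this tutorial (the service generates the actual file, including the octaveResultS3URI destination, and it can include additional fields):

    {
        "Variables": {
            "inputDataS3BucketName": "octave-sample-data-your-aws-account-id",
            "inputDataS3Key": "input.txt",
            "octaveResultS3URI": "s3://service-generated-bucket/.../output.mat",
            "order": "3"
        }
    }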
- Download the contents of each file. Create a new directory, place all of the files in it, and then cd to that directory.
- Run the following command.

    docker build -t octave-moment .
- You should see a new image in your Docker repository. Verify it by running the following command.

    docker image ls | grep octave-moment
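Optionally, you can test the container locally before uploading it. The following is a minimal sketch that assumes you saved a sample params file (like the one shown earlier) in the current directory, and that your local AWS credentials have access to the S3 buckets involved:

    docker run \
      -v "$(pwd)/params:/opt/ml/input/data/iotanalytics/params" \
      -v "$HOME/.aws:/root/.aws" \
      octave-moment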
Step 2: Upload the Docker image to an Amazon ECR repository
- Create a repository in Amazon ECR.

    aws ecr create-repository --repository-name octave-moment
- Get the login to your Docker environment.

    aws ecr get-login

- Copy the output and run it. The output should look something like the following.

    docker login -u AWS -p password -e none https://your-aws-account-id.dkr.ecr.region.amazonaws.com
- Tag the image you created with the Amazon ECR repository tag.

    docker tag your-image-id your-aws-account-id.dkr.ecr.region.amazonaws.com/octave-moment
- Push the image to Amazon ECR.

    docker push your-aws-account-id.dkr.ecr.region.amazonaws.com/octave-moment
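As an optional check, you can confirm that the image arrived by listing the images in the repository:

    aws ecr describe-images --repository-name octave-moment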
Step 3: Upload your sample data to an Amazon S3 bucket
- Download the following to a file named input.txt.

    0.857549 -0.987565 -0.467288 -0.252233 -2.298007
    0.030077 -1.243324 -0.692745 0.563276 0.772901
    -0.508862 -0.404303 -1.363477 -1.812281 -0.296744
    -0.203897 0.746533 0.048276 0.075284 0.125395
    0.829358 1.246402 -1.310275 -2.737117 0.024629
    1.206120 0.895101 1.075549 1.897416 1.383577
- Create an Amazon S3 bucket called octave-sample-data-your-aws-account-id. (You can also create the bucket and upload the file from the command line, as shown in the sketch after this list.)
- Upload the file input.txt to the Amazon S3 bucket you just created. You should now have a bucket named octave-sample-data-your-aws-account-id that contains the input.txt file.
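If you prefer to work from the command line, the following is a minimal sketch of the same two steps (replace your-aws-account-id with your account ID):

    aws s3 mb s3://octave-sample-data-your-aws-account-id
    aws s3 cp input.txt s3://octave-sample-data-your-aws-account-id/input.txt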
Step 4: Create a container execution role
- Copy the following to a file named role1.json. Replace your-aws-account-id with your Amazon account ID and aws-region with the Amazon region of your Amazon resources.

  Note: This example includes a global condition context key to protect against the confused deputy security problem. For more information, see Cross-service confused deputy prevention.

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Service": [
                        "sagemaker.amazonaws.com",
                        "iotanalytics.amazonaws.com"
                    ]
                },
                "Action": "sts:AssumeRole",
                "Condition": {
                    "StringEquals": {
                        "aws:SourceAccount": "your-aws-account-id"
                    },
                    "ArnLike": {
                        "aws:SourceArn": "arn:aws:iotanalytics:aws-region:your-aws-account-id:dataset/DOC-EXAMPLE-DATASET"
                    }
                }
            }
        ]
    }
- Create a role that gives access permissions to SageMaker and Amazon IoT Analytics, using the role1.json file that you created.

    aws iam create-role --role-name container-execution-role --assume-role-policy-document file://role1.json
- Download the following to a file named policy1.json and replace your-account-id with your account ID (see the second ARN under Statement:Resource).

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "s3:GetBucketLocation",
                    "s3:PutObject",
                    "s3:GetObject",
                    "s3:PutObjectAcl"
                ],
                "Resource": [
                    "arn:aws:s3:::*-dataset-*/*",
                    "arn:aws:s3:::octave-sample-data-your-account-id/*"
                ]
            },
            {
                "Effect": "Allow",
                "Action": [
                    "iotanalytics:*"
                ],
                "Resource": "*"
            },
            {
                "Effect": "Allow",
                "Action": [
                    "ecr:GetAuthorizationToken",
                    "ecr:GetDownloadUrlForLayer",
                    "ecr:BatchGetImage",
                    "ecr:BatchCheckLayerAvailability",
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:DescribeLogStreams",
                    "logs:GetLogEvents",
                    "logs:PutLogEvents"
                ],
                "Resource": "*"
            },
            {
                "Effect": "Allow",
                "Action": [
                    "s3:GetBucketLocation",
                    "s3:ListBucket",
                    "s3:ListAllMyBuckets"
                ],
                "Resource": "*"
            }
        ]
    }
- Create an IAM policy, using the policy1.json file you just downloaded.

    aws iam create-policy --policy-name ContainerExecutionPolicy --policy-document file://policy1.json
- Attach the policy to the role.

    aws iam attach-role-policy --role-name container-execution-role --policy-arn arn:aws:iam::your-account-id:policy/ContainerExecutionPolicy
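As an optional check, you can verify that the role exists and that the policy is attached:

    aws iam get-role --role-name container-execution-role
    aws iam list-attached-role-policies --role-name container-execution-role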
Step 5: Create a dataset with a container action
- Download the following to a file named cli-input.json and replace all instances of your-account-id and region with the appropriate values.

    {
        "datasetName": "octave_dataset",
        "actions": [
            {
                "actionName": "octave",
                "containerAction": {
                    "image": "your-account-id.dkr.ecr.region.amazonaws.com/octave-moment",
                    "executionRoleArn": "arn:aws:iam::your-account-id:role/container-execution-role",
                    "resourceConfiguration": {
                        "computeType": "ACU_1",
                        "volumeSizeInGB": 1
                    },
                    "variables": [
                        {
                            "name": "octaveResultS3URI",
                            "outputFileUriValue": {
                                "fileName": "output.mat"
                            }
                        },
                        {
                            "name": "inputDataS3BucketName",
                            "stringValue": "octave-sample-data-your-account-id"
                        },
                        {
                            "name": "inputDataS3Key",
                            "stringValue": "input.txt"
                        },
                        {
                            "name": "order",
                            "stringValue": "3"
                        }
                    ]
                }
            }
        ]
    }
- Create a dataset using the file cli-input.json you just downloaded and edited.

    aws iotanalytics create-dataset --cli-input-json file://cli-input.json
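As an optional check, you can confirm that the dataset and its container action were created as expected:

    aws iotanalytics describe-dataset --dataset-name octave_dataset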
Step 6: Invoke dataset content generation
- Run the following command. (The dataset name must match the datasetName in cli-input.json.)

    aws iotanalytics create-dataset-content --dataset-name octave_dataset
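The command returns the version ID of the new dataset content, which looks something like the following (the service generates the actual ID):

    {
        "versionId": "a1b2c3d4-5678-90ab-cdef-EXAMPLE11111"
    }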
Step 7: Get dataset content
- Run the following command.

    aws iotanalytics get-dataset-content --dataset-name octave_dataset --version-id \$LATEST
- You might need to wait several minutes until the DatasetContentState is SUCCEEDED.
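When the state is SUCCEEDED, the get-dataset-content response also lists the content entries, each with a presigned dataUri. You can use the URI of the output.mat entry to download the result for the next step; the following is a sketch, with the URL taken from your own response:

    curl -o output.mat "presigned-data-uri-from-the-response"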
Step 8: Print the output with Octave
- Use the Octave shell to print the output from the container by running the following commands.

    bash> octave
    octave> load output.mat
    octave> disp(M)
    -0.016393 -0.098061 0.380311 -0.564377 -1.318744