Creating a Studio notebook with Kinesis Data Streams
This tutorial describes how to create a Studio notebook that uses a Kinesis data stream as a source.
Setup
Before you create a Studio notebook, create a Kinesis data stream (ExampleInputStream). Your application uses this stream as the application source.
You can create this stream using either the Amazon Kinesis console or the following Amazon CLI command. For console instructions, see Creating and Updating Data Streams in the Amazon Kinesis Data Streams Developer Guide. Name the stream ExampleInputStream and set the Number of open shards to 1.
To create the stream (ExampleInputStream) using the Amazon CLI, run the following Amazon Kinesis create-stream command:
$ aws kinesis create-stream \
    --stream-name ExampleInputStream \
    --shard-count 1 \
    --region us-east-1 \
    --profile adminuser
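If you script this setup, it helps to wait until the new stream is ready before creating anything that reads from it. The following Python sketch (standard library only) builds the same create-stream call shown above and then runs the AWS CLI's `aws kinesis wait stream-exists` waiter, which polls until the stream is ACTIVE. It assumes the AWS CLI is installed and the adminuser profile is configured; the function names are hypothetical helpers, not part of any AWS tool.

```python
import shlex
import subprocess


def create_stream_command(stream_name="ExampleInputStream", shard_count=1,
                          region="us-east-1", profile="adminuser"):
    """Argument list for the `aws kinesis create-stream` command shown above."""
    return [
        "aws", "kinesis", "create-stream",
        "--stream-name", stream_name,
        "--shard-count", str(shard_count),
        "--region", region,
        "--profile", profile,
    ]


def wait_command(stream_name="ExampleInputStream", region="us-east-1",
                 profile="adminuser"):
    """Argument list for the CLI waiter that polls until the stream is ACTIVE."""
    return [
        "aws", "kinesis", "wait", "stream-exists",
        "--stream-name", stream_name,
        "--region", region,
        "--profile", profile,
    ]


def create_stream_and_wait(stream_name="ExampleInputStream", shard_count=1,
                           region="us-east-1", profile="adminuser"):
    # Both calls shell out to the AWS CLI; check=True raises CalledProcessError
    # if either command exits with a non-zero status.
    subprocess.run(create_stream_command(stream_name, shard_count, region, profile),
                   check=True)
    subprocess.run(wait_command(stream_name, region, profile), check=True)


if __name__ == "__main__":
    # Print the equivalent shell command instead of running it.
    print(shlex.join(create_stream_command()))
```

Call `create_stream_and_wait()` from your own script to create the stream and block until it is usable; running the file directly only prints the command.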
Create an Amazon Glue table
Your Studio notebook uses an Amazon Glue database for metadata about your Kinesis Data Streams data source.
You can either create the database manually first or let Kinesis Data Analytics create it for you when you create the notebook. Similarly, you can either create the table manually as described in this section, or use the create table connector code for Kinesis Data Analytics in your notebook in Apache Zeppelin to create the table with a DDL statement. You can then check in Amazon Glue that the table was created correctly.
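As a sketch of the DDL route, the following Python builds a Flink SQL CREATE TABLE statement for the same stock table, using the option names of the Apache Flink Kinesis SQL connector ('connector', 'stream', 'aws.region', 'scan.stream.initpos', 'format'). The `proctime AS PROCTIME()` computed column plays the role of the kinesisanalytics.proctime table property that the manual steps add. It assumes that in a `%flink.pyflink` paragraph Zeppelin exposes the stream table environment as `st_env`; outside Zeppelin the script only prints the statement, which you could instead paste into a `%flink.ssql` paragraph. Run either this or the manual steps below, not both, because both create a table named stock.

```python
def stock_table_ddl(stream="ExampleInputStream", region="us-east-1",
                    start_position="LATEST"):
    """Flink SQL DDL for the `stock` table backed by a Kinesis data stream."""
    return f"""CREATE TABLE stock (
  ticker STRING,
  price DOUBLE,
  proctime AS PROCTIME()
) WITH (
  'connector' = 'kinesis',
  'stream' = '{stream}',
  'aws.region' = '{region}',
  'scan.stream.initpos' = '{start_position}',
  'format' = 'json'
)"""


# Assumption: inside a %flink.pyflink paragraph, Zeppelin defines st_env
# (a StreamTableEnvironment). Anywhere else, just print the statement.
if "st_env" in globals():
    st_env.execute_sql(stock_table_ddl())  # noqa: F821
else:
    print(stock_table_ddl())
```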
Create a Table
- Sign in to the Amazon Web Services Management Console and open the Amazon Glue console at https://console.aws.amazon.com/glue/.
- If you don't already have an Amazon Glue database, choose Databases from the left navigation bar and choose Add Database. In the Add database window, enter default for Database name, and then choose Create.
- In the left navigation bar, choose Tables. On the Tables page, choose Add tables, Add table manually.
- On the Set up your table's properties page, enter stock for the Table name. Make sure that you select the database you created previously. Choose Next.
- On the Add a data store page, choose Kinesis. For Stream name, enter ExampleInputStream. For Kinesis source URL, enter https://kinesis.us-east-1.amazonaws.com. If you copy and paste the Kinesis source URL, be sure to delete any leading or trailing spaces. Choose Next.
- On the Classification page, choose JSON. Choose Next.
- On the Define a Schema page, choose Add Column for each of the following columns:
    Column name    Data type
    ticker         string
    price          double
  Choose Next.
- On the next page, verify your settings, and choose Finish.
- Choose your newly created table from the list of tables.
- Choose Edit table and add a property with the key kinesisanalytics.proctime and the value proctime.
- Choose Apply.
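To script the same table, you can pass a TableInput document to `aws glue create-table --database-name default --table-input file://table.json`. The following Python sketch prints such a document with the columns and the kinesisanalytics.proctime property from the steps above. The StorageDescriptor parameter keys that record the Kinesis source (typeOfData, streamName, kinesisUrl) are an assumption; compare them with the output of `aws glue get-table --database-name default --name stock` for a table you created in the console before relying on them.

```python
import json


def glue_table_input(stream="ExampleInputStream",
                     endpoint="https://kinesis.us-east-1.amazonaws.com"):
    """TableInput document mirroring the console steps for the `stock` table."""
    return {
        "Name": "stock",
        "Parameters": {
            "classification": "json",
            # Same key and value that the Edit table step adds in the console.
            "kinesisanalytics.proctime": "proctime",
        },
        "StorageDescriptor": {
            "Columns": [
                {"Name": "ticker", "Type": "string"},
                {"Name": "price", "Type": "double"},
            ],
            # ASSUMPTION: keys Glue uses to record the Kinesis source.
            # Verify against a console-created table with `aws glue get-table`.
            "Parameters": {
                "typeOfData": "kinesis",
                "streamName": stream,
                "kinesisUrl": endpoint,
            },
        },
    }


if __name__ == "__main__":
    # Redirect the output to a file, for example: python make_table_input.py > table.json
    print(json.dumps(glue_table_input(), indent=2))
```

The script name make_table_input.py is hypothetical; the printed JSON is the value for `--table-input`.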
Create a Studio notebook with Kinesis Data Streams
Now that you have created the resources your application uses, you create your Studio notebook.
To create your application, you can use either the Amazon Web Services Management Console or the Amazon CLI.
Create a Studio notebook using the Amazon Web Services Management Console
Open the Kinesis Data Analytics console at https://console.aws.amazon.com/kinesisanalytics/home?region=us-east-1#/applications/dashboard. On the Kinesis Data Analytics applications page, choose the Studio tab. Choose Create Studio notebook.
Note: You can also create a Studio notebook from the Amazon MSK or Kinesis Data Streams console by selecting your input Amazon MSK cluster or Kinesis data stream and choosing Process data in real time.
On the Create Studio notebook page, provide the following information:
- Enter MyNotebook for the name of the notebook.
- Choose default for Amazon Glue database.
Choose Create Studio notebook.
In the MyNotebook page, choose Run. Wait for the Status to show Running. Charges apply when the notebook is running.
Create a Studio notebook using the Amazon CLI
To create your Studio notebook using the Amazon CLI, do the following:
Verify your account ID. You need this value to create your application.
Create the role arn:aws:iam::AccountID:role/ZeppelinRole and add the following permissions to it, as the console does for the role it creates automatically:
"kinesis:GetShardIterator",
"kinesis:GetRecords",
"kinesis:ListShards"
Create a file called create.json with the following contents. Replace the placeholder values with your information.
{
  "ApplicationName": "MyNotebook",
  "RuntimeEnvironment": "ZEPPELIN-FLINK-2_0",
  "ApplicationMode": "INTERACTIVE",
  "ServiceExecutionRole": "arn:aws:iam::AccountID:role/ZeppelinRole",
  "ApplicationConfiguration": {
    "ApplicationSnapshotConfiguration": {
      "SnapshotsEnabled": false
    },
    "ZeppelinApplicationConfiguration": {
      "CatalogConfiguration": {
        "GlueDataCatalogConfiguration": {
          "DatabaseARN": "arn:aws:glue:us-east-1:AccountID:database/default"
        }
      }
    }
  }
}
Run the following command to create your application:
aws kinesisanalyticsv2 create-application --cli-input-json file://create.json
When the command completes, you see output that shows the details for your new Studio notebook. The following is an example of the output.
{ "ApplicationDetail": { "ApplicationARN": "arn:aws:kinesisanalytics:us-east-1:012345678901:application/MyNotebook", "ApplicationName": "MyNotebook", "RuntimeEnvironment": "ZEPPELIN-FLINK-2_0", "ApplicationMode": "INTERACTIVE", "ServiceExecutionRole": "arn:aws:iam::012345678901:role/ZeppelinRole", ...
Run the following command to start your application:
aws kinesisanalyticsv2 start-application --application-name MyNotebook
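The placeholder edits in the steps above can also be generated. The following Python sketch writes policy.json (an IAM policy with the three Kinesis actions listed above) and create.json (the same request shown above) for a given account ID. Scoping the policy Resource to the ExampleInputStream ARN, the file names, and the function names are assumptions for illustration; the role's trust policy for kinesisanalytics.amazonaws.com is not shown.

```python
import json
import sys

REGION = "us-east-1"


def kinesis_read_policy(account_id, stream="ExampleInputStream", region=REGION):
    """IAM policy granting the Kinesis read actions listed in the steps above."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": [
                "kinesis:GetShardIterator",
                "kinesis:GetRecords",
                "kinesis:ListShards",
            ],
            # Assumption: limit access to the tutorial's stream only.
            "Resource": f"arn:aws:kinesis:{region}:{account_id}:stream/{stream}",
        }],
    }


def create_application_request(account_id, region=REGION):
    """The create.json request with the AccountID placeholders filled in."""
    return {
        "ApplicationName": "MyNotebook",
        "RuntimeEnvironment": "ZEPPELIN-FLINK-2_0",
        "ApplicationMode": "INTERACTIVE",
        "ServiceExecutionRole": f"arn:aws:iam::{account_id}:role/ZeppelinRole",
        "ApplicationConfiguration": {
            "ApplicationSnapshotConfiguration": {"SnapshotsEnabled": False},
            "ZeppelinApplicationConfiguration": {
                "CatalogConfiguration": {
                    "GlueDataCatalogConfiguration": {
                        "DatabaseARN": f"arn:aws:glue:{region}:{account_id}:database/default"
                    }
                }
            },
        },
    }


def write_files(account_id):
    # AWS account IDs are 12-digit strings; reject anything else before writing.
    if not (len(account_id) == 12 and account_id.isdigit()):
        raise ValueError(f"not a 12-digit account ID: {account_id!r}")
    with open("policy.json", "w") as f:
        json.dump(kinesis_read_policy(account_id), f, indent=2)
    with open("create.json", "w") as f:
        json.dump(create_application_request(account_id), f, indent=2)


if __name__ == "__main__" and len(sys.argv) == 2:
    write_files(sys.argv[1])
```

For example, get your account ID with `aws sts get-caller-identity --query Account --output text`, run the script with that ID as its only argument, attach the policy with `aws iam put-role-policy --role-name ZeppelinRole --policy-name ZeppelinKinesisRead --policy-document file://policy.json` (the policy name is hypothetical), and then run the create-application command. To see whether the notebook is running, `aws kinesisanalyticsv2 describe-application --application-name MyNotebook --query ApplicationDetail.ApplicationStatus` returns its current status.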
Send data to your Kinesis data stream
To send test data to your Kinesis data stream, do the following:
Open the Kinesis Data Generator. Choose Create a Cognito User with CloudFormation.
The Amazon CloudFormation console opens with the Kinesis Data Generator template. Choose Next.
In the Specify stack details page, enter a username and password for your Cognito user. Choose Next.
In the Configure stack options page, choose Next.
In the Review Kinesis-Data-Generator-Cognito-User page, select the I acknowledge that Amazon CloudFormation might create IAM resources check box. Choose Create Stack.
Wait for the Amazon CloudFormation stack to finish being created. After the stack is complete, open the Kinesis-Data-Generator-Cognito-User stack in the Amazon CloudFormation console, and choose the Outputs tab. Open the URL listed for the KinesisDataGeneratorUrl output value.
In the Amazon Kinesis Data Generator page, log in with the username and password you entered in the Specify stack details page.
On the next page, provide the following values:
- Region: us-east-1
- Stream/delivery stream: ExampleInputStream
- Records per second: 1
For Record Template, paste the following code:
{
  "ticker": "{{random.arrayElement( ["AMZN","MSFT","GOOG"] )}}",
  "price": {{random.number( { "min":10, "max":150 } )}}
}
Choose Send data.
The generator will send data to your Kinesis data stream.
Leave the generator running while you complete the next section.
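If you would rather not set up the Kinesis Data Generator, you can send similar records yourself. The following Python sketch produces records with the same shape as the template above (one of three tickers and an integer price from 10 to 150) and builds an `aws kinesis put-record` call for each one, using the ticker as the partition key. It assumes AWS CLI version 2, where `--cli-binary-format raw-in-base64-out` lets you pass the JSON text directly to `--data`; the function names and the `--send` flag are hypothetical.

```python
import json
import random
import subprocess
import sys
import time

TICKERS = ["AMZN", "MSFT", "GOOG"]


def make_record(rng):
    """A record shaped like the generator template: random ticker, integer price in [10, 150]."""
    return {"ticker": rng.choice(TICKERS), "price": rng.randint(10, 150)}


def put_record_command(record, stream="ExampleInputStream", region="us-east-1"):
    """Argument list for `aws kinesis put-record`, partitioned by ticker."""
    return [
        "aws", "kinesis", "put-record",
        "--stream-name", stream,
        "--partition-key", record["ticker"],
        # AWS CLI v2 expects base64 for blob parameters unless told otherwise.
        "--cli-binary-format", "raw-in-base64-out",
        "--data", json.dumps(record),
        "--region", region,
    ]


def send(count=60, records_per_second=1):
    # Shells out to the AWS CLI once per record, pacing calls to the given rate.
    rng = random.Random()
    for _ in range(count):
        subprocess.run(put_record_command(make_record(rng)), check=True)
        time.sleep(1 / records_per_second)


if __name__ == "__main__" and "--send" in sys.argv:
    send()
```

Running the script with `--send` puts one record per second for a minute; without the flag, nothing is sent.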
Test your Studio notebook
In this section, you use your Studio notebook to query data from your Kinesis data stream.
Open the Kinesis Data Analytics console at https://console.aws.amazon.com/kinesisanalytics/home?region=us-east-1#/applications/dashboard. On the Kinesis Data Analytics applications page, choose the Studio tab. Choose MyNotebook.
In the MyNotebook page, choose Open in Apache Zeppelin.
The Apache Zeppelin interface opens in a new tab.
In the Welcome to Zeppelin! page, choose Zeppelin Note.
In the Zeppelin Note page, enter the following query into a new note:
%flink.ssql(type=update)
select * from stock
Choose the run icon.
After a short time, the note displays data from the Kinesis data stream.
To open the Apache Flink Dashboard for your application to view operational aspects, choose FLINK JOB. For more information about the Flink Dashboard, see Apache Flink Dashboard in the Kinesis Data Analytics Developer Guide.
For more examples of Flink Streaming SQL queries, see Queries.