Step 1: Create the Input and Output Streams - Amazon Kinesis Data Analytics for SQL Applications Developer Guide
Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

For new projects, we recommend that you use the new Managed Service for Apache Flink Studio over Kinesis Data Analytics for SQL Applications. Managed Service for Apache Flink Studio combines ease of use with advanced analytical capabilities, enabling you to build sophisticated stream processing applications in minutes.

Step 1: Create the Input and Output Streams

Before you create an Amazon Kinesis Data Analytics application for the Hotspots example, you create two Kinesis data streams. Configure one of the streams as the streaming source for your application, and the other stream as the destination where Kinesis Data Analytics persists your application output.

Step 1.1: Create the Kinesis Data Streams

In this section, you create two Kinesis data streams: ExampleInputStream and ExampleOutputStream.

Create these data streams using the console or the Amazon CLI.

  • To create the data streams using the console:

    1. Sign in to the Amazon Web Services Management Console and open the Kinesis console at https://console.amazonaws.cn/kinesis.

    2. Choose Data Streams in the navigation pane.

    3. Choose Create Kinesis stream, and create a stream with one shard named ExampleInputStream.

    4. Repeat the previous step, creating a stream with one shard named ExampleOutputStream.

  • To create data streams using the Amazon CLI:

    • Create streams (ExampleInputStream and ExampleOutputStream) using the following Kinesis create-stream Amazon CLI command. To create the second stream, which the application will use to write output, run the same command, changing the stream name to ExampleOutputStream.

      $ aws kinesis create-stream \ --stream-name ExampleInputStream \ --shard-count 1 \ --region us-west-2 \ --profile adminuser $ aws kinesis create-stream \ --stream-name ExampleOutputStream \ --shard-count 1 \ --region us-west-2 \ --profile adminuser

Step 1.2: Write Sample Records to the Input Stream

In this step, you run Python code to continuously generate sample records and write to the ExampleInputStream stream.

{"x": 7.921782426109737, "y": 8.746265312709893, "is_hot": "N"} {"x": 0.722248626580026, "y": 4.648868803193405, "is_hot": "Y"}
  1. Install Python and pip.

    For information about installing Python, see the Python website.

    You can install dependencies using pip. For information about installing pip, see Installation on the pip website.

  2. Run the following Python code. This code does the following:

    • Generates a potential hotspot somewhere in the (X, Y) plane.

    • Generates a set of 1,000 points for each hotspot. Of these points, 20 percent are clustered around the hotspot. The rest are generated randomly within the entire space.

    • The put-record command writes the JSON records to the stream.

    Important

    Do not upload this file to a web server because it contains your Amazon credentials.

    import json from pprint import pprint import random import time import boto3 STREAM_NAME = "ExampleInputStream" def get_hotspot(field, spot_size): hotspot = { "left": field["left"] + random.random() * (field["width"] - spot_size), "width": spot_size, "top": field["top"] + random.random() * (field["height"] - spot_size), "height": spot_size, } return hotspot def get_record(field, hotspot, hotspot_weight): rectangle = hotspot if random.random() < hotspot_weight else field point = { "x": rectangle["left"] + random.random() * rectangle["width"], "y": rectangle["top"] + random.random() * rectangle["height"], "is_hot": "Y" if rectangle is hotspot else "N", } return {"Data": json.dumps(point), "PartitionKey": "partition_key"} def generate( stream_name, field, hotspot_size, hotspot_weight, batch_size, kinesis_client ): """ Generates points used as input to a hotspot detection algorithm. With probability hotspot_weight (20%), a point is drawn from the hotspot; otherwise, it is drawn from the base field. The location of the hotspot changes for every 1000 points generated. """ points_generated = 0 hotspot = None while True: if points_generated % 1000 == 0: hotspot = get_hotspot(field, hotspot_size) records = [ get_record(field, hotspot, hotspot_weight) for _ in range(batch_size) ] points_generated += len(records) pprint(records) kinesis_client.put_records(StreamName=stream_name, Records=records) time.sleep(0.1) if __name__ == "__main__": generate( stream_name=STREAM_NAME, field={"left": 0, "width": 10, "top": 0, "height": 10}, hotspot_size=1, hotspot_weight=0.2, batch_size=10, kinesis_client=boto3.client("kinesis"), )

Next Step

Step 2: Create the Kinesis Data Analytics Application