Save SQL query results in a pandas DataFrame - Amazon SageMaker AI
Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

Save SQL query results in a pandas DataFrame

You can store the results of your SQL query in a pandas DataFrame. The easiest way to output query results to a DataFrame is to use the SQL editor features of the JupyterLab SQL extension query-result dropdown and choose the Pandas dataframe option.

Alternatively, you can add the parameter --output '{"format": "DATAFRAME", "dataframe_name": "dataframe_name"}' to your connection string.

For example, the following query extracts details of customers with the highest balance from the Customer table in Snowflake's TPCH_SF1 database, using both pandas and SQL:

  • In this example, we extract all the data from the customer table and save then in a DataFrame named all_customer_data.

    %%sm_sql --output '{"format": "DATAFRAME", "dataframe_name": "all_customer_data"}' --metastore-id snowflake-connection-name --metastore-type GLUE_CONNECTION SELECT * FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.CUSTOMER
    Saved results to all_customer_data
  • Next, we extract the details of the highest account balance from the DataFrame.

    all_customer_data.loc[all_customer_data['C_ACCTBAL'].idxmax()].values
    array([61453, 'Customer#000061453', 'RxNgWcyl5RZD4qOYnyT3', 15, '25-819-925-1077', Decimal('9999.99'), 'BUILDING','es. carefully regular requests among the blithely pending requests boost slyly alo'], dtype=object)