
Connecting to and querying Snowflake from Python

Four steps to read data from your warehouse directly into Python


Your local Python environment and a massive Snowflake cluster up in the cloud live in very different worlds. Learning to connect them and query data from Snowflake directly into your Python environment or Jupyter notebook means you can get whatever data you need right into the tool where you need it.

There are a few different ways to connect to and query Snowflake from Python. We'll cover the most common approach in depth, and briefly mention some alternative methods. We'll walk through:

1. Installing the Snowflake Connector for Python (snowflake-connector-python)
2. Authenticating and connecting to your Snowflake data warehouse
3. Running a query!

At this point, you've successfully connected to and queried Snowflake from Python, and you can write any query you want. Job done? Not quite. We'll also cover:

4. Reading data from a Snowflake query into a pandas DataFrame

If you're running through this live, it should only take you around 10 minutes to go from zero to successful query. Let's get into it!

šŸ Installing the Snowflake Connector for Python

There's nothing particularly special about the environment needed for this, so if you already know you have a working Python 3.6+ installation, you can skip to the next section, Installing the snowflake-connector-python package.

Prepping your Python environment

There are two likely scenarios here for how you're accessing Python:

a. You're using a Jupyter notebook or a Jupyter notebook alternative:

You should be pretty much good to go in this case. If you want to make sure, you can run this Python command in a cell and look for a response that's >= 3.6:

from platform import python_version
print(python_version())
Checking the Python version of a notebook

b. You're using the terminal / command line / some other Python IDE directly:

  1. Open up your terminal (Terminal app on Mac, Command Prompt or PowerShell on Windows, etc.)

  2. Run python --version to check what Python version is installed. Mine prints out Python 3.9.12, but as long as yours is 3.6 or greater, you're perfect.

If you get a command not found: python error, or your output is Python 2.x, try running python3 --version. If that works, then you have separate installations of Python 3 and Python 2, and for the rest of this tutorial you'll need to substitute python3 for python and pip3 for pip in the example commands.

If python3 --version also does not work, then you don't have Python installed. The easiest way to get this working is to download the official installer for your machine.

PS: We won't go deep into the setup of virtual environments here, but if you're doing a lot of Python work directly at the command line, you'll want to read up on them. Virtualenv works well, and lots of people also like conda.

Installing the snowflake-connector-python package

This step is very easy! It only takes one command to install the official Snowflake Python connector. We'll also install pandas at the same time, since we'll use it later.

pip install "snowflake-connector-python[pandas]"

Remember, if you had to run python3 --version earlier to get a working output, you need to run pip3 instead of pip to install the package. And if you're writing this in a Jupyter notebook, add a ! before pip to let this command run as a system call.

!pip install "snowflake-connector-python[pandas]"

This pip install command will spin through a bunch of messages and progress bars, and maybe a few warnings about your "pip version" or various package deprecations. This is OK, and unless anything actually says "ERROR" in red, you can probably ignore it.

pip installing the snowflake connector

Now we're ready to connect to the Snowflake data warehouse.

ā„ļøĀ Authenticating and connecting to Snowflake

In this next part, we'll be working with sensitive information: your Snowflake authentication credentials. You shouldn't ever store these directly in code. You never know what you might unthinkingly do with that code: send it to a coworker for help, post it on Stack Overflow with a question, check it into a public git repo...

Storing Snowflake credentials

Instead of directly entering your user and password credentials, we'll use environment variables. This lets us reference the values from Python without directly storing them in the Python code, entrusting their safekeeping to our computer's environment.

In a terminal (not a Python session) run the following command, replacing the placeholders with your actual username and password:

export snowflakeuser=<your_snowflake_username>
export snowflakepass=<your_snowflake_password>

In a Jupyter notebook, you can also choose to set environment variables right from the notebook using the %env 'magic' command. Make sure to delete the cell after it's run, though, or else the credentials will still be sitting right there! Some online hosted Jupyter notebooks have their own secret variable managers, and you can use those instead.

%env snowflakeuser=<your_snowflake_username>
%env snowflakepass=<your_snowflake_password>

These commands won't print any kind of output or status response, but they will have saved these sensitive values to your current environment. We'll learn how to reference them in a little bit.
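If you want to double-check that the variables made it into your environment before moving on, you can look them up from Python with os.environ without ever printing the password itself. This is just a convenience sketch (the check_credentials helper and the variable names snowflakeuser/snowflakepass are the ones used in this tutorial, not anything Snowflake requires):

```python
import os

def check_credentials(names=("snowflakeuser", "snowflakepass")):
    """Return the list of expected environment variable names that are missing."""
    return [n for n in names if not os.environ.get(n)]

missing = check_credentials()
if missing:
    print(f"Missing environment variables: {missing} -- re-run the export / %env commands")
else:
    print("Credentials found in the environment")
```

Note that it only reports which names are missing, never the values, so it's safe to leave in a notebook.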

Opening a connection to Snowflake

Now let's start working in Python. Open a new Python session, either in the terminal by running python / python3, or by opening your choice of notebook tool.

We'll import the packages that we need to work with:

import pandas as pd
import os
import snowflake.connector

Now we can create a connection to Snowflake. You'll need a few other pieces of information about your Snowflake instance for this (shown as placeholders below), but these are not sensitive and don't need to be stored as environment variables.

ctx = snowflake.connector.connect(
    user=os.environ['snowflakeuser'],
    password=os.environ['snowflakepass'],
    account='your-snowflake-account-identifier-here',
    warehouse='your-warehouse-name',
    role='your-role-name',
    database='your-database-name',
    schema='your-default-schema-name'
    )
cs = ctx.cursor()

Not sure where to find this information?

Your account identifier is everything between https:// and .snowflakecomputing.com in your Snowflake web URL. If your URL is https://ab12345.us-east-2.aws.snowflakecomputing.com, then your account identifier is ab12345.us-east-2.aws.

You can find data warehouse, role, database, and schema information from the Context menu in a new Snowflake worksheet.

Setting environment variables and opening a connection

These commands establish a connection and open a "cursor" to your Snowflake data warehouse. Quick dictionary interlude:

  • ctx stands for "context", a common term used when accessing databases in Python. It refers to an object that holds "state" (aka context) about the current status of the database connection: if it's still authenticated, elapsed time since it was created, etc.

  • cs stands for "cursor", the standard terminology for an object used to actually access records in a database. Kind of like how you can 'point' at things on the screen with your mouse cursor, this virtual cursor acts as a pointer to rows in your database and lets you point at and select them.

  • Also, notice how we're using the os.environ['snowflakepass'] syntax to reference the environment variable containing our password. This means if a bad actor gets their hands on this code, they still won't have anything sensitive.
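This connection/cursor shape isn't unique to Snowflake: it's Python's standard database API (PEP 249), so the same pattern appears in most database libraries. Here's a minimal illustration you can run right now with the standard library's sqlite3 module, no credentials required:

```python
import sqlite3

# Same connection/cursor pattern as the Snowflake connector, shown with
# an in-memory SQLite database from the standard library.
ctx = sqlite3.connect(":memory:")  # the "context": holds connection state
cs = ctx.cursor()                  # the "cursor": points at result rows

cs.execute("SELECT 1")
row = cs.fetchone()
print(row[0])  # prints 1

cs.close()
ctx.close()  # close cursors and connections when you're done with them
```

The names and methods (connect, cursor, execute, fetchone, close) map one-to-one onto what we'll do with Snowflake below.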

🔎 Running a query

If those commands ran successfully, we're all authenticated and ready to query!

First, we'll run a simple test query just to establish that our connection worked properly.

sql = "select 1"
cs.execute(sql)
first_row = cs.fetchone()
print(first_row[0])
Running a simple query in Snowflake with Python

If that returned 1, then congratulations: you've just run a query against your Snowflake data warehouse! You can replace "select 1" with any SQL query you want to run, and the cs object will contain the results.

Now that you're running a real query, you probably want to fetch more than one row from the results. In addition to fetchone(), the cursor object has a fetchall() method that returns the entire result set as a list of row tuples you can iterate over:

results = cs.fetchall()
for row in results:
    print(f"{row[0]}, {row[1]}")

But this isn't the best way to work with lots of data: you don't just want to look at the results, you want to do things with them! You could manually replace that print() line with something that dumps all the rows into a data structure you can use... or, as we'll explain now, you can use the fetch_pandas_all() method to do all that automatically!
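If you ever do want a lightweight data structure without pandas, one common trick is to pair each row tuple with the column names from the cursor's description attribute (the DB-API standard puts the column name in the first field of each entry). The rows_to_dicts helper below is hypothetical, not part of the Snowflake connector, and is shown with stand-in data so you can see its shape:

```python
def rows_to_dicts(description, rows):
    """Pair each row tuple with column names taken from a DB-API cursor.description."""
    columns = [col[0] for col in description]  # first field of each entry is the name
    return [dict(zip(columns, row)) for row in rows]

# With a live cursor you'd call: rows_to_dicts(cs.description, cs.fetchall())
# Illustrated here with stand-in data:
description = [("ID",), ("NAME",)]
rows = [(1, "Ada"), (2, "Grace")]
print(rows_to_dicts(description, rows))
# [{'ID': 1, 'NAME': 'Ada'}, {'ID': 2, 'NAME': 'Grace'}]
```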

For more information and other ways to fetch results from a cursor, dive into the official Snowflake docs.

šŸ¼ Reading data into a pandas DataFrame

Since most data analytics and data science projects use pandas to crunch data, what we really want is to get results from a Snowflake query into a pandas DataFrame. Luckily, the library we're using makes that really easy to do.

Let's use a real query instead of that SELECT 1, so we have some more data to work with. I've got a demo dataset with a table called DIM_CUSTOMERS, in a DB called ANALYTICS. You won't be able to run this query if you copy/paste it, so you'll want to replace it with a query against your own data.

If you get an error like Database 'your-db-name' does not exist or not authorized when running a query, double-check your ctx config from earlier and make sure it includes a role that is authorized to access the database and schema you're querying. Remember, if you create a new context, you then need to re-run the command to establish a new cursor.

sql = 'SELECT * FROM "ANALYTICS"."PROD"."DIM_CUSTOMERS"'
cs.execute(sql)
df = cs.fetch_pandas_all()

This will assign the results of your query to a pandas DataFrame called df. You can print a sample of the results to make sure it worked:

df.head()
Reading results to a pandas dataframe

And you're off to the races! You can use this method to execute any Snowflake query and read the results directly into a pandas DataFrame.

Once you've got data in this format, you can use any pandas functions or libraries from the greater Python ecosystem on your data, jumping into a complex statistical analysis, machine learning, geospatial analysis, or even modifying and writing data back to your data warehouse.

Other methods

Part of the joy (or chaos) of Python is that there's always more than one way to do something. There are lots of other options for connecting to Snowflake, but here are the most relevant alternatives:

  • SQLAlchemy: SQLAlchemy is a generic database toolkit for connecting to any database from Python. It's not specifically tailored to Snowflake, but works with it just fine. Check out their docs for detailed instructions on connecting SQLAlchemy to Snowflake.

  • pandas.read_sql: This isn't a replacement for the entire process, since you still have to create a Snowflake connection, but instead of running your SQL with a cursor and then using fetch_pandas_all, you could do it all from pandas directly. In my anecdotal experience, this is less performant than the other methods, though it still works fine.

df = pd.read_sql('SELECT * FROM "ANALYTICS"."PROD"."DIM_CUSTOMERS"', ctx)
Using pd.read_sql
  • And if you use Hex (hint: you're on our website right now!) you don't have to deal with any of these contexts and cursors, and can just write SQL in a SQL cell and get the results as a DataFrame.

This is something we think a lot about at Hex, where we're creating a platform that makes it easy to build and share interactive data products which can help teams be more impactful. If this is interesting, click below to get started, or to check out opportunities to join our team.