Harnessing the Power of Snowflake Arctic for AI Report Generation with Streamlit: A Step-by-Step Guide
Snowflake recently unveiled Arctic, its enterprise LLM (Large Language Model), sparking conversations across the tech community. Thanks to its open-source nature, developers are exploring its potential across a range of applications. Notably, Snowflake highlights Arctic's strength in code generation. Given Snowflake's acquisition of Streamlit, it's natural to wonder how well Arctic can generate Streamlit code for new projects. In this guided article, we'll explore exactly that.
To follow along with this tutorial, one prerequisite is necessary: Arctic must be available in your Snowflake account as a UDF. I recommend this post on configuring Arctic without Snowflake Cortex: https://mebinjoy.medium.com/how-to-set-up-snowflakes-enterprise-llm-artic-in-snowflake-without-snowflake-cortex-06362e6c22bb
For those unfamiliar, Streamlit is a Python framework for building robust data applications, and it is natively integrated into Snowflake. You can learn more about Streamlit [here](https://docs.snowflake.com/en/developer-guide/streamlit/about-streamlit).
Now, let’s address the central question: how effectively does Arctic generate code? Remarkably well, I’d say. It produces high-quality Streamlit code in response to user queries. We’ll leverage this capability to build a straightforward report, with charts, statistics, and more, from a selected table.
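If you’d like to try this yourself before building the full app, here’s a minimal sketch that sends a single prompt to the model. It assumes the UDF from the setup post is registered as `artic` in your current database and schema:

```python
# Minimal sketch: call the Arctic UDF with one prompt and print the reply.
# Assumes a UDF named "artic" (as in the linked setup post) is in scope.
from snowflake.snowpark.context import get_active_session
from snowflake.snowpark.functions import call_udf, col

session = get_active_session()
test_df = session.create_dataframe(
    [["Write a one-line Streamlit app that shows 'Hello, Arctic!'"]],
    schema=["prompt"],
)
reply = test_df.select(call_udf("artic", col("prompt")).alias("res")).collect()
print(reply[0]["RES"])  # the model's answer, usually containing a fenced code block
```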
Before proceeding, it’s important to note that executing AI-generated code carries risk. I use Python’s `exec` function to run the code Arctic generates, and I’ve tuned the prompt to discourage it from producing harmful code, for example by restricting it to pandas and Streamlit. I’m also working on additional security measures, which I’ll discuss in a future blog post.
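As a small illustration of the kind of safeguard I have in mind (this is only a sketch, not the full set of measures), the generated code can be executed with a deliberately restricted namespace instead of the app’s full globals:

```python
# Illustrative only: run generated code with a small, explicit namespace.
# This limits which names the code can reach, but it is NOT a sandbox.
import pandas as pd
import streamlit as st


def run_generated_code(code: str, df: pd.DataFrame) -> None:
    allowed_globals = {
        # A handful of builtins the report code might reasonably need
        "__builtins__": {"len": len, "range": range, "min": min, "max": max,
                         "sum": sum, "round": round, "print": print},
        "pd": pd,  # pandas, as allowed by the prompt
        "st": st,  # Streamlit, as allowed by the prompt
        "df": df,  # the DataFrame the report is built from
    }
    exec(code, allowed_globals)
```

Note that if the generated code includes its own `import` statements, you would also need to expose `__import__`; treat this as a starting point, not a substitute for the additional measures mentioned above.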
To begin, create a new Streamlit app by navigating to “Projects” and selecting “Streamlit.” Make sure the app is created in the same database and schema where the Arctic UDF (user-defined function) is configured. Once the app is created, copy and paste the following code:
```python
# Import Python packages
import re

import streamlit as st
from snowflake.snowpark.context import get_active_session
from snowflake.snowpark.functions import call_udf, col, lit

# Get the active Snowpark session provided by Streamlit in Snowflake
session = get_active_session()

st.title("AI Data Report Generator")

# Let the user pick a database, schema, and table to report on
databases = session.sql("show databases").select(col('"name"'))
db = st.selectbox(label="Select a DB", options=databases)

schemas = (
    session.sql(f"show schemas in {db}")
    .filter(~col('"name"').isin(lit("INFORMATION_SCHEMA")))
    .select(col('"name"'))
)
schema = st.selectbox(label="Select a schema", options=schemas)

tables = session.sql(f"show tables in {db}.{schema}").select(col('"name"'))

if tables.count() > 0:
    table = st.selectbox(label="Select a table", options=tables)
    limitrows = st.selectbox(label="Rows to limit", options=[10, 100, 1000])

    if st.button("Generate Report"):
        # Pull a sample of the selected table and describe it for the prompt
        df = session.sql(f"select * from {db}.{schema}.{table} limit {limitrows}").to_pandas()
        df_dtypes = df.dtypes.to_dict()
        columns = ",".join(df.columns)

        prompt = f"""
        Create a Streamlit report utilizing an existing DataFrame named df.
        Assume that df is already initialized and available as a global variable.
        The DataFrame columns are as follows: {columns}. Use only these columns.
        The data types of these columns are {df_dtypes}.
        Your task is to generate the report using only pandas and Streamlit, without invoking any system-related or external libraries. Do not use any other libraries.
        Incorporate Streamlit charts and provide insights wherever applicable.
        Only use native charts (bar, area chart, line chart) available in Streamlit.
        """

        # Send the prompt to the Arctic UDF and read back its response
        snow_df = session.create_dataframe([[prompt]], schema=["prompt"])
        res_df = snow_df.select(call_udf("artic", col("prompt")).alias("res")).to_pandas()
        code = res_df["RES"][0]

        # Extract the Python code block from the response and run it
        python_match = re.search(r"```python\n(.*)\n```", code, re.DOTALL)
        if python_match:
            exec(python_match.group(1))
        else:
            st.error("Could not find Python code in the model's response")
else:
    st.error("No tables found")
```
Now, let’s walk through the code briefly. The app first asks the user to select a database, schema, and table, with an option to limit the number of rows; the INFORMATION_SCHEMA schema is filtered out to avoid unintended interference. Once a table is selected and “Generate Report” is clicked, a dynamic query retrieves a sample of the data, and the column names and data types are injected into the prompt (you’re encouraged to adjust the prompt for better results). The magic unfolds when the filled-in prompt is passed to Arctic: a regular expression extracts the Python code from its response, which is then executed with `exec`.
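To see the extraction step in isolation, here’s the same regular expression applied to a made-up model response (the response text below is hypothetical, purely for illustration):

```python
import re

# Hypothetical Arctic response, assembled here only to demonstrate the regex.
fence = "```"
fake_response = (
    "Sure! Here is a simple report:\n"
    f"{fence}python\n"
    'st.header("Report")\n'
    'st.bar_chart(df["SALES"])\n'
    f"{fence}\n"
    "Let me know if you need anything else."
)

match = re.search(r"```python\n(.*)\n```", fake_response, re.DOTALL)
if match:
    print(match.group(1))  # only the code between the fences is passed to exec
```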
In this article, we’ve explored how to use Arctic to generate a basic report with Streamlit. Stay tuned for more experiments and innovations with Arctic!