How to set up LangChain with CrateDB

Introduction

LangChain is a framework for developing applications powered by language models. For this tutorial, we are going to use it to interact with CrateDB using only natural language without writing any SQL. To achieve that, you will need a CrateDB instance running, an OpenAI API key, and some Python knowledge.

Set up

If you are new to CrateDB, check the deployment options available here for download options or here for the free cloud deployment, choose the one that suits you best. Once CrateDB is set up, create and populate the table as seen below.

CREATE TABLE IF NOT EXISTS "doc"."people" (
   "name" TEXT,
   "info" OBJECT(DYNAMIC) AS (
      "like" ARRAY(BIGINT),
      "dislike" ARRAY(BIGINT)
   ),
   "house_id" INTEGER,
   "description" TEXT
)

INSERT INTO doc.people VALUES ('John M', {"like"=[1,2,3],"dislike"=[4,5]}, 1, 'nice person'),
                              ('John T', {"like"=[2],"dislike"=[1]}, 2, 'tall person'),
                              ('Mary P', {"like"=[2,3],"dislike"=[7]}, 3, 'smart person');

Using LangChain

First, install the required libraries

pip install 'langchain[openai]' 'sqlalchemy-cratedb'

Once installed, import every component that will be used as follows. Also, before running the code snippet below, make sure to replace the <API_KEY> with your OpenAI API key. Besides that, replace the URI with the correct connection string to your CrateDB instance. Finally, enter your question as a string replacing <TEXT_QUESTION> in the code.

from langchain.agents import create_sql_agent
from langchain.agents.agent_toolkits import SQLDatabaseToolkit
from langchain.sql_database import SQLDatabase
from langchain.llms.openai import OpenAI
from langchain.agents import AgentExecutor
import os

os.environ["OPENAI_API_KEY"]= <API_KEY>

llm=OpenAI(temperature=0) #play around with this parameter
#change the URI below to match your CrateDB instance
db = SQLDatabase.from_uri("crate://")
toolkit = SQLDatabaseToolkit(db=db,llm=llm)

agent_executor = create_sql_agent(
    llm=llm,
    toolkit=toolkit,
    verbose=True
)

agent_executor.run(<TEXT_QUESTION>)

If you ask the question “Who is tall?“ to the model, the result will be as follows. As you can see, it queries all the tables and, based on their names and their columns’ names, it decides which is relevant to the query. You can track the reasoning behind the decision-making process the model uses.

Now that everything is set up, feel free to explore with different questions and tables. Keep in mind, that by connecting to CrateDB with the OpenAI API, you are giving it access to perform a variety of queries on your data, so avoid using it with your production environment or use a specific user with limited permissions. If you are looking for a different model, you can explore the options available on LangChain documentation.

Summary

This tutorial covered the use of LangChain to interact with CrateDB by simply writing questions in English. If you explore different questions, you may encounter some wrong answers, so we recommend you use this tool with caution and always check the reasoning it provides behind every answer. If you are looking for more exciting integrations, have a look at the integrations section in our community.

6 Likes