This is a quick introduction to getting started with Azure Databricks and CrateDB.
Set up Azure Databricks
- Add a new Databricks service to your Azure Subscription
- Once this is done, use “Launch Workspace”
- After you are signed in to Azure Databricks, use the common task “New Cluster” to start a cluster for executing your Spark jobs
- Install the PostgreSQL JDBC driver from Maven on your cluster (at the time of publishing: org.postgresql:postgresql:42.2.23)
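CrateDB speaks the PostgreSQL wire protocol, which is why the pgjdbc driver and a standard PostgreSQL JDBC URL are used throughout. As an illustration of how the URL in the examples below is composed (the helper function is hypothetical, not part of any API):

```python
# Hypothetical helper, for illustration only: composes the JDBC URL used
# in the notebooks below. CrateDB accepts the standard pgjdbc URL format,
# with the host, the port (5432 by default), and an sslmode parameter.
def make_jdbc_url(host: str, port: int = 5432, sslmode: str = "require") -> str:
    return f"jdbc:postgresql://{host}:{port}/?sslmode={sslmode}"

print(make_jdbc_url("my-cratedb.example.com"))
# jdbc:postgresql://my-cratedb.example.com:5432/?sslmode=require
```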
Connect to CrateDB: Scala example
- Create a new notebook with Scala as the default language
- Add the following code and run the notebook
val crateUsername = "<username>"
val cratePassword = "<password>"
val postgresqlUrl = "jdbc:postgresql://<url-to-server>:5432/?sslmode=require"
val tableName = "<tablename>"

val jdbcDF = spark.read
  .format("jdbc")
  .option("url", postgresqlUrl)
  .option("driver", "org.postgresql.Driver")
  .option("dbtable", tableName)
  .option("user", crateUsername)
  .option("password", cratePassword)
  .option("fetchsize", 100000)
  .load()

jdbcDF.head(10)
- You should see the first rows of your CrateDB table in the output
Connect to CrateDB: Python example
- Create a new notebook with Python as the default language
- Add the following code and run the notebook
crateUsername = "<username>"
cratePassword = "<password>"
postgresqlUrl = "jdbc:postgresql://<url-to-server>:5432/?sslmode=require"
tableName = "<tablename>"

jdbcDF = spark.read \
    .format("jdbc") \
    .option("url", postgresqlUrl) \
    .option("driver", "org.postgresql.Driver") \
    .option("dbtable", tableName) \
    .option("user", crateUsername) \
    .option("password", cratePassword) \
    .option("fetchsize", 100000) \
    .load()

jdbcDF.head(n=10)
- You should see the first rows of your CrateDB table in the output
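When only a subset of a table is needed, Spark's JDBC source also accepts a subquery in place of a plain table name for the dbtable option, so CrateDB filters the rows before they are transferred to Spark. A minimal sketch of the options involved (the helper function and the example query are hypothetical; reuse the SparkSession from the notebook above):

```python
# Hypothetical sketch: collects the options passed to spark.read.format("jdbc")
# above into a dict. Wrapping a SELECT as the "dbtable" value makes CrateDB
# evaluate it server-side, so only matching rows cross the wire.
def jdbc_read_options(url, user, password, dbtable, fetchsize=100000):
    return {
        "url": url,
        "driver": "org.postgresql.Driver",
        "dbtable": dbtable,
        "user": user,
        "password": password,
        "fetchsize": str(fetchsize),
    }

opts = jdbc_read_options(
    "jdbc:postgresql://<url-to-server>:5432/?sslmode=require",
    "<username>", "<password>",
    "(SELECT * FROM <tablename> LIMIT 100) AS subquery",
)
# In the notebook: jdbcDF = spark.read.format("jdbc").options(**opts).load()
```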