Merge pull request #166 from rstudio/blairj09/issue164
Databricks updates
edgararuiz committed Jun 22, 2024
2 parents 9e79eda + 7d6e8e1 commit 238272b
Showing 3 changed files with 20 additions and 12 deletions.
4 changes: 3 additions & 1 deletion _quarto.yml
@@ -20,6 +20,8 @@ website:
href: get-started/index.qmd
- text: Guides
href: guides/index.qmd
- text: Databricks
href: deployment/databricks-connect.qmd
- text: Deployment
href: deployment/index.qmd
- text: News
@@ -107,7 +109,7 @@ website:
href: deployment/yarn-cluster-emr.qmd
- text: Cloudera cluster
href: deployment/cloudera-aws.qmd
- section: Databricks Connect (v2)
- section: Databricks
contents:
- text: Getting Started
href: deployment/databricks-connect.qmd
26 changes: 16 additions & 10 deletions deployment/databricks-connect.qmd
@@ -1,5 +1,5 @@
---
title: Databricks Connect v2
title: Databricks
format:
html:
theme: default
@@ -36,14 +36,15 @@ library(pysparklyr)

## Intro

Databricks Connect enables the interaction with Spark clusters remotely.
It is based on Spark Connect, which enables remote connectivity thanks
to its new decoupled client-server architecture. This allows users to
interact with the Spark cluster without having to run the jobs from a
node. Additionally, it removes the requirement of having Java components
installed in the user's machine.
[Databricks Connect](https://docs.databricks.com/en/dev-tools/databricks-connect/index.html#)
enables the interaction with Spark clusters remotely. It is based on [Spark
Connect](https://spark.apache.org/docs/latest/spark-connect-overview.html),
which enables remote connectivity thanks to its new decoupled client-server
architecture. This allows users to interact with the Spark cluster without
having to run the jobs from a node. Additionally, it removes the requirement of
having Java components installed on the user's machine.

The API is very different than the "legacy" Spark and using the Spark
The API is very different from "legacy" Spark, and using the Spark
shell is no longer an option. We have decided to use Python as the new
interface. In turn, Python uses *gRPC* to interact with Spark.
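With this architecture, connecting from R is a matter of pointing sparklyr at the remote cluster. A minimal sketch, assuming `DATABRICKS_HOST` and `DATABRICKS_TOKEN` are already set in your environment; the cluster ID below is a placeholder you would replace with your own:

```r
library(sparklyr)
library(pysparklyr)

# Host and token are read from the DATABRICKS_HOST and
# DATABRICKS_TOKEN environment variables. The cluster ID is
# a hypothetical value for illustration only.
sc <- spark_connect(
  cluster_id = "1026-175310-7cpsh3g8",
  method     = "databricks_connect"
)
```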

@@ -126,7 +127,7 @@
For users of Posit Workbench, this is the recommended approach to
setting up credentials as it provides an additional layer of security. If you
are not currently using Posit Workbench, feel free to skip this section.

Details for how to setup and configure this feature can be found
Details on how to set up and configure the Databricks integration can be found
[here](https://docs.posit.co/ide/server-pro/integration/databricks.html).

For users who have signed into a Databricks Workspace via Posit
@@ -144,7 +145,7 @@
space *(1)*, and an authentication token *(2)*. For default values,
those applications initially look for these environment variables:

- `DATABRICKS_HOST` - Your Workspace Instance URL
- `DATABRICKS_TOKEN` - Your Personal Authentication Token *(Not needed if using Posit Workbench)*
- `DATABRICKS_TOKEN` - Your Personal Authentication Token

Environment variables work well, because they rarely vary between
projects. The thing that will change more often is the cluster you are
@@ -169,6 +170,11 @@
DATABRICKS_TOKEN="Enter here your personal token" # Not needed if using Posit Workbench
**This is a one time operation.** After saving and closing the file,
restart your R session.

::: callout-note
If you are using Posit Workbench and have signed into your Databricks Workspace
when starting your session, you do not need to set these environment variables.
:::
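Before attempting a connection, you can confirm that R can see the variables. A quick check, not part of the original guide:

```r
# Returns "" for any variable that is not set in the session
Sys.getenv(c("DATABRICKS_HOST", "DATABRICKS_TOKEN"))
```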

## First time connecting

After setting up your Host and Token environment variables, you can now
2 changes: 1 addition & 1 deletion deployment/index.qmd
@@ -17,7 +17,7 @@ title: "Deployment"
- [Setting up a Standalone Cluster in AWS EC2](/deployment/stand-alone-aws.qmd)
- [Spark Connect](/deployment/spark-connect.qmd)

### Databricks Connect (v2)
### Databricks
- [Getting Started](/deployment/databricks-connect.qmd)
- [Run R code in Databricks](/deployment/databricks-connect-udfs.qmd)
- [Deploying to Posit Connect](/deployment/databricks-posit-connect.qmd)
