Part of a project I was working on required mashing up data from SharePoint with data stored in a data lake. We settled on creating a Databricks notebook to read an input file, query the data lake using that file, and then export an enriched file.
Here’s a high-level overview of what’s going to be created:
Call the notebook, parse the JSON response, loop until the notebook has finished, then respond to the notebook’s output.
In my case, triggering the notebook will require knowing its URL, bearer token, job id, and input parameters.
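For illustration, here's roughly what that call looks like as a Python sketch rather than a Flow HTTP action. The workspace URL, token, job id, and the input_file parameter name are placeholders, and it assumes the Jobs API run-now endpoint:

```python
import requests

# Placeholder values -- swap in your own workspace URL, PAT, and job id.
DATABRICKS_HOST = "https://adb-1234567890123456.7.azuredatabricks.net"
TOKEN = "<personal-access-token>"
JOB_ID = 123

# Trigger the notebook job via the Jobs API run-now endpoint, passing the
# input file as a notebook parameter ("input_file" is a hypothetical name).
trigger = requests.post(
    f"{DATABRICKS_HOST}/api/2.1/jobs/run-now",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "job_id": JOB_ID,
        "notebook_params": {"input_file": "incoming/sharepoint_export.csv"},
    },
)
trigger.raise_for_status()
```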
Parse the response from the HTTP call:
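Continuing the Python sketch above, this is the equivalent of the Parse JSON step: pulling the run_id out of the run-now response so the status checks below have something to query:

```python
# The run-now response is a small JSON payload; run_id identifies the run
# we need to poll (in the Flow this maps to a Parse JSON action).
run_id = trigger.json()["run_id"]
```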
The notebook will take a little time to spin up, then process the input file. The best way to handle this is to leverage a basic do-until loop to check the status of the notebook job. I opted to use a one-minute delay, call the API to get the job status, parse the response, then evaluate if it’s finished.
One thing to note about the do-until action: you don't want it to run forever, but you also don't want to add complexity with extra evaluations like "if looped X times, stop".
If you expand the Change limits option, you can set how many times it loops or change the duration. Here I’ve set the action to stop looping after 20 tries. For more info on this, please check SPGuides for a detailed overview.
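In code terms, and continuing the sketch above, the do-until pattern amounts to something like this, with a one-minute delay and a 20-attempt cap to mirror the limits I set, and assuming the Jobs API runs/get endpoint for the status check:

```python
import time

# Poll until the run reaches a terminal state, or give up after 20 tries.
for attempt in range(20):
    time.sleep(60)  # one-minute delay between checks
    status = requests.get(
        f"{DATABRICKS_HOST}/api/2.1/jobs/runs/get",
        headers={"Authorization": f"Bearer {TOKEN}"},
        params={"run_id": run_id},
    ).json()
    if status["state"]["life_cycle_state"] in ("TERMINATED", "SKIPPED", "INTERNAL_ERROR"):
        break
```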
The last step in the flow is to process the response from the notebook. If the job is successful, get the file from blob storage and load it into SharePoint; otherwise, create a Slack alert.
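To round out the sketch, the final branch boils down to something like the following; the print statements are stand-ins for the blob-to-SharePoint copy and the Slack alert, which are separate Flow actions:

```python
# Branch on the final result of the run.
result_state = status["state"].get("result_state")
if result_state == "SUCCESS":
    print("Job succeeded -- copy the enriched file from blob storage to SharePoint here.")
else:
    print(f"Job ended with {result_state} -- send the Slack alert here.")
```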
That’s it; using the example above, you can trigger a Databricks notebook from a Flow.
Authentication
When I set this up, my company allowed the use of Personal Access Tokens (PAT).
https://docs.databricks.com/dev-tools/auth.html#pat
The PAT was then used in the Flow to trigger the notebook.
Hey Ian,
Thanks for sharing! This idea and solution are awesome. Quick question: did you use cloud Power Automate or desktop Power Automate for this? I’m asking because the HTTP connector seems to be a premium option in cloud Power Automate.
Thanks again for the cool post!
Take care,
Marcelo, it’s a cloud Flow. Looking at the desktop Flow options, the same basic logic should apply.
Hi Ian,
This is really helpful, thanks for the post. It’s exactly what I need for a spike I’m currently working on. However, with the exact same flow configuration as described, I am getting a “200 Bad request” response from the Databricks instance. I have ensured all the info (URL, token, job_id, params, etc.) is correct; the only thing I’m not sure about is whether this has anything to do with my authorizations/access in the Databricks environment. Could you suggest whether any minimum access rights are needed to run the API call successfully?
Regards
Souradipto
When I set this up, my company allowed us to use Personal Access Tokens. You can read more about it here: https://docs.databricks.com/dev-tools/auth.html#pat
I’m able to run my notebook’s code, but how do I extract the resulting SQL output so I can save it as a .csv?
Hey Terry, I’m no longer working at a client that’s using this stack, but this post outlines how you can get data from the notebook to blob storage:
How to write data from an Azure Databricks notebook to an Azure Blob storage container
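As a rough idea of what that looks like inside the notebook, a PySpark snippet along these lines writes a SQL result out as CSV; the table name and the /mnt/export mount point are assumptions on my part, not details from the linked post:

```python
# Runs inside the Databricks notebook, where `spark` is already defined.
df = spark.sql("SELECT * FROM enriched_results")  # hypothetical table name

(
    df.coalesce(1)                # single CSV file instead of many part files
      .write.mode("overwrite")
      .option("header", "true")
      .csv("/mnt/export/enriched_results_csv")  # assumed blob mount point
)
```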
Thanks for this post! I was wondering if you had a way of rotating your PAT when it expires? Are you manually going in to change it or is there a way to do so within Power Automate?
Hey Kris,
I haven’t tried this, but you might be able to automate the creation of the token, then reference it in the Flow.
You might try using the API to generate/export the token:
https://stackoverflow.com/a/68096807
https://cloudarchitected.com/2020/01/using-azure-ad-with-the-azure-databricks-api/
If you’re having trouble generating the token via a Flow, you could try doing it locally using Python or PowerShell, then port the process to a Runbook.
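For example, a sketch along these lines uses the Token API to mint a new PAT while authenticating with an Azure AD access token (the approach the second link describes). The host, lifetime, and comment are placeholders, and you’d still need to store the result somewhere the Flow can read it, such as Key Vault:

```python
import requests

# Placeholder values -- swap in your workspace URL and a valid Azure AD token.
DATABRICKS_HOST = "https://adb-1234567890123456.7.azuredatabricks.net"
AAD_TOKEN = "<azure-ad-access-token>"

# Create a new PAT with a 30-day lifetime via the Token API.
resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.0/token/create",
    headers={"Authorization": f"Bearer {AAD_TOKEN}"},
    json={"lifetime_seconds": 30 * 24 * 3600, "comment": "power-automate-flow"},
)
resp.raise_for_status()
new_pat = resp.json()["token_value"]  # store this (e.g. in Key Vault) for the Flow to reference
```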