DataFramer

Get started with DataFramer on Databricks

This page summarizes the main steps. For the full notebook and code, see the official documentation.

1. Prerequisites

  • Run the notebook with a service principal that has Unity Catalog access (USE CATALOG, USE SCHEMA, CREATE TABLE, SELECT, MODIFY on the schema).
  • Get a DataFramer API key from app.aimon.ai → Account → Keys.
  • Create a Databricks secret scope (e.g. dataframer) containing the following secrets: DATAFRAMER_API_KEY, DATABRICKS_HTTP_PATH, DATABRICKS_CLIENT_ID, DATABRICKS_CLIENT_SECRET, DATABRICKS_SERVER_HOSTNAME.
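The secret scope can be created from a terminal with the Databricks CLI. A minimal sketch, assuming the current Databricks CLI (v0.2+) is installed and authenticated; the placeholder values are illustrative, not real credentials:

```shell
# Create the scope the notebook will read from
databricks secrets create-scope dataframer

# Store each required secret (repeat for the remaining keys)
databricks secrets put-secret dataframer DATAFRAMER_API_KEY --string-value "<your-api-key>"
databricks secrets put-secret dataframer DATABRICKS_SERVER_HOSTNAME --string-value "<workspace>.cloud.databricks.com"
```

Inside the notebook, these are then read with dbutils.secrets.get("dataframer", "<KEY>").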

2. Install SDK

In a Databricks notebook cell:

%pip install --upgrade pydataframer pydataframer-databricks pyyaml tenacity

3. Initialize client and connector

Use the Dataframer client with your API key from secrets, and the DatabricksConnector for Unity Catalog:

from dataframer import Dataframer
from pydataframer_databricks import DatabricksConnector

databricks_connector = DatabricksConnector(dbutils, scope="dataframer")
client = Dataframer(api_key=dbutils.secrets.get("dataframer", "DATAFRAMER_API_KEY"))

4. Upload seed data and generate spec

Fetch a sample from your Unity Catalog table and upload it to DataFramer as a seed dataset, then create a specification from it. Optionally, pass a Databricks-hosted model such as databricks/databricks-claude-sonnet-4-5, with credentials from databricks_connector.serving_credentials().
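The sampling step can be sketched with two small helpers: one builds the sampling query, the other serializes the fetched rows to CSV text for upload. This is a minimal sketch under assumptions; the exact upload call on the DataFramer client is not shown here (see the full notebook), and the table name is a hypothetical example:

```python
import csv
import io

def sample_query(table: str, n: int = 500) -> str:
    """Build a simple LIMIT query to pull a seed sample from a Unity Catalog table."""
    return f"SELECT * FROM {table} LIMIT {n}"

def rows_to_csv(columns, rows) -> str:
    """Serialize fetched rows into CSV text suitable for uploading as a seed dataset."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(columns)   # header row
    writer.writerows(rows)     # data rows
    return buf.getvalue()

# Hypothetical usage with rows fetched via the Databricks SQL connector:
query = sample_query("main.sales.orders", n=200)
seed_csv = rows_to_csv(["order_id", "amount"], [(1, 9.99), (2, 24.50)])
```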

5. Generate samples and load into Delta

Create a run with your spec and Databricks generation model, poll for completion, then download the generated files and use databricks_connector.load_generated_data() to write to a Delta table.
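The polling step can be sketched as a small generic helper that checks the run's status until it reaches a terminal state. This is a minimal sketch, not the SDK's own API: get_status is any zero-argument callable you supply (for example, one that wraps the client's run-status call), and the terminal state names are assumptions:

```python
import time

def wait_for_run(get_status, timeout_s: float = 600.0, interval_s: float = 5.0,
                 terminal=("completed", "failed")):
    """Poll get_status() until it returns a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = get_status()
        if status in terminal:
            return status
        time.sleep(interval_s)
    raise TimeoutError(f"run did not finish within {timeout_s}s")
```

Once wait_for_run returns "completed", download the generated files and hand them to databricks_connector.load_generated_data() as described above.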

Full documentation

The complete notebook with all code cells, retries, and evaluation steps is in the DataFramer docs.

Open Databricks integration docs →