Databricks: Exporting a Spark Dataframe to a DBFS Location
As I started to work with Databricks, one challenge I ran into early was how to export a Spark data frame, as a single file, back to my Azure storage. We've been working with payer pricing transparency data stored in a Hive schema, and I wanted to export a query result of about 500k rows to a single CSV.
Calling coalesce(1) on the data frame collapses it to a single partition, so the write produces a single part file in the dbfs mount; without it, Spark writes one file per partition, resulting in multiple files. Note that even with coalesce(1), .save() creates a directory at the target path containing one part-*.csv file (plus _SUCCESS marker files), not a bare CSV with that name.
df \
    .coalesce(1) \
    .write \
    .format("csv") \
    .option("header", "true") \
    .mode("overwrite") \
    .save("dbfs:/mnt/pricedata/export/codes.csv")