Data Stack Index / v 02.06
Verified 2026·05·08
Catalog & discovery · primary · Self-hosted only · Open source

Unity Catalog.

Databricks
Founded 2024 · San Francisco, CA
Status · ● active

Open-source universal catalog for data and AI under Apache-2.0: Iceberg REST and Hive Metastore compatible, Databricks-led, hosted by LF AI & Data.

Pricing starts OSS · paid tiers
Deployment Self-hosted only
License Open source
Free tier Open-source under Apache-2.0; self-host. Databricks-managed Unity Catalog is a separate paid offering bundled with the Databricks platform and is out of scope for this entry.
Persona data engineer · platform engineer
Company size scaleup → mid market → enterprise
dbt integration Plugin
Warehouses databricks · trino · athena · snowflake +2
OpenLineage none
Founded 2024
HQ San Francisco, CA
Last verified 2026·05·08
01
Verdict

Where it fits — and where it doesn't.

● Ideal for

Engineering teams that want a vendor-neutral, open-API governance layer for tables (Delta, Iceberg via UniForm, Parquet), volumes, and AI models — particularly when an engine-portable Iceberg REST endpoint matters more than a polished discovery UI. The strongest fit is for organisations standardising on open table formats and wanting one catalog readable by Spark, Trino, DuckDB, and Snowflake (via Iceberg REST). Also a defensible choice for teams already on Databricks who want to keep the same governance model when data spills onto other engines.

○ Avoid if

You're shopping for a data-discovery catalog with search, glossary, lineage UI, and steward workflows — Unity Catalog OSS is not that, and won't be for a while. Avoid also if you need automated column-level lineage in 2026; that capability lives in Databricks-managed UC, not the OSS. Finally, avoid if your team isn't comfortable consuming a still-maturing project — the OSS hit v0.4 in April 2026, with governance and lineage still on the roadmap.

02
Strengths & weaknesses

The honest scorecard.

  • [+] Apache-2.0 with project governance moving to LF AI & Data Foundation — credible neutral home
  • [+] Iceberg REST catalog API compatibility means UC-cataloged data is readable by Spark, Trino, DuckDB, dbt, Daft, and Snowflake (via Iceberg REST)
  • [+] Universal asset model — tables, volumes (files), functions, and AI models in one catalog
  • [+] Strong launch ecosystem — AWS, Azure, GCP, NVIDIA, dbt Labs, Fivetran, Confluent, Salesforce, Unstructured
  • [+] Backed by Databricks engineering — UC managed has been battle-tested at enterprise scale, and the OSS inherits the schema model
  • [−] Feature gap vs Databricks-managed Unity Catalog is large in 2026 — no automated lineage, no fine-grained ACL UI, limited governance
  • [−] No discovery UX comparable to DataHub or OpenMetadata — UC OSS is a registry, not a catalog product
  • [−] No crawler-style pull from heterogeneous sources: metadata gets into UC by registration and push, driven by the engines and clients that write to it
  • [−] Strategic optics — Databricks-led 'neutral' project with governance only recently moving to LF AI; some buyers will read it as Databricks-aligned regardless
  • [−] Roadmap velocity has been steady but slower than the launch marketing suggested — major governance and lineage features still pending as of v0.4 (April 2026)
03
Editorial

What Unity Catalog actually is.

Unity Catalog OSS is a governed registry — a catalog API that engines like Spark, Trino, DuckDB, and dbt can read to discover tables, files, functions, and AI models. The defining technical fact is the API surface: UC speaks Iceberg REST and Hive metastore, which means data registered in UC is readable by anything that reads either of those interfaces. That portability — one catalog, many engines — is the pitch.
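
As a rough illustration of what "one catalog, many engines" means at the protocol level, the sketch below builds the addresses a client would use. The host, port, and `/api/2.1/unity-catalog` prefix are assumptions based on a default local quickstart server, and the `unity.default.numbers` names are sample values; check your own deployment and the API reference before relying on any of them.

```python
from urllib.parse import urlencode

# Assumption: address and API prefix of a default local UC OSS server.
UC_BASE = "http://localhost:8080/api/2.1/unity-catalog"


def list_tables_url(catalog: str, schema: str, base: str = UC_BASE) -> str:
    """URL a client would GET to list the tables in one schema (native API)."""
    query = urlencode({"catalog_name": catalog, "schema_name": schema})
    return f"{base}/tables?{query}"


def iceberg_rest_uri(base: str = UC_BASE) -> str:
    """Catalog URI an Iceberg REST client would be pointed at (assumed path)."""
    return f"{base}/iceberg"


def full_name(catalog: str, schema: str, table: str) -> str:
    """Three-level name (catalog.schema.table) used to address a table."""
    return ".".join((catalog, schema, table))


print(list_tables_url("unity", "default"))
print(iceberg_rest_uri())
print(full_name("unity", "default", "numbers"))
```

The point of the sketch is that both the native API and the Iceberg REST endpoint hang off the same server, so an engine only needs a URI and a three-level name to find a table.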

The defining strategic fact is the gap between OSS and managed. The managed Unity Catalog on Databricks has automated runtime column-level lineage, fine-grained access control, attribute-based access control, audit logs, and PII classification. The OSS — at v0.4 in April 2026 — has the catalog API, the asset model, storage credentials, external locations, and managed-table support, and that’s about it. The OSS is genuinely useful as a vendor-neutral metadata registry; it is not yet a discovery catalog in the DataHub or OpenMetadata sense.

Where it fits against the alternatives

Against DataHub and OpenMetadata, Unity Catalog OSS solves a different problem. DataHub and OpenMetadata are catalogs you point at your existing stack to crawl metadata, build lineage, and provide a discovery surface. Unity Catalog OSS is a catalog you register data into, so that engines can read it. In a mature stack, the two layers can coexist (UC as the storage/governance registry, DataHub or OpenMetadata as the discovery and lineage UI on top), but most buyers pick one or the other.
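
To make the "register into" model concrete, here is a minimal sketch of the kind of request body a client might push to register an existing table location. The field names are assumptions modeled on the UC OSS REST API and should be verified against the current API reference; the bucket, table, and column names are hypothetical.

```python
import json


def table_registration(catalog, schema, name, location, columns):
    """Build a request body that registers an existing storage location as a table.

    Field names are assumptions; confirm them against the API reference.
    """
    return {
        "name": name,
        "catalog_name": catalog,
        "schema_name": schema,
        "table_type": "EXTERNAL",        # data already lives at `location`
        "data_source_format": "DELTA",   # assumed format for this example
        "storage_location": location,
        "columns": [
            {"name": col, "type_name": typ, "position": i, "nullable": True}
            for i, (col, typ) in enumerate(columns)
        ],
    }


body = table_registration(
    "unity", "default", "orders",
    "s3://example-bucket/orders",  # hypothetical location
    [("order_id", "LONG"), ("amount", "DOUBLE")],
)
print(json.dumps(body, indent=2))
```

Nothing here crawls a source system: the caller already knows the schema and location and tells the catalog about them, which is the opposite direction from a DataHub or OpenMetadata ingestion run.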

Against the legacy Hive metastore, UC is the modern open replacement: same role, better schema, multi-engine API, asset types beyond tables.

On the strategic context

Open-sourcing Unity Catalog in June 2024 was a deliberate move by Databricks to position UC as the neutral catalog standard, in the same way Iceberg has become the neutral table format. The donation to LF AI & Data Foundation as a sandbox project, plus the launch ecosystem (AWS, Azure, GCP, NVIDIA, dbt Labs, Fivetran, Confluent), gives the project credible neutrality on paper. Some buyers will still read it as Databricks-aligned, and that is a fair read — the maintainership concentration matters more than the license. For organisations that are already standardising on Iceberg REST and want a catalog that speaks that protocol natively, Unity Catalog OSS is the strongest current option. For organisations that are not, the choice is more nuanced.

How to evaluate it

The honest test is the protocol surface, not the UI. Pick two engines that need to share a table — Databricks plus Trino, or Spark plus Snowflake via Iceberg REST — register the table in UC OSS, and confirm both engines read it transparently. If they do, UC is solving the problem it is designed to solve. If you also need search, glossary, and lineage, you will want a discovery catalog (DataHub, OpenMetadata) on top, or you will want to budget for the Databricks-managed UC if you’re already on Databricks.
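
One way to turn that two-engine test into a repeatable check is to capture what each engine reports for the same table and compare the results. The sketch below assumes you have already collected column lists and row counts yourself (for example by running `DESCRIBE` and `COUNT(*)` in each engine); the `EngineRead` type and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EngineRead:
    engine: str
    columns: tuple   # (name, type) pairs as reported by the engine
    row_count: int


def compare_reads(a: EngineRead, b: EngineRead) -> list:
    """Return human-readable mismatches; an empty list means the reads agree."""
    problems = []
    # Compare names case-insensitively; type names are skipped because
    # engines spell the same logical type differently.
    names_a = [name.lower() for name, _ in a.columns]
    names_b = [name.lower() for name, _ in b.columns]
    if names_a != names_b:
        problems.append(f"column names differ: {names_a} vs {names_b}")
    if a.row_count != b.row_count:
        problems.append(f"row counts differ: {a.row_count} vs {b.row_count}")
    return problems


# Hypothetical results gathered from two engines for the same table.
spark = EngineRead("spark", (("id", "bigint"), ("amount", "double")), 1000)
trino = EngineRead("trino", (("ID", "bigint"), ("amount", "double")), 1000)
print(compare_reads(spark, trino) or "reads agree")
```

Matching names and counts is a weak signal on its own; a stricter version would also compare a checksum over a sorted sample of rows.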

04
Capability spec

All capabilities by cluster.

Catalog & discovery

Primary · strength 2/3
01 Business glossary
02 Glossary linked to assets
03 Natural language search
04 Ownership tracking
05 Data contracts
06 Governance workflows
07 Access request workflow
08 PII auto-classification
09 Tag propagation
10 Free self-hosted
Metadata ingestion push api
Search approach keyword
Asset types tables · files · ml models · api endpoints
05
Warehouses & integrations

Where it plugs in.

Native warehouse support

databricks · trino · athena · snowflake · bigquery · duckdb
01 dbt — Plugin
02 Airflow — API only
03 OpenLineage — none
04 API access — full
05 Terraform provider
06 Public SDK — python, java
06
Pricing

The honest pricing breakdown.

Pricing model free forever
Charged per custom
Published ● Yes — listed on vendor site
Starts at $0
Free tier ● Yes
OSS self-host ● Available

Free tier Open-source under Apache-2.0; self-host. Databricks-managed Unity Catalog is a separate paid offering bundled with the Databricks platform and is out of scope for this entry.

07
Notable missing

What it doesn't do.

Column-Level Lineage →

Traces data flow at the individual column granularity rather than just between tables. Critical for impact analysis when a column changes, for PII tracking, and for regulatory compliance in financial or healthcare contexts. Column-level lineage is computationally expensive and not all tools that claim "lineage" actually provide it — many stop at table level.
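
The difference between table-level and column-level lineage shows up clearly in impact analysis. The sketch below is a generic illustration, not Unity Catalog functionality: the edge map and all column names are hypothetical.

```python
from collections import deque

# Hypothetical column-level edges: upstream column -> downstream columns.
EDGES = {
    "raw.users.email": ["staging.users.email_hash"],
    "staging.users.email_hash": ["marts.customers.contact_key"],
    "raw.users.created_at": ["marts.customers.signup_date"],
}


def downstream(column, edges):
    """Breadth-first walk returning every column derived from `column`."""
    seen, queue = [], deque([column])
    while queue:
        for child in edges.get(queue.popleft(), []):
            if child not in seen:
                seen.append(child)
                queue.append(child)
    return seen


def table_of(column):
    """Strip the column part of a table.column identifier."""
    return column.rsplit(".", 1)[0]


impacted = downstream("raw.users.email", EDGES)
print(impacted)
print(sorted({table_of(c) for c in impacted}))
```

Table-level lineage can only say that `marts.customers` depends on `raw.users`. The column-level walk shows that a change to `raw.users.email` reaches `contact_key` but leaves `signup_date` untouched.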

Business Glossary →

A managed vocabulary of business terms ("Active Customer", "Recognized Revenue") with definitions, owners, and — critically — links to the physical assets that implement them. Without the linking layer a glossary is just a wiki. With it, you can answer "which dashboards use our official definition of Active Customer?" — the question governance teams actually care about.
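
The linking layer is easiest to see as data. The sketch below is a generic illustration with hypothetical terms, owners, assets, and dashboards; it does not reflect any particular product's glossary model.

```python
# Hypothetical glossary: each term carries links to the physical assets
# that implement it.
GLOSSARY = {
    "Active Customer": {
        "definition": "Customer with at least one order in the last 90 days",
        "owner": "analytics-team",
        "assets": ["marts.customers.is_active"],
    },
}

# Hypothetical usage map: dashboard -> columns it reads.
USAGE = {
    "dashboards/retention": ["marts.customers.is_active"],
    "dashboards/finance": ["marts.revenue.recognized"],
}


def dashboards_using(term):
    """Dashboards that read at least one asset linked to the glossary term."""
    linked = set(GLOSSARY[term]["assets"])
    return sorted(d for d, cols in USAGE.items() if linked & set(cols))


print(dashboards_using("Active Customer"))
```

Remove the `assets` list and the question can no longer be answered from the glossary alone, which is the "just a wiki" case described above.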

OpenLineage-Native →

Emits and consumes OpenLineage events as a first-class citizen rather than via a plugin or adapter. Signals commitment to interoperability with other metadata tooling — Marquez, OpenMetadata, Astronomer, and others can consume the same event stream. Increasingly the differentiator between "open" and "proprietary metadata model" observability platforms.
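
For readers who have not seen one, an OpenLineage event is a small JSON document describing a job run with its input and output datasets. The sketch below assembles a minimal run event by hand. The top-level field names follow the OpenLineage spec as I understand it, but the producer URL, namespace, dataset names, and the spec version in `schemaURL` are assumptions to check against the current specification.

```python
import json
import uuid
from datetime import datetime, timezone


def run_event(event_type, job_name, inputs, outputs, namespace="example"):
    """Assemble a minimal OpenLineage-style run event as a plain dict."""
    return {
        "eventType": event_type,  # e.g. START, COMPLETE, FAIL
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": str(uuid.uuid4())},
        "job": {"namespace": namespace, "name": job_name},
        "inputs": [{"namespace": namespace, "name": n} for n in inputs],
        "outputs": [{"namespace": namespace, "name": n} for n in outputs],
        # Hypothetical producer; the schemaURL version is an assumption.
        "producer": "https://example.com/hypothetical-producer",
        "schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent",
    }


event = run_event("COMPLETE", "daily_orders", ["raw.orders"], ["marts.orders"])
print(json.dumps(event, indent=2))
```

A tool that is OpenLineage-native would emit or consume documents of this shape directly, without a translation adapter in between.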

PII Auto-Classification →

Automatically identifies columns likely to contain personally identifiable information — email addresses, phone numbers, national IDs — through regex, name heuristics, or ML. Required for meaningful compliance workflows at scale. Quality varies: naive implementations produce heavy false-positive rates. Worth asking vendors about their accuracy benchmarks.
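
A deliberately naive classifier shows both how the regex and name-heuristic approach works and why false positives are common. This is a generic sketch with made-up patterns, hints, and sample values, not any vendor's implementation.

```python
import re

# Simplistic patterns chosen for illustration only.
EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
PHONE = re.compile(r"^\+?[\d\s().-]{7,}$")
NAME_HINTS = ("email", "phone", "ssn", "passport")


def classify(column_name, samples, threshold=0.8):
    """Return the set of PII labels suggested for a column."""
    labels = set()
    lowered = column_name.lower()
    # Heuristic 1: the column name contains a known hint.
    for hint in NAME_HINTS:
        if hint in lowered:
            labels.add(f"name:{hint}")
    # Heuristic 2: most non-empty sample values match a pattern.
    values = [s for s in samples if s]
    if values:
        for label, pattern in (("email", EMAIL), ("phone", PHONE)):
            hits = sum(1 for v in values if pattern.match(v))
            if hits / len(values) >= threshold:
                labels.add(f"value:{label}")
    return labels


print(classify("contact_email", ["a@b.com", "c@d.org"]))
print(classify("shipped_on", ["2024-01-15", "2024-02-20"]))
```

The second call flags a date column as a phone number, because digits and hyphens satisfy the loose phone pattern. That is the kind of false positive worth probing when asking vendors about accuracy.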

08
Alternatives & migrations

If not Unity Catalog, then what?

Common alternatives

DataHub → Best-in-class column-level SQL lineage parser (SQLGlot-based, benchmarked at 97–99% accuracy on standard corpora)
OpenMetadata → Highest connector count in the OSS catalog space (120+), particularly strong on dashboards, ML, and pipeline systems
09
Common questions

Quick answers.

Is Unity Catalog open source?
Yes. Unity Catalog is open source under the Apache-2.0 license and can be self-hosted at no license cost. Databricks also sells a managed Unity Catalog as part of its platform, which this entry does not cover.
How much does Unity Catalog cost?
The open-source Unity Catalog costs $0 in license fees: it is released under Apache-2.0 and you host it yourself. Databricks-managed Unity Catalog is a separate paid offering bundled with the Databricks platform and is out of scope for this entry.
How is Unity Catalog deployed?
Unity Catalog is self-hosted — you run it in your own infrastructure.
Does Unity Catalog work with dbt and my warehouse?
It integrates with dbt via a plugin. Listed warehouse and engine support covers Databricks, Trino, Athena, Snowflake, BigQuery, and DuckDB.

Provenance.

Last verified 2026·05·08 against vendor documentation and, where possible, hands-on trial.