Unity Catalog.
Founded 2024 · San Francisco, CA
Status · ● active
Open-source universal catalog for data and AI under Apache-2.0: compatible with Iceberg REST and the Hive metastore, Databricks-led, hosted by the LF AI & Data Foundation.
Where it fits — and where it doesn't.
It fits engineering teams that want a vendor-neutral, open-API governance layer for tables (Delta, Iceberg via UniForm, Parquet), volumes, and AI models, particularly when an engine-portable Iceberg REST endpoint matters more than a polished discovery UI. The strongest fit is for organisations standardising on open table formats and wanting one catalog readable by Spark, Trino, DuckDB, and Snowflake (via Iceberg REST). It is also a defensible choice for teams already on Databricks who want to keep the same governance model when data spills onto other engines.
Avoid it if you're shopping for a data-discovery catalog with search, glossary, lineage UI, and steward workflows: Unity Catalog OSS is not that, and won't be for a while. Also avoid it if you need automated column-level lineage in 2026; that capability lives in Databricks-managed UC, not the OSS. Finally, avoid it if your team isn't comfortable consuming a still-maturing project: the OSS hit v0.4 in April 2026, with governance and lineage still on the roadmap.
The honest scorecard.
- Apache-2.0 with project governance moving to LF AI & Data Foundation — credible neutral home
- Iceberg REST catalog API compatibility means UC-cataloged data is readable by Spark, Trino, DuckDB, dbt, Daft, and Snowflake (via Iceberg REST)
- Universal asset model — tables, volumes (files), functions, and AI models in one catalog
- Strong launch ecosystem — AWS, Azure, GCP, NVIDIA, dbt Labs, Fivetran, Confluent, Salesforce, Unstructured
- Backed by Databricks engineering — UC managed has been battle-tested at enterprise scale, and the OSS inherits the schema model
- Feature gap vs Databricks-managed Unity Catalog is large in 2026 — no automated lineage, no fine-grained ACL UI, limited governance
- No discovery UX comparable to DataHub or OpenMetadata — UC OSS is a registry, not a catalog product
- Pull from heterogeneous sources is not the model — UC is push and registration-based, engine-driven
- Strategic optics — Databricks-led 'neutral' project with governance only recently moving to LF AI; some buyers will read it as Databricks-aligned regardless
- Roadmap velocity has been steady but slower than the launch marketing suggested — major governance and lineage features still pending as of v0.4 (April 2026)
What Unity Catalog actually is.
Unity Catalog OSS is a governed registry — a catalog API that engines like Spark, Trino, DuckDB, and dbt can read to discover tables, files, functions, and AI models. The defining technical fact is the API surface: UC speaks Iceberg REST and Hive metastore, which means data registered in UC is readable by anything that reads either of those interfaces. That portability — one catalog, many engines — is the pitch.
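To make the "API surface" point concrete, here is a minimal sketch of how a client would address a table over an Iceberg REST endpoint. The `/v1/config` and `/v1/{prefix}/namespaces/{namespace}/tables/{table}` routes come from the Iceberg REST catalog specification; the base URL, the prefix value, and the namespace and table names are assumptions for illustration and should be replaced with whatever your deployment actually exposes. The code only builds URLs and makes no network calls.

```python
from typing import List
from urllib.parse import quote

# ASSUMPTION: where a self-hosted server mounts its Iceberg REST endpoint.
# Illustrative only; take the real value from your own deployment.
ASSUMED_BASE = "http://localhost:8080/api/2.1/unity-catalog/iceberg"


def config_url(base: str) -> str:
    """URL of the Iceberg REST config endpoint, typically a client's first call."""
    return f"{base.rstrip('/')}/v1/config"


def load_table_url(base: str, prefix: str, namespace: List[str], table: str) -> str:
    """URL of the Iceberg REST 'load table' endpoint for one table.

    `prefix` is the optional path prefix a server may hand back from
    /v1/config; pass "" when the server does not use one.
    """
    # The spec joins multi-level namespaces with the 0x1F unit separator,
    # which has to be percent-encoded inside a URL path.
    ns = quote("\x1f".join(namespace), safe="")
    parts = [base.rstrip("/"), "v1"]
    if prefix:
        parts.append(prefix.strip("/"))
    parts += ["namespaces", ns, "tables", quote(table, safe="")]
    return "/".join(parts)


if __name__ == "__main__":
    # Hypothetical prefix, namespace, and table names.
    print(config_url(ASSUMED_BASE))
    print(load_table_url(ASSUMED_BASE, "catalogs/demo", ["sales"], "orders"))
```

Any engine that implements the same routes resolves the same table, which is what "one catalog, many engines" amounts to at the protocol level.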
The defining strategic fact is the gap between OSS and managed. The managed Unity Catalog on Databricks has automated runtime column-level lineage, fine-grained access control, attribute-based access control, audit logs, and PII classification. The OSS — at v0.4 in April 2026 — has the catalog API, the asset model, storage credentials, external locations, and managed-table support, and that’s about it. The OSS is genuinely useful as a vendor-neutral metadata registry; it is not yet a discovery catalog in the DataHub or OpenMetadata sense.
Where it fits against the alternatives
Against DataHub and OpenMetadata, Unity Catalog OSS solves a different problem. DataHub and OpenMetadata are catalogs you point at your existing stack to crawl metadata, build lineage, and provide a discovery surface. Unity Catalog OSS is a catalog you register data into, so that engines can read it. In a mature stack, the two layers can coexist, with UC as the storage/governance registry and DataHub or OpenMetadata as the discovery and lineage UI on top, but most buyers pick one or the other.
Against the legacy Hive metastore, UC is the modern open replacement: same role, better schema, multi-engine API, asset types beyond tables.
On the strategic context
Open-sourcing Unity Catalog in June 2024 was a deliberate move by Databricks to position UC as the neutral catalog standard, in the same way Iceberg has become the neutral table format. The donation to LF AI & Data Foundation as a sandbox project, plus the launch ecosystem (AWS, Azure, GCP, NVIDIA, dbt Labs, Fivetran, Confluent), gives the project credible neutrality on paper. Some buyers will still read it as Databricks-aligned, and that is a fair read — the maintainership concentration matters more than the license. For organisations that are already standardising on Iceberg REST and want a catalog that speaks that protocol natively, Unity Catalog OSS is the strongest current option. For organisations that are not, the choice is more nuanced.
How to evaluate it
The honest test is the protocol surface, not the UI. Pick two engines that need to share a table — Databricks plus Trino, or Spark plus Snowflake via Iceberg REST — register the table in UC OSS, and confirm both engines read it transparently. If they do, UC is solving the problem it is designed to solve. If you also need search, glossary, and lineage, you will want a discovery catalog (DataHub, OpenMetadata) on top, or you will want to budget for the Databricks-managed UC if you’re already on Databricks.
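The two-engine test above can be scripted as a small comparison harness. This is a sketch under stated assumptions, not a prescribed procedure: each "reader" is a hypothetical callable that you would implement with your real engine client (a Spark session, a Trino connection, and so on) and that returns the column names and rows it sees for the shared table. The stubs at the bottom stand in for those clients.

```python
from typing import Callable, Dict, List, Tuple

# A reader returns (column names, rows) for the shared table.
# Readers are hypothetical wrappers around your real engine clients.
Reader = Callable[[], Tuple[List[str], List[tuple]]]


def compare_reads(readers: Dict[str, Reader]) -> Dict[str, object]:
    """Run every reader once and report whether all engines agree."""
    if len(readers) < 2:
        raise ValueError("need at least two engines to compare")
    results = {name: read() for name, read in readers.items()}
    schemas = {name: tuple(cols) for name, (cols, _) in results.items()}
    # Sort by repr so row order and None values do not affect the comparison.
    rows = {name: sorted(r, key=repr) for name, (_, r) in results.items()}
    first = next(iter(results))
    return {
        "schemas_match": all(s == schemas[first] for s in schemas.values()),
        "rows_match": all(r == rows[first] for r in rows.values()),
        "row_counts": {name: len(r) for name, r in rows.items()},
    }


if __name__ == "__main__":
    # Stub readers with made-up data; replace with real engine queries.
    spark_stub = lambda: (["id", "name"], [(1, "a"), (2, "b")])
    trino_stub = lambda: (["id", "name"], [(2, "b"), (1, "a")])
    print(compare_reads({"spark": spark_stub, "trino": trino_stub}))
```

Comparing full row sets is only practical on a small test table; for anything large, a row count plus a checksum per engine would be the more realistic check.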
All capabilities by cluster.
Catalog & discovery
Primary · strength 2/3
Where it plugs in.
Native warehouse support
The honest pricing breakdown.
Free tier: Open-source under Apache-2.0; self-host. Databricks-managed Unity Catalog is a separate paid offering bundled with the Databricks platform and is out of scope for this entry.
What it doesn't do.
Column-Level Lineage: Traces data flow at the individual column granularity rather than just between tables. Critical for impact analysis when a column changes, for PII tracking, and for regulatory compliance in financial or healthcare contexts. Column-level lineage is computationally expensive, and not all tools that claim "lineage" actually provide it; many stop at table level.
Business Glossary: A managed vocabulary of business terms ("Active Customer", "Recognized Revenue") with definitions, owners, and, critically, links to the physical assets that implement them. Without the linking layer a glossary is just a wiki. With it, you can answer "which dashboards use our official definition of Active Customer?", which is the question governance teams actually care about.
OpenLineage-Native: Emits and consumes OpenLineage events as a first-class citizen rather than via a plugin or adapter. Signals commitment to interoperability with other metadata tooling: Marquez, OpenMetadata, Astronomer, and others can consume the same event stream. Increasingly the differentiator between "open" and "proprietary metadata model" observability platforms.
PII Auto-Classification: Automatically identifies columns likely to contain personally identifiable information (email addresses, phone numbers, national IDs) through regex, name heuristics, or ML. Required for meaningful compliance workflows at scale. Quality varies: naive implementations produce heavy false-positive rates. Worth asking vendors about their accuracy benchmarks.
Quick answers.
- Is Unity Catalog open source?
- Yes. Unity Catalog is open source under the Apache-2.0 license and can be self-hosted at no license cost. Databricks separately offers a paid, managed Unity Catalog as part of its platform.
- How much does Unity Catalog cost?
- Unity Catalog OSS is free: it is open source under Apache-2.0 and self-hosted, so there is no license cost. Databricks-managed Unity Catalog is a separate paid offering bundled with the Databricks platform and is out of scope for this entry.
- How is Unity Catalog deployed?
- Unity Catalog is self-hosted — you run it in your own infrastructure.
- Does Unity Catalog work with dbt and my warehouse?
- It integrates with dbt via a plugin. Unity Catalog supports Databricks, Trino, Athena, Snowflake, BigQuery, plus one more.
Provenance.
Last verified 2026·05·08 against vendor documentation and, where possible, hands-on trial.