§ Cluster · Catalog & discovery

Catalog & discovery.

Tools for finding the right asset — and the right context — without asking on Slack.

Tools indexed: 7 primary · 5 strong secondary

Open-source options: 3

Sales-led pricing: 6 of 7

Last verification: 2026·05·08

Sibling clusters: Quality & testing · Lineage & metadata

Catalog tooling splits along two fault lines that matter more than the marketing language. The first is how metadata gets in: pull connectors that the catalog runs against your stack on a schedule, push APIs that your jobs emit events to, or some hybrid. The second is what the catalog is for: a discovery surface where humans search (“where’s the customer table?”), a governance surface where decisions are made and recorded (“who owns this and can it leave the EU?”), or both.
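As a rough illustration of the pull/push distinction above, here is a minimal sketch. Everything in it is hypothetical: the `Catalog` class, `pull_sync`, `receive_event`, and the `fake_warehouse` source are invented for this example and do not correspond to any vendor's API. The only point is who initiates the call: in the pull model the catalog queries the source, in the push model the producing job calls the catalog.

```python
from dataclasses import dataclass, field


@dataclass
class Catalog:
    """Hypothetical in-memory metadata store keyed by asset name."""
    assets: dict = field(default_factory=dict)

    def upsert(self, name: str, metadata: dict, via: str) -> None:
        # Record the metadata and which ingestion path delivered it.
        self.assets[name] = {**metadata, "ingested_via": via}

    def receive_event(self, event: dict) -> None:
        # Push model: a producing job calls the catalog when something changes.
        self.upsert(event["asset"], event["metadata"], via="push")


def pull_sync(catalog: Catalog, list_tables) -> int:
    """Pull model: the catalog calls the source (normally on a schedule)."""
    tables = list_tables()
    for name, metadata in tables.items():
        catalog.upsert(name, metadata, via="pull")
    return len(tables)


def fake_warehouse() -> dict:
    # Stand-in for an information-schema query against a real warehouse.
    return {"analytics.customers": {"columns": ["id", "email"]}}


catalog = Catalog()
synced = pull_sync(catalog, fake_warehouse)
catalog.receive_event(
    {"asset": "analytics.orders", "metadata": {"columns": ["id", "customer_id"]}}
)
```

A hybrid catalog would simply expose both paths against the same store, which is why the sketch tracks `ingested_via` per asset.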

The 2024–2026 wave added a third axis: how AI-native the search and authoring experience is. Some catalogs lean on classical search with a glossary on top; some are fully LLM-driven, generating documentation, answering natural-language questions, and reasoning over lineage. Depth varies dramatically — judge it on search_approach and natural_language_search, not the label.

Open-source matters more in this cluster than in quality testing. DataHub and OpenMetadata are both Apache-2.0 and run in production by real organisations at real scale. Unity Catalog was open-sourced under Apache-2.0 in mid-2024 and is now a third serious open-source option.

Questions this page answers

Questions a buyer actually asks.

01 Managed catalog, or open-source catalog I run myself?

OSS catalogs (DataHub, OpenMetadata, Unity Catalog, all Apache-2.0) run at real scale, but you operate stateful services (search index, metadata store, ingestion) much like a database. Managed options (Atlan, Collate, DataHub Cloud) take that operational load off you in exchange for a subscription cost, and add polished governance and AI features. The deciding factor is platform-engineering appetite, not licence price.

02 How important is column-level lineage for my catalog?

It depends on who uses the catalog. For impact analysis on warehouse changes ("what breaks if I rename this column?"), column-level lineage is what makes the catalog actionable. For pure discovery and documentation, table-level is often enough. Check lineage_granularity; the strongest catalogs derive column-level lineage from SQL parsing rather than manual entry.

03 Do I need a real business glossary, or is technical metadata enough?

A real business glossary links each term to the assets that embody it, propagates ownership and tags along that linkage, and is versioned; it is not just a wiki of definitions. If governance and stewardship are the goal, glossary depth is the best signal that a catalog is in active use. If you only need search and docs, technical metadata may be enough.

04 Which catalog actually integrates with my BI tools (Tableau, Looker, Power BI)?

Most major catalogs ingest Tableau, Looker, and Power BI metadata, but depth differs: some pull dashboards and fields with lineage back to warehouse columns, others only list the dashboard. The distinction to check per catalog: does BI lineage reach warehouse columns, or stop at the asset?

05 How does PII / sensitive-data auto-classification compare across catalogs?

Auto-classification scans values or schemas to flag PII, PHI, and PCI without manual tagging. Coverage and accuracy vary widely; false positives on synthetic-looking strings and false negatives on internal IDs are common. Treat it as a first pass that a steward reviews: check each catalog's pii_auto_classification and whether sensitivity tags propagate along lineage.

06 What does "AI-powered catalog" actually mean tool by tool, in 2026?

It ranges from classical search with an LLM summary bolted on, to fully LLM-native experiences that answer natural-language questions, generate documentation, and reason over lineage. "AI-powered" is one of the noisiest claims in this space. Look at search_approach and natural_language_search, and test it on your own metadata before trusting the demo.
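To make the column-lineage and tag-propagation questions above concrete, here is a minimal sketch of what "what breaks if I rename this column?" amounts to once column-level edges exist. The `LINEAGE` graph, the column names, and both helper functions are invented for illustration; a real catalog would derive the edges from SQL parsing or connectors and expose them through its own API.

```python
from collections import deque

# Hypothetical column-level lineage edges: upstream column -> downstream columns.
LINEAGE = {
    "raw.users.email": ["staging.users.email"],
    "staging.users.email": ["marts.customers.contact_email"],
    "marts.customers.contact_email": ["dashboard.churn.email_domain"],
    "raw.users.id": ["staging.users.id"],
}


def downstream(column: str, edges: dict) -> list:
    """Breadth-first walk returning every column derived from `column`."""
    seen, order, queue = {column}, [], deque([column])
    while queue:
        current = queue.popleft()
        for child in edges.get(current, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


def propagate_tag(column: str, tag: str, edges: dict) -> dict:
    """Apply `tag` to `column` and to everything downstream of it."""
    return {col: tag for col in [column, *downstream(column, edges)]}


# Impact analysis: everything that depends on raw.users.email.
impacted = downstream("raw.users.email", LINEAGE)

# Sensitivity propagation: a PII tag on the source follows the same edges.
tags = propagate_tag("raw.users.email", "PII", LINEAGE)
```

With table-level lineage only, the same walk would flag every column of every downstream table, which is why the granularity matters for impact analysis but much less for plain discovery.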

Capability matrix

What each catalog actually ships.

Tool	01 Glossary	02 NL search	03 Contracts	04 Govern flows	05 Access req	06 PII auto	07 OpenLineage	08 Col lineage	09 Free self-host
Alation
Atlan
Collibra
DataHub
OpenMetadata
Secoda
Unity Catalog

Connector counts, ingestion model, and asset types vary substantially — open any tool name above for the full capability spec.

How to choose

Three trade-offs that matter.

Axis 01

Open-source, or managed?

The OSS catalogs (DataHub, OpenMetadata, Unity Catalog) mean running a search index, metadata store, and ingestion layer yourself: stateful infra, not a single binary. Managed (Atlan, DataHub Cloud, Collate) runs that for you and adds a proprietary AI/governance layer. Decide on platform-engineering bandwidth, not licence price.

Axis 02

Engineering-led, or steward-led?

DataHub is more developer-shaped: event-driven architecture, a strong SQL parser, and a Kafka-based Metadata Change Log (MCL). Atlan is more steward-shaped: polished governance UX, certifications, and the glossary as a first-class artifact. OpenMetadata sits between, with a simpler stack and a fast governance feature cadence. Pick by who actually uses the catalog day-to-day.

Axis 03

Discovery catalog, or governed registry?

Discovery catalogs (DataHub, OpenMetadata, Atlan) crawl your stack and present a search-and-lineage UI for humans. Governed registries (Unity Catalog OSS) are read by engines — Spark, Trino, DuckDB — to access tables. Different problems, both legitimate; some mature stacks run a registry below a discovery catalog.
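The difference between the two shapes can be sketched as two lookups over the same table metadata. This is an illustrative toy, not any product's interface: `Table`, `resolve`, `search`, the grants mapping, and the `s3://example-bucket/...` locations are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Table:
    name: str
    location: str      # where an engine reads the data
    description: str   # what a human searches on


TABLES = [
    Table("sales.orders", "s3://example-bucket/sales/orders", "One row per order"),
    Table("sales.customers", "s3://example-bucket/sales/customers", "Customer master record"),
]


def resolve(name: str, principal: str, grants: dict) -> str:
    """Registry-style lookup: an engine asks where a table lives, subject to grants."""
    if name not in grants.get(principal, set()):
        raise PermissionError(f"{principal} may not read {name}")
    return next(t.location for t in TABLES if t.name == name)


def search(term: str) -> list:
    """Discovery-style lookup: a human searches names and descriptions."""
    term = term.lower()
    return [
        t.name
        for t in TABLES
        if term in t.name.lower() or term in t.description.lower()
    ]


grants = {"spark_job": {"sales.orders"}}
location = resolve("sales.orders", "spark_job", grants)
hits = search("customer")
```

The registry path sits in the query's critical path and enforces access; the discovery path never touches the data and only has to be helpful, which is why some stacks run both layers.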

Also strong at catalog & discovery — primarily categorised elsewhere.

These tools earn their primary classification in another cluster but score 2 or 3 of 3 on catalog capability — the cluster overlap is real, not aspirational. Worth a look when consolidating two budgets into one.

Acceldata → Primary: quality testing · Catalog strength 2/3
Cloudera Data Lineage (Octopai) → Primary: lineage metadata · Catalog strength 2/3
IBM Manta Data Lineage → Primary: lineage metadata · Catalog strength 2/3
Monte Carlo → Primary: quality testing · Catalog strength 2/3
Sifflet → Primary: quality testing · Catalog strength 2/3

By specific capability

Drill into one feature.

Catalogs with a real business glossary →
Catalogs that auto-classify PII →
Tools that enforce data contracts →

Head-to-head

Compare two side by side.

Every same-cluster pair a buyer realistically shortlists — see all comparisons.

The OSS-vs-managed catch.

The catch in this cluster is the OSS-vs-managed gap: Unity Catalog and DataHub do materially less in their free/self-hosted tiers than in their managed offerings, and several matrix capabilities are paid-only. Each tool page spells out which.
