Data Stack Index / v 02.06
Verified 2026·05·08

Catalog & discovery.

Tools for finding the right asset — and the right context — without asking on Slack.

Tools indexed: 7 primary · 5 strong secondary
Open-source options: 3
Sales-led pricing: 6 of 7
Last verification: 2026·05·08

Catalog tooling splits along two fault lines that matter more than the marketing language. The first is how metadata gets in: pull connectors that the catalog runs against your stack on a schedule, push APIs that your jobs emit events to, or some hybrid. The second is what the catalog is for: a discovery surface where humans search (“where’s the customer table?”), a governance surface where decisions are made and recorded (“who owns this and can it leave the EU?”), or both.

The 2024–2026 wave added a third axis: how AI-native the search and authoring experience is. Some catalogs lean on classical search with a glossary on top; some are fully LLM-driven, generating documentation, answering natural-language questions, and reasoning over lineage. Depth varies dramatically, so judge each tool on its search_approach and natural_language_search fields, not on the label.

Open-source matters more in this cluster than in quality testing. DataHub and OpenMetadata are both Apache-2.0 and run by real organisations at real scale. Unity Catalog became Apache-2.0 in mid-2024 and is now a third serious open-source option.
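The first fault line, how metadata gets in, can be illustrated with a toy model. This is a minimal sketch under stated assumptions: the `Catalog` class, its `pull` and `push` methods, and the `warehouse` dictionary are hypothetical names invented for illustration and do not correspond to any catalog's real API.

```python
from dataclasses import dataclass, field


@dataclass
class Catalog:
    """Toy in-memory metadata store keyed by asset name (hypothetical)."""
    assets: dict = field(default_factory=dict)

    def pull(self, source):
        """Pull model: the catalog crawls a source it has access to."""
        for name, columns in source.items():
            self.assets[name] = {"columns": list(columns), "via": "pull"}

    def push(self, event):
        """Push model: a job emits an event and the catalog records it."""
        self.assets[event["asset"]] = {
            "columns": list(event["columns"]),
            "via": "push",
        }


# Hypothetical warehouse state as seen at crawl time.
warehouse = {
    "analytics.customers": ["id", "email"],
    "analytics.orders": ["id", "total"],
}

catalog = Catalog()
catalog.pull(warehouse)  # in practice this would run on a schedule

# A job that adds a column reports the change itself, between crawls.
catalog.push({"asset": "analytics.orders", "columns": ["id", "total", "currency"]})
```

In this sketch the pulled view is only as fresh as the last crawl, while the pushed event updates the catalog when the job runs. A hybrid setup would use both paths against the same store.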

01
Questions this page answers

Questions a buyer actually asks.

01 Managed catalog, or open-source catalog I run myself?
OSS catalogs (DataHub, OpenMetadata, Unity Catalog, all Apache-2.0) run at real scale, but you operate stateful services (search index, metadata store, ingestion) much like a database. Managed options (Atlan, Collate, DataHub Cloud) take on that operational load in exchange for a subscription, and add polished governance and AI features. The deciding factor is platform-engineering appetite, not licence price.
02 How important is column-level lineage for my catalog?
It depends on who uses the catalog. For impact analysis on warehouse changes — what breaks if I rename this column — column-level lineage is what makes the catalog actionable. For pure discovery and documentation, table-level is often enough. Check lineage_granularity; the strongest catalogs derive column-level lineage from SQL parsing rather than manual entry.
03 Do I need a real business glossary, or is technical metadata enough?
A real business glossary links each term to the assets that embody it, propagates ownership and tags along that linkage, and is versioned — not just a wiki of definitions. If governance and stewardship are the goal, glossary depth is the best signal a catalog is in active use. If you only need search and docs, technical metadata may be enough.
04 Which catalog actually integrates with my BI tools (Tableau, Looker, Power BI)?
Most major catalogs ingest Tableau, Looker, and Power BI metadata, but depth differs — some pull dashboards and fields with lineage back to warehouse columns, others only list the dashboard. The distinction to check per catalog: does BI lineage reach warehouse columns, or stop at the asset?
05 How does PII / sensitive-data auto-classification compare across catalogs?
Auto-classification scans values or schemas to flag PII, PHI, and PCI without manual tagging. Coverage and accuracy vary widely; false positives on synthetic-looking strings and false negatives on internal IDs are common. Treat it as a first pass a steward reviews — check each catalog's pii_auto_classification and whether sensitivity tags propagate along lineage.
06 What does "AI-powered catalog" actually mean tool by tool, in 2026?
It ranges from classical search with an LLM summary bolted on, to fully LLM-native experiences that answer natural-language questions, generate documentation, and reason over lineage. AI-powered is one of the noisiest claims in this space — look at search_approach and natural_language_search, and test it on your own metadata before trusting the demo.
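
Two of the answers above, impact analysis over column-level lineage and sensitivity tags that propagate along lineage, both reduce to walking a lineage graph. The following is a minimal sketch, assuming a hypothetical `LINEAGE` map and a deliberately naive email pattern. None of these names come from a real catalog's API, and a production classifier would cover far more than email addresses.

```python
import re
from collections import deque

# Hypothetical column-level lineage: upstream column -> columns derived from it.
LINEAGE = {
    "raw.customers.email": ["staging.customers.email_normalized"],
    "staging.customers.email_normalized": [
        "marts.dim_customer.email",
        "bi.churn_dashboard.contact",
    ],
    "raw.customers.id": ["marts.dim_customer.customer_id"],
}

# Naive pattern used only for illustration.
EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def downstream(column, lineage=LINEAGE):
    """Impact analysis: every column reachable from `column`, breadth-first."""
    seen, queue = [], deque([column])
    while queue:
        for child in lineage.get(queue.popleft(), []):
            if child not in seen:
                seen.append(child)
                queue.append(child)
    return seen


def looks_like_email(samples, threshold=0.8):
    """First-pass classifier: flag a column when most sampled values match."""
    if not samples:
        return False
    hits = sum(1 for value in samples if EMAIL.fullmatch(value))
    return hits / len(samples) >= threshold


def propagate_tag(column, tag, lineage=LINEAGE):
    """Apply a sensitivity tag to a column and everything derived from it."""
    return {col: tag for col in [column, *downstream(column, lineage)]}
```

Calling `downstream("raw.customers.email")` lists what a rename would touch, including the BI field at the end of the chain. The `threshold` parameter is where false positives and false negatives get traded off, which is why the output is best treated as a suggestion for a steward to review.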
02
Primary tools in this cluster

7 tools, different shapes.

Alation

Alation · est. 2012 · Redwood City, CA

SaaS / Self-host

The incumbent that defined the data catalog — behavioral search, deep governance, and strong column-level lineage.

Pricing: Contact sales
Built for: data steward
First strength: Category-defining catalog with behavioral, usage-ranked search and pioneering natural-language search

Atlan

Atlan · est. 2019 · Singapore

Hybrid

Enterprise catalog and governance plane positioned as the AI context layer — connectors, lineage, contracts, and an MCP server for agents.

Pricing: Contact sales
Built for: data steward
First strength: Polished UX and onboarding

Collibra

Collibra · est. 2008 · Brussels, Belgium

SaaS

Enterprise data-and-AI governance incumbent: catalog, glossary, workflow stewardship, lineage, and a separate ML data-quality module.

Pricing: Contact sales
Built for: data steward
First strength: The deepest governance and stewardship tooling in the cluster

DataHub

Acryl Data · est. 2021 · Palo Alto, CA

OSS · SaaS / Self-host

Apache-2.0 metadata platform with a serious managed counterpart — strongest event-driven architecture and column-level SQL lineage in OSS.

Pricing: OSS · free
Built for: data engineer
First strength: Best-in-class column-level SQL lineage parser (SQLGlot-based, benchmarked at 97–99% accuracy on standard corpora)

OpenMetadata

Collate · est. 2021 · Saratoga, CA

OSS · SaaS / Self-host

Apache-2.0 unified metadata platform with a deliberately simple stack — discovery, lineage, quality, and contracts in one project.

Pricing: OSS · free
Built for: data engineer
First strength: Highest connector count in the OSS catalog space (120+)

Secoda

Secoda (Atlassian) · est. 2021 · Toronto, Ontario, Canada

SaaS / Self-host · acquired

AI-native data catalog, lineage, and observability from Toronto — acquired by Atlassian in December 2025 to power Rovo AI.

Pricing: Contact sales
Built for: data steward
First strength: AI-native search and assistant as the primary interface

Unity Catalog

Databricks · est. 2024 · San Francisco, CA

OSS · Self-host

Open-source universal catalog for data and AI under Apache-2.0 — Iceberg-REST and Hive-MS compatible, Databricks-led, LF AI hosted.

Pricing: OSS · free
Built for: data engineer
First strength: Apache-2
03
Capability matrix

What each catalog actually ships.

Tool · 01 Glossary · 02 Natural-language search · 03 Contracts · 04 Governance workflows · 05 Access requests · 06 PII auto-classification · 07 OpenLineage · 08 Column lineage · 09 Free self-host
Alation
Atlan
Collibra
DataHub
OpenMetadata
Secoda
Unity Catalog

Connector counts, ingestion model, and asset types vary substantially — open any tool name above for the full capability spec.

04
How to choose

Three trade-offs that matter.

Axis 01

Open-source, or managed?

The OSS catalogs (DataHub, OpenMetadata, Unity Catalog) mean running a search index, metadata store, and ingestion layer yourself: stateful infrastructure, not a single binary. Managed offerings (Atlan, DataHub Cloud, Collate) run those services for you and add a proprietary AI and governance layer. Decide on platform-engineering bandwidth, not licence price.

Axis 02

Engineering-led, or steward-led?

DataHub is more developer-shaped: event-driven architecture, a strong SQL parser, and a Kafka-based Metadata Change Log (MCL). Atlan is more steward-shaped: polished governance UX, certifications, and the glossary as a first-class artifact. OpenMetadata sits between the two, with a simpler stack and a fast cadence of governance features. Pick by who actually uses the catalog day-to-day.

Axis 03

Discovery catalog, or governed registry?

Discovery catalogs (DataHub, OpenMetadata, Atlan) crawl your stack and present a search-and-lineage UI for humans. Governed registries (Unity Catalog OSS) are read by engines — Spark, Trino, DuckDB — to access tables. Different problems, both legitimate; some mature stacks run a registry below a discovery catalog.

Also strong at catalog & discovery — primarily categorised elsewhere.

These tools earn their primary classification in another cluster but score 2 or 3 of 3 on catalog capability — the cluster overlap is real, not aspirational. Worth a look when consolidating two budgets into one.

05
By specific capability

Drill into one feature.

06
Head-to-head

Compare two side by side.

Every same-cluster pair a buyer realistically shortlists.

The OSS-vs-managed catch.

Unity Catalog and DataHub do materially less in their free, self-hosted tiers than in their managed offerings, and several matrix capabilities are paid-only. Each tool page spells out which.