v 02.06 · verified 2026·05·30 · 21 tools indexed · 1 vertical · 3 clusters

Data tooling, indexed.

A reference catalog of tools in the modern data stack — structured for comparison, sourced from vendor documentation, free of paid placement, verified by hand. Three views of the same data: by cluster, by capability, by tool.

Tools indexed: 21

Open source: 7

Published pricing: 6 · 15 sales-led

Last verification: 2026·05·30

Browse by cluster

Three clusters, one vertical.

§ 01 Live

Quality & testing

11 tools indexed

Pre-production and post-production checks on warehouse correctness, freshness, and schema. Where your data quality story actually starts.

§ 02 Live

Catalog & discovery

7 tools indexed

Asset inventory, business glossary, ownership tracking, and search for warehouse data.

§ 03 Live

Lineage & metadata

3 tools indexed

Cross-system data flow tracking, impact analysis, and column-level lineage from source through consumption.

Every tool, by name

The full ledger.

Each entry: Idx | Tool | Vendor, then the one-liner, then Pricing | Deploy | License | Cluster.

01 | Acceldata | Acceldata · est. 2018 · Campbell, CA
Enterprise data observability with ML data quality, reconciliation, and a built-in catalog; strong on hybrid and on-prem estates.
Pricing: Contact sales | Deploy: Hybrid | License: Proprietary | Cluster: Quality & testing

02 | Alation | Alation · est. 2012 · Redwood City, CA
The incumbent that defined the data catalog: behavioral search, deep governance, and strong column-level lineage.
Pricing: Contact sales | Deploy: SaaS · Self-host | License: Proprietary | Cluster: Catalog & discovery

03 | Anomalo | Anomalo · est. 2018
GUI-first ML anomaly detection at petabyte scale, pivoting in 2026 around agentic AI and unstructured-data monitoring.
Pricing: Contact sales | Deploy: SaaS · Self-host | License: Proprietary | Cluster: Quality & testing

04 | Atlan | Atlan · est. 2019 · Singapore
Enterprise catalog and governance plane positioned as the AI context layer: connectors, lineage, contracts, and an MCP server for agents.
Pricing: Contact sales | Deploy: Hybrid | License: Proprietary | Cluster: Catalog & discovery

05 | Bigeye | Bigeye · est. 2019
Enterprise data observability with Autometrics ML thresholds, repositioning in 2026 as an AI Trust Platform with runtime governance.
Pricing: Contact sales | Deploy: SaaS · Self-host | License: Proprietary | Cluster: Quality & testing

06 | Cloudera Data Lineage (Octopai) | Cloudera · est. 2016 · Santa Clara, CA
SaaS lineage with 60+ connectors and a 24-hour deploy story, built for hybrid enterprise estates without IBM-stack baggage.
Pricing: Contact sales | Deploy: SaaS | License: Proprietary | Cluster: Lineage & metadata

07 | Collibra | Collibra · est. 2008 · Brussels, Belgium
Enterprise data-and-AI governance incumbent: catalog, glossary, workflow stewardship, lineage, and a separate ML data-quality module.
Pricing: Contact sales | Deploy: SaaS | License: Proprietary | Cluster: Catalog & discovery

08 | Datafold | Datafold · est. 2020 · San Francisco, CA
Pre-merge data diffing and column-level lineage: the tool that shifts data quality left into the pull request.
Pricing: From $799 custom | Deploy: SaaS · Self-host | License: Proprietary | Cluster: Quality & testing

09 | DataHub | Acryl Data · est. 2021 · Palo Alto, CA
Apache-2.0 metadata platform with a serious managed counterpart; strongest event-driven architecture and column-level SQL lineage in OSS.
Pricing: OSS · free | Deploy: SaaS · Self-host | License: OSS | Cluster: Catalog & discovery

10 | dbt-expectations | Metaplane (Datadog) · est. 2020
Open-source dbt package adding 50+ Great Expectations-style assertions as native dbt tests that run in your own warehouse.
Pricing: OSS · paid tiers | Deploy: Self-host | License: OSS | Cluster: Quality & testing

11 | Elementary | Elementary Data · est. 2021 · Tel Aviv, Israel
The dbt-native observability layer: tests, anomaly detection, and lineage that live inside your dbt project.
Pricing: OSS · free | Deploy: SaaS · Self-host | License: OSS | Cluster: Quality & testing

12 | Great Expectations | Great Expectations · est. 2017
Python-native data validation framework; the OSS standard, now in stewardship transition after the May 2026 acquisition.
Pricing: OSS · free | Deploy: SaaS · Self-host | License: OSS | Cluster: Quality & testing

13 | IBM Manta Data Lineage | IBM · est. 2016 · Armonk, NY
The deepest scanner-driven lineage product on the market, built for legacy estates (SAP, Cognos, Informatica) that modern catalogs miss.
Pricing: Contact sales | Deploy: SaaS · Self-host | License: Proprietary | Cluster: Lineage & metadata

14 | Marquez | Marquez Project · est. 2018
The OpenLineage reference backend: vendor-neutral lineage events from Spark, Airflow, dbt, and Flink, stored and visualised.
Pricing: OSS · paid tiers | Deploy: Self-host | License: OSS | Cluster: Lineage & metadata

15 | Metaplane | Metaplane (Datadog) · est. 2019 · Boston, MA
ML-powered, no-code data observability for the dbt and warehouse stack with automatic column-level lineage; now Metaplane by Datadog.
Pricing: Published | Deploy: SaaS | License: Proprietary | Cluster: Quality & testing

16 | Monte Carlo | Monte Carlo Data · est. 2019 · San Francisco, CA
Warehouse-side data observability for teams whose problems are upstream of dbt: ingestion, streaming, and across the full pipeline.
Pricing: Contact sales | Deploy: SaaS | License: Proprietary | Cluster: Quality & testing

17 | OpenMetadata | Collate · est. 2021 · Saratoga, CA
Apache-2.0 unified metadata platform with a deliberately simple stack: discovery, lineage, quality, and contracts in one project.
Pricing: OSS · free | Deploy: SaaS · Self-host | License: OSS | Cluster: Catalog & discovery

18 | Secoda | Secoda (Atlassian) · est. 2021 · Toronto, Ontario, Canada
AI-native data catalog, lineage, and observability from Toronto, acquired by Atlassian in December 2025 to power Rovo AI.
Pricing: Contact sales | Deploy: SaaS · Self-host | License: Proprietary | Cluster: Catalog & discovery

19 | Sifflet | Sifflet · est. 2021 · Paris, France
EU-built full-stack data observability pairing ML-driven monitoring with an embedded catalog and field-level lineage.
Pricing: Contact sales | Deploy: SaaS · Self-host | License: Proprietary | Cluster: Quality & testing

20 | Soda | Soda Data · est. 2019 · Brussels, Belgium
YAML-first data contracts and observability: SodaCL plus Soda Cloud, with anomaly detection and a self-hosted Kubernetes runner.
Pricing: From $750 custom | Deploy: SaaS · Self-host | License: Proprietary | Cluster: Quality & testing

21 | Unity Catalog | Databricks · est. 2024 · San Francisco, CA
Open-source universal catalog for data and AI under Apache-2.0; Iceberg-REST and Hive-MS compatible, Databricks-led, LF AI hosted.
Pricing: OSS · paid tiers | Deploy: Self-host | License: OSS | Cluster: Catalog & discovery
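
As a quick cross-check, the headline counts can be recomputed from the ledger itself. The sketch below transcribes only the tool name, license, and cluster of each row above; the tuple layout and the variable names are illustrative choices, not part of the catalog's published schema.

```python
from collections import Counter

# (tool, license, cluster), transcribed from the ledger rows above.
LEDGER = [
    ("Acceldata", "Proprietary", "Quality & testing"),
    ("Alation", "Proprietary", "Catalog & discovery"),
    ("Anomalo", "Proprietary", "Quality & testing"),
    ("Atlan", "Proprietary", "Catalog & discovery"),
    ("Bigeye", "Proprietary", "Quality & testing"),
    ("Cloudera Data Lineage (Octopai)", "Proprietary", "Lineage & metadata"),
    ("Collibra", "Proprietary", "Catalog & discovery"),
    ("Datafold", "Proprietary", "Quality & testing"),
    ("DataHub", "OSS", "Catalog & discovery"),
    ("dbt-expectations", "OSS", "Quality & testing"),
    ("Elementary", "OSS", "Quality & testing"),
    ("Great Expectations", "OSS", "Quality & testing"),
    ("IBM Manta Data Lineage", "Proprietary", "Lineage & metadata"),
    ("Marquez", "OSS", "Lineage & metadata"),
    ("Metaplane", "Proprietary", "Quality & testing"),
    ("Monte Carlo", "Proprietary", "Quality & testing"),
    ("OpenMetadata", "OSS", "Catalog & discovery"),
    ("Secoda", "Proprietary", "Catalog & discovery"),
    ("Sifflet", "Proprietary", "Quality & testing"),
    ("Soda", "Proprietary", "Quality & testing"),
    ("Unity Catalog", "OSS", "Catalog & discovery"),
]

# Tally rows per cluster and per license.
by_cluster = Counter(cluster for _, _, cluster in LEDGER)
by_license = Counter(license_ for _, license_, _ in LEDGER)

print(len(LEDGER), dict(by_cluster), dict(by_license))
```

The tallies agree with the figures stated at the top of the page: 21 tools, 7 of them open source, split 11 / 7 / 3 across the three clusters.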

Browse by capability

Cut the index by feature.

dbt-native testing tools
Tools with ML anomaly detection
Tools with pre-merge diffing
Tools that enforce data contracts
Tools with circuit-breaker support

How tools are categorized

One schema, three cuts.

Each tool is placed in a primary cluster (the problem it was built for) and scored 0–3 against every cluster in the vertical. For example, a tool whose primary cluster is Quality & testing and whose score against Lineage & metadata is 2 appears on the quality hub as a primary entry and on the lineage hub as a strong secondary entry.
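
The placement rule can be sketched in a few lines of Python. This is a minimal illustration, not the catalog's actual implementation: the short cluster keys, the `hub_entries` function, and the `min_secondary` threshold are hypothetical. The text only states that a score of 2 yields a strong secondary entry; treating a score of 3 the same way and omitting scores of 0 or 1 are assumptions made here.

```python
CLUSTERS = ("quality", "catalog", "lineage")  # hypothetical short keys


def hub_entries(primary, scores, min_secondary=2):
    """Map each hub a tool appears on to the kind of entry it gets there.

    Assumption: any non-primary score >= min_secondary counts as a
    "strong secondary" entry; lower scores produce no entry.
    """
    if primary not in CLUSTERS:
        raise ValueError(f"unknown primary cluster: {primary!r}")
    if set(scores) != set(CLUSTERS):
        raise ValueError("a score is required for every cluster")
    if any(s not in (0, 1, 2, 3) for s in scores.values()):
        raise ValueError("scores must be integers from 0 to 3")

    entries = {primary: "primary"}
    for cluster in CLUSTERS:
        if cluster != primary and scores[cluster] >= min_secondary:
            entries[cluster] = "strong secondary"
    return entries


# The example from the text: primary cluster quality, lineage score 2.
# The quality and catalog scores are made up for illustration.
example = hub_entries("quality", {"quality": 3, "catalog": 0, "lineage": 2})
print(example)
```

The tool lands on the quality hub as a primary entry and on the lineage hub as a strong secondary entry, and does not appear on the catalog hub.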

Categories, scoring rubrics, sourcing, and verification cadence are documented in full on the methodology page.

What this catalog is.

Data Stack Index is a structured reference for data tooling. Every tool is described against the same schema — deployment, pricing, capabilities, integrations, alternatives — so that two tools can be compared on the same fields rather than on incompatible vendor language.
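
One way to picture a shared schema is a single record type whose fields line up for any two tools. The sketch below is illustrative only: the `ToolEntry` class and `compare` function are hypothetical, and only the field names come from the paragraph above. Deployment and pricing values are copied from the ledger, capabilities are paraphrased from the one-liners, and integrations and alternatives are left empty because this page does not list them.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ToolEntry:
    # Field names follow the schema described above; the types are assumptions.
    name: str
    deployment: tuple = ()
    pricing: str = ""
    capabilities: tuple = ()
    integrations: tuple = ()   # not shown on this page
    alternatives: tuple = ()   # not shown on this page


def compare(a, b):
    """Return {field: (value for a, value for b)} for every shared field."""
    return {
        f.name: (getattr(a, f.name), getattr(b, f.name))
        for f in fields(ToolEntry)
        if f.name != "name"
    }


datafold = ToolEntry(
    name="Datafold",
    deployment=("SaaS", "Self-host"),
    pricing="From $799 custom",
    capabilities=("pre-merge data diffing", "column-level lineage"),
)
soda = ToolEntry(
    name="Soda",
    deployment=("SaaS", "Self-host"),
    pricing="From $750 custom",
    capabilities=("data contracts", "anomaly detection"),
)

rows = compare(datafold, soda)
for field_name, (left, right) in rows.items():
    print(f"{field_name}: {left} | {right}")
```

Because both records share one type, the comparison is a walk over the same field names rather than a reconciliation of vendor-specific wording.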

The catalog is free, takes no money from vendors, and lists no tool because the vendor asked. Selection criteria, categorization, sourcing, and update cadence are documented on the methodology and independence pages. The project is built and maintained by one person; the coverage page is honest about what's indexed and what isn't.
