Data Stack Index / v 02.06
Verified 2026·05·30

Data tooling, indexed.

A reference catalog of tools in the modern data stack — structured for comparison, sourced from vendor documentation, free of paid placement, verified by hand. Three views of the same data: by cluster, by capability, by tool.

Tools indexed: 21
Open source: 7
Published pricing: 6 · Sales-led: 15
Last verification: 2026·05·30
01
Browse by cluster

Three clusters, one vertical.

02
Every tool, by name

The full ledger.

Idx | Tool | Vendor | One-liner | Pricing | Deploy | License | Cluster
01 | Acceldata | Acceldata · est. 2018 · Campbell, CA | Enterprise data observability with ML data quality, reconciliation, and a built-in catalog; strong on hybrid and on-prem estates. | Contact sales | Hybrid | Proprietary | Quality & testing
02 | Alation | Alation · est. 2012 · Redwood City, CA | The incumbent that defined the data catalog: behavioral search, deep governance, and strong column-level lineage. | Contact sales | SaaS · Self-host | Proprietary | Catalog & discovery
03 | Anomalo | Anomalo · est. 2018 | GUI-first ML anomaly detection at petabyte scale, pivoting in 2026 around agentic AI and unstructured-data monitoring. | Contact sales | SaaS · Self-host | Proprietary | Quality & testing
04 | Atlan | Atlan · est. 2019 · Singapore | Enterprise catalog and governance plane positioned as the AI context layer: connectors, lineage, contracts, and an MCP server for agents. | Contact sales | Hybrid | Proprietary | Catalog & discovery
05 | Bigeye | Bigeye · est. 2019 | Enterprise data observability with Autometrics ML thresholds, repositioning in 2026 as an AI Trust Platform with runtime governance. | Contact sales | SaaS · Self-host | Proprietary | Quality & testing
06 | Cloudera Data Lineage (Octopai) | Cloudera · est. 2016 · Santa Clara, CA | SaaS lineage with 60+ connectors and a 24-hour deploy story, built for hybrid enterprise estates without IBM-stack baggage. | Contact sales | SaaS | Proprietary | Lineage & metadata
07 | Collibra | Collibra · est. 2008 · Brussels, Belgium | Enterprise data-and-AI governance incumbent: catalog, glossary, workflow stewardship, lineage, and a separate ML data-quality module. | Contact sales | SaaS | Proprietary | Catalog & discovery
08 | Datafold | Datafold · est. 2020 · San Francisco, CA | Pre-merge data diffing and column-level lineage: the tool that shifts data quality left into the pull request. | From $799 custom | SaaS · Self-host | Proprietary | Quality & testing
09 | DataHub | Acryl Data · est. 2021 · Palo Alto, CA | Apache-2.0 metadata platform with a serious managed counterpart; strongest event-driven architecture and column-level SQL lineage in OSS. | OSS · free | SaaS · Self-host | OSS | Catalog & discovery
10 | dbt-expectations | Metaplane (Datadog) · est. 2020 | Open-source dbt package adding 50+ Great Expectations-style assertions as native dbt tests that run in your own warehouse. | OSS · paid tiers | Self-host | OSS | Quality & testing
11 | Elementary | Elementary Data · est. 2021 · Tel Aviv, Israel | The dbt-native observability layer: tests, anomaly detection, and lineage that live inside your dbt project. | OSS · free | SaaS · Self-host | OSS | Quality & testing
12 | Great Expectations | Great Expectations · est. 2017 | Python-native data validation framework: the OSS standard, now in stewardship transition after the May 2026 acquisition. | OSS · free | SaaS · Self-host | OSS | Quality & testing
13 | IBM Manta Data Lineage | IBM · est. 2016 · Armonk, NY | The deepest scanner-driven lineage product on the market, built for legacy estates (SAP, Cognos, Informatica) modern catalogs miss. | Contact sales | SaaS · Self-host | Proprietary | Lineage & metadata
14 | Marquez | Marquez Project · est. 2018 | The OpenLineage reference backend: vendor-neutral lineage events from Spark, Airflow, dbt, and Flink, stored and visualised. | OSS · paid tiers | Self-host | OSS | Lineage & metadata
15 | Metaplane | Metaplane (Datadog) · est. 2019 · Boston, MA | ML-powered, no-code data observability for the dbt and warehouse stack with automatic column-level lineage; now Metaplane by Datadog. | Published | SaaS | Proprietary | Quality & testing
16 | Monte Carlo | Monte Carlo Data · est. 2019 · San Francisco, CA | Warehouse-side data observability for teams whose problems are upstream of dbt: ingestion, streaming, and across the full pipeline. | Contact sales | SaaS | Proprietary | Quality & testing
17 | OpenMetadata | Collate · est. 2021 · Saratoga, CA | Apache-2.0 unified metadata platform with a deliberately simple stack: discovery, lineage, quality, and contracts in one project. | OSS · free | SaaS · Self-host | OSS | Catalog & discovery
18 | Secoda | Secoda (Atlassian) · est. 2021 · Toronto, Ontario, Canada | AI-native data catalog, lineage, and observability from Toronto, acquired by Atlassian in December 2025 to power Rovo AI. | Contact sales | SaaS · Self-host | Proprietary | Catalog & discovery
19 | Sifflet | Sifflet · est. 2021 · Paris, France | EU-built full-stack data observability pairing ML-driven monitoring with an embedded catalog and field-level lineage. | Contact sales | SaaS · Self-host | Proprietary | Quality & testing
20 | Soda | Soda Data · est. 2019 · Brussels, Belgium | YAML-first data contracts and observability: SodaCL plus Soda Cloud, with anomaly detection and a self-hosted Kubernetes runner. | From $750 custom | SaaS · Self-host | Proprietary | Quality & testing
21 | Unity Catalog | Databricks · est. 2024 · San Francisco, CA | Open-source universal catalog for data and AI under Apache-2.0: Iceberg-REST and Hive-MS compatible, Databricks-led, LF AI hosted. | OSS · paid tiers | Self-host | OSS | Catalog & discovery
03
Browse by capability

Cut the index by feature.

04
How tools are categorized

One schema, three cuts.

Each tool is placed in one primary cluster (the problem it was built for) and scored 0–3 against every cluster in the vertical. For example, a tool whose primary cluster is Quality & testing and whose score against Lineage & metadata is 2 appears on the quality hub as a primary entry and on the lineage hub as a strong secondary entry.

Categories, scoring rubrics, sourcing, and verification cadence are documented in full on the methodology page.
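
The placement rule described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the catalog's actual implementation: the `Tool` class, the `hub_entries` function, and the tool named `ExampleTool` with its scores are all hypothetical. The text only says that a score of 2 earns a strong secondary entry; treating 2 as the minimum for any secondary listing, and giving a score of 3 the same label, is an assumption.

```python
from dataclasses import dataclass, field

# Cluster names as they appear in the ledger's Cluster column.
CLUSTERS = ("Quality & testing", "Catalog & discovery", "Lineage & metadata")


@dataclass
class Tool:
    name: str
    primary: str                                           # cluster the tool was built for
    scores: dict[str, int] = field(default_factory=dict)   # cluster -> score in 0..3


def hub_entries(tools, cluster):
    """Return (tool name, role) pairs for one cluster hub."""
    entries = []
    for tool in tools:
        if tool.primary == cluster:
            entries.append((tool.name, "primary"))
        elif tool.scores.get(cluster, 0) >= 2:
            # Assumption: scores below 2 produce no hub listing at all.
            entries.append((tool.name, "strong secondary"))
    return entries


# Hypothetical tool; the scores are invented for illustration only.
example = Tool(
    name="ExampleTool",
    primary="Quality & testing",
    scores={"Quality & testing": 3, "Lineage & metadata": 2, "Catalog & discovery": 0},
)

for cluster in CLUSTERS:
    print(cluster, hub_entries([example], cluster))
```

With these inputs the tool is listed as a primary entry on the quality hub, as a strong secondary entry on the lineage hub, and not at all on the catalog hub.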

What this catalog is.

Data Stack Index is a structured reference for data tooling. Every tool is described against the same schema — deployment, pricing, capabilities, integrations, alternatives — so that two tools can be compared on the same fields rather than on incompatible vendor language.
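
As a rough sketch of what comparing two tools on the same fields could look like, the Python below models a record with the four attribute columns shown in the ledger (pricing, deploy, license, cluster) and reports which fields differ. The `ToolRecord` class and the `differing_fields` function are hypothetical names, not part of the catalog. The capabilities, integrations, and alternatives fields are left out because their values do not appear on this page. The field values are copied from the ledger rows for Monte Carlo and Elementary.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ToolRecord:
    name: str
    pricing: str
    deploy: tuple[str, ...]   # one entry per deployment option
    license: str
    cluster: str


def differing_fields(a: ToolRecord, b: ToolRecord) -> list[str]:
    """Names of the schema fields (other than name) whose values differ."""
    return [
        f.name
        for f in fields(ToolRecord)
        if f.name != "name" and getattr(a, f.name) != getattr(b, f.name)
    ]


monte_carlo = ToolRecord(
    "Monte Carlo", "Contact sales", ("SaaS",), "Proprietary", "Quality & testing"
)
elementary = ToolRecord(
    "Elementary", "OSS · free", ("SaaS", "Self-host"), "OSS", "Quality & testing"
)

print(differing_fields(monte_carlo, elementary))
# prints ['pricing', 'deploy', 'license']
```

Because both records share one field list, the comparison needs no per-vendor special cases: the two tools sit in the same cluster and differ on pricing, deployment, and license.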

The catalog is free, takes no money from vendors, and lists no tool because the vendor asked. Selection criteria, categorization, sourcing, and update cadence are documented on the methodology and independence pages. The project is built and maintained by one person; the coverage page is honest about what's indexed and what isn't.