Data Stack Index / v 02.06
Verified 2026·05·30
§ For platform engineers

Best data tools for platform engineers.

The tools that deploy and integrate like infrastructure. 11 indexed, 5 open source.

01
Why these

What fits.

If a tool cannot be managed as code, it is not listed here. These tools deploy self-hosted or BYOC (bring your own cloud), are configured through APIs and Terraform, speak OpenLineage, and run unattended at scale.
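In practice, "speak OpenLineage" means emitting or consuming small JSON run events. The sketch below is a minimal illustration, not a production client. It builds an OpenLineage run event as a plain Python dict and can POST it to a lineage backend. It assumes a Marquez instance at its default local address (`http://localhost:5000/api/v1/lineage`). The namespaces, job and dataset names, producer URI, and pinned spec version are hypothetical placeholders, so check them against the current OpenLineage spec. Real pipelines would more likely use an official OpenLineage client or integration.

```python
import json
import urllib.request
import uuid
from datetime import datetime, timezone

# Assumed spec version; confirm against the current OpenLineage spec before use.
SCHEMA_URL = "https://openlineage.io/spec/2-0-2/OpenLineage.json#/definitions/RunEvent"
# Hypothetical producer URI identifying the code that emits the events.
PRODUCER = "https://example.com/hypothetical-pipeline"


def build_run_event(event_type, run_id, job_namespace, job_name, inputs=(), outputs=()):
    """Build a minimal OpenLineage run event as a plain dict.

    inputs/outputs are (namespace, name) pairs identifying datasets.
    """
    return {
        "eventType": event_type,  # e.g. START, COMPLETE, FAIL
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id},
        "job": {"namespace": job_namespace, "name": job_name},
        "inputs": [{"namespace": ns, "name": n} for ns, n in inputs],
        "outputs": [{"namespace": ns, "name": n} for ns, n in outputs],
        "producer": PRODUCER,
        "schemaURL": SCHEMA_URL,
    }


def post_event(event, url="http://localhost:5000/api/v1/lineage"):
    """POST one event as JSON; the default URL assumes a local Marquez."""
    req = urllib.request.Request(
        url,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


if __name__ == "__main__":
    run_id = str(uuid.uuid4())
    src = ("postgres://db:5432", "public.orders")  # hypothetical input table
    dst = ("postgres://db:5432", "public.daily_orders")  # hypothetical output
    start = build_run_event("START", run_id, "hypothetical-ns", "daily_orders", inputs=[src])
    done = build_run_event("COMPLETE", run_id, "hypothetical-ns", "daily_orders",
                           inputs=[src], outputs=[dst])
    print(json.dumps(start, indent=2))
    print(json.dumps(done, indent=2))
    # post_event(start); post_event(done)  # only with a backend running
```

A START event and a COMPLETE event that share a `runId` describe a single run. The input and output dataset lists are what allow a backend to draw lineage edges between tables.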

02
11 tools

The shortlist.

DataHub
OSS · SaaS / Self-host

Apache-2.0 metadata platform with a substantial managed counterpart: the strongest event-driven architecture and column-level SQL lineage in OSS.

Catalog & discovery · OSS · free
Great Expectations
OSS · SaaS / Self-host · Acquired

Python-native data validation framework and the OSS standard, now in a stewardship transition after its May 2026 acquisition.

Quality & testing · OSS · free
Marquez
OSS · Self-host

The OpenLineage reference backend: it collects vendor-neutral lineage events from Spark, Airflow, dbt, and Flink, then stores and visualises them.

Lineage & metadata · OSS · free
OpenMetadata
OSS · SaaS / Self-host

Apache-2.0 unified metadata platform with a deliberately simple stack: discovery, lineage, quality, and contracts in one project.

Catalog & discovery · OSS · free
Unity Catalog
OSS · Self-host

Open-source universal catalog for data and AI under Apache-2.0: compatible with the Iceberg REST catalog API and Hive Metastore, led by Databricks, and hosted by the LF AI & Data Foundation.

Catalog & discovery · OSS · free
Acceldata
Hybrid

Enterprise data observability with ML-driven data quality, reconciliation, and a built-in catalog; strong on hybrid and on-prem estates.

Quality & testing · Contact sales
Datafold
SaaS / Self-host

Pre-merge data diffing and column-level lineage: the tool that shifts data quality left into the pull request.

Quality & testing · From $799 / custom
Metaplane
SaaS · Acquired

ML-powered, no-code data observability for the dbt and warehouse stack, with automatic column-level lineage; now Metaplane by Datadog.

Quality & testing · Published pricing
Monte Carlo
SaaS

Warehouse-side data observability for teams whose problems start upstream of dbt: in ingestion, in streaming, and elsewhere across the full pipeline.

Quality & testing · Contact sales
Sifflet
SaaS / Self-host

EU-built full-stack data observability pairing ML-driven monitoring with an embedded catalog and field-level lineage.

Quality & testing · Contact sales
Soda
SaaS / Self-host

YAML-first data contracts and observability: SodaCL plus Soda Cloud, with anomaly detection and a self-hosted Kubernetes runner.

Quality & testing · From $750 / custom

How this list sorts.

Open-source options sort first, then alphabetically by name. There is no editorial ranking and no paid placement. Every detail on a card maps to a structured field in that tool's profile; see the methodology, or compare any two tools on the comparisons page.
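The ordering rule amounts to a two-part sort key. Here is a minimal sketch in Python, assuming a hypothetical `Tool` record that holds only a name and an open-source flag. Real tool profiles would carry many more fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    # Hypothetical record holding only the two fields the sort needs.
    name: str
    open_source: bool


def index_order(tools):
    """Open-source tools first, then alphabetical by name within each group."""
    # False sorts before True, so `not t.open_source` puts OSS tools first;
    # casefold() makes the alphabetical part case-insensitive.
    return sorted(tools, key=lambda t: (not t.open_source, t.name.casefold()))


if __name__ == "__main__":
    shortlist = [
        Tool("Soda", False),
        Tool("Marquez", True),
        Tool("Acceldata", False),
        Tool("DataHub", True),
    ]
    for tool in index_order(shortlist):
        print(tool.name)  # DataHub, Marquez, Acceldata, Soda (one per line)
```

Applying this key to all 11 tools reproduces the order of the shortlist above. Because the key is a tuple, the open-source flag always takes precedence and the name only breaks ties within each group.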