Data Stack Index / v 02.06
Verified 2026·05·30
§ Capability hubs · Data observability

Browse by capability.

Every hub answers one question — "which tools actually do this?" — from a single structured field, not editorial judgment. 9 capabilities, across all three clusters.

Cluster 01 · Quality & testing

dbt-native testing · critical

Runs as part of the dbt execution context (as a package, post-hook, or artifact consumer) rather than monitoring the warehouse from the outside. Tests are defined in the same codebase as models, run on the same schedule, and fail the same CI pipeline. The alternative is warehouse-side monitoring (Monte Carlo-style), which catches issues dbt misses but detects them after the data has landed rather than preventing them.
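As a rough illustration of the "artifact consumer" pattern, the sketch below reads a trimmed-down, hypothetical excerpt of dbt's `run_results.json` and turns failed tests into a non-zero exit code for CI. The sample data and the helper names `failed_tests` and `ci_exit_code` are assumptions made for this example; real artifacts carry many more fields.

```python
import json

# Hypothetical, trimmed-down excerpt of a dbt run_results.json artifact.
# Only "unique_id" and "status" are used in this sketch.
SAMPLE_RUN_RESULTS = json.loads("""
{
  "results": [
    {"unique_id": "model.shop.orders", "status": "success"},
    {"unique_id": "test.shop.not_null_orders_id", "status": "pass"},
    {"unique_id": "test.shop.unique_orders_id", "status": "fail"}
  ]
}
""")


def failed_tests(run_results: dict) -> list:
    """Return the unique_ids of test nodes whose status is 'fail' or 'error'."""
    return [
        result["unique_id"]
        for result in run_results.get("results", [])
        if result["unique_id"].startswith("test.")
        and result["status"] in {"fail", "error"}
    ]


def ci_exit_code(run_results: dict) -> int:
    """Return 1 when any test failed, so the surrounding CI job fails too."""
    return 1 if failed_tests(run_results) else 0
```

In a CI step, a script along these lines would load the artifact from the dbt target directory and pass the result of `ci_exit_code` to `sys.exit`.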

4 tools list this →
ML anomaly detection · important

Uses machine learning models trained on historical data to detect values, volumes, or distributions outside expected bounds — without requiring the user to write explicit assertions. Reduces the "I didn't know to test for that" class of incident. Trade-off: requires a training window (typically two to four weeks), can produce false positives on seasonal data, and doesn't replace assertions for business-rule validation.
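To make the idea of "expected bounds learned from history" concrete, here is a minimal sketch. It substitutes a plain z-score over a training window for a trained ML model, so it is a statistical stand-in rather than what any listed tool does. The function name `is_anomalous`, the threshold of 3.0, and the sample row counts are all assumptions.

```python
from statistics import mean, stdev

# Hypothetical daily row counts for a 14-day training window.
ROW_COUNT_HISTORY = [1000, 1010, 990, 1005, 995, 1002, 998,
                     1001, 1003, 997, 1008, 992, 1000, 999]


def is_anomalous(history, value, threshold=3.0):
    """Flag a value more than `threshold` standard deviations from the mean.

    No explicit assertion is written by the user: the bounds come entirely
    from the historical window.
    """
    if len(history) < 2:
        raise ValueError("need at least two historical points")
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat history: any deviation at all is unexpected.
        return value != mu
    return abs(value - mu) / sigma > threshold
```

A fixed threshold like this is also where the seasonal false positives mentioned above come from: a legitimate weekend dip looks the same as an outage unless the model accounts for the cycle.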

9 tools list this →
Pre-merge data diffing · important

Compares the output of a model change against production before the pull request is merged, showing row-level and aggregate differences. Shifts data quality left into the development workflow. Datafold is the category-defining tool here; dbt Cloud has added similar capabilities. Requires production-scale compute on a development branch, which adds warehouse cost.
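The row-level and aggregate comparison can be sketched in a few lines. This is an in-memory illustration under simplifying assumptions (rows as dictionaries, a single primary-key column, both outputs small enough to hold in memory), not how Datafold or dbt Cloud implement it. The name `diff_tables` and the sample rows are hypothetical.

```python
# Hypothetical outputs of the same model: production vs. the development branch.
PROD_ROWS = [
    {"id": 1, "amount": 10},
    {"id": 2, "amount": 20},
    {"id": 3, "amount": 30},
]
DEV_ROWS = [
    {"id": 1, "amount": 10},
    {"id": 2, "amount": 25},
    {"id": 4, "amount": 40},
]


def diff_tables(prod, dev, key):
    """Compare two row sets on a primary key and summarise the differences."""
    prod_by_key = {row[key]: row for row in prod}
    dev_by_key = {row[key]: row for row in dev}
    shared = prod_by_key.keys() & dev_by_key.keys()
    return {
        # Row-level differences, reported as primary-key values.
        "added": sorted(dev_by_key.keys() - prod_by_key.keys()),
        "removed": sorted(prod_by_key.keys() - dev_by_key.keys()),
        "changed": sorted(k for k in shared if prod_by_key[k] != dev_by_key[k]),
        # Aggregate difference.
        "row_count_delta": len(dev) - len(prod),
    }
```

Note that the row count is unchanged in the sample data even though three rows differ, which is why aggregate checks alone are not enough.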

1 tool lists this →
Data contracts · important

Explicit, versioned agreements between data producers and consumers specifying schema, semantics, SLAs, and breaking-change policy. Enforced in CI for producers and at consumption time for consumers. Distinct from schema validation alone — a contract captures intent, not just structure. Implementations vary wildly; many tools claiming "data contracts" offer only schema checks.
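One way to picture the producer-side CI check is the sketch below. It covers only schema and a breaking-change policy (breaking changes require a major version bump); semantics and SLAs are left out. The `Contract` class, the "MAJOR.MINOR" version format, and the policy itself are assumptions for illustration, not a standard.

```python
from dataclasses import dataclass


@dataclass
class Contract:
    version: str   # assumed format: "MAJOR.MINOR"
    columns: dict  # column name -> type name


def breaking_changes(old, new):
    """List changes that would break existing consumers of `old`."""
    problems = []
    for name, col_type in old.columns.items():
        if name not in new.columns:
            problems.append(f"removed column: {name}")
        elif new.columns[name] != col_type:
            problems.append(
                f"type change on {name}: {col_type} -> {new.columns[name]}"
            )
    return problems


def major(version):
    return int(version.split(".")[0])


def check_producer_change(old, new):
    """Return violations: breaking changes made without a major version bump."""
    problems = breaking_changes(old, new)
    if problems and major(new.version) <= major(old.version):
        return problems
    return []
```

Adding a column is treated as non-breaking here; a stricter policy could reject that as well.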

5 tools list this →
Circuit breaker · important

Halts downstream execution when a test fails — preventing bad data from propagating into marts, ML features, or BI dashboards. Requires tight integration with the orchestrator (Airflow, Dagster, dbt Cloud). Distinct from alerting-only tools which notify after damage is done.
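The halting behaviour can be sketched without any orchestrator. The example below runs steps in order and stops as soon as a check attached to a step fails, so the downstream step never executes. The runner `run_with_circuit_breaker`, the in-memory `warehouse` dictionary, and the step names are hypothetical; a real setup would express this through Airflow, Dagster, or dbt Cloud task dependencies.

```python
def run_with_circuit_breaker(steps, checks):
    """Run steps in order; stop when a check attached to a step fails.

    steps:  list of (name, callable) pairs, in execution order
    checks: dict mapping a step name to a callable returning True on pass
    """
    executed = []
    for name, step in steps:
        step()
        executed.append(name)
        check = checks.get(name)
        if check is not None and not check():
            # Breaker trips: downstream steps are never started.
            return {"executed": executed, "halted_after": name}
    return {"executed": executed, "halted_after": None}


# Hypothetical in-memory "warehouse" and two pipeline steps.
warehouse = {}


def load_staging():
    warehouse["staging"] = [{"id": 1}, {"id": None}]  # one bad row


def build_mart():
    warehouse["mart"] = list(warehouse["staging"])


def staging_ids_not_null():
    return all(row["id"] is not None for row in warehouse["staging"])


RESULT = run_with_circuit_breaker(
    steps=[("staging", load_staging), ("mart", build_mart)],
    checks={"staging": staging_ids_not_null},
)
```

An alerting-only setup would run `build_mart` regardless and report the failed check afterwards, by which point the bad row is already in the mart.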

2 tools list this →
Cluster 02 · Catalog & discovery

Cluster 03 · Lineage & metadata

How capability hubs work.

For definitions and commonly confused terms, see concepts & vocabulary.