§ Capability hubs · Data observability

Browse by capability.

Every hub answers one question ("which tools actually do this?") from a single structured field, not editorial judgment. Nine capabilities across three clusters.

Cluster

Quality & testing

dbt-native testing critical

Runs inside the dbt execution context (as a package, post-hook, or artifact consumer) rather than monitoring the warehouse from the outside. Tests are defined in the same codebase as models, run on the same schedule, and fail the same CI pipeline. The alternative is warehouse-side monitoring (Monte Carlo-style), which catches issues that dbt tests miss but detects them after the fact rather than preventing them.

4 tools list this →
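As an illustration of the "artifact consumer" pattern, here is a minimal sketch that reads the parsed contents of dbt's `run_results.json` and turns failures into a CI exit code. The function names are hypothetical, and the set of failing statuses is an assumption; check the artifact schema for the dbt version in use.

```python
# Statuses treated as failures in this sketch (an assumption, not an
# exhaustive list of what dbt can emit).
FAILING_STATUSES = {"fail", "error"}


def failed_nodes(run_results: dict) -> list:
    """Return the unique_id of every result whose status counts as a failure."""
    return [
        result["unique_id"]
        for result in run_results.get("results", [])
        if result.get("status") in FAILING_STATUSES
    ]


def ci_exit_code(run_results: dict) -> int:
    """Non-zero when any node failed, so the CI job fails with the dbt run."""
    return 1 if failed_nodes(run_results) else 0
```

In practice the dictionary would come from something like `json.loads(Path("target/run_results.json").read_text())` after a `dbt build` or `dbt test` run.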

ML anomaly detection important

Uses machine learning models trained on historical data to detect values, volumes, or distributions outside expected bounds — without requiring the user to write explicit assertions. Reduces the "I didn't know to test for that" class of incident. Trade-off: requires a training window (typically two to four weeks), can produce false positives on seasonal data, and doesn't replace assertions for business-rule validation.

9 tools list this →
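To make the training-window trade-off concrete, the sketch below flags a daily row count that falls far outside its historical baseline. It uses a plain z-score as a stand-in for the learned models real tools use; the 14-day minimum and the threshold of 3 are illustrative assumptions.

```python
from statistics import mean, stdev

MIN_HISTORY_DAYS = 14  # assumed lower bound of the "two to four weeks" window


def is_anomalous(history: list, observed: float, threshold: float = 3.0) -> bool:
    """Flag `observed` when it lies more than `threshold` standard
    deviations from the mean of `history`."""
    if len(history) < MIN_HISTORY_DAYS:
        raise ValueError("not enough history to form a baseline")
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat history: any deviation at all is unexpected.
        return observed != mu
    return abs(observed - mu) / sigma > threshold
```

A baseline this simple ignores seasonality, which is exactly where the false positives mentioned above come from.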

Pre-merge data diffing important

Compares the output of a model change against production before the pull request is merged, showing row-level and aggregate differences. Shifts data quality left into the development workflow. Datafold is the category-defining tool here; dbt's own cloud offering has added similar capabilities. Diffing against production means running production-scale compute for a development branch, which has cost implications.

1 tool lists this →
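A rough sketch of what a row-level and aggregate diff reports, assuming both result sets fit in memory and share a primary key. The function name and output shape are hypothetical; real tools push this comparison down into the warehouse.

```python
def diff_rows(prod: list, dev: list, key: str) -> dict:
    """Compare two lists of row dicts by primary key."""
    prod_by_key = {row[key]: row for row in prod}
    dev_by_key = {row[key]: row for row in dev}
    shared = prod_by_key.keys() & dev_by_key.keys()
    return {
        # Aggregate view: row counts on each side.
        "row_count_prod": len(prod),
        "row_count_dev": len(dev),
        # Row-level view: keys that appear, disappear, or change values.
        "added": sorted(dev_by_key.keys() - prod_by_key.keys()),
        "removed": sorted(prod_by_key.keys() - dev_by_key.keys()),
        "changed": sorted(k for k in shared if prod_by_key[k] != dev_by_key[k]),
    }
```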

Data contracts important

Explicit, versioned agreements between data producers and consumers specifying schema, semantics, SLAs, and breaking-change policy. Enforced in CI for producers and at consumption time for consumers. Distinct from schema validation alone: a contract captures intent, not just structure. Implementations vary widely; many tools claiming "data contracts" offer only schema checks.

5 tools list this →
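The sketch below shows one way a producer-side CI check could enforce a breaking-change policy. The `Contract` shape and the rule "breaking changes require a major version bump" are assumptions for illustration, not a standard.

```python
from dataclasses import dataclass


@dataclass
class Contract:
    name: str
    version: tuple            # (major, minor, patch)
    columns: dict             # column name -> declared type
    freshness_sla_hours: int  # maximum allowed staleness


def breaking_changes(old: Contract, new: Contract) -> list:
    """List changes that could break an existing consumer."""
    problems = []
    for col, col_type in old.columns.items():
        if col not in new.columns:
            problems.append(f"column removed: {col}")
        elif new.columns[col] != col_type:
            problems.append(f"type changed: {col} {col_type} -> {new.columns[col]}")
    if new.freshness_sla_hours > old.freshness_sla_hours:
        problems.append("freshness SLA loosened")
    return problems


def violates_policy(old: Contract, new: Contract) -> bool:
    """Assumed policy: breaking changes need a major version bump."""
    return bool(breaking_changes(old, new)) and new.version[0] <= old.version[0]
```

Adding a column is treated as non-breaking here; a stricter policy could flag that too.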

Circuit breaker important

Halts downstream execution when a test fails, preventing bad data from propagating into marts, ML features, or BI dashboards. Requires tight integration with the orchestrator (Airflow, Dagster, dbt Cloud). Distinct from alerting-only tools, which notify after the damage is done.

2 tools list this →
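The halting behaviour can be sketched without any orchestrator: each step builds, runs its check, and a failed check stops everything downstream. The step tuple layout and function name are hypothetical; in Airflow or Dagster the same effect comes from task dependencies and failure states.

```python
def run_with_circuit_breaker(steps: list) -> dict:
    """Run (name, build, check) steps in order.

    Stops at the first step whose check returns False, so later steps
    never see the bad data.
    """
    completed = []
    for index, (name, build, check) in enumerate(steps):
        build()
        if not check():
            skipped = [later_name for later_name, _, _ in steps[index + 1:]]
            return {"completed": completed, "tripped_at": name, "skipped": skipped}
        completed.append(name)
    return {"completed": completed, "tripped_at": None, "skipped": []}
```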

Cluster

Catalog & discovery

Business glossary important

A managed vocabulary of business terms ("Active Customer", "Recognized Revenue") with definitions, owners, and — critically — links to the physical assets that implement them. Without the linking layer a glossary is just a wiki. With it, you can answer "which dashboards use our official definition of Active Customer?" — the question governance teams actually care about.

9 tools list this →
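A minimal sketch of the linking layer, assuming a glossary held as plain in-memory objects. The class and field names are hypothetical; the point is that the governance question becomes a lookup once terms carry links to assets.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    kind: str  # e.g. "dashboard", "table", "metric"
    name: str


@dataclass
class GlossaryTerm:
    name: str
    definition: str
    owner: str
    linked_assets: list = field(default_factory=list)


def assets_using(terms: list, term_name: str, kind: str) -> list:
    """Names of linked assets of the given kind for one glossary term."""
    for term in terms:
        if term.name == term_name:
            return sorted(a.name for a in term.linked_assets if a.kind == kind)
    raise KeyError(term_name)
```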

PII auto-classification important

Automatically identifies columns likely to contain personally identifiable information (email addresses, phone numbers, national IDs) through regex, name heuristics, or ML. Required for meaningful compliance workflows at scale. Quality varies: naive implementations produce high false-positive rates. Worth asking vendors about their accuracy benchmarks.

9 tools list this →
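For a sense of what the regex and name-heuristic approaches look like, here is a deliberately naive sketch. The hint lists, patterns, and 0.8 match ratio are illustrative assumptions, and the phone pattern will happily match any 7-digit number, which is the false-positive problem in miniature.

```python
import re
from typing import Optional

# Substrings in a column name that suggest a PII type (assumed lists).
NAME_HINTS = {
    "email": ("email", "e_mail"),
    "phone": ("phone", "mobile"),
}

# Patterns applied to sampled values when the name gives no hint.
VALUE_PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "phone": re.compile(r"^\+?[\d\s().-]{7,20}$"),
}


def classify_column(name: str, samples: list, min_match_ratio: float = 0.8) -> Optional[str]:
    """Return a PII label for the column, or None if nothing matches."""
    lowered = name.lower()
    for label, hints in NAME_HINTS.items():
        if any(hint in lowered for hint in hints):
            return label
    non_null = [value for value in samples if value]
    if not non_null:
        return None
    for label, pattern in VALUE_PATTERNS.items():
        matches = sum(1 for value in non_null if pattern.match(value))
        if matches / len(non_null) >= min_match_ratio:
            return label
    return None
```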

Cluster

Lineage & metadata

Column-level lineage critical

Traces data flow at individual-column granularity rather than just between tables. Critical for impact analysis when a column changes, for PII tracking, and for regulatory compliance in financial or healthcare contexts. Column-level lineage is computationally expensive, and not every tool that claims "lineage" actually provides it; many stop at table level.

16 tools list this →
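Impact analysis over column-level lineage is a graph traversal. The sketch below assumes the lineage has already been extracted into an adjacency map from each column to its direct downstream columns; the column names in the usage example are made up.

```python
from collections import deque


def downstream_columns(edges: dict, start: str) -> list:
    """Breadth-first walk returning every column derived from `start`."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in edges.get(current, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


# Hypothetical lineage: an email column fanning out into two marts.
edges = {
    "raw.users.email": ["stg.users.email"],
    "stg.users.email": ["marts.customers.email_hash", "marts.marketing.contact_email"],
    "raw.users.id": ["stg.users.user_id"],
}
impacted = downstream_columns(edges, "raw.users.email")
```

A table-level graph would report that `raw.users` feeds both marts but could not say whether a change to `id` touches the marketing columns.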

OpenLineage native important

Emits and consumes OpenLineage events as a first-class citizen rather than via a plugin or adapter. Signals commitment to interoperability with other metadata tooling: Marquez, OpenMetadata, Astronomer, and others can consume the same event stream. Increasingly what separates observability platforms with an open metadata model from those with a proprietary one.

3 tools list this →
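For orientation, this is roughly what an emitted event looks like when built by hand as a plain dictionary. The top-level field names follow the OpenLineage run event structure; the producer URL is a placeholder, and the `schemaURL` version shown is an assumption that should be checked against the current spec.

```python
import json
import uuid
from datetime import datetime, timezone


def build_run_event(job_namespace: str, job_name: str, inputs: list,
                    outputs: list, event_type: str = "COMPLETE") -> dict:
    """Build a run event; `inputs` and `outputs` are (namespace, name) pairs."""
    return {
        "eventType": event_type,
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": str(uuid.uuid4())},
        "job": {"namespace": job_namespace, "name": job_name},
        "inputs": [{"namespace": ns, "name": name} for ns, name in inputs],
        "outputs": [{"namespace": ns, "name": name} for ns, name in outputs],
        # Placeholder producer and assumed spec version (see lead-in).
        "producer": "https://example.com/hypothetical-producer",
        "schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent",
    }


event = build_run_event(
    "analytics", "daily_orders",
    inputs=[("warehouse", "raw.orders")],
    outputs=[("warehouse", "marts.orders")],
)
payload = json.dumps(event)  # what would be sent to a consumer's HTTP endpoint
```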

How capability hubs work.

For definitions and commonly-confused terms, see concepts & vocabulary.