Sifflet.
Founded 2021 · Paris, France
Status · ● active
EU-built full-stack data observability pairing ML-driven monitoring with an embedded catalog and field-level lineage.
Where it fits — and where it doesn't.
Best fit: mid-market and enterprise data teams, especially in Europe, that want one platform spanning quality monitoring, an embedded catalog, and column-level lineage rather than stitching point tools together, with a strong compliance posture (ISO 27001, SOC 2 Type 2, GDPR, single-tenant isolation, and a self-host option). The combination of assertion rules, ML/dynamic anomaly detection, automated root cause, and a Flow Stopper circuit breaker makes it a credible single-vendor observability suite.
Look elsewhere if you need open-source or self-serve-priced tooling, transparent published pricing, native OpenLineage, formal data contracts, code-first value-level data-diff regression testing (use Datafold), or a heavyweight standalone governance catalog with automatic PII classification (Atlan or Collibra). Sifflet deliberately avoids storing PII, which limits it as a governance catalog.
The honest scorecard.
Strengths
- Spans all three observability clusters in one product: monitoring, an embedded catalog, and field-level lineage
- Both assertion-based rules and ML/dynamic anomaly detection (dynamic freshness/volume, distribution change, proprietary time-series thresholds) to cut alert fatigue
- Automatic field-level (column-level) lineage via SQL query-log parsing across Snowflake, BigQuery, Redshift, and Databricks, plus BI tools
- Flow Stopper circuit breaker and Monitors-as-Code (CLI, YAML, Terraform provider, public API) fit engineering workflows
- Flexible deployment including fully self-hosted on Kubernetes, with ISO 27001, SOC 2 Type 2, GDPR, and single-tenant isolation
Limitations
- No published pricing: every tier routes to sales
- Proprietary and closed-source, with no community or free self-host tier
- No native OpenLineage support and no formal data-contracts feature
- No automatic PII classification (by design, it avoids storing PII), which limits it as a governance catalog
- Field-level lineage automation is limited to four cloud warehouses, and it is not a value-level data-diff / regression-testing tool
What Sifflet actually is.
What Sifflet is
Sifflet is a full-stack data observability platform from a Paris-based company. It covers three things in one product: data-quality monitoring (a large library of assertion-style and ML/dynamic anomaly monitors), an embedded data catalog, and end-to-end field-level lineage from ingestion through the warehouse and dbt to BI. Around that it layers automated root-cause analysis, incident management, a Flow Stopper circuit breaker, and a set of AI agents. It runs read-only against the source as managed SaaS, a hybrid agent model, or fully self-hosted.
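To make the difference between an assertion-style monitor and a dynamic one concrete, here is a minimal sketch. It is not Sifflet's algorithm, which is proprietary; a plain z-score over recent history stands in for the learned threshold, and the function names and the sample row counts are hypothetical.

```python
from statistics import mean, stdev

def assertion_monitor(row_count: int, minimum: int) -> bool:
    """Static rule: pass only if the load has at least `minimum` rows."""
    return row_count >= minimum

def dynamic_monitor(history: list[float], latest: float, z_max: float = 3.0) -> bool:
    """Learned rule (stand-in): pass if `latest` lies within `z_max`
    standard deviations of the historical mean."""
    if len(history) < 2:
        return True  # not enough history to judge yet
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest == mu
    return abs(latest - mu) / sigma <= z_max

# Hypothetical daily row counts for one table.
daily_rows = [1000, 1020, 980, 1010, 990, 1005, 995]

print(assertion_monitor(600, minimum=500))  # True: the fixed floor is still met
print(dynamic_monitor(daily_rows, 600))     # False: far outside the usual range
print(dynamic_monitor(daily_rows, 1015))    # True: ordinary day-to-day variation
```

A drop to 600 rows passes the fixed floor but fails the history-based check, which is the kind of case a dynamic monitor is meant to catch without hand-tuned thresholds.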
Where it fits
Sifflet competes most directly with Monte Carlo and Bigeye as a full-stack, ML-driven observability suite, but leans harder into catalog and field-level lineage, giving it overlap with Atlan and DataHub on discovery. Against Soda or Great Expectations it is a managed, broader platform rather than a code-first testing framework; against Datafold it does impact analysis in CI but not value-level data diffing. Its EU origin, GDPR posture, and self-host option are the clearest differentiators for European and regulated buyers.
On the three-cluster span
Spanning monitoring, catalog, and lineage in one product is genuinely unusual, and the field-level lineage (parsed from warehouse query logs across the four major cloud warehouses) is a real strength. The caveats are at the edges: the catalog is a competent embedded one, not a standalone governance platform — no automatic PII classification, by design — and lineage automation is limited to those four warehouses. Score it as a strong observability suite with a useful catalog attached, not as a catalog-first tool.
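For readers unfamiliar with what field-level lineage enables, the sketch below walks a small column-level graph to list everything downstream of one field. The edges and column names are hypothetical and hand-written; a real platform derives them from query logs, and this says nothing about how Sifflet stores or queries its graph.

```python
from collections import deque

# Hypothetical field-level edges: upstream column -> columns derived from it.
LINEAGE = {
    "raw.orders.amount": ["staging.orders.amount_eur"],
    "staging.orders.amount_eur": ["marts.revenue.total_eur"],
    "marts.revenue.total_eur": ["bi.revenue_dashboard.total"],
    "raw.orders.customer_id": ["staging.orders.customer_id"],
}

def downstream(field: str, edges: dict[str, list[str]]) -> list[str]:
    """Breadth-first walk returning every field derived from `field`,
    in the order it is reached."""
    seen: list[str] = []
    queue = deque([field])
    while queue:
        current = queue.popleft()
        for child in edges.get(current, []):
            if child not in seen:
                seen.append(child)
                queue.append(child)
    return seen

print(downstream("raw.orders.amount", LINEAGE))
# ['staging.orders.amount_eur', 'marts.revenue.total_eur', 'bi.revenue_dashboard.total']
```

The same traversal run in the upstream direction is what a root-cause view needs: start at the broken dashboard field and walk back toward ingestion.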
How to evaluate it
Connect it read-only to your warehouse and let the dynamic monitors learn before judging signal quality — the proprietary forecasting is meant to reduce alert fatigue on seasonal data, so give it a couple of weeks. Then test the two differentiators directly: trace a field end-to-end through the lineage graph (ingestion → warehouse → dbt → BI), and wire Flow Stopper into an Airflow DAG to confirm it actually halts a pipeline on a failing rule. If compliance is the driver, scope the self-host option into your security review early.
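The circuit-breaker test described above comes down to one pattern: a gate step that raises when a blocking rule fails, so nothing after it runs. The sketch below shows that pattern in plain Python. It does not use Sifflet's API or Airflow; `quality_gate`, `QualityGateError`, and the rule names are hypothetical. In Airflow, an uncaught exception in a task fails that task, and downstream tasks on the default trigger rule do not run, which is the behaviour to confirm in the trial.

```python
from typing import Callable

class QualityGateError(RuntimeError):
    """Raised to stop the pipeline when a blocking rule fails."""

def quality_gate(rules: dict[str, Callable[[], bool]]) -> None:
    """Run every rule; raise if any of them returns False."""
    failed = [name for name, check in rules.items() if not check()]
    if failed:
        raise QualityGateError("blocking rules failed: " + ", ".join(failed))

def run_pipeline(rules: dict[str, Callable[[], bool]]) -> list[str]:
    """Toy three-step pipeline: load, gate, publish."""
    completed = ["load"]
    try:
        quality_gate(rules)
    except QualityGateError:
        return completed          # halted: publish never runs
    completed.append("publish")
    return completed

# Hypothetical rules standing in for monitor results.
print(run_pipeline({"not_null": lambda: True, "row_count": lambda: True}))
# ['load', 'publish']
print(run_pipeline({"not_null": lambda: True, "row_count": lambda: False}))
# ['load']
```

When evaluating, the thing to verify is the second case: a deliberately failing rule should leave the publish step unexecuted, not merely produce an alert.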
All capabilities by cluster.
Quality & testing · Primary · strength 3/3
Catalog & discovery · Secondary · strength 2/3
Lineage & metadata · Secondary · strength 3/3
Where it plugs in.
Native warehouse support
The honest pricing breakdown.
Sales-only tiers: Entry (up to 500 assets), Growth (up to 1,000), Enterprise (1,000+). All are contact-sales; a free-trial CTA exists, but with no published terms.
What it doesn't do.
Native OpenLineage: Emits and consumes OpenLineage events as a first-class citizen rather than via a plugin or adapter. Signals commitment to interoperability with other metadata tooling: Marquez, OpenMetadata, Astronomer, and others can consume the same event stream. Increasingly the differentiator between "open" and "proprietary metadata model" observability platforms.
Data Contracts: Explicit, versioned agreements between data producers and consumers specifying schema, semantics, SLAs, and breaking-change policy. Enforced in CI for producers and at consumption time for consumers. Distinct from schema validation alone, because a contract captures intent, not just structure. Implementations vary wildly; many tools claiming "data contracts" offer only schema checks.
PII Auto-Classification: Automatically identifies columns likely to contain personally identifiable information (email addresses, phone numbers, national IDs) through regex, name heuristics, or ML. Required for meaningful compliance workflows at scale. Quality varies: naive implementations produce heavy false-positive rates. Worth asking vendors about their accuracy benchmarks.
Pre-Merge Diffing: Compares the output of a model change against production before the pull request is merged, showing row-level and aggregate differences. Shifts data quality left into the development workflow. Datafold is the category-defining tool here; dbt's own cloud offering has added similar capabilities. Requires production-scale compute on a development branch, which has cost implications.
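As an illustration of what "value-level" diffing means, as opposed to the impact analysis Sifflet offers, here is a minimal in-memory sketch that compares a production result with a development result by primary key. It is not how Datafold or any other tool implements it (real tools push the comparison into the warehouse); the function name and sample rows are hypothetical.

```python
def diff_rows(prod: list[dict], dev: list[dict], key: str) -> dict[str, list]:
    """Compare two result sets by primary key and report which keys
    were added, removed, or changed in value."""
    prod_by_key = {row[key]: row for row in prod}
    dev_by_key = {row[key]: row for row in dev}
    shared = prod_by_key.keys() & dev_by_key.keys()
    return {
        "added": sorted(dev_by_key.keys() - prod_by_key.keys()),
        "removed": sorted(prod_by_key.keys() - dev_by_key.keys()),
        "changed": sorted(k for k in shared if prod_by_key[k] != dev_by_key[k]),
    }

# Hypothetical output of the same model on production and on a dev branch.
prod = [{"id": 1, "total": 10}, {"id": 2, "total": 20}, {"id": 3, "total": 30}]
dev = [{"id": 1, "total": 10}, {"id": 2, "total": 25}, {"id": 4, "total": 40}]

print(diff_rows(prod, dev, key="id"))
# {'added': [4], 'removed': [3], 'changed': [2]}
```

Lineage-based impact analysis would tell you which downstream assets a change touches; a diff like this tells you which rows and values actually moved.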
If not Sifflet, then what?
Common alternatives
Quick answers.
- Is Sifflet open source?
- No. Sifflet is a proprietary product.
- How much does Sifflet cost?
- Sifflet does not publish list pricing — it is sales-led, so you request a quote. There is no free tier.
- How is Sifflet deployed?
- Sifflet can run as managed SaaS, in a hybrid agent model, or fully self-hosted on Kubernetes.
- Does Sifflet work with dbt and my warehouse?
- It has a native dbt integration. Sifflet supports Snowflake, BigQuery, Redshift, Databricks, Athena, plus 4 more.
Provenance.
Last verified 2026·05·30 against vendor documentation and, where possible, hands-on trial.