Data Stack Index / v 02.06
Verified 2026·05·30
Quality & testing (primary) · Catalog & discovery (strong secondary) · Lineage & metadata (strong secondary) · SaaS · Self-hosted · Proprietary

Sifflet.

Founded 2021 · Paris, France
Status · ● active

EU-built full-stack data observability pairing ML-driven monitoring with an embedded catalog and field-level lineage.

Pricing starts Contact sales
Deployment SaaS · Self-hosted
License Proprietary
Free tier No
Persona data engineer · analytics engineer
Company size scaleup → mid market → enterprise
dbt integration Native
Warehouses snowflake · bigquery · redshift · databricks +5
OpenLineage none
01
Verdict

Where it fits — and where it doesn't.

● Ideal for

Mid-market and enterprise data teams, especially in Europe, that want one platform spanning quality monitoring, an embedded catalog, and column-level lineage rather than stitching point tools together, backed by a strong compliance posture (ISO 27001, SOC 2 Type 2, GDPR, single-tenant isolation, and a self-host option). The combination of assertion rules, ML/dynamic anomaly detection, automated root-cause analysis, and a Flow Stopper circuit breaker makes it a credible single-vendor observability suite.

○ Avoid if

You need open-source or self-serve-priced tooling, transparent published pricing, native OpenLineage, formal data contracts, code-first value-level data-diff regression testing (use Datafold), or a heavyweight standalone governance catalog with automatic PII classification (Atlan or Collibra). Sifflet deliberately avoids storing PII, which limits it as a governance catalog.

02
Strengths & weaknesses

The honest scorecard.

  • [+] Spans all three observability clusters in one product — monitoring, an embedded catalog, and field-level lineage
  • [+] Both assertion-based rules and ML/dynamic anomaly detection (dynamic freshness/volume, distribution change, proprietary time-series thresholds) to cut alert fatigue
  • [+] Automatic field-level (column-level) lineage via SQL query-log parsing across Snowflake, BigQuery, Redshift, and Databricks, plus BI tools
  • [+] Flow Stopper circuit breaker and Monitors-as-Code (CLI, YAML, Terraform provider, public API) fit engineering workflows
  • [+] Flexible deployment including fully self-hosted on Kubernetes, with ISO 27001, SOC 2 Type 2, GDPR, and single-tenant isolation
  • [−] No published pricing — every tier routes to sales
  • [−] Proprietary and closed-source, with no community or free self-host tier
  • [−] No native OpenLineage support and no formal data-contracts feature
  • [−] No automatic PII classification (by design, it avoids storing PII), which limits it as a governance catalog
  • [−] Field-level lineage automation is limited to four cloud warehouses, and it is not a value-level data-diff / regression-testing tool
03
Editorial

What Sifflet actually is.

What Sifflet is

Sifflet is a full-stack data observability platform from a Paris-based company. It covers three things in one product: data-quality monitoring (a large library of assertion-style and ML/dynamic anomaly monitors), an embedded data catalog, and end-to-end field-level lineage from ingestion through the warehouse and dbt to BI. Around that it layers automated root-cause analysis, incident management, a Flow Stopper circuit breaker, and a set of AI agents. It runs read-only against the source as managed SaaS, a hybrid agent model, or fully self-hosted.

Where it fits

Sifflet competes most directly with Monte Carlo and Bigeye as a full-stack, ML-driven observability suite, but leans harder into catalog and field-level lineage, giving it overlap with Atlan and DataHub on discovery. Against Soda or Great Expectations it is a managed, broader platform rather than a code-first testing framework; against Datafold it does impact analysis in CI but not value-level data diffing. Its EU origin, GDPR posture, and self-host option are the clearest differentiators for European and regulated buyers.

On the three-cluster span

Spanning monitoring, catalog, and lineage in one product is genuinely unusual, and the field-level lineage (parsed from warehouse query logs across the four major cloud warehouses) is a real strength. The caveats are at the edges: the catalog is a competent embedded one, not a standalone governance platform — no automatic PII classification, by design — and lineage automation is limited to those four warehouses. Score it as a strong observability suite with a useful catalog attached, not as a catalog-first tool.

How to evaluate it

Connect it read-only to your warehouse and let the dynamic monitors learn before judging signal quality — the proprietary forecasting is meant to reduce alert fatigue on seasonal data, so give it a couple of weeks. Then test the two differentiators directly: trace a field end-to-end through the lineage graph (ingestion → warehouse → dbt → BI), and wire Flow Stopper into an Airflow DAG to confirm it actually halts a pipeline on a failing rule. If compliance is the driver, scope the self-host option into your security review early.
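The Flow Stopper check can be rehearsed before touching a real DAG. The following is a minimal sketch of the circuit-breaker pattern only, assuming a hypothetical `fetch_rule_status` callable that stands in for whatever the vendor's Airflow plugin or API returns. None of the names below come from Sifflet's documentation.

```python
from typing import Callable, Iterable


class PipelineHalted(Exception):
    """Raised when a blocking data-quality rule is not passing."""


def circuit_breaker(rule_ids: Iterable[str],
                    fetch_rule_status: Callable[[str], str]) -> None:
    """Raise PipelineHalted if any rule reports something other than "PASS".

    fetch_rule_status is a hypothetical stand-in for a real status lookup;
    it is assumed to return "PASS" or "FAIL" for a given rule id.
    """
    failing = [rule for rule in rule_ids if fetch_rule_status(rule) != "PASS"]
    if failing:
        raise PipelineHalted("blocking rules failed: " + ", ".join(failing))


def run_pipeline(load: Callable[[], None],
                 publish: Callable[[], None],
                 rule_ids: Iterable[str],
                 fetch_rule_status: Callable[[str], str]) -> None:
    """Run load, then the breaker, then publish; publish is skipped on failure."""
    load()
    circuit_breaker(rule_ids, fetch_rule_status)
    publish()
```

In Airflow the same idea usually lives in its own task: an exception raised inside a task marks it failed, and downstream tasks using the default trigger rule do not run. The evaluation question is whether the vendor plugin produces that outcome on a deliberately failing rule.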

04
Capability spec

All capabilities by cluster.

Quality & testing

Primary · strength 3/3
01 dbt-native
02 ML anomaly detection
03 Assertion-based testing
04 Pre-merge diffing
05 Schema drift detection
06 Freshness monitoring
07 Volume monitoring
08 Custom SQL checks
09 Circuit breaker
10 Data contracts
11 Column profiling
12 Runs in CI
13 Root cause analysis
14 Incident management
Test authoring code-first plus GUI
Paradigm both (assertion-based and ML anomaly detection)
ML training window Proprietary time-series forecasting learns historical patterns to set dynamic thresholds for freshness, volume, and distribution
Monitors at warehouse table · warehouse column · BI dashboard · pipeline task
Alerting slack · teams · email · pagerduty · jira · webhook
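
To make "dynamic thresholds" concrete: Sifflet's forecasting is proprietary, so the sketch below swaps in a much simpler technique, a band of mean plus or minus k standard deviations over recent history. It is an illustration of the general idea, not the vendor's method, and the function names and the sample row counts are assumptions.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def dynamic_bounds(history: Sequence[float], k: float = 3.0) -> Tuple[float, float]:
    """Return (lower, upper) as mean +/- k sample standard deviations."""
    if len(history) < 2:
        raise ValueError("need at least two historical points")
    centre = mean(history)
    spread = stdev(history)
    return centre - k * spread, centre + k * spread


def is_anomalous(value: float, history: Sequence[float], k: float = 3.0) -> bool:
    """Flag a new observation that falls outside the learned band."""
    lower, upper = dynamic_bounds(history, k)
    return not (lower <= value <= upper)


# Made-up daily row counts for one table.
daily_row_counts = [1000, 1020, 980, 1010, 990, 1005, 995]
```

A band like this ignores seasonality, which is exactly where a forecasting-based monitor is supposed to do better, so it makes a reasonable baseline to compare alert volume against during a trial.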

Catalog & discovery

Secondary · strength 2/3
01 Business glossary
02 Glossary linked to assets
03 Natural language search
04 Ownership tracking
05 Data contracts
06 Governance workflows
07 Access request workflow
08 PII auto-classification
09 Tag propagation
10 Free self-hosted
Metadata ingestion pull connectors
Search approach keyword
Connectors 32+
Asset types tables · dashboards · pipelines

Lineage & metadata

Secondary · strength 3/3
01 Cross-system lineage
02 Upstream source lineage
03 Impact analysis
04 Reverse impact analysis
05 Historical lineage
06 Lineage API
07 Lineage diff
Granularity both (table-level and column-level)
OpenLineage none
Extraction query log parsing · dbt manifest · api push
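
Of the three extraction routes, the dbt manifest is the easiest to inspect by hand. dbt's `manifest.json` includes a `parent_map` that maps each node id to its direct parents, and walking it gives model-level upstream lineage. The sketch below assumes a toy excerpt with made-up node ids; it says nothing about how Sifflet itself consumes the manifest, and it does not attempt column-level lineage, which needs SQL parsing.

```python
from typing import Dict, List, Set


def upstream_of(node_id: str, parent_map: Dict[str, List[str]]) -> Set[str]:
    """Collect every transitive parent of node_id by walking parent_map."""
    seen: Set[str] = set()
    stack = [node_id]
    while stack:
        current = stack.pop()
        for parent in parent_map.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Toy excerpt in the shape of manifest.json's "parent_map"; ids are made up.
parent_map = {
    "model.shop.orders_daily": ["model.shop.stg_orders"],
    "model.shop.stg_orders": ["source.shop.raw.orders"],
    "source.shop.raw.orders": [],
}
```

Against a real project, the same function can be fed the `parent_map` loaded from `target/manifest.json`, which gives a quick cross-check of what a vendor's lineage graph shows for the dbt layer.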
05
Warehouses & integrations

Where it plugs in.

Native warehouse support

snowflake · bigquery · redshift · databricks · athena · synapse · postgres · mysql · mssql
01 dbt · Native
02 Airflow · Plugin
03 OpenLineage · none
04 API access · full
05 Terraform provider
06 Public SDK · Python
06
Pricing

The honest pricing breakdown.

Pricing model per asset
Charged per asset
Published ○ Contact sales required
Free tier ○ No
OSS self-host ○ Not available

Sales-only tiers Entry (up to 500 assets) / Growth (up to 1,000 assets) / Enterprise (1,000+ assets); all are contact-sales. A free-trial CTA exists, but with no published terms.

07
Notable missing

What it doesn't do.

OpenLineage-Native

Emits and consumes OpenLineage events as a first-class citizen rather than via a plugin or adapter. Signals commitment to interoperability with other metadata tooling — Marquez, OpenMetadata, Astronomer, and others can consume the same event stream. Increasingly the differentiator between "open" and "proprietary metadata model" observability platforms.

Data Contracts

Explicit, versioned agreements between data producers and consumers specifying schema, semantics, SLAs, and breaking-change policy. Enforced in CI for producers and at consumption time for consumers. Distinct from schema validation alone — a contract captures intent, not just structure. Implementations vary wildly; many tools claiming "data contracts" offer only schema checks.

PII Auto-Classification

Automatically identifies columns likely to contain personally identifiable information — email addresses, phone numbers, national IDs — through regex, name heuristics, or ML. Required for meaningful compliance workflows at scale. Quality varies: naive implementations produce heavy false-positive rates. Worth asking vendors about their accuracy benchmarks.
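
The false-positive problem is easy to reproduce. The sketch below is a deliberately naive classifier combining a column-name heuristic with an email regex on sampled values; the hint list, the regex, and the 0.8 ratio are all assumptions chosen for illustration, not any vendor's rules.

```python
import re
from typing import Iterable

# Substrings in a column name that suggest PII (illustrative, not exhaustive).
NAME_HINTS = re.compile(r"(email|phone|ssn|passport|national_id)", re.IGNORECASE)
# Loose shape check for an email address; not a full RFC validator.
EMAIL_VALUE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def looks_like_pii(column_name: str,
                   sample_values: Iterable[str],
                   min_ratio: float = 0.8) -> bool:
    """Flag a column if its name hints at PII or most samples look like emails."""
    if NAME_HINTS.search(column_name):
        return True
    values = [v for v in sample_values if v]
    if not values:
        return False
    hits = sum(1 for v in values if EMAIL_VALUE.match(v))
    return hits / len(values) >= min_ratio
```

A column named `email_sent_count` holding integers is flagged by the name rule alone, which is the kind of miss worth probing when asking a vendor about accuracy.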

Pre-Merge Diffing

Compares the output of a model change against production before the pull request is merged — showing row-level and aggregate differences. Shifts data quality left into the development workflow. Datafold is the category-defining tool here; dbt's own cloud offering has added similar capabilities. Requires production-scale compute on a development branch, which has cost implications.
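
At its core, a row-level diff is a keyed comparison of two result sets. The sketch below assumes both tables fit in memory as dictionaries keyed by primary key, which real tools avoid by pushing the comparison down to the warehouse; it is not Datafold's or dbt's algorithm, and the function name is an assumption.

```python
from typing import Dict, Hashable, List


def diff_tables(prod: Dict[Hashable, tuple],
                dev: Dict[Hashable, tuple]) -> Dict[str, List]:
    """Compare two keyed row sets and report added, removed, and changed keys."""
    added = sorted(k for k in dev if k not in prod)
    removed = sorted(k for k in prod if k not in dev)
    changed = sorted(k for k in prod if k in dev and prod[k] != dev[k])
    return {"added": added, "removed": removed, "changed": changed}


# Made-up rows: primary key -> (status, amount).
prod_rows = {1: ("paid", 100), 2: ("open", 40), 3: ("paid", 75)}
dev_rows = {1: ("paid", 100), 2: ("open", 45), 4: ("paid", 20)}
```

Here `diff_tables(prod_rows, dev_rows)` reports key 4 as added, key 3 as removed, and key 2 as changed. Impact analysis, by contrast, only says which downstream assets a change touches, not which values moved.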

09
Alternatives & migrations

If not Sifflet, then what?

Common alternatives

Monte Carlo: Genuine breadth across the stack (ingestion, transformation, BI, ML) in one surface
Bigeye: Autometrics / Autothresholds, Bigeye's ML-based anomaly detection, has a strong reviewer reputation for low false-positive rates relative to peers in the cluster
Anomalo: ML anomaly detection has a strong reviewer reputation in the cluster; Anomalo's profiling engine is purpose-built for petabyte-scale tables with minimal manual configuration
Soda: SodaCL is one of the cleaner data-quality DSLs (readable, version-controllable, and expressive enough for both simple assertions and ML thresholds)
10
Common questions

Quick answers.

Is Sifflet open source?
No. Sifflet is a proprietary product.
How much does Sifflet cost?
Sifflet does not publish list pricing — it is sales-led, so you request a quote. There is no free tier.
How is Sifflet deployed?
Sifflet can run as managed SaaS or be self-hosted.
Does Sifflet work with dbt and my warehouse?
It has a native dbt integration. Sifflet supports Snowflake, BigQuery, Redshift, Databricks, and Athena, plus four more (Synapse, PostgreSQL, MySQL, and SQL Server).

Provenance.

Last verified 2026·05·30 against vendor documentation and, where possible, hands-on trial.