Data Stack Index / v 02.06
Verified 2026·05·30
Catalog & discovery · primary
Quality & testing · strong secondary
Lineage & metadata · strong secondary
SaaS only
Proprietary

Collibra.

Founded 2008 · Brussels, Belgium
Status · ● active

Enterprise data-and-AI governance incumbent: catalog, glossary, workflow stewardship, lineage, and a separate ML data-quality module.

Pricing starts: Contact sales
Deployment: SaaS only
License: Proprietary
Free tier: No
Persona: data steward · governance lead
Company size: mid market → enterprise
dbt integration: Plugin
Warehouses: snowflake · databricks · bigquery · redshift +4
OpenLineage: consumer
Founded: 2008
HQ: Brussels, Belgium
Last verified: 2026·05·30
01
Verdict

Where it fits — and where it doesn't.

● Ideal for

Large, regulated enterprises — banks, insurers, pharma, public sector — that need a governance-first control plane: a real CDO function, formal stewardship, a business glossary, policy enforcement, and auditable lineage for regulations like BCBS 239, GDPR, SOX, HIPAA, and the EU AI Act. Collibra is strongest where governance process and accountability matter more than developer ergonomics, and where a single vendor for catalog plus governance plus lineage plus data quality plus AI governance is preferred over best-of-breed point tools.

○ Avoid if

You are a startup, scaleup, or lean modern-data-stack team. There is no free tier, no trial, and no OSS path; base subscriptions reportedly start around USD 170k/year before modules and services, and implementations are commonly cited at six months. Also avoid it if you want CI- or pipeline-gating data quality (circuit breakers, pre-merge diffs, dbt-native tests): Collibra's quality module is governance-attached, not a developer-workflow tool, so pair it with, or replace it with, Datafold, Monte Carlo, Soda, or Anomalo for those needs.

02
Strengths & weaknesses

The honest scorecard.

  • [+] The deepest governance and stewardship tooling in the cluster — a configurable workflow engine, business glossary, policies, ownership, and audit trails purpose-built for regulated enterprises
  • [+] Broad single-vendor footprint — catalog, lineage (table and column, OpenLineage-aware), an ML data-quality module (from the OwlDQ acquisition), privacy, and AI governance under one platform
  • [+] Strong automated lineage with root-cause and downstream impact analysis at table, column, and report level, with in-line transformation context
  • [+] A mature, analyst-recognised leader with 100+ catalog integrations and a large regulated-enterprise customer base
  • [+] Active investment in 2025–2026 via acquisitions (Raito for access management, Husprey for notebooks, Deasy Labs for unstructured/AI metadata) and an AI Copilot for natural-language search
  • [−] Opaque, high pricing — nothing published; Lineage and Data Quality licensed separately, with high services/people TCO
  • [−] Heavy and slow to deploy — implementations frequently cited at six months, needing dedicated admins and stewards
  • [−] Business-user adoption is a recurring complaint — a governance-first, technical UI; competitors win deals specifically on UX and time-to-value
  • [−] No OSS or self-host path, and total metadata-graph lock-in; SaaS-centric with weaker on-prem support
  • [−] The data-quality module is a separately licensed, governance-attached add-on (from the OwlDQ acquisition), not a developer-workflow testing engine
03
Editorial

What Collibra actually is.

What Collibra is

Collibra is an enterprise data-and-AI governance platform built around a data catalog, a business glossary and semantic layer, a configurable stewardship workflow engine, automated lineage, and a separately licensed Data Quality & Observability module. Founded in 2008 by researchers from the Vrije Universiteit Brussel, it is one of the original category-defining governance incumbents, sold through a sales-led motion to large regulated enterprises and positioned in 2026 as “unified governance for data and AI.”

Where it fits

Collibra is the heavyweight governance incumbent, most directly cross-shopped with Atlan (modern, UX-led, lower TCO), Alation, and the OSS catalogs DataHub and OpenMetadata (open, engineer-led, free self-host). It typically wins where formal governance, regulatory auditability, and single-vendor breadth outweigh developer ergonomics and price. Its data-quality module competes with Monte Carlo, Anomalo, Bigeye, and Soda, though those remain better for CI/pipeline-gating and dbt-native workflows.

On the data-quality module

Unlike a pure catalog, Collibra ships a genuine data-quality engine — Data Quality & Observability, built on its 2021 OwlDQ acquisition — with adaptive ML rules, anomaly detection, profiling, and remediation workflows authored in standard SQL. We score the quality cluster a 2 rather than a 3 because it is governance-attached: strong on ML detection and stewardship, but without the circuit-breaker, pre-merge diffing, and dbt-native testing that define dedicated observability tools. It is licensed as a separate module, which matters for cost.
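For readers unfamiliar with adaptive rules, the idea can be sketched with a rolling mean and standard-deviation band over a table's daily row counts. This is a generic illustration only, not Collibra's algorithm: the function names, the 14-observation window, the 3-sigma multiplier, and the sample counts are all assumptions made for the example.

```python
from statistics import mean, stdev


def adaptive_bounds(history, window=14, k=3.0):
    """Return (low, high) bounds learned from the most recent observations."""
    recent = history[-window:]
    if len(recent) < 2:
        raise ValueError("need at least two observations to estimate spread")
    mu = mean(recent)
    sigma = stdev(recent)  # sample standard deviation
    return mu - k * sigma, mu + k * sigma


def is_anomalous(history, value, window=14, k=3.0):
    """Flag a new value that falls outside the band learned from history."""
    low, high = adaptive_bounds(history, window, k)
    return value < low or value > high


# Hypothetical daily row counts for one warehouse table.
row_counts = [1000, 1010, 990, 1005, 995, 1002, 998]

typical_day = is_anomalous(row_counts, 1003)  # close to recent history
broken_load = is_anomalous(row_counts, 400)   # far below recent history
print(typical_day, broken_load)
```

Because the band is recomputed from the trailing window, it moves as the table's normal volume changes, which is the property that separates this style of check from a fixed, hand-written threshold.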

How to evaluate it

Evaluation is sales-led and module-scoped. Decide first which modules you are actually buying (Catalog alone, or Catalog plus Lineage plus Data Quality), because that drives both fit and cost. Then test the governance workflow engine against a real stewardship process you run today — approvals, issue management, policy enforcement — since that is Collibra’s deepest differentiator, and budget realistically for the implementation timeline rather than the licence alone.
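The budgeting point can be made concrete with simple arithmetic. The sketch below adds up a first-year figure from licence, modules, services, and internal staffing during rollout. Only the USD 170k base and the six-month implementation come from the third-party reports cited in this profile; every other number is a round placeholder, not a quote or a reported price, and should be replaced with figures from your own sales conversation.

```python
def first_year_cost(base, modules, services, months_to_deploy, monthly_staff_cost):
    """Sum licence, module, services, and internal staffing cost for year one."""
    staffing = months_to_deploy * monthly_staff_cost
    return base + sum(modules.values()) + services + staffing


total = first_year_cost(
    base=170_000,               # third-party reported base cited in this profile
    modules={                   # PLACEHOLDERS: module fees are not published
        "lineage": 40_000,
        "data_quality": 40_000,
    },
    services=60_000,            # PLACEHOLDER: implementation services
    months_to_deploy=6,         # implementation length cited in this profile
    monthly_staff_cost=15_000,  # PLACEHOLDER: internal admins and stewards
)
print(f"Illustrative first-year total: USD {total:,}")
```

The useful part is the shape of the calculation rather than the output: dropping a module or shortening the rollout changes the total in ways the base licence figure alone does not show.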

04
Capability spec

All capabilities by cluster.

Quality & testing

Secondary · strength 2/3
01 dbt-native
02 ML anomaly detection
03 Assertion-based testing
04 Pre-merge diffing
05 Schema drift detection
06 Freshness monitoring
07 Volume monitoring
08 Custom SQL checks
09 Circuit breaker
10 Data contracts
11 Column profiling
12 Runs in CI
13 Root cause analysis
14 Incident management
Test authoring: sql
Paradigm: both
ML training window: Adaptive rules plus ML for outliers, drift, and silent schema changes (OwlDQ heritage); thresholds self-adjust over history
Monitors at: warehouse table · warehouse column
Alerting: email

Catalog & discovery

Primary · strength 3/3
01 Business glossary
02 Glossary linked to assets
03 Natural language search
04 Ownership tracking
05 Data contracts
06 Governance workflows
07 Access request workflow
08 PII auto-classification
09 Tag propagation
10 Free self-hosted
Metadata ingestion: both
Search approach: keyword
Connectors: 100+
Asset types: tables · columns · dashboards · reports · glossary terms

Lineage & metadata

Secondary · strength 3/3
01 Cross-system lineage
02 Upstream source lineage
03 Impact analysis
04 Reverse impact analysis
05 Historical lineage
06 Lineage API
07 Lineage diff
Granularity: both
OpenLineage: consumer
Extraction: sql static analysis · openlineage events · api push
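Being an OpenLineage consumer means the platform can ingest lineage events emitted by other tools. As a rough sketch of what such an event contains, the Python below builds a minimal OpenLineage run event as a plain dictionary. The top-level field names follow the public OpenLineage spec as best understood here; the namespaces, job and dataset names, producer URL, and the pinned schema version are placeholders, and nothing in it reflects a Collibra-specific endpoint or payload.

```python
import json
import uuid
from datetime import datetime, timezone


def build_run_event(job_namespace, job_name, inputs, outputs, event_type="COMPLETE"):
    """Build a minimal OpenLineage-style run event as a dictionary.

    `inputs` and `outputs` are lists of (namespace, name) tuples.
    """
    return {
        "eventType": event_type,
        "eventTime": datetime.now(timezone.utc).isoformat(),
        # Placeholder producer URI; a real emitter identifies itself here.
        "producer": "https://example.com/hypothetical-producer",
        # Assumed spec version; check the version your consumer expects.
        "schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent",
        "run": {"runId": str(uuid.uuid4())},
        "job": {"namespace": job_namespace, "name": job_name},
        "inputs": [{"namespace": ns, "name": name} for ns, name in inputs],
        "outputs": [{"namespace": ns, "name": name} for ns, name in outputs],
    }


# Hypothetical job that reads one table and writes another.
event = build_run_event(
    job_namespace="hypothetical-scheduler",
    job_name="daily_orders_load",
    inputs=[("snowflake://hypothetical-account", "raw.orders")],
    outputs=[("snowflake://hypothetical-account", "analytics.orders_daily")],
)
print(json.dumps(event, indent=2))
```

A consumer stitches many such events together by matching dataset namespace and name across jobs, which is how table-level lineage emerges from per-run events.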
05
Warehouses & integrations

Where it plugs in.

Native warehouse support

snowflake · databricks · bigquery · redshift · synapse · fabric · postgres · mssql
01 dbt · Plugin
02 Airflow · Plugin
03 OpenLineage · consumer
04 API access · full
05 Terraform provider
06 Public SDK · python, java
06
Pricing

The honest pricing breakdown.

Pricing model: per seat, tiered
Charged per: seat
Published: ○ Contact sales required
Free tier: ○ No
OSS self-host: ○ Not available

Sales-only tier: Enterprise. Sales-led, with Lineage, Data Quality, Privacy, and other modules licensed separately. Nothing is published; third parties report a base around USD 170k/year before modules and services.

09
Alternatives & migrations

If not Collibra, then what?

Common alternatives

Atlan → Polished UX and onboarding; consistently ranks at the top of analyst rankings for time-to-value
Alation → Category-defining catalog with behavioral, usage-ranked search and pioneering natural-language search
DataHub → Best-in-class column-level SQL lineage parser (SQLGlot-based, benchmarked at 97–99% accuracy on standard corpora)
OpenMetadata → Highest connector count in the OSS catalog space (120+); particularly strong on dashboards, ML, and pipeline systems
10
Common questions

Quick answers.

Is Collibra open source?
No. Collibra is a proprietary product.
How much does Collibra cost?
Collibra does not publish list pricing — it is sales-led, so you request a quote. There is no free tier.
How is Collibra deployed?
Collibra is a managed cloud (SaaS) product.
Does Collibra work with dbt and my warehouse?
It integrates with dbt via a plugin. Collibra supports Snowflake, Databricks, BigQuery, Redshift, and Synapse, plus three more (Fabric, Postgres, and SQL Server).

Provenance.

Last verified 2026·05·30 against vendor documentation and, where possible, hands-on trial.