Prominent Features

Data Quality Assurance

Import your data processing steps as components and define their dependencies to form pipelines. Our Apache Airflow plugin can import your DAGs as pipelines with zero configuration.

Associate Health Checks with the components, connect your data source, and configure event listeners to get notified of important data events or to halt data processing when things don't go as planned.
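Here is a minimal sketch of that idea. It is not Dragonfly's API. Components with declared dependencies run in dependency order, and a failing health check triggers a notification and halts the run. The component names, the `health_checks` mapping, `run_pipeline`, and `PipelineHalted` are all hypothetical. The ordering uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+).

```python
from graphlib import TopologicalSorter
from typing import Callable

# Hypothetical pipeline: each component maps to the set of components it
# depends on, much like task dependencies in an Airflow DAG.
dependencies: dict[str, set[str]] = {
    "extract": set(),
    "clean": {"extract"},
    "load": {"clean"},
}

# Hypothetical health checks attached to components; each returns True on pass.
health_checks: dict[str, list[Callable[[], bool]]] = {
    "clean": [lambda: True],
    "load": [lambda: False],  # simulate a failing check on "load"
}


class PipelineHalted(Exception):
    """Raised when a component's health check fails."""


def run_pipeline(deps, checks, notify) -> list[str]:
    """Run components in dependency order; halt on the first failed check."""
    completed = []
    for component in TopologicalSorter(deps).static_order():
        # ... the component's actual processing step would run here ...
        for check in checks.get(component, []):
            if not check():
                notify(f"health check failed on {component!r}")
                raise PipelineHalted(component)
        completed.append(component)
    return completed


events: list[str] = []
try:
    run_pipeline(dependencies, health_checks, events.append)
except PipelineHalted as exc:
    print(f"halted at {exc}; events: {events}")
```

The notification callback is only a stand-in for whatever listener is configured. Raising an exception is one simple way to model "halt the data processing".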


Data Integrity

Keep incorrect data from entering the system by detecting it early, and avoid data corruption.

Dragonfly Health Checks assess various aspects of the data and produce a Data Confidence Rating (DCR). The DCR is a numeric value calculated as a weighted combination of the ratings of all dependencies and applied checks.
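The exact DCR formula is not given here. The sketch below shows one plausible reading of "weighted resolution," and it is an assumption throughout. Ratings lie between 0 and 1. A component's DCR is the weighted average of its own check ratings and the DCRs of its dependencies, resolved recursively. `Node`, `dcr`, the weights, and the treatment of unchecked components are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical component: its own check ratings plus upstream dependencies."""
    name: str
    checks: list[tuple[float, float]] = field(default_factory=list)  # (rating, weight)
    deps: list[tuple["Node", float]] = field(default_factory=list)   # (node, weight)


def dcr(node: Node) -> float:
    """Weighted average of check ratings and dependency DCRs (assumed formula).

    Assumes the dependency graph is acyclic, so the recursion terminates.
    """
    terms = list(node.checks) + [(dcr(dep), weight) for dep, weight in node.deps]
    total_weight = sum(weight for _, weight in terms)
    if total_weight == 0:
        return 1.0  # assumption: nothing assessed means no evidence of a problem
    return sum(rating * weight for rating, weight in terms) / total_weight


source = Node("raw_orders", checks=[(1.0, 1.0), (0.5, 1.0)])
report = Node("daily_report", checks=[(0.9, 2.0)], deps=[(source, 1.0)])
print(round(dcr(source), 2), round(dcr(report), 2))
```

In this model, a low-rated upstream source pulls down the rating of everything that depends on it, in proportion to the weight given to that dependency.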


Quality Rules Library

Choose from our standard library of pre-written, highly configurable data quality checks, or write your own if you can wield SQL.

If you are not a SQL wizard, fret not: Dragonfly GPT can scan your data warehouse or data lake schema and convert your plain-English requests into Health Checks. (Note: Dragonfly GPT is built on top of state-of-the-art LLMs, but the ability of LLMs to generate correct SQL is still very limited.)
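As an illustration of a custom check written in SQL, the sketch below computes the share of `customers` rows that have a non-null `email` and uses it as a rating between 0 and 1. The table, the query, and `run_sql_check` are hypothetical. An in-memory SQLite database from Python's standard `sqlite3` module stands in for a real warehouse, and this does not show how Dragonfly itself registers or runs checks.

```python
import sqlite3
from typing import Optional

# Hypothetical custom check: the share of rows with a non-null email,
# expressed as a rating between 0 and 1. COUNT(email) skips NULLs.
NON_NULL_EMAIL_CHECK = """
SELECT CAST(COUNT(email) AS REAL) / NULLIF(COUNT(*), 0)
FROM customers
"""


def run_sql_check(conn: sqlite3.Connection, sql: str) -> Optional[float]:
    """Execute a single-value SQL check; None means the table was empty."""
    (rating,) = conn.execute(sql).fetchone()
    return rating


# An in-memory SQLite database stands in for the warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO customers (email) VALUES (?)",
    [("a@example.com",), ("b@example.com",), (None,), ("d@example.com",)],
)
print(run_sql_check(conn, NON_NULL_EMAIL_CHECK))
```

`NULLIF` avoids a division by zero on an empty table. The check then returns NULL (`None` in Python) instead of a misleading rating.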


Seamless Integration

Dragonfly is an API-first platform; one of its design goals was to avoid adding yet another UI tool to the already long list that developers and support staff have to use. Integrate with the APIs to get DCR quality reports directly in your application logs. API integration also allows data to be tested live, as it is being processed.

Dragonfly supports integration with various database, data warehouse, and data lake platforms: PostgreSQL, MySQL, AWS Redshift, Google BigQuery, Google BigTable, Snowflake, Apache Spark, Apache Hive, Apache HBase, Apache Cassandra, and Apache Hudi, as well as AWS S3 (via S3 Select) and AWS Lambda.


Data Security

Dragonfly does not modify any data and can work with read-only access to a limited set of tables or collections in the warehouse. It can also work without any data access, through code-execution-based checks applied via AWS Lambda.
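How Dragonfly invokes such Lambda checks is not described here. The sketch below shows only the general shape of a code-execution check written as a standard Python Lambda handler (`lambda_handler(event, context)`). It assumes the caller passes the values to validate in a hypothetical `event["values"]` field and that the check returns a hypothetical `{"rating": ...}` result.

```python
def lambda_handler(event, context):
    """Hypothetical code-execution check deployed as an AWS Lambda function.

    Assumption: the caller passes the values to validate in event["values"],
    so the check sees only what it is sent, not the warehouse itself.
    """
    values = event.get("values", [])
    # Count values that are non-negative numbers (booleans excluded).
    valid = sum(
        1
        for v in values
        if isinstance(v, (int, float)) and not isinstance(v, bool) and v >= 0
    )
    # Assumption: an empty batch is treated as passing.
    rating = valid / len(values) if values else 1.0
    return {"rating": rating}


# Local invocation for testing; Lambda supplies a real context object.
print(lambda_handler({"values": [3, 0, -1, "n/a"]}, None))
```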

Dragonfly does not copy any data from the warehouse. It only captures the Data Confidence Rating (DCR) for the current state of the data. If metadata collection is enabled, it can also collect metadata to better explain the DCR.


Event Notifications

Dragonfly converts all checks and DCR reports into events. Subscribe to the events of interest to receive alerts by email or SMS, feed them into a BPA tool, or connect them to a pub-sub system such as Apache Kafka or AWS SNS to act on them downstream.
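The subscription model can be pictured with a minimal in-memory sketch. A subscriber registers for an event type and reacts only to events of that type, here by recording an alert when a DCR report falls below a threshold. The `EventBus` class, the `"dcr.report"` event type, the payload fields, and the 0.8 threshold are all hypothetical. In practice, Kafka or SNS would take the place of the in-memory bus.

```python
from collections import defaultdict
from typing import Any, Callable

Event = dict[str, Any]  # hypothetical payload, e.g. {"type": "dcr.report", "dcr": 0.62}


class EventBus:
    """In-memory stand-in for a pub-sub system such as Apache Kafka or AWS SNS."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[Event], None]]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable[[Event], None]) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event: Event) -> int:
        """Deliver an event to its subscribers; return how many received it."""
        handlers = self._subscribers.get(event["type"], [])
        for handler in handlers:
            handler(event)
        return len(handlers)


alerts: list[str] = []


def alert_on_low_dcr(event: Event) -> None:
    # Hypothetical threshold; a real subscriber might send email or SMS here.
    if event["dcr"] < 0.8:
        alerts.append(f"{event['pipeline']}: DCR {event['dcr']:.2f}")


bus = EventBus()
bus.subscribe("dcr.report", alert_on_low_dcr)
bus.publish({"type": "dcr.report", "pipeline": "daily_report", "dcr": 0.62})
bus.publish({"type": "dcr.report", "pipeline": "daily_report", "dcr": 0.95})
print(alerts)
```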
