How to implement Near-ZeroETL in a Data Mesh

Using ksqlDB, Apache Kafka, and Real-Time OLAP

Feb 13, 2023


ZeroETL

At re:Invent 2022, AWS introduced the concept of ZeroETL, which in their case meant a native integration between their Aurora (OLTP) and Redshift (OLAP) services.

Let’s first define what ETL and ELT are:

ETL (extract, transform, load) - data is extracted from the source, transformed in the data pipeline, then loaded into an OLAP (analytical) database.
ELT (extract, load, transform) - data is extracted from the source, loaded into the OLAP database, then transformed there using SQL.
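
To make the difference concrete, here is a minimal sketch of both patterns. It is an illustration only: an in-memory SQLite database stands in for the OLAP database, and the order records, table names, and the `transform` helper are hypothetical and not taken from any particular system.

```python
import sqlite3

# Hypothetical source records: (order_id, amount_cents, currency)
SOURCE_ROWS = [(1, 1250, "usd"), (2, 990, "eur"), (3, 4000, "usd")]


def transform(row):
    """Normalize one record: cents to major units, upper-case currency."""
    order_id, amount_cents, currency = row
    return (order_id, amount_cents / 100, currency.upper())


def run_etl(conn, rows):
    """ETL: transform in the pipeline, then load the finished rows."""
    conn.execute(
        "CREATE TABLE orders_etl (order_id INTEGER, amount REAL, currency TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders_etl VALUES (?, ?, ?)",
        [transform(r) for r in rows],
    )
    return conn.execute(
        "SELECT order_id, amount, currency FROM orders_etl ORDER BY order_id"
    ).fetchall()


def run_elt(conn, rows):
    """ELT: load the raw rows first, then transform in the database with SQL."""
    conn.execute(
        "CREATE TABLE orders_raw "
        "(order_id INTEGER, amount_cents INTEGER, currency TEXT)"
    )
    conn.executemany("INSERT INTO orders_raw VALUES (?, ?, ?)", rows)
    conn.execute(
        "CREATE TABLE orders_elt AS "
        "SELECT order_id, amount_cents / 100.0 AS amount, "
        "UPPER(currency) AS currency FROM orders_raw"
    )
    return conn.execute(
        "SELECT order_id, amount, currency FROM orders_elt ORDER BY order_id"
    ).fetchall()


# SQLite is only a stand-in for the analytical database in this sketch.
conn = sqlite3.connect(":memory:")
etl_rows = run_etl(conn, SOURCE_ROWS)
elt_rows = run_elt(conn, SOURCE_ROWS)
```

Both paths end with the same rows in the database. What differs is where the transformation runs: in the pipeline before the load (ETL), or inside the database after the load (ELT).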

ZeroETL removes the need for separate data pipelines to perform ETL and provides built-in replication of transactional data for near real-time analytics. ZeroETL is similar to ELT in that both transform data in the OLAP database, but doing so can force batching semantics. This is unlike ETL, where the transformation can remain in real time. ETL essentially preprocesses the data before it reaches the OLAP database, so the database is relieved of performing transformations.

So how can we create a ZeroETL solution that supports real-time use cases?
