[Avg. reading time: 10 minutes]
Medallion Architecture
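This is also called multi-hop architecture: data moves through successive layers (Bronze, Silver, Gold), and each hop improves its structure and quality.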

Bronze Layer (Raw Data)
- Append-only ingestion
- No business logic
- Schema minimally enforced
- Supports replay / backfill
Silver Layer (Cleansed and Conformed Data)
- Deduplication
- Joins / normalization
- Schema enforcement
- Basic data quality checks
Gold Layer (Curated Business-level tables)
- Business logic
- Aggregations
- KPI tables
- Semantic-ready datasets
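The three layers above can be sketched with plain Python lists and dicts. This is only a minimal illustration under stated assumptions, not a production pipeline: the `RAW_EVENTS` records, the field names, and the functions `ingest_bronze`, `build_silver`, and `build_gold` are all hypothetical, and a real system would persist each layer as tables or files rather than in memory.

```python
from collections import defaultdict

# Hypothetical raw events, as they might arrive from a source system.
RAW_EVENTS = [
    {"order_id": "1", "region": "east", "amount": "10.50"},
    {"order_id": "1", "region": "east", "amount": "10.50"},  # duplicate delivery
    {"order_id": "2", "region": "west", "amount": "7.25"},
    {"order_id": "3", "region": "east", "amount": "not-a-number"},  # bad record
    {"order_id": "4", "region": "west", "amount": "2.75"},
]


def ingest_bronze(bronze, events):
    """Bronze: append raw records unchanged, with no business logic."""
    bronze.extend(dict(e) for e in events)
    return bronze


def build_silver(bronze):
    """Silver: deduplicate, enforce types, and apply basic quality checks."""
    seen = set()
    silver = []
    for row in bronze:
        if row["order_id"] in seen:
            continue  # deduplication on the business key
        try:
            amount = float(row["amount"])  # schema enforcement
        except ValueError:
            continue  # quality check: drop unparseable amounts
        if amount < 0:
            continue  # quality check: drop negative amounts
        seen.add(row["order_id"])
        silver.append(
            {"order_id": int(row["order_id"]), "region": row["region"], "amount": amount}
        )
    return silver


def build_gold(silver):
    """Gold: business-level aggregate (total revenue per region)."""
    totals = defaultdict(float)
    for row in silver:
        totals[row["region"]] += row["amount"]
    return dict(totals)


bronze = ingest_bronze([], RAW_EVENTS)
silver = build_silver(bronze)
gold = build_gold(silver)
print(gold)
```

Because Bronze keeps every raw record, including the duplicate and the bad one, Silver and Gold can be rebuilt from it at any time. That is what makes replay and backfill possible when cleansing rules change.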

Polars (DataFrame Library)
Polars is a high-performance DataFrame library written in Rust, with APIs for both Rust and Python. It provides fast data manipulation capabilities similar to those of pandas.
- Performance: Polars is built for speed. It is written in Rust, stores data in a columnar format based on Apache Arrow, and runs operations in parallel across CPU cores.
- Lazy Execution: Polars supports lazy execution, where you build a query plan that runs only when you ask for the result. This lets the optimizer skip unnecessary work, for example by applying filters early and reading only the columns the query needs.
- Expressive API: Polars offers an expressive and flexible API for data manipulation, including filtering, aggregation, joins, and more.
- Interoperability: Polars is written in Rust and also offers a Python API, making it accessible to a broader range of developers.
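To make the idea of lazy execution concrete, here is a toy sketch in standard-library Python. It is not the Polars API: `LazyPlan` and its methods are hypothetical names invented for this illustration. It shows only the core idea that operations are recorded as a plan and nothing runs until `collect()` is called. A real engine would also optimize the plan before executing it, which this sketch does not attempt.

```python
class LazyPlan:
    """Toy lazy query plan: records steps and runs them only on collect()."""

    def __init__(self, rows, steps=()):
        self._rows = rows
        self._steps = tuple(steps)

    def filter(self, predicate):
        # Record the step; do not touch the data yet.
        return LazyPlan(self._rows, self._steps + (("filter", predicate),))

    def select(self, *columns):
        # Record the step; do not touch the data yet.
        return LazyPlan(self._rows, self._steps + (("select", columns),))

    def explain(self):
        """Return the recorded step names, in order."""
        return [kind for kind, _ in self._steps]

    def collect(self):
        """Execute all recorded steps against the rows."""
        out = self._rows
        for kind, arg in self._steps:
            if kind == "filter":
                out = [r for r in out if arg(r)]
            else:  # "select"
                out = [{c: r[c] for c in arg} for r in out]
        return list(out)


calls = []  # tracks when the predicate actually runs


def is_large(row):
    calls.append(row["id"])
    return row["amount"] > 5


rows = [
    {"id": 1, "amount": 3, "note": "a"},
    {"id": 2, "amount": 8, "note": "b"},
    {"id": 3, "amount": 9, "note": "c"},
]

plan = LazyPlan(rows).filter(is_large).select("id", "amount")
calls_before_collect = len(calls)  # the predicate has not run yet
result = plan.collect()            # the plan executes here
print(plan.explain(), result)
```

Building `plan` never calls `is_large`; the predicate runs only inside `collect()`. Because the whole plan is known before any work starts, an engine has the chance to reorder or prune steps first.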
DataFusion (Query Engine)
- A Rust query engine built on Apache Arrow.
- Provides SQL and DataFrame APIs, logical and physical plans, and an optimizer.
- The best choice when you want to build an engine-like pipeline layer rather than just do DataFrame scripting.
| Category | Polars | DataFusion |
|---|---|---|
| Type | DataFrame library | Query engine |
| Primary API | DataFrame / LazyFrame | SQL + DataFrame |
| Execution model | Eager + Lazy | Always lazy (planned execution) |
| Focus | Transformations | Query planning + execution |
| Optimizer | Built-in (lazy optimizer) | Explicit rule-based optimizer |
| Control | Limited control over planning | Full control over plans |
| SQL support | Limited | Strong |
| Extensibility | Medium | Very high (custom rules, UDFs) |
| Use case | Data wrangling | Engine building / query layer |
| Parallelism | Automatic | Engine-controlled |
| Distributed | No | Via Ballista |
Demo
git clone https://github.com/gchandra10/rust-polars-csv-dataframe-demo