[Avg. reading time: 10 minutes]

Medallion Architecture

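This is also called the Multi-Hop architecture: data moves through three layers (Bronze, Silver, Gold), and each layer refines the output of the previous one.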

Bronze Layer (Raw Data)

  • Append-only ingestion
  • No business logic
  • Schema minimally enforced
  • Supports replay / backfill
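
The Bronze bullets above can be sketched in plain Rust using only the standard library. This is a minimal illustration that assumes raw records arrive as text lines tagged with an integer ingestion timestamp. RawRecord, BronzeTable, ingest, and replay_since are hypothetical names, not taken from the demo repository or any library.

```rust
/// One raw record exactly as it arrived, plus when it was ingested.
/// Hypothetical type; the u64 timestamp is an assumption of this sketch.
#[derive(Debug, Clone, PartialEq)]
struct RawRecord {
    ingested_at: u64,
    payload: String, // unparsed source line, no business logic applied
}

/// Hypothetical append-only Bronze table: records are added, never updated or deleted.
#[derive(Default)]
struct BronzeTable {
    records: Vec<RawRecord>,
}

impl BronzeTable {
    /// Append a raw line as-is; no parsing or validation happens here.
    fn ingest(&mut self, ingested_at: u64, payload: &str) {
        self.records.push(RawRecord {
            ingested_at,
            payload: payload.to_string(),
        });
    }

    /// Replay / backfill: every record ingested at or after `since`,
    /// in arrival order, so downstream layers can be rebuilt from that point.
    fn replay_since(&self, since: u64) -> Vec<&RawRecord> {
        self.records
            .iter()
            .filter(|r| r.ingested_at >= since)
            .collect()
    }
}

fn main() {
    let mut bronze = BronzeTable::default();
    bronze.ingest(100, "1,alice,42");
    bronze.ingest(200, "2,bob,not_a_number"); // malformed data is kept, not rejected
    bronze.ingest(300, "1,alice,42"); // duplicates are kept too

    for r in bronze.replay_since(200) {
        println!("{} {}", r.ingested_at, r.payload);
    }
}
```

Nothing is parsed or rejected at this stage, so a malformed row such as "2,bob,not_a_number" survives, and later layers can always be rebuilt by replaying from an earlier timestamp.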

Silver Layer (Cleansed and Conformed Data)

  • Deduplication
  • Joins / normalization
  • Schema enforcement
  • Basic data quality checks
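
A matching Silver step, again as a std-only sketch under stated assumptions: Bronze lines are assumed to follow a hypothetical "id,customer,amount" schema, and Order, parse_order, and to_silver are illustrative names. It covers schema enforcement, normalization (lowercasing customer names), a basic quality check, and deduplication on id; joins are left out to keep it short.

```rust
use std::collections::HashSet;

/// Typed Silver row. The (id, customer, amount) schema is an assumption of this sketch.
#[derive(Debug, Clone, PartialEq)]
struct Order {
    id: u32,
    customer: String,
    amount: i64,
}

/// Schema enforcement: a raw line must have exactly three fields
/// that parse into the expected types, or it is rejected (None).
fn parse_order(line: &str) -> Option<Order> {
    let fields: Vec<&str> = line.split(',').map(str::trim).collect();
    if fields.len() != 3 {
        return None;
    }
    Some(Order {
        id: fields[0].parse().ok()?,
        // Normalization: store customer names in lowercase.
        customer: fields[1].to_lowercase(),
        amount: fields[2].parse().ok()?,
    })
}

/// Build Silver rows from Bronze lines: enforce the schema, apply a basic
/// quality check, and deduplicate on `id` (first occurrence wins).
/// Returns the clean rows plus the number of rejected lines.
fn to_silver(raw: &[&str]) -> (Vec<Order>, usize) {
    let mut seen = HashSet::new();
    let mut clean = Vec::new();
    let mut rejected = 0;
    for line in raw {
        match parse_order(line) {
            // Basic data quality check: amounts must be non-negative.
            Some(o) if o.amount >= 0 => {
                // Duplicates are dropped silently, not counted as rejected.
                if seen.insert(o.id) {
                    clean.push(o);
                }
            }
            _ => rejected += 1,
        }
    }
    (clean, rejected)
}

fn main() {
    let bronze = ["1,Alice,42", "2,Bob,not_a_number", "1,Alice,42", "3,Carol,-5", "4,Dan,7"];
    let (silver, rejected) = to_silver(&bronze);
    println!("{silver:?}");
    println!("rejected: {rejected}");
}
```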

Gold Layer (Curated Business-Level Tables)

  • Business logic
  • Aggregations
  • KPI tables
  • Semantic-ready datasets
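
For the Gold layer, a small std-only sketch that aggregates cleaned rows into a per-region KPI table. The Order fields (region, amount) and the KPI definitions (order count, revenue, average order value) are assumptions chosen for illustration, not business rules from the source; to_gold and RegionKpi are hypothetical names.

```rust
use std::collections::BTreeMap;

/// Cleaned Silver row; this two-field schema is an assumption of the sketch.
struct Order {
    region: String,
    amount: i64,
}

/// One Gold KPI row per region.
#[derive(Debug, PartialEq)]
struct RegionKpi {
    region: String,
    order_count: usize,
    revenue: i64,
    avg_order_value: f64,
}

fn to_gold(silver: &[Order]) -> Vec<RegionKpi> {
    // Aggregation: region -> (order count, revenue).
    // BTreeMap keeps the output sorted by region name.
    let mut totals: BTreeMap<&str, (usize, i64)> = BTreeMap::new();
    for o in silver {
        let entry = totals.entry(o.region.as_str()).or_insert((0, 0));
        entry.0 += 1;
        entry.1 += o.amount;
    }
    // Business logic: derive average order value from the aggregates.
    totals
        .into_iter()
        .map(|(region, (count, revenue))| RegionKpi {
            region: region.to_string(),
            order_count: count,
            revenue,
            avg_order_value: revenue as f64 / count as f64,
        })
        .collect()
}

fn main() {
    let silver = vec![
        Order { region: "east".into(), amount: 40 },
        Order { region: "west".into(), amount: 10 },
        Order { region: "east".into(), amount: 20 },
    ];
    for kpi in to_gold(&silver) {
        println!("{kpi:?}");
    }
}
```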

Polars (DataFrame Library)

Polars is a high-performance DataFrame library for Rust and Python. It provides data manipulation capabilities similar to pandas, with a strong focus on speed.

  • Performance: Polars is written in Rust, stores data in a columnar format based on Apache Arrow, and runs operations in parallel across CPU cores.

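  • Lazy Execution: Polars supports lazy execution through its LazyFrame API. You build a query plan, and nothing runs until you call collect(). Because the whole plan is known up front, the optimizer can skip unnecessary work, for example by pushing filters down to the scan and reading only the columns the query needs.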

  • Expressive API: Polars offers an expressive, flexible API for data manipulation, covering filtering, aggregation, joins, and other common operations.

  • Interoperability: While Polars is native to Rust, it also has a Python API, making it accessible to a broader range of developers.
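
To show what lazy execution means without depending on the polars crate, here is a toy in plain Rust. LazyPlan, scan, filter, map, and collect are hypothetical names that only loosely echo the shape of a Polars LazyFrame; this is not its API, and real Polars optimizations go much further. Building the plan only records steps; collect() runs them, and because every step is known up front they are fused into a single pass with no intermediate vectors.

```rust
/// One recorded step of a hypothetical lazy plan.
enum Op {
    Filter(fn(i64) -> bool),
    Map(fn(i64) -> i64),
}

/// A toy lazy plan over a single column of i64 values.
/// Building it does no work; `collect` executes it.
struct LazyPlan {
    source: Vec<i64>,
    ops: Vec<Op>,
}

impl LazyPlan {
    fn scan(source: Vec<i64>) -> Self {
        LazyPlan { source, ops: Vec::new() }
    }

    /// Record a filter step; nothing is evaluated yet.
    fn filter(mut self, pred: fn(i64) -> bool) -> Self {
        self.ops.push(Op::Filter(pred));
        self
    }

    /// Record a transformation step; nothing is evaluated yet.
    fn map(mut self, f: fn(i64) -> i64) -> Self {
        self.ops.push(Op::Map(f));
        self
    }

    /// Execute the whole plan in one fused pass: each value flows through
    /// every step in order and is dropped at the first failing filter.
    fn collect(self) -> Vec<i64> {
        let mut out = Vec::new();
        'rows: for mut v in self.source {
            for op in &self.ops {
                match op {
                    Op::Filter(pred) => {
                        if !pred(v) {
                            continue 'rows;
                        }
                    }
                    Op::Map(f) => v = f(v),
                }
            }
            out.push(v);
        }
        out
    }
}

fn main() {
    let plan = LazyPlan::scan(vec![1, 2, 3, 4, 5, 6])
        .filter(|x| x % 2 == 0) // keep even values
        .map(|x| x * 10); // then scale them
    // No computation has happened yet; the plan is just a list of steps.
    let result = plan.collect();
    println!("{result:?}"); // [20, 40, 60]
}
```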

DataFusion (Query Engine)

  • A query engine written in Rust and built on Apache Arrow.

  • Gives you SQL, DataFrame APIs, logical plans, physical plans, and an optimizer.

  • Best choice when you want to build an engine-like query layer in your pipeline, not just write DataFrame scripts.
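
The logical plan, optimizer, and execution split described above can be illustrated with a toy in plain Rust. Plan, push_down_filter, execute, and row are hypothetical names; this is not the DataFusion LogicalPlan or optimizer-rule API. The single rule pushes a filter below a projection when the projection keeps the filtered column, a simplified form of predicate pushdown, and main checks that the rewrite changes the plan shape but not the result.

```rust
use std::collections::BTreeMap;

type Row = BTreeMap<String, i64>;

/// A tiny hypothetical logical plan (not DataFusion's LogicalPlan).
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan(Vec<Row>),
    /// Keep only the listed columns.
    Projection { columns: Vec<String>, input: Box<Plan> },
    /// Keep rows where `column > value`.
    Filter { column: String, value: i64, input: Box<Plan> },
}

/// One rule-based rewrite: move a Filter below a Projection when the projection
/// keeps the filtered column, so rows are dropped before columns are copied.
fn push_down_filter(plan: Plan) -> Plan {
    match plan {
        Plan::Filter { column, value, input } => match *input {
            Plan::Projection { columns, input: inner } if columns.contains(&column) => {
                let pushed = Plan::Filter { column, value, input: inner };
                Plan::Projection { columns, input: Box::new(push_down_filter(pushed)) }
            }
            other => Plan::Filter { column, value, input: Box::new(push_down_filter(other)) },
        },
        Plan::Projection { columns, input } => Plan::Projection {
            columns,
            input: Box::new(push_down_filter(*input)),
        },
        scan => scan,
    }
}

/// Evaluate a plan bottom-up (a stand-in for physical execution).
fn execute(plan: &Plan) -> Vec<Row> {
    match plan {
        Plan::Scan(rows) => rows.clone(),
        Plan::Projection { columns, input } => execute(input)
            .into_iter()
            .map(|row| row.into_iter().filter(|(k, _)| columns.contains(k)).collect::<Row>())
            .collect(),
        Plan::Filter { column, value, input } => execute(input)
            .into_iter()
            .filter(|row| row.get(column).map_or(false, |v| v > value))
            .collect(),
    }
}

/// Hypothetical helper to build a two-column row.
fn row(id: i64, amount: i64) -> Row {
    BTreeMap::from([("id".to_string(), id), ("amount".to_string(), amount)])
}

fn main() {
    // Roughly: SELECT amount FROM t WHERE amount > 20, planned as Filter(Projection(Scan)).
    let logical = Plan::Filter {
        column: "amount".into(),
        value: 20,
        input: Box::new(Plan::Projection {
            columns: vec!["amount".into()],
            input: Box::new(Plan::Scan(vec![row(1, 10), row(2, 50), row(3, 30)])),
        }),
    };
    let optimized = push_down_filter(logical.clone());
    // The rewrite changes the plan shape but must not change the result.
    assert_eq!(execute(&logical), execute(&optimized));
    println!("{optimized:?}");
}
```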

| Category        | Polars                          | DataFusion                          |
| --------------- | ------------------------------- | ----------------------------------- |
| Type            | DataFrame library               | Query engine                        |
| Primary API     | DataFrame / LazyFrame           | SQL + DataFrame                     |
| Execution model | Eager + Lazy                    | Always lazy (planned execution)     |
| Focus           | Transformations                 | Query planning + execution          |
| Optimizer       | Built-in (lazy optimizer)       | Explicit rule-based optimizer       |
| Control         | Limited control over planning   | Full control over plans             |
| SQL support     | Limited                         | Strong                              |
| Extensibility   | Medium                          | Very high (custom rules, UDFs)      |
| Use case        | Data wrangling                  | Engine building / query layer       |
| Parallelism     | Automatic                       | Engine-controlled                   |
| Distributed     | No                              | Via Ballista                        |

Demo

git clone https://github.com/gchandra10/rust-polars-csv-dataframe-demo

#medallion #bronze #silver #gold

Ver 2.1.1

Last change: 2026-04-08