⚡ Key Features

📦 Apache Iceberg

ACID transactions, time travel, and schema evolution on S3, with AWS Glue as the catalog

🗺️ Geosquare Grid

Spatial indexing on the Geosquare grid (level 12 = ~60m² cells), used in place of geohash

📂 Dual Aggregation

Direct grid aggregation + catchment area (buffer/isochrone) methods

🏷️ Category Mapping

2000+ raw Overture categories → standardized 3-level hierarchy

🔄 Monthly Updates

Incremental ETL from Overture releases (2026-04-15.0)

📊 Dashboard

Streamlit + PyDeck for interactive visualization

🔄 Data Pipeline Architecture

📡 Overture Maps (S3): GeoParquet format, global coverage
🔄 ETL Pipeline (9 Steps): Python + DuckDB + PyArrow
🗄️ AWS S3 + Glue Catalog: Iceberg tables (poi_raw, poi_master, poi_clean)
📊 Streamlit Dashboard: Interactive POI visualization

⚙️ Configuration (AWS)

# AWS Glue + S3 + Iceberg Configuration
release: "2026-04-15.0"
catalog:
  type: "glue"
  warehouse: "s3://geosquare-warehouse"
  db_name: "geosquare_poi"
aws:
  region: "ap-southeast-3"
projects:
  - name: "geosquare"
    gid_level: 12
    aggregation:
      precisions: [12]
    advanced_aggregation:
      mode: "buffer"
      buffer:
        radius_meters: 500
      catchment:
        provider: "valhalla"
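As a rough sketch of how these settings might be consumed, the dictionary below mirrors the configuration as a Python structure and derives the per-precision statistics table names. The nesting of the keys, the helper name `stats_table_names`, and the idea that tables are qualified by `db_name` are all assumptions made for illustration; only the table name patterns (poi_stats_geosquare12, poi_adv_stats_geosquare12) come from this page.

```python
from typing import Dict, List

# Mirrors the configuration shown above; the nesting is an assumption.
CONFIG = {
    "release": "2026-04-15.0",
    "catalog": {
        "type": "glue",
        "warehouse": "s3://geosquare-warehouse",
        "db_name": "geosquare_poi",
    },
    "aws": {"region": "ap-southeast-3"},
    "projects": [
        {
            "name": "geosquare",
            "gid_level": 12,
            "aggregation": {"precisions": [12]},
            "advanced_aggregation": {
                "mode": "buffer",
                "buffer": {"radius_meters": 500},
                "catchment": {"provider": "valhalla"},
            },
        }
    ],
}


def stats_table_names(config: dict) -> Dict[str, List[str]]:
    """Hypothetical helper: list the stats tables implied by each project."""
    db = config["catalog"]["db_name"]
    tables: Dict[str, List[str]] = {}
    for project in config["projects"]:
        names: List[str] = []
        for precision in project["aggregation"]["precisions"]:
            # Direct grid aggregation table, then the catchment-based one.
            names.append(f"{db}.poi_stats_geosquare{precision}")
            names.append(f"{db}.poi_adv_stats_geosquare{precision}")
        tables[project["name"]] = names
    return tables


if __name__ == "__main__":
    print(stats_table_names(CONFIG))
```

Adding a second value to `precisions` would, under these assumptions, simply yield a second pair of table names for the same project.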

📈 Processing Distribution (chart not included)

🔬 Two Aggregation Methods

Method 1: Direct Grid Aggregation

Simple count per Geosquare grid cell at specified precision level.

  • Level 12: ~60m² cells
  • Fast computation: O(n) in the number of POIs
  • Use case: Overall POI density
  • Table: poi_stats_geosquare12
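The direct method can be illustrated with a minimal sketch: assign each POI to a cell, then count in a single pass. The Geosquare encoding itself is not described on this page, so `cell_id` below is a stand-in that snaps coordinates to a fixed-degree lattice; the cell size `CELL_DEG` and both function names are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, Tuple

CELL_DEG = 0.0005  # stand-in cell size in degrees (assumption, not Geosquare's)


def cell_id(lon: float, lat: float, cell_deg: float = CELL_DEG) -> Tuple[int, int]:
    """Stand-in for a Geosquare encoder: snap a point to a lattice cell."""
    return (math.floor(lon / cell_deg), math.floor(lat / cell_deg))


def count_per_cell(points: Iterable[Tuple[float, float]]) -> Counter:
    """Count POIs per cell in one pass over the points, hence O(n)."""
    return Counter(cell_id(lon, lat) for lon, lat in points)


if __name__ == "__main__":
    pois = [
        (106.80012, -6.20012),
        (106.80013, -6.20014),  # falls in the same cell as the first point
        (106.81020, -6.21020),
    ]
    for cell, count in count_per_cell(pois).items():
        print(cell, count)
```

In the actual pipeline the same grouping would more likely be a GROUP BY over a grid ID column; the sketch only shows why the cost grows linearly with the number of POIs.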

Method 2: Catchment/Isochrone

Calculate service area per POI (buffer or isochrone), then intersect with grid.

  • Buffer: Fixed radius (e.g., 500m)
  • Isochrone: Travel time (e.g., 5 min)
  • Use case: Accessibility analysis
  • Table: poi_adv_stats_geosquare12
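A minimal sketch of the buffer variant, under simplifying assumptions: POIs are already projected to metre coordinates, the grid is a plain square lattice with a hypothetical edge length `CELL_M`, and a cell counts as covered when its centre falls inside the buffer. None of these choices is confirmed by this page. An isochrone would replace the circle test with a polygon returned by a routing provider.

```python
import math
from collections import Counter
from typing import Iterable, Iterator, Tuple

CELL_M = 100.0  # stand-in cell edge in metres (assumption)


def cells_in_buffer(
    x: float, y: float, radius_m: float, cell_m: float = CELL_M
) -> Iterator[Tuple[int, int]]:
    """Yield (col, row) cells whose centre lies within radius_m of (x, y)."""
    # Only cells inside the buffer's bounding box can qualify.
    col_min = math.floor((x - radius_m) / cell_m)
    col_max = math.floor((x + radius_m) / cell_m)
    row_min = math.floor((y - radius_m) / cell_m)
    row_max = math.floor((y + radius_m) / cell_m)
    for col in range(col_min, col_max + 1):
        for row in range(row_min, row_max + 1):
            cx = (col + 0.5) * cell_m
            cy = (row + 0.5) * cell_m
            if math.hypot(cx - x, cy - y) <= radius_m:
                yield (col, row)


def catchment_counts(
    pois: Iterable[Tuple[float, float]], radius_m: float = 500.0
) -> Counter:
    """For each cell, count how many POI buffers reach it."""
    counts: Counter = Counter()
    for x, y in pois:
        counts.update(cells_in_buffer(x, y, radius_m))
    return counts


if __name__ == "__main__":
    reach = catchment_counts([(50.0, 50.0), (150.0, 50.0)], radius_m=120.0)
    for cell, count in sorted(reach.items()):
        print(cell, count)
```

Unlike the direct count, a single POI contributes to every cell its service area touches, so the result reads as "how many POIs are reachable from this cell" rather than "how many POIs sit in this cell".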

📋 Sample ETL Execution

# Run full pipeline (9 steps)
$ python run_etl.py

# Run specific steps
$ python run_etl.py --steps 1,2,3,4

# Run aggregation only
$ python run_etl.py --steps 7

# Run advanced aggregation (isochrone)
$ python run_etl.py --steps 8

# Override config via CLI
$ python run_etl.py --city jakarta --limit 1000

# With custom config
$ python run_etl.py --config config.aws.yaml
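The flags used above could be parsed along the following lines. This is a hypothetical sketch, not the actual contents of run_etl.py: the function names, the default config filename `config.yaml`, and the choice to run all nine steps when `--steps` is omitted are assumptions inferred from the commands shown.

```python
import argparse
from typing import List


def parse_steps(value: str) -> List[int]:
    """Turn "1,2,3,4" into [1, 2, 3, 4], rejecting steps outside 1-9."""
    steps = [int(part) for part in value.split(",") if part.strip()]
    for step in steps:
        if not 1 <= step <= 9:
            raise argparse.ArgumentTypeError(f"step {step} is outside 1-9")
    return steps


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the flags shown in the sample commands."""
    parser = argparse.ArgumentParser(prog="run_etl.py")
    # Assumed default: run every step of the 9-step pipeline.
    parser.add_argument("--steps", type=parse_steps, default=list(range(1, 10)))
    parser.add_argument("--city", default=None)
    parser.add_argument("--limit", type=int, default=None)
    # Assumed default config filename.
    parser.add_argument("--config", default="config.yaml")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.steps, args.city, args.limit, args.config)
```

Validating step numbers at parse time means a typo such as `--steps 17` fails before any data is read.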