⚡ Key Features
📦 Apache Iceberg
ACID transactions, time travel, and schema evolution on S3, with AWS Glue as the catalog
🗺️ Geosquare Grid
Advanced spatial indexing (level 12 = ~60m² grid) replacing geohash
📂 Dual Aggregation
Direct grid aggregation + catchment area (buffer/isochrone) methods
🏷️ Category Mapping
2000+ raw Overture categories → standardized 3-level hierarchy
🔄 Monthly Updates
Incremental ETL from each Overture release (currently 2026-04-15.0)
📊 Dashboard
Streamlit + PyDeck for interactive visualization
🔄 Data Pipeline Architecture
Overture Maps (S3): GeoParquet format, global coverage
→ ETL Pipeline (9 Steps): Python + DuckDB + PyArrow
→ AWS S3 + Glue Catalog: Iceberg tables (poi_raw, poi_master, poi_clean)
→ Streamlit Dashboard: Interactive POI visualization
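As a rough illustration of the first hop in this flow, the sketch below builds the DuckDB SQL that an extract step might run against the Overture places GeoParquet files. The bucket layout follows Overture's published S3 convention; the helper names (`overture_places_path`, `build_extract_query`), the selected columns, and the bounding-box values are assumptions for illustration, not the pipeline's actual code.

```python
from typing import Optional, Tuple

BBox = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax) in degrees


def overture_places_path(release: str) -> str:
    """Return the public S3 glob for the places theme of one Overture release."""
    return (
        "s3://overturemaps-us-west-2/release/"
        f"{release}/theme=places/type=place/*"
    )


def build_extract_query(
    release: str,
    bbox: Optional[BBox] = None,
    limit: Optional[int] = None,
) -> str:
    """Build a DuckDB query string (hypothetical helper, not from the project).

    The column list is an assumption about the Overture places schema.
    """
    path = overture_places_path(release)
    sql = (
        "SELECT id, names.primary AS name, geometry "
        f"FROM read_parquet('{path}', hive_partitioning=1)"
    )
    if bbox is not None:
        xmin, ymin, xmax, ymax = (float(v) for v in bbox)
        # Filter on the bbox struct so DuckDB can skip non-matching row groups.
        sql += (
            f" WHERE bbox.xmin BETWEEN {xmin} AND {xmax}"
            f" AND bbox.ymin BETWEEN {ymin} AND {ymax}"
        )
    if limit is not None:
        sql += f" LIMIT {int(limit)}"
    return sql


# Illustrative values: a loose box around Jakarta and a 1000-row cap.
query = build_extract_query("2026-04-15.0", bbox=(106.6, -6.4, 107.0, -6.0), limit=1000)
```

Running the query would additionally need DuckDB with its `httpfs` extension loaded and the S3 region set to `us-west-2`; that part is left out so the sketch stays runnable offline.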
⚙️ Configuration (AWS)
# AWS Glue + S3 + Iceberg Configuration
release: "2026-04-15.0"
catalog:
  type: "glue"
  warehouse: "s3://geosquare-warehouse"
  db_name: "geosquare_poi"
aws:
  region: "ap-southeast-3"
projects:
  - name: "geosquare"
    gid_level: 12
    aggregation:
      precisions: [12]
    advanced_aggregation:
      mode: "buffer"
      buffer:
        radius_meters: 500
      catchment:
        provider: "valhalla"
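A config like this is easier to trust if the aggregation settings are checked before any step runs. The sketch below validates an already-parsed `advanced_aggregation` mapping, wherever it sits in the file. The `AdvancedAggregation` class and `parse_advanced_aggregation` function are hypothetical, and the `"isochrone"` mode literal is an assumption: the config above only shows `"buffer"`.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional

# "buffer" appears in the config; "isochrone" is an assumed literal.
VALID_MODES = {"buffer", "isochrone"}


@dataclass(frozen=True)
class AdvancedAggregation:
    mode: str
    radius_meters: Optional[float]
    provider: Optional[str]


def parse_advanced_aggregation(section: Mapping[str, Any]) -> AdvancedAggregation:
    """Validate the advanced_aggregation mapping (hypothetical helper)."""
    mode = section.get("mode")
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}, got {mode!r}")

    # Sub-sections may be missing or empty, so fall back to an empty dict.
    radius = (section.get("buffer") or {}).get("radius_meters")
    provider = (section.get("catchment") or {}).get("provider")

    if mode == "buffer" and (radius is None or radius <= 0):
        raise ValueError("buffer mode needs a positive buffer.radius_meters")
    if mode == "isochrone" and not provider:
        raise ValueError("isochrone mode needs catchment.provider")

    return AdvancedAggregation(mode=mode, radius_meters=radius, provider=provider)


settings = parse_advanced_aggregation(
    {
        "mode": "buffer",
        "buffer": {"radius_meters": 500},
        "catchment": {"provider": "valhalla"},
    }
)
```

Failing early on a bad mode or a missing radius is cheaper than discovering it after the earlier ETL steps have already run.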
🔬 Two Aggregation Methods
Method 1: Direct Grid Aggregation
Counts POIs per Geosquare grid cell at the configured precision level.
- Level 12: ~60m² cells
- Fast computation: O(n) in the number of POIs
- Use case: Overall POI density
- Table: poi_stats_geosquare12
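The direct method can be sketched as a single pass over the POIs with a counter keyed by cell. The `cell_id` function here is a hypothetical stand-in that snaps points to a regular lon/lat lattice; it is not the Geosquare encoding, and `cell_deg` is an illustrative parameter rather than a Geosquare level.

```python
import math
from collections import Counter
from typing import Iterable, Tuple

Cell = Tuple[int, int]


def cell_id(lon: float, lat: float, cell_deg: float = 0.001) -> Cell:
    """Stand-in grid key: integer lattice indices, NOT a Geosquare GID."""
    return (math.floor(lon / cell_deg), math.floor(lat / cell_deg))


def aggregate_direct(
    pois: Iterable[Tuple[float, float]], cell_deg: float = 0.001
) -> Counter:
    """Count POIs per cell in one pass, so the cost is O(n) in the POI count."""
    counts: Counter = Counter()
    for lon, lat in pois:
        counts[cell_id(lon, lat, cell_deg)] += 1
    return counts


# Two points fall in the same cell, the third in another.
direct_counts = aggregate_direct(
    [(106.8001, -6.2001), (106.8004, -6.2003), (106.8501, -6.2001)]
)
```

In a DuckDB-based pipeline the same idea would more likely be a `GROUP BY` on a precomputed cell column; the loop above only makes the O(n) shape explicit.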
Method 2: Catchment/Isochrone
Calculates a service area per POI (buffer or isochrone), then intersects it with the grid.
- Buffer: Fixed radius (e.g., 500m)
- Isochrone: Travel time (e.g., 5 min)
- Use case: Accessibility analysis
- Table: poi_adv_stats_geosquare12
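For the buffer variant, one way to picture the catchment step is to credit every grid cell whose center lies within the radius of a POI. The sketch below does that with a haversine distance check. It approximates a true polygon intersection with a center-in-circle test, uses the same hypothetical lon/lat lattice in place of Geosquare cells, and does not cover the isochrone case, which would need a routing service such as Valhalla.

```python
import math
from collections import Counter
from typing import Iterable, Iterator, Tuple

EARTH_RADIUS_M = 6_371_000.0
Cell = Tuple[int, int]


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in meters between two lon/lat points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def cells_in_buffer(
    lon: float, lat: float, radius_m: float, cell_deg: float
) -> Iterator[Cell]:
    """Yield lattice cells whose center is within radius_m of the POI."""
    # Convert the radius to a degree window, then test each candidate cell.
    dlat = math.degrees(radius_m / EARTH_RADIUS_M)
    dlon = dlat / max(math.cos(math.radians(lat)), 1e-9)
    ix_min = math.floor((lon - dlon) / cell_deg)
    ix_max = math.floor((lon + dlon) / cell_deg)
    iy_min = math.floor((lat - dlat) / cell_deg)
    iy_max = math.floor((lat + dlat) / cell_deg)
    for ix in range(ix_min, ix_max + 1):
        for iy in range(iy_min, iy_max + 1):
            c_lon = (ix + 0.5) * cell_deg
            c_lat = (iy + 0.5) * cell_deg
            if haversine_m(lon, lat, c_lon, c_lat) <= radius_m:
                yield (ix, iy)


def aggregate_catchment(
    pois: Iterable[Tuple[float, float]],
    radius_m: float = 500.0,
    cell_deg: float = 0.001,
) -> Counter:
    """Per cell, count how many POI buffers reach it."""
    counts: Counter = Counter()
    for lon, lat in pois:
        counts.update(cells_in_buffer(lon, lat, radius_m, cell_deg))
    return counts


catchment_counts = aggregate_catchment([(106.8005, -6.2005)], radius_m=500.0)
```

Unlike the direct method, the cost here grows with the number of cells each buffer covers, so a larger radius or a finer grid makes this step noticeably heavier.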
📋 Sample ETL Execution
# Run full pipeline (9 steps)
$ python run_etl.py
# Run specific steps
$ python run_etl.py --steps 1,2,3,4
# Run aggregation only
$ python run_etl.py --steps 7
# Run advanced aggregation (isochrone)
$ python run_etl.py --steps 8
# Override config via CLI
$ python run_etl.py --city jakarta --limit 1000
# With custom config
$ python run_etl.py --config config.aws.yaml
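The flags above suggest a fairly conventional command-line interface. The sketch below shows how such flags could be parsed with `argparse`; it is an assumption about how `run_etl.py` might handle them, not its actual source, and the 1 to 9 step range is inferred from the "9 steps" mentioned above.

```python
import argparse
from typing import List

TOTAL_STEPS = 9  # inferred from "full pipeline (9 steps)"


def parse_steps(value: str) -> List[int]:
    """Turn a string like '1,2,3,4' into a validated list of step numbers."""
    steps = [int(part) for part in value.split(",") if part.strip()]
    for step in steps:
        if not 1 <= step <= TOTAL_STEPS:
            raise argparse.ArgumentTypeError(f"step {step} is outside 1..{TOTAL_STEPS}")
    return steps


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags shown in the commands above."""
    parser = argparse.ArgumentParser(prog="run_etl.py")
    parser.add_argument(
        "--steps",
        type=parse_steps,
        default=list(range(1, TOTAL_STEPS + 1)),
        help="comma-separated step numbers; defaults to all steps",
    )
    parser.add_argument("--city", help="override the configured city")
    parser.add_argument("--limit", type=int, help="cap the number of POIs processed")
    parser.add_argument("--config", help="path to a YAML config file")
    return parser


args = build_parser().parse_args(["--steps", "1,2,3,4", "--city", "jakarta", "--limit", "1000"])
```

Keeping the step list as plain integers makes it simple for a driver loop to skip any step not requested.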