Data Pipeline Flow (ETL)
Raw APIs
↓
Validate
↓
Cleanse
↓
Transform
↓
Aggregate
↓
Warehouse
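The stage chain above can be sketched in plain Python. This is a minimal, hypothetical illustration, not the pipeline's actual code. The record fields `region` and `value`, the cleansing rules, and the in-memory `warehouse` dict that stands in for a real warehouse load are all assumptions.

```python
from collections import defaultdict

# Hypothetical raw records as they might arrive from the source APIs;
# the field names "region" and "value" are assumptions for illustration.
REQUIRED_FIELDS = ("region", "value")


def validate(records):
    """Keep only records that carry every required field with a non-empty value."""
    return [r for r in records if all(r.get(f) not in (None, "") for f in REQUIRED_FIELDS)]


def cleanse(records):
    """Normalize text: trim whitespace and title-case region names."""
    return [
        {**r, "region": r["region"].strip().title(), "value": str(r["value"]).strip()}
        for r in records
    ]


def transform(records):
    """Cast values to float, dropping records whose value is not numeric."""
    out = []
    for r in records:
        try:
            out.append({**r, "value": float(r["value"])})
        except ValueError:
            continue
    return out


def aggregate(records):
    """Sum values per region."""
    totals = defaultdict(float)
    for r in records:
        totals[r["region"]] += r["value"]
    return dict(totals)


def run_pipeline(raw, warehouse):
    """Validate -> Cleanse -> Transform -> Aggregate, then 'load' into an in-memory warehouse."""
    result = aggregate(transform(cleanse(validate(raw))))
    warehouse.update(result)
    return result


raw = [
    {"region": " jawa barat ", "value": "10"},
    {"region": "Jawa Barat", "value": "5.5"},
    {"region": "bali", "value": "n/a"},  # dropped in transform (not numeric)
    {"region": "", "value": "3"},        # dropped in validate (empty region)
    {"region": "Bali", "value": 2},
]
warehouse = {}
print(run_pipeline(raw, warehouse))  # {'Jawa Barat': 15.5, 'Bali': 2.0}
```

Keeping each stage as a separate function mirrors the diagram and lets each step be tested in isolation; in a Spark deployment the same steps would typically become DataFrame transformations instead.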
Processing Distribution by Region
Data Sources Configuration
```python
# Apache Spark (PySpark) pipeline
# SparkSession lives in pyspark.sql, not in the top-level pyspark package
from pyspark.sql import SparkSession

# Data Sources Configuration: endpoint, payload format, and record count per source
DATA_SOURCES = {
    "indonesia_population": {
        "url": "bps.go.id/api/v1",
        "format": "JSON",
        "records": 2_400_000,
    },
    "seasia_weather": {
        "url": "open-meteo.com/v1",
        "format": "GeoJSON",
        "records": 1_200_000,
    },
    "osm_indonesia": {
        "url": "overpass-api.de",
        "format": "XML",
        "records": 4_500_000,
    },
}

# Spark Pipeline Execution
# Create (or reuse) a session; 200 shuffle partitions is also Spark's default
spark = (
    SparkSession.builder
    .appName("IndonesiaBigData")
    .config("spark.sql.shuffle.partitions", 200)
    .getOrCreate()
)
```
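The configuration above only declares the sources. A minimal sketch, assuming you want to sanity-check it before wiring it into Spark, is to confirm that every declared format has a parser and to order sources by size. The `PARSERS` table and the `ingestion_plan` function are hypothetical helpers built on the standard library only; GeoJSON is JSON on the wire, so it reuses the JSON parser.

```python
import json
import xml.etree.ElementTree as ET

# Copy of the DATA_SOURCES mapping above so this sketch runs on its own.
DATA_SOURCES = {
    "indonesia_population": {"url": "bps.go.id/api/v1", "format": "JSON", "records": 2_400_000},
    "seasia_weather": {"url": "open-meteo.com/v1", "format": "GeoJSON", "records": 1_200_000},
    "osm_indonesia": {"url": "overpass-api.de", "format": "XML", "records": 4_500_000},
}

# Hypothetical format-to-parser table (standard library only).
PARSERS = {
    "JSON": json.loads,
    "GeoJSON": json.loads,  # GeoJSON payloads are ordinary JSON documents
    "XML": ET.fromstring,
}


def ingestion_plan(sources):
    """Return (name, format, records) tuples, largest source first.

    Raises ValueError if a source declares a format with no parser.
    """
    plan = []
    for name, cfg in sources.items():
        if cfg["format"] not in PARSERS:
            raise ValueError(f"no parser for format {cfg['format']!r} ({name})")
        plan.append((name, cfg["format"], cfg["records"]))
    return sorted(plan, key=lambda item: item[2], reverse=True)


plan = ingestion_plan(DATA_SOURCES)
total = sum(records for _, _, records in plan)
for name, fmt, records in plan:
    print(f"{name:<22} {fmt:<8} {records:>10,}")
print(f"{'total':<22} {'':<8} {total:>10,}")
```

Failing fast on an unknown format keeps a misconfigured source from surfacing only after a long Spark job has started.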
Live Data Points - Indonesia
Recent Processing Jobs
| Job ID | Source | Region | Records | Duration | Status |
|---|---|---|---|---|---|
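Each row of the jobs table can be modeled as a small record type. The sketch below is hypothetical: the `ProcessingJob` class, its field names, measuring duration in seconds, and the example job values are all illustrative assumptions, not data from the dashboard.

```python
from dataclasses import dataclass


# Hypothetical record type mirroring the columns of the Recent Processing Jobs table.
@dataclass
class ProcessingJob:
    job_id: str
    source: str
    region: str
    records: int
    duration_s: float  # assumption: duration tracked in seconds
    status: str

    def to_row(self) -> str:
        """Render the job as one Markdown table row, in the table's column order."""
        cells = [
            self.job_id,
            self.source,
            self.region,
            f"{self.records:,}",
            f"{self.duration_s:.1f} s",
            self.status,
        ]
        return "| " + " | ".join(cells) + " |"


# Illustrative values only.
job = ProcessingJob("job-001", "osm_indonesia", "Bali", 12_000, 4.5, "SUCCESS")
print(job.to_row())  # | job-001 | osm_indonesia | Bali | 12,000 | 4.5 s | SUCCESS |
```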