A forensic walkthrough of Paul Dockery's Bangladesh model — every data source, every script, every output, and how it stitches together. Built for Karim, May 2026.
PyPSA-Earth is a global, open-source energy-system optimisation framework built on top of PyPSA (Python for Power System Analysis). It reduces the question “how should this country decarbonise?” to a single Linear Program:
It's capacity expansion + dispatch at the same time. The optimiser doesn't just pick what to build — it co-optimises where on the grid, how often each plant runs, and how electricity / heat / gas / hydrogen flow between zones.
The whole workflow is a Snakemake pipeline (a Python DSL like a Makefile). Each rule = one Python script with declared inputs/outputs. Snakemake figures out execution order from the DAG.
An in-memory graph of buses, lines, links, generators, loads, storage_units, stores. Saved as NetCDF (.nc). Every script either builds or augments this object.
An ERA5 NetCDF block of every weather variable (wind, GHI, temperature, runoff) for the country bounding box × the simulation year. Renewables potentials are derived from this.
Linopy formulates the LP. The solver (Gurobi here; HiGHS as fallback) finds the optimum. Solution is written back into p_nom_opt, dispatch timeseries, etc.
Vanilla PyPSA-Earth was electricity-only. The sector-coupled variant adds, as additional carriers and links on the same graph:
So a single LP simultaneously decides whether to electrify a steel plant, whether to build a heat pump or keep a gas boiler, and whether to import LNG or build an electrolyser.
The Snakemake DAG runs in three stages. Each box below is a script in pypsa-earth/scripts/. Arrows show what feeds what.
networks/base.nc — empty PyPSA graph with only buses + lineselec.nc — base + powerplants + RE profiles + loads + costselec_s.nc — collapse parallel lines, dissolve transformers, keep just ACelec_s_{N}.nc — k-means or GADM-based clustering down to N buses (8 here, 10 in pre-built BD)elec_s_{N}_ec.nc — battery + H₂ stores/links per buselec_s_{N}_ec_l{ll}_{opts}.nc — applies CO₂ cap, line-volume cap, time aggregation (e.g. Co2L-3h)results/networks/... — electricity-only solved LP. First decision point: does the elec grid even balance under the cap?urban_percent.csv)prenetworks/... — adds heat / transport / industry / H₂ buses+links to the clustered electricity networkh2export: [0])postnetworks/...nc — solves the LP for that horizon's prenetwork; output becomes the brownfield input for the next horizon| Wildcard | Meaning | This run |
|---|---|---|
{simpl} | Simplification level (HAC merging) | "" (none) |
{clusters} | Number of buses after clustering | 8 (GADM-1) |
{ll} | Line-volume cap: copt=optimum, v1.0=fixed at base | copt |
{opts} | Constraint string (e.g. Co2L-24H) | Co2L-24H |
{sopts} | Sector-coupled time resolution | 24h |
{planning_horizons} | Which year is being solved | 2030, 2040, 2050 |
{discountrate} | Real discount rate | 0.071 |
{demand} | SSP demand scenario | AB (SSP2 moderate) |
{h2export} | H₂ export demand (TWh) | 0 |
So the canonical filename of a final solved BD network is:
results/Case2_NetZero2050/postnetworks/elec_s_8_ec_lcopt_Co2L-24H_24h_2050_0.071_AB_0export.nc
Every external dataset PyPSA-Earth touches, what it's used for, and where it lives in our copy.
| Source | What & why | File(s) |
|---|---|---|
| GADM 4.1 | Country + division polygons. The clustering uses GADM-1 (8 BD divisions) instead of k-means. | data/3_bangladesh_data/gadm/gadm41_BGD/gadm41_BGD.gpkg |
| GEBCO 2025 | Global bathymetry — used to mask offshore wind (max_depth: 50 m). | data/2_global_geodata/gebco/GEBCO_2025_sub_ice.nc |
| HydroBASINS lvl 6 | Watershed polygons — feed Atlite's hydro inflow model (flowspeed 1 m/s). | data/2_global_geodata/hydrobasins/hybas_*_lev06_v1c.shp |
| Copernicus PROBAV LC100 | Global land-cover raster (100 m). Excludes urban (50), water (80) etc. for RE potentials. | data/2_global_geodata/copernicus/PROBAV_LC100_global_v3.0.1_2019-nrt_*.tif |
| Marine Regions EEZ v11 | Exclusive Economic Zones — bounds offshore wind buildable area. | data/2_global_geodata/eez/eez_v11.gpkg |
| Natura 2000 | Protected-area mask. Included tif (natura: true) excludes RE in those cells. | data/2_global_geodata/natura/natura.tiff |
| WorldPop 2020 1 km (UN-adj, constrained) | Population raster — underpins the urban/rural split, demand allocation, GDP layouts. | data/3_bangladesh_data/WorldPop/bgd_ppp_2020_UNadj_constrained.tif |
| OSM (BD planet extract) | HV grid skeleton (≥51 kV substations, lines, cables) and OSM-listed power plants. | data/3_bangladesh_data/osm_pbf/bangladesh-latest.osm.pbf |
| OSM Power JSON | Pre-extracted BD power infrastructure (substations, lines, generators). | data/3_bangladesh_data/osm_power/BD_power.json |
| Microsoft Global ML Buildings | Building footprints — used by solar_rooftop (currently disabled — IncompleteRead errors). | data/3_bangladesh_data/global_buildings/ |
| ERA5 (Copernicus CDS) | Hourly weather cutout (wind, GHI, T, runoff) — the entire RE potentials story. | missing — must rebuild via CDS API or get from Paul |
| SSP scenarios | Per-cell load projections under SSP1–5 (we use SSP2-2.6). | data/2_global_geodata/ssp2-2.6/{2030,2040,2050,2100}/ |
| Source | What & why | File(s) |
|---|---|---|
| UN Statistics Division Energy DB | National-level energy balances by carrier and use; basis for sectoral demand split. | data/4_reference_csvs/demand/unsd/data/UNdata_Export_*.txt (double-extracted at two timestamps in the zip — use the newer one) |
| IEA Bangladesh portal | National generation by source, TES, low-carbon share. | data/IEA Energy Supply by Source/*.csv |
| IRENA capacity stats | Reference renewable installed capacity by country (used by estimate_renewable_capacities, year 2023). | fetched at runtime via powerplantmatching |
| technology-data v0.13.2 (DEA / Fraunhofer / NREL) | CAPEX, FOM, VOM, lifetime, efficiency, fuel cost for every tech, per year. | resources/costs_2030.csv, costs_2040.csv, ... generated from raw.githubusercontent.com/PyPSA/technology-data/v0.13.2 — but disabled here (retrieve_cost_data: false) in favour of BD-specific costs. |
| BPDB / IEPMP 2023 | Bangladesh-specific cost data feeding the LP — overrides the DEA defaults. Mentioned in config comments. | data/costs_*_bd.csv (referenced; ship as resources/costs_{year}.csv) |
| IEA Hydro annual generation | Used to rescale Atlite's hydro inflow to match observed annual energy. | data/4_reference_csvs/eia_hydro_annual_generation.csv |
| Custom industrial database | Coordinates & capacity of cement, steel, paper, aluminium plants — input to industrial distribution key. | data/4_reference_csvs/industrial_database.csv (partial BD coverage) |
| SFI cement / pulp&paper databases | Source data for the industrial database. | data/4_reference_csvs/industry/SFI-Global-Cement-Database-July-2021.xlsx + SFI_ALD_Pulp_Paper_Sample_LatAm_Jan_2023.xlsx |
| Aluminium production | Annual primary Al demand (year 2019 here). | data/4_reference_csvs/AL_production.csv |
| BDEW heat-load profile (Germany) | Standard hourly heat-demand shape, scaled to local annual totals. ⚠ German shape — see caveats. | data/4_reference_csvs/heat_load_profile_BDEW.csv |
| Hydro capacities | EIA dataset — country totals for hydro / PHS power and energy. | data/4_reference_csvs/hydro_capacities.csv |
| European car ownership (e-mobility) | Per-capita car-stock used to scale transport demand. | data/4_reference_csvs/emobility/European_countries_car_ownership.csv |
| Custom powerplants (Nahid & Roy 2025) | BPDB-validated list of operating BD powerplants with retrofit/retire dates — replaces the auto-fetched OSM2PM list (custom_powerplants: replace). | data/custom_powerplants.csv |
| SSP CAGR tables | Per-sector growth + efficiency-gain compound-annual rates used to project demand to 2030/40/50. | data/4_reference_csvs/demand/{efficiency_gains_cagr.csv, growth_factors_cagr.csv, industry_growth_cagr.csv, fuel_shares.csv} ⚠ no BD row — falls back to “default” which is Morocco's curve. |
| File | Purpose |
|---|---|
configs/config.yaml | The run config — overrides config.default.yaml. This is the only file that matters day-to-day. |
configs/config.default.yaml | Upstream defaults (every key documented here). |
configs/bundle_config.yaml | Lists data bundles + Zenodo/Drive URLs for retrieve_databundle. Disabled here. |
configs/regions_definition_config.yaml | OSM column types + ISO-2/world region map. |
configs/powerplantmatching_config.yaml | Overrides for the powerplantmatching package (custom replace policy). |
pdockery/pypsa-earth sits two commits ahead of upstream pypsa-meets-earth/pypsa-earth main:
fe4ee35 Add solved Bangladesh energy system network results 2f77da87 Update config.default.yaml with Bangladesh configuration
The BD-specific decisions in our team's configs/config.yaml on top:
["BD"]
snapshots2013-01-01 → 2014-01-01 (left-inclusive)
weather year2013 (ERA5)
foresightmyopic (sequential per horizon)
planning_horizons[2030, 2040, 2050]
opts / soptsCo2L-24H + 24h → 365 daily snapshots/horizon
clusters8 = GADM-1 divisions (alternative_clustering: true)
co2_budget:
enable: true
override_co2opt: true
co2base_value: co2limit
year:
2030: 0.9 # 10% cut → 87.3 MtCO2
2040: 0.45 # 55% cut → 43.7 MtCO2
2050: 0.05 # 95% cut → 4.85 MtCO2
The fractions multiply electricity.co2base = 9.7 × 10⁷ tCO₂/yr (full-economy BD baseline). With override_co2opt: true, the rule rewrites the Co2L wildcard from opts using these fractions per horizon.
| On | Off | Why off |
|---|---|---|
heat, biomass, industry, land_transport, residential, services, agriculture |
shipping, aviation, rail_transport |
<0.3 % of national emissions; cause NaN issues for non-port nodes |
| On | Off | Why off |
|---|---|---|
tes, chp, solar_thermal, boilers, biomass_transport, dac, cc | v2g, bev_dsm, solar_rooftop, electricity_distribution_grid, home_battery, methanation, fischer_tropsch, SMR, SMR CC, helmeth, oil_boilers, micro_chp | Phase 1 — minimise spatial-link explosion; solar_rooftop has flaky download |
gas.spatial_gas: true, oil.spatial_oil: true, coal.spatial_coal: true | gas.network, hydrogen.network, co2_network, lignite.spatial_lignite | BD has no GGIT pipeline data; spatial H₂/CO₂ networks add huge link counts |
sector.industry_decarbonisation: coal_to_gas: true # gas substitutes industrial coal coal_to_elec: true # electric process heat replaces coal oil_to_gas: true oil_to_elec: true
When enabled, industrial coal/oil emissions become optimisable links rather than fixed loads — the LP gets to fuel-switch on the cost curve. This block is not in the public pdockery/pypsa-earth tree — it lives on an unpushed branch.
replace (use the BPDB-validated list)
powerplants_filter(DateOut ≥ 2022 or DateOut != DateOut) — drops plants already retired
allow_early_retirementtrue — optimiser can retire a unit before end-of-life if the LP justifies it
keep_existing_capacitiestrue — 2030 starts from the actual ~38 GW fleet, not a blank slate
estimate_renewable_capacities.statsirena, year 2023, p_nom_min: 1 (lock current installed RE in)
BD cost data spans a 6 × 10⁹ range (very small marginal costs vs. very large investments). Default Gurobi diverges. The config switches to a custom gurobi-numeric-focus profile:
NumericFocus: 3 # stability over speed ScaleFlag: 2 # aggressive matrix scaling BarHomogeneous: 1 # fall back to homogeneous barrier DualReductions: 0 # don't claim "infeasible_or_unbounded" on H2 export bus BarConvTol: 1e-4 # relaxed tolerances FeasibilityTol: 1e-4 OptimalityTol: 1e-4
What the optimiser is actually computing, in plain language.
| Variable family | Per | Bounds |
|---|---|---|
Generator.p_nom (capacity) | tech × bus | [p_nom_min, ∞] (or fixed if extendable: false) |
Generator.p (dispatch) | tech × bus × snapshot | [0, p_max_pu × p_nom] |
StorageUnit.p_dispatch / p_store / state_of_charge | bus × snapshot | storage capacity, max_hours |
Store.e (energy state) | bus × snapshot | 0 ≤ e ≤ e_nom |
Line.s_nom | line | fixed at base × line-volume cap (here: copt = optimum subject to global cap) |
Link.p_nom + Link.p | link × snapshot | e.g. heat pumps, electrolysers, fuel cells |
p ≤ p_max_pu(t) × p_nom (RE) or p ≤ availability × p_nom (thermal)SoC(t) = η · SoC(t-1) + p_charge − p_dischargeco2limit × fraction(year)length × s_nom ≤ line_volume_capmin Σ_techs ( annuity(CAPEX) × p_nom + FOM × p_nom ) + Σ_techs,t ( marginal_cost(t) × p(t) ) + Σ_lines ( line CAPEX × s_nom ) + Σ_co2 ( emission_price × emissions )
“Annuity” converts CAPEX into a per-year cost using the discount rate (0.071) and tech lifetime. So we're minimising annual system cost in 2030 / 2040 / 2050, separately, with each horizon's solution feeding the next as fixed brownfield.
2030 horizon: - existing fleet (~38 GW, year ≥2022) → fixed - extendable: solar, onwind, OCGT, CCGT, batteries, H2… - CO2 cap = 0.9 × 9.7e7 = 87.3 Mt - SOLVE → p_nom_opt[2030] 2040 horizon: - p_nom_opt[2030] is now BROWNFIELD (fixed) ← add_brownfield - retire what aged out - new build optimised on top - CO2 cap = 0.45 × 9.7e7 = 43.7 Mt - SOLVE → p_nom_opt[2040] 2050 horizon: - p_nom_opt[2030] + p_nom_opt[2040] BROWNFIELD - CO2 cap = 0.05 × 9.7e7 = 4.85 Mt - SOLVE → p_nom_opt[2050]
Myopic ≠ optimal-overall. The 2030 LP doesn't know 2050's cap is 95 % cut — it only sees its own 10 % cap. So you can get short-sighted gas build-out in 2030 that becomes stranded in 2050. That's by design; it mimics how real planners decide. Perfect-foresight (Case3, future) would avoid it.
| File | What's in it |
|---|---|
postnetworks/elec_s_8_ec_lcopt_Co2L-24H_24h_2030_0.071_AB_0export.nc | Solved network for 2030 — capacities, dispatch, emissions, prices |
postnetworks/...2040...nc | 2040 (with 2030 as brownfield) |
postnetworks/...2050...nc | 2050 (with 2030+2040 as brownfield) |
summaries/elec_s_8...csv | Costs / capacities / energy / emissions tabulated across horizons |
maps/...pdf | Geographic plots — capacity by node, line flows, costs |
graphs/{costs,energy,balances-energy}.pdf | Stacked-bar timelines |
import pypsa
n = pypsa.Network("results/Case2_NetZero2050/postnetworks/...2050_0.071_AB_0export.nc")
# Capacities
n.generators.groupby("carrier").p_nom_opt.sum() / 1e3 # GW per tech, year 2050
n.links.groupby("carrier").p_nom_opt.sum() / 1e3 # heat pumps, electrolysers…
n.stores.groupby("carrier").e_nom_opt.sum() / 1e6 # TWh of H2 / battery storage
# Dispatch
n.generators_t.p.sum() * n.snapshot_weightings.objective.iloc[0] / 1e6 # TWh/year
n.statistics.energy_balance().round() # full sectoral balance
# Costs
n.statistics.capex().round() # €/yr CAPEX by tech
n.statistics.opex().round() # €/yr OPEX by tech
n.statistics.installed_capex().round() # one-off €
# Emissions
n.global_constraints # CO2 cap + shadow price
n.generators_t.p.mul(n.generators.carrier.map(n.carriers.co2_emissions))
With alternative_clustering: true + gadm_layer_id: 1, the 8 buses are the GADM-1 divisions:
| NAME_1 | Population (M) | Modelled demand share |
|---|---|---|
| Dhaka | ~45 | 26 % |
| Chittagong | ~32 | 19 % |
| Rajshahi | ~20 | 14 % |
| Rangpur | ~17 | 10 % |
| Khulna | ~16 | 17 % |
| Mymensingh | ~12 | (0 % — clustering artefact, see caveats) |
| Sylhet | ~12 | 5 % |
| Barisal | ~8 | 10 % |
Run on pypsa-bd/networks/elec_s_10*.nc (Paul's pre-built BD network, 10 buses, 2013 hourly).
Source scripts in karim_outputs/scripts/.
DateOut (≤2025). Replacement window opens immediately.efficiency_gains_cagr.csv, growth_factors_cagr.csv, industry_growth_cagr.csv, fuel_shares.csv. The loader silently falls back to the row labelled default, which is parameterised on Morocco. This means BD's residential heat demand grows on a Moroccan curve unless we add a BD-row override.
heat_load_profile_BDEW.csv is a German residential heat-demand shape (BDEW H0). BD has effectively no space-heat — “residential heat” here is mostly cooking/water. The BDEW shape will misrepresent both magnitude and timing. Worth a sensitivity.
data/4_reference_csvs/costs.csv is 2010–2013 vintage. The config disables retrieve_cost_data and points at BD-specific costs — but those need verification (IEPMP 2023 / BPDB sources). Renewable CAPEX in particular has dropped >60 % since 2013.
industrial_database.csv has partial BD coverage. The pulp/paper and cement files in data/4_reference_csvs/industry/ are LatAm samples, not BD. Industrial heat demand is therefore dispatched against an under-specified node-level industrial layout.
Data/1_cutouts (see ___All_Errors.txt). Without it, build_renewable_profiles, build_temperature_profiles, build_solar_thermal_profiles, and build_heat_demand all fail. Two fixes: (a) get a Copernicus CDS API key and run build_cutout (~30 min for BD bbox × 2013); (b) request the file from Paul.
industry_decarbonisation, co2_budget.override_co2opt, the Case2_NetZero2050 resource folder, and allow_early_retirement all appear in our team's config.yaml but none are in the git history of pdockery/pypsa-earth on GitHub. They live on an unpushed working branch. Action: ask Paul for the branch + push it.
elec_s_10 network has 10 PyPSA buses, but they don't centroidally fall inside all 8 GADM-1 polygons. Mymensingh has no PyPSA bus → zero demand allocated. If that bus mapping survives into the GADM-1 (8-cluster) run, equity analysis will mis-attribute. Verify after the GADM-1 run is solved.
download_global_buildings keeps failing with IncompleteRead timeouts → solar_rooftop: false. So distributed PV potential is excluded from the LP. Given Bangladesh's million-solar-home-systems history, this is a meaningful omission for both equity and security narratives.
Generated for the FalgunXBangladesh team, May 2026.
Source repo: /Users/karimarnous/FalgunXBangladesh.
PyPSA-Earth fork: github.com/pdockery/pypsa-earth.
PyPSA-BD analysis bundle: github.com/pdockery/pypsa-bd.
Karim outputs: /Users/karimarnous/FalgunXBangladesh/karim_outputs/.