PyPSA-Earth, end-to-end

A forensic walkthrough of Paul Dockery's Bangladesh model — every data source, every script, every output, and how it stitches together. Built for Karim, May 2026.

Repo: pdockery/pypsa-earth Repo: pdockery/pypsa-bd Sector-coupled v0.8.0 lineage Custom branch unmerged

Contents
  1. What PyPSA-Earth actually is
  2. The pipeline, rule by rule
  3. Data sources — the complete inventory
  4. Bangladesh customisations (Paul's diff)
  5. Inside the Case2_NetZero2050 run
  6. Outputs — what comes out, how to read it
  7. Karim outputs — first results off the network
  8. Data caveats & open issues

1 · What PyPSA-Earth actually is

PyPSA-Earth is a global, open-source energy-system optimisation framework built on top of PyPSA (Python for Power System Analysis). It reduces the question “how should this country decarbonise?” to a single Linear Program:

Core question: what investment + dispatch decisions across generation, storage, transmission, sectors, and fuel switching minimise total system cost over a planning horizon, subject to (i) every demand timeseries being met every hour, (ii) physical laws on the network, and (iii) a CO₂ cap?

It's capacity expansion + dispatch at the same time. The optimiser doesn't just pick what to build — it co-optimises where on the grid, how often each plant runs, and how electricity / heat / gas / hydrogen flow between zones.

The four building blocks

① Snakemake DAG

The whole workflow is a Snakemake pipeline (a Python DSL like a Makefile). Each rule = one Python script with declared inputs/outputs. Snakemake figures out execution order from the DAG.

② PyPSA network object

An in-memory graph of buses, lines, links, generators, loads, storage_units, stores. Saved as NetCDF (.nc). Every script either builds or augments this object.

③ Atlite weather cutout

An ERA5 NetCDF block of every weather variable (wind, GHI, temperature, runoff) for the country bounding box × the simulation year. Renewables potentials are derived from this.

④ Linopy + a solver

Linopy formulates the LP. The solver (Gurobi here; HiGHS as fallback) finds the optimum. Solution is written back into p_nom_opt, dispatch timeseries, etc.

What “sector-coupled” means here

Vanilla PyPSA-Earth was electricity-only. The sector-coupled variant adds, as additional carriers and links on the same graph:

So a single LP simultaneously decides whether to electrify a steel plant, whether to build a heat pump or keep a gas boiler, and whether to import LNG or build an electrolyser.

2 · The pipeline, rule by rule

The Snakemake DAG runs in three stages. Each box below is a script in pypsa-earth/scripts/. Arrows show what feeds what.

Stage A — Build the electricity network
download_osm_data raw OSM cables/lines/substations/generators for BD (scripts/download_osm_data.py)
clean_osm_data filter ≥51 kV, drop ghosts, snap voltages
build_osm_network stitch lines+buses into a topological network (CSV)
build_shapes country/offshore/GADM-1/EEZ shapes from GADM41 + ENV‑NaturalEarth
base_network networks/base.nc — empty PyPSA graph with only buses + lines
build_bus_regions Voronoi-style onshore/offshore region polygons per bus
build_cutout ERA5 NetCDF for BD bbox × 2013 (Copernicus CDS API). missing in zip
build_natura_raster + build_renewable_profiles per-tech hourly capacity factors per bus, applying land-use exclusions (Natura, Copernicus PROBAV)
build_powerplants existing thermal fleet via powerplantmatching + OSM2PowerPlantMatching, optionally replaced by BD-specific custom CSV
build_demand_profiles hourly load per bus from SSP scenarios + GDP/pop weighting + IRENA stats
add_electricity elec.nc — base + powerplants + RE profiles + loads + costs
simplify_network elec_s.nc — collapse parallel lines, dissolve transformers, keep just AC
cluster_network elec_s_{N}.nc — k-means or GADM-based clustering down to N buses (8 here, 10 in pre-built BD)
add_extra_components elec_s_{N}_ec.nc — battery + H₂ stores/links per bus
prepare_network elec_s_{N}_ec_l{ll}_{opts}.nc — applies CO₂ cap, line-volume cap, time aggregation (e.g. Co2L-3h)
solve_network results/networks/... — electricity-only solved LP. First decision point: does the elec grid even balance under the cap?
Stage B — Add sectors
build_population_layouts + build_clustered_population_layouts WorldPop raster aggregated to bus regions (urban/rural split via urban_percent.csv)
build_temperature_profiles + build_cop_profiles hourly soil/air temps per node + heat-pump COP timeseries
build_heat_demand hourly residential / services / industrial heat demand per node, scaled by HDD and floor area
build_solar_thermal_profiles hourly solar thermal output per node
build_base_energy_totals + prepare_energy_totals sectoral energy balances scaled by SSP CAGR (efficiency_gains_cagr.csv, growth_factors_cagr.csv)
build_industry_demand + build_industrial_distribution_key per-bus industrial process demand split by subsector (cement, steel, chemicals…)
prepare_transport_data nodal transport energy demand, EV availability, share splits by mode
prepare_ports + prepare_airports + build_ship_profile shipping/aviation demand (off in BD)
prepare_sector_network prenetworks/... — adds heat / transport / industry / H₂ buses+links to the clustered electricity network
add_export append H₂ export load (off in BD: h2export: [0])
override_respot override built-in RE potentials with custom CSVs (off here)
Stage C — Solve, then walk forward (myopic)
add_existing_baseyear for the first horizon, inject the existing thermal fleet + retirement schedule into the prenetwork
add_brownfield for subsequent horizons, copy the previous-horizon solved capacities forward as fixed brownfield
solve_network_myopic postnetworks/...nc — solves the LP for that horizon's prenetwork; output becomes the brownfield input for the next horizon
make_sector_summary + plot_sector_summary CSV+PDF graphs of nodal costs, capacities, energy balances

The wildcard system (how Snakemake parameterises a run)

WildcardMeaningThis run
{simpl}Simplification level (HAC merging)"" (none)
{clusters}Number of buses after clustering8 (GADM-1)
{ll}Line-volume cap: copt=optimum, v1.0=fixed at basecopt
{opts}Constraint string (e.g. Co2L-24H)Co2L-24H
{sopts}Sector-coupled time resolution24h
{planning_horizons}Which year is being solved2030, 2040, 2050
{discountrate}Real discount rate0.071
{demand}SSP demand scenarioAB (SSP2 moderate)
{h2export}H₂ export demand (TWh)0

So the canonical filename of a final solved BD network is:

results/Case2_NetZero2050/postnetworks/elec_s_8_ec_lcopt_Co2L-24H_24h_2050_0.071_AB_0export.nc

3 · Data sources — the complete inventory

Every external dataset PyPSA-Earth touches, what it's used for, and where it lives in our copy.

Geospatial & weather

SourceWhat & whyFile(s)
GADM 4.1Country + division polygons. The clustering uses GADM-1 (8 BD divisions) instead of k-means.data/3_bangladesh_data/gadm/gadm41_BGD/gadm41_BGD.gpkg
GEBCO 2025Global bathymetry — used to mask offshore wind (max_depth: 50 m).data/2_global_geodata/gebco/GEBCO_2025_sub_ice.nc
HydroBASINS lvl 6Watershed polygons — feed Atlite's hydro inflow model (flowspeed 1 m/s).data/2_global_geodata/hydrobasins/hybas_*_lev06_v1c.shp
Copernicus PROBAV LC100Global land-cover raster (100 m). Excludes urban (50), water (80) etc. for RE potentials.data/2_global_geodata/copernicus/PROBAV_LC100_global_v3.0.1_2019-nrt_*.tif
Marine Regions EEZ v11Exclusive Economic Zones — bounds offshore wind buildable area.data/2_global_geodata/eez/eez_v11.gpkg
Natura 2000Protected-area mask. Included tif (natura: true) excludes RE in those cells.data/2_global_geodata/natura/natura.tiff
WorldPop 2020 1 km (UN-adj, constrained)Population raster — underpins the urban/rural split, demand allocation, GDP layouts.data/3_bangladesh_data/WorldPop/bgd_ppp_2020_UNadj_constrained.tif
OSM (BD planet extract)HV grid skeleton (≥51 kV substations, lines, cables) and OSM-listed power plants.data/3_bangladesh_data/osm_pbf/bangladesh-latest.osm.pbf
OSM Power JSONPre-extracted BD power infrastructure (substations, lines, generators).data/3_bangladesh_data/osm_power/BD_power.json
Microsoft Global ML BuildingsBuilding footprints — used by solar_rooftop (currently disabled — IncompleteRead errors).data/3_bangladesh_data/global_buildings/
ERA5 (Copernicus CDS)Hourly weather cutout (wind, GHI, T, runoff) — the entire RE potentials story.missing — must rebuild via CDS API or get from Paul
SSP scenariosPer-cell load projections under SSP1–5 (we use SSP2-2.6).data/2_global_geodata/ssp2-2.6/{2030,2040,2050,2100}/

Energy / techno-economic data

SourceWhat & whyFile(s)
UN Statistics Division Energy DBNational-level energy balances by carrier and use; basis for sectoral demand split.data/4_reference_csvs/demand/unsd/data/UNdata_Export_*.txt (double-extracted at two timestamps in the zip — use the newer one)
IEA Bangladesh portalNational generation by source, TES, low-carbon share.data/IEA Energy Supply by Source/*.csv
IRENA capacity statsReference renewable installed capacity by country (used by estimate_renewable_capacities, year 2023).fetched at runtime via powerplantmatching
technology-data v0.13.2 (DEA / Fraunhofer / NREL)CAPEX, FOM, VOM, lifetime, efficiency, fuel cost for every tech, per year.resources/costs_2030.csv, costs_2040.csv, ... generated from raw.githubusercontent.com/PyPSA/technology-data/v0.13.2 — but disabled here (retrieve_cost_data: false) in favour of BD-specific costs.
BPDB / IEPMP 2023Bangladesh-specific cost data feeding the LP — overrides the DEA defaults. Mentioned in config comments.data/costs_*_bd.csv (referenced; ship as resources/costs_{year}.csv)
IEA Hydro annual generationUsed to rescale Atlite's hydro inflow to match observed annual energy.data/4_reference_csvs/eia_hydro_annual_generation.csv
Custom industrial databaseCoordinates & capacity of cement, steel, paper, aluminium plants — input to industrial distribution key.data/4_reference_csvs/industrial_database.csv (partial BD coverage)
SFI cement / pulp&paper databasesSource data for the industrial database.data/4_reference_csvs/industry/SFI-Global-Cement-Database-July-2021.xlsx + SFI_ALD_Pulp_Paper_Sample_LatAm_Jan_2023.xlsx
Aluminium productionAnnual primary Al demand (year 2019 here).data/4_reference_csvs/AL_production.csv
BDEW heat-load profile (Germany)Standard hourly heat-demand shape, scaled to local annual totals. ⚠ German shape — see caveats.data/4_reference_csvs/heat_load_profile_BDEW.csv
Hydro capacitiesEIA dataset — country totals for hydro / PHS power and energy.data/4_reference_csvs/hydro_capacities.csv
European car ownership (e-mobility)Per-capita car-stock used to scale transport demand.data/4_reference_csvs/emobility/European_countries_car_ownership.csv
Custom powerplants (Nahid & Roy 2025)BPDB-validated list of operating BD powerplants with retrofit/retire dates — replaces the auto-fetched OSM2PM list (custom_powerplants: replace).data/custom_powerplants.csv
SSP CAGR tablesPer-sector growth + efficiency-gain compound-annual rates used to project demand to 2030/40/50.data/4_reference_csvs/demand/{efficiency_gains_cagr.csv, growth_factors_cagr.csv, industry_growth_cagr.csv, fuel_shares.csv} ⚠ no BD row — falls back to “default” which is Morocco's curve.

Configuration files

FilePurpose
configs/config.yamlThe run config — overrides config.default.yaml. This is the only file that matters day-to-day.
configs/config.default.yamlUpstream defaults (every key documented here).
configs/bundle_config.yamlLists data bundles + Zenodo/Drive URLs for retrieve_databundle. Disabled here.
configs/regions_definition_config.yamlOSM column types + ISO-2/world region map.
configs/powerplantmatching_config.yamlOverrides for the powerplantmatching package (custom replace policy).

4 · Bangladesh customisations (Paul's diff)

pdockery/pypsa-earth sits two commits ahead of upstream pypsa-meets-earth/pypsa-earth main:

fe4ee35  Add solved Bangladesh energy system network results
2f77da87 Update config.default.yaml with Bangladesh configuration

The BD-specific decisions in our team's configs/config.yaml on top:

① Geographic + temporal scope

countries["BD"] snapshots2013-01-01 → 2014-01-01 (left-inclusive) weather year2013 (ERA5) foresightmyopic (sequential per horizon) planning_horizons[2030, 2040, 2050] opts / soptsCo2L-24H + 24h → 365 daily snapshots/horizon clusters8 = GADM-1 divisions (alternative_clustering: true)

② CO₂ trajectory (custom block, not in vanilla)

co2_budget:
  enable: true
  override_co2opt: true
  co2base_value: co2limit
  year:
    2030: 0.9    # 10% cut → 87.3 MtCO2
    2040: 0.45   # 55% cut → 43.7 MtCO2
    2050: 0.05   # 95% cut → 4.85 MtCO2

The fractions multiply electricity.co2base = 9.7 × 10⁷ tCO₂/yr (full-economy BD baseline). With override_co2opt: true, the rule rewrites the Co2L wildcard from opts using these fractions per horizon.

③ Sector flags — Phase-1 set

OnOffWhy off
heat, biomass, industry, land_transport, residential, services, agriculture shipping, aviation, rail_transport <0.3 % of national emissions; cause NaN issues for non-port nodes
OnOffWhy off
tes, chp, solar_thermal, boilers, biomass_transport, dac, ccv2g, bev_dsm, solar_rooftop, electricity_distribution_grid, home_battery, methanation, fischer_tropsch, SMR, SMR CC, helmeth, oil_boilers, micro_chpPhase 1 — minimise spatial-link explosion; solar_rooftop has flaky download
gas.spatial_gas: true, oil.spatial_oil: true, coal.spatial_coal: truegas.network, hydrogen.network, co2_network, lignite.spatial_ligniteBD has no GGIT pipeline data; spatial H₂/CO₂ networks add huge link counts

④ Custom industry-decarbonisation block (not vanilla — likely Paul's branch)

sector.industry_decarbonisation:
  coal_to_gas: true   # gas substitutes industrial coal
  coal_to_elec: true  # electric process heat replaces coal
  oil_to_gas: true
  oil_to_elec: true

When enabled, industrial coal/oil emissions become optimisable links rather than fixed loads — the LP gets to fuel-switch on the cost curve. This block is not in the public pdockery/pypsa-earth tree — it lives on an unpushed branch.

⑤ Existing-fleet treatment

custom_powerplantsreplace (use the BPDB-validated list) powerplants_filter(DateOut ≥ 2022 or DateOut != DateOut) — drops plants already retired allow_early_retirementtrue — optimiser can retire a unit before end-of-life if the LP justifies it keep_existing_capacitiestrue — 2030 starts from the actual ~38 GW fleet, not a blank slate estimate_renewable_capacities.statsirena, year 2023, p_nom_min: 1 (lock current installed RE in)

⑥ Numerical tuning for BD's bad scaling

BD cost data spans a 6 × 10⁹ range (very small marginal costs vs. very large investments). Default Gurobi diverges. The config switches to a custom gurobi-numeric-focus profile:

NumericFocus: 3        # stability over speed
ScaleFlag:    2        # aggressive matrix scaling
BarHomogeneous: 1      # fall back to homogeneous barrier
DualReductions: 0      # don't claim "infeasible_or_unbounded" on H2 export bus
BarConvTol: 1e-4       # relaxed tolerances
FeasibilityTol: 1e-4
OptimalityTol: 1e-4

5 · Inside the Case2_NetZero2050 run

What the optimiser is actually computing, in plain language.

Decision variables

Variable familyPerBounds
Generator.p_nom (capacity)tech × bus[p_nom_min, ∞] (or fixed if extendable: false)
Generator.p (dispatch)tech × bus × snapshot[0, p_max_pu × p_nom]
StorageUnit.p_dispatch / p_store / state_of_chargebus × snapshotstorage capacity, max_hours
Store.e (energy state)bus × snapshot0 ≤ e ≤ e_nom
Line.s_nomlinefixed at base × line-volume cap (here: copt = optimum subject to global cap)
Link.p_nom + Link.plink × snapshote.g. heat pumps, electrolysers, fuel cells

Constraints (the LP wall)

Objective

min  Σ_techs ( annuity(CAPEX) × p_nom + FOM × p_nom )
   +  Σ_techs,t ( marginal_cost(t) × p(t) )
   +  Σ_lines  ( line CAPEX × s_nom )
   +  Σ_co2    ( emission_price × emissions )

“Annuity” converts CAPEX into a per-year cost using the discount rate (0.071) and tech lifetime. So we're minimising annual system cost in 2030 / 2040 / 2050, separately, with each horizon's solution feeding the next as fixed brownfield.

Myopic foresight, in pictures

2030 horizon:
  - existing fleet (~38 GW, year ≥2022)            → fixed
  - extendable: solar, onwind, OCGT, CCGT, batteries, H2…
  - CO2 cap = 0.9 × 9.7e7 = 87.3 Mt
  - SOLVE → p_nom_opt[2030]

2040 horizon:
  - p_nom_opt[2030] is now BROWNFIELD (fixed)      ← add_brownfield
  - retire what aged out
  - new build optimised on top
  - CO2 cap = 0.45 × 9.7e7 = 43.7 Mt
  - SOLVE → p_nom_opt[2040]

2050 horizon:
  - p_nom_opt[2030] + p_nom_opt[2040] BROWNFIELD
  - CO2 cap = 0.05 × 9.7e7 = 4.85 Mt
  - SOLVE → p_nom_opt[2050]

Myopic ≠ optimal-overall. The 2030 LP doesn't know 2050's cap is 95 % cut — it only sees its own 10 % cap. So you can get short-sighted gas build-out in 2030 that becomes stranded in 2050. That's by design; it mimics how real planners decide. Perfect-foresight (Case3, future) would avoid it.

6 · Outputs — what comes out, how to read it

The headline files per horizon

FileWhat's in it
postnetworks/elec_s_8_ec_lcopt_Co2L-24H_24h_2030_0.071_AB_0export.ncSolved network for 2030 — capacities, dispatch, emissions, prices
postnetworks/...2040...nc2040 (with 2030 as brownfield)
postnetworks/...2050...nc2050 (with 2030+2040 as brownfield)
summaries/elec_s_8...csvCosts / capacities / energy / emissions tabulated across horizons
maps/...pdfGeographic plots — capacity by node, line flows, costs
graphs/{costs,energy,balances-energy}.pdfStacked-bar timelines

How to read a solved network in Python

import pypsa
n = pypsa.Network("results/Case2_NetZero2050/postnetworks/...2050_0.071_AB_0export.nc")

# Capacities
n.generators.groupby("carrier").p_nom_opt.sum() / 1e3   # GW per tech, year 2050
n.links.groupby("carrier").p_nom_opt.sum() / 1e3        # heat pumps, electrolysers…
n.stores.groupby("carrier").e_nom_opt.sum() / 1e6       # TWh of H2 / battery storage

# Dispatch
n.generators_t.p.sum() * n.snapshot_weightings.objective.iloc[0] / 1e6  # TWh/year
n.statistics.energy_balance().round()                   # full sectoral balance

# Costs
n.statistics.capex().round()                            # €/yr CAPEX by tech
n.statistics.opex().round()                             # €/yr OPEX by tech
n.statistics.installed_capex().round()                  # one-off €

# Emissions
n.global_constraints                                    # CO2 cap + shadow price
n.generators_t.p.mul(n.generators.carrier.map(n.carriers.co2_emissions))

The 8 buses

With alternative_clustering: true + gadm_layer_id: 1, the 8 buses are the GADM-1 divisions:

NAME_1Population (M)Modelled demand share
Dhaka~4526 %
Chittagong~3219 %
Rajshahi~2014 %
Rangpur~1710 %
Khulna~1617 %
Mymensingh~12(0 % — clustering artefact, see caveats)
Sylhet~125 %
Barisal~810 %

7 · Karim outputs — first results off the network

Run on pypsa-bd/networks/elec_s_10*.nc (Paul's pre-built BD network, 10 buses, 2013 hourly). Source scripts in karim_outputs/scripts/.

Demand fundamentals

80.1TWh/year (2013)
12.7GW peak (2 Jun 2013, 13:00)
0.72Annual load factor
Annual demand heatmap
Annual demand surface — hour-of-day × day-of-year. Strong afternoon peak May–Sept (cooling load), winter trough.
Daily demand shapes
Peak day (Jun) vs winter day (Jan) — 50 % swing in absolute level, similar diurnal shape.
Seasonality
Monthly mean ± min/max — summer cooling drives the seasonal peak.
Demand by division
Per-division breakdown of monthly mean demand.

Transmission stress — pandapower load-flow snapshot

25 %Max line loading
11 %Average line loading
0Lines > 70 %
Loading histogram
The modelled 380 kV grid is dramatically over-built at this snapshot — every line below 25 %. Suggests under-aggregated demand or that the post-clustering line capacity is too generous.
Bus voltages
All bus voltages 1.01–1.05 pu — comfortably inside operating limits.
Grid map
Geographic plot — buses + AC lines coloured by loading.

Equity — demand per capita by division

Per-capita demand
Equity ratio = (demand share) / (population share). Barisal & Khulna register 1.6–1.9× — modelled demand allocation overshoots their share. Mymensingh = 0 is a clustering bug (no PyPSA bus inside polygon).
Equity map
Choropleth — kWh per capita by GADM-1 division.

Energy security — fleet composition & concentration

93 %imported-fuel reliance
4366Fuel HHI (very concentrated)
0.6 %Renewables share of installed
Fuel mix
Natural gas dominates (61 %); coal + oil + nuclear add another 38 %. Renewables visible on the rim.
Retirement schedule
3.4 GW of CCGT capacity is already past its DateOut (≤2025). Replacement window opens immediately.

8 · Data caveats & open issues

The “default = Morocco” problem
Four reference CSVs have no Bangladesh row — efficiency_gains_cagr.csv, growth_factors_cagr.csv, industry_growth_cagr.csv, fuel_shares.csv. The loader silently falls back to the row labelled default, which is parameterised on Morocco. This means BD's residential heat demand grows on a Moroccan curve unless we add a BD-row override.
Heat-load shape is German (BDEW)
heat_load_profile_BDEW.csv is a German residential heat-demand shape (BDEW H0). BD has effectively no space-heat — “residential heat” here is mostly cooking/water. The BDEW shape will misrepresent both magnitude and timing. Worth a sensitivity.
Costs are vintage
data/4_reference_csvs/costs.csv is 2010–2013 vintage. The config disables retrieve_cost_data and points at BD-specific costs — but those need verification (IEPMP 2023 / BPDB sources). Renewable CAPEX in particular has dropped >60 % since 2013.
Empty industrial coverage
The bundled industrial_database.csv has partial BD coverage. The pulp/paper and cement files in data/4_reference_csvs/industry/ are LatAm samples, not BD. Industrial heat demand is therefore dispatched against an under-specified node-level industrial layout.
ERA5 cutout missing
The OneDrive export skipped Data/1_cutouts (see ___All_Errors.txt). Without it, build_renewable_profiles, build_temperature_profiles, build_solar_thermal_profiles, and build_heat_demand all fail. Two fixes: (a) get a Copernicus CDS API key and run build_cutout (~30 min for BD bbox × 2013); (b) request the file from Paul.
Custom config blocks aren't in Paul's pushed code
industry_decarbonisation, co2_budget.override_co2opt, the Case2_NetZero2050 resource folder, and allow_early_retirement all appear in our team's config.yaml but none are in the git history of pdockery/pypsa-earth on GitHub. They live on an unpushed working branch. Action: ask Paul for the branch + push it.
Mymensingh clustering artefact
The pre-built elec_s_10 network has 10 PyPSA buses, but they don't centroidally fall inside all 8 GADM-1 polygons. Mymensingh has no PyPSA bus → zero demand allocated. If that bus mapping survives into the GADM-1 (8-cluster) run, equity analysis will mis-attribute. Verify after the GADM-1 run is solved.
Solar rooftop disabled
download_global_buildings keeps failing with IncompleteRead timeouts → solar_rooftop: false. So distributed PV potential is excluded from the LP. Given Bangladesh's million-solar-home-systems history, this is a meaningful omission for both equity and security narratives.

Generated for the FalgunXBangladesh team, May 2026. Source repo: /Users/karimarnous/FalgunXBangladesh. PyPSA-Earth fork: github.com/pdockery/pypsa-earth. PyPSA-BD analysis bundle: github.com/pdockery/pypsa-bd. Karim outputs: /Users/karimarnous/FalgunXBangladesh/karim_outputs/.