This repository was archived by the owner on Apr 2, 2026. It is now read-only.

Agree on an approach for handling units in .csv files #60

Description

@irm-codebase

One of the last steps in defining the interface between a module and the 'outside' is agreeing on how to store unit metadata in .csv files.

This is the only file type we handle that has this issue, since netCDF files have units as standardized metadata.

Based on several tests, I believe the best approach is to add a second header row that states the unit of each column, using No Unit to mark columns that have no physical unit (text values, ratios, etc.).

Reasoning:

  • Removing a second header row is easy in pandas.
  • It fits well with the 'tidy' dataframe approach.
  • It adds far less data overhead than an extra unit column, especially for timeseries, where the unit would be repeated on every row.
  • calliope v0.7 can skip rows easily, so it avoids extra processing on the modelling side.
  • It lets module devs use pint for unit conversion, if they choose to.
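The overhead argument can be made concrete with a rough sketch. It uses only the standard library and made-up hourly values (the `timestep` and `demand` columns and the `MWh` unit are illustrative assumptions, not taken from any real dataset) to compare the size of a file with a units row against the same data with a unit column:

```python
import csv
import io


def size_with_units_row(n_rows: int) -> int:
    """Bytes needed when units live in a second header row."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["attribute", "timestep", "demand"])
    writer.writerow(["units", "No Unit", "MWh"])
    writer.writerow(["index", "", ""])
    for i in range(n_rows):
        writer.writerow([i, f"2000-01-01T{i % 24:02d}:00", 1.5])
    return len(buf.getvalue().encode("utf-8"))


def size_with_units_column(n_rows: int) -> int:
    """Bytes needed when the unit is repeated in an extra column."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["index", "timestep", "demand", "unit"])
    for i in range(n_rows):
        writer.writerow([i, f"2000-01-01T{i % 24:02d}:00", 1.5, "MWh"])
    return len(buf.getvalue().encode("utf-8"))


hours_per_year = 8760
extra_bytes = size_with_units_column(hours_per_year) - size_with_units_row(hours_per_year)
print(f"unit column costs {extra_bytes} extra bytes for {hours_per_year} rows")
```

The units row is a fixed cost, while the unit column grows with every row and every value column, so the gap widens for long or wide timeseries.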

Examples

Picture this table:

attribute,country,vehicle_type,vehicle_subtype,carrier,year,TotalEnergyConsumption
units,No Unit,No Unit,No Unit,No Unit,years,ktoe
index,,,,,,
0,DEU,Powered two-wheelers,Gasoline engine,Gasoline,2000,476.0664153213859
1,DEU,Powered two-wheelers,Gasoline engine,BioGasoline,2000,0.0
2,DEU,Passenger cars,Gasoline engine,Gasoline,2000,29431.27094818872
3,DEU,Passenger cars,Gasoline engine,BioGasoline,2000,0.0
4,DEU,Passenger cars,Diesel oil engine,Diesel,2000,8255.653519490721
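For module devs who need to produce this layout, a minimal sketch follows. It assumes the data already sits in a pandas DataFrame whose columns are (attribute, unit) pairs; the values are shortened and the variable names are only illustrative. Writing such a frame with to_csv should give the two header rows plus the index row shown above:

```python
import io

import pandas as pd

# Column labels are (attribute, unit) pairs; "No Unit" marks unitless columns.
columns = pd.MultiIndex.from_tuples(
    [
        ("country", "No Unit"),
        ("year", "years"),
        ("TotalEnergyConsumption", "ktoe"),
    ],
    names=["attribute", "units"],
)
frame = pd.DataFrame(
    [["DEU", 2000, 476.5], ["DEU", 2000, 0.0]],
    columns=columns,
)
frame.index.name = "index"

# Write to an in-memory buffer instead of a file to keep the sketch self-contained.
buffer = io.StringIO()
frame.to_csv(buffer)
csv_text = buffer.getvalue()
print(csv_text)

# Reading it back with two header rows recovers the same column labels.
roundtrip = pd.read_csv(io.StringIO(csv_text), header=[0, 1], index_col=0)
```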

Loading and removing the second header:

If you do not want to use any fancy libraries to handle units, cleaning the data is trivial:

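import pandas as pd
# Read both header rows (names and units) as a two-level column index.
data = pd.read_csv("tmp/test2.csv", header=[0, 1], index_col=0)
# Drop the "units" level to leave a plain single-level header.
data.columns = data.columns.droplevel("units")
data.head()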
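If the units should stay available after the header is flattened, one option is to copy them into a dictionary first. This is only a sketch: the inline CSV is a trimmed stand-in for tmp/test2.csv, and the `units` dictionary is a name introduced here for illustration.

```python
import io

import pandas as pd

# Trimmed stand-in for the example file, kept inline so the sketch runs as-is.
CSV_TEXT = """attribute,country,carrier,year,TotalEnergyConsumption
units,No Unit,No Unit,years,ktoe
index,,,,
0,DEU,Gasoline,2000,476.5
1,DEU,BioGasoline,2000,0.0
"""

data = pd.read_csv(io.StringIO(CSV_TEXT), header=[0, 1], index_col=0)

# Each column label is an (attribute, unit) tuple, so dict() gives a lookup table.
units = dict(data.columns.tolist())

# Flatten the header as before; the units survive in the dictionary.
data.columns = data.columns.droplevel("units")
print(units["TotalEnergyConsumption"])
```

This keeps the dataframe plain while still letting later steps check which unit a column was delivered in.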


Automatic unit conversion with pint

If you want to be fancy (and lazy), you can just as easily let pint do all the unit heavy lifting for you, through the .pint accessor that pint-pandas adds to dataframes.

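import pandas as pd
import pint_pandas  # registers the .pint accessor on DataFrames and Series
data = pd.read_csv("tmp/test2.csv", header=[0, 1], index_col=0)
# Turn the last header level (the units row) into pint dtypes for every row.
data = data.pint.quantify(level=-1)
data.head()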


data.dtypes


data['TotalEnergyConsumption'].pint.to_base_units()

