Histogram

Description

The Histogram diagnostic computes and plots histograms or probability density functions (PDFs) of climate variables over specified regions. It supports both raw histograms (counts per bin) and normalized PDFs (which integrate to 1). Regions can be chosen from a set of defaults or defined as custom regions in the configuration file. Optional latitudinal weighting accounts for the variation of grid cell area with latitude.

Classes

There are two main classes for computing and plotting histograms:

  • Histogram: Computes histograms or PDFs of climate variables.

    • Supports raw histograms (counts) and normalized PDFs (density=True)

    • Optional latitudinal weighting to account for grid cell area

    • Customizable bin count and range for histogram computation

    • Regional analysis with predefined or custom regions

  • PlotHistogram: Produces publication-quality line plots of computed histograms/PDFs.

    • Single or multi-model comparison plots

    • Optional reference dataset overlay

    • Logarithmic scales for x and y axes

    • Optional smoothing with configurable window size

    • Customizable axis limits

Note

The diagnostic computes histograms over the entire temporal period specified (no seasonal decomposition).

Getting Started

File locations:

  • Diagnostic code: src/aqua_diagnostics/histogram/

  • Region definitions: config/tools/histogram/definitions/regions.yaml

  • Example notebook: notebooks/diagnostics/histogram/

  • Config template: templates/diagnostics/config-histogram.yaml

Supported variables:

The diagnostic works with climate variables on regular latitude-longitude grids:

  • Direct variables: tprate (precipitation), 2t (temperature), sst (sea surface temperature), etc.

  • Derived variables: Using EvaluateFormula syntax (e.g., 2t - 273.15 for °C)

Supported regions:

global (or null), tropics, europe, nh (Northern Hemisphere), sh (Southern Hemisphere).

Basic usage

The recommended way to use this diagnostic is through the Python API, as shown in the notebook below.

Minimal example:

from aqua.diagnostics.histogram import Histogram, PlotHistogram

# Compute histogram/PDF
hist = Histogram(
    catalog='climatedt-phase1',
    model='ICON',
    exp='historical-1990',
    source='lra-r100-monthly',
    startdate='1990-01-01',
    enddate='1999-12-31',
    bins=100,
    weighted=True
)
hist.run(var='tprate', units='mm/day', density=True)

# Plot PDF
plot = PlotHistogram(data=[hist.histogram_data])
plot.run(outputdir='./', ylogscale=True)

For multi-model comparisons or reference data, see the detailed examples in the section below.

Available demo notebooks

📓 Single histogram/PDF plot → histogram.ipynb

Learn the basics: compute histograms/PDFs, compare with observations, customize plots

Key concepts covered:

  • Histogram vs PDF: density=False (counts) vs density=True (probability density)

  • Latitudinal weighting: weighted=True for area-corrected distributions

  • Bin configuration: bins (number) and range (min/max) parameters

  • Plot customization: log scales (xlogscale, ylogscale), smoothing, axis limits

  • Regional selection and custom regions
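
The difference between density=False and density=True can be made concrete with a minimal NumPy sketch. It assumes the diagnostic's binning behaves like numpy.histogram (the Developer Notes mention NumPy or Dask histograms, but the exact call is not shown in this page); the synthetic sample and all variable names are illustrative only.

```python
import numpy as np

# Synthetic, precipitation-like sample (illustrative only, not model output)
rng = np.random.default_rng(seed=0)
sample = rng.gamma(shape=2.0, scale=3.0, size=10_000)

# Same binning for both calls: 50 bins over a fixed (min, max) range
bins, value_range = 50, (0.0, 20.0)

counts, edges = np.histogram(sample, bins=bins, range=value_range)
pdf, _ = np.histogram(sample, bins=bins, range=value_range, density=True)

widths = np.diff(edges)
in_range = counts.sum()  # values outside the range are not binned

# A PDF bin is the count divided by (number of binned values * bin width)
manual_pdf = counts / (in_range * widths)

# Summing density * width over all bins recovers the integral of the PDF
integral = float((pdf * widths).sum())
print(in_range, integral)
```

Note that with an explicit range, values outside it are dropped before normalization, so the PDF integrates to 1 over the chosen range rather than over the full sample.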

CLI usage

For batch processing or automation, the diagnostic can be run via CLI using a configuration file:

# Copy and customize the template
cp templates/diagnostics/config-histogram.yaml my_config.yaml

# Run diagnostic
python src/aqua_diagnostics/histogram/cli_histogram.py \
    --config my_config.yaml \
    --model ICON \
    --exp historical-1990 \
    --loglevel INFO

Key CLI arguments:

--config, --model, --exp, --catalog, --source, --regrid, --realization, --outputdir, --startdate, --enddate, --loglevel, --nworkers

For the complete list of arguments, run:

python src/aqua_diagnostics/histogram/cli_histogram.py --help

Note

Suggested workflow: Copy the template (cp templates/diagnostics/config-histogram.yaml my_config.yaml), customize it with your parameters, and run with --config my_config.yaml.

Quick testing: CLI arguments (--model, --exp, etc.) can override config file values without editing the file, useful for rapid experimentation.

For most use cases, we recommend the programmatic approach (notebooks) rather than CLI.
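
As a rough illustration of how command-line values can take precedence over the configuration file, the sketch below overlays parsed arguments on a loaded config dictionary. The helper apply_cli_overrides is hypothetical and not part of AQUA, and applying the override to the first dataset entry only is an assumption made for the example.

```python
import argparse
import copy

# Hypothetical helper, not part of AQUA: overlay CLI values on a loaded config.
def apply_cli_overrides(config, args, keys=("catalog", "model", "exp", "source")):
    merged = copy.deepcopy(config)   # leave the loaded config untouched
    first = merged["datasets"][0]    # assumption: overrides target the first dataset
    for key in keys:
        value = getattr(args, key, None)
        if value is not None:        # only flags actually given on the command line
            first[key] = value
    return merged

# Minimal parser with a subset of the documented flags
parser = argparse.ArgumentParser()
for flag in ("catalog", "model", "exp", "source"):
    parser.add_argument(f"--{flag}", default=None)

config = {
    "datasets": [
        {"catalog": "climatedt-phase1", "model": "ICON",
         "exp": "historical-1990", "source": "lra-r100-monthly"}
    ]
}

# Equivalent to passing "--model IFS-NEMO" on the command line
args = parser.parse_args(["--model", "IFS-NEMO"])
merged = apply_cli_overrides(config, args)
print(merged["datasets"][0])
```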

Configuration file structure

The template (templates/diagnostics/config-histogram.yaml) defines datasets, reference data, and diagnostic parameters:

Basic structure:

# Dataset(s) to analyze
datasets:
  - catalog: 'climatedt-phase1'
    model: 'ICON'
    exp: 'historical-1990'
    source: 'lra-r100-monthly'
    startdate: '1990-01-01'
    enddate: '1999-12-31'

# Reference dataset (optional)
references:
  - catalog: 'obs'
    model: 'ERA5'
    exp: 'era5'
    source: 'monthly'
    startdate: '1990-01-01'
    enddate: '1999-12-31'

# Output settings
output:
  outputdir: "./"
  save_pdf: true
  save_png: true
  dpi: 300

# Diagnostic configuration
diagnostics:
  histogram:
    run: true
    bins: 100                    # Number of bins
    range: null                  # [min, max] or null for auto
    weighted: true               # Use latitudinal weights
    density: true                # Compute PDF (normalized)
    xlogscale: false             # Log scale for x-axis
    ylogscale: true              # Log scale for y-axis
    smooth: false                # Apply smoothing
    variables:
      - name: 'tprate'
        units: 'mm/day'
        regions: ['global', 'tropics']

Multiple datasets example (for multi-model comparison):

datasets:
  - catalog: 'climatedt-phase1'
    model: 'ICON'
    exp: 'historical-1990'
    source: 'lra-r100-monthly'
    startdate: '1990-01-01'
    enddate: '1999-12-31'

  - catalog: 'climatedt-phase1'
    model: 'IFS-NEMO'
    exp: 'historical-1990'
    source: 'lra-r100-monthly'
    startdate: '1990-01-01'
    enddate: '1999-12-31'

Variable-specific parameters:

diagnostics:
  histogram:
    variables:
      - name: 'tprate'
        regions: ['global']
        range: [0, 20]           # Custom range for this variable
        bins: 50                 # Override global bins setting
        lon_limits: [-180, 180]  # Optional spatial constraints
        lat_limits: [-60, 60]
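
The override behaviour described by the comments above (a per-variable bins or range replacing the global setting) can be sketched in plain Python. The helper resolve_variable_settings is hypothetical, and the set of keys that may be overridden per variable is an assumption; only bins and range are shown as overridable in this page.

```python
# Hypothetical helper, not part of AQUA: per-variable keys win over global ones.
# Assumption: these four settings are the ones that can be resolved this way.
GLOBAL_KEYS = ("bins", "range", "weighted", "density")

def resolve_variable_settings(histogram_cfg, variable_cfg):
    # Start from the global values under diagnostics.histogram
    settings = {key: histogram_cfg.get(key) for key in GLOBAL_KEYS}
    # Replace any of them that the variable entry defines itself
    settings.update({k: v for k, v in variable_cfg.items() if k in GLOBAL_KEYS})
    return settings

histogram_cfg = {"bins": 100, "range": None, "weighted": True, "density": True}
variable_cfg = {"name": "tprate", "regions": ["global"], "range": [0, 20], "bins": 50}

settings = resolve_variable_settings(histogram_cfg, variable_cfg)
print(settings)
```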

Derived variables (using formulas):

diagnostics:
  histogram:
    formulae:
      - name: 'temp_celsius'
        formula: '2t - 273.15'
        units: '°C'
        long_name: 'Temperature in Celsius'
        regions: ['global', 'tropics']

For the complete template with all available options, see templates/diagnostics/config-histogram.yaml.

Outputs

The diagnostic generates:

📊 Plots (PDF and/or PNG):

  • Histogram/PDF line plots

  • Multi-model comparisons with reference data

  • Optional smoothing and custom axis limits

πŸ“ NetCDF files:

  • Computed histogram data with bin centers and counts/densities

  • Metadata preserved from original variables

Naming convention:

histogram.<diagnostic>.<catalog>.<model>.<exp>.<realization>.<var>.nc

histogram.<diagnostic>_pdf.<catalog>.<model>.<exp>.<realization>.<var>.<format>

Example:

histogram.histogram_pdf.climatedt-phase1.ICON.historical-1990.r1.tprate.png
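
The two patterns can be reproduced with a small string-building sketch, which is handy when locating outputs from a script. The function build_output_name is hypothetical and not an AQUA function; the "_pdf" suffix is passed explicitly because this page does not state when it is applied.

```python
# Hypothetical helper mirroring the naming patterns above; not an AQUA function.
def build_output_name(diagnostic, catalog, model, exp, realization, var,
                      extension, suffix=""):
    product = f"{diagnostic}{suffix}"  # e.g. "histogram" or "histogram_pdf"
    parts = ["histogram", product, catalog, model, exp, realization, var, extension]
    return ".".join(parts)

common = dict(diagnostic="histogram", catalog="climatedt-phase1", model="ICON",
              exp="historical-1990", realization="r1", var="tprate")

data_name = build_output_name(**common, extension="nc")
plot_name = build_output_name(**common, extension="png", suffix="_pdf")
print(data_name)
print(plot_name)
```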

Example plots

stateoftheart_diagnostics/figures/histogram_tprate_global.png

Probability density function (PDF) of precipitation rate (mm/day) for the global region, showing ICON model output compared to ERA5 reference data.

Reference datasets

Common reference datasets:

  • ERA5: ECMWF's fifth generation reanalysis for global climate

  • MSWEP: Multi-Source Weighted-Ensemble Precipitation dataset

  • BERKELEY-EARTH: Berkeley Earth Surface Temperature dataset

Authors and contributors

This diagnostic is maintained by Marco Cadau (@mcadau, marco.cadau@polito.it), member of the AQUA team.

Contributions are welcome; please open an issue or pull request. For questions, contact the AQUA team or the maintainer.

Developer Notes

Internal structure:

The diagnostic uses a three-step process:

  1. Data retrieval via Reader from catalog:

    • Applies temporal and spatial selection

    • Handles unit conversion if needed

  2. Histogram computation via aqua.histogram.histogram():

    • Optional latitudinal weighting: weights = cos(lat)

    • Bin calculation: NumPy or Dask histogram

    • Normalization: if density=True, integrates to 1

  3. Storage as xarray DataArray:

    • Dimension: center_of_bin (bin centers)

    • Coordinate: width (bin widths)

    • Attributes: preserves original variable metadata
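
To illustrate the storage layout in step 3, the sketch below derives bin centers and widths from NumPy bin edges and shows why keeping the widths is useful: they are what turns stored densities back into probabilities. A plain dictionary stands in for the xarray DataArray here; the key names follow the dimension and coordinate names listed above, and everything else is an assumption for the example.

```python
import numpy as np

values = np.array([0.5, 1.5, 1.7, 2.2, 3.9, 4.0])
hist, edges = np.histogram(values, bins=4, range=(0.0, 4.0), density=True)

center_of_bin = 0.5 * (edges[:-1] + edges[1:])  # midpoint of each bin
width = np.diff(edges)                           # width of each bin

# Plain-dict stand-in for the DataArray layout (dimension plus coordinate)
stored = {"center_of_bin": center_of_bin, "width": width, "values": hist}

# Density times width gives the probability mass per bin; the masses sum to 1
mass_per_bin = stored["values"] * stored["width"]
integral = float(mass_per_bin.sum())
print(stored["center_of_bin"], integral)
```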

Data attributes:

Metadata attached to histogram DataArrays:

  • AQUA_catalog, AQUA_model, AQUA_exp: Data provenance

  • AQUA_region: Selected region name

  • size_of_the_data: Original data size

  • units: 'counts' or 'probability density'

  • Standard CF attributes: long_name, standard_name

Graphics function:

  • plot_histogram(): Line plot with flexible styling

    • Handles single or multiple DataArrays

    • Supports reference data overlay

    • Optional smoothing with moving average

    • Logarithmic scales for both axes

    • Auto-detects bin centers and values

Data flow:

  1. Histogram.retrieve() → Get data from catalog

  2. Histogram.compute_histogram() → Call aqua.histogram.histogram()

  3. Histogram.save_netcdf() → Save processed data

  4. PlotHistogram.__init__() → Load data and metadata

  5. PlotHistogram.run() → Create and save plots

Smoothing algorithm:

Simple moving average with edge handling:

kernel = np.ones(window_size) / window_size
smoothed = np.convolve(data, kernel, mode='same')
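
One property of this two-line form is worth seeing in numbers: np.convolve with mode='same' implicitly pads with zeros, so the first and last window_size // 2 points are pulled toward zero. The sketch below wraps the snippet in a function (the name moving_average is illustrative) and applies it to a flat series; whether the diagnostic applies any further edge correction is not stated in this page.

```python
import numpy as np

def moving_average(data, window_size=5):
    # Uniform kernel: each output point is the mean of window_size neighbours
    kernel = np.ones(window_size) / window_size
    # mode="same" keeps the input length; missing neighbours count as zero
    return np.convolve(data, kernel, mode="same")

flat = np.ones(9)
smoothed = moving_average(flat, window_size=5)

interior = smoothed[2:7]          # full window available
edges = smoothed[[0, 1, 7, 8]]    # only 3 or 4 real samples in the window
print(interior, edges)
```

On log-scaled PDF tails this damping at the ends can look like a genuine drop in probability, so a small window or explicit axis limits may be preferable when the tails matter.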

Latitudinal weighting:

Accounts for decreasing grid cell area toward poles:

weights = np.cos(np.radians(lat))
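
The effect of these weights on a histogram can be checked on a tiny grid. The sketch assumes a field with dimensions ordered (lat, lon) and passes the broadcast weights to numpy.histogram; this mirrors the formula above but is not taken from the AQUA implementation.

```python
import numpy as np

lat = np.array([-60.0, 0.0, 60.0])
lon = np.arange(0.0, 360.0, 90.0)  # 4 longitudes

# Field equal to 1 in the two high-latitude rows and 0 on the equator
field = np.array([[1.0] * lon.size, [0.0] * lon.size, [1.0] * lon.size])

# One weight per latitude, repeated along the longitude axis
weights_1d = np.cos(np.radians(lat))
weights = np.broadcast_to(weights_1d[:, None], field.shape)

unweighted, edges = np.histogram(field, bins=2, range=(0.0, 1.0))
weighted, _ = np.histogram(field, bins=2, range=(0.0, 1.0), weights=weights)
print(unweighted, weighted)
```

Without weights the eight high-latitude cells dominate the upper bin; with cos(lat) weights each of them counts half, so both bins carry the same total weight.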

API Reference

class aqua.diagnostics.histogram.Histogram(model: str, exp: str, source: str, catalog: str = None, regrid: str = None, startdate: str = None, enddate: str = None, region: str = None, lon_limits: list = None, lat_limits: list = None, regions_file_path: str = None, bins: int = 100, range: tuple = None, weighted: bool = True, diagnostic_name: str = 'histogram', loglevel: str = 'WARNING')

Bases: Diagnostic

Class to compute histograms and probability density functions (PDFs) of a variable over a specified region. Retrieves data from the catalog, computes histograms/PDFs for the entire period, and saves the results to NetCDF files.

Initialize the Histogram diagnostic class.

Parameters:
  • model (str) – Model to be used for data retrieval.

  • exp (str) – Experiment to be used for data retrieval.

  • source (str) – Source to be used for data retrieval.

  • catalog (str, optional) – Catalog for data retrieval.

  • regrid (str, optional) – Regridding method.

  • startdate (str, optional) – Start date of data to retrieve.

  • enddate (str, optional) – End date of data to retrieve.

  • region (str, optional) – Region for data retrieval.

  • lon_limits (list, optional) – Longitude limits of region.

  • lat_limits (list, optional) – Latitude limits of region.

  • regions_file_path (str, optional) – Path to regions file.

  • bins (int, optional) – Number of bins for histogram. Default 100.

  • range (tuple, optional) – Range for histogram bins (min, max).

  • weighted (bool, optional) – Use latitudinal weights. Default True.

  • diagnostic_name (str, optional) – Name of diagnostic. Default 'histogram'.

  • loglevel (str, optional) – Log level.

compute_histogram(box_brd: bool = True, density: bool = True)

Compute histogram of the data for the entire period.

Parameters:
  • box_brd (bool) – Include box boundaries in area selection.

  • density (bool) – If True, returns PDF normalized to integrate to 1.

retrieve(var: str, formula: bool = False, long_name: str = None, units: str = None, standard_name: str = None, reader_kwargs: dict = {})

Retrieve data for the specified variable using the parent Diagnostic class.

Parameters:
  • var (str) – Variable to retrieve.

  • formula (bool) – Whether to use formula for variable.

  • long_name (str) – Long name of variable.

  • units (str) – Units of variable.

  • standard_name (str) – Standard name of variable.

  • reader_kwargs (dict) – Additional Reader kwargs.

run(var: str, formula: bool = False, long_name: str = None, units: str = None, standard_name: str = None, box_brd: bool = True, density: bool = True, outputdir: str = './', rebuild: bool = True, reader_kwargs: dict = {})

Run all steps for histogram computation.

Parameters:
  • var (str) – Variable to retrieve and compute.

  • formula (bool) – Use formula for variable.

  • long_name (str) – Long name of variable.

  • units (str) – Units of variable.

  • standard_name (str) – Standard name of variable.

  • box_brd (bool) – Include box boundaries.

  • density (bool) – Return PDF (normalized) instead of counts.

  • outputdir (str) – Output directory.

  • rebuild (bool) – Rebuild existing files.

  • reader_kwargs (dict) – Additional Reader kwargs.

save_netcdf(outputdir: str = './', rebuild: bool = True)

Save histogram data to a NetCDF file.

Parameters:
  • outputdir (str) – Output directory.

  • rebuild (bool) – Rebuild if file exists.

class aqua.diagnostics.histogram.PlotHistogram(data=None, ref_data=None, diagnostic_name='histogram', loglevel: str = 'WARNING')

Bases: object

Class for plotting Histogram diagnostics. Provides methods to plot histogram/PDF data with customizable labels, titles, and styling options.

Initialize the PlotHistogram class.

Parameters:
  • data – List of histogram DataArrays to plot, or single DataArray.

  • ref_data – Reference histogram DataArray.

  • diagnostic_name (str) – Name of the diagnostic. Default is 'histogram'.

  • loglevel (str) – Logging level. Default is 'WARNING'.

get_data_info()

Extract metadata from data arrays.

plot(data_labels=None, ref_label=None, title=None, style=None, xlogscale=False, ylogscale=True, xmax=None, xmin=None, ymax=None, ymin=None, smooth=False, smooth_window=5)

Plot histogram data.

Parameters:
  • data_labels (list, optional) – Labels for the data.

  • ref_label (str, optional) – Label for the reference data.

  • title (str, optional) – Title for the plot.

  • style (str, optional) – Plotting style.

  • xlogscale (bool) – Use log scale for x-axis.

  • ylogscale (bool) – Use log scale for y-axis.

  • xmax (float, optional) – Maximum x value.

  • xmin (float, optional) – Minimum x value.

  • ymax (float, optional) – Maximum y value.

  • ymin (float, optional) – Minimum y value.

  • smooth (bool) – Apply smoothing to data.

  • smooth_window (int) – Window size for smoothing.

Returns:

Matplotlib figure and axes objects.

Return type:

tuple

run(outputdir='./', rebuild=True, dpi=300, style=None, format='png', xlogscale=False, ylogscale=True, xmax=None, xmin=None, ymax=None, ymin=None, smooth=False, smooth_window=5)

Run the complete plotting workflow.

Parameters:
  • outputdir (str) – Output directory to save the plot.

  • rebuild (bool) – If True, rebuild the plot even if it already exists.

  • dpi (int) – Dots per inch for the plot.

  • style (str) – Plotting style.

  • format (str) – Format of the plot ('png' or 'pdf').

  • xlogscale (bool) – Use log scale for x-axis.

  • ylogscale (bool) – Use log scale for y-axis.

  • xmax (float, optional) – Maximum x value.

  • xmin (float, optional) – Minimum x value.

  • ymax (float, optional) – Maximum y value.

  • ymin (float, optional) – Minimum y value.

  • smooth (bool) – Apply smoothing to data.

  • smooth_window (int) – Window size for smoothing.

save_plot(fig, description: str = None, rebuild: bool = True, outputdir: str = './', dpi: int = 300, format: str = 'png')

Save the plot to a file.

Parameters:
  • fig (matplotlib.figure.Figure) – Figure object.

  • description (str) – Description of the plot.

  • rebuild (bool) – If True, rebuild the plot even if it already exists.

  • outputdir (str) – Output directory to save the plot.

  • dpi (int) – Dots per inch for the plot.

  • format (str) – Format of the plot ('png' or 'pdf').

set_data_labels()

Set the data labels for the plot.

set_description()

Set the description for the plot.

set_ref_label()

Set the reference label for the plot.

set_title()

Set the title for the plot.