Histogram
Description
The Histogram diagnostic computes and plots histograms or probability density functions (PDFs) of climate variables over specified regions. It supports both raw histograms (counts per bin) and normalized PDFs (scaled so that the density integrates to 1 over the binned range). Histograms can be computed over specific geographic regions, using either the default regions or custom regions defined in the configuration file. Optional latitudinal weighting accounts for the variation of grid-cell area with latitude.
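The sketch below uses plain NumPy on a synthetic sample, not the diagnostic's own code, to illustrate the relationship between a raw histogram and a PDF. The PDF is the count in each bin divided by the total count times the bin width, so the area under it is 1.

```python
import numpy as np

# Synthetic, strictly positive sample standing in for a variable such as precipitation
rng = np.random.default_rng(0)
values = rng.gamma(shape=2.0, scale=3.0, size=10_000)

# Raw histogram (density=False): number of samples in each bin
counts, edges = np.histogram(values, bins=100)

# PDF (density=True): same bins, rescaled so the area under the curve is 1
pdf, _ = np.histogram(values, bins=edges, density=True)

# The PDF equals the counts divided by (total count * bin width)
widths = np.diff(edges)
manual_pdf = counts / (counts.sum() * widths)

print(np.allclose(pdf, manual_pdf))            # True
print(np.isclose(np.sum(pdf * widths), 1.0))   # True
```

Because of this normalization, PDFs computed from datasets of different sizes (for example, two models with different resolutions) can be compared directly. Raw counts cannot be compared in the same way.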
Classes
There are two main classes for computing and plotting histograms:
Histogram: computes histograms or PDFs of climate variables.
  Supports raw histograms (counts) and normalized PDFs (density=True)
  Optional latitudinal weighting to account for grid cell area
  Customizable bin count and range for histogram computation
  Regional analysis with predefined or custom regions
PlotHistogram: produces publication-quality line plots of computed histograms/PDFs.
  Single or multi-model comparison plots
  Optional reference dataset overlay
  Logarithmic scales for x and y axes
  Optional smoothing with configurable window size
  Customizable axis limits
Note
The diagnostic computes histograms over the entire temporal period specified (no seasonal decomposition).
Getting Started
File locations:
Diagnostic code: src/aqua_diagnostics/histogram/
Region definitions: config/tools/histogram/definitions/regions.yaml
Example notebook: notebooks/diagnostics/histogram/
Config template: templates/diagnostics/config-histogram.yaml
Supported variables:
The diagnostic works with climate variables on regular latitude-longitude grids:
Direct variables: tprate (precipitation), 2t (temperature), sst (sea surface temperature), etc.
Derived variables: using EvaluateFormula syntax (e.g., 2t - 273.15 for °C)
Supported regions:
global (or null), tropics, europe, nh (Northern Hemisphere), sh (Southern Hemisphere).
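Suppose a region is defined by latitude/longitude limits. The Histogram class also accepts lon_limits and lat_limits directly. The NumPy sketch below shows one way to express such a box selection as a mask. The names box_mask and include_boundaries are hypothetical. include_boundaries only mimics the idea behind the box_brd option ("include box boundaries in area selection"), not its actual implementation.

```python
import numpy as np

def box_mask(lat, lon, lat_limits, lon_limits, include_boundaries=True):
    """Hypothetical helper: boolean (lat, lon) mask of points inside a lat/lon box."""
    lat2d, lon2d = np.meshgrid(lat, lon, indexing="ij")
    if include_boundaries:
        # Points lying exactly on the box edges are kept
        in_lat = (lat2d >= lat_limits[0]) & (lat2d <= lat_limits[1])
        in_lon = (lon2d >= lon_limits[0]) & (lon2d <= lon_limits[1])
    else:
        # Points lying exactly on the box edges are dropped
        in_lat = (lat2d > lat_limits[0]) & (lat2d < lat_limits[1])
        in_lon = (lon2d > lon_limits[0]) & (lon2d < lon_limits[1])
    return in_lat & in_lon

lat = np.arange(-90, 91, 30)    # -90, -60, ..., 90 (7 points)
lon = np.arange(-180, 180, 60)  # -180, -120, ..., 120 (6 points)

# Box between 60S and 60N covering all longitudes
mask = box_mask(lat, lon, lat_limits=(-60, 60), lon_limits=(-180, 180))
print(mask.sum())  # 30 (5 latitudes x 6 longitudes)

strict = box_mask(lat, lon, (-60, 60), (-180, 180), include_boundaries=False)
print(strict.sum())  # 15 (3 latitudes x 5 longitudes)
```

On coarse grids, the choice of inclusive versus exclusive boundaries can noticeably change how many points enter the histogram.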
Basic usage
The recommended way to use this diagnostic is through the Python API, as shown in the notebook below.
Minimal example:
from aqua.diagnostics.histogram import Histogram, PlotHistogram
# Compute histogram/PDF
hist = Histogram(
catalog='climatedt-phase1',
model='ICON',
exp='historical-1990',
source='lra-r100-monthly',
startdate='1990-01-01',
enddate='1999-12-31',
bins=100,
weighted=True
)
hist.run(var='tprate', units='mm/day', density=True)
# Plot PDF
plot = PlotHistogram(data=[hist.histogram_data])
plot.run(outputdir='./', ylogscale=True)
For multi-model comparisons or reference data, see the detailed examples in the section below.
Available demo notebooks
Single histogram/PDF plot: histogram.ipynb
Learn the basics: compute histograms/PDFs, compare with observations, customize plots
Key concepts covered:
Histogram vs PDF: density=False (counts) vs density=True (probability density)
Latitudinal weighting: weighted=True for area-corrected distributions
Bin configuration: bins (number) and range (min/max) parameters
Plot customization: log scales (xlogscale, ylogscale), smoothing, axis limits
Regional selection and custom regions
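The bins and range parameters interact in a way that is easy to miss. The NumPy sketch below assumes that range behaves like NumPy's range argument and that range=None (null in the config) means "use the data minimum and maximum". Under those assumptions, a single outlier stretches the automatic bins, while a fixed range silently excludes values that fall outside it.

```python
import numpy as np

values = np.array([0.5, 1.5, 2.5, 3.5, 25.0])  # one large outlier

# Automatic range: the 5 bins span [min, max] = [0.5, 25.0]
counts_auto, edges_auto = np.histogram(values, bins=5)

# Explicit range: the 5 bins span [0, 20]; the value 25.0 is not counted at all
counts_fixed, edges_fixed = np.histogram(values, bins=5, range=(0, 20))

print(counts_auto)   # [4 0 0 0 1]
print(counts_fixed)  # [4 0 0 0 0]
print(edges_fixed)   # bin edges at 0, 4, 8, 12, 16, 20
```

A fixed range gives identical bins across models, which makes multi-model comparisons line up. The cost is that values outside the range are dropped, and a PDF is then normalized only over the retained values.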
CLI usage
For batch processing or automation, the diagnostic can be run via CLI using a configuration file:
# Copy and customize the template
cp templates/diagnostics/config-histogram.yaml my_config.yaml
# Run diagnostic
python src/aqua_diagnostics/histogram/cli_histogram.py \
--config my_config.yaml \
--model ICON \
--exp historical-1990 \
--loglevel INFO
Key CLI arguments:
--config, --model, --exp, --catalog, --source, --regrid,
--realization, --outputdir, --startdate, --enddate, --loglevel, --nworkers
For the complete list of arguments, run:
python src/aqua_diagnostics/histogram/cli_histogram.py --help
Note
Suggested workflow: Copy the template
(cp templates/diagnostics/config-histogram.yaml my_config.yaml), customize it with
your parameters, and run with --config my_config.yaml.
Quick testing: CLI arguments (--model, --exp, etc.) can override config file
values without editing the file, useful for rapid experimentation.
For most use cases, we recommend the programmatic approach (notebooks) rather than the CLI.
Configuration file structure
The template (templates/diagnostics/config-histogram.yaml) defines datasets,
reference data, and diagnostic parameters:
Basic structure:
# Dataset(s) to analyze
datasets:
  - catalog: 'climatedt-phase1'
    model: 'ICON'
    exp: 'historical-1990'
    source: 'lra-r100-monthly'
    startdate: '1990-01-01'
    enddate: '1999-12-31'
# Reference dataset (optional)
references:
  - catalog: 'obs'
    model: 'ERA5'
    exp: 'era5'
    source: 'monthly'
    startdate: '1990-01-01'
    enddate: '1999-12-31'
# Output settings
output:
  outputdir: "./"
  save_pdf: true
  save_png: true
  dpi: 300
# Diagnostic configuration
diagnostics:
  histogram:
    run: true
    bins: 100          # Number of bins
    range: null        # [min, max] or null for auto
    weighted: true     # Use latitudinal weights
    density: true      # Compute PDF (normalized)
    xlogscale: false   # Log scale for x-axis
    ylogscale: true    # Log scale for y-axis
    smooth: false      # Apply smoothing
    variables:
      - name: 'tprate'
        units: 'mm/day'
        regions: ['global', 'tropics']
Multiple datasets example (for multi-model comparison):
datasets:
  - catalog: 'climatedt-phase1'
    model: 'ICON'
    exp: 'historical-1990'
    source: 'lra-r100-monthly'
    startdate: '1990-01-01'
    enddate: '1999-12-31'
  - catalog: 'climatedt-phase1'
    model: 'IFS-NEMO'
    exp: 'historical-1990'
    source: 'lra-r100-monthly'
    startdate: '1990-01-01'
    enddate: '1999-12-31'
Variable-specific parameters:
diagnostics:
  histogram:
    variables:
      - name: 'tprate'
        regions: ['global']
        range: [0, 20]            # Custom range for this variable
        bins: 50                  # Override global bins setting
        lon_limits: [-180, 180]   # Optional spatial constraints
        lat_limits: [-60, 60]
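The comment "Override global bins setting" suggests that settings on a variable entry take precedence over the diagnostic-level defaults. The sketch below shows that precedence as a shallow dictionary merge. resolve_variable_settings is a hypothetical helper, not the diagnostic's actual configuration logic, which may differ (for example, in how nested keys are handled).

```python
def resolve_variable_settings(histogram_cfg, variable_cfg):
    """Hypothetical helper: per-variable keys override diagnostic-level defaults."""
    # Keep only scalar defaults; the variable/formula lists are not inherited
    defaults = {k: v for k, v in histogram_cfg.items() if k not in ("variables", "formulae")}
    # Later keys win in a dict merge, so the variable entry overrides the defaults
    return {**defaults, **variable_cfg}

histogram_cfg = {
    "run": True,
    "bins": 100,
    "range": None,
    "weighted": True,
    "density": True,
    "variables": [
        {"name": "tprate", "regions": ["global"], "range": [0, 20], "bins": 50},
    ],
}

settings = resolve_variable_settings(histogram_cfg, histogram_cfg["variables"][0])
print(settings["bins"], settings["range"], settings["weighted"])  # 50 [0, 20] True
```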
Derived variables (using formulas):
diagnostics:
  histogram:
    formulae:
      - name: 'temp_celsius'
        formula: '2t - 273.15'
        units: '°C'
        long_name: 'Temperature in Celsius'
        regions: ['global', 'tropics']
For the complete template with all available options, see
templates/diagnostics/config-histogram.yaml.
Outputs
The diagnostic generates:
Plots (PDF and/or PNG):
  Histogram/PDF line plots
  Multi-model comparisons with reference data
  Optional smoothing and custom axis limits
NetCDF files:
  Computed histogram data with bin centers and counts/densities
  Metadata preserved from original variables
Naming convention:
histogram.<diagnostic>.<catalog>.<model>.<exp>.<realization>.<var>.nc
histogram.<diagnostic>_pdf.<catalog>.<model>.<exp>.<realization>.<var>.<format>
Example:
histogram.histogram_pdf.climatedt-phase1.ICON.historical-1990.r1.tprate.png
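The naming convention is a dot-separated join of the listed fields. The sketch below reproduces the example name above. build_filename is a hypothetical helper, and the sketch assumes the second token is <diagnostic> for NetCDF files and <diagnostic>_pdf for plots, as in the patterns shown.

```python
def build_filename(token, catalog, model, exp, realization, var, extension):
    """Hypothetical helper: join the fields of the documented naming pattern with dots."""
    return ".".join(["histogram", token, catalog, model, exp, realization, var, extension])

common = dict(catalog="climatedt-phase1", model="ICON", exp="historical-1990",
              realization="r1", var="tprate")

# NetCDF output: second token is <diagnostic>
nc_name = build_filename("histogram", extension="nc", **common)
# Plot output: second token is <diagnostic>_pdf
png_name = build_filename("histogram_pdf", extension="png", **common)

print(png_name)  # histogram.histogram_pdf.climatedt-phase1.ICON.historical-1990.r1.tprate.png
```

Hyphens inside fields (for example, climatedt-phase1) are safe because only dots separate the fields. A model or experiment name containing a dot would make the name ambiguous to parse.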
Example plots
Probability density function (PDF) of precipitation rate (mm/day) for the global region, showing ICON model output compared to ERA5 reference data.
Reference datasets
Common reference datasets:
ERA5: ECMWF's fifth generation reanalysis for global climate
MSWEP: Multi-Source Weighted-Ensemble Precipitation dataset
BERKELEY-EARTH: Berkeley Earth Surface Temperature dataset
Developer Notes
Internal structure:
The diagnostic uses a three-step process:
1. Data retrieval via Reader from the catalog:
  Applies temporal and spatial selection
  Handles unit conversion if needed
2. Histogram computation via aqua.histogram.histogram():
  Optional latitudinal weighting: weights = cos(lat)
  Bin calculation: NumPy or Dask histogram
  Normalization: if density=True, the result integrates to 1
3. Storage as xarray DataArray:
  Dimension: center_of_bin (bin centers)
  Coordinate: width (bin widths)
  Attributes: preserves original variable metadata
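The center_of_bin and width quantities used in the storage step can be derived from the bin edges that a histogram routine returns. The NumPy sketch below illustrates this on synthetic data. It is not AQUA's implementation, and the xarray wrapping appears only as a comment so that the block runs with NumPy alone.

```python
import numpy as np

rng = np.random.default_rng(42)
values = rng.normal(loc=15.0, scale=5.0, size=5_000)

pdf, edges = np.histogram(values, bins=50, density=True)

# One width and one center per bin (edges has one more element than pdf)
width = np.diff(edges)
center_of_bin = edges[:-1] + width / 2

# With xarray available, the same pieces could be assembled roughly as
# (a sketch, not AQUA's code):
#   xr.DataArray(pdf, dims="center_of_bin",
#                coords={"center_of_bin": center_of_bin,
#                        "width": ("center_of_bin", width)})

print(pdf.shape == center_of_bin.shape == width.shape)  # True
```

Keeping the widths alongside the values lets a reader recover the integral (sum of value times width) even when bins are not uniform.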
Data attributes:
Metadata attached to histogram DataArrays:
AQUA_catalog, AQUA_model, AQUA_exp: data provenance
AQUA_region: selected region name
size_of_the_data: original data size
units: 'counts' or 'probability density'
Standard CF attributes: long_name, standard_name
Graphics function:
plot_histogram(): line plot with flexible styling
  Handles single or multiple DataArrays
  Supports reference data overlay
  Optional smoothing with moving average
  Logarithmic scales for both axes
  Auto-detects bin centers and values
Data flow:
Histogram.retrieve() → get data from the catalog
Histogram.compute_histogram() → call aqua.histogram.histogram()
Histogram.save_netcdf() → save processed data
PlotHistogram.__init__() → load data and metadata
PlotHistogram.run() → create and save plots
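Smoothing algorithm:
Simple moving average computed with np.convolve. With mode='same' the output has the same length as the input, and values beyond the array edges are treated as zero, so the first and last few points are damped:
import numpy as np
kernel = np.ones(window_size) / window_size
smoothed = np.convolve(data, kernel, mode='same')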
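The effect of np.convolve(..., mode='same') near the array edges is easiest to see on a flat signal. The sketch below wraps the documented kernel in a function and compares it with an edge-normalized variant. That variant is a swapped-in alternative, not the documented algorithm: it divides each window sum by the number of real (non-padded) samples it covers. Both function names are hypothetical.

```python
import numpy as np

def moving_average(data, window_size=5):
    # Same operations as the documented snippet
    kernel = np.ones(window_size) / window_size
    return np.convolve(data, kernel, mode="same")

def moving_average_edge_normalized(data, window_size=5):
    # Alternative, NOT the documented algorithm: average only over real samples
    kernel = np.ones(window_size)
    sums = np.convolve(data, kernel, mode="same")
    n_valid = np.convolve(np.ones_like(data), kernel, mode="same")
    return sums / n_valid

data = np.ones(10)  # a flat signal should stay flat after smoothing
plain = moving_average(data)
normalized = moving_average_edge_normalized(data)

print(np.round(plain[:3], 2))        # ramps up at the edge: 0.6, 0.8, 1.0
print(np.allclose(normalized, 1.0))  # True
```

For a PDF plotted on a logarithmic y-axis, damped tail values can look like a spurious drop. This is worth keeping in mind when smooth=True is combined with ylogscale=True.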
Latitudinal weighting:
Accounts for decreasing grid cell area toward poles:
weights = np.cos(np.radians(lat))
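The 1-D weights above must be broadcast to the full (lat, lon) grid before they can be passed to a histogram alongside the flattened field. The NumPy sketch below does this on a synthetic regular grid with a temperature-like field that decreases toward the poles. It is an illustration under those assumptions, not the aqua.histogram.histogram() implementation.

```python
import numpy as np

# Synthetic 1-degree grid (cell centres) and a field that is warm at the
# equator and cold toward the poles
lat = np.linspace(-89.5, 89.5, 180)
lon = np.linspace(0.5, 359.5, 360)
profile = 30.0 * np.cos(np.radians(lat)) - 5.0
field = np.tile(profile[:, None], (1, lon.size))  # shape (lat, lon)

# cos(lat) weights, broadcast from (lat,) to the full (lat, lon) grid
weights = np.cos(np.radians(lat))
weights_2d = np.broadcast_to(weights[:, None], field.shape)

# Weighted and unweighted PDFs on identical bins
pdf_weighted, edges = np.histogram(field.ravel(), bins=40,
                                   weights=weights_2d.ravel(), density=True)
pdf_unweighted, _ = np.histogram(field.ravel(), bins=edges, density=True)

# Down-weighting the small polar cells shifts the distribution toward warm values
print(np.average(field, weights=weights_2d) > field.mean())  # True
```

Without weighting, every grid cell counts equally, so on a regular grid the high-latitude cells (which are numerous but small in area) are over-represented.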
API Reference
- class aqua.diagnostics.histogram.Histogram(model: str, exp: str, source: str, catalog: str = None, regrid: str = None, startdate: str = None, enddate: str = None, region: str = None, lon_limits: list = None, lat_limits: list = None, regions_file_path: str = None, bins: int = 100, range: tuple = None, weighted: bool = True, diagnostic_name: str = 'histogram', loglevel: str = 'WARNING')
Bases: Diagnostic
Class to compute histograms and probability density functions (PDFs) of a variable over a specified region. Retrieves data from the catalog, computes histograms/PDFs for the entire period, and saves results to NetCDF files.
Initialize the Histogram diagnostic class.
- Parameters:
model (str) – Model to be used for data retrieval.
exp (str) – Experiment to be used for data retrieval.
source (str) – Source to be used for data retrieval.
catalog (str, optional) – Catalog for data retrieval.
regrid (str, optional) – Regridding method.
startdate (str, optional) – Start date of data to retrieve.
enddate (str, optional) – End date of data to retrieve.
region (str, optional) – Region for data retrieval.
lon_limits (list, optional) – Longitude limits of region.
lat_limits (list, optional) – Latitude limits of region.
regions_file_path (str, optional) – Path to regions file.
bins (int, optional) – Number of bins for histogram. Default 100.
range (tuple, optional) – Range for histogram bins (min, max).
weighted (bool, optional) – Use latitudinal weights. Default True.
diagnostic_name (str, optional) – Name of diagnostic. Default 'histogram'.
loglevel (str, optional) – Log level.
- compute_histogram(box_brd: bool = True, density: bool = True)
Compute histogram of the data for the entire period.
- Parameters:
box_brd (bool) – Include box boundaries in area selection.
density (bool) – If True, returns PDF normalized to integrate to 1.
- retrieve(var: str, formula: bool = False, long_name: str = None, units: str = None, standard_name: str = None, reader_kwargs: dict = {})
Retrieve data for the specified variable using the parent Diagnostic class.
- Parameters:
var (str) – Variable to retrieve.
formula (bool) – Whether to use formula for variable.
long_name (str) – Long name of variable.
units (str) – Units of variable.
standard_name (str) – Standard name of variable.
reader_kwargs (dict) – Additional Reader kwargs.
- run(var: str, formula: bool = False, long_name: str = None, units: str = None, standard_name: str = None, box_brd: bool = True, density: bool = True, outputdir: str = './', rebuild: bool = True, reader_kwargs: dict = {})
Run all steps for histogram computation.
- Parameters:
var (str) – Variable to retrieve and compute.
formula (bool) – Use formula for variable.
long_name (str) – Long name of variable.
units (str) – Units of variable.
standard_name (str) – Standard name of variable.
box_brd (bool) – Include box boundaries.
density (bool) – Return PDF (normalized) instead of counts.
outputdir (str) – Output directory.
rebuild (bool) – Rebuild existing files.
reader_kwargs (dict) – Additional Reader kwargs.
- save_netcdf(outputdir: str = './', rebuild: bool = True)
Save histogram data to NetCDF file.
- Parameters:
outputdir (str) – Output directory.
rebuild (bool) – Rebuild if file exists.
- class aqua.diagnostics.histogram.PlotHistogram(data=None, ref_data=None, diagnostic_name='histogram', loglevel: str = 'WARNING')
Bases: object
Class for plotting Histogram diagnostics. Provides methods to plot histogram/PDF data with customizable labels, titles, and styling options.
Initialize the PlotHistogram class.
- Parameters:
data – List of histogram DataArrays to plot, or single DataArray.
ref_data – Reference histogram DataArray.
diagnostic_name (str) – Name of the diagnostic. Default is 'histogram'.
loglevel (str) – Logging level. Default is 'WARNING'.
- get_data_info()
Extract metadata from data arrays.
- plot(data_labels=None, ref_label=None, title=None, style=None, xlogscale=False, ylogscale=True, xmax=None, xmin=None, ymax=None, ymin=None, smooth=False, smooth_window=5)
Plot histogram data.
- Parameters:
data_labels (list, optional) – Labels for the data.
ref_label (str, optional) – Label for the reference data.
title (str, optional) – Title for the plot.
style (str, optional) – Plotting style.
xlogscale (bool) – Use log scale for x-axis.
ylogscale (bool) – Use log scale for y-axis.
xmax (float, optional) – Maximum x value.
xmin (float, optional) – Minimum x value.
ymax (float, optional) – Maximum y value.
ymin (float, optional) – Minimum y value.
smooth (bool) – Apply smoothing to data.
smooth_window (int) – Window size for smoothing.
- Returns:
Matplotlib figure and axes objects.
- Return type:
tuple
- run(outputdir='./', rebuild=True, dpi=300, style=None, format='png', xlogscale=False, ylogscale=True, xmax=None, xmin=None, ymax=None, ymin=None, smooth=False, smooth_window=5)
Run the complete plotting workflow.
- Parameters:
outputdir (str) – Output directory to save the plot.
rebuild (bool) – If True, rebuild the plot even if it already exists.
dpi (int) – Dots per inch for the plot.
style (str) – Plotting style.
format (str) – Format of the plot ('png' or 'pdf').
xlogscale (bool) – Use log scale for x-axis.
ylogscale (bool) – Use log scale for y-axis.
xmax (float, optional) – Maximum x value.
xmin (float, optional) – Minimum x value.
ymax (float, optional) – Maximum y value.
ymin (float, optional) – Minimum y value.
smooth (bool) – Apply smoothing to data.
smooth_window (int) – Window size for smoothing.
- save_plot(fig, description: str = None, rebuild: bool = True, outputdir: str = './', dpi: int = 300, format: str = 'png')
Save the plot to a file.
- Parameters:
fig (matplotlib.figure.Figure) – Figure object.
description (str) – Description of the plot.
rebuild (bool) – If True, rebuild the plot even if it already exists.
outputdir (str) – Output directory to save the plot.
dpi (int) – Dots per inch for the plot.
format (str) – Format of the plot ('png' or 'pdf').
- set_data_labels()
Set the data labels for the plot.
- set_description()
Set the description for the plot.
- set_ref_label()
Set the reference label for the plot.
- set_title()
Set the title for the plot.