API Reference
This section provides a detailed reference for AQUA’s Application Programming Interface (API).
AQUA core package - provides core functionality
- class aqua.Drop(catalog=None, model=None, exp=None, source=None, var=None, configdir=None, resolution=None, frequency=None, fix=True, startdate=None, enddate=None, outdir=None, tmpdir=None, nproc=1, loglevel=None, region=None, drop=False, overwrite=False, definitive=False, performance_reporting=False, rebuild=False, exclude_incomplete=False, stat='mean', stat_kwargs={}, compact='xarray', engine='fdb', output_format='netcdf', zarr_chunks=None, **kwargs)
Bases: object
Class to generate DROP outputs at the required frequency/resolution
Initialize the DROP class
- Parameters:
catalog (string) – The catalog you want to read. If None, guessed by the reader.
model (string) – The model name from the catalog
exp (string) – The experiment name from the catalog
source (string) – The source name from the catalog
var (str, list) – Variable(s) to be processed and archived.
resolution (string) – The target resolution for the DROP output. If None, no regridding is performed.
frequency (string, opt) – The target frequency for averaging the DROP output. If no frequency is specified, no time average is performed.
fix (bool, opt) – True to fix the data, default is True
startdate (string,opt) – Start date for the data to be processed, format YYYYMMDD, default is None
enddate (string,opt) – End date for the data to be processed, format YYYYMMDD, default is None
outdir (string) – Where the DROP output is stored.
tmpdir (string) – Where to store temporary files. Default is None. Required when using dask.distributed.
configdir (string) – Configuration directory where the catalogs are found
nproc (int, opt) – Number of processors to use. Default is 1
loglevel (string, opt) – Logging level
region (dict, opt) – Region to be processed, default is None, meaning ‘global’. Requires ‘name’ (str), ‘lon’ (list) and ‘lat’ (list)
drop (bool, opt) – Drop the missing values in the region selection.
overwrite (bool, opt) – True to overwrite existing files, default is False
definitive (bool, opt) – True to create the output file, False to just explore the reader operations, default is False
performance_reporting (bool, opt) – True to save an html report of the dask usage, default is False. This will run a single month to collect the performance data.
exclude_incomplete (bool, opt) – True to remove incomplete chunks when averaging. Default is False.
rebuild (bool, opt) – Rebuild the weights when calling the reader
stat (string, opt) – Statistic to compute. Can be ‘mean’, ‘std’, ‘max’, ‘min’, ‘sum’ or ‘histogram’. Default is ‘mean’.
stat_kwargs (dict, opt) – kwargs to be sent to the statistic function, as ‘bins’ for histogram. Default is empty dict.
compact (string, opt) – Compact the data into yearly files using xarray or cdo. If set to None, no compacting is performed. Default is “xarray”
engine (string, opt) – Engine to be used by the Reader. Default is ‘fdb’.
output_format (string, opt) – Output format: ‘netcdf’, ‘zarr’ or ‘icechunk’. Default is ‘netcdf’. When set to ‘icechunk’, catalog entry generation is skipped.
**kwargs – kwargs to be sent to the Reader, such as ‘zoom’ or ‘realization’
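The region parameter expects a dictionary with ‘name’, ‘lon’ and ‘lat’ keys. One way to check such a dictionary before passing it to Drop is sketched below; the helper validate_region is hypothetical (not part of AQUA), and treating ‘lon’ and ‘lat’ as two-element [min, max] lists is an assumption.

```python
def validate_region(region):
    """Hypothetical check for a DROP-style region dictionary.

    Assumes 'lon' and 'lat' are two-element [min, max] lists;
    AQUA's own validation may differ.
    """
    if region is None:
        return "global"  # None means the whole globe
    for key, expected in (("name", str), ("lon", list), ("lat", list)):
        if key not in region:
            raise KeyError(f"region is missing the '{key}' key")
        if not isinstance(region[key], expected):
            raise TypeError(f"region['{key}'] must be a {expected.__name__}")
    for key in ("lon", "lat"):
        if len(region[key]) != 2:
            raise ValueError(f"region['{key}'] must contain two values")
    return region["name"]


print(validate_region({"name": "tropics", "lon": [0, 360], "lat": [-20, 20]}))
```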
- append_history(data)
Append comprehensive processing history to the data attributes
- Parameters:
data – xarray Dataset or DataArray to append history to
- Returns:
Input data with updated history attribute
- Return type:
xarray.Dataset or xarray.DataArray (same type as the input)
- check_integrity(varname)
Check whether the DROP entry is valid before running (delegates to the writer)
- create_catalog_entry()
Create an entry in the catalog for DROP
- property dask
Check if dask is needed
- drop_generator()
Generate DROP output
- get_filename(var, year=None, month=None, tmp=False)
Create output filenames (delegates to writer)
- retrieve()
Retrieve data from the catalog
- class aqua.Fixer(fixer_name=None, fixes_dictionary=None, convention=None, metadata=None, loglevel='WARNING')
Bases: object
Fixer module
- Parameters:
fixer_name (str) – The fixer name defined in the fixes dictionary
datamodel (str) – The target datamodel name
fixes_dictionary (dict) – The dictionary of fixes
convention (str) – The convention name
metadata (dict) – The metadata dictionary
loglevel (str) – The log level
- fixer(data, destvar, apply_unit_fix=True)
Perform fixes (var name, units, coord name adjustments) of the input dataset.
- Parameters:
data (xr.Dataset) – the input dataset
destvar (list of str) – the name of the desired variables to be fixed, if None all available variables are fixed
apply_unit_fix (bool) – Whether to apply unit conversions (which require a multiplication or an addition) immediately. The fixer sets an offset or a multiplicative factor in the data attributes in any case; these can also be applied later with the method apply_unit_fix. Default is True.
- Returns:
An xarray.Dataset containing the fixed data and target units, factors and offsets in variable attributes.
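When the unit conversion is deferred, the fixer stores a multiplicative factor or an offset in the variable attributes. A minimal sketch of applying such stored values afterwards, assuming the attribute names ‘factor’ and ‘offset’ and the convention new = old * factor + offset (both are illustrative assumptions; apply_unit_fix performs the real conversion):

```python
import numpy as np


def apply_stored_unit_fix(values, attrs):
    """Apply a deferred unit conversion: new = old * factor + offset.

    'factor' and 'offset' are assumed attribute names for illustration;
    missing entries default to the identity conversion.
    """
    factor = attrs.get("factor", 1.0)
    offset = attrs.get("offset", 0.0)
    return np.asarray(values, dtype=float) * factor + offset


# Kelvin to degrees Celsius: factor 1, offset -273.15
temps_k = [273.15, 300.0]
print(apply_stored_unit_fix(temps_k, {"factor": 1.0, "offset": -273.15}))
```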
- get_fixer_varname(var)
Load the fixes and check if the variable requested is there
- Parameters:
var (str or list) – The variable to be checked
- Returns:
A list of variables to be loaded
- class aqua.FldStat(area: Dataset | DataArray | None = None, horizontal_dims: list[str] | None = None, grid_name: str | None = None, loglevel: str = 'WARNING')
Bases: object
AQUA class for field statistics
Initialize the FldStat.
- Parameters:
area (xr.Dataset, xr.DataArray, optional) – The area to calculate the statistics for.
horizontal_dims (list, optional) – The horizontal dimensions of the data.
grid_name (str, optional) – The name of the grid, used for logging history.
loglevel (str, optional) – The logging level.
- align_area_coordinates(data: Dataset | DataArray, decimals: int = 5)
Check if the coordinates of the area and data are aligned. If they are not aligned, try to flip the coordinates.
- Parameters:
data (xr.DataArray or xr.Dataset) – The input data to align with the area.
decimals (int) – Number of decimals to use for rounding when aligning coordinates.
- Returns:
The area with aligned coordinates.
- Return type:
xr.DataArray or xr.Dataset
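Aligning coordinates by flipping can be illustrated on plain arrays. This sketch (the function align_area_lat is hypothetical) compares latitudes after rounding to a given number of decimals and reverses the area along latitude when it is stored in the opposite order; the actual method works on xarray objects and may check more than latitude.

```python
import numpy as np


def align_area_lat(area, area_lat, data_lat, decimals=5):
    """Return `area` with its latitude axis (axis 0) matching `data_lat`.

    Simplified sketch: coordinates are compared after rounding, and the
    area is flipped if it is stored in the opposite order.
    """
    a = np.round(np.asarray(area_lat, dtype=float), decimals)
    d = np.round(np.asarray(data_lat, dtype=float), decimals)
    if np.array_equal(a, d):
        return area  # already aligned
    if np.array_equal(a[::-1], d):
        return area[::-1, ...]  # flip along latitude
    raise ValueError("area and data latitudes cannot be aligned")
```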
- align_area_dimensions(data: Dataset | DataArray)
Align the area dimensions with the data dimensions. If the area and data have a different number of horizontal dimensions, try to rename them.
- Parameters:
data (xr.DataArray or xr.Dataset) – The input data to align with the area.
- property available_fldstats
Return available field statistics.
- fldstat(data: DataArray | Dataset, stat: str = 'mean', region: Regions | None = None, region_sel: str | int | list | None = None, mask_kwargs: dict = {}, lon_limits: list | None = None, lat_limits: list | None = None, dims: list | None = None, **kwargs)
Compute a spatial statistic on the input field, optionally area-weighted. The statistic can be computed globally or over a sub-region selected either by longitude/latitude bounds or via a regionmask.Regions definition. If an area dataset is provided at initialization, supported statistics are area-weighted where applicable.
- Parameters:
data (xr.DataArray or xarray.Dataset) – the input data
stat (str) – the statistic to compute; only “mean” is supported
region (regionmask.Regions, optional) – A regionmask Regions object defining a set of regions.
region_sel (str, int or list, optional) – The region(s) to select by name or number from the region object.
mask_kwargs (dict, optional) – Additional keyword arguments passed to region.mask().
lon_limits (list, optional) – the longitude limits of the subset
lat_limits (list, optional) – the latitude limits of the subset
dims (list, optional) – the dimensions to average over, if not provided, horizontal_dims are used
**kwargs – Additional keyword arguments forwarded to AreaSelection.select_area(), for example box_brd (bool, optional): whether coordinates on the box borders are included in the area selection, default is True; to_180 (bool, optional): whether longitude coordinates are converted to the [-180, 180] range, default is True.
- Returns:
Result of the requested spatial statistic.
- Return type:
xr.DataArray or xr.Dataset
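An area-weighted field mean with optional longitude/latitude limits can be sketched on a regular grid with plain NumPy arrays. Here cos(latitude) stands in for the cell area (an assumption; FldStat uses the area supplied at initialization), borders are included as with the default box_brd=True, and NaNs are ignored. The function weighted_fldmean is hypothetical.

```python
import numpy as np


def weighted_fldmean(field, lat, lon, lat_limits=None, lon_limits=None):
    """Area-weighted mean of a (lat, lon) field on a regular grid.

    cos(latitude) is used as a proxy for the cell area.
    """
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    lat_ok = np.ones(lat.size, dtype=bool)
    lon_ok = np.ones(lon.size, dtype=bool)
    if lat_limits is not None:
        lat_ok = (lat >= lat_limits[0]) & (lat <= lat_limits[1])  # borders included
    if lon_limits is not None:
        lon_ok = (lon >= lon_limits[0]) & (lon <= lon_limits[1])
    sub = np.asarray(field, dtype=float)[np.ix_(lat_ok, lon_ok)]
    # Broadcast the latitude weights across the selected longitudes
    weights = np.cos(np.deg2rad(lat[lat_ok]))[:, None] * np.ones(lon_ok.sum())
    valid = ~np.isnan(sub)  # skip missing values
    return float((sub[valid] * weights[valid]).sum() / weights[valid].sum())
```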
- integrate_over_area(data: Dataset | DataArray, areacell: DataArray, dims: list)
Compute the integral of the data over the area.
- Parameters:
data (xr.DataArray or xr.Dataset) – The data, used also for masking.
areacell (xr.DataArray) – The area cells.
dims (list) – Dimensions to sum over.
- Returns:
The integral of the data over the area
- Return type:
xr.DataArray or xr.Dataset
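Conceptually, the area integral is the sum of data times cell area over the horizontal dimensions. A NumPy sketch under that assumption, where NaNs in the data exclude the corresponding cells and axis plays the role of dims (integrate_over_area here is a stand-in, not the AQUA method):

```python
import numpy as np


def integrate_over_area(data, areacell, axis=None):
    """Sum of data * cell area, skipping NaN cells.

    With axis=None all points are summed; note that np.nansum returns 0
    for a slice that is entirely NaN.
    """
    data = np.asarray(data, dtype=float)
    areacell = np.asarray(areacell, dtype=float)
    return np.nansum(data * areacell, axis=axis)
```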
- select_area(data: Dataset | DataArray, lon: list | None = None, lat: list | None = None, box_brd: bool = True, drop: bool = False, lat_name: str = 'lat', lon_name: str = 'lon', region: Regions | None = None, region_sel: str | int | list | None = None, mask_kwargs: dict = {}, default_coords: dict | None = None, to_180: bool = True) → Dataset | DataArray
Select a specific area from the dataset based on longitude and latitude ranges. Wrapper for AreaSelection.select_area method.
- sum_area(data: Dataset | DataArray, areacell: DataArray, dims: list)
Compute the sum of area cells where masked data is not null.
This is useful for computing fields such as sea ice extent, by summing the area of cells that contain non-null data.
Note: the data must be masked beforehand, otherwise the result may be incorrect. If irrelevant regions (e.g., low levels or land in sea-ice data) are not masked, their area is incorrectly included in the sum.
- Parameters:
data (xr.DataArray or xr.Dataset) – The data (check if pre-masking is needed in the considered variable)
areacell (xr.DataArray) – The area cells.
dims (list) – Dimensions to sum over.
- Returns:
The sum of area cells
- Return type:
xr.DataArray or xr.Dataset
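The masking caveat matters in practice. In this sketch, open-water cells are masked with a 15% concentration threshold (a common sea-ice extent convention, assumed here) before summing; without that step the open-water cell is counted too. The sum_area function here is a NumPy stand-in, not the AQUA method.

```python
import numpy as np


def sum_area(data, areacell):
    """Total area of the cells where `data` is not NaN."""
    data = np.asarray(data, dtype=float)
    return float(np.asarray(areacell, dtype=float)[~np.isnan(data)].sum())


# Sea-ice concentration (fraction); NaN marks land.
siconc = np.array([[0.9, 0.1], [np.nan, 0.5]])
area = np.array([[2.0, 2.0], [2.0, 1.0]])

# Pre-mask: cells below the 15% threshold become NaN (land stays NaN).
masked = np.where(siconc >= 0.15, siconc, np.nan)
print(sum_area(masked, area))  # 3.0: only the 0.9 and 0.5 cells count
print(sum_area(siconc, area))  # 5.0: the 0.1 open-water cell is wrongly included
```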
- class aqua.GridBuilder(outdir: str = '.', model_name: str | None = None, grid_name: str | None = None, original_resolution: str | None = None, vert_coord: str | None = None, force_unstructured: bool = False, loglevel: str = 'warning')
Bases: object
Class to automatically build grids from data sources. Currently tested with HEALPix grids; it can be extended to other grid types.
Initialize the GridBuilder with a reader instance.
- Parameters:
outdir (str) – The output directory for the grid files.
model_name (str, optional) – The name of the model, if different from the model argument.
grid_name (str, optional) – The name of the grid, to specify extra information in the grid file
original_resolution (str, optional) – The original resolution of the grid if using an interpolated source.
vert_coord (str, optional) – The vertical coordinate to consider for the grid build, to override the one detected by the GridInspector.
force_unstructured (bool) – Whether to force the grid detection to use unstructured grid type.
loglevel (str, optional) – The logging level for the logger. Defaults to ‘warning’.
- GRIDTYPE_REGISTRY = {'Curvilinear': <class 'aqua.core.gridbuilder.extragridbuilder.CurvilinearGridBuilder'>, 'GaussianRegular': <class 'aqua.core.gridbuilder.extragridbuilder.GaussianRegularGridBuilder'>, 'HEALPix': <class 'aqua.core.gridbuilder.extragridbuilder.HealpixGridBuilder'>, 'Regular': <class 'aqua.core.gridbuilder.extragridbuilder.RegularGridBuilder'>, 'Unstructured': <class 'aqua.core.gridbuilder.extragridbuilder.UnstructuredGridBuilder'>}
- build(data, rebuild=False, version=None, verify=True, create_yaml=True)
Retrieve and build the grid data for all available grid types.
- Parameters:
rebuild (bool) – Whether to rebuild the grid file if it already exists. Defaults to False.
fix (bool) – Whether to fix the original source. Might be useful for some models. Defaults to False.
version (int, optional) – The version number to append to the grid file name. Defaults to None.
verify (bool) – Whether to verify the grid file after creation. Defaults to True.
create_yaml (bool) – Whether to create the grid entry in the grid file. Defaults to True.
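GRIDTYPE_REGISTRY maps a grid type name to the builder class that handles it. The registry-dispatch pattern can be sketched with toy classes; ToyRegularBuilder and ToyHealpixBuilder are placeholders (not the AQUA builder classes), and build_for is a hypothetical helper. How GridBuilder determines the grid type key is not shown here.

```python
class ToyRegularBuilder:
    """Placeholder builder for a regular grid."""

    def build(self, data):
        return f"regular grid with {len(data)} points"


class ToyHealpixBuilder:
    """Placeholder builder for a HEALPix grid."""

    def build(self, data):
        return f"HEALPix grid with {len(data)} points"


# Registry mapping a detected grid type name to its builder class
GRIDTYPE_REGISTRY = {
    "Regular": ToyRegularBuilder,
    "HEALPix": ToyHealpixBuilder,
}


def build_for(gridtype, data):
    """Look up the builder class for `gridtype` and run it on `data`."""
    try:
        builder_cls = GRIDTYPE_REGISTRY[gridtype]
    except KeyError:
        raise ValueError(f"no builder registered for grid type {gridtype!r}") from None
    return builder_cls().build(data)


print(build_for("HEALPix", list(range(12))))
```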
- class aqua.Reader(model=None, exp=None, source=None, catalog=None, fix=True, datamodel=None, regrid=None, regrid_method=None, areas=True, streaming=False, startdate=None, enddate=None, rebuild=False, loglevel=None, nproc=4, aggregation=None, chunks=None, preproc=None, convention='eccodes', engine='fdb', **kwargs)
Bases: object
General reader for climate data.
Initializes the Reader class, which uses the catalog config/config.yaml to identify the required data.
- Parameters:
model (str) – Model ID. Mandatory.
exp (str) – Experiment ID. Mandatory.
source (str) – Source ID. Mandatory.
catalog (str, optional) – Catalog where to search for the triplet. Defaults to None, which enables autosearch in the installed catalogs.
datamodel (str, optional) – Data model to apply for coordinate transformations (e.g., ‘aqua’). Defaults to ‘aqua’.
regrid (str, optional) – Perform regridding to grid regrid, as defined in config/regrid.yaml. Defaults to None.
regrid_method (str, optional) – CDO regridding method. Read from the grid configuration; if not specified anywhere, “ycon” is used.
fix (bool, optional) – Activate data fixing. Defaults to True.
areas (bool, optional) – Compute pixel areas if needed. Defaults to True.
streaming (bool, optional) – Whether to retrieve data in streaming mode. Defaults to False.
startdate (str, optional) – The starting date for reading/streaming the data (e.g. ‘2020-02-25’). Defaults to None.
enddate (str, optional) – The final date for reading/streaming the data (e.g. ‘2020-03-25’). Defaults to None.
rebuild (bool, optional) – Force rebuilding of area and weight files. Defaults to False.
loglevel (str, optional) – Level of logging according to logging module. Defaults to log_level_default of loglevel().
nproc (int, optional) – Number of processes to use for weights generation. Defaults to 4.
aggregation (str, optional) – the streaming frequency in pandas style (1M, 7D etc. or ‘monthly’, ‘daily’ etc.) Defaults to None (using default from catalog, recommended).
chunks (str or dict, optional) – Chunking to be used for data access. Defaults to None (using default from catalog, recommended). If it is a string, time chunking is assumed. If it is a dictionary, the keys ‘time’ and ‘vertical’ are looked for. Time chunking can be one of S (step), 10M, 15M, 30M, h, 1h, 3h, 6h, D, 5D, W, M, Y. Vertical chunking is expressed as the number of vertical levels to be used.
preproc (function, optional) – a function to be applied to the dataset when retrieved. Defaults to None.
convention (str, optional) – convention to be used for reading data. Defaults to ‘eccodes’. (Only one supported so far)
engine (str, optional) – Engine to be used for GSV retrieval: ‘polytope’ or ‘fdb’. Defaults to ‘fdb’.
- Keyword Arguments:
zoom (int, optional) – HEALPix grid zoom level (e.g. zoom=10 is h1024). Allows for multiple gridname definitions.
realization (int, optional) – The ensemble realization number.
**kwargs – Additional arbitrary keyword arguments to be passed as additional parameters to the intake catalog entry.
- Returns:
A Reader class object.
- Return type:
Reader
- detrend(data, dim='time', degree=1, skipna=False)
Remove the trend from an xarray object using polynomial fitting.
- Parameters:
data (DataArray or Dataset) – The input data.
dim (str) – Dimension to apply detrend along. Defaults to ‘time’.
degree (int) – Degree of the polynomial. Defaults to 1.
skipna (bool) – Whether to skip NaNs. Defaults to False.
- Returns:
The detrended data.
- Return type:
DataArray or Dataset
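Polynomial detrending can be sketched for a 1-D series with numpy.polyfit and numpy.polyval. This illustrates the technique on plain arrays only; unlike Reader.detrend, it does not handle NaNs (the skipna option) or named xarray dimensions.

```python
import numpy as np


def detrend(values, times, degree=1):
    """Subtract a least-squares polynomial of the given degree (1-D sketch)."""
    values = np.asarray(values, dtype=float)
    times = np.asarray(times, dtype=float)
    coeffs = np.polyfit(times, values, degree)  # highest power first
    return values - np.polyval(coeffs, times)


t = np.arange(10.0)
series = 0.5 * t + 3.0 + np.sin(t)  # linear trend plus an oscillation
residual = detrend(series, t)
```

With a linear fit that includes an intercept, the residual has zero mean up to rounding error.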
- fldarea(data, **kwargs)
Field area wrapper which is calling the fldstat module.
- fldintg(data, **kwargs)
Field integral wrapper which is calling the fldstat module.
- fldmax(data, **kwargs)
Field max wrapper which is calling the fldstat module.
- fldmean(data, **kwargs)
Field mean wrapper which is calling the fldstat module.
- fldmin(data, **kwargs)
Field min wrapper which is calling the fldstat module.
- fldstat(data, stat, lon_limits=None, lat_limits=None, dims=None, region=None, region_sel=None, mask_kwargs={}, **kwargs)
Field statistic wrapper which is calling the fldstat module from FldStat class. This method is exposing and providing field functions as Reader class methods through the wrapper accessors.
- Parameters:
data (xr.DataArray or xarray.Dataset) – the input data
stat (str) – the statistical function to be applied
lon_limits (list, optional) – the longitude limits of the subset
lat_limits (list, optional) – the latitude limits of the subset
dims (list, optional) – the dimensions to average over
region (regionmask.Regions, optional) – A regionmask Regions object defining a set of regions.
region_sel (str, int or list, optional) – The region(s) to select by name or number from the region object.
mask_kwargs (dict, optional) – Additional keyword arguments passed to region.mask().
**kwargs – additional arguments passed to fldstat
- fldstd(data, **kwargs)
Field standard deviation wrapper which is calling the fldstat module.
- fldsum(data, **kwargs)
Field sum wrapper which is calling the fldstat module.
- histogram(data, **kwargs)
Wrapper for the histogram function
- instance = None
- property intake_user_parameters
Lazy loader for intake user parameters to avoid expensive describe() calls.
- reader_esm(esmcat, var)
Read intake-esm entry. Returns a dataset.
- Parameters:
esmcat (intake_esm.core.esm_datastore) – The intake-esm catalog datastore to read from.
var (str or list) – Variable(s) to retrieve. If None, uses the query from catalog metadata.
- Returns:
The dataset retrieved from the intake-esm catalog.
- Return type:
xarray.Dataset
- reader_fdb(esmcat, var, startdate, enddate, dask=False, level=None)
Read fdb data. Returns a dask array.
- Parameters:
esmcat (intake catalog) – the intake catalog to read
var (str, int or list) – the variable(s) to read
startdate (str) – a starting date and time in the format YYYYMMDD:HHTT
enddate (str) – an ending date and time in the format YYYYMMDD:HHTT
dask (bool) – if True, return a dask array directly
level (list, float, int) – level to be read, overriding default in catalog
- Returns:
An xarray.Dataset
- reader_intake(esmcat, var, loadvar, keep='first')
Read regular intake entry. Returns dataset.
- Parameters:
esmcat (intake.catalog.Catalog) – your catalog
var (list or str) – Variable to load
loadvar (list of str) – List of variables to load
keep (str, optional) – which duplicate entry to keep (“first” (default), “last” or None)
- Returns:
Dataset
- regrid(data)
Call the regridder function returning container or iterator
- retrieve(var=None, level=None, startdate=None, enddate=None, history=True, sample=False)
Perform a data retrieve.
- Parameters:
var (str, list) – the variable(s) to retrieve. Defaults to None. If None, all variables are retrieved.
level (list, float, int) – Levels to be read, overriding default in catalog source.
startdate (str) – The starting date for reading/streaming the data (e.g. ‘2020-02-25’). Defaults to None.
enddate (str) – The final date for reading/streaming the data (e.g. ‘2020-03-25’). Defaults to None.
history (bool) – If you want to add to the metadata history information about retrieve. Defaults to True.
sample (bool) – read only one default variable (used only if var is not specified). Defaults to False.
- Returns:
An xarray.Dataset containing the required data.
- select_area(data, lon=None, lat=None, **kwargs)
Select a specific area from the dataset based on longitude and latitude ranges.
- Parameters:
lon (list, optional) – Longitude limits for the area selection.
lat (list, optional) – Latitude limits for the area selection.
**kwargs – Additional keyword arguments to pass to the selection function. (See AreaSelection)
- set_default()
Sets this reader as the default for the accessor.
- timfirst(data, **kwargs)
Time first wrapper which is calling the timstat module.
- timhist(data, **kwargs)
Wrapper for the histogram function, with added timstat functionality. It accepts arguments of timstat to resample in time before computing the histogram.
- timlast(data, **kwargs)
Time last wrapper which is calling the timstat module.
- timmax(data, **kwargs)
Time max wrapper which is calling the timstat module.
- timmean(data, **kwargs)
Time mean wrapper which is calling the timstat module.
- timmin(data, **kwargs)
Time min wrapper which is calling the timstat module.
- timstat(data, stat, freq=None, exclude_incomplete=False, time_bounds=False, center_time=False, **kwargs)
Time statistic wrapper which is calling the timstat module from TimStat class. This method is exposing and providing time functions as Reader class methods through the wrapper accessors.
- Parameters:
data (xr.DataArray or xarray.Dataset) – the input data
stat (str) – the statistical function to be applied
freq (str) – the frequency of the time average
exclude_incomplete (bool) – exclude incomplete time averages
time_bounds (bool) – produce time bounds after averaging
center_time (bool) – center time for averaging
**kwargs – additional arguments to be passed to the statistical function
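What exclude_incomplete means for a monthly mean can be illustrated with a pure-Python sketch on daily data. The rule used here to declare a month incomplete (fewer values than calendar days) is an assumption for illustration and may differ from the one used by TimStat; monthly_mean is a hypothetical helper.

```python
import calendar
from collections import defaultdict
from datetime import date, timedelta


def monthly_mean(dates, values, exclude_incomplete=False):
    """Monthly means of daily data.

    A month counts as complete here when it has one value per calendar day.
    """
    groups = defaultdict(list)
    for d, v in zip(dates, values):
        groups[(d.year, d.month)].append(v)
    result = {}
    for (year, month), vals in sorted(groups.items()):
        ndays = calendar.monthrange(year, month)[1]
        if exclude_incomplete and len(vals) < ndays:
            continue  # drop the partial month
        result[(year, month)] = sum(vals) / len(vals)
    return result


start = date(2020, 1, 1)
days = [start + timedelta(days=i) for i in range(45)]  # all of January, half of February
means = monthly_mean(days, [float(i) for i in range(45)], exclude_incomplete=True)
print(means)  # {(2020, 1): 15.0}
```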
- timstd(data, **kwargs)
Time standard deviation wrapper which is calling the timstat module.
- timsum(data, **kwargs)
Time sum wrapper which is calling the timstat module.
- vertinterp(data, levels=None, vert_coord='plev', units=None, method='linear')
A basic vertical interpolation based on interp function of xarray within AQUA. Given an xarray object, will interpolate the vertical dimension along the vert_coord. If it is a Dataset, only variables with the required vertical coordinate will be interpolated.
- Parameters:
data (DataArray, Dataset) – your dataset
levels (float, or list) – The level(s) to which the vertical coordinate is interpolated
units (str, optional) – The units of your vertical axis. Default ‘Pa’
vert_coord (str, optional) – The name of the vertical coordinate. Default ‘plev’
method (str, optional) – The type of interpolation method supported by interp()
- Returns:
A DataArray or a Dataset with the new interpolated vertical dimension
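Linear interpolation to new pressure levels can be sketched for a single profile with numpy.interp. Because numpy.interp requires increasing x values, a profile stored surface-first (decreasing pressure) is reversed first. Note that numpy.interp clamps values outside the input range instead of extrapolating; how vertinterp treats such levels is not covered by this sketch, and interp_plev is a hypothetical helper.

```python
import numpy as np


def interp_plev(profile, plev, levels):
    """Linearly interpolate a vertical profile to new pressure levels (Pa)."""
    plev = np.asarray(plev, dtype=float)
    profile = np.asarray(profile, dtype=float)
    if plev[0] > plev[-1]:
        # np.interp needs increasing coordinates
        plev, profile = plev[::-1], profile[::-1]
    return np.interp(np.atleast_1d(levels), plev, profile)


plev = [100000.0, 85000.0, 50000.0]  # Pa, surface first
temp = [288.0, 280.0, 255.0]         # K
print(interp_plev(temp, plev, [92500.0, 70000.0]))
```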
- class aqua.Regridder(cfg_grid_dict: dict = None, src_grid_name: str = None, data: Dataset = None, cdo: str = None, loglevel: str = 'WARNING')
Bases: object
AQUA Regridder class
The (new) Regridder class. It can be initialized with data (xr.Dataset/DataArray) or with a src_grid_name. It provides methods to generate areas and weights, and to regrid a dataset.
- Parameters:
cfg_grid_dict (dict) – The dictionary containing the full AQUA grid configuration.
src_grid_name (str, optional) – The name of the source grid in the AQUA convention.
data (xarray.Dataset, optional) – The dataset to be regridded if src_grid_name is not provided.
cdo (str, optional) – The path to the CDO executable. If None, guess it from the system.
loglevel (str) – The logging level.
- loglevel
The logging level.
- Type:
str
- logger
The logger.
- Type:
logging.Logger
- cfg_grid_dict
The full AQUA grid dictionary.
- Type:
dict
- src_grid_name
The source grid name.
- Type:
str
- handler
The grid dictionary handler.
- Type:
GridDictHandler
- src_grid_dict
The normalized source grid dictionary.
- Type:
dict
- src_horizontal_dims
The source horizontal dimensions.
- Type:
str
- src_mask_dim
The source vertical dimension.
- Type:
str
- tgt_horizontal_dims
The target horizontal dimensions.
- Type:
str
- error
The error message to be used by the Reader.
- Type:
str
- cdo
The CDO path.
- Type:
str
- smmregridder
The SMMregrid regridder object for each vertical coordinate.
- Type:
dict
- src_grid_area
The source grid area.
- Type:
xarray.Dataset
- tgt_grid_area
The target grid area.
- Type:
xarray.Dataset
- masked_attrs
The masked attributes.
- Type:
dict
- masked_vars
The masked variables.
- Type:
list
- extra_dims
The extra dimensions (from cfg_grid_dict) to be sent to smmregrid.
- Type:
dict
- areas(tgt_grid_name=None, rebuild=False, reader_kwargs=None)
Load or generate regridding areas for the source or target grid.
- Parameters:
tgt_grid_name (str, optional) – Name of the target grid. If None, the self.src_grid_name is used.
rebuild (bool, optional) – If True, forces regeneration of the area.
reader_kwargs (dict, optional) – Additional parameters for the reader.
- Returns:
The computed grid area.
- Return type:
xr.Dataset
- static configure_masked_fields(src_grid_dict)
Configure the masked fields. If the grid has the ‘masked’ option, masking can be based on a generic attribute or, alternatively, on a series of specific variables listed under the ‘vars’ key.
- Parameters:
src_grid_dict (dict) – Dictionary containing the grid information
- Returns:
masked_attr (dict) – Dictionary with the name and property of the attribute to be used for masking
masked_vars (list) – List of variables to mask
- initialize(weights)
Initialize the SMMRegridder for each vertical coordinate.
- Parameters:
weights (dict) – The weights dictionary for each vertical coordinate.
Note that src_grid_path cannot be used here, because the fixer or the data model may already have been applied.
- regrid(data)
Actual regridding core function. Regrid the Dataset or DataArray using common grid types. First, the dimensions of the dataset are expanded to include the vertical dimensions if necessary. Then, variables that share the same dimensions are grouped. Finally, regridding is applied on the different vertical coordinates, including 2d and 2dm.
- Parameters:
data (xarray.Dataset, xarray.DataArray) – The dataset to be regridded.
- weights(tgt_grid_name, regrid_method=None, nproc=1, rebuild=False, reader_kwargs=None, initialize=True)
Load or generate regridding weights by calling smmregrid
- Parameters:
tgt_grid_name (str) – The destination grid name.
regrid_method (str) – The regrid method.
nproc (int) – The number of processors to use.
rebuild (bool) – If True, rebuild the weights.
reader_kwargs (dict) – The reader kwargs for filename definition, including info on model, exp, source, etc.
initialize (bool) – If True, initialize the regridders with the loaded weights.
- Returns:
The weights dictionary for each vertical coordinate.
- Return type:
dict
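Regridding with precomputed weights amounts to a matrix product: each target cell is a weighted combination of source cells. smmregrid (as its name suggests, sparse matrix multiplication regridding) stores the weights as a sparse matrix; the dense NumPy toy below, with a hand-written weight matrix, only illustrates the idea and ignores masking and vertical coordinates.

```python
import numpy as np

# W[i, j]: contribution of source cell j to target cell i.
# Each target cell averages two equal-area source cells, as a
# conservative remap onto a grid of half the resolution would.
W = np.array([
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
])


def apply_weights(weights, src):
    """Regrid a flattened source field: target = W @ source.

    `src` may also be 2-D (cells, time), so the same weights are
    reused for every time step.
    """
    return weights @ np.asarray(src, dtype=float)


src = np.array([1.0, 3.0, 5.0, 7.0])
print(apply_weights(W, src))  # [2. 6.]
```

Since each row of W sums to one, a constant field is preserved, which is a quick sanity check for weights of this kind.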
- class aqua.Streaming(aggregation='S', startdate=None, enddate=None, loglevel=None)
Bases: object
Streaming class to be used in Reader and elsewhere
The Streaming constructor. The streamer is used to stream data by either a specific time interval or by a specific number of samples. If the unit parameter is specified, the data is streamed by the specified unit and stream_step (e.g. 1 month). If the unit parameter is not specified, the data is streamed by stream_step steps of the original time resolution of input data.
If the stream function is called a second time, it will return the subsequent chunk of data in the sequence. The function keeps track of the state of the streaming process through the use of an internal counter. This allows the user to stream through the entire dataset in multiple calls to the function, retrieving consecutive chunks of data each time.
If startdate is not specified, the method will use the first date in the dataset.
- Parameters:
startdate (str) – the starting date for streaming the data (e.g. ‘2020-02-25’) (None)
enddate (str) – the ending date for streaming the data (e.g. ‘2021-01-01’) (None)
aggregation (str) – the streaming frequency in pandas style (1M, 7D etc.)
loglevel (string) – Level of logging according to logging module (default: log_level_default of loglevel())
- Returns:
A Streaming class object.
- reset()
Reset the state of the streaming process. If the stream function is called again after calling reset(), it starts streaming the input data from the beginning.
- stream(data, startdate=None, enddate=None, aggregation=None, timechunks=None, reset=False)
Stream a chunk of a dataset using startdate, enddate and aggregation defined by the constructor.
- Parameters:
data (xr.Dataset) – the input xarray.Dataset
startdate (str) – the starting date for streaming the data (e.g. ‘2020-02-25’) (None)
enddate (str) – the ending date for streaming the data (e.g. ‘2021-01-01’) (None)
aggregation (str) – the streaming frequency in pandas style (1M, 7D etc.)
timechunks (DataArrayResample, optional) – a precomputed chunked time axis
reset (bool, optional) – reset the streaming
- Returns:
An xarray.Dataset containing the subset of the input data that has been streamed.
- stream_chunk(data, startdate=None, enddate=None, aggregation=None)
Compute chunks for a dataset using startdate, enddate and aggregation defined by the constructor.
- Parameters:
data (xr.Dataset) – the input xarray.Dataset
startdate (str) – the starting date for streaming the data (e.g. ‘2020-02-25’) (None)
enddate (str) – the ending date for streaming the data (e.g. ‘2021-01-01’) (None)
aggregation (str) – the streaming frequency in pandas style (1M, 7D etc.)
- Returns:
A DataArrayResample object for the time axis
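The stateful behaviour described for Streaming (an internal counter that advances on each call, and reset() to restart from the beginning) can be sketched with a small class. ChunkStreamer is hypothetical and streams a fixed number of samples per call, which corresponds to streaming by steps of the original time resolution; it is not AQUA's implementation.

```python
class ChunkStreamer:
    """Minimal stand-in for stateful chunked streaming."""

    def __init__(self, step):
        self.step = step
        self._index = 0  # internal counter tracking the streaming position

    def stream(self, data):
        """Return the next `step` samples; an empty slice once exhausted."""
        chunk = data[self._index:self._index + self.step]
        self._index += self.step
        return chunk

    def reset(self):
        """Start again from the beginning of the data."""
        self._index = 0


s = ChunkStreamer(step=3)
samples = list(range(7))
print(s.stream(samples), s.stream(samples), s.stream(samples))  # [0, 1, 2] [3, 4, 5] [6]
```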
- aqua.histogram(data: DataArray, range: tuple = None, bins: int = 10, units: str = None, weighted: bool = True, weights: DataArray | None = None, loglevel: str = 'WARNING', dask: bool = True, check: bool = False, density: bool = False)
Function to calculate a histogram of a DataArray.
- Parameters:
data (xarray.DataArray or xarray.Dataset) – The input DataArray. If it is a Dataset, the first variable is used.
range (tuple, optional) – The lower and upper range of the bins. Defaults to None.
bins (int, optional) – The number of bins for the histogram. Defaults to 10.
weighted (bool, optional) – Use latitudinal weights for the histogram. Defaults to True.
weights (xr.DataArray, optional) – Weights for the histogram. Defaults to None.
dask (bool, optional) – If True, uses Dask for parallel computation. Defaults to True.
units (str, optional) – Convert data to these units. Defaults to None.
check (bool, optional) – Checks if the sum of counts in the histogram is equal to the size of the data. Defaults to False. This forces the histogram to be computed.
density (bool, optional) – Returns a probability density function, normalized such that the integral over the range is 1. Defaults to False.
loglevel (str, optional) – Logging level. Defaults to ‘WARNING’.
- Raises:
TypeError – If the input data is not an xarray DataArray.
ValueError – If no range is provided or if the DataArray does not have a ‘lat’ coordinate when weighted is True.
- Returns:
The histogram of the input data.
- Return type:
xarray.DataArray
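A latitude-weighted histogram can be sketched with numpy.histogram and cos(latitude) weights. This is an illustration, not aqua.histogram: it works on plain (lat, lon) arrays, lets NumPy infer the range when none is given, and names the parameter range_ to avoid shadowing Python's built-in range. The helper lat_weighted_hist is hypothetical.

```python
import numpy as np


def lat_weighted_hist(values, lat, bins=10, range_=None, density=False):
    """Histogram of a (lat, lon) field with cos(latitude) weights."""
    values = np.asarray(values, dtype=float)
    lat_w = np.cos(np.deg2rad(np.asarray(lat, dtype=float)))[:, None]
    weights = np.broadcast_to(lat_w, values.shape)  # one weight per grid point
    counts, edges = np.histogram(
        values.ravel(), bins=bins, range=range_,
        weights=weights.ravel(), density=density,
    )
    return counts, edges
```

With density=True the weighted counts are normalized so that their integral over the bins is 1, matching the description of the density option above.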
- aqua.plot_hovmoller(data: DataArray, invert_axis=False, invert_time=False, invert_space_coord=False, sym=False, style=None, contour=True, dim='lon', figsize=(8, 13), vmin=None, vmax=None, cmap='PuOr_r', title=None, box_text=True, cbar: bool = True, text: list | str = None, nlevels=8, cbar_label=None, cbar_orientation: str = 'horizontal', return_fig=False, fig: Figure = None, ax: Axes = None, ax_pos: tuple = (1, 1, 1), loglevel: str = 'WARNING')
Plot a hovmoller diagram for a given xarray DataArray.
- Parameters:
data (xr.DataArray) – The data to be plotted.
invert_axis (bool) – If True, the x-axis will be inverted.
invert_time (bool) – If True, the direction of the time axis will be inverted.
invert_space_coord (bool) – If True, the space coordinate axis will be inverted.
sym (bool) – If True, the color limits will be symmetric around zero.
style (str) – The style to be used for the plot. Default is None, which uses the default AQUA style.
contour (bool) – If True, contours will be plotted. If False, pcolormesh will be used.
dim (str) – The dimension to be averaged over. Default is ‘lon’.
figsize (tuple) – Size of the figure. Default is (8, 13).
vmin (float) – Minimum value for the color limits. If None, it will be evaluated from the data.
vmax (float) – Maximum value for the color limits. If None, it will be evaluated from the data.
cmap (str) – Colormap to be used for the plot. Default is ‘PuOr_r’.
title (str) – Title for the plot. If None, no title will be set.
box_text (bool) – If True, a box with the min and max values of the dimension will be added to the plot.
cbar (bool) – If True, a colorbar will be added to the plot.
text (list | str) – Text to be added to the plot. If None, no text will be added.
nlevels (int) – Number of contour levels. Default is 8.
cbar_label (str) – Label for the colorbar. If None, it will be generated from the data attributes.
cbar_orientation (str) – Orientation of the colorbar. Default is ‘horizontal’.
return_fig (bool) – If True, the figure and axes will be returned. Default is False.
fig (plt.Figure) – Matplotlib figure object to plot on. If None, a new figure will be created.
ax (plt.Axes) – Matplotlib axes object to plot on. If None, a new axes will be created.
ax_pos (tuple) – Position of the axes in the figure. Default is (1, 1, 1), which means a single subplot.
loglevel (str) – Logging level. Default is “WARNING”.
- Returns:
The matplotlib figure and axes objects containing the Hovmöller plot, returned only if return_fig is True.
- Return type:
tuple (plt.Figure, plt.Axes)
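A Hovmöller diagram collapses one spatial dimension (dim, 'lon' by default) so that the remaining space coordinate can be plotted against time. The sketch below reproduces only that reduction with NumPy, assuming a plain array whose axes are named by a tuple such as ("time", "lat", "lon"). `hovmoller_field` is hypothetical; `plot_hovmoller` itself works on labelled xarray dimensions.

```python
import numpy as np

def hovmoller_field(data, dims, dim="lon"):
    """Average `data` over `dim`; return the 2D field and its remaining dims.

    Hypothetical helper: `data` is a NumPy array whose axes are named by `dims`.
    """
    axis = dims.index(dim)  # raises ValueError if `dim` is not present
    field = data.mean(axis=axis)
    remaining = [d for d in dims if d != dim]
    return field, remaining
```

With dim="lon" the result is a time-latitude field; dim="lat" gives a time-longitude field instead.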
- aqua.plot_lat_lon_profiles(data: DataArray | list[DataArray], ref_data: DataArray | None = None, ref_std_data: DataArray | None = None, data_labels: list | None = None, ref_label: str | None = None, style: str | None = None, fig: Figure | None = None, ax: Axes | None = None, figsize: tuple = (10, 5), title: str | None = None, loglevel: str = 'WARNING')
Plot latitude or longitude profiles of data, averaging over the specified axis.
- Parameters:
data (xr.DataArray | list[xr.DataArray] | None) – Data to plot. Must be xarray DataArrays with ‘lat’, ‘lon’, ‘latitude’, or ‘longitude’ dimensions. Can be a single DataArray or a list of DataArrays.
ref_data (xr.DataArray, optional) – Reference data to plot.
ref_std_data (xr.DataArray | None, optional) – Standard deviation of the reference data.
data_labels (list | None, optional) – Labels for the data.
ref_label (str | None, optional) – Label for the reference data.
style (str | None, optional) – Style for the plot.
fig (plt.Figure | None, optional) – Matplotlib figure object.
ax (plt.Axes | None, optional) – Matplotlib axes object.
figsize (tuple, optional) – Figure size if a new figure is created.
title (str | None, optional) – Title for the plot.
loglevel (str, optional) – Logging level.
- Returns:
Matplotlib figure and axes objects.
- Return type:
tuple
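A latitude profile is the field averaged over longitude, and `ref_std_data` lets the reference be drawn with a spread around it. The sketch below shows both steps with NumPy. It assumes a (lat, lon) array and a band of one standard deviation; the width actually drawn by `plot_lat_lon_profiles` is not stated in this reference, and both helpers are hypothetical.

```python
import numpy as np

def zonal_profile(field):
    """Average a (lat, lon) field over longitude to get a latitude profile."""
    return field.mean(axis=1)

def reference_band(ref_profile, ref_std, n_std=1.0):
    """Lower and upper envelope ref -/+ n_std * std (n_std=1 is an assumption)."""
    return ref_profile - n_std * ref_std, ref_profile + n_std * ref_std
```

Averaging over axis 0 instead would give a longitude profile.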
- aqua.plot_maps(maps: list, contour: bool = True, sym: bool = False, proj: Projection = ccrs.Robinson(), extent: list = None, style=None, figsize: tuple = None, vmin: float = None, vmax: float = None, nlevels: int = 11, title: str = None, title_size: int = 16, titles: list = None, titles_size: int = None, cmap='RdBu_r', cbar_label: str = None, transform_first=False, cyclic_lon=True, return_fig=False, loglevel='WARNING', **kwargs)
Plot multiple maps for side-by-side comparison. Expects a list of xarray.DataArray objects and plots one map for each of them.
- Parameters:
maps (list) – list of xarray.DataArray objects
contour (bool,opt) – If True, plot a contour map, otherwise a pcolormesh. Defaults to True.
sym (bool,opt) – symmetric colorbar, default is False
proj (cartopy.crs.Projection,opt) – projection, default is ccrs.Robinson()
extent (list,opt) – extent of the map, default is None
style (str,opt) – style for the plot, default is the AQUA style
figsize (tuple,opt) – size of the full figure. If None (default), (6, 6) is used for each map.
vmin (float,opt) – minimum value for the colorbar, default is None
vmax (float,opt) – maximum value for the colorbar, default is None
nlevels (int,opt) – number of levels for the colorbar, default is 11
title (str,opt) – super title for the figure
title_size (int,opt) – size of the super title, default is 16
titles (list,opt) – list of titles for the maps
titles_size (int,opt) – size of the titles, default is None
cmap (str,opt) – colormap, default is ‘RdBu_r’
cbar_label (str,opt) – colorbar label
transform_first (bool, optional) – If True, transform the data before plotting. Defaults to False.
cyclic_lon (bool,opt) – add cyclic longitude, default is True
return_fig (bool,opt) – return the figure, default is False
loglevel (str,opt) – log level, default is ‘WARNING’
**kwargs – Keyword arguments for plot_single_map
- Raises:
ValueError – if nothing to plot, i.e. maps is None or not a list of xarray.DataArray
- Returns:
The figure, if return_fig=True, so that it can be further manipulated.
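Because `plot_maps` is meant for comparing maps, a single colour scale shared by all panels is what makes the comparison meaningful. The sketch below assumes that, when vmin and vmax are None, the limits come from the overall minimum and maximum of all maps, and that nlevels evenly spaced levels are used. How AQUA actually derives them is not stated here, and `shared_levels` is a hypothetical helper operating on plain NumPy arrays.

```python
import numpy as np

def shared_levels(maps, vmin=None, vmax=None, nlevels=11):
    """Contour levels common to all maps, so a colour means the same in every panel."""
    if not maps:
        # Mirrors the documented ValueError when there is nothing to plot
        raise ValueError("Nothing to plot")
    if vmin is None:
        vmin = min(float(np.nanmin(m)) for m in maps)
    if vmax is None:
        vmax = max(float(np.nanmax(m)) for m in maps)
    return np.linspace(vmin, vmax, nlevels)
```

Explicit vmin and vmax take precedence over the data-derived limits, matching the parameter descriptions above.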
- aqua.plot_seasonal_lat_lon_profiles(seasonal_data, ref_data=None, ref_std_data=None, style: str = None, loglevel='WARNING', data_labels: list = None, title: str = None, ref_label: str = None)
Plot seasonal lat-lon profiles in a 2x2 subplot layout for the four meteorological seasons.
This function creates exactly 4 subplots arranged in a 2x2 grid, each showing lat-lon profiles for a specific season. The seasons are hardcoded and must be provided in the exact order: [DJF, MAM, JJA, SON].
- Parameters:
seasonal_data (list) –
List of exactly 4 elements, one for each season, in the order [DJF, MAM, JJA, SON]. Each element can be either a single xarray DataArray (single model) or a list of xarray DataArrays (multiple models).
Examples: single model: [djf_data, mam_data, jja_data, son_data]; multiple models: [[model1_djf, model2_djf], [model1_mam, model2_mam], …]
Seasons: DJF = December-January-February (winter), MAM = March-April-May (spring), JJA = June-July-August (summer), SON = September-October-November (autumn).
ref_data (list, optional) – Reference data for each season, same structure as seasonal_data.
ref_std_data (list, optional) – Reference standard deviation data for each season.
style (str, optional) – Style configuration for the plot.
loglevel (str) – Logging level.
data_labels (list, optional) – Labels for the data, one for each subplot. If provided, must have 4 elements.
title (str, optional) – Overall title for the 2x2 subplot figure.
ref_label (str, optional) – Label for the reference data in the legend.
- Returns:
Matplotlib figure and axes objects (2x2 subplot layout).
- Return type:
fig, axs
- Raises:
ValueError – If seasonal_data is not a list of exactly 4 elements.
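The required input order and the season definitions above can be made concrete with a small pure-Python sketch. It groups (month, value) pairs into the four seasons in DJF, MAM, JJA, SON order and checks the list length as described. Both functions are hypothetical helpers, and raising ValueError for a wrong number of data_labels is an assumption (the reference only documents the seasonal_data error).

```python
SEASONS = ("DJF", "MAM", "JJA", "SON")
# Month number -> meteorological season, as listed in the parameter description
MONTH_TO_SEASON = {
    12: "DJF", 1: "DJF", 2: "DJF",
    3: "MAM", 4: "MAM", 5: "MAM",
    6: "JJA", 7: "JJA", 8: "JJA",
    9: "SON", 10: "SON", 11: "SON",
}

def group_by_season(samples):
    """Split (month, value) pairs into 4 lists ordered DJF, MAM, JJA, SON."""
    groups = [[] for _ in SEASONS]
    for month, value in samples:
        groups[SEASONS.index(MONTH_TO_SEASON[month])].append(value)
    return groups

def check_seasonal_data(seasonal_data, data_labels=None):
    """Validate inputs as the parameter description requires."""
    if not isinstance(seasonal_data, list) or len(seasonal_data) != 4:
        raise ValueError("seasonal_data must be a list of exactly 4 elements")
    # 4 labels are required; the exception type here is an assumption
    if data_labels is not None and len(data_labels) != 4:
        raise ValueError("data_labels must have 4 elements")
```

Note that December is grouped with the following January and February only by season name here; a real seasonal mean would usually also decide which year each December belongs to.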
- aqua.plot_single_map(data: DataArray, contour: bool = True, sym: bool = False, proj: Projection = ccrs.Robinson(), gridlines: bool = False, extent: list | None = None, coastlines: bool = True, style: str | None = None, figsize: tuple = (11, 8.5), nlevels: int = 11, vmin: float | None = None, vmax: float | None = None, cmap: str = 'RdBu_r', cbar: bool = True, cbar_label: str | None = None, norm: object | None = None, title: str | None = None, title_size: int | None = 12, transform_first: bool = False, cyclic_lon: bool = True, add_land: bool = False, fig: Figure | None = None, ax: Axes | None = None, ax_pos: tuple = (1, 1, 1), return_fig: bool = False, loglevel='WARNING', **kwargs)
Plot contour or pcolormesh map of a single variable. By default the contour map is plotted.
- Parameters:
data (xr.DataArray) – Data to plot.
contour (bool, optional) – If True, plot a contour map, otherwise a pcolormesh. Defaults to True.
sym (bool, optional) – If True, set the colorbar to be symmetrical. Defaults to False.
proj (cartopy.crs.Projection, optional) – Projection to use. Defaults to ccrs.Robinson().
gridlines (bool, optional) – If True, add gridlines. Defaults to False.
extent (list, optional) – Extent of the map to limit the projection. Defaults to None.
coastlines (bool, optional) – If True, add coastlines. Defaults to True.
style (str, optional) – Style to use. Defaults to None (aqua style).
figsize (tuple, optional) – Figure size. Defaults to (11, 8.5).
nlevels (int, optional) – Number of levels for the contour map. Defaults to 11.
vmin (float, optional) – Minimum value for the colorbar. Defaults to None.
vmax (float, optional) – Maximum value for the colorbar. Defaults to None.
cmap (str, optional) – Colormap. Defaults to ‘RdBu_r’.
norm (matplotlib.colors.Normalize, optional) – Normalization to use for the colormap.
cbar (bool, optional) – If True, add a colorbar. Defaults to True.
cbar_label (str, optional) – Colorbar label. Defaults to None.
title (str, optional) – Title of the figure. Defaults to None.
title_size (int, optional) – Title size. Defaults to 12.
transform_first (bool, optional) – If True, transform the data before plotting. Defaults to False.
cyclic_lon (bool, optional) – If True, add cyclic longitude. Defaults to True.
add_land (bool, optional) – If True, add land to the map. Defaults to False.
fig (plt.Figure, optional) – Figure to plot on. By default a new figure is created.
ax (plt.Axes, optional) – Axes to plot on. By default a new axes is created.
ax_pos (tuple, optional) – Axes position, used only if a new axes has to be created. Defaults to (1, 1, 1).
return_fig (bool, optional) – If True, return the figure and axes. Defaults to False.
loglevel (str, optional) – Log level. Defaults to ‘WARNING’.
- Keyword Arguments:
nxticks (int, optional) – Number of x ticks. Defaults to 7.
nyticks (int, optional) – Number of y ticks. Defaults to 7.
ticks_rounding (int, optional) – Number of digits to round the ticks. Defaults to 0 for full map, 1 if min-max < 10, 2 if min-max < 1.
cbar_ticks_rounding (int, optional) – Number of digits to round the colorbar ticks. Default is no rounding.
- Returns:
Figure and axes, if return_fig is True.
- Return type:
tuple
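The `cyclic_lon` option addresses the blank strip that appears at the longitude seam when a global field does not repeat its first column. A minimal NumPy sketch of the idea follows, assuming longitude is the last axis and is given in degrees; `add_cyclic_column` is a hypothetical helper. Cartopy provides `cartopy.util.add_cyclic_point` for the same purpose, although this reference does not say whether `plot_single_map` relies on it.

```python
import numpy as np

def add_cyclic_column(field, lon):
    """Repeat the first longitude column at lon[0] + 360 to close the seam."""
    cyclic_field = np.concatenate([field, field[..., :1]], axis=-1)
    cyclic_lon = np.append(lon, lon[0] + 360.0)
    return cyclic_field, cyclic_lon
```

After this step the last column sits exactly one full turn after the first, so contouring can join the two edges of the map.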
- aqua.plot_single_map_diff(data: DataArray, data_ref: DataArray, proj: Projection = ccrs.Robinson(), extent: list | None = None, vmin_fill: float | None = None, vmax_fill: float | None = None, vmin_contour: float | None = None, vmax_contour: float | None = None, norm=None, sym_contour: bool = False, sym: bool = True, add_contour: bool = True, add_land=False, line_levels: int | None = 10, cyclic_lon: bool = True, return_fig: bool = False, fig: Figure | None = None, ax: Axes | None = None, title: str | None = None, title_size: int | None = 12, gridlines: bool = False, loglevel: str = 'WARNING', **kwargs)
Plot the difference data - data_ref as a filled map and overlay data as a contour plot.
- Parameters:
data (xr.DataArray) – Data to plot.
data_ref (xr.DataArray) – Reference data to plot the difference.
proj (cartopy.crs.Projection, optional) – Projection to use. Defaults to ccrs.Robinson().
extent (list, optional) – Extent of the map to limit the projection. Defaults to None.
vmin_fill (float, optional) – Minimum value for the colorbar of the fill.
vmax_fill (float, optional) – Maximum value for the colorbar of the fill.
vmin_contour (float, optional) – Minimum value for the colorbar of the contour.
vmax_contour (float, optional) – Maximum value for the colorbar of the contour.
norm (matplotlib.colors.Normalize, optional) – Normalization to use for the colormap.
sym_contour (bool, optional) – If True, set the contour limits to be symmetrical. Defaults to False.
sym (bool, optional) – If True, set the colorbar for the difference to be symmetrical. Defaults to True.
add_contour (bool, optional) – If True, add the contour plot. Defaults to True.
add_land (bool, optional) – If True, add land to the map. Defaults to False.
line_levels (int, optional) – Number of contour levels. Defaults to 10.
cyclic_lon (bool, optional) – If True, add cyclic longitude. Defaults to True.
return_fig (bool, optional) – If True, return the figure and axes. Defaults to False.
fig (plt.Figure, optional) – Figure to plot on. By default a new figure is created.
ax (plt.Axes, optional) – Axes to plot on. By default a new axes is created.
title (str, optional) – Title of the figure. Defaults to None.
title_size (int, optional) – Title size. Defaults to 12.
gridlines (bool, optional) – If True, add gridlines. Defaults to False.
loglevel (str, optional) – Log level. Defaults to ‘WARNING’.
**kwargs – Keyword arguments for plot_single_map. Check the docstring of plot_single_map.
- Keyword Arguments:
contour (bool, optional) – If True, plot the difference as a contour map; if False, plot it as a pcolormesh.
coastlines (bool, optional) – If True, add coastlines. Defaults to True.
- Raises:
ValueError – If data or data_ref is not a DataArray.
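`plot_single_map_diff` fills the map with data - data_ref, and with sym=True (the default) the fill colour limits are symmetric. Assuming the limits are taken from the difference field unless vmin_fill/vmax_fill are given, and that sym then widens them to plus/minus the largest absolute value (neither point is confirmed by this reference), the computation can be sketched with NumPy. `diff_limits` is a hypothetical helper.

```python
import numpy as np

def diff_limits(data, data_ref, vmin_fill=None, vmax_fill=None, sym=True):
    """Return data - data_ref and the (vmin, vmax) used for its fill colours."""
    diff = data - data_ref
    vmin = float(np.nanmin(diff)) if vmin_fill is None else vmin_fill
    vmax = float(np.nanmax(diff)) if vmax_fill is None else vmax_fill
    if sym:
        # Symmetric limits keep zero difference at the centre of a diverging colormap
        bound = max(abs(vmin), abs(vmax))
        vmin, vmax = -bound, bound
    return diff, vmin, vmax
```

Centring the scale on zero is the usual choice for difference maps drawn with a diverging colormap such as 'RdBu_r'.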
- aqua.plot_timeseries(monthly_data: list[DataArray] | DataArray = None, annual_data: list[DataArray] | DataArray = None, ref_monthly_data: DataArray | None = None, ref_annual_data: DataArray | None = None, std_monthly_data: DataArray | None = None, std_annual_data: DataArray | None = None, ens_monthly_data: DataArray | None = None, ens_annual_data: DataArray | None = None, std_ens_monthly_data: DataArray | None = None, std_ens_annual_data: DataArray | None = None, data_labels: list | None = None, suffix: bool | None = False, ref_label: str | None = None, ens_label: str | None = None, legend: bool | None = True, style: str | None = None, fig: Figure | None = None, ax: Axes | None = None, figsize: tuple = (10, 5), title: str | None = None, colors: list | None = None, loglevel: str = 'WARNING')
Plot monthly_data and annual_data as time series, optionally together with reference data, ensemble data and their standard deviations.
- Parameters:
monthly_data (xr.DataArray | list of xr.DataArray) – monthly data to plot
annual_data (xr.DataArray | list of xr.DataArray) – annual data to plot
ref_monthly_data (xr.DataArray) – reference monthly data to plot
ref_annual_data (xr.DataArray) – reference annual data to plot
std_monthly_data (xr.DataArray) – standard deviation of the reference monthly data
std_annual_data (xr.DataArray) – standard deviation of the reference annual data
ens_monthly_data (xr.DataArray) – ensemble monthly data to plot
ens_annual_data (xr.DataArray) – ensemble annual data to plot
std_ens_monthly_data (xr.DataArray) – standard deviation of the ensemble monthly data
std_ens_annual_data (xr.DataArray) – standard deviation of the ensemble annual data
data_labels (list of str) – labels for the data
suffix (bool) – whether to add a suffix to the label based on the kind (monthly or annual). Default is False. If False, only one label is used for both monthly and annual data.
ref_label (str) – label for the reference data
ens_label (str) – label for the ensemble data
legend (bool) – whether to show the legend. Default is True.
style (str) – style to use for the plot. By default the schema specified in the configuration file is used.
fig (plt.Figure) – figure object to plot on
ax (plt.Axes) – axis object to plot on
figsize (tuple) – size of the figure
title (str) – title of the plot
colors (list of str) – colors to use for the plot lines
loglevel (str) – logging level
- Returns:
tuple containing the figure and axis objects
- Return type:
fig, ax (tuple)
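`plot_timeseries` takes monthly and annual series that have already been computed. As a sketch of one way to derive annual_data from monthly_data, the helper below averages complete years of a monthly NumPy series. It is hypothetical and uses a simple unweighted mean that ignores month lengths and drops a trailing partial year; AQUA's own time-averaging tools may behave differently.

```python
import numpy as np

def annual_means(monthly, n_months_per_year=12):
    """Unweighted mean of each complete year; a trailing partial year is dropped."""
    n_years = len(monthly) // n_months_per_year
    complete = np.asarray(monthly[: n_years * n_months_per_year], dtype=float)
    return complete.reshape(n_years, n_months_per_year).mean(axis=1)
```

Weighting each month by its number of days would give a more exact annual mean for quantities such as temperature.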
- aqua.show_catalog_content(catalog=None, model=None, exp=None, source=None, configdir=None, catalog_name=None, loglevel='WARNING', verbose=True, show_descriptions=False)
Display the catalog content structure (model/exp/source) without requiring manual ConfigPath instantiation.
This is a convenience wrapper around ConfigPath.show_catalog_content() that handles the ConfigPath initialization internally.
- Parameters:
catalog (str | list | None) – Specific catalog(s) to scan. If None, loops over all available catalogs.
model (str | None) – Optional model filter. If provided, only shows entries for this model.
exp (str | None) – Optional experiment filter. If provided, only shows entries for this exp.
source (str | None) – Optional source filter. If provided, only shows entries for this source.
configdir (str, optional) – The directory containing the configuration files. If not provided, ConfigPath will determine it automatically.
catalog_name (str, optional) – Override the catalog name. If not provided, uses the default catalog.
loglevel (str, optional) – Logging level. Defaults to ‘WARNING’.
verbose (bool) – If True, prints the formatted catalog structure. Defaults to True.
show_descriptions (bool) – If True, also print per-source descriptions.
- Returns:
Dictionary with catalog names as keys and nested dict structures as values.
- Return type:
dict
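The optional model, exp and source filters narrow the returned nested dictionary. The exact nesting is not spelled out in this reference, so the pure-Python sketch below assumes catalog → model → exp → list of sources and shows how such filters could be applied. `filter_catalog` and all names in the test data are hypothetical placeholders.

```python
def filter_catalog(content, model=None, exp=None, source=None):
    """Keep only entries matching the optional model/exp/source filters.

    `content` is assumed to look like {catalog: {model: {exp: [source, ...]}}}.
    """
    result = {}
    for catalog, models in content.items():
        for m, exps in models.items():
            if model is not None and m != model:
                continue
            for e, sources in exps.items():
                if exp is not None and e != exp:
                    continue
                kept = [s for s in sources if source is None or s == source]
                if kept:
                    # Rebuild only the branches that still contain sources
                    result.setdefault(catalog, {}).setdefault(m, {})[e] = kept
    return result
```

Branches left empty by the filters are omitted, so an unmatched filter returns an empty dictionary.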