gridded API
The gridded
API is built around a few core object types. In the general case, model results are delivered in a file or collection of files that contain a definition of the grid and data (or variables) that are associated with that grid. Depending on use-case, it can make sense to work with the entire collection of data and grid, or to work with just a couple of particular variables. The gridded
object model is designed to allow either use case.
gridded.Dataset
The gridded.gridded.Dataset
object represents the entire gridded dataset. It more-or-less maps to what is usually found in a CF-compliant netcdf file output from a model run.
A gridded.gridded.Dataset
contains a few core pieces of information:
Attributes of the whole dataset
A Grid object – this is a duck-typed grid object that could be any of the supported grid types.
A dict of
gridded.variable.Variable
objects – names as the keyAssorted utilities for accessing, saving, and loading the data.
Grid objects
Grid objects represent the grid itself, and contain functionality to work with the grid that requires specific knowledge of grid types, such as interpolation searching, etc.
The gridded.grids.GridBase
object provides shared functionality that all Grids require. It does not (but probably should) specify the full Grid API.
Variable objects
gridded.variable.Variable
contains the data associated with a particular quantity in the Dataset
. It maps more-or-less to a netcdf Variable, with a name, attributes, and an array that holds the actual data. It also holds a reference to the Grid, Time, and Depth objects for the Grid that it is defined on, so that it can use those object to perform interpolation, etc.
The idea is that a Variable represents a continuous field of some property in 4-D space (lat, lon, depth, time).
Time object
gridded.time.Time
object represent the time axis of the dataset, providing interpolation in time for Variables.
Depth object
gridded.depth.Depth
objects represent the depth axis of the data set, it provides an abstraction around various vertical coordinate schemes, and provides interpolation in depth for Variables.
Design Principles
Grid Independence
The primary goal of gridded
is for an end-user to be able to do data analysis and visualization without needing to understand the intricacies of the grid structure, and ideally to not have to even know what grid a particular dataset or variable is using.
For example, one should be able to plot a time series of a parameter, like sea surface temperature, with exactly the same code regardless of the underlying grid:
ds = gridded.Dataset(path_to_file)
sst = ds.get_variables_by_attributes(standard_name='sea_surface_temperature')
time_series = sst.at(lat, lon, times)
The user can now plot the time series for a given location and times, without having to know anything about the grid structure.
Duck Typing
The Grid objects (Grid, Time, Depth) are for the most part “duck typed”, rather than strict subclassing. Though there are base classes that provide shared functionality.
We are trying to be clear about the “public” vs “private” API by using leading underscores for methods and attributes not intended for external use.
Lazy loading / data arrays
Many of the datasets users need to work with can be quite large. As a result it is impractical to load entire datasets into memory at once. gridded
for the most part shifts the burden of handling lazy loading to external libraries, and does this by keeping data stored in a “numpy array-like” objects. Users can use pure numpy arrays, or any object that “acts” like a numpy array. This should allow gridded
to work with netcdf variables, hdf5 arrays, dask arrays, etc.
In practice, there is no clear definition of “array-like”, so gridded
has defined its own definition, based on features we know we need. But it is assumed that nd indexing behaves that same as numpy arrays – as there is no way to easily confirm that. This is a goal, but in fact, only numpy arrays and netCDF4 Variables
have been implimented and tested. In the future, we may use xarray
as a single abstration layer, rather than rolling our own.
To support that, we support converting and testing for array-like with:
gridded.utils.asarraylike()
and
gridded.utils.isarraylike()
Those utilities will be updated as new needs arrise.
Reference
- gridded package
- Subpackages
- Submodules
- gridded.depth module
- gridded.gridded module
- gridded.grids module
- gridded.time module
- gridded.utilities module
asarraylike()
convert_mask_to_numpy_mask()
convert_numpy_datetime64_to_datetime()
gen_celltree_mask_from_center_mask()
get_dataset()
get_dataset_attrs()
get_writable_dataset()
isarraylike()
isstring()
merge_var_search_dicts()
regrid_variable()
search_dataset_for_any_long_name()
search_dataset_for_variables_by_longname()
search_dataset_for_variables_by_varname()
varnames_merge()
- gridded.variable module
Variable
Variable.at()
Variable.center_values()
Variable.cf_names
Variable.constant()
Variable.data
Variable.data_shape
Variable.default_names
Variable.dimension_ordering
Variable.from_netCDF()
Variable.grid_shape
Variable.info
Variable.interpolate()
Variable.is_data_on_nodes
Variable.location
Variable.time
Variable.units
VectorVariable
VectorVariable.at()
VectorVariable.cf_names
VectorVariable.comp_order
VectorVariable.data_shape
VectorVariable.default_names
VectorVariable.from_netCDF()
VectorVariable.is_data_on_nodes
VectorVariable.location
VectorVariable.locations
VectorVariable.save()
VectorVariable.time
VectorVariable.units
VectorVariable.varnames
- Module contents
Dataset
Grid
Grid_R
Grid_S
Grid_U
Time
Variable
Variable.at()
Variable.center_values()
Variable.cf_names
Variable.constant()
Variable.data
Variable.data_shape
Variable.default_names
Variable.dimension_ordering
Variable.from_netCDF()
Variable.grid_shape
Variable.info
Variable.interpolate()
Variable.is_data_on_nodes
Variable.location
Variable.time
Variable.units
VectorVariable
VectorVariable.at()
VectorVariable.cf_names
VectorVariable.comp_order
VectorVariable.data_shape
VectorVariable.default_names
VectorVariable.from_netCDF()
VectorVariable.is_data_on_nodes
VectorVariable.location
VectorVariable.locations
VectorVariable.save()
VectorVariable.time
VectorVariable.units
VectorVariable.varnames