adios_db.data_sources.env_canada.v1 package
This version of the Env. Canada data source importing modules is designed to import the file:
April 2020-Physiochemical_properties_of_petroleum_products. EN.xlsm
Submodules
adios_db.data_sources.env_canada.v1.mapper module
- class adios_db.data_sources.env_canada.v1.mapper.EnvCanadaRecordMapper(record)
Bases:
MapperBase
A translation/conversion layer for the Environment Canada imported record object. This is intended to be used interchangeably with either an Environment Canada record or record parser object. Its purpose is to generate named attributes that are suitable for creation of a NOAA Oil Database record.
- property metadata
- property oil_id
- property oil_labels
- py_json()
- resolve_oil_api(record)
- property sub_samples
- class adios_db.data_sources.env_canada.v1.mapper.EnvCanadaSampleMapper(parser, sample_id, ests_code)
Bases:
MapperBase
- property CCME
- property ESTS_hydrocarbon_fractions
- property SARA
- Note: Each measurement appears to be associated with a method.
However, the Sara class only supports a single method as a first-order attribute.
- property bulk_composition
Gather up all the groups of compounds that comprise a ‘bulk’ amount and compile them into an organized list.
Data points that are classified in bulk composition:
  - wax
  - water
  - Sulfur
  - GC-TPH
  - GC-TSH
  - GC-TAH
  - Hydrocarbon Content Ratio
- property compounds
Gather up all the groups of compounds scattered throughout the EC and compile them into an organized list.
Compounds apply to:
  - individual chemicals
  - mixed isomers
Compounds do not apply to:
  - waxes
  - SARA
  - Sulfur
  - Carbon
- Note: Although we could in theory assign multiple groups to a
particular compound, we will only assign one group to the list. This group will have a close relationship to the category of compounds where it is found in the EC datasheet.
- Note: Most of the compound groups don’t have replicates or
standard deviation. We will not add these attributes if they aren’t found within the attribute group.
- compounds_in_group(category, group_category, unit, unit_type, filter_compounds=True)
- Parameters:
category – The category attribute containing the data
group_category – The category attribute containing the group label
unit – The unit.
unit_type – The type of thing that the unit measures (length, mass, etc.)
filter_compounds – Keep only those attributes whose suffix matches the unit value.
Example of content:
{'name': '1-Methyl-2-Isopropylbenzene', 'method': 'ESTS 2002b',
 'groups': ['C4-C6 Alkyl Benzenes', ...],
 'measurement': {'value': 3.4, 'unit': 'ppm', 'unit_type': 'massfraction',
                 'replicates': 3, 'standard_deviation': 0.1}}
- property densities
- dict()
- property dispersibilities
- property distillation_data
- property dynamic_viscosities
- property emulsions
- property environmental_behavior
- property flash_point
- generate_sample_id_attrs(sample_id, ests_code)
- Parameters:
sample_id – The value we will use to internally identify a sample. This will be mapped to metadata.name
ests_code – This is the identifier that Env Canada uses for oil samples. This will be mapped to metadata.sample_id
- property interfacial_tension_air
- property interfacial_tension_seawater
- property interfacial_tension_water
- property interfacial_tensions
- property physical_properties
- property pour_point
- prepend_ests(method)
- transpose_dict_of_lists(obj)
A common step with the parsed data is to convert a dict containing an orthogonal set of list values into a list of dicts, each dict holding the values found at one index of those lists.
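The transposition described above can be sketched in a few lines. This is a standalone illustration, not the package's implementation: the function name `transpose_dict_of_lists_sketch` and the sample data are hypothetical, and it assumes every list has the same length.

```python
def transpose_dict_of_lists_sketch(obj):
    """Turn a dict of equal-length lists into a list of dicts.

    Hypothetical helper for illustration only.
    """
    keys = list(obj.keys())
    # zip(*values) yields one tuple per list index; it stops at the
    # shortest list, so ragged input would be silently truncated.
    return [dict(zip(keys, row)) for row in zip(*obj.values())]


# Invented sample data: two measurements described by parallel lists.
densities = {
    'value': [0.85, 0.87],
    'temperature': [15.0, 15.0],
}
rows = transpose_dict_of_lists_sketch(densities)
print(rows)
```

Each resulting dict then describes one measurement, which is usually easier to validate and serialize than parallel lists.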
adios_db.data_sources.env_canada.v1.parser module
- class adios_db.data_sources.env_canada.v1.parser.EnvCanadaRecordParser(values, conditions, file_props)
Bases:
ParserBase
A record class for the Environment Canada oil spreadsheet. This is intended to be used with a set of data representing a single record from the spreadsheet.
We manage a hierarchical structure of properties extracted from the Excel columns for an oil. Basically this will be a dictionary of raw property category names, where each property category itself will contain a dictionary of raw individual properties.
The data associated with any individual property will be a list of values corresponding to the weathered subsamples that exist for the oil record.
- property API
- property comments
- dict()
- property ests_codes
- get_label(nav_list)
For an attribute in our values hierarchy, get the original source label information.
- Ex:
parser_obj.get_label(('gc_total_aromatic_hydrocarbon.tah'))
- property location
- property metadata
- property name
For now we will just concatenate all the names we see in the list. In the future, we will want to be a bit smarter.
- property oil_id
- property product_type
- property reference
It has been decided that we will at this time use a hard-coded reference for all records coming from the Env. Canada datasheet.
- property sample_date
- property source_id
We will use the ESTS codes in the record as the identifier.
ESTS codes are a series of numbers separated by a period ‘.’. The first number in the series seems to identify the species of the petroleum substance, and the rest identify a degree of weathering. So we will use just the first one.
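As a rough sketch of the rule above, the leading number can be taken by splitting the code on the period. The helper name `first_ests_number` and the sample codes are invented for illustration; the actual property may handle the list of codes differently.

```python
def first_ests_number(ests_code):
    """Return the part of an ESTS code before the first period.

    Hypothetical helper; str() guards against codes that were read
    from the spreadsheet as numbers rather than text.
    """
    return str(ests_code).split('.')[0]


# Invented codes: same leading number, different weathering suffixes.
codes = ['2713.1', '2713.2.1']
source_ids = {first_ests_number(c) for c in codes}
print(source_ids)
```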
- property sub_samples
- vertical_slice(index)
All values in our self.values structure will be a list that conforms to the weathering subsamples for a record. This method recursively navigates the values structure and takes the item at the given index from each list.
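A minimal sketch of such a recursive slice follows. It is a free function written for illustration, not the method itself, and it assumes that anything that is neither a dict nor a list should be passed through unchanged; the sample values are invented.

```python
def vertical_slice_sketch(values, index):
    """Pick element `index` from every list in a nested dict structure."""
    if isinstance(values, dict):
        # Recurse into categories and their individual properties.
        return {key: vertical_slice_sketch(val, index)
                for key, val in values.items()}
    if isinstance(values, (list, tuple)):
        # One entry per weathered subsample: keep only the requested one.
        return values[index]
    # Assumption: scalars apply to every subsample and are kept as-is.
    return values


values = {
    'density': {'density': [0.85, 0.90], 'method': ['m1', 'm2']},
    'api': {'gravity': [35.0, 28.0]},
}
second_sample = vertical_slice_sketch(values, 1)
print(second_sample)
```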
- property weathering
- class adios_db.data_sources.env_canada.v1.parser.EnvCanadaSampleParser(values, conditions, labels)
Bases:
ParserBase
A sample class for the Environment Canada oil spreadsheet. This is intended to be used with a set of data representing a single subsample inside an oil record.
We manage a hierarchical structure of properties similar to that of the record parser.
The data associated with any individual property will be a single scalar value corresponding to a weathered subsample that exists for an oil record.
- property adhesion
- property api
- attr_map = {'benzene': 'benzene_alkylated_benzene', 'boiling_point_cumulative_fraction': 'boiling_point_cumulative_weight_fraction', 'boiling_point_distribution': 'boiling_point_distribution_temperature', 'ccme': 'ccme_fractions', 'emulsion_complex_modulus': 'complex_modulus', 'emulsion_complex_viscosity': 'complex_viscosity', 'emulsion_loss_modulus': 'loss_modulus', 'emulsion_storage_modulus': 'storage_modulus', 'emulsion_tan_delta_v_e': 'tan_delta_v_e', 'emulsion_visual_stability': 'visual_stability', 'emulsion_water_content': 'water_content', 'sara_total_fractions': 'hydrocarbon_group_content'}
- property chromatography
The Environment Canada data sheet contains data for gas chromatography analysis, which we will try to capture.
We have four property groups in this case, which we will merge.
GC-TPH
GC-TSH
GC-TAH
Hydrocarbon Content Ratio
Dimensional parameters are (weathering).
Value units are split between mg/g and percent.
- deep_get(attr_path, default=None)
- property densities
There is now a single category, density. Attributes within the category conform to an expected sequential block consisting of:
  - Density
  - Standard Deviation
  - Replicates
  - Method
There will be 3 such blocks in the category.
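One way to picture the block layout is to cut an ordered list of attribute values into groups of four. The sketch below assumes the values arrive in spreadsheet order; the function name, the field names, and the sample numbers are all hypothetical.

```python
FIELDS = ('density', 'standard_deviation', 'replicates', 'method')


def split_density_blocks(values, fields=FIELDS):
    """Group a flat, ordered sequence of values into per-block dicts."""
    size = len(fields)
    if len(values) % size != 0:
        raise ValueError('values do not divide evenly into blocks')
    return [dict(zip(fields, values[i:i + size]))
            for i in range(0, len(values), size)]


# Invented values for three blocks of four attributes each.
flat = [0.85, 0.01, 3, 'method A',
        0.87, 0.01, 3, 'method A',
        0.91, 0.02, 3, 'method A']
blocks = split_density_blocks(flat)
print(len(blocks), blocks[0])
```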
- dict()
- property dvis
There is now a single viscosity category.
- Note: Sometimes a viscosity value carries a greater-than (‘>’) sign.
In this case, we parse the float value as an interval, with the operator indicating whether the value is a min or a max.
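The interval handling described in the note might look roughly like the following. The function name and the `min_value` / `max_value` / `value` keys are assumptions made for this sketch and are not taken from the parser's actual output.

```python
def parse_viscosity_sketch(raw):
    """Parse a viscosity cell that may carry a '>' or '<' prefix."""
    if isinstance(raw, str):
        text = raw.strip()
        if text.startswith('>'):
            # '>' means the true value is at least this large.
            return {'min_value': float(text[1:]), 'max_value': None}
        if text.startswith('<'):
            # '<' means the true value is at most this large.
            return {'min_value': None, 'max_value': float(text[1:])}
        return {'value': float(text)}
    # Plain numeric cells are treated as exact values.
    return {'value': float(raw)}


print(parse_viscosity_sketch('>1000000'))
print(parse_viscosity_sketch(12.5))
```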
- property emulsions
The emulsions struct is more complicated in that it is a mixed bag of different measurements, each with their own units, temperatures, and age. So we just pass the conditions as a separate attribute.
- property ests_evaporation_test
- get_conditions(name)
The conditions object is indexed in the same way as the values.
- get_label(nav_list)
For an attribute in our values hierarchy, get the original source label information.
- Ex:
parser_obj.get_label(('gc_total_aromatic_hydrocarbon.tah'))
- property ifts
Now the only tricky bit is to merge the surface/interfacial tension attributes.
- prepend_ests(method)
adios_db.data_sources.env_canada.v1.reader module
- class adios_db.data_sources.env_canada.v1.reader.EnvCanadaOilExcelFile(name)
Bases:
object
A specialized file reader for the Environment Canada oil spreadsheet.
This is an Excel spreadsheet with an .xlsx extension. We can use the third-party openpyxl package to read the content.
The first column in the file contains the names of oil property categories.
The second column in the file contains the names of specific oil properties.
The rest of the columns in the file contain oil property values.
- property conditions
The April 2020 update of the Environment Canada datasheet added a few extra columns holding data about the testing conditions for the measurements.
This information is indexed in the same way as the field data, but we only need to create it one time upon opening the file.
- get_record(name)
A ‘record’ coming out of our reader is a dict of dicts representing the data for a single oil.
The top-level keys are the raw category names as seen in the first column of the spreadsheet.
The second-level keys are the raw field names contained within the category, as seen in the second column of the spreadsheet.
Each value in the field dict is a list representing a horizontal slice of the columns that comprise the record.
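To make the shape of a record concrete, here is a sketch that builds the dict of dicts from rows already loaded into memory, so it does not depend on openpyxl. The function name, the column bounds, and the sample labels are hypothetical, and it assumes a blank category cell means "same category as the row above".

```python
def build_record_sketch(rows, first_col, last_col):
    """Build {category: {field: [values...]}} for one oil's columns."""
    record = {}
    category = None
    for row in rows:
        if row[0] is not None:
            category = row[0]      # first column: category name
        field = row[1]             # second column: field name
        if field is None:
            continue               # skip rows without a field label
        # Horizontal slice of the columns belonging to this record.
        record.setdefault(category, {})[field] = list(
            row[first_col:last_col + 1])
    return record


# Invented rows: two label columns followed by three value columns.
rows = [
    ('Density', 'Density (g/mL)', 0.85, 0.87, 0.91),
    (None, 'Standard Deviation', 0.01, 0.01, 0.02),
    ('API Gravity', 'API', 35.0, 31.1, 24.0),
]
record = build_record_sketch(rows, first_col=2, last_col=3)
print(record)
```

Here the record spans value columns 2 and 3, so each field list has two entries, one per weathered subsample.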
- get_records()
Iterate through all the oils, returning all the properties of each one.