adios_db.data_sources.env_canada.v1 package

This version of the Environment Canada data source import modules is designed to import the file:

April 2020-Physiochemical_properties_of_petroleum_products. EN.xlsm

Submodules

adios_db.data_sources.env_canada.v1.mapper module

class adios_db.data_sources.env_canada.v1.mapper.EnvCanadaRecordMapper(record)

Bases: MapperBase

A translation/conversion layer for the Environment Canada imported record object. This is intended to be used interchangeably with either an Environment Canada record or record parser object. Its purpose is to generate named attributes that are suitable for creation of a NOAA Oil Database record.

property metadata
property oil_id
property oil_labels
py_json()
resolve_oil_api(record)
property sub_samples
class adios_db.data_sources.env_canada.v1.mapper.EnvCanadaSampleMapper(parser, sample_id, ests_code)

Bases: MapperBase

property CCME
property ESTS_hydrocarbon_fractions
property SARA
Note: Each measurement appears to be associated with a method. However, the Sara class only supports a single method as a first-order attribute.

property bulk_composition

Gather up all the groups of compounds that comprise a ‘bulk’ amount and compile them into an organized list.

Data points that are classified in bulk composition:

  • wax

  • water

  • Sulfur

  • GC-TPH

  • GC-TSH

  • GC-TAH

  • Hydrocarbon Content Ratio

property compounds

Gather up all the groups of compounds scattered throughout the EC and compile them into an organized list.

Compounds apply to:

  • individual chemicals

  • mixed isomers

Compounds do not apply to:

  • waxes

  • SARA

  • Sulfur

  • Carbon

Note: Although we could in theory assign multiple groups to a particular compound, we will only assign one group to the list. This group will have a close relationship to the category of compounds where it is found in the EC datasheet.

Note: Most of the compound groups don’t have replicates or standard deviation. We will not add these attributes if they aren’t found within the attribute group.

compounds_in_group(category, group_category, unit, unit_type, filter_compounds=True)
Parameters:
  • category – The category attribute containing the data

  • group_category – The category attribute containing the group label

  • unit – The unit of measurement for the values (for example, 'ppm').

  • unit_type – The type of thing that the unit measures (length, mass, etc.)

  • filter_compounds – If true, keep only those attributes that have a suffix matching the unit value.

Example of content:

{
    'name': '1-Methyl-2-Isopropylbenzene',
    'method': 'ESTS 2002b',
    'groups': ['C4-C6 Alkyl Benzenes', ...],
    'measurement': {
        'value': 3.4,
        'unit': 'ppm',
        'unit_type': 'massfraction',
        'replicates': 3,
        'standard_deviation': 0.1
    }
}
property densities
dict()
property dispersibilities
property distillation_data
property dynamic_viscosities
property emulsions
property environmental_behavior
property flash_point
generate_sample_id_attrs(sample_id, ests_code)
Parameters:
  • sample_id – The value we will use to internally identify a sample. This will be mapped to metadata.name

  • ests_code – This is the identifier that Env Canada uses for oil samples. This will be mapped to metadata.sample_id
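The mapping of the two parameters can be sketched as follows. This is a minimal illustration, assuming the attributes end up in a plain nested dict; the function name, the dict layout, and the sample values are hypothetical and not taken from the module.

```python
def sample_id_attrs(sample_id, ests_code):
    """Hypothetical sketch: place the two identifiers under 'metadata'."""
    return {
        'metadata': {
            # Internal identifier for the sample
            'name': str(sample_id),
            # Identifier that Environment Canada uses for the sample
            'sample_id': str(ests_code),
        }
    }


# Both values below are made up for illustration.
attrs = sample_id_attrs('Fresh Oil Sample', '2713.1')
```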

property interfacial_tension_air
property interfacial_tension_seawater
property interfacial_tension_water
property interfacial_tensions
property physical_properties
property pour_point
prepend_ests(method)
transpose_dict_of_lists(obj)

A common step with the parsed data is to convert a dict containing an orthogonal set of list values into a list of dicts, each dict having an indexed slice of the list values.
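The transposition described for transpose_dict_of_lists can be sketched as below. This is not the module's implementation; the helper name dict_of_lists_to_list_of_dicts and the sample keys are assumptions made for illustration, and the sketch assumes all lists have the same length.

```python
def dict_of_lists_to_list_of_dicts(obj):
    """Hypothetical sketch: {key: [v0, v1]} -> [{key: v0}, {key: v1}]."""
    keys = list(obj.keys())
    # zip(*values) walks the lists in parallel and stops at the shortest one.
    return [dict(zip(keys, row)) for row in zip(*obj.values())]


# Illustrative data: two parallel lists of equal length.
columns = {
    'temperature': [0.0, 15.0],
    'density': [0.92, 0.91],
}
rows = dict_of_lists_to_list_of_dicts(columns)
```

Each resulting dict holds the values found at one index across all of the lists.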

adios_db.data_sources.env_canada.v1.parser module

class adios_db.data_sources.env_canada.v1.parser.EnvCanadaRecordParser(values, conditions, file_props)

Bases: ParserBase

A record class for the Environment Canada oil spreadsheet. This is intended to be used with a set of data representing a single record from the spreadsheet.

  • We manage a hierarchical structure of properties extracted from the Excel columns for an oil. Basically this will be a dictionary of raw property category names, where each property category itself will contain a dictionary of raw individual properties.

  • The data associated with any individual property will be a list of values corresponding to the weathered subsamples that exist for the oil record.

property API
property comments
dict()
property ests_codes
get_label(nav_list)

For an attribute in our values hierarchy, get the original source label information.

Ex:

parser_obj.get_label(('gc_total_aromatic_hydrocarbon.tah'))

property location
property metadata
property name

For now we will just concatenate all the names we see in the list. In the future, we will want to be a bit smarter.

property oil_id
property product_type
property reference

For now, all records coming from the Environment Canada datasheet use a single hard-coded reference.

property sample_date
property source_id

We will use the ESTS codes in the record as the identifier.

ESTS codes are a series of numbers separated by a period ‘.’. The first number in the series seems to identify the species of the petroleum substance, and the rest identify a degree of weathering. So we will use just the first one.
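Taking the first number of an ESTS code can be sketched as below. The function name and the example codes are hypothetical; the sketch only assumes that a code is a period-separated series of numbers, as described above.

```python
def first_ests_number(ests_code):
    """Hypothetical sketch: keep only the leading number of an ESTS code."""
    # Convert to str first, in case a code was read from the sheet as a float.
    return str(ests_code).split('.')[0]


# Made-up codes for illustration.
source_id = first_ests_number('2713.1.2')
```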

property sub_samples
vertical_slice(index)

All values in our self.values structure will be lists that conform to the weathering subsamples for a record. This method recursively navigates the values structure to take a slice at the given index.
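A recursive slice of this kind might look like the sketch below. It is an illustration only: the function name slice_values, the sample categories, and the choice to pass non-list leaves through unchanged are all assumptions.

```python
def slice_values(values, index):
    """Hypothetical sketch: pick element `index` from every list in a nested dict."""
    if isinstance(values, dict):
        # Descend into nested categories.
        return {key: slice_values(val, index) for key, val in values.items()}
    if isinstance(values, list):
        # A leaf list holds one entry per weathered subsample.
        return values[index]
    # Assumption: anything else is passed through unchanged.
    return values


# Illustrative record with three weathered subsamples.
record_values = {
    'density': {'density_15_c': [0.85, 0.87, 0.90]},
    'pour_point': {'pour_point': [-30, -21, -12]},
}
second_sample = slice_values(record_values, 1)
```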

property weathering
class adios_db.data_sources.env_canada.v1.parser.EnvCanadaSampleParser(values, conditions, labels)

Bases: ParserBase

A sample class for the Environment Canada oil spreadsheet. This is intended to be used with a set of data representing a single subsample inside an oil record.

  • We manage a hierarchical structure of properties similar to that of the record parser.

  • The data associated with any individual property will be a single scalar value corresponding to a weathered subsample that exists for an oil record.

property adhesion
property api
attr_map = {'benzene': 'benzene_alkylated_benzene', 'boiling_point_cumulative_fraction': 'boiling_point_cumulative_weight_fraction', 'boiling_point_distribution': 'boiling_point_distribution_temperature', 'ccme': 'ccme_fractions', 'emulsion_complex_modulus': 'complex_modulus', 'emulsion_complex_viscosity': 'complex_viscosity', 'emulsion_loss_modulus': 'loss_modulus', 'emulsion_storage_modulus': 'storage_modulus', 'emulsion_tan_delta_v_e': 'tan_delta_v_e', 'emulsion_visual_stability': 'visual_stability', 'emulsion_water_content': 'water_content', 'sara_total_fractions': 'hydrocarbon_group_content'}
property chromatography

The Environment Canada data sheet contains data for gas chromatography analysis, which we will try to capture.

  • We have four property groups in this case, which we will merge.

    • GC-TPH

    • GC-TSH

    • GC-TAH

    • Hydrocarbon Content Ratio

  • Dimensional parameters are (weathering).

  • Value units are split between mg/g and percent.

deep_get(attr_path, default=None)
property densities

There is now a single category, density. Attributes within the category conform to an expected sequential block consisting of:

  • Density

  • Standard Deviation

  • Replicates

  • Method

There will be 3 such blocks in the category.
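Splitting a category into sequential blocks of four attributes could be sketched as follows. The helper name, the attribute labels, and the values are hypothetical; the sketch assumes the category is an insertion-ordered dict whose attributes appear in the block order described above.

```python
def split_into_blocks(attrs, block_size=4):
    """Hypothetical sketch: split an ordered dict into consecutive blocks."""
    items = list(attrs.items())
    return [dict(items[i:i + block_size])
            for i in range(0, len(items), block_size)]


# Made-up labels; the real spreadsheet labels may differ.
density_category = {}
for temp in ('0_c', '5_c', '15_c'):
    density_category[f'density_{temp}'] = 0.9
    density_category[f'standard_deviation_{temp}'] = 0.001
    density_category[f'replicates_{temp}'] = 3
    density_category[f'method_{temp}'] = 'method placeholder'

blocks = split_into_blocks(density_category)
```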

dict()
property dvis

There is now a single viscosity category.

Note: Sometimes there is a greater than (‘>’) indication for a viscosity value. In this case, we parse the float value as an interval with the operator indicating whether it is a min or a max.
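Parsing an operator-prefixed value as an interval can be sketched as below. The function name and the returned key names are assumptions, and handling of a ‘<’ prefix is an extrapolation from the note, which only mentions ‘>’.

```python
def parse_viscosity(raw):
    """Hypothetical sketch: parse a viscosity cell into a value or an interval."""
    text = str(raw).strip()
    if text.startswith('>'):
        # "Greater than" means the number is a lower bound.
        return {'min_value': float(text[1:]), 'max_value': None}
    if text.startswith('<'):
        # Assumption: a "less than" prefix would mean an upper bound.
        return {'min_value': None, 'max_value': float(text[1:])}
    return {'value': float(text)}


# Made-up cell contents for illustration.
bounded = parse_viscosity('>1000000')
plain = parse_viscosity(12.5)
```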

property emulsions

The emulsions struct is more complicated in that it is a mixed bag of different measurements, each with their own units, temperatures, and age. So we just pass the conditions as a separate attribute.

property ests_evaporation_test
get_conditions(name)

The conditions object is indexed in the same way as the values.

get_label(nav_list)

For an attribute in our values hierarchy, get the original source label information.

Ex:

parser_obj.get_label(('gc_total_aromatic_hydrocarbon.tah'))

property ifts

Now the only tricky bit is to merge the surface/interfacial tension attributes.

prepend_ests(method)

adios_db.data_sources.env_canada.v1.reader module

class adios_db.data_sources.env_canada.v1.reader.EnvCanadaOilExcelFile(name)

Bases: object

A specialized file reader for the Environment Canada oil spreadsheet.

  • This is an Excel spreadsheet with an .xlsx extension. We can use the third-party openpyxl package to read the content.

  • The first column in the file contains the names of oil property categories.

  • The second column in the file contains the names of specific oil properties.

  • The rest of the columns in the file contain oil property values.

property conditions

The April 2020 update of the Environment Canada datasheet added a few extra columns containing data concerning the testing conditions for the measurements.

This information is indexed in the same way as the field data, but we only need to create it one time upon opening the file.

get_record(name)

A ‘record’ coming out of our reader is a dict of dicts representing the data for a single oil.

  • The top level keys are the raw category names as seen in the first column of the spreadsheet

  • The second level keys are the raw field names that are contained within the category, as seen in the second column of the spreadsheet

  • Each value in the field dict is a list representing a horizontal slice of the columns that comprise the record
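The shape of such a record can be illustrated with plain Python. This sketch builds the dict of dicts from in-memory rows rather than from the spreadsheet; the function name, the row contents, and the rule that a blank category cell continues the category above it are assumptions.

```python
def rows_to_record(rows, first_col, last_col):
    """Hypothetical sketch: build {category: {field: [values]}} for one oil."""
    record = {}
    category = None
    for row in rows:
        # Assumption: an empty category cell means "same category as above".
        if row[0] is not None:
            category = row[0]
        field = row[1]
        if category is None or field is None:
            continue
        # Horizontal slice of the columns that belong to this oil.
        record.setdefault(category, {})[field] = list(row[first_col:last_col])
    return record


# Made-up rows: category, field, then one column per subsample.
rows = [
    ('Density', 'Density at 15 C', 0.85, 0.87, 0.90, 0.80),
    (None, 'Method', 'm1', 'm1', 'm1', 'm2'),
    ('Pour Point', 'Pour point (C)', -30, -21, -12, -45),
]
# Columns 2 to 4 are treated as one oil with three subsamples.
record = rows_to_record(rows, 2, 5)
```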

get_records()

Iterate through all the oils, returning all the properties of each one.