adios_db.data_sources.env_canada.v2 package

This version of the Env. Canada data source importing modules is designed to import the file:

ests_data_03-03-2021.csv

Submodules

adios_db.data_sources.env_canada.v2.mapper module

class adios_db.data_sources.env_canada.v2.mapper.EnvCanadaCsvRecordMapper(record)

Bases: MapperBase

A translation/conversion layer for the Environment Canada imported record object. The parser has already put most of the structure in order, but because of the nature of the .csv measurement rows, some re-mapping is necessary to put the record in a form that the Oil object expects.

property oil_id

py_json()

remap_CCME()

remap_ESTS_evaporation()

remap_ESTS_hydrocarbon_fractions()

remap_SARA()

remap_adhesion()

remap_cuts()

remap_emulsions()

remap_interfacial_tension()

reorder_methods(methods)

Generally the remap_*() methods are supposed to be executable in any order, but there could be a situation where a particular method needs to run before another.

This method receives a list of method names, and modifies the list in-place.

Example:

    try:
        # move the method 'remap_method' to the front of the list
        methods.insert(0, methods.pop(methods.index('remap_method')))
    except ValueError:
        # Raise a warning that the method could not be reordered.
        pass

adios_db.data_sources.env_canada.v2.parser module

class adios_db.data_sources.env_canada.v2.parser.BPCumulativeWeightFraction(property_group: str = None, property_name: str = None, value: float = None, min_value: float = None, max_value: float = None, unit_of_measure: str = None, temperature: str = None, condition_of_analysis: str = None, standard_deviation: float = None, replicates: float = None, method: str = None)

Bases: ECMeasurement

ref_temp_attr = 'vapor_temp'

value_attr = 'fraction'

class adios_db.data_sources.env_canada.v2.parser.BPTemperatureDistribution(property_group: str = None, property_name: str = None, value: float = None, min_value: float = None, max_value: float = None, unit_of_measure: str = None, temperature: str = None, condition_of_analysis: str = None, standard_deviation: float = None, replicates: float = None, method: str = None)

Bases: ECMeasurement

py_json()

ref_temp_attr = 'vapor_temp'

value_attr = 'fraction'

class adios_db.data_sources.env_canada.v2.parser.ECAdhesion(property_group: str = None, property_name: str = None, value: float = None, min_value: float = None, max_value: float = None, unit_of_measure: str = None, temperature: str = None, condition_of_analysis: str = None, standard_deviation: float = None, replicates: float = None, method: str = None)

Bases: ECValueOnly

value_attr = 'adhesion'

class adios_db.data_sources.env_canada.v2.parser.ECCompound(property_group: str = None, property_name: str = None, value: float = None, min_value: float = None, max_value: float = None, unit_of_measure: str = None, temperature: str = None, condition_of_analysis: str = None, standard_deviation: float = None, replicates: float = None, method: str = None)

Bases: ECCompoundUngrouped

py_json()

class adios_db.data_sources.env_canada.v2.parser.ECCompoundUngrouped(property_group: str = None, property_name: str = None, value: float = None, min_value: float = None, max_value: float = None, unit_of_measure: str = None, temperature: str = None, condition_of_analysis: str = None, standard_deviation: float = None, replicates: float = None, method: str = None)

Bases: ECMeasurement

py_json()

class adios_db.data_sources.env_canada.v2.parser.ECDensity(property_group: str = None, property_name: str = None, value: float = None, min_value: float = None, max_value: float = None, unit_of_measure: str = None, temperature: str = None, condition_of_analysis: str = None, standard_deviation: float = None, replicates: float = None, method: str = None)

Bases: ECMeasurement

value_attr = 'density'

class adios_db.data_sources.env_canada.v2.parser.ECDispersibility(property_group: str = None, property_name: str = None, value: float = None, min_value: float = None, max_value: float = None, unit_of_measure: str = None, temperature: str = None, condition_of_analysis: str = None, standard_deviation: float = None, replicates: float = None, method: str = None)

Bases: ECValueOnly

py_json()

value_attr = 'effectiveness'

class adios_db.data_sources.env_canada.v2.parser.ECEmulsion(property_group: str = None, property_name: str = None, value: float = None, min_value: float = None, max_value: float = None, unit_of_measure: str = None, temperature: str = None, condition_of_analysis: str = None, standard_deviation: float = None, replicates: float = None, method: str = None)

Bases: ECMeasurement

py_json()

class adios_db.data_sources.env_canada.v2.parser.ECEvaporationEq(property_group: str = None, property_name: str = None, value: float = None, min_value: float = None, max_value: float = None, unit_of_measure: str = None, temperature: str = None, condition_of_analysis: str = None, standard_deviation: float = None, replicates: float = None, method: str = None)

Bases: ECMeasurement

py_json()

class adios_db.data_sources.env_canada.v2.parser.ECInterfacialTension(property_group: str = None, property_name: str = None, value: float = None, min_value: float = None, max_value: float = None, unit_of_measure: str = None, temperature: str = None, condition_of_analysis: str = None, standard_deviation: float = None, replicates: float = None, method: str = None)

Bases: ECMeasurement

py_json(): What do we do when the value is ‘Too Viscous’?

value_attr = 'tension'

class adios_db.data_sources.env_canada.v2.parser.ECMeasurement(property_group: str = None, property_name: str = None, value: float = None, min_value: float = None, max_value: float = None, unit_of_measure: str = None, temperature: str = None, condition_of_analysis: str = None, standard_deviation: float = None, replicates: float = None, method: str = None)

Bases: ECMeasurementDataclass

classmethod from_obj(obj)

py_json()

ref_temp_attr = 'ref_temp'

value_attr = 'measurement'

class adios_db.data_sources.env_canada.v2.parser.ECMeasurementDataclass(property_group: str = None, property_name: str = None, value: float = None, min_value: float = None, max_value: float = None, unit_of_measure: str = None, temperature: str = None, condition_of_analysis: str = None, standard_deviation: float = None, replicates: float = None, method: str = None)

Bases: object

An incoming density will have the attributes:
- value
- unit_of_measure
- temperature
- condition_of_analysis
- standard_deviation
- replicates
- method

We will output an object with the attributes:
- measurement (Measurement type)
- method
- ref_temp (Temperature Measurement type)

condition_of_analysis: str = None

determine_min_max(): The value field in the Env. Canada measurement row can have relational annotations like ‘>N’ or ‘<N’. In these cases, we turn them into an interval pair.

determine_unit_type()

fix_unit()

Some units are in the form ‘X or Y’. We will just choose the first one.

Temperature units (e.g. ‘°C’) need to be stripped of the degree character
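Both unit fixes can be illustrated with a short sketch. The function name `normalize_unit` is hypothetical, and the assumption that alternatives are separated by the literal text ' or ' comes from the ‘X or Y’ description above.

```python
def normalize_unit(unit):
    """Clean up a raw unit_of_measure string (hypothetical helper).

    - 'X or Y' keeps only the first alternative, 'X'.
    - The degree character is stripped, so a degree-prefixed 'C' becomes 'C'.
    """
    if unit is None:
        return None

    # Keep the first alternative when the unit is given as 'X or Y'.
    first = unit.split(' or ')[0]

    # Remove the degree sign (U+00B0) and any surrounding whitespace.
    return first.replace('\u00b0', '').strip()
```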

max_value: float = None

method: str = None

min_value: float = None

parse_temperature_string(): The temperature field can have varying content, like ‘15 °C’ or simply ‘15’, in which case we will assume it is Celsius.

property_group: str = None

property_name: str = None

replicates: float = None

standard_deviation: float = None

temperature: str = None

treat_any_bad_initial_values()

unit_of_measure: str = None

value: float = None

class adios_db.data_sources.env_canada.v2.parser.ECValueOnly(property_group: str = None, property_name: str = None, value: float = None, min_value: float = None, max_value: float = None, unit_of_measure: str = None, temperature: str = None, condition_of_analysis: str = None, standard_deviation: float = None, replicates: float = None, method: str = None)

Bases: ECMeasurement

py_json()

class adios_db.data_sources.env_canada.v2.parser.ECViscosity(property_group: str = None, property_name: str = None, value: float = None, min_value: float = None, max_value: float = None, unit_of_measure: str = None, temperature: str = None, condition_of_analysis: str = None, standard_deviation: float = None, replicates: float = None, method: str = None)

Bases: ECMeasurement

py_json(): What do we do when the value is ‘Too Viscous’?

value_attr = 'viscosity'

class adios_db.data_sources.env_canada.v2.parser.EnvCanadaCsvRecordParser(values)

Bases: ParserBase

A record class for the Env. Canada .csv flat data file. This is intended to be used with a set of data representing a single oil record from the data file. This set is in the form of a list containing dict objects, each representing a single measurement for the oil we are processing.

There are a number of reference fields, i.e. fields that associate a particular measurement to an oil. They are:
- oil_id: ID of an oil record. This appears to be the camelcase name of the oil joined by an underscore with the ESTS oil ID. There is one common value per oil, but there are redundant copies of this field in every measurement.
- ests: ESTS ID of an oil record with one or more sub-samples. There is one common value per oil, but there are redundant copies of this field in every measurement.
There are also a number of fields that would not normally be used to link a measurement to an oil, but are clearly oil general properties. There is usually one actual field value per oil, but there are redundant copies in every measurement. Sometimes though, there are multiple names that show up in the measurements for an oil. Biodiesel records are an example of this.
- oil_name
- reference
- date_sample_received
- source
- comments
There are a number of fields that would intuitively seem to be used to link a measurement to a sub-sample. There is usually one common value per sub-sample, but there are redundant copies in every measurement.
- ests_id: ESTS ID of an oil sample
- weathering_fraction
- weathering_percent
- weathering_method
And finally, we have a set of fields that are used uniquely for the measurement
- value_id
- property_id
- property_group
- property_name
- unit_of_measure
- temperature
- condition_of_analysis
- value
- standard_deviation
- replicates
- method

property API

API Gravity needs to be stored as an oil property, but it is in fact a sub-sample scoped property. So we need to figure out the fresh sample ID and get that specific API gravity property.

Note: API for Biodiesels shows a weathering value of ‘None’, but clearly it is the “fresh sample”. We need to allow it.

property fresh_sample_id

get_subsample(sample_id)

oil_common_props = ('oil_name', 'ests', 'source', 'date_sample_received', 'comments')

property product_type

prune_incoming(values): The Incoming objects contain some unwanted garbage from the spreadsheet that would be better handled before we start parsing anything.

sample_id_field_name = 'ests_id'

property sample_ids: This function relies on dict having keys ordered by the sequence of insertion into the dict. This is true of Python 3.6, but could break in the future.

set_aggregate_oil_property(attr)

Oil scoped properties are redundantly stored in each measurement object in our list, so they need to be accumulated and treated in some way depending on the type of data we would like to set in the model.

Attributes to be treated as strings will have their values accumulated in a unique set to prune the redundant information, and then the unique strings in the set will be concatenated into a single string.
Attributes to be treated as integers will also be accumulated in a unique set to prune the redundant values. But multiple ints cannot be stored in another int the same way a string can. So we issue a warning and then use the first one in the set. This isn’t perfect, but there are only a handful of oil scoped attributes and we can make an exception if there is an obvious problem.
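The two aggregation strategies might look roughly like the sketch below. The function names, the space separator, and the use of `dict.fromkeys()` in place of a plain set (so that "the first one" is deterministic) are all assumptions of this example, not the module's actual behavior.

```python
import warnings


def aggregate_strings(values, sep=' '):
    """Join the distinct non-empty strings into one string (hypothetical)."""
    unique = list(dict.fromkeys(v.strip() for v in values if v))
    return sep.join(unique)


def aggregate_int(values, name='attribute'):
    """Reduce redundant integer copies to a single value (hypothetical).

    If more than one distinct value is present, warn and keep the first.
    """
    unique = list(dict.fromkeys(int(v) for v in values if v is not None))

    if not unique:
        return None

    if len(unique) > 1:
        warnings.warn(f'{name}: multiple values {unique}; using the first')

    return unique[0]
```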

set_aggregate_oil_props()

These are properties commonly associated with an oil.

There is a copy of this information inside every measurement, so we need to reconcile them in order to come up with an aggregate value with which to set the oil properties.

set_aggregate_subsample_props()

These are properties commonly associated with a sub-sample. There is a copy of this information inside every measurement, so we need to reconcile them to determine the identifying properties of each sub-sample.

Sub-sample properties:

- ests_id: One common value per sub-sample. This could be numeric, so we force it to be a string.
- weathering_fraction: One value per sub-sample. These values look like some kind of code that EC uses. Probably not useful to us.
- weathering_percent: One common value per sub-sample. These values are mostly a string in the format ‘N.N%’. We will convert to a structure suitable for a Measurement type.
- weathering_method: One common value per sub-sample. This is information that might be good to save, but it doesn’t fit into the Adios oil model.
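The weathering_percent conversion could be sketched as below. The helper name and the `{'value': ..., 'unit': ...}` dict are placeholders chosen for this example; the actual structure expected by the Measurement type is not specified here.

```python
def parse_weathering_percent(text):
    """Turn a string like '12.5%' into a value/unit dict (hypothetical).

    Returns None when the text is missing or is not a number, for example
    the literal string 'None' seen in some records.
    """
    if text is None:
        return None

    cleaned = str(text).strip().rstrip('%').strip()

    try:
        value = float(cleaned)
    except ValueError:
        return None

    # Placeholder structure; the real Measurement fields may differ.
    return {'value': value, 'unit': '%'}
```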

set_measurement_property(obj_in)

Set a single measurement from an incoming measurement object

Basically we need to decide how to apply the property to our record:
- Oil scoped properties are applied to the oil object.
- Sample scoped properties are applied to a particular sub-sample determined by the object.

The properties that describe the measurement are:

- value_id: This is a concatenation of the ests and property_id fields delimited with underscores ‘_’.
- property_id: This is a concatenation of the camel cased property_name and, as far as I can tell, the index value of the sequence in which the property appears.
- property_group: This is the name of a group or category with which a set of measurements might be associated.
- property_name: The prose name of the property that is measured.
- unit_of_measure: The units for which the measurement describes a quantity.
- temperature: The temperature at which the measurement was taken.
- condition_of_analysis: A reasonably free-form line of text that describes some special condition of the measurement, such as a prerequisite for measurement, a specification on the type of measurement, or its result.
- value: A number representing the quantity of the measurement.
- standard_deviation: The amount of variation in the set of measurements taken.
- replicates: A number representing the quantity of repeated experiments where measurements were taken.
- method: A line of text showing the name of the testing method.

set_measurement_props(): All objects in the incoming list have the primary function of describing a particular measurement of an oil. Here we iterate over these objects.

value_is_invalid(value)

adios_db.data_sources.env_canada.v2.reader module

class adios_db.data_sources.env_canada.v2.reader.EnvCanadaCsvFile(name, **kwargs)

Bases: CsvFile

A file reader for the Env. Canada .csv (.txt, actually) flat datafile.

The original source had a comma separated format with actual commas (‘,’) as field separators. This was insufficient, as some fields, notably the reference field, contained commas in their content.
Each row represents a single measurement.
There are a number of reference fields, i.e. fields that associate a particular measurement to an oil. They are:
- oil_id: ID of an oil record. This appears to be the camelcase
  name of the oil joined by an underscore with the ESTS oil ID.
- ests: ESTS ID of an oil record with one or more sub-samples
There are also a number of fields that would not normally be used to link a measurement to an oil, but are clearly oil general properties.
- oil_name
- date_sample_received
- source
- comments
- reference
There are a number of fields that would intuitively seem to be used to link a measurement to a sub-sample
- ests_id: ESTS ID of an oil sample
- weathering_fraction
- weathering_percent
- weathering_method
And finally, we have a set of fields that are used uniquely for the measurement
- value_id
- property_id
- property_group
- property_name
- unit_of_measure
- temperature
- condition_of_analysis
- value
- standard_deviation
- replicates
- method

get_records()

This is the API that the oil import processes expect

A ‘record’ coming out of our reader is a list of rows representing the data for a single oil.
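The shape of such a record can be illustrated by grouping measurement rows on the oil ID field. This is a sketch under assumptions: the function name `group_rows_by_oil` is hypothetical, rows are taken to be dicts, and the key defaults to 'ests' to mirror `oil_id_field_name`.

```python
def group_rows_by_oil(rows, key='ests'):
    """Group measurement rows into one list per oil (hypothetical helper).

    Oils are returned in the order they first appear, and rows for the same
    oil do not need to be adjacent in the input.
    """
    records = {}

    for row in rows:
        records.setdefault(row[key], []).append(row)

    return list(records.values())
```

Each element of the returned list is then one ‘record’: all measurement rows that share the same oil ID.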

number_of_columns = 23

oil_id_field_name = 'ests'

exception adios_db.data_sources.env_canada.v2.reader.InvalidFileError

Bases: Exception

Error trying to open a file that is non-compliant with the Env. Canada .csv format.
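One plausible compliance check, given `number_of_columns = 23`, is to verify the width of the header row. The sketch below is only an illustration: how the reader actually decides that a file is non-compliant is not stated here, `check_header` is a hypothetical name, the exception class is a local stand-in, and the field delimiter is left as a parameter because the separator used by the data file is not specified in this section.

```python
import csv
import io

NUMBER_OF_COLUMNS = 23  # mirrors EnvCanadaCsvFile.number_of_columns


class InvalidFileError(Exception):
    """Local stand-in for the reader module's exception."""


def check_header(stream, delimiter):
    """Return the header row, or raise if it has the wrong column count.

    Hypothetical helper: the delimiter must be supplied by the caller.
    """
    header = next(csv.reader(stream, delimiter=delimiter), None)
    found = 0 if header is None else len(header)

    if found != NUMBER_OF_COLUMNS:
        raise InvalidFileError(
            f'expected {NUMBER_OF_COLUMNS} columns, found {found}'
        )

    return header
```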