adios_db.data_sources.env_canada.v2 package

This version of the Env. Canada data source import modules is designed to import the file:

ests_data_03-03-2021.csv

Submodules

adios_db.data_sources.env_canada.v2.mapper module

class adios_db.data_sources.env_canada.v2.mapper.EnvCanadaCsvRecordMapper(record)

Bases: MapperBase

A translation/conversion layer for the Environment Canada imported record object. The parser has already put the structure mostly in order, but because of the nature of the .csv measurement rows, some remapping is needed to put the record in the form that the Oil object expects.

property oil_id
py_json()
remap_CCME()
remap_ESTS_evaporation()
remap_ESTS_hydrocarbon_fractions()
remap_SARA()
remap_adhesion()
remap_cuts()
remap_emulsions()
remap_interfacial_tension()

adios_db.data_sources.env_canada.v2.parser module

class adios_db.data_sources.env_canada.v2.parser.BPCumulativeWeightFraction(property_group: str = None, property_name: str = None, value: float = None, min_value: float = None, max_value: float = None, unit_of_measure: str = None, temperature: str = None, condition_of_analysis: str = None, standard_deviation: float = None, replicates: float = None, method: str = None)

Bases: ECMeasurement

ref_temp_attr = 'vapor_temp'
value_attr = 'fraction'
class adios_db.data_sources.env_canada.v2.parser.BPTemperatureDistribution(property_group: str = None, property_name: str = None, value: float = None, min_value: float = None, max_value: float = None, unit_of_measure: str = None, temperature: str = None, condition_of_analysis: str = None, standard_deviation: float = None, replicates: float = None, method: str = None)

Bases: ECMeasurement

py_json()
ref_temp_attr = 'vapor_temp'
value_attr = 'fraction'
class adios_db.data_sources.env_canada.v2.parser.ECAdhesion(property_group: str = None, property_name: str = None, value: float = None, min_value: float = None, max_value: float = None, unit_of_measure: str = None, temperature: str = None, condition_of_analysis: str = None, standard_deviation: float = None, replicates: float = None, method: str = None)

Bases: ECValueOnly

value_attr = 'adhesion'
class adios_db.data_sources.env_canada.v2.parser.ECCompound(property_group: str = None, property_name: str = None, value: float = None, min_value: float = None, max_value: float = None, unit_of_measure: str = None, temperature: str = None, condition_of_analysis: str = None, standard_deviation: float = None, replicates: float = None, method: str = None)

Bases: ECCompoundUngrouped

py_json()
class adios_db.data_sources.env_canada.v2.parser.ECCompoundUngrouped(property_group: str = None, property_name: str = None, value: float = None, min_value: float = None, max_value: float = None, unit_of_measure: str = None, temperature: str = None, condition_of_analysis: str = None, standard_deviation: float = None, replicates: float = None, method: str = None)

Bases: ECMeasurement

py_json()
class adios_db.data_sources.env_canada.v2.parser.ECDensity(property_group: str = None, property_name: str = None, value: float = None, min_value: float = None, max_value: float = None, unit_of_measure: str = None, temperature: str = None, condition_of_analysis: str = None, standard_deviation: float = None, replicates: float = None, method: str = None)

Bases: ECMeasurement

value_attr = 'density'
class adios_db.data_sources.env_canada.v2.parser.ECDispersibility(property_group: str = None, property_name: str = None, value: float = None, min_value: float = None, max_value: float = None, unit_of_measure: str = None, temperature: str = None, condition_of_analysis: str = None, standard_deviation: float = None, replicates: float = None, method: str = None)

Bases: ECValueOnly

py_json()
value_attr = 'effectiveness'
class adios_db.data_sources.env_canada.v2.parser.ECEmulsion(property_group: str = None, property_name: str = None, value: float = None, min_value: float = None, max_value: float = None, unit_of_measure: str = None, temperature: str = None, condition_of_analysis: str = None, standard_deviation: float = None, replicates: float = None, method: str = None)

Bases: ECMeasurement

py_json()
class adios_db.data_sources.env_canada.v2.parser.ECEvaporationEq(property_group: str = None, property_name: str = None, value: float = None, min_value: float = None, max_value: float = None, unit_of_measure: str = None, temperature: str = None, condition_of_analysis: str = None, standard_deviation: float = None, replicates: float = None, method: str = None)

Bases: ECMeasurement

py_json()
class adios_db.data_sources.env_canada.v2.parser.ECInterfacialTension(property_group: str = None, property_name: str = None, value: float = None, min_value: float = None, max_value: float = None, unit_of_measure: str = None, temperature: str = None, condition_of_analysis: str = None, standard_deviation: float = None, replicates: float = None, method: str = None)

Bases: ECMeasurement

py_json()

What do we do when the value is ‘Too Viscous’?

value_attr = 'tension'
class adios_db.data_sources.env_canada.v2.parser.ECMeasurement(property_group: str = None, property_name: str = None, value: float = None, min_value: float = None, max_value: float = None, unit_of_measure: str = None, temperature: str = None, condition_of_analysis: str = None, standard_deviation: float = None, replicates: float = None, method: str = None)

Bases: ECMeasurementDataclass

classmethod from_obj(obj)
py_json()
ref_temp_attr = 'ref_temp'
value_attr = 'measurement'
class adios_db.data_sources.env_canada.v2.parser.ECMeasurementDataclass(property_group: str = None, property_name: str = None, value: float = None, min_value: float = None, max_value: float = None, unit_of_measure: str = None, temperature: str = None, condition_of_analysis: str = None, standard_deviation: float = None, replicates: float = None, method: str = None)

Bases: object

An incoming density will have the attributes:

  • value

  • unit_of_measure

  • temperature

  • condition_of_analysis

  • standard_deviation

  • replicates

  • method

We will output an object with the attributes:

  • measurement (Measurement type)

  • method

  • ref_temp (Temperature Measurement type)

condition_of_analysis: str = None
determine_min_max()

The value field in the Env. Canada measurement row can have relational annotations like ‘>N’ or ‘<N’. In these cases, we turn them into an interval pair.
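A minimal sketch of how such an annotation could be split into an interval pair. The function name, the return shape, and the treatment of plain numbers are assumptions made for illustration; this is not the package's actual implementation.

```python
import re


def split_relational_value(raw):
    """Return (value, min_value, max_value) for a raw value field.

    Hypothetical helper: '>N' becomes a lower bound, '<N' becomes an
    upper bound, and a plain number is passed through as the value.
    """
    text = str(raw).strip()
    match = re.fullmatch(r'([<>])\s*(-?\d+(?:\.\d+)?)', text)

    if match is None:
        # No relational annotation, so treat the field as a plain number.
        return float(text), None, None

    operator, number = match.group(1), float(match.group(2))

    if operator == '>':
        return None, number, None  # open-ended interval above N

    return None, None, number  # open-ended interval below N
```

With this sketch, '>50' yields a minimum of 50.0 with no maximum, and '<0.5' yields a maximum of 0.5 with no minimum.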

determine_unit_type()
fix_unit()

Some units are in the form ‘X or Y’. We will just choose the first one.

Temperature units (e.g. ‘°C’) need to be stripped of the degree character
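The two rules above could be sketched as follows. The function name and the ' or ' separator handling are assumptions; the real method may treat more cases.

```python
def normalize_unit(unit):
    """Pick the first of 'X or Y' and drop any degree character.

    Hypothetical helper written for illustration only.
    """
    if unit is None:
        return None

    # 'X or Y' -> 'X'; a unit without ' or ' is returned unchanged.
    first_choice = unit.split(' or ')[0].strip()

    # '°C' -> 'C'
    return first_choice.replace('°', '').strip()
```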

max_value: float = None
method: str = None
min_value: float = None
parse_temperature_string()

The temperature field can have varying content, like ‘15 °C’ or simply ‘15’, in which case we will assume it is Celsius.
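One way to handle both forms is sketched below. The function name and the returned dict layout are assumptions, not the structure the package actually produces.

```python
import re


def parse_temperature(text, default_unit='C'):
    """Parse strings like '15 °C' or '15' into a value/unit pair.

    Hypothetical helper: a bare number is assumed to be Celsius, and
    anything that does not start with a number yields None.
    """
    if text is None or not str(text).strip():
        return None

    match = re.fullmatch(
        r'\s*(-?\d+(?:\.\d+)?)\s*(?:°\s*)?([A-Za-z]*)\s*', str(text)
    )

    if match is None:
        return None

    # An empty unit group means only a number was given.
    unit = match.group(2) or default_unit

    return {'value': float(match.group(1)), 'unit': unit}
```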

property_group: str = None
property_name: str = None
replicates: float = None
standard_deviation: float = None
temperature: str = None
treat_any_bad_initial_values()
unit_of_measure: str = None
value: float = None
class adios_db.data_sources.env_canada.v2.parser.ECValueOnly(property_group: str = None, property_name: str = None, value: float = None, min_value: float = None, max_value: float = None, unit_of_measure: str = None, temperature: str = None, condition_of_analysis: str = None, standard_deviation: float = None, replicates: float = None, method: str = None)

Bases: ECMeasurement

py_json()
class adios_db.data_sources.env_canada.v2.parser.ECViscosity(property_group: str = None, property_name: str = None, value: float = None, min_value: float = None, max_value: float = None, unit_of_measure: str = None, temperature: str = None, condition_of_analysis: str = None, standard_deviation: float = None, replicates: float = None, method: str = None)

Bases: ECMeasurement

py_json()

What do we do when the value is ‘Too Viscous’?

value_attr = 'viscosity'
class adios_db.data_sources.env_canada.v2.parser.EnvCanadaCsvRecordParser(values)

Bases: ParserBase

A record class for the Env. Canada .csv flat data file. This is intended to be used with a set of data representing a single oil record from the data file. This set is in the form of a list containing dict objects, each representing a single measurement for the oil we are processing.

  • There are a number of reference fields, i.e. fields that associate a particular measurement to an oil. They are:

    • oil_id: ID of an oil record. This appears to be the camelcase name of the oil joined by an underscore with the ESTS oil ID. There is one common value per oil, but there are redundant copies of this field in every measurement.

    • ests: ESTS ID of an oil record with one or more sub-samples. There is one common value per oil, but there are redundant copies of this field in every measurement.

  • There are also a number of fields that would not normally be used to link a measurement to an oil, but are clearly oil general properties. There is usually one actual field value per oil, but there are redundant copies in every measurement. Sometimes though, there are multiple names that show up in the measurements for an oil. Biodiesel records are an example of this.

    • oil_name

    • reference

    • date_sample_received

    • source

    • comments

  • There are a number of fields that would intuitively seem to be used to link a measurement to a sub-sample. There is usually one common value per sub-sample, but there are redundant copies in every measurement.

    • ests_id: ESTS ID of an oil sample

    • weathering_fraction

    • weathering_percent

    • weathering_method

  • And finally, we have a set of fields that are used uniquely for the measurement

    • value_id

    • property_id

    • property_group

    • property_name

    • unit_of_measure

    • temperature

    • condition_of_analysis

    • value

    • standard_deviation

    • replicates

    • method

property API

API Gravity needs to be stored as an oil property, but it is in fact a sub-sample scoped property. So we need to figure out the fresh sample ID and get that specific API gravity property.

Note: API for Biodiesels shows a weathering value of ‘None’, but clearly it is the “fresh sample”. We need to allow it.
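A rough sketch of how a fresh sample could be identified from the measurement rows, under the assumption that "fresh" means a weathering percent of zero, or a missing or 'None' weathering value as in the Biodiesel case. The function name, the row layout, and the matching rule are all illustrative guesses rather than the package's logic.

```python
def find_fresh_sample_id(rows):
    """Return the ests_id of the first row that looks unweathered.

    Hypothetical helper: each row is a dict with 'ests_id' and
    'weathering_percent' keys, as in the field list above.
    """
    for row in rows:
        raw = row.get('weathering_percent')
        text = '' if raw is None else str(raw).strip()

        # Biodiesel-style rows: no weathering information at all.
        if text in ('', 'None'):
            return str(row['ests_id'])

        try:
            percent = float(text.rstrip('%'))
        except ValueError:
            continue  # unreadable value, so keep looking

        if percent == 0.0:
            return str(row['ests_id'])

    return None
```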

property fresh_sample_id
get_subsample(sample_id)
oil_common_props = ('oil_name', 'ests', 'source', 'date_sample_received', 'comments')
property product_type
prune_incoming(values)

The Incoming objects contain some unwanted garbage from the spreadsheet that would be better handled before we start parsing anything.

sample_id_field_name = 'ests_id'
property sample_ids

This function relies on dict keys preserving insertion order. That is an implementation detail of CPython 3.6 and a language guarantee from Python 3.7 onward.
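The ordering dependence can be illustrated with a small sketch that collects unique sample IDs in first-seen order. The function name and row layout are assumptions; only the 'ests_id' field name comes from the class attributes above.

```python
def ordered_sample_ids(rows, field='ests_id'):
    """Return unique sample IDs in the order they first appear.

    Hypothetical helper: dict.fromkeys() drops duplicates while
    keeping insertion order, which is the behavior relied upon here.
    """
    return list(dict.fromkeys(str(row[field]) for row in rows))
```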

set_aggregate_oil_property(attr)

Oil scoped properties are redundantly stored in each measurement object in our list, so they need to be accumulated and treated in some way depending on the type of data we would like to set in the model.

  • Attributes to be treated as strings will have their values accumulated in a unique set to prune the redundant information, and then the unique strings in the set will be concatenated into a single string.

  • Attributes to be treated as integers will also be accumulated in a unique set to prune the redundant values. But multiple ints can not be stored in another int the same way a string can. So we issue a warning and then use the first one in the set. This isn’t perfect, but there are only a handful of oil scoped attributes and we can make an exception if there is an obvious problem.
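The two treatments described above might look like the sketch below. The function name, the `as_int` flag, and the ', ' separator are assumptions. It also uses `dict.fromkeys()` rather than a plain set so that "the first one" is deterministic, which may differ from the actual implementation.

```python
import warnings


def aggregate_property(rows, attr, as_int=False, sep=', '):
    """Collapse redundant per-row copies of an oil-scoped attribute.

    Hypothetical helper: strings are de-duplicated and joined, while
    integers fall back to the first unique value with a warning.
    """
    unique = list(dict.fromkeys(
        row[attr] for row in rows if row.get(attr) not in (None, '')
    ))

    if not unique:
        return None

    if as_int:
        if len(unique) > 1:
            warnings.warn(
                f'multiple values for {attr}: {unique}; using the first'
            )
        return int(unique[0])

    return sep.join(str(value) for value in unique)
```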

set_aggregate_oil_props()

These are properties commonly associated with an oil.

There is a copy of this information inside every measurement, so we need to reconcile them in order to come up with an aggregate value with which to set the oil properties.

set_aggregate_subsample_props()

These are properties commonly associated with a sub-sample. There is a copy of this information inside every measurement, so we need to reconcile them to determine the identifying properties of each sub-sample.

Sub-sample properties:

  • ests_id: One common value per sub-sample. This could be numeric, so we force it to be a string.

  • weathering_fraction: One value per sub-sample. These values look like some kind of code that EC uses. Probably not useful to us.

  • weathering_percent: One common value per sub-sample. These values are mostly a string in the format ‘N.N%’. We will convert to a structure suitable for a Measurement type.

  • weathering_method: One common value per sub-sample. This is information that might be good to save, but it doesn’t fit into the Adios oil model.
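As an illustration of the weathering_percent conversion, the sketch below turns an 'N.N%' string into a simple value/unit dict. The function name and the dict layout are assumptions; the actual Measurement structure used by the package may carry more fields.

```python
def parse_weathering_percent(text):
    """Convert a string like '12.5%' into a value/unit pair.

    Hypothetical helper: anything that is not a number followed by
    '%' (for example the literal 'None') yields None.
    """
    if text is None:
        return None

    cleaned = str(text).strip()

    if not cleaned.endswith('%'):
        return None

    try:
        value = float(cleaned[:-1])
    except ValueError:
        return None

    return {'value': value, 'unit': '%'}
```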

set_measurement_property(obj_in)

Set a single measurement from an incoming measurement object

Basically we need to decide how to apply the property to our record

  • oil scoped properties are applied to the oil object.

  • sample scoped properties are applied to a particular sub-sample determined by the object.

The properties that describe the measurement are:

  • value_id: This is a concatenation of the ests and property_id fields delimited with underscores ‘_’.

  • property_id: This is a concatenation of the camel cased property_name and, as far as I can tell, the index value of the sequence in which the property appears.

  • property_group: This is the name of a group or category with which a set of measurements might be associated.

  • property_name: The prose name of the property that is measured.

  • unit_of_measure: The units in which the measured quantity is expressed.

  • temperature: The temperature at which the measurement was taken.

  • condition_of_analysis: A reasonably free-form line of text that describes some special condition of the measurement, such as a prerequisite for measurement, a specification on the type of measurement, or its result.

  • value: A number representing the quantity of the measurement.

  • standard_deviation: The amount of variation in the set of measurements taken.

  • replicates: A number representing the quantity of repeated experiments where measurements were taken.

  • method: A line of text showing the name of the testing method.

set_measurement_props()

All objects in the incoming list have the primary function of describing a particular measurement of an oil. Here we iterate over these objects.

value_is_invalid(value)

adios_db.data_sources.env_canada.v2.reader module

class adios_db.data_sources.env_canada.v2.reader.EnvCanadaCsvFile(name, **kwargs)

Bases: CsvFile

A file reader for the Env. Canada .csv (.txt, actually) flat datafile.

  • The original source had a comma separated format with actual commas (‘,’) as field separators. This was insufficient, as some fields, notably the reference field, contained commas in their content.

  • Each row represents a single measurement

  • There are a number of reference fields, i.e. fields that associate a particular measurement to an oil. They are:

    • oil_id: ID of an oil record. This appears to be the camelcase name of the oil joined by an underscore with the ESTS oil ID.

    • ests: ESTS ID of an oil record with one or more sub-samples

  • There are also a number of fields that would not normally be used to link a measurement to an oil, but are clearly oil general properties.

    • oil_name

    • date_sample_received

    • source

    • comments

    • reference

  • There are a number of fields that would intuitively seem to be used to link a measurement to a sub-sample

    • ests_id: ESTS ID of an oil sample

    • weathering_fraction

    • weathering_percent

    • weathering_method

  • And finally, we have a set of fields that are used uniquely for the measurement

    • value_id

    • property_id

    • property_group

    • property_name

    • unit_of_measure

    • temperature

    • condition_of_analysis

    • value

    • standard_deviation

    • replicates

    • method

get_records()

This is the API that the oil import processes expect

A ‘record’ coming out of our reader is a list of rows representing the data for a single oil.
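The grouping idea can be sketched as a generator that collects measurement rows by oil ID. The function name and the in-memory row dicts are assumptions for illustration; only the 'ests' field name is taken from the oil_id_field_name attribute of this class.

```python
def group_rows_by_oil(rows, oil_id_field='ests'):
    """Yield one list of measurement rows per oil.

    Hypothetical helper: rows are dicts keyed by column name. Oils
    are yielded in first-seen order, and rows for the same oil do
    not need to be adjacent in the input.
    """
    grouped = {}

    for row in rows:
        grouped.setdefault(row[oil_id_field], []).append(row)

    yield from grouped.values()
```

Each yielded list corresponds to one record in the sense described above, ready to be handed to a parser.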

number_of_columns = 23
oil_id_field_name = 'ests'
exception adios_db.data_sources.env_canada.v2.reader.InvalidFileError

Bases: Exception

Error trying to open a file that is non-compliant with the Env. Canada .csv format.
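A sketch of the kind of check that could trigger such an error, assuming that compliance is judged by the header having the expected number of columns (23, per number_of_columns above). The stand-in exception class, the function names, and the check itself are illustrative assumptions; the real reader may validate the file differently.

```python
class ColumnCountError(Exception):
    """Stand-in for InvalidFileError, defined here for illustration only."""


def has_expected_columns(header, expected=23):
    """Return True when the header row has the expected column count."""
    return len(header) == expected


def check_header(header, expected=23):
    """Raise if the header does not have the expected column count."""
    if not has_expected_columns(header, expected):
        raise ColumnCountError(
            f'expected {expected} columns, found {len(header)}'
        )
```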