adios_db.data_sources.env_canada.v3 package

Submodules

adios_db.data_sources.env_canada.v3.mapper module

adios_db.data_sources.env_canada.v3.parser module

class adios_db.data_sources.env_canada.v3.parser.BPCumulativeWeightFraction(property_group: str = None, property_name: str = None, value: float = None, min_value: float = None, max_value: float = None, unit_of_measure: str = None, temperature: str = None, condition_of_analysis: str = None, standard_deviation: float = None, replicates: float = None, method: str = None)

Bases: ECMeasurement

ref_temp_attr = 'vapor_temp'
value_attr = 'fraction'
class adios_db.data_sources.env_canada.v3.parser.BPTemperatureDistribution(property_group: str = None, property_name: str = None, value: float = None, min_value: float = None, max_value: float = None, unit_of_measure: str = None, temperature: str = None, condition_of_analysis: str = None, standard_deviation: float = None, replicates: float = None, method: str = None)

Bases: ECMeasurement

final_bp = False
py_json()
ref_temp_attr = 'vapor_temp'
value_attr = 'fraction'
class adios_db.data_sources.env_canada.v3.parser.ECAdhesion(property_group: str = None, property_name: str = None, value: float = None, min_value: float = None, max_value: float = None, unit_of_measure: str = None, temperature: str = None, condition_of_analysis: str = None, standard_deviation: float = None, replicates: float = None, method: str = None)

Bases: ECValueOnly

value_attr = 'adhesion'
class adios_db.data_sources.env_canada.v3.parser.ECCompound(property_group: str = None, property_name: str = None, value: float = None, min_value: float = None, max_value: float = None, unit_of_measure: str = None, temperature: str = None, condition_of_analysis: str = None, standard_deviation: float = None, replicates: float = None, method: str = None)

Bases: ECCompoundUngrouped

py_json()
class adios_db.data_sources.env_canada.v3.parser.ECCompoundUngrouped(property_group: str = None, property_name: str = None, value: float = None, min_value: float = None, max_value: float = None, unit_of_measure: str = None, temperature: str = None, condition_of_analysis: str = None, standard_deviation: float = None, replicates: float = None, method: str = None)

Bases: ECMeasurement

determine_unit_type()
py_json()
class adios_db.data_sources.env_canada.v3.parser.ECDensity(property_group: str = None, property_name: str = None, value: float = None, min_value: float = None, max_value: float = None, unit_of_measure: str = None, temperature: str = None, condition_of_analysis: str = None, standard_deviation: float = None, replicates: float = None, method: str = None)

Bases: ECMeasurement

value_attr = 'density'
class adios_db.data_sources.env_canada.v3.parser.ECDispersibility(property_group: str = None, property_name: str = None, value: float = None, min_value: float = None, max_value: float = None, unit_of_measure: str = None, temperature: str = None, condition_of_analysis: str = None, standard_deviation: float = None, replicates: float = None, method: str = None)

Bases: ECValueOnly

py_json()
value_attr = 'effectiveness'
class adios_db.data_sources.env_canada.v3.parser.ECEmulsion(property_group: str = None, property_name: str = None, value: float = None, min_value: float = None, max_value: float = None, unit_of_measure: str = None, temperature: str = None, condition_of_analysis: str = None, standard_deviation: float = None, replicates: float = None, method: str = None)

Bases: ECMeasurement

py_json()
class adios_db.data_sources.env_canada.v3.parser.ECEvaporationEq(property_group: str = None, property_name: str = None, value: float = None, min_value: float = None, max_value: float = None, unit_of_measure: str = None, temperature: str = None, condition_of_analysis: str = None, standard_deviation: float = None, replicates: float = None, method: str = None)

Bases: ECMeasurement

py_json()
class adios_db.data_sources.env_canada.v3.parser.ECInterfacialTension(property_group: str = None, property_name: str = None, value: float = None, min_value: float = None, max_value: float = None, unit_of_measure: str = None, temperature: str = None, condition_of_analysis: str = None, standard_deviation: float = None, replicates: float = None, method: str = None)

Bases: ECMeasurement

py_json()

What do we do when the value is ‘Too Viscous’?

value_attr = 'tension'
class adios_db.data_sources.env_canada.v3.parser.ECMeasurement(property_group: str = None, property_name: str = None, value: float = None, min_value: float = None, max_value: float = None, unit_of_measure: str = None, temperature: str = None, condition_of_analysis: str = None, standard_deviation: float = None, replicates: float = None, method: str = None)

Bases: ECMeasurementDataclass

classmethod from_obj(obj)
py_json()
ref_temp_attr = 'ref_temp'
value_attr = 'measurement'
class adios_db.data_sources.env_canada.v3.parser.ECMeasurementDataclass(property_group: str = None, property_name: str = None, value: float = None, min_value: float = None, max_value: float = None, unit_of_measure: str = None, temperature: str = None, condition_of_analysis: str = None, standard_deviation: float = None, replicates: float = None, method: str = None)

Bases: object

An incoming density will have the attributes:

  • value

  • unit_of_measure

  • temperature

  • condition_of_analysis

  • standard_deviation

  • replicates

  • method

We will output an object with the attributes:

  • measurement (Measurement type)

  • method

  • ref_temp (Temperature Measurement type)

condition_of_analysis: str = None
determine_unit_type()
fix_unit()

Some units are in the form ‘X or Y’. We will just choose the first one.

Temperature units (e.g. ‘°C’) need to be stripped of the degree character.

Illegible Unicode for ‘^2’ needs to be corrected.
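The first two rules above can be sketched roughly as follows. The function name clean_unit is hypothetical and is not part of adios_db. The third rule is left out because the text does not say which characters stand in for ‘^2’.

```python
def clean_unit(unit):
    """Hypothetical helper; not the adios_db implementation."""
    if unit is None:
        return None

    # 'X or Y' -> keep only the first alternative.
    unit = unit.split(' or ')[0].strip()

    # Temperature units such as '°C' lose the degree character.
    unit = unit.replace('\N{DEGREE SIGN}', '')

    return unit
```

With this sketch, a unit of ‘°C’ becomes ‘C’, and a unit written as ‘X or Y’ becomes ‘X’.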

fix_value_if_min_max()

There are cases where self.value contains a string representing a min-max interval. If this is the case, fix it by splitting it into self.min_value and self.max_value.

The interval data is represented in many different ways in the datasheet, so here are the cases I have found:

  • Case ‘N-N’: Split on the ‘-’. This makes it ambiguous whether the second number is negative or not, but we accept the data as it comes in from the data sheet.

  • Case ‘N-N-N’: Split on the ‘-’. Take the first two items.

  • Case ‘N - N’: Split on the ‘-’. Ignore the spaces.

  • Case ‘N to N’: Split on the ‘to’. Ignore the spaces.

  • Case ‘N, N’: Split on the ‘,’. Ignore the spaces.

  • Case ‘-N’: Single negative number, don’t do anything.

  • Case ‘>N’: min_value = N, max_value = None

  • Case ‘<N’: min_value = None, max_value = N

  • Case ‘N (min.)’: min_value = N

  • Case ‘N (max.)’: max_value = N

  • Case ‘N1 (min.), N2 (max.)’: Split on the ‘,’. min_value = N1, max_value = N2

  • Case ‘min N1, max N2’: Split on the ‘,’. min_value = N1, max_value = N2
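The case list above could be handled along these lines. This is only a sketch under assumptions: parse_interval is a hypothetical standalone function, it returns floats rather than setting attributes, and the real method may check the cases in a different order.

```python
import re

# Unsigned and signed decimal numbers, e.g. '5', '12.5', '-3'.
UNSIGNED = r'\d+(?:\.\d+)?'
SIGNED = r'[-+]?' + UNSIGNED


def parse_interval(text):
    """Return (min_value, max_value), or None when text is not an interval.

    Hypothetical helper; it is not the adios_db implementation.
    """
    text = text.strip()

    # '>N' -> (N, None) and '<N' -> (None, N)
    match = re.fullmatch(r'([<>])\s*(' + SIGNED + r')', text)
    if match:
        number = float(match.group(2))
        return (number, None) if match.group(1) == '>' else (None, number)

    # 'N (min.)', 'N (max.)', 'N1 (min.), N2 (max.)', 'min N1, max N2'
    if 'min' in text or 'max' in text:
        low = high = None
        for part in text.split(','):
            match = re.search(SIGNED, part)
            if match is None:
                continue
            if 'min' in part:
                low = float(match.group())
            elif 'max' in part:
                high = float(match.group())
        if low is not None or high is not None:
            return (low, high)

    # 'N to N', 'N, N' and 'N - N': split on the separator, ignoring spaces.
    for separator in (r'\s*to\s*', r'\s*,\s*', r'\s+-\s+'):
        parts = re.split(separator, text)
        if len(parts) == 2 and all(re.fullmatch(SIGNED, p) for p in parts):
            return (float(parts[0]), float(parts[1]))

    # 'N-N' and 'N-N-N': split on every dash and keep the first two items.
    # The second number is taken as positive, as in the case list above.
    parts = text.split('-')
    if len(parts) >= 2 and all(re.fullmatch(UNSIGNED, p) for p in parts[:2]):
        return (float(parts[0]), float(parts[1]))

    # '-N', plain numbers and anything unrecognized: not an interval.
    return None
```

A single negative number such as ‘-5’ falls through every branch and comes back as None, which matches the "don’t do anything" case.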

max_value: float = None
method: str = None
min_value: float = None
parse_temperature_string()

The temperature field can have varying content, like ‘15 °C’ or simply ‘15’, in which case we will assume it is Celsius.
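A minimal sketch of that fallback, assuming the field holds a number optionally followed by a degree sign and a unit letter. The name parse_temperature and the (value, unit) return shape are hypothetical.

```python
import re


def parse_temperature(text):
    """Return (value, unit), or None if the text is not recognized.

    Hypothetical helper; a bare number is assumed to be Celsius.
    """
    match = re.fullmatch(
        r'\s*([-+]?\d+(?:\.\d+)?)\s*°?\s*([A-Za-z]*)\s*', text
    )
    if match is None:
        return None

    value = float(match.group(1))
    unit = match.group(2) or 'C'  # no unit given: assume Celsius
    return value, unit
```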

property_group: str = None
property_name: str = None
replicates: float = None
set_ranged_values(min_value, max_value, *args)
  • Each value passed in is part of a min/max interval.

  • There are also a few cases where the min/max quality of the value is annotated with a ‘ (min.)’ or a ‘ (max.)’ suffix.

set_value(value)
split_value_on_nospace_dashes()

Cases like ‘N-N’ and ‘N-N-N’ are problematic when parsed by regex because the ‘-’ character is also used as a sign indicator for the numbers.

A possible algorithm is to split the string on all dashes, and the empty items in our resulting list will indicate a dash was used as a sign character.

Ex. ‘5-4’ -> [‘5’, ‘4’]

Ex. ‘-5--4’ -> [‘’, ‘5’, ‘’, ‘4’], where each empty item marks a dash that was used as a sign.

Note: We will not try to turn these items into float values here, but we would like them to be numeric. Otherwise, we don’t have an interval.
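The splitting idea described above might look like this. Both function names are hypothetical, and folding the sign back into the following item is one possible way to use the empty items, not necessarily what the method does.

```python
def split_on_dashes(text):
    """Split on every '-', folding sign dashes back into their numbers.

    Hypothetical helper; the items stay strings, as in the note above.
    """
    items = []
    negative = False
    for token in text.split('-'):
        if token == '':
            # An empty item means the next dash-separated token is signed.
            negative = True
            continue
        items.append('-' + token if negative else token)
        negative = False
    return items


def is_numeric(item):
    """True if the string could be converted to a float."""
    try:
        float(item)
    except ValueError:
        return False
    return True
```

An interval would then be any result with at least two items that all pass is_numeric.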

standard_deviation: float = None
temperature: str = None
treat_any_bad_initial_values()
unit_of_measure: str = None
value: float = None
class adios_db.data_sources.env_canada.v3.parser.ECValueOnly(property_group: str = None, property_name: str = None, value: float = None, min_value: float = None, max_value: float = None, unit_of_measure: str = None, temperature: str = None, condition_of_analysis: str = None, standard_deviation: float = None, replicates: float = None, method: str = None)

Bases: ECMeasurement

py_json()
class adios_db.data_sources.env_canada.v3.parser.ECViscosity(property_group: str = None, property_name: str = None, value: float = None, min_value: float = None, max_value: float = None, unit_of_measure: str = None, temperature: str = None, condition_of_analysis: str = None, standard_deviation: float = None, replicates: float = None, method: str = None)

Bases: ECMeasurement

py_json()

What do we do when the value is ‘Too Viscous’?

value_attr = 'viscosity'
class adios_db.data_sources.env_canada.v3.parser.EnvCanadaCsvRecordParser1999(values)

Bases: ParserBase

A record class for the Env. Canada .csv flat data file. This is intended to be used with a set of data representing a single oil record from the data file. This set is in the form of a list containing dict objects, each representing a single measurement for the oil we are processing.

  • There are a number of reference fields, i.e. fields that associate a particular measurement to an oil. They are:

    • oil_id: ID of an oil record. This appears to be the camelcase name of the oil joined by an underscore with the ECCC oil ID. There is one common value per oil, but there are redundant copies of this field in every measurement.

    • oil_index: ID of an oil record with one or more sub-samples. Similar to the ests field in the other EC data set. There is one common value per oil, but there are redundant copies of this field in every measurement.

  • There are also a number of fields that would not normally be used to link a measurement to an oil, but are clearly oil general properties. There is usually one actual field value per oil, but there are redundant copies in every measurement. Sometimes though, there are multiple names that show up in the measurements for an oil. Biodiesel records are an example of this.

    • oil_name

    • referenceID: A code to be used to look up the reference title in the EC Reference document “References-Catalogue_of_Crude_Oil_and_Oil_Product_Properties_(1999)-Revised_2022_En_and_Fr.pdf”

    • sample_reference: The name of the reference. This almost always matches the lookup value correlated with the referenceID, so there is some redundancy here.

    • reference: One would expect something like the title of a document here, but mostly this contains an ‘N/A’ value. The fields that are actually filled in contain the same referenceID code.

    • comments:

    • origin: The location where the oil originates.

    • synonyms

  • There are a number of fields that would intuitively seem to associate a measurement with a sub-sample. There is usually one common value per sub-sample, but there are redundant copies in every measurement.

    • index_id: ID of an oil sample. This appears to be the concatenation of the oil_index and a sample number separated by a period “.”.

    • weathering_fraction: This appears to be a percentage value in the range 0-100.

    • grade

  • And finally, we have a set of fields that are used uniquely for the measurement:

    • property_name

    • property_group

    • property_id

    • value

    • value_id

    • unit_of_measure

    • temperature

    • condition_of_analysis

    • note
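Because every measurement row repeats the sub-sample fields, one way to picture the structure is to group the rows by index_id. This is an illustrative sketch only: the helper names are hypothetical, the rows are made up, and the split assumes the "oil_index.sample_number" layout that the field appears to follow.

```python
from collections import defaultdict


def group_by_sample(rows, key='index_id'):
    """Group measurement dicts by sub-sample ID, keeping first-seen order."""
    samples = defaultdict(list)
    for row in rows:
        samples[row[key]].append(row)
    return dict(samples)


def split_index_id(index_id):
    """'2234.1' -> ('2234', '1'); no period -> (index_id, None)."""
    if '.' not in index_id:
        return index_id, None
    oil_index, _, sample_number = index_id.rpartition('.')
    return oil_index, sample_number


# Made-up rows for illustration only; real rows carry many more fields.
rows = [
    {'index_id': '2234.1', 'property_name': 'Density', 'value': '0.85'},
    {'index_id': '2234.1', 'property_name': 'Viscosity', 'value': '12'},
    {'index_id': '2234.2', 'property_name': 'Density', 'value': '0.87'},
]
grouped = group_by_sample(rows)
```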

property API

API Gravity needs to be stored as an oil property, but it is in fact a sub-sample scoped property. So we need to figure out the fresh sample ID and get that specific API gravity property.

Note: API for Biodiesels shows a weathering value of ‘None’, but clearly it is the “fresh sample”. We need to allow it.
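The selection rule in the note could be sketched as below. The function name and the input shape (a dict mapping sample ID to its weathering value) are assumptions made for the example; the real property works on the parser's own record structure.

```python
def find_fresh_sample_id(weathering_by_sample):
    """Return the first sample ID whose weathering is zero or missing.

    Hypothetical helper; not the adios_db implementation.
    """
    for sample_id, weathering in weathering_by_sample.items():
        if weathering in (None, 'None'):
            # Biodiesel-style records: no weathering value, still fresh.
            return sample_id
        if float(weathering) == 0.0:
            return sample_id
    return None
```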

property fresh_sample_id
get_subsample(sample_id)
oil_common_props = ('oil_name', 'oil_index', 'synonyms', 'origin', 'referenceID', 'comments')
property product_type
prune_incoming(values)

The incoming objects contain some unwanted residue from the spreadsheet that is better cleaned up before we start parsing anything.

sample_id_field_name = 'index_id'
property sample_ids

This property relies on dict keeping its keys in insertion order. That behavior is an implementation detail of CPython 3.6 and a language guarantee from Python 3.7 onward.
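As a small illustration of the ordering behavior this depends on, dict.fromkeys removes duplicates while keeping first-seen order. The helper name unique_in_order is hypothetical, and the IDs are made up.

```python
def unique_in_order(values):
    """Drop duplicates while keeping the order of first appearance."""
    # dict keys are unique and remember insertion order.
    return list(dict.fromkeys(values))


sample_ids = unique_in_order(['2234.1', '2234.1', '2234.2', '2234.1'])
```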

set_aggregate_oil_property(attr)

Oil scoped properties are redundantly stored in each measurement object in our list, so they need to be accumulated and treated in some way depending on the type of data we would like to set in the model.

  • Attributes to be treated as strings will have their values accumulated in a unique set to prune the redundant information, and then the unique strings in the set will be concatenated into a single string.

  • Attributes to be treated as integers will also be accumulated in a unique set to prune the redundant values. But multiple ints cannot be stored in another int the same way a string can. So we issue a warning and then use the first one in the set. This isn’t perfect, but there are only a handful of oil scoped attributes and we can make an exception if there is an obvious problem.
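The two treatments above might be sketched as follows. The function names and the ‘, ’ separator are assumptions. The sketch also uses dict.fromkeys instead of a set so that "the first one" is deterministic, which is a choice made for this example rather than something the text specifies.

```python
import warnings


def aggregate_strings(values, sep=', '):
    """Join the unique, non-empty strings into a single string."""
    unique = dict.fromkeys(v for v in values if v)
    return sep.join(unique)


def aggregate_int(values):
    """Return the single integer value, warning if there are several."""
    unique = list(dict.fromkeys(int(v) for v in values if v is not None))
    if not unique:
        return None
    if len(unique) > 1:
        warnings.warn(f'multiple values {unique}; using the first')
    return unique[0]
```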

set_aggregate_oil_props()

These are properties commonly associated with an oil.

There is a copy of this information inside every measurement, so we need to reconcile them in order to come up with an aggregate value with which to set the oil properties.

set_aggregate_subsample_props()

These are properties commonly associated with a sub-sample. There is a copy of this information inside every measurement, so we need to reconcile them to determine the identifying properties of each sub-sample.

Sub-sample properties:

  • ests_id: One common value per sub-sample. This could be numeric, so we force it to be a string.

  • weathering_fraction: One value per sub-sample. These values look like some kind of code that EC uses. Probably not useful to us.

  • weathering_percent: One common value per sub-sample. These values are mostly a string in the format ‘N.N%’. We will convert to a structure suitable for a Measurement type.

  • weathering_method: One common value per sub-sample. This is information that might be good to save, but it doesn’t fit into the Adios oil model.
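The ‘N.N%’ conversion mentioned for weathering_percent could look roughly like this. The function name and the output dict are hypothetical; the actual structure expected by the Measurement type is not described here.

```python
def parse_percent(text):
    """'12.5%' -> {'value': 12.5, 'unit': '%'}, or None if not a percent.

    Hypothetical helper; the output shape is an assumption.
    """
    text = text.strip()
    if not text.endswith('%'):
        return None
    try:
        value = float(text[:-1])
    except ValueError:
        return None
    return {'value': value, 'unit': '%'}
```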

set_measurement_property(obj_in)

Set a single measurement from an incoming measurement object

Basically we need to decide how to apply the property to our record

  • oil scoped properties are applied to the oil object.

  • sample scoped properties are applied to a particular sub-sample determined by the object.

The properties that describe the measurement are:

  • value_id: This is a concatenation of the ests and property_id fields delimited with underscores ‘_’.

  • property_id: This is a concatenation of the camel cased property_name and, as far as I can tell, the index value of the sequence in which the property appears.

  • property_group: This is the name of a group or category with which a set of measurements might be associated.

  • property_name: The prose name of the property that is measured.

  • unit_of_measure: The units in which the measured quantity is expressed.

  • temperature: The temperature at which the measurement was taken.

  • condition_of_analysis: A reasonably free-form line of text that describes some special condition of the measurement, such as a prerequisite for measurement, a specification on the type of measurement, or its result.

  • value: A number representing the quantity of the measurement

  • standard_deviation: The amount of variation in the set of measurements taken.

  • replicates: A number representing the quantity of repeated experiments where measurements were taken.

  • method: A line of text showing the name of the testing method.

set_measurement_props()

All objects in the incoming list have the primary function of describing a particular measurement of an oil. Here we iterate over these objects.

value_is_invalid(value)
class adios_db.data_sources.env_canada.v3.parser.SeparatorString

Bases: str

This is simply a way to attach a custom separator to the items of type str in our mapping list.

sep = '|'
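A rough sketch of how a str subclass carrying a separator could be used. The class is re-declared here only so the example is self-contained, and split_item, its default separator, and the way the mapping list consumes these strings are all assumptions.

```python
class SeparatorString(str):
    """Stand-in for the class above: a str that carries its own separator."""
    sep = '|'


def split_item(item, default_sep=','):
    """Split a mapping item on its own separator if it has one.

    Hypothetical consumer; plain str items fall back to default_sep.
    """
    sep = getattr(item, 'sep', default_sep)
    return [part.strip() for part in item.split(sep)]


plain = split_item('a, b')
piped = split_item(SeparatorString('a, b | c'))
```

Since the subclass is still a str, it can be stored and compared like any other string in the list.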

adios_db.data_sources.env_canada.v3.reader module

class adios_db.data_sources.env_canada.v3.reader.EnvCanadaCsvFile1999(name, encoding='utf-8', **kwargs)

Bases: EnvCanadaCsvFile

A file reader for the Env. Canada .csv flat data file referencing the data from the year 1999. This is reasonably similar to the previous data set we received from them, but some of the columns are different and there are some minor differences in the data.

The name of the file is: Catalogue_of_Crude_Oil_and_Oil_Product_Properties_(1999)-Revised_2022_En.csv

  • The fields are comma separated ‘,’. This may prove to be problematic, as some fields, notably the reference field, could contain commas as well.

  • Each row represents a single measurement

  • There are a number of reference fields, i.e. fields that associate a particular measurement to an oil. They are:

    • oil_id: ID of an oil record. This appears to be the camelcase name of the oil joined by an underscore with its oil ID.

    • oil_index: ECCC ID of an oil record with one or more sub-samples. This is similar to the ests field of the other data set.

  • There are also a number of fields that would not normally be used to link a measurement to an oil, but are clearly oil general properties.

    • oil_name

    • referenceID

    • reference

    • sample_reference

    • comments

    • origin

    • synonyms

  • There are a number of fields that would intuitively seem to be used to link a measurement to a sub-sample:

    • index_id: ID of an oil sample. This appears to be the concatenation of the oil_index and a sample number separated by a period “.”.

    • weathering_fraction: This appears to be a percentage value in the range 0-100.

    • grade

  • And finally, we have a set of fields that are used uniquely for the measurement:

    • property_name

    • property_group

    • property_id

    • value

    • value_id

    • unit_of_measure

    • temperature

    • condition_of_analysis

    • note
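On the concern about commas inside fields: Python's csv module keeps a quoted field intact even when it contains the delimiter. The sketch below assumes the file quotes such fields, which is not confirmed here. The sample text is made up, and read_rows is a hypothetical helper, not the reader class itself.

```python
import csv
import io

# Made-up data for illustration; the real file has 22 columns.
SAMPLE = (
    'oil_name,reference,value\n'
    'Example Oil,"Smith, J. (1999)",0.85\n'
)


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


rows = read_rows(SAMPLE)
```

When reading from disk instead of a string, the csv documentation recommends opening the file with newline=''.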

number_of_columns = 22
oil_id_field_name = 'oil_index'

adios_db.data_sources.env_canada.v3.refcode_lu module