Ensemble Tools Feature Detection: enstools.feature

This package is a module of the enstools Python package, developed within the framework of Waves to Weather - Transregional Collaborative Research Project (SFB/TRR165). enstools can be fetched from the public GitHub repository (https://github.com/wavestoweather/enstools).

enstools.feature is a modular framework for the identification and tracking of meteorological structures. It aims to provide easy-to-use interfaces for automatic parallelization and a unified, readable output. For the output, the framework uses protobuf to describe typed structures: users define a description of the structures to be detected and can then use these descriptions for further statistical analyses.

Installation instructions

We recommend using a conda environment. The steps below are based on the installation instructions of the public enstools repository, adapted and extended for this package.

conda create --name enstools-feature python=3.7
conda activate enstools-feature

# install requirements listed in given venv_setup.sh
pip install --upgrade pip
pip install wheel numpy==1.20.0

# integrate enstools
pip install -e git+https://github.com/wavestoweather/enstools.git@main#egg=enstools

# install requirements for enstools-feature, and install enstools-feature in this environment
conda install --file requirements.txt
pip install -e .

Depending on the feature identification strategies you use, additional packages may be required. // TODO

Usage: Applying existing techniques

The following example applies existing techniques from the code base to your data set. First, we need some imports:

  • FeaturePipeline, which executes the identification pipeline

  • IdentificationTemplate, the identification technique; replace it with the one you want to use

  • TrackingTemplate, the tracking technique; replace it with the one you want to use

  • template_pb2, the protobuf Python file that is auto-generated at run time from the object description. Use the one that matches your identification strategy. The files are named *_pb2, where * is the name of the identification module. -> TODO: should not really need to set the template here, is specific to identification strategy!

from enstools.feature.pipeline import FeaturePipeline
from enstools.feature.identification.template import IdentificationTemplate
from enstools.feature.tracking.template import TrackingTemplate
from enstools.feature.identification._proto_gen import template_pb2

Then, we initialize the pipeline with the protobuf description and, optionally, the processing mode. For 3D data, the processing mode determines whether identification is performed individually on 2D (lat-lon) subsets or on 3D subsets.

pipeline = FeaturePipeline(template_pb2, processing_mode='2d')
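
To illustrate what the two processing modes mean, here is a minimal sketch using plain nested lists. The function iter_subsets and the data layout data[time][level][lat][lon] are assumptions made for this illustration only; they are not part of enstools.feature, and the pipeline may split the data differently internally.

```python
from typing import Iterator, List, Tuple

# Hypothetical toy data set indexed as data[time][level][lat][lon].
Field2D = List[List[float]]
Field3D = List[Field2D]


def iter_subsets(data: List[Field3D], processing_mode: str = "2d") -> Iterator[Tuple[tuple, object]]:
    """Yield (index, subset) pairs for the chosen processing mode (illustration only)."""
    if processing_mode not in ("2d", "3d"):
        raise ValueError("processing_mode must be '2d' or '3d'")
    for t, volume in enumerate(data):
        if processing_mode == "3d":
            # One 3D (level, lat, lon) subset per time step.
            yield (t,), volume
        else:
            # One 2D (lat, lon) subset per time step and level.
            for k, level_field in enumerate(volume):
                yield (t, k), level_field


# 2 time steps, 3 levels, 2x2 lat-lon grid.
data = [[[[0.0, 0.0], [0.0, 0.0]] for _ in range(3)] for _ in range(2)]
subsets_2d = list(iter_subsets(data, "2d"))
subsets_3d = list(iter_subsets(data, "3d"))
print(len(subsets_2d), len(subsets_3d))
```

In this toy setup, the '2d' mode hands six lat-lon slices to the identification, while the '3d' mode hands over two volumes.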

Then, we initialize and set our strategies. The tracking strategy can be set to None to skip tracking.

i_strat = IdentificationTemplate(some_parameter='foo')
t_strat = TrackingTemplate()
pipeline.set_identification_strategy(i_strat)
pipeline.set_tracking_strategy(t_strat) # or None as argument if no tracking

Next, set the data to process.

pipeline.set_data_path(path)

Then, the pipeline can be executed, starting the identification and subsequently the tracking.

pipeline.execute()
# or separated...
# pipeline.execute_identification()
# pipeline.execute_tracking()

This generates an object description based on the protobuf format that was set. If tracking has been executed, tracks can then be generated using a simple default heuristic; see the docstrings for further details. The object description holds the objects and, if tracking has been executed, a graph structure and the generated tracks.

pipeline.generate_tracks()
od = pipeline.get_object_desc()

The output data set and description can be saved:

pipeline.save_result(description_type='json', description_path=..., dataset_path=...)

Some of the identification techniques we provide include:

  • african_easterly_waves: Identify AEWs based on an approach similar to Belanger et al. (2016) (https://doi.org/10.1002/gdj3.40)
  • overlap_example: Simple starting point to identify objects which should later be tracked via overlap. It creates a new field and writes i at positions where the object with ID i has been identified.
  • pv_streamer: Identify PV anomalies in 2D (streamers) or 3D, see Fischer et al. (2022) (https://doi.org/10.5194/gmd-2021-424)
  • template: Starting template for your own identification strategy. If you want to identify areas and track them via overlap, you can use overlap_example instead.

Some of the tracking techniques we provide include:

  • african_easterly_waves: Tracking of AEWs, by comparing location of line strings.
  • overlap_tracking: General overlap tracking. It takes the name of a DataArray as parameter, ideally one whose values represent the object's ID at each location. It works well together with the overlap_example identification.
  • template_object_compare: Template for tracking, where the tracking strategy is solely based on pairwise comparison of object descriptions from consecutive timesteps.
  • template: Template for a fallback tracking technique which requires more complex heuristics than the ones mentioned above.
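
The idea behind overlap tracking can be sketched in a few lines of plain Python. This is not the implementation used by overlap_tracking; the function overlap_links, its min_cells parameter, and the toy grids are assumptions for illustration. Each grid holds 0 for background and i at positions where the object with ID i was identified, as described for overlap_example.

```python
from collections import Counter
from typing import Dict, List, Tuple

Grid = List[List[int]]  # 0 = background, i > 0 = object with ID i


def overlap_links(prev: Grid, curr: Grid, min_cells: int = 1) -> Dict[Tuple[int, int], int]:
    """Count shared grid cells for every (previous ID, current ID) pair.

    Pairs sharing fewer than min_cells cells are dropped.
    """
    counts: Counter = Counter()
    for row_prev, row_curr in zip(prev, curr):
        for id_prev, id_curr in zip(row_prev, row_curr):
            if id_prev > 0 and id_curr > 0:
                counts[(id_prev, id_curr)] += 1
    return {pair: n for pair, n in counts.items() if n >= min_cells}


# Two consecutive time steps of a hypothetical ID field.
t0 = [
    [1, 1, 0, 0],
    [1, 1, 0, 2],
    [0, 0, 0, 2],
]
t1 = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 3],
]

links = overlap_links(t0, t1)
print(links)
```

Each returned pair connects an object at the earlier time step with an object at the later one. Note that IDs need not be preserved between time steps: here object 2 overlaps with object 3.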

Usage: Adding techniques

We provide some template files, which we recommend as a starting point for your own identification strategy. If you want to add your own identification (and tracking) strategy to the framework, you need to:

  • Copy the template folder and rename it and its files accordingly. If you implement a tracking method that relies on pairwise comparison of objects from consecutive timesteps, you can use template_object_compare as a starting point.
  • In the __init__.py, rename the class name to your identification strategy.
  • In the *.proto file, define the variables each of the detected objects should have. They follow the protobuf protocol, see here. The template file also provides a useful example. The proto files are compiled automatically when the identification is run.
  • In the identification.py (tracking.py), implement your identification (tracking) strategy. See the template again for a useful example. There are a few methods:
    • __init__ gets called from the run script, so the user can set parameters for the algorithm here.
    • precompute is called once for the entire data set. The data set can be altered here (temporally and spatially). If the strategy should return an additional field (DataArray), it should also be initialized here, as shown in the template.
    • identify holds your identification technique. This method is called in parallel on the subsets of the data. It returns the provided subset (whose values may be modified) and a list of detected objects. See the template and the docstrings for more information. New (empty) objects can be obtained using o = self.get_new_object(id=obj_id), which returns an object o with its ID set at o.id and the properties defined in the protobuf description at o.properties.
    • postprocess is called once for the entire data set after identification. The data set and the object description can be changed here.
  • TODO tracking
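
The method layout described above can be sketched as follows. This is a self-contained stand-in, not the template shipped with the framework: the classes DetectedObject and ThresholdIdentification, the simplified method signatures, the property names, and the use of plain lists instead of xarray data are all assumptions for illustration. The actual base class and signatures are given in the template folder and its docstrings.

```python
from typing import Dict, List, Tuple

Field = List[List[float]]


class DetectedObject:
    """Stand-in for the object returned by get_new_object (hypothetical)."""

    def __init__(self, obj_id: int):
        self.id = obj_id
        # In the framework, the properties come from the *.proto description.
        self.properties: Dict[str, float] = {}


class ThresholdIdentification:
    """Hypothetical strategy: every grid cell above a threshold is one object."""

    def __init__(self, threshold: float = 1.0):
        # Parameters set by the user in the run script.
        self.threshold = threshold

    def get_new_object(self, id: int) -> DetectedObject:
        # Stand-in for the helper provided by the framework.
        return DetectedObject(id)

    def precompute(self, dataset: Field) -> Field:
        # Called once for the entire data set; nothing to prepare here.
        return dataset

    def identify(self, subset: Field) -> Tuple[Field, List[DetectedObject]]:
        # Called per subset; returns the subset and the list of objects.
        objects: List[DetectedObject] = []
        obj_id = 1
        for i, row in enumerate(subset):
            for j, value in enumerate(row):
                if value > self.threshold:
                    o = self.get_new_object(id=obj_id)
                    o.properties.update({"row": i, "col": j, "value": value})
                    objects.append(o)
                    obj_id += 1
        return subset, objects

    def postprocess(self, dataset: Field, objects: List[DetectedObject]):
        # Called once after identification; returned unchanged here.
        return dataset, objects


strategy = ThresholdIdentification(threshold=2.0)
field = strategy.precompute([[0.5, 2.5], [3.0, 1.0]])
field, objects = strategy.identify(field)
field, objects = strategy.postprocess(field, objects)
print([(o.id, o.properties) for o in objects])
```

In the framework itself, the pipeline calls these methods for you and runs identify in parallel; the explicit calls at the end only show the order in which they are invoked.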

Acknowledgment and license

enstools.feature is a collaborative development within the Waves to Weather (SFB/TRR165) project, funded by the German Research Foundation (DFG).

A full list of code contributors can be found in CONTRIBUTORS.md. TODO

The code is released under the Apache-2.0 license.