enstools.feature
Ensemble Tools Feature Detection: This package is a module of the enstools Python package, developed within the framework of Waves to Weather - Transregional Collaborative Research Project (SFB/TRR165). enstools can be fetched from its public GitHub repo (https://github.com/wavestoweather/enstools).
enstools.feature
is a modular framework for the identification and tracking of
meteorological structures. It aims to provide easy-to-use interfaces
for automatic parallelization and a unified, readable output format. For the latter,
the framework uses protobuf
to describe typed structures: users define descriptions for the structures
to be detected and can then use them
for further statistical analyses.
Installation instructions
We recommend using a conda environment. The installation instructions from the public enstools repo serve as a base and have been adapted and extended.
conda create --name enstools-feature python=3.7
conda activate enstools-feature
# install requirements listed in given venv_setup.sh
pip install --upgrade pip
pip install wheel numpy==1.20.0
# integrate enstools
pip install -e git+https://github.com/wavestoweather/enstools.git@main#egg=enstools
# install the requirements for enstools-feature, then install enstools-feature itself into this environment
conda install --file requirements.txt
pip install -e .
Depending on the feature identification strategies you use, additional packages may be required. // TODO
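Because the extra packages depend on the chosen strategy, it can help to check which modules are importable before running anything. The following is a minimal sketch using only the standard library. The helper names and the module names in the example list are illustrative assumptions, not an authoritative list of requirements.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if `module_name` can be located in the active environment."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package is missing or the module has no spec.
        return False


def missing_modules(module_names):
    """Return the subset of `module_names` that cannot be found."""
    return [name for name in module_names if not is_importable(name)]


if __name__ == "__main__":
    # Illustrative list only; adapt it to the strategies you plan to use.
    candidates = ["enstools", "enstools.feature", "google.protobuf"]
    print("Missing modules:", missing_modules(candidates))
```

Running the script inside the activated conda environment prints the modules that still need to be installed.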
Usage: Applying existing techniques
The following example applies techniques that already exist in the code base to your data set. First, we need some imports:
- FeaturePipeline, which executes the identification pipeline.
- IdentificationTemplate, the identification technique; adapt this to the technique you want to use.
- TrackingTemplate, the tracking technique; adapt this to the technique you want to use.
- template_pb2, the protobuf Python file that is generated automatically at run time from the given description. Use the one that matches your detection strategy. These files are named *_pb2, where * is the name of the identification module.
TODO: the template should not need to be set here, since it is specific to the identification strategy.
from enstools.feature.pipeline import FeaturePipeline
from enstools.feature.identification.template import IdentificationTemplate
from enstools.feature.tracking.template import TrackingTemplate
from enstools.feature.identification._proto_gen import template_pb2
Then, we initialize the pipeline with the protobuf description and, optionally, the processing mode. For 3D data, the processing mode determines whether identification is performed individually on 2D (lat-lon) subsets or on 3D subsets.
pipeline = FeaturePipeline(template_pb2, processing_mode='2d')
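To picture what the processing mode means, the sketch below splits a small 4D field into the subsets that would be handed to identification in each mode. It is a standalone illustration with plain nested lists, not the pipeline's implementation. The function name iter_subsets and the (time, level, lat, lon) layout are assumptions.

```python
def iter_subsets(field, processing_mode="2d"):
    """Yield (index, subset) pairs from a field indexed as field[time][level][lat][lon].

    '2d' yields one lat-lon slice per (time, level);
    '3d' yields one level-lat-lon block per time step.
    """
    if processing_mode not in ("2d", "3d"):
        raise ValueError(f"unknown processing mode: {processing_mode!r}")
    for t, block in enumerate(field):
        if processing_mode == "3d":
            yield (t,), block
        else:
            for k, level_slice in enumerate(block):
                yield (t, k), level_slice


if __name__ == "__main__":
    # 2 time steps, 3 levels, 2x2 horizontal grid, filled with zeros.
    field = [[[[0.0, 0.0], [0.0, 0.0]] for _ in range(3)] for _ in range(2)]
    print(len(list(iter_subsets(field, "2d"))))  # 6 subsets: one per (time, level)
    print(len(list(iter_subsets(field, "3d"))))  # 2 subsets: one per time step
```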
Then, we initialize and set our strategies. The tracking strategy can be set to None
to skip tracking.
i_strat = IdentificationTemplate(some_parameter='foo')
t_strat = TrackingTemplate()
pipeline.set_identification_strategy(i_strat)
pipeline.set_tracking_strategy(t_strat) # or None as argument if no tracking
Next, set the data to process.
pipeline.set_data_path(path)
Then, the pipeline can be executed, starting the identification and subsequently the tracking.
pipeline.execute()
# or separated...
# pipeline.execute_identification()
# pipeline.execute_tracking()
This generates an object description based on the given protobuf format. If tracking has been used, tracks can be generated with a simple default heuristic; see the docstrings for further details. The object description holds the objects and, if tracking has been executed, also a graph structure and the generated tracks.
pipeline.generate_tracks()
od = pipeline.get_object_desc()
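The relation between a tracking graph and the resulting tracks can be pictured with a small standalone sketch. The graph layout (a dict that maps each object node to its successors in the next time step) and the heuristic (follow the first free successor) are assumptions made for illustration. They are not the heuristic used by generate_tracks(), which is described in the docstrings.

```python
def build_tracks(successors):
    """Build tracks from a mapping node -> list of successor nodes.

    Nodes are (timestep, object_id) tuples. Nodes are visited in time order;
    every node not yet assigned to a track starts a new one, which then
    follows the first successor that is not part of another track.
    """
    nodes = set(successors)
    for targets in successors.values():
        nodes.update(targets)

    tracks = []
    visited = set()
    for start in sorted(nodes):
        if start in visited:
            continue
        track = [start]
        visited.add(start)
        current = start
        while True:
            candidates = successors.get(current, [])
            nxt = next((n for n in candidates if n not in visited), None)
            if nxt is None:
                break
            track.append(nxt)
            visited.add(nxt)
            current = nxt
        tracks.append(track)
    return tracks


if __name__ == "__main__":
    # Object 1 persists over three time steps and splits at the last one.
    graph = {(0, 1): [(1, 1)], (1, 1): [(2, 1), (2, 2)], (0, 2): []}
    for track in build_tracks(graph):
        print(track)
```

In this toy heuristic a split produces one continued track and one new track, which is only one of several reasonable choices.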
The output data set and description can be saved:
pipeline.save_result(description_type='json', description_path=..., dataset_path=...)
Some of the identification techniques we provide include:
- african_easterly_waves: Identifies African easterly waves (AEWs) based on an approach similar to [Belanger et al. (2016)](https://doi.org/10.1002/gdj3.40).
- overlap_example: A simple starting point for identifying objects that should later be tracked via overlap. It creates a new field and writes i at every position where the object with ID i has been identified.
- pv_streamer: Identifies PV anomalies in 2D (streamers) or 3D, see [Fischer et al. (2022)](https://doi.org/10.5194/gmd-2021-424).
- template: The starting template for your own technique. If you want to identify areas and track them via overlap, you can use overlap_example instead.
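The idea of a field that holds the value i wherever the object with ID i was identified can be sketched without any dependencies. The example below is a hedged illustration, not the code of overlap_example: the thresholding criterion, the 4-connectivity, and the function name label_objects are assumptions.

```python
from collections import deque


def label_objects(field, threshold):
    """Label 4-connected regions where field > threshold.

    Returns (labels, object_ids): `labels` has the same shape as `field`,
    with 0 for background and i at every cell belonging to object i.
    """
    n_rows = len(field)
    n_cols = len(field[0]) if n_rows else 0
    labels = [[0] * n_cols for _ in range(n_rows)]
    next_id = 0
    for r in range(n_rows):
        for c in range(n_cols):
            if field[r][c] <= threshold or labels[r][c] != 0:
                continue
            # Found an unlabelled cell above the threshold: start a new object.
            next_id += 1
            labels[r][c] = next_id
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < n_rows and 0 <= nx < n_cols
                            and field[ny][nx] > threshold
                            and labels[ny][nx] == 0):
                        labels[ny][nx] = next_id
                        queue.append((ny, nx))
    return labels, list(range(1, next_id + 1))


if __name__ == "__main__":
    field = [[5, 5, 0, 0],
             [0, 0, 0, 7],
             [0, 0, 7, 7]]
    labels, ids = label_objects(field, threshold=1)
    for row in labels:
        print(row)
```

A label field of this kind is what an overlap-based tracking step can compare between consecutive time steps.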
Some of the tracking techniques we provide include:
- african_easterly_waves: Tracks AEWs by comparing the locations of line strings.
- overlap_tracking: General overlap tracking. It takes the name of the DataArray as a parameter, ideally one whose values represent the object's ID at each location. It works well together with the overlap_example identification.
- template_object_compare: Template for tracking strategies that are based solely on pairwise comparison of object descriptions from consecutive timesteps.
- template: Fallback template for tracking techniques that require more complex heuristics than the ones mentioned above.
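As a rough picture of overlap tracking, the sketch below links object IDs between two labelled fields from consecutive time steps by counting shared grid cells. It is a minimal illustration under stated assumptions (nested lists of identical shape, 0 as background, a hypothetical min_cells parameter) and not the implementation of overlap_tracking.

```python
from collections import Counter


def overlap_links(labels_prev, labels_next, min_cells=1):
    """Link object IDs between two labelled fields of identical shape.

    Returns a dict mapping (id_prev, id_next) -> number of shared grid cells,
    keeping only pairs that share at least `min_cells` cells. 0 is background.
    """
    counts = Counter()
    for row_prev, row_next in zip(labels_prev, labels_next):
        for a, b in zip(row_prev, row_next):
            if a != 0 and b != 0:
                counts[(a, b)] += 1
    return {pair: n for pair, n in counts.items() if n >= min_cells}


if __name__ == "__main__":
    labels_t0 = [[1, 1, 0, 0],
                 [0, 0, 0, 2],
                 [0, 0, 2, 2]]
    labels_t1 = [[0, 1, 1, 0],
                 [0, 0, 0, 0],
                 [0, 3, 3, 3]]
    print(overlap_links(labels_t0, labels_t1))
```

Each returned pair corresponds to an edge between an object at one time step and an object at the next, so the result can serve as input for building a tracking graph.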
Usage: Adding techniques
We provide some template files, which we recommend as a starting point for your own identification strategy. To add your own identification (and tracking) strategy to the framework, you need to:
- Copy the template folder and rename it and its files accordingly. If you implement a tracking method that relies on pairwise comparison of objects from consecutive timesteps, you can use the template_object_compare template instead.
- In __init__.py, rename the class to match your identification strategy.
- In the *.proto file, define the variables that each detected object should have. They follow the protobuf format; see the protobuf documentation. The template file also provides a useful example. The proto files are compiled automatically when the identification is run.
- In identification.py (tracking.py), implement your identification (tracking) strategy. See the template again for a useful example. There are a few methods:
  - __init__ is called from the run script, so the user can set parameters for the algorithm here.
  - precompute is called once for the entire data set. The data set can be altered here (temporally and spatially). If the strategy should return an additional field (DataArray), it should be initialized here, as shown in the template.
  - identify contains your identification technique. This method is called in parallel. It returns the provided subset (whose values can be modified) and a list of objects. New (empty) objects can be obtained using o = self.get_new_object(id=obj_id), which returns an object o with the set ID o.id and the object properties defined via the protobuf description at o.properties. See the template and the docstrings for more information.
  - postprocess is called once for the entire data set after identification. The data set and the object description can be changed here.
- TODO tracking
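The order in which the four methods are called can be sketched with a self-contained toy example. Everything in it is hypothetical: the class does not inherit from the framework's base class, the data set is a plain dict of time step to values, objects are plain dicts instead of protobuf messages, and run_identification only mimics the call order described above.

```python
from concurrent.futures import ThreadPoolExecutor


class ThresholdIdentification:
    """Toy strategy: one object per time step whose maximum exceeds a threshold."""

    def __init__(self, threshold):
        # Parameters of the algorithm are set by the user here.
        self.threshold = threshold

    def precompute(self, dataset):
        # Called once for the whole data set; here we only drop empty time steps.
        return {t: values for t, values in dataset.items() if values}

    def identify(self, timestep, values):
        # Called once per subset, possibly in parallel; returns subset and objects.
        objects = []
        peak = max(values)
        if peak > self.threshold:
            objects.append({"id": 1, "properties": {"timestep": timestep, "peak": peak}})
        return values, objects

    def postprocess(self, dataset, description):
        # Called once after identification; here we sort the description by time step.
        return dataset, dict(sorted(description.items()))


def run_identification(strategy, dataset):
    """Call precompute, identify (in parallel), and postprocess in that order."""
    dataset = strategy.precompute(dataset)
    with ThreadPoolExecutor() as pool:
        futures = {t: pool.submit(strategy.identify, t, v) for t, v in dataset.items()}
        description = {t: f.result()[1] for t, f in futures.items()}
    return strategy.postprocess(dataset, description)


if __name__ == "__main__":
    data = {2: [0.1, 0.9], 0: [0.2, 0.3], 1: []}
    _, description = run_identification(ThresholdIdentification(threshold=0.5), data)
    print(description)
```

Because identify may run in parallel, it should only depend on its own subset and on parameters set in __init__ or precompute.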
Acknowledgment and license
enstools.feature
is a collaborative development within the
Waves to Weather (SFB/TRR165) project, funded by the
German Research Foundation (DFG).
A full list of code contributors can be found in CONTRIBUTORS.md. TODO
The code is released under the Apache-2.0 license.