# Ensemble Tools Feature Detection: `enstools.feature`

This package is a module of the enstools Python package, developed within
the framework of [Waves to Weather - Transregional Collaborative Research
Project (SFB/TRR165)](https://wavestoweather.de). enstools can be fetched
from the public GitHub repo [here](https://github.com/wavestoweather/enstools).

`enstools.feature` is a modular framework for the identification and tracking of
meteorological structures. It aims to provide easy-to-use interfaces for
automatic parallelization and a unified, readable output. For the latter, the
framework uses `protobuf` to describe typed structures: users define a
description of the structures to be detected and can then use the resulting
object descriptions for further statistical analyses.

# Installation instructions

We recommend using a conda environment. The instructions below are based on
those of the public enstools repo and have been adapted and extended.


    conda create --name enstools-feature python=3.7
    conda activate enstools-feature

    # install requirements listed in given venv_setup.sh
    pip install --upgrade pip
    pip install wheel numpy==1.20.0

    # integrate enstools
    pip install -e git+https://github.com/wavestoweather/enstools.git@main#egg=enstools

    # install requirements for enstools-feature, then install the package itself (editable mode)
    conda install --file requirements.txt
    pip install -e .


Depending on the feature identification strategies used, additional packages may be required. // TODO

# Usage: Applying existing techniques

The following example shows how to apply techniques that already exist in the code base to your data set.
First, we need some imports, namely:
* `FeaturePipeline`, which executes the identification pipeline
* `IdentificationTemplate`, the identification technique; replace it with the one you want to use
* `TrackingTemplate`, the tracking technique; replace it with the one you want to use
* `template_pb2`, the protobuf Python file that is generated automatically at run time from the description. Use the one that matches your detection strategy. The files are named `*_pb2`, where `*` is the name of the identification module.

-> TODO: should not really need to set the template here, is specific to identification strategy!

    from enstools.feature.pipeline import FeaturePipeline
    from enstools.feature.identification.template import IdentificationTemplate
    from enstools.feature.tracking.template import TrackingTemplate
    from enstools.feature.identification._proto_gen import template_pb2

Then, we initialize the pipeline with the protobuf description and, optionally, the processing mode. For 3D data, the processing mode determines whether identification is performed individually on 2D (lat-lon) subsets or on 3D subsets.

    pipeline = FeaturePipeline(template_pb2, processing_mode='2d')
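
To picture what the processing mode controls, here is a minimal sketch that does not use `enstools.feature` itself. It assumes a field indexed as (level, lat, lon) and stored as nested lists; the helper `iter_subsets` is hypothetical and only mirrors the idea that `'2d'` passes each horizontal slice to the identification separately, whereas `'3d'` passes the whole volume.

```python
# Hypothetical illustration only: not part of the enstools.feature API.
def iter_subsets(field, processing_mode="2d"):
    """Yield the subsets that an identification step would see.

    `field` is a nested list indexed as field[level][lat][lon].
    """
    if processing_mode == "2d":
        # One horizontal (lat-lon) slice per vertical level.
        for level_slice in field:
            yield level_slice
    elif processing_mode == "3d":
        # The full volume as a single subset.
        yield field
    else:
        raise ValueError(f"unknown processing mode: {processing_mode!r}")


# Toy field with 3 levels, 2 latitudes, 4 longitudes.
field = [[[level * 10 + lon for lon in range(4)] for _lat in range(2)]
         for level in range(3)]

subsets_2d = list(iter_subsets(field, "2d"))
subsets_3d = list(iter_subsets(field, "3d"))
print(len(subsets_2d), len(subsets_3d))
```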

Then, we initialize and set our strategies. The tracking can be set to `None` to be ignored.

    i_strat = IdentificationTemplate(some_parameter='foo')
    t_strat = TrackingTemplate()
    pipeline.set_identification_strategy(i_strat)
    pipeline.set_tracking_strategy(t_strat) # or None as argument if no tracking

Next, set the data to process.

    pipeline.set_data_path(path)

Then, the pipeline can be executed, starting the identification and subsequently the tracking.

    pipeline.execute()
    # or separated...
    # pipeline.execute_identification()
    # pipeline.execute_tracking()

This generates an object description based on the configured protobuf format. If tracking has been executed, tracks can be generated using a simple default heuristic; see the docstrings for further details. The object description holds the objects and, if tracking has been executed, a graph structure and the generated tracks.

    pipeline.generate_tracks()
    od = pipeline.get_object_desc()
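
The relation between a graph of object links and tracks can be sketched as follows. This is a generic illustration and not the default heuristic of `enstools.feature` (see the docstrings for that). The function `build_tracks`, the `(timestep, object_id)` node format, and the rule of following the child with the smallest ID are all assumptions made for this example.

```python
# Hypothetical illustration: NOT the heuristic implemented in enstools.feature.
from collections import defaultdict


def build_tracks(edges):
    """Chain object links into tracks.

    `edges` is a list of (parent, child) pairs. Each node is a
    (timestep, object_id) tuple and every child is assumed to live one
    timestep after its parent, so the graph has no cycles.
    A track starts at every node without a parent and, at each step,
    follows the child with the smallest object ID (splits are ignored).
    """
    children = defaultdict(list)
    has_parent = set()
    nodes = set()
    for parent, child in edges:
        children[parent].append(child)
        has_parent.add(child)
        nodes.update((parent, child))

    tracks = []
    for start in sorted(nodes - has_parent):
        track = [start]
        while children[track[-1]]:
            track.append(min(children[track[-1]]))
        tracks.append(track)
    return tracks


edges = [
    ((0, 1), (1, 1)),
    ((1, 1), (2, 1)),
    ((1, 1), (2, 3)),  # object 1 splits; this branch is not followed
    ((0, 2), (1, 2)),
]
tracks = build_tracks(edges)
print(tracks)
```

With this simple rule, an object that only appears as the second branch of a split ends up in no track; a real heuristic would need to decide how to treat splits and merges.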

The output data set and description can be saved:

    pipeline.save_result(description_type='json', description_path=..., dataset_path=...)


Some of the identification techniques we provide include:
- `african_easterly_waves`: Identify AEWs based on an approach similar to [Belanger et al. (2016)](https://doi.org/10.1002/gdj3.40)
- `overlap_example`: Simple starting point to identify objects which should later be tracked via overlap. It creates a new field and writes `i` at positions where object with ID `i` has been identified.
- `pv_streamer`: Identify PV anomalies in 2D (streamers) or 3D, see [Fischer et al. (2022)](https://doi.org/10.5194/gmd-2021-424)
- `template`: The starting template for your own identification technique. If you want to identify areas and track them via overlap, you can use `overlap_example` instead.

Some of the tracking techniques we provide include:
- `african_easterly_waves`: Tracking of AEWs by comparing the locations of line strings.
- `overlap_tracking`: General overlap tracking. It takes the name of the `DataArray` as a parameter, ideally one whose values represent the object's ID at each location. It works well together with the `overlap_example` identification.
- `template_object_compare`: Template for tracking, where the tracking strategy is solely based on pairwise comparison of object descriptions from consecutive timesteps.
- `template`: Template for a fallback tracking technique that requires more complex heuristics than the ones mentioned above.
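
The idea behind overlap tracking can be illustrated with a small, self-contained sketch. It is not the `overlap_tracking` implementation: the function `overlap_links`, the use of `0` as background value, and the `min_cells` threshold are assumptions made for this example. The two label fields follow the convention described for `overlap_example`, where the value `i` marks the cells of the object with ID `i`.

```python
# Hypothetical illustration: not the enstools.feature implementation.
from collections import Counter


def overlap_links(labels_t0, labels_t1, min_cells=1, background=0):
    """Return sorted (id_t0, id_t1, n_cells) triples for overlapping objects.

    Both inputs are 2D label fields (lists of rows) of identical shape
    from two consecutive timesteps.
    """
    counts = Counter()
    for row_t0, row_t1 in zip(labels_t0, labels_t1):
        for id_t0, id_t1 in zip(row_t0, row_t1):
            # Count grid cells covered by an object at both timesteps.
            if id_t0 != background and id_t1 != background:
                counts[(id_t0, id_t1)] += 1
    return sorted((a, b, n) for (a, b), n in counts.items() if n >= min_cells)


labels_t0 = [
    [1, 1, 0, 0, 2],
    [1, 1, 0, 0, 2],
    [0, 0, 0, 0, 0],
]
labels_t1 = [
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 3, 3],
]
links = overlap_links(labels_t0, labels_t1)
print(links)
```

Here object 1 shares two cells with its successor and is linked, object 2 has no successor, and object 3 is newly created at the second timestep.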

# Usage: Adding techniques

We provide some template files, which we recommend as a starting point for your own identification strategy. If you want to add your own identification (and tracking) strategy to the framework, you need to:
- Copy the template folder and rename it and its files accordingly. If you implement a tracking method that relies on pairwise comparison of objects from consecutive timesteps, you can use the `template_object_compare` template instead.
- In the `__init__.py`, rename the class name to your identification strategy.
- In the `*.proto` file, define the variables each of the detected objects should have. They follow the protobuf protocol, see [here](https://developers.google.com/protocol-buffers/docs/proto). The template file also provides a useful example. The `.proto` files are compiled automatically when the identification is run.
- In the `identification.py` (`tracking.py`), implement your identification (tracking) strategy. See the template again for a useful example. There are a few methods:
  - `__init__` is called from the run script, so the user can set parameters for the algorithm here.
  - `precompute` is called once for the entire data set. The data set can be altered here (temporally and spatially). If the strategy should return an additional field (`DataArray`), it should be initialized here as shown in the template.
  - `identify` contains your identification technique. This method is called in parallel. It returns the provided subset (whose values may be modified) and a list of identified objects. New (empty) objects can be obtained using `o = self.get_new_object(id=obj_id)`, which returns an object `o` with the ID set at `o.id` and the object properties defined via the protobuf description at `o.properties`. See the template and the docstrings for more information.
  - `postprocess` is called once for the entire data set after identification. The data set and the object description can be changed here.
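
The order in which these methods are used can be sketched with a stand-alone toy class. It does not import `enstools.feature`: the class `ThresholdIdentification`, all method signatures, and the stand-in for `get_new_object` are assumptions made for illustration, and the subsets are processed serially instead of in parallel. The template files remain the reference for the real interface.

```python
# Hypothetical stand-in: the real base class, method signatures, and the
# parallel execution are provided by enstools.feature and may differ.
from types import SimpleNamespace


class ThresholdIdentification:
    def __init__(self, threshold):
        # User-facing parameters of the algorithm.
        self.threshold = threshold

    def get_new_object(self, id):
        # Stand-in for the framework helper: an object with an ID and a
        # (here free-form) properties container.
        return SimpleNamespace(id=id, properties=SimpleNamespace())

    def precompute(self, dataset):
        # Called once for the whole data set, e.g. to drop unused fields.
        return dataset

    def identify(self, subset):
        # Called once per subset; returns the subset and a list of objects.
        objects = []
        for index, value in enumerate(subset):
            if value > self.threshold:
                obj = self.get_new_object(id=len(objects) + 1)
                obj.properties.position = index
                obj.properties.value = value
                objects.append(obj)
        return subset, objects

    def postprocess(self, dataset, objects_per_subset):
        # Called once after identification, e.g. to filter objects.
        return dataset, objects_per_subset


strategy = ThresholdIdentification(threshold=5)
dataset = strategy.precompute([[1, 7, 3], [9, 2, 8]])  # two toy subsets
results = [strategy.identify(subset) for subset in dataset]  # serial here
dataset, objects = strategy.postprocess(dataset, [objs for _, objs in results])
ids = [[o.id for o in objs] for objs in objects]
print(ids)
```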

* TODO tracking

# Acknowledgment and license

`enstools.feature` is a collaborative development within the
Waves to Weather (SFB/TRR165) project, and is funded by the
German Research Foundation (DFG).

A full list of code contributors can be found in [CONTRIBUTORS.md](./CONTRIBUTORS.md). TODO

The code is released under an [Apache-2.0 licence](./LICENSE).