Compare revisions

Christoph.Fischer · Christoph Fischer · Christopher Polster
--- a/README.md
+++ b/README.md
@@ -35,13 +35,13 @@ enstools repo serve as base and have been adapted and extended.
 Additionally, depending on the used feature identification strategies, additional packages may be required. // TODO
-# Usage: Applying existing techniques
+# Usage: Applying existing strategies
-Here is a usage example, if you want to apply existing techniques in the code base to your data set.
+Here is a usage example, if you want to apply existing strategies in the code base to your data set.
 First, we need some imports, namely the 
 * `FeaturePipeline`, which executes the identification pipeline
-* `IdentificationTemplate`, this is the identification technique, edit this accordingly
+* `IdentificationTemplate`, this is the identification strategy, edit this accordingly
-* `TrackingTemplate`, this is the tracking technique, edit this accordingly
+* `TrackingTemplate`, this is the tracking strategy, edit this accordingly
 * `template_pb2`, this is the on run auto-generated protobuf python file from the set description. Use the one that matches your detection strategy. They are named *_pb2, where * is the name of the identification module. 
 -> TODO: should not really need to set the template here, is specific to identification strategy!
@@ -82,19 +82,19 @@ The output data set and description can be saved:
    pipeline.save_result(description_type='json', description_path=..., dataset_path=...)
-Some of the identification techniques we provide include:
+Some of the identification strategies we provide include:
 - `african_easterly_waves`: Identify AEWs based on an approach similar to [https://doi.org/10.1002/gdj3.40](Belanger et al. (2016))
 - `overlap_example`: Simple starting point to identify objects which should later be tracked via overlap. It creates a new field and writes `i` at positions where object with ID `i` has been identified.
 - `pv_streamer`: Identify PV anomalies in 2D (streamers) or 3D, see [https://doi.org/10.5194/gmd-2021-424](Fischer et al. (2022))
 - `template` is the starting template for use. If you want to identify areas and track them via overlap, you can use `overlap_example` instead.
-Some of the tracking techniques we provide include:
+Some of the tracking strategies we provide include:
 - `african_easterly_waves`: Tracking of AEWs, by comparing location of line strings.
 - `overlap_tracking`: General overlap tracking. It takes the name of the `DataArray` as parameter, ideally one where the values represents the object's id at the location. It works well together with the `overlap_example` identification.
 - `template_object_compare`: Template for tracking, where the tracking strategy is solely based on pairwise comparison of object descriptions from consecutive timesteps.
-- `template`: Template for a fallback tracking technique which requires more complex heuristics than above mentioned ones.
+- `template`: Template for a fallback tracking strategy which requires more complex heuristics than above mentioned ones.
-# Usage: Adding techniques
+# Usage: Adding strategies
 We provide some template files, which we recommend as a starting point for your own identification strategy. If you want to add your own identification (and tracking) strategy to the framework, you need to:
 - Copy over the template folder and rename it and the files accordingly. If you implement a tracking method, which relys on pairwise comparison of objects from consecutive timesteps, you can use the `template_object_compare`
@@ -103,7 +103,7 @@ We provide some template files, which we recommend as a starting point for your
 - In the `identification.py` (`tracking.py`), implement your identification (tracking) strategy. See the template again for a useful example. There are a few methods:
 - `__init__` gets called from the run script, so the user can set parameters for the algorithm here.
 - `precompute` is called once for the entire data set. The data set can be altered here (temporally and spatially). Also if the strategy should return an additional field (`DataArray`), it should be initialized here as shown in the template.
- - In `identify` goes your identification technique. This method is called in parallel, and should return a list of objects. See the template and the docstrings for more information. It returns the provided subset (which can be modified in terms of values), and a list of objects. New (empty) objects can be obtained using `o = self.get_new_object(id=obj_id)`, returning an object `o` with the set ID `o.id` and the object properties defined via the protobuf description at `o.properties`.
+ - In `identify` goes your identification strategy. This method is called in parallel, and should return a list of objects. See the template and the docstrings for more information. It returns the provided subset (which can be modified in terms of values), and a list of objects. New (empty) objects can be obtained using `o = self.get_new_object(id=obj_id)`, returning an object `o` with the set ID `o.id` and the object properties defined via the protobuf description at `o.properties`.
 - `postprocess` is called once for the entire data set after identification. The data set and the object description can be changed here.
 * TODO tracking
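The `template_object_compare` tracking listed above decides links solely by comparing object descriptions from consecutive timesteps in pairs. The following is a minimal, framework-independent sketch of that idea. It assumes each object is reduced to a hypothetical dict with an `id` and a `(lat, lon)` centroid, and it assumes that two objects match when their centroids are at most `max_dist` apart (plain Euclidean distance in degrees, a simplification). None of these names or criteria are part of the enstools-feature API.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical object description (assumption): {"id": int, "centroid": (lat, lon)}
Obj = Dict[str, object]
Node = Tuple[int, int]  # (timestep index, object ID)


def is_match(a: Obj, b: Obj, max_dist: float) -> bool:
    """Assumed pairwise criterion: centroids are at most max_dist apart."""
    (lat_a, lon_a), (lat_b, lon_b) = a["centroid"], b["centroid"]
    return math.hypot(lat_a - lat_b, lon_a - lon_b) <= max_dist


def link_timesteps(timesteps: List[List[Obj]], max_dist: float) -> List[Tuple[Node, Node]]:
    """Compare every object at step t with every object at step t + 1.

    Returns one edge ((t, id_a), (t + 1, id_b)) per matching pair.
    """
    edges = []
    for t in range(len(timesteps) - 1):
        for a in timesteps[t]:
            for b in timesteps[t + 1]:
                if is_match(a, b, max_dist):
                    edges.append(((t, a["id"]), (t + 1, b["id"])))
    return edges


steps = [
    [{"id": 1, "centroid": (10.0, 20.0)}, {"id": 2, "centroid": (40.0, 0.0)}],
    [{"id": 1, "centroid": (11.0, 21.0)}],
]
print(link_timesteps(steps, max_dist=2.0))  # [((0, 1), (1, 1))]
```

Keeping the criterion in a separate `is_match` function means only that function has to change when a different comparison of object properties is needed.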

--- a/docs/Makefile
+++ b/docs/Makefile
+# Minimal makefile for Sphinx documentation
+#
+# You can set these variables from the command line, and also
+# from the environment for the first two.
+SPHINXOPTS    ?=
+SPHINXBUILD   ?= sphinx-build
+SOURCEDIR     = source
+BUILDDIR      = build
+# Put it first so that "make" without argument is like "make help".
+help:
+	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
+.PHONY: help Makefile
+# Catch-all target: route all unknown targets to Sphinx using the new
+# "make mode" option.  $(O) is meant as a shortcut for $(SPHINXOPTS).
+%: Makefile
+	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
--- a/docs/make.bat
+++ b/docs/make.bat
+@ECHO OFF
+pushd %~dp0
+REM Command file for Sphinx documentation
+if "%SPHINXBUILD%" == "" (
+	set SPHINXBUILD=sphinx-build
+)
+set SOURCEDIR=source
+set BUILDDIR=build
+if "%1" == "" goto help
+%SPHINXBUILD% >NUL 2>NUL
+if errorlevel 9009 (
+	echo.
+	echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
+	echo.installed, then set the SPHINXBUILD environment variable to point
+	echo.to the full path of the 'sphinx-build' executable. Alternatively you
+	echo.may add the Sphinx directory to PATH.
+	echo.
+	echo.If you don't have Sphinx installed, grab it from
+	echo.http://sphinx-doc.org/
+	exit /b 1
+)
+%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
+goto end
+:help
+%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
+:end
+popd
--- a/docs/source/_static/w2w.css
+++ b/docs/source/_static/w2w.css
+div.related {
+    background-color: #2D338A;
+}
+div.related > *, .footer {
+    max-width: 1180px;
+}
+div.related a {
+    color: #EEE;
+}
+div.body {
+    max-width: 900px;
+}
+div.body h1, div.body h2 {
+    background-color: #DDD;
+}
+p.logo {
+    margin-top: 0;
+}
+a {
+    color: #2D338A;
+}
+a:hover {
+    color: #4261AC;
+}
+.field-list {
+    max-width: 700px;
+}
+div.seealso {
+    background-color: #EEE;
+    border-color: #CCC;
+}
--- a/docs/source/_static/w2w.png
+++ b/docs/source/_static/w2w.png
--- a/docs/source/conf.py
+++ b/docs/source/conf.py
+# Configuration file for the Sphinx documentation builder.
+#
+# This file only contains a selection of the most common options. For a full
+# list see the documentation:
+# https://www.sphinx-doc.org/en/master/usage/configuration.html
+# -- Path setup --------------------------------------------------------------
+# If extensions (or modules to document with autodoc) are in another directory,
+# add these directories to sys.path here. If the directory is relative to the
+# documentation root, use os.path.abspath to make it absolute, like shown here.
+#
+# import os
+# import sys
+# sys.path.insert(0, os.path.abspath('.'))
+# -- Project information -----------------------------------------------------
+project = 'enstools-feature'
+copyright = '2022, Waves To Weather'
+author = 'Waves To Weather'
+# -- General configuration ---------------------------------------------------
+# Add any Sphinx extension module names here, as strings. They can be
+# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
+# ones.
+extensions = [
+    "sphinx.ext.autodoc",
+    "sphinx.ext.napoleon",
+]
+autodoc_default_options = {
+    "members": True,
+}
+# Add any paths that contain templates here, relative to this directory.
+templates_path = ['_templates']
+# List of patterns, relative to source directory, that match files and
+# directories to ignore when looking for source files.
+# This pattern also affects html_static_path and html_extra_path.
+exclude_patterns = []
+# -- Options for HTML output -------------------------------------------------
+# The theme to use for HTML and HTML Help pages.  See the documentation for
+# a list of builtin themes.
+#
+html_theme = 'nature'
+html_css_files = [
+    "w2w.css"
+]
+html_logo = "_static/w2w.png"
+# Add any paths that contain custom static files (such as style sheets) here,
+# relative to this directory. They are copied after the builtin static files,
+# so a file named "default.css" will overwrite the builtin "default.css".
+html_static_path = ['_static']
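This configuration enables `sphinx.ext.autodoc` together with `sphinx.ext.napoleon`. As a result, the docstrings of strategy classes pulled in via `.. autoclass::` can be written in Google style and are converted to reStructuredText at build time. Below is a hypothetical example of such a docstring. The class `ExampleIdentification` and its parameters are illustrative only and are not part of enstools-feature.

```python
import inspect


class ExampleIdentification:
    """Identify regions where a field exceeds a threshold (illustrative only).

    Args:
        field (str): Name of the ``DataArray`` to analyse.
        threshold (float): Values above this threshold belong to an object.
    """

    def __init__(self, field: str, threshold: float = 0.0):
        # Hypothetical parameters, documented in the class docstring above.
        self.field = field
        self.threshold = threshold


# With sphinx.ext.napoleon enabled, the "Args:" section is rendered as a
# parameter list when the class is documented via ``.. autoclass::``.
doc = inspect.getdoc(ExampleIdentification)
print(doc.splitlines()[0])
```

Documenting the constructor arguments in the class docstring keeps them visible with the default autodoc settings, which render only the class docstring for `autoclass`.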
--- a/docs/source/identification.rst
+++ b/docs/source/identification.rst
+Identification
+==============
+
+Some of the identification strategies we provide include:
+
+.. toctree::
+   :maxdepth: 1
+
+   identification_african_easterly_waves
+   identification_pv_streamer
+   identification_threshold
+   identification_overlap_example
+   identification_template
+
+Output data format
+------------------
+
+Protobuf, JSON, etc.
--- a/docs/source/identification_african_easterly_waves.rst
+++ b/docs/source/identification_african_easterly_waves.rst
+African Easterly Waves
+======================
+
+Identify AEWs based on an approach similar to `Belanger et al. (2016) <https://doi.org/10.1002/gdj3.40>`_.
+
+.. autoclass:: enstools.feature.identification.african_easterly_waves.AEWIdentification
--- a/docs/source/identification_overlap_example.rst
+++ b/docs/source/identification_overlap_example.rst
+Overlap Example
+===============
+
+Simple starting point to identify objects that are later tracked via overlap. It creates a new field and writes ``i`` at all positions where the object with ID ``i`` has been identified.
+
+.. autoclass:: enstools.feature.identification.overlap_example.OverlapIdentificationExample
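The overlap example writes the ID `i` of each identified object into a new field at every position the object covers. The following plain-Python sketch shows that step on its own. It assumes objects are given as hypothetical sets of `(row, col)` grid indices and that `0` marks the background. In enstools-feature the field would instead be a `DataArray` initialized in `precompute`. The function `write_object_ids` is illustrative only.

```python
from typing import Dict, List, Set, Tuple

Cell = Tuple[int, int]  # (row, col) grid index, an assumed object representation


def write_object_ids(shape: Tuple[int, int], objects: Dict[int, Set[Cell]]) -> List[List[int]]:
    """Return a 2D field holding object ID i wherever object i was identified.

    Cells belonging to no object keep the background value 0, so object IDs
    are assumed to start at 1. If objects overlap, the ID written last wins.
    """
    n_rows, n_cols = shape
    field = [[0] * n_cols for _ in range(n_rows)]
    for obj_id, cells in objects.items():
        for r, c in cells:
            field[r][c] = obj_id
    return field


field = write_object_ids((3, 4), {1: {(0, 0), (0, 1)}, 2: {(2, 3)}})
for row in field:
    print(row)
# [1, 1, 0, 0]
# [0, 0, 0, 0]
# [0, 0, 0, 2]
```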
--- a/docs/source/identification_pv_streamer.rst
+++ b/docs/source/identification_pv_streamer.rst
+PV Streamer
+===========
+
+Identify PV anomalies in 2D (streamers) or 3D, see `Fischer et al. (2022) <https://doi.org/10.5194/gmd-2021-424>`_.
+
+.. autoclass:: enstools.feature.identification.pv_streamer.PVIdentification
--- a/docs/source/identification_template.rst
+++ b/docs/source/identification_template.rst
+Custom Identification
+=====================
+
+This is the starting template for your own identification strategy. If you want to identify areas and track them via overlap, you can use ``overlap_example`` instead.
+
+We provide some template files, which we recommend as a starting point for your own identification strategy. If you want to add your own identification (and tracking) strategy to the framework, you need to:
+
+- Copy the template folder and rename it and the files accordingly. If you implement a tracking method that relies on pairwise comparison of objects from consecutive timesteps, you can use the ``template_object_compare`` template instead.
+- In the ``__init__.py``, rename the class to match your identification strategy.
+- In the ``*.proto`` file, define the variables each detected object should have. They follow the `protobuf protocol <https://developers.google.com/protocol-buffers/docs/proto>`_. The template file also provides a useful example. The ``.proto`` files are compiled automatically when the identification is run.
+- In the ``identification.py`` (``tracking.py``), implement your identification (tracking) strategy. See the template again for a useful example. There are a few methods:
+
+  - ``__init__`` is called from the run script, so the user can set parameters for the algorithm here.
+  - ``precompute`` is called once for the entire data set. The data set can be altered here (temporally and spatially). If the strategy should return an additional field (``DataArray``), it should also be initialized here, as shown in the template.
+  - ``identify`` contains your identification strategy. This method is called in parallel; see the template and the docstrings for more information. It returns the provided subset (whose values may be modified) and a list of objects. New (empty) objects can be obtained using ``o = self.get_new_object(id=obj_id)``, which returns an object ``o`` with the set ID ``o.id`` and the object properties defined via the protobuf description at ``o.properties``.
+  - ``postprocess`` is called once for the entire data set after identification. The data set and the object description can be changed here.
+
+Template class
+--------------
+
+.. autoclass:: enstools.feature.identification.template.IdentificationTemplate
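The call order described above can be illustrated without the framework:

- `__init__` runs once from the run script.
- `precompute` runs once on the whole data set.
- `identify` runs on each subset, possibly in parallel.
- `postprocess` runs once at the end.

The sketch uses a hypothetical `ToyIdentification` class and a hypothetical `run` driver that stands in for `FeaturePipeline`. It only mimics the documented order. The data layout (a dict of value lists) and all method signatures are assumptions and do not reproduce the real template.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Dataset = Dict[str, List[float]]  # hypothetical layout: one list of values per time step


class ToyIdentification:
    """Hypothetical stand-in for a class derived from IdentificationTemplate."""

    def __init__(self, threshold: float):
        # Called from the run script: user-defined parameters are stored here.
        self.threshold = threshold

    def precompute(self, dataset: Dataset) -> Dataset:
        # Called once for the entire data set, which may be altered here
        # (illustrative change: round all values to one decimal).
        return {key: [round(v, 1) for v in values] for key, values in dataset.items()}

    def identify(self, subset: List[float]) -> Tuple[List[float], List[dict]]:
        # Called per subset (possibly in parallel): returns the subset and a list of objects.
        objects = [{"index": i, "value": v} for i, v in enumerate(subset) if v > self.threshold]
        return subset, objects

    def postprocess(self, dataset: Dataset, objects: Dict[str, List[dict]]):
        # Called once after identification; data set and description may be changed.
        n_objects = sum(len(objs) for objs in objects.values())
        return dataset, {"n_objects": n_objects, "objects": objects}


def run(strategy: ToyIdentification, dataset: Dataset):
    """Hypothetical driver mimicking the documented call order."""
    dataset = strategy.precompute(dataset)
    keys = list(dataset)
    with ThreadPoolExecutor() as pool:
        # map preserves input order, so results line up with keys
        results = list(pool.map(strategy.identify, [dataset[k] for k in keys]))
    new_dataset = {k: subset for k, (subset, _) in zip(keys, results)}
    objects = {k: objs for k, (_, objs) in zip(keys, results)}
    return strategy.postprocess(new_dataset, objects)


_, description = run(ToyIdentification(threshold=0.5),
                     {"t0": [0.1, 0.74, 0.9], "t1": [0.2, 0.3, 0.61]})
print(description["n_objects"])  # 3
```

Because each `identify` call only sees its own subset, the subsets can be processed independently. This independence is what makes parallel execution possible.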
--- a/docs/source/identification_threshold.rst
+++ b/docs/source/identification_threshold.rst
+Threshold-based Features
+========================
+
+Identify features that lie above or below one or multiple thresholds.
+
+.. note::
+    The threshold-based identification strategies require `numba <https://numba.pydata.org/>`_ to be installed.
+
+Identification Strategies
+-------------------------
+
+.. autoclass:: enstools.feature.identification.threshold.DoubleThresholdIdentification
+
+.. autoclass:: enstools.feature.identification.threshold.SingleThresholdIdentification
+
+Utilities
+---------
+
+.. autofunction:: enstools.feature.identification.threshold.proto_to_mask
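This page names single- and double-threshold identification but does not spell out the criteria. The sketch below assumes a common interpretation, which is not taken from the enstools-feature source:

- With a single threshold, every connected region above the threshold is one feature.
- With a double threshold (hysteresis), a connected region above a lower threshold is kept only if it contains at least one value above a higher threshold.

It works on a 2D list with 4-connectivity. The function names are hypothetical, and the real strategies (which use numba) may differ in connectivity and details.

```python
from collections import deque
from typing import List

Grid = List[List[float]]


def label_regions(mask: List[List[bool]]) -> List[List[int]]:
    """Label 4-connected True regions with IDs 1, 2, ...; background stays 0."""
    n_rows, n_cols = len(mask), len(mask[0])
    labels = [[0] * n_cols for _ in range(n_rows)]
    next_id = 0
    for r in range(n_rows):
        for c in range(n_cols):
            if mask[r][c] and labels[r][c] == 0:
                next_id += 1
                labels[r][c] = next_id
                queue = deque([(r, c)])
                while queue:  # breadth-first flood fill of one region
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < n_rows and 0 <= nx < n_cols
                                and mask[ny][nx] and labels[ny][nx] == 0):
                            labels[ny][nx] = next_id
                            queue.append((ny, nx))
    return labels


def single_threshold(field: Grid, threshold: float) -> List[List[int]]:
    """Hypothetical: features are connected regions where the field exceeds `threshold`."""
    return label_regions([[v > threshold for v in row] for row in field])


def double_threshold(field: Grid, low: float, high: float) -> List[List[int]]:
    """Hypothetical: keep regions above `low` only if they contain a value above `high`."""
    labels = single_threshold(field, low)
    seeds = {labels[r][c] for r, row in enumerate(field)
             for c, v in enumerate(row) if v > high}
    seeds.discard(0)  # only relevant if high < low
    # Relabel the kept regions consecutively (1, 2, ...); dropped regions become 0.
    new_id = {old: i + 1 for i, old in enumerate(sorted(seeds))}
    return [[new_id.get(lab, 0) for lab in row] for row in labels]


field = [
    [0.0, 2.0, 0.0, 1.2],
    [0.0, 1.5, 0.0, 1.1],
    [0.0, 0.0, 0.0, 0.0],
]
print(single_threshold(field, 1.0))                # [[0, 1, 0, 2], [0, 1, 0, 2], [0, 0, 0, 0]]
print(double_threshold(field, low=1.0, high=1.8))  # [[0, 1, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
# Features *below* a threshold: apply the same functions to the negated field.
```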
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
+Welcome to the enstools-feature documentation!
+==============================================
+
+:py:mod:`enstools.feature` is a modular framework for the identification and tracking of meteorological structures.
+It comes with an easy-to-use interface and automatic parallelization.
+For consistent, compatible and readable descriptions of the output data structures, this framework uses `protobuf <https://developers.google.com/protocol-buffers>`_.
+
+This package is an extension module of the :py:mod:`enstools` Python package and is developed within the framework of `Waves to Weather <https://wavestoweather.de>`_.
+More information on :py:mod:`enstools` is available on its `GitHub page <https://github.com/wavestoweather/enstools>`_.
+
+Content
+-------
+
+.. toctree::
+   :maxdepth: 1
+
+   installation
+   pipeline
+   identification
+   tracking
+
+A quickstart example is provided in the description of the :doc:`feature pipeline <pipeline>`.
+Output formats are described on the overview pages of the identification and tracking sub-modules.
+
+Acknowledgment and license
+--------------------------
+
+:py:mod:`enstools.feature` is developed collaboratively within the Waves to Weather (SFB/TRR165) project, funded by the German Research Foundation (DFG).
+
+Full list of code contributors: TODO
+
+The code is released under the Apache-2.0 license.
--- a/docs/source/installation.rst
+++ b/docs/source/installation.rst
+Installation
+============
+
+.. warning::
+    TODO
+
+Install from PyPI:
+
+.. code-block:: text
+
+    pip install enstools-feature
+
+Dependencies:
+
+- TODO
+- TODO
+- TODO
+
+Depending on the feature identification strategies used, additional packages may be required.
+
+Installing in a new conda environment
+-------------------------------------
+
+Install the protobuf compiler, e.g. from your operating system's package repositories:
+
+.. code-block:: text
+
+    apt-get install protobuf-compiler
+
+We recommend using a conda environment.
+Create and activate the environment in the enstools-feature package folder:
+
+.. code-block:: text
+
+    conda create --name enstools-feature python=3.7
+    conda activate enstools-feature
+    conda config --add channels conda-forge
+    conda config --set channel_priority strict
+
+Install the requirements listed in the provided ``venv_setup.sh``:
+
+.. code-block:: text
+
+    pip install --upgrade pip
+    pip install wheel numpy==1.20.0
+
+Install enstools from its GitHub repository:
+
+.. code-block:: text
+
+    pip install -e git+https://github.com/wavestoweather/enstools.git@main#egg=enstools
+
+Install the requirements of enstools-feature and the package itself:
+
+.. code-block:: text
+
+    conda install --file requirements.txt
+    pip install -e .
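After installation, you can quickly check that the external tools mentioned on this page are available. There are two of them: the protobuf compiler `protoc` (provided by the `protobuf-compiler` package) on the `PATH`, and the optional `numba` package needed by the threshold-based strategies. The sketch below uses only the standard library. The helper name `check_environment` and the default list of checked names are assumptions, and the list should be extended once the dependency list above is filled in.

```python
import importlib.util
import shutil
from typing import Dict, Iterable


def check_environment(executables: Iterable[str] = ("protoc",),
                      modules: Iterable[str] = ("numba",)) -> Dict[str, bool]:
    """Hypothetical helper: report which executables are on PATH and which
    top-level Python modules are importable."""
    status = {}
    for exe in executables:
        status[exe] = shutil.which(exe) is not None
    for mod in modules:
        # find_spec locates a top-level module without importing it
        status[mod] = importlib.util.find_spec(mod) is not None
    return status


for name, ok in check_environment().items():
    print(f"{name}: {'found' if ok else 'MISSING'}")
```

For example, `check_environment(modules=("numba", "enstools"))` additionally checks that the `enstools` base package can be found.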
--- a/docs/source/pipeline.rst
+++ b/docs/source/pipeline.rst
+Feature Pipeline
+================
+
+TODO: Workflow description here.
+
+Usage Example
+-------------
+
+TODO
+
+.. code-block:: python
+
+    from enstools.feature import FeaturePipeline
+    from enstools.feature.identification.STRATEGY import SomeIdentification
+    from enstools.feature.tracking.STRATEGY import SomeTracking
+
+    # Configure suitable identification and tracking strategies
+    # (STRATEGY, SomeIdentification and SomeTracking are placeholders)
+    ident_strategy = SomeIdentification(...)
+    track_strategy = SomeTracking(...)
+
+    # Configure the feature pipeline: apply the strategies chosen above and
+    # provide an input dataset
+    pipeline = FeaturePipeline(processing_mode="2d")
+    pipeline.set_identification_strategy(ident_strategy)
+    pipeline.set_tracking_strategy(track_strategy)  # omit or set to None if no tracking is desired
+    pipeline.set_data_path("path/to/input/data.nc")  # or directly provide a Dataset with set_data
+
+    # Run the identification and tracking on the data
+    pipeline.execute()
+
+    # TODO: output handling
+
+Quickstart
+----------
+
+Here is a usage example for applying existing strategies from the code base to your data set.
+First, we need some imports, namely:
+
+- :py:class:`enstools.feature.pipeline.FeaturePipeline`, which executes the identification pipeline
+- :py:class:`enstools.feature.identification.template.IdentificationTemplate`, the identification strategy; replace it with the one you want to use
+- :py:class:`enstools.feature.tracking.template.TrackingTemplate`, the tracking strategy; replace it with the one you want to use
+- :py:mod:`enstools.feature._proto_gen.template_pb2`, the protobuf Python module that is auto-generated at run time from the object description. Use the one that matches your identification strategy. These modules are named \*_pb2, where \* is the name of the identification module.
+
+.. warning::
+    [TODO] Setting the template here should not be necessary, since it is specific to the identification strategy.
+
+.. code-block:: python
+
+    from enstools.feature.pipeline import FeaturePipeline
+    from enstools.feature.identification.template import IdentificationTemplate
+    from enstools.feature.tracking.template import TrackingTemplate
+    from enstools.feature.identification._proto_gen import template_pb2
+
+Then, we initialize the pipeline with the protobuf description and, optionally, the processing mode. For 3D data, the processing mode determines whether the identification is performed on individual 2D (lat-lon) subsets or on 3D subsets.
+
+.. code-block:: python
+
+    pipeline = FeaturePipeline(template_pb2, processing_mode='2d')
+
+Then, we initialize and set our strategies. Tracking can be skipped by setting it to ``None``.
+
+.. code-block:: python
+
+    i_strat = IdentificationTemplate(some_parameter='foo')
+    t_strat = TrackingTemplate()
+    pipeline.set_identification_strategy(i_strat)
+    pipeline.set_tracking_strategy(t_strat)  # or None as argument if no tracking
+
+Next, set the data to process.
+
+.. code-block:: python
+
+    path = "path/to/input/data.nc"  # placeholder: path to your input data
+    pipeline.set_data_path(path)
+
+Then, the pipeline can be executed, starting the identification and subsequently the tracking.
+
+.. code-block:: python
+
+    pipeline.execute()
+    # or separately...
+    # pipeline.execute_identification()
+    # pipeline.execute_tracking()
+
+This generates an object description in the protobuf format set above. If tracking has been used, tracks can be generated with a default simple heuristic; see the docstrings for further details. The object description holds the objects and, if tracking has been executed, the graph structure and the generated tracks.
+
+.. code-block:: python
+
+    pipeline.generate_tracks()
+    od = pipeline.get_object_desc()
+
+The output data set and description can be saved:
+
+.. code-block:: python
+
+    pipeline.save_result(description_type='json', description_path=..., dataset_path=...)
+
+Class Documentation
+-------------------
+
+.. autoclass:: enstools.feature.pipeline.FeaturePipeline
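The `processing_mode` argument of the pipeline decides how 3D input is split before identification. With `'2d'`, each horizontal (lat-lon) slice is treated separately; with `'3d'`, whole 3D fields are used. The sketch below only enumerates the resulting subsets for a hypothetical data set with `time`, `level`, `lat` and `lon` dimensions. The dimension names, the helper `enumerate_subsets`, and the exact way enstools-feature slices a `Dataset` are assumptions.

```python
from itertools import product
from typing import Dict, List


def enumerate_subsets(dims: Dict[str, int], processing_mode: str) -> List[Dict[str, int]]:
    """Hypothetical: return the index selections that would each form one subset.

    Assumed layout: dims contains 'time', 'level', 'lat', 'lon'. In '2d' mode
    every (time, level) pair is one lat-lon subset; in '3d' mode every time
    step is one level-lat-lon subset.
    """
    if processing_mode == "2d":
        split_dims = ["time", "level"]
    elif processing_mode == "3d":
        split_dims = ["time"]
    else:
        raise ValueError(f"unknown processing mode: {processing_mode!r}")
    ranges = [range(dims[d]) for d in split_dims]
    return [dict(zip(split_dims, idx)) for idx in product(*ranges)]


dims = {"time": 2, "level": 3, "lat": 90, "lon": 180}
print(len(enumerate_subsets(dims, "2d")))  # 6 lat-lon subsets
print(len(enumerate_subsets(dims, "3d")))  # 2 level-lat-lon subsets
print(enumerate_subsets(dims, "3d"))       # [{'time': 0}, {'time': 1}]
```

Since the subsets do not depend on each other, they can be handed to the identification in parallel. The `'2d'` mode produces more, smaller work items.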
--- a/docs/source/tracking.rst
+++ b/docs/source/tracking.rst
+Tracking
+========
+
+Some of the tracking strategies we provide include:
+
+.. toctree::
+   :maxdepth: 1
+
+   tracking_african_easterly_waves
+   tracking_overlap
+   tracking_overlap_double_threshold
+   tracking_template_object_compare
+   tracking_template
+
+Output data format
+------------------
+
+.. autoclass:: enstools.feature.util.graph.DataGraph
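Tracking results are described by a graph (`DataGraph`), and tracks are derived from it using a default simple heuristic whose details are not documented here. As an illustration only, the sketch below assumes one possible simple heuristic. Links between objects `(timestep, id)` are stored as successor and predecessor lists. A track is continued only while the current object has exactly one successor and that successor has exactly one predecessor, so a split or a merge starts new tracks. The real `DataGraph` structure and track heuristic may differ, and `tracks_from_edges` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Node = Tuple[int, int]  # (timestep, object id)


def tracks_from_edges(edges: List[Tuple[Node, Node]]) -> List[List[Node]]:
    """Hypothetical: split a tracking graph into linear tracks.

    Assumed heuristic: a track continues from a node only if that node has
    exactly one successor and the successor has exactly one predecessor;
    otherwise (split or merge) new tracks start.
    """
    succ: Dict[Node, List[Node]] = defaultdict(list)
    pred: Dict[Node, List[Node]] = defaultdict(list)
    nodes = set()
    for a, b in edges:
        succ[a].append(b)
        pred[b].append(a)
        nodes.update((a, b))

    def continues(a: Node) -> bool:
        return len(succ[a]) == 1 and len(pred[succ[a][0]]) == 1

    # A node starts a track unless it is the unique continuation of its predecessor.
    starts = [n for n in sorted(nodes)
              if not (len(pred[n]) == 1 and continues(pred[n][0]))]
    tracks = []
    for node in starts:
        track = [node]
        while continues(track[-1]):
            track.append(succ[track[-1]][0])
        tracks.append(track)
    return tracks


# Object 1 splits at timestep 1; object 5 is tracked without interruption.
edges = [((0, 1), (1, 1)), ((1, 1), (2, 1)), ((1, 1), (2, 2)), ((0, 5), (1, 7))]
for track in tracks_from_edges(edges):
    print(track)
# [(0, 1), (1, 1)]
# [(0, 5), (1, 7)]
# [(2, 1)]
# [(2, 2)]
```

Because edges always point forward in time, the graph contains no cycles, and following successors always terminates.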
--- a/docs/source/tracking_african_easterly_waves.rst
+++ b/docs/source/tracking_african_easterly_waves.rst
+African Easterly Waves
+======================
+
+Tracking of AEWs by comparing the locations of line strings.
+
+.. autoclass:: enstools.feature.tracking.african_easterly_waves.AEWTracking
--- a/docs/source/tracking_overlap.rst
+++ b/docs/source/tracking_overlap.rst
+Overlap Tracking
+================
+
+General overlap tracking.
+It takes the name of a :py:class:`xarray.DataArray` as parameter, ideally one whose values represent the object's ID at each location.
+It works well together with the :py:mod:`enstools.feature.identification.overlap_example` identification.
+
+.. autoclass:: enstools.feature.tracking.overlap_tracking.OverlapTracking
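Overlap tracking relies on a field whose values are the object IDs at each location, such as the field written by the `overlap_example` identification. The following sketch shows the overlap criterion in its simplest form. It assumes that two consecutive timesteps are given as 2D lists of IDs with `0` as background, and that any shared grid cell links two objects. The real `OverlapTracking` may apply further conditions (for example a minimum overlap). The function `overlap_links` is hypothetical.

```python
from typing import List, Set, Tuple


def overlap_links(ids_t0: List[List[int]], ids_t1: List[List[int]]) -> Set[Tuple[int, int]]:
    """Hypothetical: return pairs (id at t0, id at t1) of objects sharing a grid cell."""
    links = set()
    for row0, row1 in zip(ids_t0, ids_t1):
        for a, b in zip(row0, row1):
            if a != 0 and b != 0:  # 0 marks background (assumption)
                links.add((a, b))
    return links


t0 = [[1, 1, 0, 0],
      [0, 0, 0, 2]]
t1 = [[0, 3, 3, 0],
      [0, 0, 0, 0]]
print(sorted(overlap_links(t0, t1)))  # [(1, 3)]
```

Object 2 has no overlapping object at the next timestep, so its track would end there.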
--- a/docs/source/tracking_overlap_double_threshold.rst
+++ b/docs/source/tracking_overlap_double_threshold.rst
+Overlap with Double Threshold
+=============================
+
+...
+
+.. autoclass:: enstools.feature.tracking.overlap_double_threshold_tracking.OverlapDoubleThresholdTracking
--- a/docs/source/tracking_template.rst
+++ b/docs/source/tracking_template.rst
+Custom Tracking
+===============
+
+Template for a fallback tracking strategy, for cases that require more complex heuristics than the strategies listed in the tracking overview.
+
+Template class
+--------------
+
+.. autoclass:: enstools.feature.tracking.template.TrackingTemplate