Compare revisions: Eric.Schanet/KerasROOTClassification and Nikolai.Hartmann/KerasROOTClassification (280 commits on source; 480 additions, 168 deletions)
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
.DS_Store
.idea/
# C extensions
*.so
# Distribution / packaging
.Python
env/
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
target/
# Jupyter Notebook
.ipynb_checkpoints
# pyenv
.python-version
# celery beat schedule file
celerybeat-schedule
# SageMath parsed files
*.sage.py
# dotenv
.env
# virtualenv
.venv
venv/
ENV/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
outputs/
setup.sh
run.py
*.swp
*.pyc
*.pdf
*.root
# KerasROOTClassification

This is an attempt to simplify the training of Keras models from ROOT TTree input.

The recommended usage is to put this module in your python path and
create run scripts to define and train your model.
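One way to make the module importable is a minimal sketch like the following (the path below is a placeholder for wherever you cloned the repository):

```python
# Make the directory that CONTAINS the KerasROOTClassification checkout importable.
# "/path/to/parent-dir" is a placeholder, not a path used by this project.
import sys
sys.path.insert(0, "/path/to/parent-dir")

from KerasROOTClassification import ClassificationProject
```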
For example, a run script could look like this:
Shapes of the activations (one sample) on Keras CNN MNIST:
```
----- activations -----
(1, 26, 26, 32)
(1, 24, 24, 64)
(1, 12, 12, 64)
(1, 12, 12, 64)
(1, 9216)
(1, 128)
(1, 128)
(1, 10) # softmax output!
```
Shapes of the activations (batch of 200 samples) on Keras CNN MNIST:
```
----- activations -----
(200, 26, 26, 32)
(200, 24, 24, 64)
(200, 12, 12, 64)
(200, 12, 12, 64)
(200, 9216)
(200, 128)
(200, 128)
(200, 10)
```
<p align="center">
<img src="assets/0.png" width="50">
<br><i>A random seven from MNIST</i>
</p>
```python
import numpy as np
import logging

from KerasROOTClassification import ClassificationProject

logging.basicConfig()
logging.getLogger("KerasROOTClassification").setLevel(logging.INFO)

c = ClassificationProject("my_project",  # this will also be the name of the project directory
                          signal_trees=[(filename1, treename1)],
                          bkg_trees=[(filename2, treename2),
                                     (filename3, treename3)],
                          optimizer="Adam",
                          selection="some-selection-expression",
                          branches=["var1", "var2", "var3"],
                          weight_expr="some-weight-expression",
                          identifiers=["var4", "var5"],  # variables that identify which events were used for training
                          step_bkg=10,  # take every 10th bkg event for training
                          step_sig=2,   # take every second sig event for training
                          )

c.train(epochs=20)
```
Previously created projects can be inspected in ipython like this:
```
ipython -i -m KerasROOTClassification.browse <project-dir>
```
Or, in a script, you can initialise a project by just specifying the path to the project directory. This is especially useful when you want to compare different projects:
```python
from KerasROOTClassification import ClassificationProject
from KerasROOTClassification.compare import overlay_ROC, overlay_loss

c1 = ClassificationProject("path/to/project1")
c2 = ClassificationProject("path/to/project2")

overlay_ROC("ROC_overlay.pdf", c1, c2)
overlay_loss("loss_overlay.pdf", c1, c2)
```
# Conda setup
An example of a minimal conda setup that contains the necessary packages:
```sh
conda install keras pandas matplotlib scikit-learn pydot graphviz jupyter
pip install root_numpy
```
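If the environment is set up correctly, the imports used throughout this package should succeed. A quick, hypothetical smoke test (assumes the checkout's parent directory is already on `PYTHONPATH`):

```python
# Quick environment check; all three imports should succeed without error.
import keras
import root_numpy
import KerasROOTClassification

print(keras.__version__)
```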
from .toolkit import *
from .compare import *
from .add_friend import *
#!/usr/bin/env python

import ROOT

def add_friend(infile, intree, outfile, outtree):
    root_outfile = ROOT.TFile.Open(outfile, "UPDATE")
    root_infile = ROOT.TFile.Open(infile)
    friend_name = outtree+"_friend_"+intree
    # warn if a tree of that name already exists - a new cycle will be written
    for k in root_outfile.GetListOfKeys():
        if k.GetName() == friend_name:
            print("Tree with name {} already exists in output file - writing new cycle!".format(friend_name))
    root_outfile.cd()
    root_outtree = root_outfile.Get(outtree)
    if not root_outtree:
        raise KeyError("Tree {} not found in file {}".format(outtree, outfile))
    if root_outtree.GetListOfFriends():
        for k in root_outtree.GetListOfFriends():
            if k.GetName() == friend_name:
                print("Tree with name {} is already a friend of {} - writing new cycle!".format(friend_name, outtree))
    root_infile.cd()
    root_intree = root_infile.Get(intree)
    if not root_intree:
        raise KeyError("Tree {} not found in file {}".format(intree, infile))
    # Add friend and write friend tree and original tree to outfile
    root_outfile.cd()
    clonetree = root_intree.CloneTree(-1, "fast")
    clonetree.SetName(friend_name)
    clonetree.SetTitle(friend_name)
    clonetree.Write(friend_name)
    root_outtree.AddFriend(clonetree)
    root_outtree.Write(root_outtree.GetName())
    root_infile.Close()
    root_outfile.Close()

if __name__ == "__main__":
    import argparse

    parser = argparse.ArgumentParser(description='add a friend tree to a tree in another file')
    parser.add_argument("infile", help="input file that contains the friend tree")
    parser.add_argument("intree", help="name of the friend tree")
    parser.add_argument("outfile", help="output file where the friend tree should be added")
    parser.add_argument("outtree", help="name of the tree (in output file) to which the friend should be added")
    args = parser.parse_args()

    add_friend(args.infile, args.intree, args.outfile, args.outtree)
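As a rough usage sketch from Python (the file and tree names below are hypothetical placeholders, not files shipped with this repo):

```python
# Hypothetical example: attach a tree of classifier scores from scores.root
# as a friend of the "nominal" tree in ntuple.root.
from KerasROOTClassification.add_friend import add_friend

add_friend("scores.root", "scores", "ntuple.root", "nominal")
```

The same thing from the command line would be `python add_friend.py scores.root scores ntuple.root nominal`.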
import sys
import logging

import numpy as np
import matplotlib.pyplot as plt

from KerasROOTClassification import *

logging.basicConfig()
logging.getLogger("KerasROOTClassification").setLevel(logging.INFO)

c = load_from_dir(sys.argv[1])

cs = []
cs.append(c)

if len(sys.argv) > 2:
    for project_name in sys.argv[2:]:
        cs.append(load_from_dir(project_name))
#!/usr/bin/env python

import logging
logger = logging.getLogger(__name__)

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

from .toolkit import ClassificationProject
from .plotting import save_show

"""
A few functions to compare different setups
"""

def overlay_ROC(filename, *projects, **kwargs):
    xlim = kwargs.pop("xlim", (0, 1))
    ylim = kwargs.pop("ylim", (0, 1))
    plot_thresholds = kwargs.pop("plot_thresholds", False)
    threshold_log = kwargs.pop("threshold_log", True)
    lumifactor = kwargs.pop("lumifactor", None)
    tight_layout = kwargs.pop("tight_layout", False)
    show_auc = kwargs.pop("show_auc", True)
    if kwargs:
        raise KeyError("Unknown kwargs: {}".format(kwargs))

    logger.info("Overlay ROC curves for {}".format([p.name for p in projects]))

    fig, ax = plt.subplots()

    if plot_thresholds:
        ax2 = ax.twinx()
        ax2.set_ylabel("Thresholds")
        if threshold_log:
            ax2.set_yscale("log")
    if lumifactor is not None:
        ax_abs_b = ax.twinx()
        ax_abs_s = ax.twiny()

    prop_cycle = plt.rcParams['axes.prop_cycle']
    colors = prop_cycle.by_key()['color']

    for p, color in zip(projects, colors):
        fpr, tpr, threshold = roc_curve(p.l_test, p.scores_test, sample_weight=p.w_test)
        fpr = 1.0 - fpr  # plot background rejection instead of false positive rate
        try:
            roc_auc = auc(tpr, fpr)
        except ValueError:
            logger.warning("Got a value error from auc - trying to rerun with reorder=True")
            roc_auc = auc(tpr, fpr, reorder=True)
        ax.grid(color='gray', linestyle='--', linewidth=1)
        if show_auc:
            label = str(p.name+" (AUC = {:.3f})".format(roc_auc))
        else:
            label = p.name
        ax.plot(tpr, fpr, label=label, color=color)
        if plot_thresholds:
            ax2.plot(tpr, threshold, "--", color=color)
        if lumifactor is not None:
            sumw_b = p.w_test[p.l_test == 0].sum()*lumifactor
            sumw_s = p.w_test[p.l_test == 1].sum()*lumifactor
            ax_abs_b.plot(tpr, (1.-fpr)*sumw_b, alpha=0)
            ax_abs_b.invert_yaxis()
            ax_abs_s.plot(tpr*sumw_s, fpr, alpha=0)

    if xlim is not None:
        ax.set_xlim(*xlim)
    if ylim is not None:
        ax.set_ylim(*ylim)

    if lumifactor is not None:
        ax_abs_b.set_ylim((1-ax.get_ylim()[0])*sumw_b, (1-ax.get_ylim()[1])*sumw_b)
        ax_abs_b.set_xlim(*ax.get_xlim())
        ax_abs_s.set_xlim(ax.get_xlim()[0]*sumw_s, ax.get_xlim()[1]*sumw_s)
        ax_abs_s.set_ylim(*ax.get_ylim())
        ax_abs_b.set_ylabel("Number of background events")
        ax_abs_s.set_xlabel("Number of signal events")

    # plt.xticks(np.arange(0,1,0.1))
    # plt.yticks(np.arange(0,1,0.1))

    ax.legend(loc='lower left', framealpha=1.0)
    if lumifactor is None:
        ax.set_title('Receiver operating characteristic')
    ax.set_ylabel("Background rejection")
    ax.set_xlabel("Signal efficiency")
    if plot_thresholds or tight_layout:
        # needed to fit the right y-axis description
        fig.tight_layout()
    return save_show(plt, fig, filename)


def overlay_loss(filename, *projects, **kwargs):
    xlim = kwargs.pop("xlim", None)
    ylim = kwargs.pop("ylim", None)
    log = kwargs.pop("log", False)
    if kwargs:
        raise KeyError("Unknown kwargs: {}".format(kwargs))

    logger.info("Overlay loss curves for {}".format([p.name for p in projects]))

    prop_cycle = plt.rcParams['axes.prop_cycle']
    colors = prop_cycle.by_key()['color']

    fig, ax = plt.subplots()

    for p, color in zip(projects, colors):
        hist_dict = p.csv_hist
        ax.plot(hist_dict['loss'], linestyle='--', label="Training Loss "+p.name, color=color)
        ax.plot(hist_dict['val_loss'], label="Validation Loss "+p.name, color=color)

    ax.set_ylabel('loss')
    ax.set_xlabel('epoch')
    if log:
        ax.set_yscale("log")
    if xlim is not None:
        ax.set_xlim(*xlim)
    if ylim is not None:
        ax.set_ylim(*ylim)
    ax.legend(loc='upper right')
    return save_show(plt, fig, filename)


if __name__ == "__main__":
    import os

    logging.basicConfig()
    #logging.getLogger("KerasROOTClassification").setLevel(logging.INFO)
    logging.getLogger("KerasROOTClassification").setLevel(logging.DEBUG)

    filename = "/project/etp4/nhartmann/trees/allTrees_m1.8_NoSys.root"

    data_options = dict(signal_trees=[(filename, "GG_oneStep_1705_1105_505_NoSys")],
                        bkg_trees=[(filename, "ttbar_NoSys"),
                                   (filename, "wjets_Sherpa221_NoSys")],
                        selection="lep1Pt<5000",  # cut out a few very weird outliers
                        branches=["met", "mt"],
                        weight_expr="eventWeight*genWeight",
                        identifiers=["DatasetNumber", "EventNumber"],
                        step_bkg=100)

    example1 = ClassificationProject("test_sgd",
                                     optimizer="SGD",
                                     optimizer_opts=dict(lr=1000., decay=1e-6, momentum=0.9),
                                     **data_options)

    example2 = ClassificationProject("test_adam",
                                     optimizer="Adam",
                                     **data_options)

    if not os.path.exists("outputs/test_sgd/scores_test.h5"):
        example1.train(epochs=20)
    if not os.path.exists("outputs/test_adam/scores_test.h5"):
        example2.train(epochs=20)

    overlay_ROC("overlay_ROC.pdf", example1, example2)
    overlay_loss("overlay_loss.pdf", example1, example2)
# Extract the activation maps of your Keras models
[![license](https://img.shields.io/badge/License-Apache_2.0-brightgreen.svg)](https://github.com/philipperemy/keras-attention-mechanism/blob/master/LICENSE) [![dep1](https://img.shields.io/badge/Tensorflow-1.2+-blue.svg)](https://www.tensorflow.org/) [![dep2](https://img.shields.io/badge/Keras-2.0+-blue.svg)](https://keras.io/)
*Short code and useful examples to show how to get the activations for each layer for Keras.*
**-> Works for any kind of model (recurrent, convolutional, residuals...). Not only for images!**
## Example of MNIST
Shapes of the activations (one sample) on Keras CNN MNIST:
```
----- activations -----
(1, 26, 26, 32)
(1, 24, 24, 64)
(1, 12, 12, 64)
(1, 12, 12, 64)
(1, 9216)
(1, 128)
(1, 128)
(1, 10) # softmax output!
```
Shapes of the activations (batch of 200 samples) on Keras CNN MNIST:
```
----- activations -----
(200, 26, 26, 32)
(200, 24, 24, 64)
(200, 12, 12, 64)
(200, 12, 12, 64)
(200, 9216)
(200, 128)
(200, 128)
(200, 10)
```
<p align="center">
<img src="assets/0.png" width="50">
<br><i>A random seven from MNIST</i>
</p>
<p align="center">
<img src="assets/1.png">
<br><i>Activation map of CONV1 of LeNet</i>
</p>
<p align="center">
<img src="assets/2.png" width="200">
<br><i>Activation map of FC1 of LeNet</i>
</p>
<p align="center">
<img src="assets/3.png" width="300">
<br><i>Activation map of Softmax of LeNet. <b>Yes it's a seven!</b></i>
</p>
<hr/>
The function for visualizing the activations is in the script [read_activations.py](https://github.com/philipperemy/keras-visualize-activations/blob/master/read_activations.py).
Inputs:
- `model`: Keras model
- `model_inputs`: Model inputs for which we want to get the activations (for example 200 MNIST images)
- `print_shape_only`: If set to `True`, will print only the shapes of the activations; if `False`, the entire activation arrays are printed (might be very verbose!)
- `layer_name`: Will retrieve the activations of a specific layer, if the name matches one of the existing layers of the model.
Outputs:
- returns a list of each layer (by order of definition) and its corresponding activations.
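A minimal sketch of how this could be called, assuming the helper is named `get_activations` and takes the inputs listed above (check read_activations.py for the exact signature):

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Conv2D, Flatten, Dense

from read_activations import get_activations  # helper from this repo, run from the repo root

# tiny stand-in model; the README's MNIST CNN is analogous
model = Sequential([
    Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    Flatten(),
    Dense(10, activation="softmax"),
])

batch = np.random.rand(200, 28, 28, 1).astype("float32")  # e.g. 200 MNIST-shaped images
# signature assumed from the Inputs list above; prints one shape per layer
activations = get_activations(model, batch, print_shape_only=True)
```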
I provide a simple example to see how it works with the MNIST model. I separated the training and the visualization, because doing the two sequentially would mean re-training the model every time we want to visualize the activations, which is not very practical. Here are the main steps:
Running `python model_train.py` will do the following (a minimal sketch follows the list):
- define the model
- if no checkpoints are detected:
  - train the model
  - save the best model in checkpoints/
- load the model from the best checkpoint
- read the activations
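A minimal sketch of this checkpoint-or-load flow; the model, paths, and parameters here are illustrative, not the exact contents of model_train.py:

```python
import os

from keras.datasets import mnist
from keras.models import Sequential, load_model
from keras.layers import Dense
from keras.callbacks import ModelCheckpoint
from keras.utils import to_categorical

(x_train, y_train), _ = mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
y_train = to_categorical(y_train, 10)

checkpoint_path = "checkpoints/best_model.h5"  # illustrative path
if not os.path.exists("checkpoints"):
    os.makedirs("checkpoints")

# define the model
model = Sequential([
    Dense(128, activation="relu", input_shape=(784,)),
    Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

# train only if no checkpoint is detected, keeping the best model
if not os.path.exists(checkpoint_path):
    model.fit(x_train, y_train, epochs=5, validation_split=0.1,
              callbacks=[ModelCheckpoint(checkpoint_path, save_best_only=True)])

# load the model from the best checkpoint before reading activations
model = load_model(checkpoint_path)
```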
`model_multi_inputs_train.py` contains very simple examples of visualizing activations for multi-input models; a rough sketch of such a model follows.
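For orientation, a hedged sketch of what a multi-input model built with the functional API can look like (this is not the exact model from model_multi_inputs_train.py):

```python
import numpy as np
from keras.layers import Input, Dense, concatenate
from keras.models import Model

# two inputs, merged into one prediction
a = Input(shape=(10,))
b = Input(shape=(5,))
merged = concatenate([Dense(8, activation="relu")(a),
                      Dense(8, activation="relu")(b)])
out = Dense(1, activation="sigmoid")(merged)

model = Model(inputs=[a, b], outputs=out)
model.compile(optimizer="adam", loss="binary_crossentropy")

# model inputs are then passed as a list, one array per input
x = [np.random.rand(200, 10), np.random.rand(200, 5)]
print(model.predict(x).shape)  # (200, 1)
```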