The following example is taken from **[Data Science from Scratch](http://shop.oreilly.com/product/0636920033400.do)** by Joel Grus.
## k-Nearest Neighbors
We use k-Nearest Neighbors (kNN) as a detailed showcase for an ML model, i.e.
* we will not just show how to use it in one of the standard ML packages,
* but also discuss its implementation as Python functions in some detail.
kNN is conceptually simple:
* we need a sample with known classifications (labels)
* for a new data point, look at the *k* elements of the known sample in its **neighborhood**, i.e. its *k* nearest neighbors
* this requires some metric to define **distance**
* classify the new point according to the **majority classification** of these neighbors
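The recipe above fits in a few lines of Python. This is a minimal sketch, not necessarily the book's code: it assumes points are tuples of floats, uses Euclidean distance via `math.dist` (Python 3.8+), and resolves ties by dropping the farthest neighbor and voting again. The names `majority_vote` and `knn_classify` are illustrative.

```python
from collections import Counter
from math import dist  # Euclidean distance between two points (Python 3.8+)
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def majority_vote(labels: List[str]) -> str:
    """Return the most common label; labels are assumed ordered nearest-first.

    On a tie, drop the farthest neighbor and vote again (assumed tie rule).
    """
    counts = Counter(labels)
    winner, winner_count = counts.most_common(1)[0]
    num_winners = sum(1 for c in counts.values() if c == winner_count)
    if num_winners == 1:
        return winner
    return majority_vote(labels[:-1])  # a single label always wins, so this terminates

def knn_classify(k: int, labeled_points: Sequence[Tuple[Point, str]], new_point: Point) -> str:
    """Classify new_point by majority vote among its k nearest labeled points."""
    by_distance = sorted(labeled_points, key=lambda lp: dist(lp[0], new_point))
    k_nearest_labels = [label for _, label in by_distance[:k]]
    return majority_vote(k_nearest_labels)

# Usage with made-up toy points
sample = [((0.0, 0.0), "A"), ((0.0, 1.0), "A"),
          ((5.0, 5.0), "B"), ((5.0, 6.0), "B"), ((6.0, 5.0), "B")]
print(knn_classify(3, sample, (0.5, 0.5)))  # → A
```

Choosing an odd *k* reduces ties in two-class problems, but with more classes ties can still occur, which is why the vote needs a fallback rule.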
**Real world example -- elections**
Election results, i.e. which party is most popular, vary strongly between regions. So if you want to predict how a specific person votes, then where that person lives and how their neighbors voted provide useful information.
Examples from the Bundestagswahl 2017 (German federal election):
* Wahlkreis Jachenau (Bad Tölz) ~62% CSU
* Wahlbezirk Nürnberg-4553 ~45% SPD
* although these are extreme cases, many polling districts ("Wahlbezirke") are rather balanced
Of course, other information might be more important for predicting a voting decision:
*education, income, profession, hobbies, ...*
In the following, we discuss an example kNN implementation adapted from the book *Data Science from Scratch*.
What's needed:
* toy data:
  * artificial poll data of people's programming-language preference and geographic location (longitude vs. latitude)
* *metric* for distance:
  * simply the geographical distance
* *list of neighbors* sorted by distance
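One way to make "geographical distance" and "list of neighbors sorted by distance" concrete is sketched below. It assumes coordinates are given as (longitude, latitude) in degrees and a spherical Earth, and uses the haversine great-circle distance; plain Euclidean distance on the raw coordinates is a cruder alternative that is often good enough within a small region. The poll points are made-up toy values, and `geo_distance_km` and `neighbors_by_distance` are hypothetical names.

```python
from math import asin, cos, radians, sin, sqrt
from typing import List, Sequence, Tuple

LonLat = Tuple[float, float]  # (longitude, latitude) in degrees
EARTH_RADIUS_KM = 6371.0      # mean Earth radius; assumes a spherical Earth

def geo_distance_km(p: LonLat, q: LonLat) -> float:
    """Great-circle (haversine) distance in km between two (lon, lat) points."""
    lon1, lat1 = map(radians, p)
    lon2, lat2 = map(radians, q)
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def neighbors_by_distance(query: LonLat,
                          labeled_points: Sequence[Tuple[LonLat, str]]) -> List[Tuple[float, str]]:
    """Return (distance_km, label) pairs for all labeled points, nearest first."""
    return sorted((geo_distance_km(loc, query), label) for loc, label in labeled_points)

# Hypothetical toy poll: location of each respondent and their favorite language
toy_poll = [((11.58, 48.14), "Python"),
            ((11.08, 49.45), "Java"),
            ((13.40, 52.52), "R")]

for d, lang in neighbors_by_distance((11.6, 48.1), toy_poll):
    print(f"{lang}: {d:.0f} km")
```

Taking the first *k* entries of this sorted list gives the *k* nearest neighbors, whose labels can then be passed to a majority vote.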
