# How to check the accuracy of your classification model

## Use accuracy score, confusion matrix, and F1-score to check how accurate your classification model is.

In the previous post we built a decision tree model with scikit-learn. It attempts to predict which customers have life insurance based on their income and property status.

Step 7 of the development process is to **check the accuracy of the model**, which is what we’ll look at here.

You can follow along by downloading the **Jupyter Notebook** and data from GitHub.

We already split our data into train and test sets before fitting the model. We can now use the **test set** to see how well the model performs.
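If you’re recreating that split yourself, here’s a minimal sketch (assuming the full dataset is in a DataFrame called `df`; the `test_size` and `random_state` values are illustrative, not necessarily what the previous post used):

```
from sklearn.model_selection import train_test_split

# hypothetical setup: inputs are income_usd and property_status,
# output is the has_life_insurance flag
X = df[['income_usd', 'property_status']]
y = df['has_life_insurance']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
```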

For demonstration purposes, the **output test set** `y_test` has 6 records. These are the actual output values which correspond to the **input test set** `X_test`. Let’s take a look:

### y_test

| | has_life_insurance |
|---|---|
| 1 | 0 |
| 2 | 0 |
| 3 | 1 |
| 4 | 1 |
| 5 | 0 |
| 6 | 1 |

Now we want to see what the model predicted for the test input `X_test`. That is, what the model predicted for `has_life_insurance` given its inputs `income_usd` and `property_status`. We can do this using `predict()`:

```
# predict has_life_insurance for each record in the test inputs
y_predicted = model.predict(X_test)
```

### y_predicted

| | has_life_insurance |
|---|---|
| 1 | 0 |
| 2 | 0 |
| 3 | 0 |
| 4 | 1 |
| 5 | 0 |
| 6 | 1 |

A quick glance at the data shows that the model predicted **5 out of 6 cases correctly**, with only row 3 being incorrectly classified.

## Accuracy score

Accuracy score is the number of **correct predictions** divided by the **total number of predictions**.

It’s an intuitive measure that’s easy to understand. In fact, we’ve already calculated it above when we said that the model predicted **5 out of 6**, or **83.3%** of cases correctly.

An easy way to calculate this in Python is with `accuracy_score()`:

```
from sklearn.metrics import accuracy_score
score = accuracy_score(y_test, y_predicted)
---
0.8333
```

A low accuracy score is a sign that there are some issues with the model. You may want to increase your sample size or retrain using a different algorithm.

## Confusion matrix

A confusion matrix shows you where the misclassifications are coming from, i.e. whether there are particular classes that your model is getting wrong.

Let’s take a look at our example:

```
import pandas as pd
from sklearn.metrics import confusion_matrix

# label the rows with the actual classes and the columns with the predicted classes
matrix = pd.DataFrame(
    confusion_matrix(y_test, y_predicted),
    index=['actual: 0', 'actual: 1'],
    columns=['predicted: 0', 'predicted: 1']
)
```

| | predicted: 0 | predicted: 1 |
|---|---|---|
| **actual: 0** | 3 | 0 |
| **actual: 1** | 1 | 2 |

The confusion matrix above shows **how many observations** fell into each combination of predicted and actual classification. The cells on the main diagonal (top-left and bottom-right) are where the model predicted the `has_life_insurance` flag correctly.

Each cell in the matrix can be classified like this:

| | predicted: 0 | predicted: 1 |
|---|---|---|
| **actual: 0** | True negative | False positive |
| **actual: 1** | False negative | True positive |

If there are a high number of false negatives or false positives then you can focus your attention on fixing those cases in your model.
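For a binary classifier there’s also a handy shortcut: `confusion_matrix()` returns the counts in a fixed order, so `ravel()` unpacks the four cells directly.

```
from sklearn.metrics import confusion_matrix

# ravel() flattens the 2x2 matrix into
# (true negatives, false positives, false negatives, true positives)
tn, fp, fn, tp = confusion_matrix(y_test, y_predicted).ravel()
print(tn, fp, fn, tp)
---
3 0 1 2
```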

## F1-score

The **F1-score** for a model is calculated using **precision** and **recall**.

For a binary classifier like ours, the precision, recall, and F1-score of the model can be calculated as follows.

### Precision

Precision is the percentage of positive predictions which were correct.

I.e. true positives as a percentage of all the positive predictions - column `predicted = 1`

in the confusion matrix.

```
Precision = True positive / (True positive + False positive)
          = 2 / (2 + 0)
          = 1.00
```
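You can confirm this with scikit-learn’s `precision_score()`:

```
from sklearn.metrics import precision_score
precision_score(y_test, y_predicted)
---
1.0
```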

### Recall

Recall is the percentage of actual positive values which were predicted correctly.

I.e. the percentage of true positives in the row where `actual = 1`

.

```
Recall = True positive / (True positive + False negative)
       = 2 / (2 + 1)
       = 0.67
```
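Again, `recall_score()` gives the same result:

```
from sklearn.metrics import recall_score
recall_score(y_test, y_predicted)
---
0.6667
```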

### F1-score

The F1-score is the harmonic mean of **precision** and **recall**. It ranges from **0 to 1**, with 1 being the best and 0 being the worst.

```
F1-score = 2 * (Precision * Recall) / (Precision + Recall)
         = 2 * (1.00 * 0.67) / (1.00 + 0.67)
         = 0.80
```
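And `f1_score()` matches the manual calculation:

```
from sklearn.metrics import f1_score
f1_score(y_test, y_predicted)
---
0.8
```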

For a model which classifies into more than one category, the precision, recall, and F1-score of each class can be calculated. Then an average of these F1-scores can be used as a score for the entire model - more on this below.
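As a sketch of what that looks like in scikit-learn (the three-class labels below are made up purely for illustration), the `average` parameter controls how the per-class F1-scores are combined:

```
from sklearn.metrics import f1_score

# hypothetical three-class example, for illustration only
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 2, 2, 2, 1, 0]

f1_score(y_true, y_pred, average='weighted')  # weighted by each class's support
f1_score(y_true, y_pred, average='macro')     # simple unweighted mean across classes
```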

## Classification report

If all these calculations seem laborious, don’t worry! These all get calculated for you with `classification_report()`:

```
from sklearn.metrics import classification_report
print(classification_report(y_test, y_predicted))
---
              precision    recall  f1-score   support

           0       0.75      1.00      0.86         3
           1       1.00      0.67      0.80         3

    accuracy                           0.83         6
   macro avg       0.88      0.83      0.83         6
weighted avg       0.88      0.83      0.83         6
```

For our case, where we only have a binary output (`0` or `1`), you should read the `1` row for the model’s overall precision, recall, and F1-score.

You can see that the model has an F1-score of 0.8, which is what we calculated manually above.

If we had a multi-class model (i.e. more output options than just `0` and `1`), then you could use the weighted average of each class’s F1-score as a score for the model. This takes into account the number of actual observations for each class, which is shown in the `support` column. It’s calculated for you in the `weighted avg` row.

## Accuracy score vs. F1-score

One of the benefits of using accuracy score is that it’s easy to interpret. If a model predicts 95% of the classifications correctly, then the accuracy score will be 95%.

However, this can be a problem for cases where the model is predicting something that actually happens 95% of the time.

For example, predicting whether a patient is healthy. If the model simply says that every patient is healthy regardless of the inputs, it would still have a high accuracy score of 95%, as it would be correct 95% of the time!

This is where F1-score comes in useful, as it takes into account how the data is distributed between true/false positives and negatives. Therefore, it’s a good idea to use F1-score when we can see a large imbalance between the groups on a confusion matrix.
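Here’s a hypothetical sketch of that healthy-patient scenario: 95 healthy patients (`0`), 5 sick patients (`1`), and a “model” that predicts healthy every time.

```
from sklearn.metrics import accuracy_score, f1_score

# hypothetical data: 95 healthy patients (0) and 5 sick patients (1),
# with a model that predicts healthy for everyone
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100

print(accuracy_score(y_true, y_pred))  # 0.95 - looks impressive
print(f1_score(y_true, y_pred))        # 0.0 - the model never finds a sick patient
```

The accuracy score looks great, but an F1-score of 0 immediately exposes that the model has no true positives at all.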