Model Performance Testing - 1 [ML]
Keep one thing in mind: statistics only works with data, and it is we who define what counts as good and what counts as bad. That judgment depends entirely on how we intend to use the model.
There is plenty of theory on this, so let's look at a small implementation.
How is the model performing on the data it was trained on?
Before starting the work we divided the data into two parts, one for training and the other for testing; we have repeated this many times already. Now we will see what the model predicts if we feed it the very data it was trained on.
It's a lot like the multiplication-table example. If the training data is:
3 x 1 = 3
3 x 2 = 6
...
3 x 10 = 30
then we ask it something it has already seen during training:
3 x 1 = ?
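To make the analogy concrete, here is a minimal sketch (the toy data and the choice of LinearRegression are purely illustrative, not part of our diabetes project): a model trained on the multiplication table of 3 is asked about a value it has already seen.
from sklearn.linear_model import LinearRegression

# Illustrative toy data: the multiplication table of 3
toy_X = [[i] for i in range(1, 11)]    # inputs 1 .. 10
toy_y = [3 * i for i in range(1, 11)]  # outputs 3 .. 30

toy_model = LinearRegression()
toy_model.fit(toy_X, toy_y)

# Ask about a value the model has already seen during training: 3 x 1 = ?
print(toy_model.predict([[1]]))  # prints approximately [3.]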
Now run the following code on your actual model (though it was not mentioned before, keep working in the same Jupyter Notebook you have used so far):
# This returns an array of predicted results
prediction_from_trained_data = nb_model.predict(X_train)
The array is now assigned to the prediction_from_trained_data variable. If we wanted to, we could now sit down with pen and paper and compare, for each observation in the dataset, the actual label against the label our model predicted.
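If you would rather eyeball a few observations than check them all by hand, a short loop does the job. This sketch assumes the variable names we have used so far (y_train and prediction_from_trained_data):
# Compare the actual label with the predicted label for the first 10 training rows
for actual, predicted in zip(y_train.ravel()[:10], prediction_from_trained_data[:10]):
    print("actual: {0}   predicted: {1}".format(actual, predicted))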
Or, better, we can use the built-in metrics module of the scikit-learn library to check how many diabetes cases our model classified correctly and how many it got wrong.
Since you have a Jupyter notebook open instead of a paper one, start writing there,
# performance metrics library
from sklearn import metrics
# get current accuracy of the model
accuracy = metrics.accuracy_score(y_train, prediction_from_trained_data)
print("Accuracy of our naive bayes model is: {0:.4f}".format(accuracy))
In case you have forgotten:
We split the dataset into four variables, X_train, y_train, X_test, y_test
Where,
X_train = input values used for training [no_of_preg, insulin, glucose ... etc] (70% of the whole dataset)
y_train = the outputs corresponding to X_train [diabetes -> yes / no] (since these correspond to X_train, this is also 70%)
X_test = testing input values (the remaining 30% of the dataset; none of these rows appear in the training data)
y_test = the outputs corresponding to the test inputs
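As a refresher, the split itself would have looked roughly like this. This is a sketch: X, y and random_state here stand in for the feature matrix, label column and seed from the earlier chapter.
from sklearn.model_selection import train_test_split

# 70% of the rows go to training, the remaining 30% to testing
# (X, y and random_state are placeholders from the earlier chapter)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42)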
Since we are measuring accuracy on the training data, it is only natural that the metrics.accuracy_score function receives the model's output for X_train (prediction_from_trained_data) together with the actual output for X_train (y_train).
The output of the previous code snippet is,
Accuracy of our naive bayes model is: 0.7542
The target in our solution statement was to predict with 80% or more accuracy. But here we can see that the accuracy is about 75%.
Stop, there is nothing to celebrate yet. This accuracy score is on the trained data: we trained the model with this data and are now testing its predictions on the same data again. In other words, the questions were asked straight from the syllabus.
Performance in testing data
Now, if I asked you to write the code for measuring performance on the testing data, check whether the code below matches what you would have written,
# this returns an array of predicted results from test_data
prediction_from_test_data = nb_model.predict(X_test)
accuracy = metrics.accuracy_score(y_test, prediction_from_test_data)
print("Accuracy of our naive bayes model is: {0:.4f}".format(accuracy))
Output
Accuracy of our naive bayes model is: 0.7000
This means that even when asked a question from outside the syllabus, the model answers with about 70% accuracy: 70% of its answers are correct and the rest are wrong.
This is what we wanted; that is, if we feed the data of a newly tested person into this trained model, the probability of its answer being correct is about 70%. If the model says the new person may have diabetes, the likelihood of that being right is about 70%.
But
our target was 80% accuracy, and we have not reached it yet. After the painful task of data collection comes the next one: performance testing and making the necessary changes.
Performance testing of classification type problems: Confusion matrix
Our problem is a classification problem, and there are several different metrics for testing the performance of this type. Special mention goes to the Confusion Matrix; despite the name, there is nothing to be confused about. Along with writing the code, we will learn more about it.
For now, let's find out with the Confusion Matrix how our model performs. Enter the following code,
print ("Confusion Matrix")
# labels for set 1 = True to upper left and 0 = False to lower right
print ("0". format (metrics.confusion_matrix (y_test, prediction_from_test_data, labels = [1, 0])))
Confusion matrix

                       Predicted True (col 0)   Predicted False (col 1)
Actual True (row 0)        52 (TP)                   28 (FN)
Actual False (row 1)       33 (FP)                  118 (TN)
We can express the table's numbers as TP, FP, FN and TN. Where,
TP = the actual output is 1, i.e. diabetes is likely, **and** the model we made also predicted 1
FP = the actual output is 0, i.e. diabetes is not likely, **but** the model we made predicted 1
FN = the actual output is 1, i.e. diabetes is likely, **but** the model we made predicted 0
TN = the actual output is 0 **and** our model also predicted 0
If the Confusion Matrix still seems a little confusing, stop, think again, and try to make sense of the table in your own words.
In short,
TP = events that occurred and were detected as having occurred
FP = events that did not occur but were detected as having occurred
FN = events that occurred but were detected as not having occurred
TN = events that did not occur and were detected as not having occurred
Let's look again; according to the statistics above,
52 cases were detected as diabetes and those 52 people actually have diabetes <- TP
28 cases were not detected as diabetes but those 28 people are actually diabetic <- FN
33 cases were detected as diabetes but those 33 people are not actually diabetic <- FP
118 cases were not detected as diabetes and those 118 people are not diabetic <- TN
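If you prefer to pull these four numbers out programmatically instead of reading them off the printed matrix, ravel() flattens the matrix in the printed order. A small sketch; note that with labels=[1, 0] the order comes out as TP, FN, FP, TN:
# With labels=[1, 0] the flattened matrix is ordered TP, FN, FP, TN
TP, FN, FP, TN = metrics.confusion_matrix(
    y_test, prediction_from_test_data, labels=[1, 0]).ravel()
print("TP={0}  FN={1}  FP={2}  TN={3}".format(TP, FN, FP, TN))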
If our model was 100% accurate then what would be its confusion matrix?
It is easy to see that for a 100% accurate model, FP = 0 and FN = 0. Then the confusion matrix will look like this,
                       Predicted True   Predicted False
Actual True                80 (TP)            0 (FN)
Actual False                0 (FP)          151 (TN)
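You can demonstrate this to yourself by pretending the predictions were perfect, that is, by passing the true labels in as the predictions (a quick illustrative check):
# A "perfect" prediction leaves the off-diagonal entries at zero
perfect_cm = metrics.confusion_matrix(y_test, y_test, labels=[1, 0])
print(perfect_cm)  # the diagonal holds the class counts (80 and 151)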
Confusion Matrix Review: Classification Report
We can look at a few more statistical reports built on top of the Confusion Matrix to judge the model's accuracy. Besides looking at the formulas, we will also see how to get them with scikit-learn's built-in function.
The classification report is actually generated from the data in the Confusion Matrix. Run the following statement to view the classification report,
print ("Classification Report")
# labels for set 1 = True to upper left and 0 = False to lower right
print ("{0}". format (metrics.classification_report (y_test, prediction_from_test_data, labels = [1, 0])))
Report output
Classification Report

             precision    recall  f1-score   support

          1       0.61      0.65      0.63        80
          0       0.81      0.78      0.79       151

avg / total       0.74      0.74      0.74       231
Here we will discuss two of these numbers: one is Precision and the other is Recall.
Precision extraction formula

$$Precision = \frac{TP}{TP + FP}$$

That is, for perfect Precision we know FP = 0, so for a 100% accurate model,

$$Precision = \frac{TP}{TP + 0} = \frac{TP}{TP} = 1$$

This means the larger the value of Precision, the better. Our target will be to increase the value of Precision as much as possible.

Recall extraction formula

$$Recall = \frac{TP}{TP + FN}$$

Similarly, in the case of a 100% accurate model (FN = 0),

$$Recall = \frac{TP}{TP + 0} = \frac{TP}{TP} = 1$$

Again, we aim to increase the value of Recall as much as possible.
Precision 0.61 and Recall 0.65: not bad, but these values can be pushed higher, and we will make that effort. You can verify the numbers yourself as shown below.
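A small verification sketch, using the counts read off the confusion matrix above together with scikit-learn's built-in helpers:
# From the formulas, using the counts from the confusion matrix: TP=52, FN=28, FP=33
TP, FN, FP = 52.0, 28.0, 33.0
print("Precision: {0:.2f}".format(TP / (TP + FP)))  # 0.61
print("Recall:    {0:.2f}".format(TP / (TP + FN)))  # 0.65

# Or with scikit-learn's built-in helpers (pos_label defaults to 1)
print(metrics.precision_score(y_test, prediction_from_test_data))
print(metrics.recall_score(y_test, prediction_from_test_data))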
What are the ways to improve performance?
We can increase the performance of the model in the following ways:
Adjusting or tuning the algorithm we are using
Gathering more data or improving the dataset
Trying to improve the training
Algorithm change
Let's change the algorithm, then: Random Forest
Why look at Random Forest? Because
It is an ensemble algorithm (simply put: more advanced and more complex).
It builds many trees on subsets of the data.
It averages the trees' results, which keeps overfitting under control and gives good performance.
Nothing needs to be done to our data, since we have already finished the preprocessing work. We will simply create a new model, train it with the training data, and test performance with the test data.
Enter the code below,
from sklearn.ensemble import RandomForestClassifier
# Create a RandomForestClassifier object
rf_model = RandomForestClassifier(random_state=42)
rf_model.fit(X_train, y_train.ravel())
After running this, if you see output like the following, the training is done and it is time to test performance.
Output:
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=42, verbose=0, warm_start=False)
Random Forest Performance Testing: Predict Training Data
Code as before,
rf_predict_train = rf_model.predict(X_train)

# get accuracy
rf_accuracy = metrics.accuracy_score(y_train, rf_predict_train)

# print accuracy
print("Accuracy: {0:.4f}".format(rf_accuracy))
Output:
Accuracy: 0.9870
Unbelievable, isn't it? The model has memorized the dataset very well. Let's see how it performs on the testing data!
Random Forest Performance Testing: Predict Testing Data
rf_predict_test = rf_model.predict(X_test)

# get accuracy
rf_accuracy_testdata = metrics.accuracy_score(y_test, rf_predict_test)

# print accuracy
print("Accuracy: {0:.4f}".format(rf_accuracy_testdata))
Output:
Accuracy: 0.7000
Accuracy is 70% on the testing data versus 98% on the training data. In other words, it answers questions from inside the syllabus very well, but when asked something from outside, its answers are quite bad. That means it cannot predict well on real-world data; it only predicts well on what it has memorized from the dataset.
Our Naive Bayes model was working better than this! We need good accuracy on our testing data.
Let's see the classification report of the Random Forest model! We can simply take the code from the earlier section, swap in the new variables, and run it!
print ("Confusion Matrix for Random Forest")
# labels for set 1 = True to upper left and 0 = False to lower right
print ("ric 0". format (metrics.confusion_matrix (y_test, rf_predict_test, labels = [1, 0])))
print ("")
print ("Classification Report \ n")
# labels for set 1 = True to upper left and 0 = False to lower right
print ("ric 0}". format (metrics.classification_report (y_test, rf_predict_test, labels = [1, 0])))
Output:
Confusion Matrix for Random Forest
[[ 43  37]
 [ 30 121]]

Classification Report

             precision    recall  f1-score   support

          1       0.59      0.54      0.56        80
          0       0.77      0.80      0.78       151

avg / total       0.70      0.71      0.71       231
See here: precision and recall are also worse than with our earlier Naive Bayes model.
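Before giving up on Random Forest, one common first step is to rein it in so it cannot simply memorize the training data, for example by limiting tree depth or requiring more samples per leaf. The hyperparameter values below are illustrative guesses, not tuned results:
# Constrain the trees so they generalize instead of memorizing
# (max_depth and min_samples_leaf values are illustrative guesses)
rf_model_tuned = RandomForestClassifier(
    max_depth=5, min_samples_leaf=5, random_state=42)
rf_model_tuned.fit(X_train, y_train.ravel())

print("Train accuracy: {0:.4f}".format(
    metrics.accuracy_score(y_train, rf_model_tuned.predict(X_train))))
print("Test accuracy:  {0:.4f}".format(
    metrics.accuracy_score(y_test, rf_model_tuned.predict(X_test))))
If the gap between the two numbers shrinks, the model is overfitting less; whether test accuracy actually improves is something you have to check on your own run.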

