Learn about precision and recall, two important metrics for evaluating a machine learning model beyond accuracy and error percentage.
Precision and recall are two important measures of a machine learning model’s accuracy. Precision is the proportion of the model’s positive classifications that are actually positive. Recall (also known as the true positive rate) is the proportion of all actual positives that the model correctly classifies as positive. Precision and recall are especially helpful in classification problems where data falls into one class more often than the other.
Explore the difference between precision and recall in machine learning, the uses of each metric, their advantages and limitations, and how they work together to give you insight into how a machine learning model performs.
Precision assesses how many of the model’s positive predictions are correct, which helps reduce the number of false positives (FPs) a model produces. It is the ratio of true positives (TPs) to all of the samples the model classified as positive. The formula to calculate precision is as follows:
Precision = true positives / (true positives + false positives) = TP / (TP + FP)
For example, suppose a computer vision model searches for birds in a series of images. Precision compares the number of images correctly classified as birds (TP) to the total number of images the model classified as birds across the entire data set (TP + FP). In a perfect model, this ratio would be 1, indicating no false positives.
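To make the calculation concrete, here is a minimal Python sketch that computes precision from hypothetical true positive and false positive counts for a bird classifier like the one described above; the counts are invented purely for illustration.

```python
# Minimal sketch: computing precision from hypothetical prediction counts
# for a bird-vs-not-bird image classifier. The counts below are made up
# for illustration only.

true_positives = 85   # images of birds correctly labeled "bird"
false_positives = 15  # non-bird images incorrectly labeled "bird"

precision = true_positives / (true_positives + false_positives)
print(f"Precision: {precision:.2f}")  # 0.85 here; a perfect model would score 1.0
```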
You use precision in machine learning to gauge the accuracy of your model’s positive predictions on the data set. Precision is a useful metric for imbalanced classification problems, where one type of sample appears far more often than the other. The dominant sample type is the majority class, and the less common one is the minority class. Precision is useful in these problems because overall accuracy can be dominated by the majority class, while precision focuses on how reliable the model’s positive predictions actually are.
The main advantage of a model with high precision is its bias toward accurate positive classifications over false positives. When a model can’t afford false positives, optimizing for precision improves its performance because making accurate positive predictions matters more than catching every positive. A common example is spam email detection, where it is more important that an important email doesn’t get classified as spam than that every spam email stays out of the user’s inbox.
When you focus solely on precision, you overlook the false negatives in your model because precision only evaluates how accurately the model classifies positives. In the case above, if you optimize the spam email detection model only for precision, the user may get too many spam emails in their inbox, making the model less useful. Or, with the computer vision example, birds in the user’s images may be missed entirely if the model focuses too heavily on the accuracy of its positive predictions.
This leads the discussion to recall, which is the other side of identifying accuracy in machine learning models.
Recall, also known as the true positive rate, evaluates how well a machine learning model finds all of the true positives in the data set. This metric tells you how many of the actual positives in the data set the model correctly classified as positive. The formula to calculate recall is as follows:
Recall = true positives / (true positives + false negatives) = TP / (TP + FN)
A false negative is a positive that has been incorrectly classified as a negative. In the computer vision example, recall compares the number of bird images the model correctly classified as birds to the total number of bird images in the data set. A perfect model would have a ratio of 1, indicating no false negatives.
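Here is a matching Python sketch that computes recall for the same hypothetical bird classifier; as before, the counts are invented for illustration.

```python
# Minimal sketch: computing recall from hypothetical counts for the same
# bird classifier. The counts are invented for illustration only.

true_positives = 85   # bird images correctly labeled "bird"
false_negatives = 25  # bird images the model missed (labeled "not bird")

recall = true_positives / (true_positives + false_negatives)
print(f"Recall: {recall:.2f}")  # about 0.77; a perfect model would score 1.0
```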
Recall matters most when accurately identifying every positive in an application is critical. When you need to catch all possible instances of a positive, you optimize for recall even at the risk of flagging false positives. This makes recall useful when a false negative is more concerning than a false positive.
The main advantage of high recall is its bias toward detecting positives. This pays off in critical applications like detecting cyber threats or fraud, where missing a true positive is more costly than examining any false positives the model may produce.
Take a bank fraud detection system, for example. You would want the algorithm to flag as many true positives as possible, even at the risk of producing false positives, because a false negative (a situation where the system says fraud didn’t occur when it did) is much more costly than reviewing false alarms.
When you focus solely on recall, you create models that produce many false positives. This becomes a problem when the cost of a false positive is higher than that of a false negative. For example, if you build an email spam detection model that focuses solely on recall, many legitimate emails end up in the spam folder, forcing the user to review the spam folder more often and potentially miss an important email.
In machine learning, you need to examine both precision and recall in your model because increasing precision can decrease recall and vice versa. When creating a model, ask yourself at the outset:
Does the model need to detect all positive samples at the risk of a false negative? If yes, then optimize for better recall.
Does the model need to accurately classify a sample as a true positive in order to reduce the number of false positives? If yes, then optimize for better precision.
You can use a precision-recall curve to examine the relationship between precision and recall, plotting precision against recall across different classification thresholds. The first step to visualizing precision and recall is to examine the model’s confusion matrix, a table that lists the true positives, false positives, true negatives, and false negatives, which are all the elements you need to calculate precision and recall.
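If you want to compute these pieces programmatically, the following Python sketch uses scikit-learn (an assumption; the article doesn’t prescribe a library) with made-up labels and scores to pull the confusion matrix counts and the points of a precision-recall curve.

```python
# Sketch using scikit-learn (assumed, not named in the article) to compute the
# confusion matrix and precision-recall curve points from toy labels and scores.
from sklearn.metrics import confusion_matrix, precision_recall_curve

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]   # actual labels (toy data)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]   # hard predictions (toy data)
y_score = [0.9, 0.2, 0.8, 0.4, 0.1, 0.7, 0.6, 0.3, 0.95, 0.05]  # model scores

# Confusion matrix: rows are actual classes, columns are predicted classes.
# With binary labels [0, 1], ravel() returns TN, FP, FN, TP in that order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}, FP={fp}, FN={fn}, TN={tn}")

# Precision and recall at each score threshold, the points of a PR curve.
precisions, recalls, thresholds = precision_recall_curve(y_true, y_score)
for p, r in zip(precisions, recalls):
    print(f"precision={p:.2f}, recall={r:.2f}")
```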
The F1 score (or F-measure) combines precision and recall into one metric so that you can optimize for the best precision and recall at the same time. Once you calculate your precision and recall from the confusion matrix, you can calculate your F1 score using the formula:
F1 score = 2 * (precision * recall) / (precision + recall)
This value represents the harmonic mean of precision and recall and is a common metric in imbalanced classification problems. A perfect F1 score is 1.0, indicating perfect precision and recall, while the worst possible score is 0.0. When balancing precision and recall in a machine learning model, you want to maximize your F1 score.
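As a quick worked example, here is a short Python sketch that computes the F1 score from the same hypothetical counts used in the earlier precision and recall examples.

```python
# Minimal sketch: computing the F1 score (harmonic mean of precision and
# recall) from the hypothetical counts used in the earlier examples.

tp, fp, fn = 85, 15, 25

precision = tp / (tp + fp)   # 0.85
recall = tp / (tp + fn)      # about 0.77
f1 = 2 * (precision * recall) / (precision + recall)

print(f"F1 score: {f1:.2f}")  # about 0.81; 1.0 would mean perfect precision and recall
```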
When it comes to precision versus recall in machine learning, you often want to find a balance, especially in imbalanced classification problems. To learn more about precision and recall in machine learning, try the Machine Learning: Classification course from the University of Washington, which is one part of their Machine Learning Specialization on Coursera. If you want another way to get started in machine learning, try the Machine Learning Specialization from DeepLearning.AI and Stanford, also available on Coursera, to gain in-demand skills in the field.