Supervised and Unsupervised Machine Learning

Published August 5, 2021

Doug Rose

Author | Agility | Artificial Intelligence | Data Ethics

Like people, machines can learn through supervised and unsupervised machine learning, but human learning differs from machine learning. With humans, supervised learning consists of formal education. An instructor presents the material, students study it and are tested on it, and areas of weakness are addressed, hopefully to the point at which students achieve mastery in that given subject area. Unsupervised learning is experiential, often referred to as "common sense." You venture out in the world and engage in daily activities, learning on your own and from making mistakes.

Machine learning differs in that it involves only a couple forms of learning, and those are determined by what you want the machine to do:

Supervised learning if you want to use the machine for classification (assigning items to different labeled classes) or regression (identifying the connection between an independent and a dependent variable).
Unsupervised learning if you want to use the machine for clustering (creating groups of like things) and association (identifying associations between things).

Supervised Learning

With supervised learning, a human trainer labels items in a small data set often referred to as the training data set. The machine has an advantage of knowing how the human trainer has classified the data. For example, suppose you want to train a machine to be able to distinguish between spam email messages and not-spam email messages. You feed several examples of spam messages into the machine and tell the machine, "These are spam." Then, you feed several examples of not-spam messages into the machine and tell it, "These are not spam."

The machine identifies patterns in both message groups — certain patterns that are characteristic of spam and other patterns characteristic of not-spam. Now, when you feed a message into the machine that is not labeled spam or not-spam, the machine should be able to tell whether the message is or is not spam.

Unfortunately, machines make mistakes. A certain message may not have a clear pattern that characterizes it as either spam or not-spam, so it may send some messages that are not spam to the Spam folder and allow some spam messages to reach your Inbox. Your machine clearly needs more training.

Additional training occurs when you mark a message in the Spam folder as "not spam," or when you move a spam message from your inbox to the spam folder. This provides the machine with valuable feedback that enables it to fine-tune its neural network, increasing its accuracy.

Supervised learning tends to be more useful than unsupervised learning in the following applications:

Classification: Assigning items to different classes, such as spam and not-spam.
Regression: Identifying the connection between an independent and a dependent variable; for example, a customer's spending habits and their ability to pay the mortgage.

Classification and regression are both considered to be predictive because they can be used to forecast the probability that a given input will result in a given output. For example, if you use a regression algorithm to identify a relationship between family income and high school graduation rates, you can use that relationship to predict a student's likelihood of graduating by looking at the student's family income.

Unsupervised Learning

With supervised learning, you feed the machine a data set and instruct it to group like items without providing it with labeled groups; the machine must determine the groups based on similarities and differences among the items in the data set.

For example, you might feed 1000 medical images into a machine and have it group the images based on patterns it detects in those images. The machine creates 10 groups and assigns images to each group. A doctor can then examine the different groups in an attempt to figure out why the machine grouped the images as it did. The benefit here is that the machine may identify patterns that doctors never thought to look for — patterns that may provide insights into diagnosis and treatment options.

Unsupervised learning tends to be more useful than supervised learning in the following applications:

Clustering: Splitting the data set into groups based on their similarities, as in the medical images example provided above.
Identifying associations: Identifying two or more patterns that tend to occur together in a data set. For example, a retailer may use unsupervised learning to identify products that are often purchased together, in order to develop marketing strategies to increase the sales of those products.

Clustering and association are considered to be descriptive, as opposed to predictive, because they identify patterns that reveal insights into the data.

When starting your own artificial intelligence project, carefully consider the available data and what you want to do with that data. If you already have well defined categories that you want the machine to use to classify input, you probably want to stick with supervised learning. If you’re unsure how to group and categorize the data or you want to look at the data in a new way, unsupervised learning is probably the better approach because it enables the computer to identify similarities and differences you may have never considered otherwise.

Supervised and Unsupervised Machine Learning

Supervised Learning

Unsupervised Learning

Quick Links

Contact

Follow Me On