6. Supervised Learning

🎯 Learning Goals

  • Define supervised learning and explain how it differs from other types of machine learning
  • Identify key components of a supervised learning model (features, labels, training data, and testing data)
  • Understand and differentiate between classification and linear regression problems

📗 Technical Vocabulary

  • Supervised Learning
  • Inputs
  • Features
  • Labels
  • Classification
  • Linear regression
🌤️ Warm-Up
Imagine you’ve received a mysterious thank you note from a friend. The note is unsigned, but you know it was written by one of three people. Before you examine the note itself, take a closer look at these three handwriting samples.
Beyonce
[image: Beyonce's handwriting sample]
Taylor
[image: Taylor's handwriting sample]
Olivia
[image: Olivia's handwriting sample]
Consider the following questions as you examine the samples:
  • What distinct features does each handwriting sample have?
  • If you were to predict who might have written the mysterious note, what clues would you look for in the note to match it with one of these samples?
Jot down your observations and when you’re ready, reveal the note below and use your notes to predict who wrote the note!
Mysterious Thank You Note
[image: the mysterious thank-you note]
Who wrote the mysterious note? If you guessed Beyonce, you’re correct! What features did you notice that gave it away? The uppercase R, the cursive f, and the curved line across the lowercase t stood out to me!
Just like in this exercise, supervised machine learning works by learning from labeled examples. Here, the handwriting samples were our training data with known "labels" (the names of the people who wrote them). The mysterious note represents new, unseen data. By comparing its features with those in our training set, we can make an informed prediction about who wrote it. Similarly, machine learning models can be trained to classify new information based on patterns learned from past examples!

The Machine Learning Process

Let’s explore some key terms in machine learning using the warm-up example.
  • Input — The data provided to the model
  • Features — Characteristics of the input that help the model learn
  • Labels — The correct output (a category or value) the model tries to predict
In the previous example, you acted as the model. You were given three labeled handwriting samples as input to help you learn and make predictions. When studying the samples, you identified features—distinct handwriting patterns—that helped you distinguish between the authors. By linking these features to each label (the writer’s identity), you learned how to make accurate predictions.
When you received a new input—the mysterious note—you examined its features and predicted the label (who wrote it).
This mirrors how machine learning models are trained and how they make predictions in the real world!
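The warm-up can be sketched as data a simple program could learn from. The feature names and values below are hypothetical, invented to mirror the handwriting clues discussed above; they are not part of any real dataset.

```python
# Training data: each input is a set of observed features,
# and each label is the known writer.
training_data = [
    ({"uppercase_R", "cursive_f", "curved_t_cross"}, "Beyonce"),
    ({"looped_l", "slanted_letters"}, "Taylor"),
    ({"rounded_o", "wide_spacing"}, "Olivia"),
]

def predict_writer(note_features):
    """Predict the label whose training features overlap most with the note."""
    best_label, best_overlap = None, -1
    for features, label in training_data:
        overlap = len(features & note_features)  # number of shared features
        if overlap > best_overlap:
            best_label, best_overlap = label, overlap
    return best_label

# New, unseen input: the mysterious note's features.
print(predict_writer({"uppercase_R", "cursive_f"}))  # Beyonce
```

Just as you did in the warm-up, the function compares the new input's features against each labeled example and predicts the closest match.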

Supervised Learning

Supervised learning is a way for computers to learn by example. We give the algorithm lots of examples where it can see both the input (X) and the correct output (Y). After enough practice, the algorithm learns to take just the input and predict the output on its own. Let’s take a look at some real-world examples:
  • Spam Filter: The input is an email (X), and the output is whether it’s spam or not (Y).
  • Speech Recognition: The input is an audio clip (X), and the output is the text transcript (Y).
  • Language Translation: The input is English (X), and the output is the translated text in a new language like Spanish, Chinese, or any other language (Y).
  • Online Ads: The input is data about an ad and a user (X), and the output is whether the user will click on it or not (Y).
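The spam filter example above can be sketched as a toy program. The keyword list here is invented for illustration; a real filter would learn these signals from thousands of labeled emails rather than having them hard-coded.

```python
# Toy spam filter: X is an email's text, Y is "spam" or "not spam".
SPAM_WORDS = {"winner", "free", "prize", "urgent"}

def classify_email(text):
    """Predict the output Y (spam or not) from the input X (email text)."""
    words = set(text.lower().split())
    return "spam" if words & SPAM_WORDS else "not spam"

print(classify_email("URGENT you are a WINNER claim your FREE prize"))  # spam
print(classify_email("Meeting moved to 3pm, see you there"))            # not spam
```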

What about Unsupervised Learning?

Supervised learning works with labeled data (where we already know the correct answers), but what if we don’t have labels? That’s where unsupervised learning comes in. Instead of being told what to look for, the algorithm identifies patterns and relationships on its own. Many generative AI models are initially trained with unsupervised learning and later with supervised learning to help them become experts in a specific area. For now, we’ll continue to focus on supervised machine learning.

Linear Regression

Let’s dive deeper into a specific example of supervised learning. Say you want to predict a student’s exam score based on the number of hours they studied. You’ve collected some data from past students, and when you plot the data, it looks like this:
[image: scatter plot of exam score vs. hours studied]
  • The horizontal axis (X-axis) represents the number of hours studied.
  • The vertical axis (Y-axis) represents the exam score (out of 100).
Now, imagine your friend studied for 5 hours and wants to know what score they might get. How can a machine learning algorithm help?
One approach is to fit a straight line to the data. Looking at this line, we might estimate that a student who studies for 5 hours is likely to score around 96 points.
[image: straight line fitted through the data points]
However, a straight line isn’t always the best fit. Instead of a straight line, we might decide that a curve better represents the data. If we use a curve to make our prediction, it suggests that a student who studies for 5 hours is likely to score closer to 92 points.
[image: curved line fitted through the data points]
This curved-line approach is an example of polynomial regression, which is still considered a form of linear regression despite producing a curved line. This might seem confusing at first: how can a curved line be "linear"? The key insight is that "linear" in this context refers to the relationship between the model's parameters (coefficients) and the output, not the shape of the resulting curve.
This is an example of supervised learning because we’re giving the algorithm a dataset where the "right answers" (the actual exam scores) are provided for each input (hours studied). The goal of the algorithm is to learn from this data so it can predict scores for new students who haven’t been seen before.
This type of supervised learning is called linear regression because we’re predicting a continuous number—in this case, an exam score, which could be 75, 82, 90, or any value in between. (More generally, predicting a continuous number is called regression; fitting a straight line is the simplest, "linear" version.)
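The line-fitting idea can be sketched in a few lines of Python. The (hours, score) pairs below are invented stand-ins for the plotted dataset, and the slope/intercept formulas are the standard closed-form least-squares ones.

```python
# Hypothetical (hours studied, exam score) pairs standing in for the plot.
data = [(1, 55), (2, 65), (3, 75), (4, 85), (6, 100)]

n = len(data)
mean_x = sum(x for x, _ in data) / n
mean_y = sum(y for _, y in data) / n

# Standard least-squares formulas for the best-fit line y = slope*x + intercept.
slope = sum((x - mean_x) * (y - mean_y) for x, y in data) / \
        sum((x - mean_x) ** 2 for x, _ in data)
intercept = mean_y - slope * mean_x

def predict_score(hours):
    """Predict an exam score from hours studied using the fitted line."""
    return slope * hours + intercept

print(round(predict_score(5), 1))  # predicted score for 5 hours of study
```

With these made-up numbers, the fitted line predicts a score in the low 90s for 5 hours of study, similar in spirit to the estimate above.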
There’s also another major type of supervised learning called classification, where instead of predicting a continuous number, the model predicts a category. For example, instead of predicting an exact score, a classification model might predict whether a student will pass or fail the exam.

Classification

Let’s say you have a dataset of students who studied for different amounts of time before an exam. Each student is labeled as either Pass (1) or Fail (0) based on their exam results. We can plot this data on a graph:
[image: pass/fail labels plotted against hours studied]
  • Red circles represent students who failed (score below 70).
  • Green crosses represent students who passed (score 70 or above).
Unlike linear regression, where we predict a continuous number (like an exact exam score), here we’re predicting a category—either "Pass" or "Fail." This is what makes it a classification problem because we are choosing from a small, finite set of possible outputs rather than any number in between.
Now, let’s say a new student walks in and tells you they studied for 2.5 hours before their test. The question is: Will our system predict that they will pass or fail?
[image: the same plot with a new student at 2.5 hours studied]
We can use a machine learning algorithm to find a decision boundary—a line (or curve) that separates "Pass" students from "Fail" students based on their study hours.
The decision boundary is set at 2 hours studied, meaning that students who studied less than 2 hours are more likely to fail, while students who studied 2 hours or more are more likely to pass. Based on this decision boundary, would we expect a student who studied for 2.5 hours to pass or fail? Pass!
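The decision boundary can be sketched as a one-line classifier. The 2-hour threshold comes from the example above; in practice, an algorithm such as logistic regression would learn the boundary from the data instead of having it hard-coded.

```python
DECISION_BOUNDARY = 2.0  # hours studied, taken from the example above

def predict_pass(hours_studied):
    """Return 'Pass' or 'Fail' based on which side of the boundary the input falls."""
    return "Pass" if hours_studied >= DECISION_BOUNDARY else "Fail"

print(predict_pass(2.5))  # Pass
print(predict_pass(1.0))  # Fail
```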

More Than Two Categories?

Classification isn’t limited to just two outcomes. For example, instead of just "Pass" or "Fail," we could have a system that predicts:
  • Fail (0)
  • Pass with a C (1)
  • Pass with a B (2)
  • Pass with an A (3)
In this case, the model has four possible outputs. In machine learning, the terms labels, classes, and categories are often used interchangeably for these outputs.
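A four-class version of the classifier can be sketched with multiple thresholds. The hour cutoffs below are hypothetical; a real model would learn them from labeled data.

```python
def predict_grade_class(hours_studied):
    """Map hours studied to one of four output classes (labels 0-3)."""
    if hours_studied < 2:
        return 0  # Fail
    elif hours_studied < 4:
        return 1  # Pass with a C
    elif hours_studied < 6:
        return 2  # Pass with a B
    else:
        return 3  # Pass with an A

print(predict_grade_class(5))  # 2 (Pass with a B)
```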

Using More Than One Input

So far, we’ve only used one input: hours studied. But classification models can use multiple inputs to improve accuracy!
For example, instead of just knowing the number of hours studied, what if we also considered:
  • The student’s GPA
  • The number of practice tests taken
Now, we have a dataset with three inputs instead of one. In real-world machine learning problems, models often use many more inputs to make better predictions. For example, an AI system predicting college admissions chances might look at SAT scores, GPA, extracurricular activities, and essay strength to make a decision.
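A classifier with several inputs can be sketched as a weighted sum followed by a threshold. The weights and threshold here are invented for illustration; finding good values for them automatically is exactly the learning algorithm's job.

```python
def predict_outcome(hours_studied, gpa, practice_tests):
    """Combine three input features into one score, then apply a threshold."""
    # Hypothetical weights: each feature contributes to the overall score.
    score = 1.0 * hours_studied + 2.0 * gpa + 0.5 * practice_tests
    return "Pass" if score >= 10.0 else "Fail"

print(predict_outcome(hours_studied=3, gpa=3.5, practice_tests=2))  # Pass
print(predict_outcome(hours_studied=1, gpa=2.0, practice_tests=0))  # Fail
```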

Classification or Linear Regression?

You’ve been hired as a machine learning expert! Analyze the following real-world problems and decide whether linear regression or classification is the best machine learning approach. Be prepared to explain your reasoning!
  1. A real estate company wants to create a machine learning model that predicts the price of a house based on features like square footage, number of bedrooms, and neighborhood location.
  2. A hospital is developing an AI system to help doctors diagnose whether a patient has diabetes based on medical test results.
  3. A logistics company wants to predict how long a package will take to be delivered based on distance, weather conditions, and traffic data.
📝 Practice | Supervised Learning
  1. Movie Reviews | Imagine you are building a machine learning model that predicts whether a movie review is positive or negative based on the words used in the review.
      • Part 1: Read these short movie reviews. Identify at least three features (words or phrases) that could help classify the review as positive or negative.
        • "This movie was absolutely amazing! The acting was fantastic, and I loved the storyline."
        • "I hated this film. The plot was dull, and the characters were annoying."
        • "It was okay. Some parts were exciting, but the ending was disappointing."
      • Part 2: Predict whether each review is positive or negative based on the features you identified.
      • Part 3: Answer the following questions.
        • Is this a linear regression or classification problem? Explain your answer.
        • What features did you use to classify the reviews?
        • How would a machine learning model learn from many reviews to improve its accuracy?
  2. Monthly Electricity Bills | A utility company wants to predict a household’s electricity bill based on the number of kilowatt-hours (kWh) used. You have been given past data on electricity usage and bills.
      | kWh Used | Electricity Bill ($) |
      | --- | --- |
      | 500 | 60 |
      | 700 | 85 |
      | 900 | 110 |
      | 1100 | 135 |
      | 1300 | 160 |
      • Part 1: Use the pattern in the data to estimate/predict the bill for a household that uses 1000 kWh.
      • Part 2: Answer the following questions.
        • Is this a classification or linear regression problem? Explain your answer.
        • What other factors (features) might improve the accuracy of this prediction?
🤖 AI Connection
Now that you can tell the difference between classification and linear regression, put an AI tool to the test! Ask: "Give me 3 real-world examples of supervised learning problems. For each one, tell me whether it's classification or linear regression and explain why." Do you agree with the AI's answers?

💼 Takeaways

In this lesson, you learned how machine learning models can be “trained” using examples to make predictions on new data. You explored key machine learning terms and practiced telling regression and classification problems apart!
  • Supervised learning maps input (X) to output (Y) using labeled data
  • The two main types of supervised learning are linear regression and classification
  • Linear regression predicts continuous values (like an exam score)
  • Classification predicts categories (like "Pass" or "Fail")
For a summary of this lesson, check out the 6. Supervised Learning One-Pager!