Machine learning: linear regression

Dhanushka Sandakelum
4 min read · Aug 2, 2023

Today we are going to look at linear regression in machine learning. Linear regression is a supervised learning technique for regression tasks: it models the relationship between one or more input variables and a continuous output. Once the model has been trained, it can predict the output for new inputs.
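To make the idea concrete, here is a minimal sketch of what "fitting" a simple linear regression means: finding the intercept and slope of the line y = b0 + b1 * x that minimises the squared error. The toy data and the names b0, b1 and x_new are made up for illustration and are not part of the dataset used later in this post.

```python
import numpy as np

# Hypothetical toy data (not the article's dataset): y = 2x + 1 exactly.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])

# Closed-form least-squares estimates for a single input variable:
#   slope     b1 = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
#   intercept b0 = mean(y) - b1 * mean(x)
x_mean, y_mean = x.mean(), y.mean()
b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
b0 = y_mean - b1 * x_mean

# Predict the output for a new input using the fitted line.
x_new = 6.0
y_new = b0 + b1 * x_new
print(b0, b1, y_new)
```

Because the toy points lie exactly on a line, the recovered slope and intercept match the values used to generate them. With real, noisy data the line is only the best fit in the least-squares sense.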

Let’s look at how to implement a linear regression model using Python. Here we are going to use the Spyder IDE that ships with Anaconda. If you haven’t installed Anaconda or Python, please download them using the following links:

Anaconda3-4.1.1: https://repo.anaconda.com/archive/

Python 3.5: https://www.python.org/downloads/release/python-350/

Once you have set up the development environment, we are good to go.

OK, let’s start coding.

We are going to use a sample dataset. You can download it from the following repository:

Download Dataset: https://github.com/DhanushkaSandakelum/machine-learning-python/tree/main/2%20Linear%20regression

There are two columns in the dataset: Years of experience and Salary. We want to train a machine learning model that can predict the salary for a given number of years of experience.

In this problem, the independent variable (X) is Years of experience, and the dependent variable (y) is Salary.

First, we have to import several libraries.

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

Then we have to import the dataset (Salary_Data.csv). First, we read the data file using the read_csv() function provided by the pandas library. Then we separate the independent variables from the dependent variable. Every column except the last one forms X, the independent variables; in this dataset that is only the first column (index 0), Years of experience. The last column (index 1), Salary, is y, the dependent variable.

# Importing the dataset
dataset = pd.read_csv('Salary_Data.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values
Figure: Dataset
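If you want to check what the two iloc slices produce without downloading the file, here is a small sketch on an in-memory DataFrame. The rows and the column names YearsExperience and Salary are assumptions made for illustration; they stand in for the real Salary_Data.csv.

```python
import pandas as pd

# Hypothetical rows standing in for Salary_Data.csv; the values are made up.
dataset = pd.DataFrame({
    "YearsExperience": [1.1, 2.0, 3.2, 4.5],
    "Salary": [39000.0, 43500.0, 54400.0, 61100.0],
})

# Every column except the last becomes the feature matrix (2-D).
X = dataset.iloc[:, :-1].values
# The last column becomes the target vector (1-D).
y = dataset.iloc[:, -1].values

print(X.shape)  # (4, 1)
print(y.shape)  # (4,)
```

Note that X stays two-dimensional even with a single feature. scikit-learn estimators expect the features in this (n_samples, n_features) shape, which is why the slice is written as `:-1` rather than `0`.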

Next, I split the dataset into a training set and a test set. The training set is used to train the ML model. The test set is used to evaluate how well that model predicts data it has not seen during training.

# Splitting the dataset into the Training set and Test set
# (scikit-learn 0.18+; older releases exposed this function in sklearn.cross_validation)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 1/3, random_state = 0)
Figure: Training set and Test set
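To see what the split does, here is a minimal sketch on synthetic stand-in data (30 made-up samples, not the real dataset). It shows the resulting shapes and that fixing random_state makes the split reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 30 samples with one feature (values are made up).
X = np.arange(30, dtype=float).reshape(-1, 1)
y = 1000.0 * X.ravel()

# Hold out one third of the rows; random_state fixes the shuffle.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1/3, random_state=0
)
print(X_train.shape, X_test.shape)  # (20, 1) (10, 1)

# The same random_state gives the same split on a second call.
X_train2, X_test2, _, _ = train_test_split(X, y, test_size=1/3, random_state=0)
print(np.array_equal(X_test, X_test2))  # True
```

Without random_state, each run would shuffle the rows differently, so the trained model and its predictions would vary slightly from run to run.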

Then, using X_train and y_train, we can fit a linear regression model.

# Fitting Simple Linear Regression to the Training set
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)
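After fitting, the learned line can be inspected through the coef_ and intercept_ attributes of LinearRegression. The sketch below uses synthetic stand-in data that lies exactly on a line (the numbers 25000 and 9000 are made up), so the recovered values are easy to check.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data lying exactly on salary = 25000 + 9000 * years.
X_train = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y_train = 25000.0 + 9000.0 * X_train.ravel()

regressor = LinearRegression()
regressor.fit(X_train, y_train)

# coef_ holds one slope per feature; intercept_ is the constant term.
slope = regressor.coef_[0]
intercept = regressor.intercept_
print(slope, intercept)

# A prediction is just intercept + slope * input.
years = 6.5
manual = intercept + slope * years
from_model = regressor.predict([[years]])[0]
print(np.isclose(manual, from_model))
```

On the salary data, the slope can be read as the estimated salary increase per additional year of experience, and the intercept as the estimated salary at zero years.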

Finally, the trained model can predict results for X_test. This produces a new set of outputs called y_pred. Comparing y_pred with y_test, the actual values, shows how accurate the model is on data it has not seen.

# Predicting the Test set result
y_pred = regressor.predict(X_test)
Figure: Predictions vs Test values
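Comparing y_pred and y_test by eye works for a handful of rows, but a metric makes the comparison precise. The following sketch is one possible way to do it, using r2_score and mean_absolute_error from sklearn.metrics. The data is synthetic (a made-up linear trend plus noise), and the name comparison is only illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: a linear trend plus noise (values are made up).
rng = np.random.default_rng(0)
X = np.linspace(1.0, 10.0, 30).reshape(-1, 1)
y = 25000.0 + 9000.0 * X.ravel() + rng.normal(0.0, 3000.0, size=30)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1/3, random_state=0
)
regressor = LinearRegression().fit(X_train, y_train)
y_pred = regressor.predict(X_test)

# Side-by-side view of actual and predicted values.
comparison = pd.DataFrame({"actual": y_test, "predicted": y_pred})
print(comparison.head())

# R^2 close to 1 and a small mean absolute error indicate a good fit.
r2 = r2_score(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
print("R^2:", r2)
print("MAE:", mae)
```

The mean absolute error is in the same unit as the target, so on the salary data it would read as the average amount by which a predicted salary misses the actual one.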

To understand the model better, we can plot the results. First, let’s look at the training set, which shows how well the fitted line matches the data it was trained on. The red dots are the training set data, and the blue line is the regression line, drawn from the model’s predictions.

# Visualizing the Training set results
plt.scatter(X_train, y_train, color = 'red')
plt.plot(X_train, regressor.predict(X_train), color = 'blue')
plt.title('Salary vs Experience (Training Set)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show()
Figure: Training set results

Likewise, we can plot the test set results. Here the red dots show the test set data, and the blue line is the same regression line as before, since the model was fitted on the training set only.

# Visualizing the Test set results
plt.scatter(X_test, y_test, color = 'red')
# The line is still drawn from X_train: it is the same fitted regression line
plt.plot(X_train, regressor.predict(X_train), color = 'blue')
plt.title('Salary vs Experience (Test Set)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show()
Figure: Test set results
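As an optional variation, both plots can be drawn side by side so that the training and test points are easy to compare against the same line. This is only a sketch: the data is synthetic (made-up values), the Agg backend is chosen so the script also runs without a display, and the output file name salary_vs_experience.png is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside Spyder
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: a linear trend plus noise (values are made up).
rng = np.random.default_rng(0)
X = np.linspace(1.0, 10.0, 30).reshape(-1, 1)
y = 25000.0 + 9000.0 * X.ravel() + rng.normal(0.0, 3000.0, size=30)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1/3, random_state=0
)
regressor = LinearRegression().fit(X_train, y_train)

# One row, two panels sharing the salary axis.
fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
panels = [
    (axes[0], X_train, y_train, "Training Set"),
    (axes[1], X_test, y_test, "Test Set"),
]
for ax, X_part, y_part, name in panels:
    ax.scatter(X_part, y_part, color="red")
    # The same fitted line is drawn in both panels.
    ax.plot(X_train, regressor.predict(X_train), color="blue")
    ax.set_title("Salary vs Experience ({})".format(name))
    ax.set_xlabel("Years of Experience")
axes[0].set_ylabel("Salary")

# Hypothetical output file name; use plt.show() instead in an interactive session.
fig.savefig("salary_vs_experience.png")
```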

Alright, so that’s how we can implement a machine learning model to perform linear regression.

You can find the code in the following Git repository:

If you have any questions or suggestions, leave a comment. And if you found this Medium post useful, give it a clap as well.

Thank you,

Dhanushka

Dhanushka sandakelum

Hi! I'm Dhanushka. I'm currently an undergraduate at the University of Colombo School of Computing (UCSC), following a B.Sc. in Computer Science.