Linear and Multiple Linear Regression
Linear Regression
Linear regression is used to predict the value of an outcome variable y on the
basis of one or more input predictor variables x. In other words, linear
regression establishes a linear relationship between the predictor and response
variables. In linear regression, the predictor and response variables are
related through an equation in which the exponent of both variables is 1;
mathematically, such a relationship plots as a straight line on a graph.
The general mathematical equation for linear regression is:

y = ax + b
Here,
- y is a response variable.
- x is a predictor variable.
- a and b are constants called the coefficients.
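For example, if the coefficients were a = 0.67 and b = -38 (hypothetical values
used only for illustration), the equation could be evaluated in R as a quick
sketch:

# Hypothetical coefficients, for illustration only
a <- 0.67   # slope
b <- -38    # intercept
x <- 170    # a predictor value, e.g., height in cm
y <- a * x + b   # predicted response, e.g., weight in kg
print(y)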
Steps for Establishing the Regression
Predicting a person's weight when their height is known is a simple example of
regression. To predict the weight, we need a relationship between a person's
height and weight.
The steps to create this relationship are as follows:
- First, we carry out the experiment of gathering a sample of observed values of height and weight.
- After that, we create a relationship model using the lm() function of R.
- Next, we find the coefficients with the help of the model and create the mathematical equation using these coefficients.
- We get the summary of the relationship model to understand the errors in prediction, known as residuals.
- Finally, we use the predict() function to predict the weight of a new person.
The syntax of the lm() function is:

lm(formula, data)
Here,
- formula is a symbol that presents the relationship between x and y, such as y ~ x.
- data is the vector or data frame on which we will apply the formula.
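As a minimal sketch of how the two parameters fit together (the data frame df
and its columns are hypothetical), lm() can be called with an explicit data
argument:

# A small hypothetical data frame containing both variables
df <- data.frame(
  height = c(150, 160, 170, 180),
  weight = c(55, 62, 70, 78)
)
# The formula describes the relationship; data supplies the values
model <- lm(weight ~ height, data = df)
print(model)

In the examples below, we instead pass the x and y vectors directly from the
workspace, so the data argument is omitted.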
Creating the Relationship Model and Getting the Coefficients
Let's start performing the
second and third steps, i.e., creating a relationship model and getting the
coefficients. We will use the lm() function and pass the x and y input vectors
and store the result in a variable named relationship_model.
Example
# Creating the input vectors for the lm() function
x <- c(141, 134, 178, 156, 108, 116, 119, 143, 162, 130)
y <- c(62, 85, 56, 21, 47, 17, 76, 92, 62, 58)
# Applying the lm() function
relationship_model <- lm(y ~ x)
# Printing the model, which shows the fitted coefficients
print(relationship_model)
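If we only need the individual coefficients rather than the full printed model,
they can be extracted with the coef() function. A short sketch based on the
model above:

# Extracting the coefficients of y = ax + b
b <- coef(relationship_model)[1]   # intercept
a <- coef(relationship_model)[2]   # slope
print(a)
print(b)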
Getting the Summary of the Relationship Model
We will use the summary()
function to get a summary of the relationship model. Let's see an example to
understand the use of the summary() function.
Example
# Creating the input vectors for the lm() function
x <- c(141, 134, 178, 156, 108, 116, 119, 143, 162, 130)
y <- c(62, 85, 56, 21, 47, 17, 76, 92, 62, 58)
# Applying the lm() function
relationship_model <- lm(y ~ x)
# Printing the summary of the model
print(summary(relationship_model))
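The summary object is a list, so individual pieces of it can also be pulled out
directly. A short sketch based on the model above:

# Extracting parts of the summary
model_summary <- summary(relationship_model)
print(model_summary$coefficients)   # estimates, standard errors, t and p values
print(model_summary$r.squared)      # proportion of variance explained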
The predict() Function
Now, we will predict the weight of new persons with the help of the predict()
function. The syntax of the predict() function is:

predict(object, newdata)
Here,
- object is the model that we have already created using the lm() function.
- newdata is the data frame that contains the new values for the predictor variable.
Example
# Creating the input vectors for the lm() function
x <- c(141, 134, 178, 156, 108, 116, 119, 143, 162, 130)
y <- c(62, 85, 56, 21, 47, 17, 76, 92, 62, 58)
# Applying the lm() function
relationship_model <- lm(y ~ x)
# Finding the weight of a person with height 160
z <- data.frame(x = 160)
predict_result <- predict(relationship_model, z)
print(predict_result)
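The predict() function also accepts several new values at once and can return a
confidence interval for each prediction. A short sketch (the new heights below
are hypothetical):

# Predicting the weight for several new heights at once
new_heights <- data.frame(x = c(150, 160, 170))
print(predict(relationship_model, new_heights))
# Adding a confidence interval around each prediction
print(predict(relationship_model, new_heights, interval = "confidence"))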
Plotting Regression
Now, we plot our prediction results with the help of the plot() function. This
function takes the x and y input vectors as parameters, along with many more
optional arguments.
Example
# Creating the input vectors for the lm() function
x <- c(141, 134, 178, 156, 108, 116, 119, 143, 162, 130)
y <- c(62, 85, 56, 21, 47, 17, 76, 92, 62, 58)
relationship_model <- lm(y ~ x)
# Giving a name to the chart file.
png(file = "linear_regression.png")
# Plotting the observations with height on the x-axis and weight on the y-axis.
plot(x, y, col = "red", main = "Height and Weight Regression",
     cex = 1.3, pch = 16, xlab = "Height in cm", ylab = "Weight in Kg")
# Adding the fitted regression line.
abline(relationship_model)
# Saving the file.
dev.off()
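If the ggplot2 package is installed, the same chart can also be drawn with it.
This is only an alternative sketch, and the output file name is illustrative:

# Alternative plot using the ggplot2 package (assumed to be installed)
library(ggplot2)
df <- data.frame(height = x, weight = y)
p <- ggplot(df, aes(x = height, y = weight)) +
  geom_point(colour = "red") +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Height and Weight Regression",
       x = "Height in cm", y = "Weight in Kg")
ggsave("linear_regression_ggplot.png", plot = p)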
Multiple Linear Regression
Multiple linear regression is an extension of simple linear regression that is
used to predict the outcome variable (y) based on multiple distinct predictor
variables (x). With three predictor variables (x1, x2, x3), the prediction of y
is expressed by the following equation:

y = b0 + b1*x1 + b2*x2 + b3*x3
The "b" values
represent the regression weights. They measure the association between the
outcome and the predictor variables. "
Multiple linear regression extends linear regression to relationships involving
more than two variables. In simple linear regression, we have one predictor and
one response variable, but in multiple regression we have more than one
predictor variable and one response variable.
The general mathematical equation for multiple regression is:

y = b0 + b1*x1 + b2*x2 + b3*x3 + ... + bn*xn
Here,
- y is the response variable.
- b0, b1, b2, ..., bn are the coefficients.
- x1, x2, ..., xn are the predictor variables.
In
R, we create the regression model with the help of the lm() function.
The model will determine the value of the coefficients with the help of the
input data. We can predict the value of the response variable for the set of
predictor variables using these coefficients.
The syntax of the lm() function in multiple regression is:

lm(y ~ x1 + x2 + ... + xn, data)
Before proceeding further, we first create our data for multiple regression. We
will use the "mtcars" dataset present in the R environment. The main task of
the model is to create the relationship between "mpg" as the response variable
and "wt", "disp", and "hp" as the predictor variables.
data <- mtcars[, c("mpg", "wt", "disp", "hp")]
print(head(data))
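Before fitting the model, it can help to check how strongly the chosen
predictors are related to mpg. An optional sketch using base R functions:

# Pairwise correlations between mpg and the predictors
print(cor(data))
# Scatterplot matrix of the selected columns
pairs(data)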
Creating the Relationship Model and Finding the Coefficients
Now, we will use the data that we created above to build the relationship
model. We will use the lm() function, which takes two parameters, i.e., formula
and data. Let's see how the lm() function is used to create the relationship
model.
Example
# Creating the input data.
input <- mtcars[, c("mpg", "wt", "disp", "hp")]
# Creating the relationship model.
Model <- lm(mpg ~ wt + disp + hp, data = input)
# Showing the model.
print(Model)
From the above output, it is clear that our model has been set up successfully.
Now, our next step is to find the coefficients with the help of the model.
# Intercept (b0)
b0 <- coef(Model)[1]
print(b0)
# Regression weights for wt, disp, and hp (b1, b2, b3)
x_wt <- coef(Model)[2]
x_disp <- coef(Model)[3]
x_hp <- coef(Model)[4]
print(x_wt)
print(x_disp)
print(x_hp)
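Beyond the raw coefficient values, the summary of the model reports a standard
error and p-value for each predictor, which indicate how reliable each
regression weight is. A short sketch based on the model above:

# Coefficient table with estimates, standard errors, t and p values
print(summary(Model)$coefficients)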
The Equation for the Regression Model
Now, we have the coefficient values and the intercept. Let's create the
mathematical equation that we will apply for predicting new values: first we
build the equation, and then we use it to predict the mileage when a new set of
values for weight, displacement, and horsepower is provided.
Let's see an example in which we predict the mileage for a car with wt = 2.51,
disp = 211, and hp = 82.
Example
# Creating the input data.
input <- mtcars[, c("mpg", "wt", "disp", "hp")]
# Creating the relationship model.
Model <- lm(mpg ~ wt + disp + hp, data = input)
# Showing the model.
print(Model)
# Intercept (b0)
b0 <- coef(Model)[1]
print(b0)
# Regression weights for wt, disp, and hp (b1, b2, b3)
x_wt <- coef(Model)[2]
x_disp <- coef(Model)[3]
x_hp <- coef(Model)[4]
print(x_wt)
print(x_disp)
print(x_hp)
# Applying the equation to predict the mileage for the new values
y <- b0 + x_wt * 2.51 + x_disp * 211 + x_hp * 82
print(y)
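The same prediction can be obtained more directly with the predict() function
instead of writing out the equation by hand. A short sketch based on the model
fitted above (new_car is a hypothetical name):

# Predicting the mileage with predict() for the same new values
new_car <- data.frame(wt = 2.51, disp = 211, hp = 82)
print(predict(Model, new_car))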