Basics of Regression Analysis
Presented By
Mahak Vijay
08-02-2017
Outline
• What is Regression Analysis?
• Population Regression Line
• Why do we use Regression Analysis?
• What are the types of Regression?
• Simple Linear Regression Model
• Least Square Estimation for Parameters
• Least Squares for Linear Regression
• References
What is Regression Analysis?
 Regression analysis is a predictive modelling technique that investigates the relationship between a dependent (target) variable and one or more independent (predictor) variables.
 It is used for forecasting, time-series modelling, and estimating causal relationships between variables.
 For example, the relationship between rash driving and the number of road accidents caused by a driver is best studied through regression.
Population Regression Line
[Figure: scatter of actual data points around the fitted regression line in the x–y plane; the vertical gaps between actual and estimated values are the errors.]
Population Regression Line
Example: estimating grades from study time (independent variable: study time; dependent variable: estimated grades).
Population regression function: ŷ = b0 + b1x
where
ŷ = estimated grades
x = study time
b0 = intercept of the regression line
b1 = slope of the regression line
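As a minimal illustration only (the coefficient values below are assumed for the example, not taken from the slides), the estimated regression function can be evaluated in MATLAB to predict a grade for a given study time:

  b0 = 2.0;            % assumed intercept: baseline grade with no study time
  b1 = 0.5;            % assumed slope: grade points gained per hour of study
  studyTime = 6;       % hours of study for which a prediction is wanted
  estimatedGrade = b0 + b1*studyTime;    % evaluate yhat = b0 + b1*x
  fprintf('Estimated grade for %g hours of study: %g\n', studyTime, estimatedGrade);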
Why do we need Regression Analysis?
Typically, a regression analysis is used for these purposes:
(1) Prediction of the target variable (forecasting).
(2) Modelling the relationship between the dependent variable and the explanatory variable(s).
(3) Testing of hypotheses.
Benefits
1. It indicates the strength of the impact of multiple independent variables on a dependent variable.
2. It identifies significant relationships between the dependent variable and the independent variables.
These benefits help market researchers, data analysts, and data scientists evaluate and select the best set of variables for building predictive models.
Types of Regression Analysis
Regression analysis is generally classified into two kinds: simple and multiple.
Simple regression involves only two variables: one dependent variable and one explanatory (independent) variable. Multiple regression involves two or more explanatory variables.
A regression analysis may involve a linear model or a nonlinear model. The term linear can be interpreted in two different ways:
1. Linearity in the variables
2. Linearity in the parameters
[Diagram: Regression Analysis splits into Simple (1 explanatory variable) and Multiple (2+ explanatory variables); each may be Linear or Non-Linear.]
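To make the two senses of "linear" concrete, here are standard textbook forms (not taken from the slides):

  y = b0 + b1x + ɛ        linear in the variable x and in the parameters b0, b1
  y = b0 + b1x² + ɛ       nonlinear in x, but still linear in the parameters, so ordinary least squares applies
  y = b0·e^(b1x) + ɛ      nonlinear in the parameter b1, so it cannot be fitted by ordinary least squares without a transformation or nonlinear estimation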
Simple Linear Regression Model
A simple linear regression model is a model with a single regressor x that has a linear relationship with a response y:
y = b0 + b1x + ɛ
where y is the response variable, x is the regressor variable, b0 is the intercept, b1 is the slope, and ɛ is the random error component.
In this technique the dependent variable is a continuous random variable, the independent variable(s) can be continuous or discrete but are not random, and the regression line is linear in nature.
Some basic assumptions of the model
Simple linear regression model: yi = b0 + b1xi + ɛi for i = 1, 2, …, n
 ɛi is a random variable with zero mean and variance σ², i.e. E(ɛi) = 0 and V(ɛi) = σ².
 ɛi and ɛj are uncorrelated for i ≠ j, i.e. cov(ɛi, ɛj) = 0.
 ɛi is a normally distributed random variable with mean zero and variance σ², i.e. ɛi ~ind N(0, σ²).
yi = b0 + b1xi + ɛi for i = 1, 2, …, n
E(yi) = E(b0 + b1xi + ɛi) = b0 + b1xi    (since E(ɛi) = 0)
V(yi) = V(b0 + b1xi + ɛi) = V(ɛi) = σ²
=> ɛi ~ind N(0, σ²)
=> yi ~ind N(b0 + b1xi, σ²)
NOTE: the dataset should satisfy these basic assumptions.
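To make these assumptions concrete, here is a minimal simulation sketch; the values of b0, b1, σ and the x grid are assumptions chosen for illustration, not taken from the slides:

  % Simulate data satisfying the simple linear regression assumptions
  n  = 50;                      % number of observations
  x  = linspace(0, 10, n)';     % fixed (non-random) regressor values
  b0 = 1.5;  b1 = 0.8;          % assumed true intercept and slope
  sigma = 2;                    % assumed error standard deviation
  err = sigma*randn(n, 1);      % i.i.d. normal errors with mean 0 and variance sigma^2
  y  = b0 + b1*x + err;         % responses: yi = b0 + b1*xi + eps_i
  scatter(x, y); hold on
  plot(x, b0 + b1*x, 'r')       % the population regression line E(y|x)
  xlabel('x'); ylabel('y')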
Least Square Estimation for Parameters
The parameters b0 and b1 are unknown and must be estimated from sample data:
(x1, y1), (x2, y2), …, (xn, yn)
Model: y = b0 + b1x + ɛ, i.e. yi = b0 + b1xi + ɛi.
The line fitted by least squares is the one that makes the sum of squares of all vertical discrepancies as small as possible. We estimate the parameters so that the sum of squares of all vertical differences between the observations and the fitted line is minimized:
S(b0, b1) = Σᵢ (yi - b0 - b1xi)²,  i = 1, …, n
For a single observation (x1, y1) with fitted value ŷ1, the vertical discrepancy is (y1 - ŷ1) = ɛ1.
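As a small numerical illustration of the criterion (the data and candidate coefficients below are invented for the example), S can be evaluated directly; the least squares line is the one with the smallest S:

  x = [1 2 3 4 5]';                        % example data (assumed, not from the slides)
  y = [2.1 2.9 4.2 4.8 6.1]';
  S = @(b0, b1) sum((y - b0 - b1*x).^2);   % sum of squared vertical discrepancies
  S(0, 1)                                  % an arbitrary candidate line: S ≈ 5.31
  S(1.05, 0.99)                            % near the least squares fit:  S ≈ 0.11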
Minimizing S requires taking the first-order conditions with respect to b0 and b1 and setting them to zero (all sums run over i = 1, …, n):
I:  ∂S/∂b0 = -2 Σᵢ (yi - b0 - b1xi) = 0
II: ∂S/∂b1 = -2 Σᵢ (yi - b0 - b1xi)xi = 0
We can solve these for b0 and b1. From I:
Σᵢ yi - n·b0 - b1 Σᵢ xi = 0
b0 = ȳ - b1x̄
where ȳ = Σᵢ yi / n, x̄ = Σᵢ xi / n, and S(b0, b1) = Σᵢ (yi - b0 - b1xi)².
From II:
∂S/∂b1 = -2 Σᵢ (yi - b0 - b1xi)xi = 0
Σᵢ xi(yi - b0 - b1xi) = 0
Substituting b0 = ȳ - b1x̄:
Σᵢ xi(yi - ȳ + b1x̄ - b1xi) = 0
Σᵢ xi(yi - ȳ) = b1 Σᵢ (xi - x̄)xi
b1 = Σᵢ (yi - ȳ)xi / Σᵢ (xi - x̄)xi
b1 = Σᵢ (yi - ȳ)(xi - x̄) / Σᵢ (xi - x̄)²
b1 = Cov(x, y) / Var(x); in matrix form the coefficient vector is b = (XᵀX)⁻¹Xᵀy.
Proof that replacing xi by (xi - x̄) in the numerator changes nothing:
Σᵢ (yi - ȳ)x̄ = x̄ Σᵢ yi - x̄ Σᵢ ȳ = n x̄ ȳ - n x̄ ȳ = 0
(the same argument applies to the denominator).
In summary, the least squares estimates are
b0 = ȳ - b1x̄ ;  b1 = Σᵢ (yi - ȳ)(xi - x̄) / Σᵢ (xi - x̄)²
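A minimal MATLAB sketch of these closed-form estimates, using invented example data; polyfit is included only as a cross-check:

  x = [1 2 3 4 5]';                 % example data (assumed, not from the slides)
  y = [2.1 2.9 4.2 4.8 6.1]';
  xbar = mean(x);  ybar = mean(y);
  b1 = sum((y - ybar).*(x - xbar)) / sum((x - xbar).^2);   % slope: Cov(x,y)/Var(x)
  b0 = ybar - b1*xbar;                                     % intercept: ybar - b1*xbar
  p  = polyfit(x, y, 1);            % cross-check: p(1) = slope, p(2) = intercept
  fprintf('b0 = %.3f, b1 = %.3f (polyfit: %.3f, %.3f)\n', b0, b1, p(2), p(1));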
Example
b1 = Σᵢ (yi - ȳ)(xi - x̄) / Σᵢ (xi - x̄)² = 6/10 = 0.6
b0 = ȳ - b1x̄ = 2.2
Calculating R² Using Regression Analysis
 R-squared is a statistical measure of how close the data are to the fitted regression line (a measure of goodness of fit). It is also known as the coefficient of determination.
 First we calculate the distance between the actual values and the mean value, and the distance between the estimated values and the mean value.
 Then we compare the two distances.
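A minimal sketch of this comparison, using the worked data that appears in the table a few slides later (X = 1…5, Y = 1, 2, 1.3, 3.75, 2.25); the R² formula itself is standard:

  X = [1 2 3 4 5]';
  Y = [1 2 1.3 3.75 2.25]';
  p    = polyfit(X, Y, 1);               % fitted regression line
  Yhat = polyval(p, X);                  % estimated values
  SStot = sum((Y - mean(Y)).^2);         % spread of actual values about the mean
  SSreg = sum((Yhat - mean(Y)).^2);      % spread of estimated values about the mean
  R2 = SSreg/SStot                       % ≈ 0.39; equals 1 - SSE/SStot for a linear fit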
Example
[Figure-only slide.]
Performance of Model
[Figure-only slide.]
Standard Error of the Estimate (Mean Square Error)
The standard error of the estimate is a measure of the accuracy of predictions.
Note: the regression line is the line that minimizes the sum of squared deviations of prediction (also called the sum of squares error).
The standard error of the estimate is closely related to this quantity and is defined as
σest = sqrt( Σ(Y - Y′)² / N )
where Y = actual value, Y′ = estimated value, and N = number of observations.
Example
X      Y      Y'      Y-Y'     (Y-Y')²
1.00   1.00   1.210   -0.210   0.044
2.00   2.00   1.635    0.365   0.133
3.00   1.30   2.060   -0.760   0.578
4.00   3.75   2.485    1.265   1.600
5.00   2.25   2.910   -0.660   0.436
Sum   15.00  10.30   10.300    0.000   2.791
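From the table, the standard error of the estimate is sqrt(Σ(Y - Y′)²/N) = sqrt(2.791/5) ≈ 0.747. A short MATLAB check (the Y′ column matches an ordinary least squares straight-line fit to X and Y):

  X = [1 2 3 4 5]';
  Y = [1 2 1.3 3.75 2.25]';
  p  = polyfit(X, Y, 1);           % straight-line fit reproduces the Y' column
  Yp = polyval(p, X);              % 1.210, 1.635, 2.060, 2.485, 2.910
  SSE  = sum((Y - Yp).^2);         % 2.791
  sest = sqrt(SSE/numel(Y));       % sqrt(2.791/5) ≈ 0.747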
Difference
[Figure-only slide.]
Least Squares for Linear Regression
Solve: Ax = b
The columns of A define a vector space, range(A); Ax = x1a1 + x2a2 is an arbitrary vector in range(A).
If b is a vector in Rⁿ that also lies in the column space of A, the system has a solution.
If instead b is a vector in Rⁿ that does not lie in the column space of A, the system has no exact solution.
In that case we look for the x̂ that makes Ax̂ as close to b as possible; this is called the least squares solution of the problem, with residual b - Ax̂.
23
08-02-2017
24
b
2a
1a
xA ˆ
xAb ˆ
A 𝑥 is the orthogonal projection of b onto range(A)
  bAxAAxAbA TTT
 ˆˆ 0
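A minimal sketch of this projection view for the straight-line case, using invented data; MATLAB's backslash operator solves the same least squares problem without forming the inverse explicitly:

  x = [1 2 3 4 5]';                 % example regressor values (assumed)
  b = [2.1 2.9 4.2 4.8 6.1]';       % observed vector b
  A = [ones(size(x)) x];            % design matrix; its columns span range(A)
  xhat_normal = (A'*A)\(A'*b);      % solve A'A*xhat = A'*b -> [intercept; slope]
  xhat_ls     = A\b;                % least squares solve, same answer
  r = b - A*xhat_ls;                % residual b - A*xhat
  A'*r                              % ≈ 0: the residual is orthogonal to range(A)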
Matlab Implementation (Linear_Regression3.m)
[Code-only slides; the script is reproduced in the Editor's Notes below.]
References
[1] Sykes, Alan O. "An Introduction to Regression Analysis." 1993.
[2] Chatterjee, Samprit, and Ali S. Hadi. Regression Analysis by Example. John Wiley & Sons, 2015.
[3] Draper, Norman Richard, Harry Smith, and Elizabeth Pownell. Applied Regression Analysis. Vol. 3. New York: Wiley, 1966.
[4] Montgomery, Douglas C., Elizabeth A. Peck, and G. Geoffrey Vining. Introduction to Linear Regression Analysis. John Wiley & Sons, 2015.
[5] Seber, George A. F., and Alan J. Lee. Linear Regression Analysis. Vol. 936. John Wiley & Sons, 2012.
THANK YOU
Editor's Notes
  • #4: The dependent variable is variously known as the explained variable, predictand, response, or endogenous variable, while the independent variable is known as the explanatory variable, regressor, or exogenous variable.
  • #27: MATLAB script for the Linear_Regression3.m slides:
    load accidents
    x = hwydata(:,14); %Population of states
    y = hwydata(:,4);  %Accidents per state
    format long
    b1 = x\y;
    yCalc1 = b1*x;
    scatter(x,y)
    hold on
    plot(x,yCalc1)
    xlabel('Population of state')
    ylabel('Fatal traffic accidents per state')
    title('Linear Regression Relation Between Accidents & Population')
    grid on
    X = [ones(length(x),1) x];
    b = X\y;
    yCalc2 = X*b;
    plot(x,yCalc2,'--')
    legend('Data','Slope','Slope & Intercept','Location','best');
    Rsq1 = 1 - sum((y - yCalc1).^2)/sum((y - mean(y)).^2);
    Rsq2 = 1 - sum((y - yCalc2).^2)/sum((y - mean(y)).^2);
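    The backslash solves used here (b1 = x\y and b = X\y) return least squares solutions of the overdetermined systems, i.e. the same estimates given by the normal equations x̂ = (AᵀA)⁻¹Aᵀb derived earlier; Rsq1 and Rsq2 are the coefficients of determination (1 - SSE/SStot) for the slope-only fit and the intercept-plus-slope fit, respectively.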