Quantitative
    Data Analysis

Probability and basic statistics
probability
The most familiar way of thinking about probability is within a
framework of repeatable random experiments. In this view the
probability of an event is defined as the limiting proportion of times
the event would occur given many repetitions.
Probability
Instead of relying exclusively on knowledge of the proportion of times
an event occurs in repeated sampling, a second approach allows the
incorporation of subjective knowledge, so-called prior probabilities,
that are then updated in the light of data. The common name for this
approach is Bayesian statistics.
The Fundamental Rules of Probability
Rule 1: Probability is never negative: P(A) ≥ 0
Rule 2: For a given sample space, the sum of probabilities is 1
Rule 3: For disjoint (mutually exclusive) events, P(A ∪ B) = P(A) + P(B)
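Counting
Permutations (order is important): the number of ways to arrange k
items chosen from n is n!/(n−k)!
Combinations (order is not important): the number of ways to choose k
items from n is n!/(k!(n−k)!)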
Probability functions
The factorial function
   factorial(n)
   gamma(n+1)


Combinations (the number of ways to choose k items from n) can be
calculated with
   choose(n,k)
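The counting functions above can be tried on a small case. This is only a sketch; the values n = 5 and k = 2 are chosen here for illustration and are not from the slides.

```r
n <- 5   # illustrative number of items
k <- 2   # illustrative number of items chosen

factorial(n)       # 5! = 120
gamma(n + 1)       # the gamma function gives the same value, 120

# Permutations: ordered selections of k items from n
perms <- factorial(n) / factorial(n - k)   # 20

# Combinations: unordered selections of k items from n
combs <- choose(n, k)                      # 10
```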
Simple statistics
mean(x) arithmetic average of the values in x
median(x) median value in x
var(x) sample variance of x
cor(x,y) correlation between vectors x and y
quantile(x) vector containing the minimum, lower quartile, median,
upper quartile, and maximum of x
rowMeans(x) row means of dataframe or matrix x
colMeans(x) column means
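A short sketch of these functions on made-up data (the vectors x and y and the matrix m are invented for illustration):

```r
x <- c(2, 4, 4, 5, 7, 9, 11)   # illustrative data
y <- c(1, 3, 2, 5, 4, 7, 6)    # illustrative data

mean(x)       # 6
median(x)     # 5
var(x)        # 10 (sample variance, divisor n - 1)
cor(x, y)     # positive, since y tends to rise with x
quantile(x)   # 2, 4, 5, 8, 11

m <- matrix(1:6, nrow = 2)     # 2 x 3 matrix, filled column by column
rowMeans(m)   # 3, 4
colMeans(m)   # 1.5, 3.5, 5.5
```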
cumulative probability function
The cumulative probability function is, for any value of x, the
 probability of obtaining a sample value that is less than or equal to
 x.




                 curve(pnorm(x),-3,3)
probability density function
The probability density is the slope of this curve (its
‘derivative’).




                          curve(dnorm(x),-3,3)
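The statement that the density is the slope of the cumulative curve can be checked numerically. The sketch below approximates the slope of pnorm with a central difference and compares it with dnorm; the grid and the step size h are arbitrary choices.

```r
x <- seq(-3, 3, by = 0.5)   # points at which to compare
h <- 1e-5                   # small step for the numerical derivative

# Central-difference approximation to the slope of the cumulative curve
slope <- (pnorm(x + h) - pnorm(x - h)) / (2 * h)

# Largest disagreement between the numerical slope and the density
max_diff <- max(abs(slope - dnorm(x)))
max_diff
```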
Continuous Probability Distributions
R has a wide range of built-in probability distributions, for each of
which four functions are available: the probability density function
(which has a d prefix); the cumulative probability (p); the quantiles of
the distribution (q); and random numbers generated from the
distribution (r).
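For the normal distribution the four functions are dnorm, pnorm, qnorm and rnorm. A minimal sketch (the seed and the evaluation points are arbitrary):

```r
dnorm(0)         # density at 0, equal to 1/sqrt(2*pi)
pnorm(1.96)      # cumulative probability up to 1.96, about 0.975
qnorm(0.975)     # quantile for probability 0.975, about 1.96

# q and p are inverses of one another
back <- qnorm(pnorm(1.3))

set.seed(1)      # arbitrary seed so the random draw is reproducible
r <- rnorm(5)    # five random numbers from the standard normal
```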
Normal distribution
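# Four curves of the form exp(-|x|^k); k = 2 gives the bell shape of the normal
par(mfrow=c(2,2))          # 2 x 2 grid of plots
x<-seq(-3,3,0.01)
y<-exp(-abs(x))            # k = 1: sharp peak at 0
plot(x,y,type="l")
y<-exp(-abs(x)^2)          # k = 2: bell-shaped curve
plot(x,y,type="l")
y<-exp(-abs(x)^3)          # k = 3: flatter top, steeper sides
plot(x,y,type="l")
y<-exp(-abs(x)^8)          # k = 8: almost rectangular
plot(x,y,type="l")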
Normal distribution




                      norm.R
Exercise
Suppose we have measured the heights of 100 people. The mean
height was 170 cm and the standard deviation was 8 cm. We can ask
three sorts of questions about data like these: what is the probability
that a randomly selected individual will be:
shorter than a particular height?
 taller than a particular height?
 between one specified height and another?
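All three questions can be answered with pnorm, assuming heights are normally distributed with the stated mean and standard deviation. The particular heights used below (160, 185, and the range 165 to 180 cm) are chosen here for illustration.

```r
m <- 170   # mean height in cm (from the exercise)
s <- 8     # standard deviation in cm (from the exercise)

# Shorter than 160 cm: area to the left of 160
p_shorter <- pnorm(160, mean = m, sd = s)        # about 0.106

# Taller than 185 cm: area to the right of 185
p_taller <- 1 - pnorm(185, mean = m, sd = s)     # about 0.030

# Between 165 and 180 cm: difference of two cumulative probabilities
p_between <- pnorm(180, m, s) - pnorm(165, m, s) # about 0.628
```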
Exercise




           normal.R
The central limit theorem
If you take repeated samples from a population with finite variance
and calculate their averages, then the averages will be approximately
normally distributed, and the approximation improves as the sample
size grows.
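A small simulation illustrates the idea. This sketch draws from a strongly skewed exponential population; the sample size of 30, the 1000 repetitions and the seed are arbitrary choices.

```r
set.seed(1)   # arbitrary seed for reproducibility

# 1000 samples of size 30 from an exponential population (mean 1, sd 1)
sample_means <- replicate(1000, mean(rexp(30, rate = 1)))

# The averages centre on the population mean ...
mean(sample_means)
# ... with spread close to sd / sqrt(n) = 1 / sqrt(30)
sd(sample_means)
```

Calling hist(sample_means) afterwards should show a roughly bell-shaped histogram even though the population itself is skewed.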
Checking normality




                     fishes.R
Checking normality
The gamma distribution
The gamma distribution is useful for describing a wide range of
processes where the data are positively skewed (i.e. non-normal, with a
long tail on the right).
The gamma distribution
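# dgamma(x, shape, rate): each panel uses shape = rate, so the mean is always 1
x<-seq(0.01,4,.01)
par(mfrow=c(2,2))
y<-dgamma(x,.5,.5)     # shape 0.5: density falls steeply away from 0
plot(x,y,type="l")
y<-dgamma(x,.8,.8)     # shape 0.8: still monotonically decreasing
plot(x,y,type="l")
y<-dgamma(x,2,2)       # shape 2: humped, with a long right tail
plot(x,y,type="l")
y<-dgamma(x,10,10)     # shape 10: close to symmetric
plot(x,y,type="l")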




                     gammas.R
The gamma distribution
α is the shape parameter and β is the scale parameter (β⁻¹ is the
rate). Special cases of the gamma distribution are the exponential
(α = 1) and chi-squared (α = ν/2, β = 2).
The mean of the distribution is αβ, the variance is αβ², the
skewness is 2/√α and the kurtosis is 6/α.
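The mean, the variance and the two special cases can be checked numerically. This is a sketch with arbitrarily chosen parameter values; note that dgamma takes the rate as its third positional argument, so the scale is passed by name here.

```r
shape <- 2     # illustrative shape parameter
scale <- 0.5   # illustrative scale parameter

# Mean and variance by numerical integration of the density
m <- integrate(function(x) x * dgamma(x, shape = shape, scale = scale),
               0, Inf)$value
v <- integrate(function(x) (x - m)^2 * dgamma(x, shape = shape, scale = scale),
               0, Inf)$value
c(mean = m, variance = v)   # compare with shape*scale and shape*scale^2

# Special cases, compared at a few points
x <- c(0.5, 1, 2, 4)
exp_ok   <- all.equal(dgamma(x, shape = 1, scale = 2), dexp(x, rate = 1 / 2))
chisq_ok <- all.equal(dgamma(x, shape = 3 / 2, scale = 2), dchisq(x, df = 3))
```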
The gamma distribution




                    gammas.R
Exercise
Exercise




           fishes2.R
The exponential distribution
Quantitative
Data Analysis

 Hypothesis testing
Why Test?
Statistics is an experimental science, not really a branch of
mathematics.
It’s a tool that helps you judge whether a similarity or difference in
your data is real or could have arisen by chance.
It does not give you certainty.
Steps in hypothesis testing
1.    Set the null hypothesis and the alternative hypothesis.
2.    Calculate the p-value.
3.    Decision rule: if the p-value is less than 5%, reject the null
      hypothesis; otherwise do not reject it. In any
      case, you must give the p-value as a justification for your
      decision.
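The three steps can be sketched in R with a two-sample t test. The two small samples a and b are invented for illustration, and H0 is taken to be that both groups have the same mean.

```r
# Step 1: H0: the two group means are equal; H1: they differ
a <- c(5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.5)   # illustrative data
b <- c(6.2, 6.8, 6.5, 7.1, 6.4, 6.9, 6.6)   # illustrative data

# Step 2: calculate the p-value
result <- t.test(a, b)
p <- result$p.value

# Step 3: apply the 5% decision rule and report the p-value with the decision
decision <- if (p < 0.05) "reject H0" else "do not reject H0"
cat(sprintf("p = %.4g: %s\n", p, decision))
```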
Types of Errors…
A Type I error occurs when we reject a true null hypothesis (i.e.
reject H0 when it is TRUE).
A Type II error occurs when we don’t reject a false null hypothesis
(i.e. do NOT reject H0 when it is FALSE).

                     H0 true          H0 false
Reject H0            Type I error
Do not reject H0                      Type II error
Critical regions and power
The table shows schematically the relation between the relevant
probabilities under the null and alternative hypotheses.

                           do not reject        reject
Null hypothesis is true    1-α                  α (Type I error)
Null hypothesis is false   β (Type II error)    1-β (power)
Significance
It is common in hypothesis testing to fix the probability of a Type I
error, α, at a value called the significance level. This level is
usually set to 0.1, 0.05 or 0.01. If, assuming the null hypothesis is
true, the probability of observing a test statistic at least as extreme
as the current one is lower than the significance level, then the null
hypothesis is rejected.
Sometimes, instead of setting a pre-defined significance level, the
p-value is reported. It is also called the observed significance level.
Significance Level
When we reject the null hypothesis there is a risk of drawing a wrong
conclusion
The risk of drawing a wrong conclusion (called the p-value or observed
significance level) can be calculated
The researcher decides the maximum risk (called the significance level)
they are ready to take
The usual significance level is 5%
P-value
We start from the basic assumption: the null hypothesis is true
The p-value is the probability of getting a value equal to or more
extreme than the sample result, given that the null hypothesis is true
Decision rule: if the p-value is less than 5% then reject the null
hypothesis; if the p-value is 5% or more then do not reject the null
hypothesis
In any case, you must give the p-value as a justification for your
decision.
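As a sketch of the definition, a two-sided p-value can be computed by hand with pnorm. The numbers below are assumptions for illustration: H0 says the mean height is 170 cm, the standard deviation is taken as known (8 cm), and a sample of 100 people has a mean of 172 cm.

```r
mu0  <- 170   # mean under the null hypothesis (assumed)
s    <- 8     # standard deviation, treated as known (assumed)
n    <- 100   # sample size (assumed)
xbar <- 172   # observed sample mean (assumed)

# Number of standard errors between the sample mean and the null value
z <- (xbar - mu0) / (s / sqrt(n))   # 2.5

# Probability of a result at least this extreme in either direction under H0
p <- 2 * pnorm(-abs(z))             # about 0.012
```

Under the 5% rule this p-value would lead to rejecting H0.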
Interpreting the p-value…
p-value below .01:          Overwhelming Evidence (Highly Significant)
p-value from .01 to .05:    Strong Evidence (Significant)
p-value from .05 to .10:    Weak Evidence (Not Significant)
p-value above .10:          No Evidence (Not Significant)
Power analysis
The power of a test is the probability of rejecting the null hypothesis
when it is false.
It has to do with Type II errors: β is the probability of accepting the
null hypothesis when it is false. In an ideal world, we would obviously
make β as small as possible.
For a given sample size, the smaller we make the probability of
committing a Type II error, the greater we make the probability of
committing a Type I error, that is, rejecting the null hypothesis when,
in fact, it is correct.
Most statisticians work with α=0.05 and β=0.2. The power of the
test is then 1−β = 0.8
Confidence
A confidence interval with a particular confidence level is
intended to give the assurance that, if the statistical model is correct,
then taken over all the data sets that might have been obtained, the
procedure for constructing the interval would deliver an interval
that includes the true value of the parameter in the proportion
of cases set by the confidence level.
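This long-run reading can be illustrated by simulation. The sketch below repeatedly draws samples from a normal population with a known mean and counts how often the 95% interval from t.test contains it; the population values, sample size, number of repetitions and seed are all arbitrary.

```r
set.seed(1)       # arbitrary seed for reproducibility
true_mean <- 10   # known population mean (illustrative)

# For each simulated data set, does the 95% interval include the true mean?
covered <- replicate(2000, {
  sample_data <- rnorm(20, mean = true_mean, sd = 2)
  ci <- t.test(sample_data, conf.level = 0.95)$conf.int
  ci[1] <= true_mean && true_mean <= ci[2]
})

# Proportion of intervals that include the true mean; should be near 0.95
coverage <- mean(covered)
coverage
```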
Don't Complicate Things

Use the classical tests:
var.test to compare two variances (Fisher's F)
t.test to compare two means (Student's t)
wilcox.test to compare two means with non-normal errors (Wilcoxon's rank test)
prop.test (binomial test) to compare two proportions
cor.test (Pearson's or Spearman's rank correlation) to correlate two variables
chisq.test (chi-square test) or fisher.test (Fisher's exact test) to test for independence in contingency tables
Comparing Two Variances
Before comparing means, verify that the variances are not
significantly different.
    var.test(set1, set2)
This performs Fisher's F test.
If the variances are significantly different, you can transform the
response (y) variable to equalise variances, or you can still use
t.test (Welch's modified test).
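A minimal sketch of the variance comparison, using two invented samples (set1 tightly clustered, set2 widely spread):

```r
set1 <- c(10.1, 9.8, 10.3, 10.0, 9.9, 10.2)   # illustrative data
set2 <- c(8.0, 12.5, 9.1, 11.7, 10.4, 7.9)    # illustrative data

res <- var.test(set1, set2)

# The F statistic is simply the ratio of the two sample variances
res$statistic
var(set1) / var(set2)

# A small p-value indicates significantly different variances
res$p.value
```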
Comparing Two Means
Student's t-test (t.test) assumes the samples are independent, the
variances constant, and the errors normally distributed. By default
(var.equal = FALSE) it uses the Welch-Satterthwaite approximation,
which allows for different variances at the cost of slightly less
power; set var.equal = TRUE for the classical pooled-variance test.
This test can also be used for paired data.
Wilcoxon rank sum test (wilcox.test) is used for independent samples
when the errors are not normally distributed. If you do a transform to
get constant variance, you will probably have to use this test.
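The alternatives can be compared side by side on the same data. The samples a and b are invented for illustration.

```r
a <- c(5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.5)   # illustrative data
b <- c(6.2, 6.8, 6.5, 7.1, 6.4, 6.9, 6.6)   # illustrative data

welch  <- t.test(a, b)                    # default: Welch, unequal variances
pooled <- t.test(a, b, var.equal = TRUE)  # classical Student's t
rank   <- wilcox.test(a, b)               # Wilcoxon rank sum test

# Welch's test never has more degrees of freedom than the pooled test
c(welch = unname(welch$parameter), pooled = unname(pooled$parameter))

# p-values from the three tests
c(welch = welch$p.value, pooled = pooled$p.value, wilcoxon = rank$p.value)
```

In these invented data every value in b exceeds every value in a, so the exact Wilcoxon p-value is 2/choose(14, 7).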
Student’s t
The test statistic is the number of standard errors by which the two
sample means are separated:
t = (difference between the two means) / (standard error of the difference)
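The statistic can be computed by hand and compared with the value reported by t.test. This sketch uses the unequal-variance (Welch) standard error, which is what t.test uses by default; the data are invented for illustration.

```r
a <- c(5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.5)   # illustrative data
b <- c(6.2, 6.8, 6.5, 7.1, 6.4, 6.9, 6.6)   # illustrative data

# Standard error of the difference between the two means
se_diff <- sqrt(var(a) / length(a) + var(b) / length(b))

# Number of standard errors separating the two sample means
t_by_hand <- (mean(a) - mean(b)) / se_diff

# Value reported by R's default (Welch) t test
t_from_r <- unname(t.test(a, b)$statistic)
c(by_hand = t_by_hand, from_t.test = t_from_r)
```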
Power analysis
So how many replicates do we need in each of two samples to detect
a difference of 10% with power = 80% when the mean is 20 (i.e. delta
= 2) and the standard deviation is about 3.5?
    power.t.test(delta=2,sd=3.5,power=0.8)
You can work out what size of difference your sample of 30 would
allow you to detect, by specifying n and omitting delta:
    power.t.test(n=30,sd=3.5,power=0.8)
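The objects returned by power.t.test hold the solved-for quantity, so the answers can be extracted directly. A short sketch using the same inputs as above (the variable names are illustrative):

```r
# Solve for the sample size per group
need_n <- power.t.test(delta = 2, sd = 3.5, power = 0.8)
n_per_group <- ceiling(need_n$n)   # round up to whole replicates
n_per_group

# Solve for the smallest detectable difference with 30 per group
detectable <- power.t.test(n = 30, sd = 3.5, power = 0.8)
detectable$delta
```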
Paired Observations
When observations come in pairs, the measurements are not independent.
Use t.test with paired=TRUE. You are then doing a single-sample test
of the differences against 0.
When the data are genuinely paired, you should always do the paired
test. It’s more powerful.
Pairing deals with blocking, spatial correlation, and temporal correlation.
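The equivalence with a one-sample test on the differences can be seen directly. The before and after measurements below are invented for illustration.

```r
before <- c(12.1, 11.4, 13.0, 12.6, 11.9, 12.8)   # illustrative data
after  <- c(11.6, 11.1, 12.2, 12.3, 11.2, 12.1)   # same units, measured again

paired   <- t.test(before, after, paired = TRUE)
one_samp <- t.test(before - after, mu = 0)   # one-sample test of the differences
unpaired <- t.test(before, after)            # ignores the pairing

# The first two p-values are identical; the unpaired one is larger here
c(paired = paired$p.value, one_sample = one_samp$p.value,
  unpaired = unpaired$p.value)
```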
Sign Test
Used when you can't measure a difference but can see it.
Use the binomial test (binom.test) for this.
Binomial tests can also be used to compare proportions (prop.test).
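A sketch of a sign test, assuming a hypothetical study in which 9 out of 10 pairs showed a visible improvement. Under the null hypothesis an improvement is as likely as a deterioration, so the success probability is 0.5.

```r
improved <- 9    # pairs where an improvement was seen (hypothetical)
pairs    <- 10   # total number of pairs (hypothetical)

# Two-sided exact binomial test against p = 0.5
res <- binom.test(improved, pairs, p = 0.5)
res$p.value   # 22/1024, about 0.021
```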
Chi-squared contingency tables
The contingencies are all the events that could possibly happen. A
contingency table shows the counts of how many times each of the
contingencies actually happened in a particular sample.
Chi-square Contingency Tables
Deals with count data.
Suppose there are two characteristics (hair colour and eye colour).
The null hypothesis is that they are independent.
Create a matrix that contains the counts and apply
chisq.test(matrix).
This will give you the p-value for the observed counts under the
assumption of independence.
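A minimal sketch for the hair and eye colour example. The counts and category labels are invented for illustration.

```r
# Rows: hair colour; columns: eye colour (hypothetical counts)
counts <- matrix(c(38, 14, 11, 51), nrow = 2,
                 dimnames = list(hair = c("fair", "dark"),
                                 eyes = c("blue", "brown")))

res <- chisq.test(counts)

res$expected   # counts expected if hair and eye colour were independent
res$p.value    # a small value is evidence against independence
```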
Fisher's Exact Test
Used for analysis of contingency tables when one or more of the
expected frequencies is less than 5.
Use fisher.test(x)
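A sketch with a small hypothetical 2 x 2 table of 20 observations, where the counts are low enough that the exact test is a natural choice:

```r
# Hypothetical counts for a small 2 x 2 table
x <- matrix(c(6, 4, 2, 8), nrow = 2)

res <- fisher.test(x)
res$p.value   # about 0.17, so no significant association in this table
```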
compare two proportions
It turns out that 196 men were promoted out of 3270 candidates,
compared with 4 promotions out of only 40 candidates for the
women.
     prop.test(c(4,196),c(40,3270))
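The result of the call can be stored to inspect the two sample proportions and the p-value. A sketch (the variable name res is illustrative):

```r
# Successes (promotions) and trials (candidates): women first, then men
res <- prop.test(c(4, 196), c(40, 3270))

res$estimate   # promotion rates: 0.10 for women, about 0.06 for men
res$p.value    # well above 0.05, so the difference is not significant
```

R may warn that the chi-squared approximation could be inaccurate here, because so few women were promoted.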
Correlation and covariance



Covariance is a measure of how much two variables change
together.
The Pearson product-moment correlation coefficient
(sometimes referred to as the PMCC, and typically denoted by r) is a
measure of the correlation (linear dependence) between two
variables.
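The two quantities are linked: the correlation coefficient is the covariance divided by the product of the two standard deviations. A sketch with invented data:

```r
x <- c(2, 4, 5, 7, 9, 11, 12)   # illustrative data
y <- c(1, 3, 2, 5, 4, 7, 6)     # illustrative data

cov(x, y)   # covariance, in the units of x times the units of y

# Pearson's r is the covariance rescaled to lie between -1 and 1
r_by_hand <- cov(x, y) / (sd(x) * sd(y))
c(by_hand = r_by_hand, from_cor = cor(x, y))
```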
Correlation and Covariance
Are two parameters correlated significantly?
Create and attach the data.frame
Apply cor(data.frame) to get the matrix of pairwise correlations
To determine the significance of a correlation, apply cor.test(x, y)
to the two variables
You have three options: Kendall's tau (method = "k"), Spearman's rank
(method = "s"), or (default) Pearson's product-moment correlation
(method = "p")
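A sketch of the three options on invented data without ties. The full method names are used here in place of the one-letter abbreviations.

```r
x <- c(2, 4, 5, 7, 9, 11, 12)   # illustrative data
y <- c(1, 3, 2, 5, 4, 7, 6)     # illustrative data

pearson  <- cor.test(x, y)                        # default: Pearson's r
spearman <- cor.test(x, y, method = "spearman")   # Spearman's rank correlation
kendall  <- cor.test(x, y, method = "kendall")    # Kendall's tau

# Estimated coefficients and their p-values
c(r   = unname(pearson$estimate),
  rho = unname(spearman$estimate),
  tau = unname(kendall$estimate))
c(pearson = pearson$p.value, spearman = spearman$p.value,
  kendall = kendall$p.value)
```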
Kolmogorov-Smirnov Test
Are two sample distributions significantly different?
or
Does a sample distribution arise from a specific distribution?


ks.test(A,B)
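Both uses of the test can be sketched with simulated data. The samples, their sizes and the seed are arbitrary; B is deliberately shifted so that the two distributions clearly differ.

```r
set.seed(42)               # arbitrary seed for reproducibility
A <- rnorm(200)            # sample from a standard normal
B <- rnorm(200, mean = 3)  # sample from a normal shifted by 3

# Are two sample distributions significantly different?
two_sample <- ks.test(A, B)
two_sample$p.value   # very small: the distributions differ

# Does a sample arise from a specific distribution (here the standard normal)?
one_sample <- ks.test(A, "pnorm")
one_sample$p.value
```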
Probability and basic statistics with R
Introducción a la plataforma Arduino - Laboratorio de Fabricación Digital UPN...Introducción a la plataforma Arduino - Laboratorio de Fabricación Digital UPN...
Introducción a la plataforma Arduino - Laboratorio de Fabricación Digital UPN...
Alberto Labarga
 
Introducción a la impresión 3D
Introducción a la impresión 3DIntroducción a la impresión 3D
Introducción a la impresión 3D
Alberto Labarga
 


Probability and basic statistics with R

  • 1. Quantitative Data Analysis Probability and basic statistics
  • 2. probability The most familiar way of thinking about probability is within a framework of repeatable random experiments. In this view the probability of an event is defined as the limiting proportion of times the event would occur given many repetitions.
  • 3. Probability Instead of exclusively relying on knowledge of the proportion of times an event occurs in repeated sampling, this approach allows the incorporation of subjective knowledge, so-called prior probabilities, that are then updated. The common name for this approach is Bayesian statistics.
  • 4. The Fundamental Rules of Probability Rule 1: Probability is never negative. Rule 2: For a given sample space, the probabilities of all outcomes sum to 1. Rule 3: For disjoint (mutually exclusive) events, P(A ∪ B) = P(A) + P(B).
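  • 5. Counting Permutations (order is important): there are n!/(n − k)! ways to arrange k items chosen from n. Combinations (order is not important): there are n!/(k!(n − k)!) ways to choose k items from n.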
  • 6. Probability functions The factorial function n! is computed with factorial(n) or, equivalently, gamma(n+1). The number of combinations of k items out of n can be calculated with choose(n,k).
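As a small illustration of the counting functions above, the number of ordered and unordered selections can be computed as follows. The values n = 5 and k = 3 are arbitrary choices, not taken from the slides.

```r
n <- 5  # size of the set (illustrative value)
k <- 3  # number of items selected (illustrative value)

# factorial(n) and gamma(n + 1) give the same value for non-negative integers
fact_n  <- factorial(n)
gamma_n <- gamma(n + 1)

# Permutations: ordered selections of k items out of n, n! / (n - k)!
n_perm <- factorial(n) / factorial(n - k)

# Combinations: unordered selections of k items out of n, n! / (k! (n - k)!)
n_comb <- choose(n, k)

print(c(factorial = fact_n, permutations = n_perm, combinations = n_comb))
```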
  • 7. Simple statistics mean(x): arithmetic average of the values in x; median(x): median value in x; var(x): sample variance of x; cor(x,y): correlation between vectors x and y; quantile(x): vector containing the minimum, lower quartile, median, upper quartile, and maximum of x; rowMeans(x): row means of dataframe or matrix x; colMeans(x): column means of dataframe or matrix x.
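A minimal sketch of these functions on a small made-up data set (the vectors x and y are hypothetical, chosen only for illustration):

```r
# Small illustrative data set (hypothetical values)
x <- c(2, 4, 4, 5, 7, 9)
y <- c(1, 3, 2, 5, 6, 8)

mean(x)      # arithmetic average
median(x)    # middle value
var(x)       # sample variance (divides by n - 1)
cor(x, y)    # Pearson correlation between x and y
quantile(x)  # minimum, lower quartile, median, upper quartile, maximum

# Row and column means of a matrix built from the two vectors
m <- cbind(x, y)
rowMeans(m)
colMeans(m)
```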
  • 8. cumulative probability function The cumulative probability function is, for any value of x, the probability of obtaining a sample value that is less than or equal to x. curve(pnorm(x),-3,3)
  • 9. probability density function The probability density is the slope of the cumulative probability curve (its ‘derivative’). curve(dnorm(x),-3,3)
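The relationship between the two curves can be checked numerically. The sketch below approximates the slope of pnorm with a central difference and compares it with dnorm; the step size h and the grid of x values are assumptions made for the illustration.

```r
# Grid of x values over the same range as the curves on the slides
x <- seq(-3, 3, by = 0.5)
h <- 1e-5  # small step for the numerical derivative (assumed value)

# Central-difference slope of the cumulative probability function
slope <- (pnorm(x + h) - pnorm(x - h)) / (2 * h)

# The slope should agree closely with the density function dnorm
max_diff <- max(abs(slope - dnorm(x)))
print(max_diff)
```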
  • 10. Continuous Probability Distributions
  • 11. Continuous Probability Distributions R has a wide range of built-in probability distributions, for each of which four functions are available: the probability density function (which has a d prefix); the cumulative probability (p); the quantiles of the distribution (q); and random numbers generated from the distribution (r).
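For the normal distribution, the four prefixes give dnorm, pnorm, qnorm and rnorm. A short sketch (the argument values are illustrative):

```r
set.seed(1)  # fixed seed so the random draws are reproducible

d <- dnorm(0)      # density of the standard normal at 0
p <- pnorm(1.96)   # cumulative probability P(X <= 1.96)
q <- qnorm(0.975)  # quantile below which 97.5% of the distribution lies
r <- rnorm(5)      # five random numbers from the standard normal

# p and q are inverse operations: qnorm(pnorm(z)) returns z
roundtrip <- qnorm(pnorm(1.5))

print(c(density = d, cumulative = p, quantile = q, roundtrip = roundtrip))
```

Other distributions follow the same pattern, for example dgamma, pgamma, qgamma and rgamma.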
  • 14. Exercise Suppose we have measured the heights of 100 people. The mean height was 170 cm and the standard deviation was 8 cm. We can ask three sorts of questions about data like these. What is the probability that a randomly selected individual will be (a) shorter than a particular height, (b) taller than a particular height, or (c) between one specified height and another?
  • 15. Exercise normal.R
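The file normal.R is not reproduced in this transcript. One possible way to answer the three questions, assuming the heights are normally distributed, is sketched below; the specific heights 160, 185, 165 and 180 cm are illustrative choices.

```r
mu    <- 170  # mean height in cm (from the exercise)
sigma <- 8    # standard deviation in cm (from the exercise)

# Probability of being shorter than 160 cm (160 is an illustrative choice)
p_shorter <- pnorm(160, mean = mu, sd = sigma)

# Probability of being taller than 185 cm (185 is an illustrative choice)
p_taller <- 1 - pnorm(185, mean = mu, sd = sigma)

# Probability of being between 165 cm and 180 cm (illustrative choices)
p_between <- pnorm(180, mean = mu, sd = sigma) - pnorm(165, mean = mu, sd = sigma)

print(c(shorter = p_shorter, taller = p_taller, between = p_between))
```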
  • 16. The central limit theorem If you take repeated samples from a population with finite variance and calculate their averages, then the averages will be approximately normally distributed, and the approximation improves as the sample size grows.
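A small simulation can make this concrete. The sketch below draws repeated samples from a uniform distribution, which is clearly not normal, and looks at the distribution of their means. The number of samples, the sample size and the uniform range are all assumptions made for the illustration.

```r
set.seed(42)  # fixed seed for reproducibility

n_samples   <- 10000  # number of repeated samples (assumed value)
sample_size <- 30     # observations per sample (assumed value)

# Each sample comes from a uniform distribution on [0, 10], which is not normal
sample_means <- replicate(n_samples, mean(runif(sample_size, min = 0, max = 10)))

# Theory: the means centre on 5 with standard deviation sqrt((100 / 12) / sample_size)
theoretical_sd <- sqrt((10^2 / 12) / sample_size)
print(c(mean = mean(sample_means), sd = sd(sample_means), theory_sd = theoretical_sd))

# hist(sample_means) would show the roughly bell-shaped distribution of the means
```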
  • 17. Checking normality fishes.R
  • 19. The gamma distribution The gamma distribution is useful for describing a wide range of processes where the data are positively skewed (i.e. non-normal, with a long tail on the right).
  • 21. The gamma distribution α is the shape parameter and β is the scale parameter (so β⁻¹ is the rate parameter). Special cases of the gamma distribution are the exponential (α = 1) and the chi-squared (α = ν/2, β = 2). The mean of the distribution is αβ, the variance is αβ², the skewness is 2/√α and the excess kurtosis is 6/α.
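These moments can be checked by simulation. The sketch below assumes the parameterisation in which the mean is αβ, so β is passed to rgamma as the scale argument; the values of alpha and beta are illustrative.

```r
set.seed(1)  # fixed seed for reproducibility

alpha <- 2  # shape parameter (illustrative value)
beta  <- 3  # scale parameter (illustrative value)

# rgamma takes shape and either rate or scale; here the scale is given explicitly
x <- rgamma(100000, shape = alpha, scale = beta)

# Compare sample moments with the theoretical mean and variance
print(c(sample_mean = mean(x), theory_mean = alpha * beta))
print(c(sample_var  = var(x),  theory_var  = alpha * beta^2))

# Special case: shape = 1 gives the exponential distribution with rate 1 / beta
exp_check <- all.equal(dgamma(1.5, shape = 1, scale = beta),
                       dexp(1.5, rate = 1 / beta))
```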
  • 24. Exercise fishes2.R
  • 31. Why Test? Statistics is an experimental science, not really a branch of mathematics. It’s a tool that can tell you whether data are accidentally or really similar. It does not give you certainty.
  • 32. Steps in hypothesis testing! 1. Set the null hypothesis and the alternative hypothesis. 2. Calculate the p-value. 3. Decision rule: If the p-value is less than 5% then reject the null hypothesis otherwise the null hypothesis remains valid. In any case, you must give the p-value as a justification for your decision.
  • 33. Types of Errors A Type I error occurs when we reject a true null hypothesis (i.e. reject H0 when it is TRUE). A Type II error occurs when we don’t reject a false null hypothesis (i.e. do NOT reject H0 when it is FALSE). In table form: H0 true and rejected: Type I error; H0 false and not rejected: Type II error.
  • 34. Critical regions and power The table shows schematically the relation between the relevant probabilities under the null and alternative hypotheses. Null hypothesis is true: do not reject with probability 1 − α; reject with probability α (Type I error). Null hypothesis is false: do not reject with probability β (Type II error); reject with probability 1 − β (power).
  • 35. Significance It is common in hypothesis testing to set the probability of a Type I error, α, to a fixed value called the significance level. This level is usually set to 0.1, 0.05 or 0.01. If, assuming the null hypothesis is true, the probability of observing a test statistic at least as extreme as the current one is lower than the significance level, then the null hypothesis is rejected. Sometimes, instead of setting a pre-defined significance level, the p-value is reported. It is also called the observed significance level.
  • 36. Significance Level When we reject the null hypothesis there is a risk of drawing a wrong conclusion. The risk of drawing a wrong conclusion (called the p-value or observed significance level) can be calculated. The researcher decides the maximum risk (called the significance level) he is ready to take. The usual significance level is 5%.
  • 37. P-value We start from the basic assumption: the null hypothesis is true. The p-value is the probability of getting a value equal to or more extreme than the sample result, given that the null hypothesis is true. Decision rule: if the p-value is less than 5% then reject the null hypothesis; if the p-value is 5% or more then the null hypothesis is not rejected. In any case, you must give the p-value as a justification for your decision.
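The decision rule can be sketched in R with a one-sample t-test. The data below are simulated and the hypothesised mean of 170 is an assumption chosen for the illustration; none of it comes from the slides.

```r
set.seed(123)  # fixed seed for reproducibility

# Hypothetical sample: 25 measurements simulated with true mean 172
heights <- rnorm(25, mean = 172, sd = 8)

# H0: the population mean is 170; H1: the population mean differs from 170
result <- t.test(heights, mu = 170)

alpha <- 0.05  # significance level chosen before looking at the data
decision <- if (result$p.value < alpha) "reject H0" else "do not reject H0"

# Always report the p-value alongside the decision
cat("p-value:", signif(result$p.value, 3), "->", decision, "\n")
```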
  • 38. Interpreting the p-value p between 0 and .01: overwhelming evidence (highly significant); p between .01 and .05: strong evidence (significant); p between .05 and .10: weak evidence (not significant); p above .10: no evidence (not significant).
  • 39. Power analysis The power of a test is the probability of rejecting the null hypothesis when it is false. It has to do with Type II errors: β is the probability of accepting the null hypothesis when it is false. In an ideal world, we would obviously make β as small as possible. But for a given sample size, the smaller we make the probability of committing a Type II error, the greater we make the probability of committing a Type I error, that is, of rejecting the null hypothesis when, in fact, it is correct. Most statisticians work with α = 0.05 and β = 0.2. The power of a test is then 1 − β = 0.8.
  • 40. Confidence A confidence interval with a particular confidence level is intended to give the assurance that, if the statistical model is correct, then taken over all the data that might have been obtained, the procedure for constructing the interval would deliver a confidence interval that included the true value of the parameter the proportion of the time set by the confidence level.
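The "proportion of the time" reading can be illustrated with a simulation: build many 95% intervals from data generated with a known mean and count how often the interval contains it. The true mean, sample size and number of repetitions below are assumptions made for the sketch.

```r
set.seed(2024)  # fixed seed for reproducibility

true_mean <- 170   # true parameter value (assumed for the simulation)
n_reps    <- 2000  # number of simulated data sets (assumed value)

# For each simulated sample, check whether the 95% interval contains the true mean
covered <- replicate(n_reps, {
  x  <- rnorm(20, mean = true_mean, sd = 8)
  ci <- t.test(x, conf.level = 0.95)$conf.int
  ci[1] <= true_mean && true_mean <= ci[2]
})

coverage <- mean(covered)  # proportion of intervals that include the true mean
print(coverage)
```

With a correct model the observed coverage should be close to the nominal 95%.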
  • 41. Don't Complicate Things Use the classical tests: var.test to compare two variances (Fisher's F); t.test to compare two means (Student's t); wilcox.test to compare two means with non-normal errors (Wilcoxon's rank test); prop.test (binomial test) to compare two proportions; cor.test (Pearson's or Spearman's rank correlation) to correlate two variables; chisq.test (chi-square test) or fisher.test (Fisher's exact test) to test for independence in contingency tables.
  • 42. Comparing Two Variances Before comparing means, verify that the variances are not significantly different. var.test(set1, set2) This performs Fisher's F test. If the variances are significantly different, you can transform the output (y) variable to equalise variances, or you can still use t.test (Welch's modified test).
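A minimal sketch of the variance comparison, using two simulated samples (set1 and set2 are hypothetical data, with set2 generated to have a larger spread):

```r
set.seed(7)  # fixed seed for reproducibility

# Two hypothetical samples; set2 is simulated with a larger spread
set1 <- rnorm(30, mean = 10, sd = 1)
set2 <- rnorm(30, mean = 10, sd = 3)

# Fisher's F test of the null hypothesis that the two variances are equal
f_result <- var.test(set1, set2)
print(f_result$p.value)

# The F statistic is the ratio of the two sample variances
f_by_hand <- var(set1) / var(set2)
print(c(f_by_hand = f_by_hand, var.test = unname(f_result$statistic)))
```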
  • 43. Comparing Two Means Student's t-test (t.test) assumes the samples are independent and the errors normally distributed; the classical version also assumes equal variances. In R, t.test uses the Welch-Satterthwaite approximation by default (var.equal = FALSE; slightly less power), which allows the variances to differ. This test can also be used for paired data. The Wilcoxon rank sum test (wilcox.test) is used for independent samples whose errors are not normally distributed. If you transform the data to get constant variance, you will probably have to use this test.
  • 44. Student’s t The test statistic is the number of standard errors by which the two sample means are separated: t = (difference between the two sample means) / (standard error of the difference).
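The statistic can be computed by hand and compared with the output of t.test. The sketch below uses the unequal-variance (Welch) standard error, which is what t.test uses by default; the two samples are simulated and purely illustrative.

```r
set.seed(11)  # fixed seed for reproducibility

# Two hypothetical independent samples
a <- rnorm(12, mean = 5, sd = 1)
b <- rnorm(15, mean = 6, sd = 2)

# Standard error of the difference between the two means (unequal variances)
se_diff <- sqrt(var(a) / length(a) + var(b) / length(b))

# Number of standard errors separating the two sample means
t_by_hand <- (mean(a) - mean(b)) / se_diff

# t.test uses the Welch version by default, so the statistics should match
t_welch <- t.test(a, b)$statistic
print(c(by_hand = t_by_hand, t.test = unname(t_welch)))
```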
  • 45. Power analysis So how many replicates do we need in each of two samples to detect a difference of 10% with power = 80% when the mean is 20 (i.e. delta = 2) and the standard deviation is about 3.5? power.t.test(delta=2,sd=3.5,power=0.8) You can also work out what size of difference a sample of 30 would allow you to detect, by specifying n and omitting delta: power.t.test(n=30,sd=3.5,power=0.8)
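The two calls return objects whose components can be read directly. The sketch below relies on the default significance level of power.t.test (sig.level = 0.05) and rounds the required sample size up to whole replicates.

```r
# Replicates per sample needed to detect a difference of 2 with 80% power
needed <- power.t.test(delta = 2, sd = 3.5, power = 0.8)
n_per_group <- ceiling(needed$n)  # round up to whole replicates

# Smallest difference detectable with 30 replicates per sample at 80% power
detectable <- power.t.test(n = 30, sd = 3.5, power = 0.8)

print(c(n_per_group = n_per_group, detectable_delta = detectable$delta))
```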
  • 46. Paired Observations The measurements are not independent. Use t.test with paired=TRUE. You are then doing a single-sample test of the differences against 0. When the data are paired, you should always do the paired test: it is more powerful, and it deals with blocking, spatial correlation, and temporal correlation.
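The equivalence between the paired test and a single-sample test of the differences can be checked directly. The before and after measurements below are simulated, hypothetical data.

```r
set.seed(3)  # fixed seed for reproducibility

# Hypothetical paired measurements on the same 10 units, before and after
before <- rnorm(10, mean = 50, sd = 5)
after  <- before + rnorm(10, mean = 1.5, sd = 1)

# Paired t-test
paired_result <- t.test(after, before, paired = TRUE)

# Equivalent one-sample test of the differences against 0
diff_result <- t.test(after - before, mu = 0)

print(c(paired = paired_result$p.value, one_sample = diff_result$p.value))
```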
  • 47. Sign Test Used when you can't measure a difference but can see it. Use the binomial test (binom.test) for this. Binomial tests can also be used to compare proportions (prop.test).
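A sketch of a sign test with binom.test, using a made-up outcome: out of 9 paired comparisons, 1 favoured treatment A and 8 favoured treatment B. The counts are hypothetical.

```r
# Hypothetical outcome: in 9 paired comparisons, 1 favoured A and 8 favoured B
# H0: each direction is equally likely (probability of success 0.5, the default)
sign_result <- binom.test(1, 9)

print(sign_result$p.value)
```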
  • 48. Chi-squared contingency tables The contingencies are all the events that could possibly happen. A contingency table shows the counts of how many times each of the contingencies actually happened in a particular sample.
  • 49. Chi-square Contingency Tables Deals with count data. Suppose there are two characteristics (hair colour and eye colour). The null hypothesis is that they are independent. Create a matrix that contains the counts and apply chisq.test(matrix). This will give you a p-value for the observed counts under the assumption of independence.
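A minimal sketch of the procedure. The counts and the category labels below are illustrative and do not come from the slides.

```r
# Illustrative counts (not from the slides): rows are hair colour, columns eye colour
counts <- matrix(c(38, 14, 11, 51), nrow = 2,
                 dimnames = list(hair = c("fair", "dark"),
                                 eyes = c("blue", "brown")))

# Chi-squared test of the null hypothesis that hair and eye colour are independent
chi_result <- chisq.test(counts)

# Expected counts under independence: row total * column total / grand total
expected_by_hand <- outer(rowSums(counts), colSums(counts)) / sum(counts)

print(chi_result$p.value)
print(expected_by_hand)
```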
  • 50. Fisher's Exact Test Used for analysis of contingency tables when one or more of the expected frequencies is less than 5. Use fisher.test(x)
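A sketch with a small 2 × 2 table in which some expected frequencies fall below 5. The counts are illustrative, not from the slides.

```r
# Illustrative 2 x 2 table with small counts (not from the slides)
x <- matrix(c(6, 4, 2, 8), nrow = 2)

# Expected counts under independence; some fall below 5
expected <- outer(rowSums(x), colSums(x)) / sum(x)
print(expected)

# Fisher's exact test does not rely on the chi-squared approximation
fisher_result <- fisher.test(x)
print(fisher_result$p.value)
```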
  • 51. Compare two proportions It turns out that 196 men were promoted out of 3270 candidates, compared with 4 promotions out of only 40 candidates for the women. prop.test(c(4,196),c(40,3270))
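The result of this call can be stored and inspected. In the sketch below the variable names are assumptions; the warning is suppressed only because the expected number of promoted women is small, which makes the chi-squared approximation doubtful (an exact test may be preferable in that case).

```r
promoted   <- c(4, 196)    # promotions: women, men (from the slide)
candidates <- c(40, 3270)  # candidates: women, men (from the slide)

# prop.test may warn that the chi-squared approximation is doubtful here,
# because the expected number of promoted women is small
prop_result <- suppressWarnings(prop.test(promoted, candidates))

print(prop_result$estimate)  # observed proportions: 0.10 and about 0.06
print(prop_result$p.value)
```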
  • 52. Correlation and covariance Covariance is a measure of how much two variables change together. The Pearson product-moment correlation coefficient (sometimes referred to as the PMCC, and typically denoted by r) is a measure of the correlation (linear dependence) between two variables.
  • 53. Correlation and Covariance Are two variables significantly correlated? Create and attach the data.frame, then apply cor(data.frame). To determine the significance of a correlation, apply cor.test(x, y) to the two variables. You have three options: Kendall's tau (method = "k"), Spearman's rank (method = "s"), or (default) Pearson's product-moment correlation (method = "p").
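A sketch of the three options on simulated data. The data frame and its column names are hypothetical; note that cor accepts a whole data frame while cor.test takes two vectors.

```r
set.seed(5)  # fixed seed for reproducibility

# Hypothetical data frame with two related variables
df <- data.frame(x = rnorm(40))
df$y <- 0.6 * df$x + rnorm(40, sd = 0.8)

cor(df)  # correlation matrix of all columns

# cor.test works on two vectors, not on a whole data frame
pearson  <- cor.test(df$x, df$y)  # default method
spearman <- cor.test(df$x, df$y, method = "spearman")
kendall  <- cor.test(df$x, df$y, method = "kendall")

print(c(pearson = pearson$p.value,
        spearman = spearman$p.value,
        kendall = kendall$p.value))
```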
  • 54. Kolmogorov-Smirnov Test Are two sample distributions significantly different? Or does a sample distribution arise from a specific distribution? For two samples A and B, use ks.test(A,B).
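Both questions can be sketched with ks.test. The samples A and B below are simulated, and the reference distribution in the one-sample call (a standard normal) is an assumption made for the illustration.

```r
set.seed(9)  # fixed seed for reproducibility

# Two hypothetical samples from different distributions
A <- rnorm(100, mean = 0, sd = 1)
B <- rexp(100, rate = 1)

# Two-sample test: do A and B come from the same distribution?
two_sample <- ks.test(A, B)

# One-sample test: is A compatible with a standard normal distribution?
one_sample <- ks.test(A, "pnorm", mean = 0, sd = 1)

print(c(two_sample = two_sample$p.value, one_sample = one_sample$p.value))
```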