Lect 3: Background Mathematics for Data Mining, by hktripathy
This document surveys the basic statistical measures used to describe data: measures of central tendency and measures of dispersion.
It introduces the mean, median, and mode as common measures of central tendency: the mean is the average value, the median is the middle value, and the mode is the most frequent value. Weighted means are also discussed.
It then covers measures of dispersion and position, including the range, variance, standard deviation, quartiles, percentiles, and interquartile range. The standard deviation measures how far data values typically lie from the mean and describes the width of a distribution. The slides reproduced below focus on the standard deviation, raw scores and z-scores, probability, the normal distribution and the 68-95-99.7 rule, and random sampling.
3. Standard Deviation
• In statistics, the standard deviation is a measure of the amount of variation of a random variable about its mean.
• A low standard deviation indicates that the values tend to be close to the mean of the set, while a high standard deviation indicates that the values are spread out over a wider range.
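As a quick illustration of these two bullets, here is a minimal Python sketch (not part of the original slides; the two data sets are made up) computing the population standard deviation:

```python
import math

def std_dev(values):
    """Population standard deviation: square root of the mean squared deviation from the mean."""
    mean = sum(values) / len(values)
    variance = sum((x - mean) ** 2 for x in values) / len(values)
    return math.sqrt(variance)

# Two hypothetical data sets with the same mean (50) but different spreads:
tight = [48, 49, 50, 51, 52]
wide = [10, 30, 50, 70, 90]
print(std_dev(tight))  # small: values cluster near the mean
print(std_dev(wide))   # large: values spread over a wider range
```

Both sets have mean 50, yet their standard deviations differ by a factor of twenty, which is exactly the distinction the bullet points describe.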
4. Raw Scores
• The definition of a raw score in statistics is an unaltered
measurement.
• Raw scores have not been weighted, manipulated, calculated,
transformed, or converted. An entire data set that has been unaltered
is a raw data set.
5. Z - Score
• Z-score is a statistical measure that quantifies the distance between a
data point and the mean of a dataset.
• It's expressed in terms of standard deviations. It indicates how many
standard deviations a data point is from the mean of the distribution.
6. Z - Score
• For a recent final exam in STAT 500, the mean was 68.55 with a standard deviation of 15.45.
• If you scored an 80%: Z = (80 − 68.55)/15.45 = 0.74, which means your score of 80 was 0.74 SD above the mean.
• If you scored a 60%: Z = (60 − 68.55)/15.45 = −0.55, which means your score of 60 was 0.55 SD below the mean.
7. Z - Score
• The scores can be positive or negative.
• For data that is symmetric (i.e. bell-shaped) or nearly symmetric, a
common application of Z-scores for identifying potential outliers is for
any Z-scores that are beyond ± 3.
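The definition above, together with the STAT 500 exam figures from slide 6 and the |z| > 3 rule, can be sketched in Python (the `scores` list is hypothetical, added only to show the outlier check):

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies from the mean (positive = above)."""
    return (x - mean) / sd

# STAT 500 final exam figures from the slide: mean 68.55, SD 15.45
print(round(z_score(80, 68.55, 15.45), 2))  # 0.74, i.e. 0.74 SD above the mean
print(round(z_score(60, 68.55, 15.45), 2))  # -0.55, i.e. 0.55 SD below the mean

# Outlier rule of thumb for roughly bell-shaped data: flag |z| > 3
scores = [55, 62, 68, 70, 74, 130]           # hypothetical data
mean, sd = 68.55, 15.45                       # reference distribution
outliers = [x for x in scores if abs(z_score(x, mean, sd)) > 3]
print(outliers)  # 130 lies nearly 4 SD above the mean
```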
8. Using z-scores to standardise a distribution
• Every X value in a distribution can be transformed into a
corresponding z-score
• Any normal distribution can be standardized by converting its values
into z scores.
• Z scores tell you how many standard deviations from the mean each
value lies.
• Converting a normal distribution into a z-distribution allows you to
calculate the probability of certain values occurring and to compare
different data sets
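A small sketch of standardizing a whole distribution, using a made-up data set; after conversion, the z-scores have mean 0 and standard deviation 1, which is what makes different data sets comparable:

```python
import math

def standardize(values):
    """Convert every value in a distribution to its z-score."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / len(values))
    return [(x - mean) / sd for x in values]

data = [52, 60, 60, 64, 72, 76]  # hypothetical raw scores; mean 64, SD 8
z = standardize(data)
print([round(v, 2) for v in z])   # each value expressed in SDs from the mean
print(sum(z) / len(z))            # mean of the z-distribution is 0
```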
9. Using z-scores to make comparison
• We can compare performance [values] in two different distributions based on their z-scores.
• A z-score with a smaller absolute value means the value lies closer to the mean, while a larger absolute value means it lies farther away.
• A positive z-score means the value is greater than the mean (to its right); a negative z-score means it is smaller than the mean (to its left).
10. Using z-scores to make comparison
• Jared scored a 92 on a test with a mean of 88 and a standard
deviation of 2.7. Jasper scored an 86 on a test with a mean
of 82 and a standard deviation of 1.8. Find the Z-scores for
Jared's and Jasper's test scores, and use them to determine
who did better on their test relative to their class.
11. Using z-scores to make comparison
• Step 1: Compute each test score's Z-score using the mean and standard deviation for that test.
• For Jared's test, the Z-score is: Z = (x − μ)/σ = (92 − 88)/2.7 = 4/2.7 = 1.48
• For Jasper's test, the Z-score is: Z = (x − μ)/σ = (86 − 82)/1.8 = 4/1.8 = 2.22
12. Using z-scores to make comparison
• Step 2: Use Z-scores to compare across data sets.
• Jared's Z-score of 1.48 says that his score of 92 was between
1 and 2 standard deviations above the mean. Jasper's Z-score
of 2.22 says that his score of 86 was a bit more than 2
standard deviations above the mean. So, Jasper's score of 86
was relatively higher for his class than Jared's 92 was for his
class.
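The two steps above can be checked with a few lines of Python:

```python
def z_score(x, mean, sd):
    """How many standard deviations x lies from the mean."""
    return (x - mean) / sd

# Step 1: each score relative to its own class
jared = z_score(92, 88, 2.7)   # Jared's test: mean 88, SD 2.7
jasper = z_score(86, 82, 1.8)  # Jasper's test: mean 82, SD 1.8

# Step 2: compare across data sets via the z-scores
better = "Jasper" if jasper > jared else "Jared"
print(round(jared, 2), round(jasper, 2), better)  # 1.48 2.22 Jasper
```

Although Jared's raw score is higher, Jasper's score sits further above his class mean, so it is relatively better.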
13. Probability
• Probability is simply how likely something is to happen.
• Whenever we're unsure about the outcome of an event, we
can talk about the probabilities of certain outcomes—how
likely they are.
• The analysis of events governed by probability is called
statistics.
14. What are Equally Likely Events?
• When events have the same theoretical probability of happening, they are called equally likely events. The outcomes of a sample space are called equally likely if all of them have the same probability of occurring. For example, if you throw a fair die, the probability of getting a 1 is 1/6; similarly, the probability of getting each of the numbers 2, 3, 4, 5 and 6 is also 1/6. Hence, the following are examples of equally likely events when throwing a die:
• Getting a 3 and getting a 5
• Getting an even number and getting an odd number
• Getting 1, 2 or 3
These are equally likely events, since the probabilities of each event are equal.
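A simulation sketch of the die example: with enough rolls, each face's relative frequency settles near 1/6 (the seed is arbitrary, chosen only for reproducibility):

```python
import random

random.seed(42)  # fixed seed so the run is reproducible
rolls = [random.randint(1, 6) for _ in range(60_000)]

for face in range(1, 7):
    freq = rolls.count(face) / len(rolls)
    print(face, round(freq, 3))  # each relative frequency is near 1/6 ≈ 0.167
```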
15. Random sampling
Simple random sample
• Each member of the population has an equal chance of being
selected
Independent random sample
• Each member of the population has an equal chance of being
selected
AND
• The probability of being selected stays constant from one selection
to the next [if more than one individual is selected]
• i.e. Sampling with replacement
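The distinction between the two schemes can be sketched in Python, using a hypothetical population of 100 members: `random.sample` draws without replacement (simple random sample), while repeated `random.choice` draws with replacement, so the selection probability stays constant from draw to draw (independent random sample):

```python
import random

population = list(range(1, 101))  # hypothetical population of 100 members
random.seed(0)                    # arbitrary seed for a reproducible run

# Simple random sample WITHOUT replacement: no member can appear twice
srs = random.sample(population, 10)

# Independent random sample WITH replacement: probability of selection
# is 1/100 on every draw, so repeats are possible
irs = [random.choice(population) for _ in range(10)]

print(srs)
print(irs)
print(len(set(srs)) == len(srs))  # always True when sampling without replacement
```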
16. Independent Random Sampling
• Probability of event A = (number of outcomes classified as A) / (total number of possible outcomes)
17. Probability and Frequency distributions
• Probability usually involves a population of scores displayed in a
frequency distribution graph.
• What is the probability of obtaining an individual score of less than 3 (i.e. either 1 or 2)?
• [Frequency distribution graph omitted; N = 20]
18. Probability and the normal distribution
• In any normal distribution, the percentage of values that lie within a specified number of standard deviations from the mean is the same.
19. Graphing Probability: the 68–95–99.7% Rule of Thumb revisited
• One standard deviation either side of the mean captures approx. 68% of our data (mathematically: 68.26%).
• Two standard deviations either side of the mean capture approx. 95% of our data (mathematically: 95.44%).
• Three standard deviations either side of the mean capture approx. 99.7% of our data (mathematically: 99.73%).
20. 68%–95%–99.7% Rule of Thumb revisited
• The exact values 68.26%, 95.44% and 99.73% come from the maths calculation. [Calculation figure omitted]
21. Probability
What is the probability that a randomly selected data value in a normal distribution
lies more than 1 standard deviation below the mean?
p(z < - 1.00)
What is the probability that a randomly selected data value in a normal distribution
lies more than 1 standard deviation above the mean?
p(z > 1.00)
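Both tail probabilities, and the exact percentages behind the 68-95-99.7 rule, can be computed from the standard normal CDF, here built from the error function in Python's standard library:

```python
import math

def phi(z):
    """Standard normal CDF, P(Z < z), via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# The 68-95-99.7 rule, computed exactly:
print(round(phi(1) - phi(-1), 4))  # 0.6827
print(round(phi(2) - phi(-2), 4))  # 0.9545 (often quoted truncated as 95.44%)
print(round(phi(3) - phi(-3), 4))  # 0.9973

# The two questions above:
print(round(phi(-1), 4))     # p(z < -1.00) ≈ 0.1587
print(round(1 - phi(1), 4))  # p(z > 1.00) ≈ 0.1587, equal by symmetry
```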
22. Calculating probability in a normal distribution
• To calculate the probability, first standardise the distribution by computing the z-score: z = (X − μ)/σ
• If scores on a test were normally distributed with a mean of μ = 60 and a standard deviation of σ = 12, what is the probability [for a randomly selected person who took the test] of a score greater than 84?
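A sketch of the calculation for the test-score question above: z = (84 − 60)/12 = 2, then the upper-tail probability:

```python
import math

def phi(z):
    """Standard normal CDF, P(Z < z)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, sigma = 60, 12
z = (84 - mu) / sigma   # z = 2.0: 84 is two SDs above the mean
p = 1 - phi(z)          # upper-tail probability P(X > 84)
print(z, round(p, 4))   # 2.0 0.0228
```

So only about 2.3% of test-takers would be expected to score above 84.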
24. Probability using Unit Normal Table
• Quite often the values we are interested in are not exactly 1, 2 or 3
standard deviations away from the mean. Statistical tables [or online
probability calculators] can be used to calculate the probability
25. Probability using Unit Normal Table
• The body always corresponds to the larger part of the distribution; it can be located on the left or the right of the distribution.
• The tail always corresponds to the smaller part of the distribution; again, it can be located on the left or the right of the distribution.
27. Example
Information from the Department of Motor Vehicles indicates that the average age of licensed drivers is μ = 45.7 years with a standard deviation of σ = 12.5 years. Assuming that the distribution of drivers' ages is approximately normal:
1. What proportion of licensed drivers are older than 50 years old?
z = (X − μ)/σ = (50 − 45.7)/12.5 = 4.3/12.5 = 0.34
2. What proportion of licensed drivers are younger than 30 years old?
z = (X − μ)/σ = (30 − 45.7)/12.5 = −15.7/12.5 = −1.26 [so, 30 is 1.26 SDs below the mean]
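The two z-scores above translate into proportions via the standard normal CDF; a Python sketch, using the exact z-values rather than the rounded 0.34 and −1.26:

```python
import math

def phi(z):
    """Standard normal CDF, P(Z < z)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, sigma = 45.7, 12.5
p_older = 1 - phi((50 - mu) / sigma)   # P(age > 50), z ≈ 0.34
p_younger = phi((30 - mu) / sigma)     # P(age < 30), z ≈ -1.26
print(round(p_older, 3), round(p_younger, 3))  # ≈ 0.365 and 0.105
```

Roughly 37% of licensed drivers are older than 50, and about 10% are younger than 30.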
28. Examples
• The length of a human pregnancy is normally distributed with a mean of 272 days and a standard deviation of 9 days.
1. State the random variable.
2. Find the probability of a pregnancy lasting more than 280 days.
3. Find the probability of a pregnancy lasting less than 250 days.
4. Find the probability that a pregnancy lasts between 265 and 280 days.
5. Suppose you meet a woman who says that she was pregnant for less than 250 days. Would this be unusual, and what might you think?
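A sketch of the four probability questions, reusing the same standard-normal CDF helper as before:

```python
import math

def phi(z):
    """Standard normal CDF, P(Z < z)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# 1. The random variable X = length of a human pregnancy, in days.
mu, sigma = 272, 9

p_over_280 = 1 - phi((280 - mu) / sigma)                    # 2. P(X > 280)
p_under_250 = phi((250 - mu) / sigma)                       # 3. P(X < 250)
p_between = phi((280 - mu) / sigma) - phi((265 - mu) / sigma)  # 4. P(265 < X < 280)
print(round(p_over_280, 3), round(p_under_250, 4), round(p_between, 3))

# 5. P(X < 250) is well under 1%, so a pregnancy of fewer than 250 days
# would be unusual; one might suspect the due date was miscalculated or
# that the birth was premature.
```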