Lecture 1.
Statistical Graphs
• Statistics. Data
• Populations and Samples
• Graphical Representation of Data
Statistics
Course rules:
• Do not be late.
• No smartphones in class.
• All lectures and notes will be available at iuhdportal.uni.tm.
• Every student must have 3 notebooks (lecture, practice, SIW).
• Be active.
• Submit on time.
Grading system
30% - midterm exam
45% - final exam
25% - SIW 40%
- lecture notes 40%
- group project 50%
- Extra points
Statistics
Statistics is the science of data: how to collect, analyze, present, and
interpret data.
Statistics is a branch of applied mathematics that is important in
everyday life. It helps us to:
- make informed decisions
- understand risks
- follow the news
- conduct research
- make predictions about the future based on past data.
Two branches of statistics:
• Descriptive statistics organizes data by using tables, graphs, and
numerical measures.
• Inferential statistics makes decisions or predictions about a population
based on data from a sample.
Example:
A statistician collected data on software developers’ income per project
in Turkmenistan: 10,000; 15,000; 100,000; 130,000; 140,000; ...
Examples of questions arising in statistics:
• How should data be collected? How should the people participating in
the study be selected?
• How should data be summarized? For example, what are the average income,
the spread of incomes, etc. of the selected people?
• What can be said about the whole population? For example, how can we
estimate the average income of all software developers in the country?
• How accurate are the obtained estimates? How can they be improved?
Populations and samples
A population is a complete set of all items that interest a statistician.
Examples
• all people living in a country
• grades of all students in a university
• history of temperature observations in a city
A sample is a subset of the population that is available for analysis.
Examples
• a group of selected people
• marks of 2 students from each academic group
• temperature during the last year
A population is often too large to analyze in full; in this case the
analysis is performed on a sample. A sample can accurately
represent the whole population if it is chosen in the
right way.
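The idea that a well-chosen sample can stand in for the population can be sketched in a few lines. The example below assumes simple random sampling as one "right way" of choosing; the population is synthetic (generated numbers, not data from the lecture), and the names `population` and `sample` are illustrative only.

```python
import random
import statistics

# Hypothetical population: 10,000 synthetic values (NOT data from the lecture).
rng = random.Random(42)
population = [rng.gauss(100_000, 30_000) for _ in range(10_000)]

# Simple random sample without replacement:
# every item of the population is equally likely to be selected.
sample = rng.sample(population, 500)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"population mean: {population_mean:,.0f}")
print(f"sample mean:     {sample_mean:,.0f}")
```

With 500 randomly chosen items the two means are typically close, although the sample mean changes from one random draw to another.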
Applied Statistics
Business
What’s the range of estimated sales next year that has a 95% chance of
being correct?
A manager maintaining inventory needs to know how many products
are being sold at which time of year so she can place orders before she
runs out of materials.
Supervisors have to monitor the quality of their production lines and
service levels to spot problem areas, inefficient processes, and people
who need to improve their knowledge, skills, and performance.
Applied Statistics
Business
Sales managers need to know:
• Which customers buy the most?
• Which customers complain the most?
• Which customers are increasing their purchase levels the fastest?
• Which salesperson has reduced productivity the most in the last
quarter?
• Is the reduction in sales by this salesperson due to chance alone or
could there be something else going on?
Applied Statistics
Business
• Marketing managers ask about the millions of dollars spent on
several advertising campaigns: are any of them obviously better than
the others?
• Are the differences just temporary fluctuations or are they something
we should take seriously?
Applied Statistics
Finance
• Which of several brokerages has a reliable record of
higher-than-average return on investment?
• Is the share price for this firm rising predictably enough for a
day-trader to invest in it?
• What has our return on investment been for these two brands of
computer equipment over the last four years?
Applied Statistics
Finance
• Should we pay the new premium being proposed by our insurance
company for liability insurance?
• What is the premium that our actuaries are calculating for a
fire-insurance policy on this type of building?
• Should we invest in bonds or in stock this year? What are the average
rates of return? How predictable are these numbers?
• Which country has the greatest chance of maximizing our profit on
investment over the next decade? Which industry? Which company?
Applied Statistics
Management
• One of our division managers claims that his below-average profit figures are just
the luck of the draw; how often would he get such a big drop in profit by chance
alone if we look at our historical records?
• Another manager claims that her above-average productivity is associated with the
weekly walkabout that is part of her management-by-walking-around philosophy;
can we test that idea to see if there is any rational basis for believing her analysis?
• The Human Resources Department wants to create a questionnaire that will help
them determine where to put their emphasis and resources in internal training next
year. How should we design the questionnaire to ensure the most neutral and least
ambiguous formulation of the questions? How can we tell if people are answering
seriously or if they are just filling in answers at random?
Applied Statistics
Information Technologies
• Which of these motherboards has the lowest rate of failure according
to industry studies?
• How long should we expect these LCD monitors to last before
failure?
• How is our disk free space doing? When do we expect to have to buy
new disk drives? With what degree of confidence are you predicting
these dates?
Applied Statistics
Information Technologies
• Which department is increasing their disk-space usage unusually fast? Is that
part of the normal variations or is something new going on?
• Which of our departments has been increasing their use of printer paper the
most over the last quarter? Is their rate of increase just part of the normal
growth rate for the whole organization or is there anything unusual that we
should investigate?
• We have been testing three different antispam products over the last six
months; is any one of them obviously better than the others?
Applied Statistics
Information Technologies
• Which of our programming teams has the lowest rate of coding errors
per thousand lines of production code? Is the difference between their
error rate and those of the other teams significant enough to warrant a
lecture from them, or is it just one of those random things?
• Which programmer’s code has caused the highest number of helpdesk
calls in the past year? Is that bad luck or bad programming?
Applied Statistics
Life in general
• This politician claims that our taxes have been rising over the last
couple of years; is that true?
• Is using this brand of toothpaste associated with decreased chances
of getting tooth decay compared with that brand?
• Does listening to two different sounds at the same time really relate to
changes in brainwaves that make us feel calmer and be smarter?
What is data?
In this course, data will be represented by a collection of values
describing particular characteristics of some objects. The values x_i are called
observations.
Types of data
Quantitative (numerical) data are numbers:
income, profit, weight, length, temperature, time, ...
Qualitative (categorical) data are not numbers:
yes/no, color, brand, country, name, ...
Warning: data can be represented by numbers which have no “numerical” meaning
(example: academic group number). We treat such data as categorical.
Examples of qualitative data:
• Color (e.g., red, orange, yellow, etc.)
• Tone (e.g., middle-C, b-flat below middle-C)
• Timbre (e.g., oboe-sound, flute-sound, drum-sound)
• Shape (e.g., round, square, tetrahedral, etc.)
• Preference for a particular movie (e.g., like, neutral, dislike, etc.)
• Type (e.g., wool, cotton, plastic; or complaint vs praise, addiction
vs habit; etc.)
• Origin (e.g., endogenous vs exogenous; local vs foreign)
Examples of quantitative data:
• Primary wavelength of light reflected by an object
• Primary frequency of light reflected by an object
• Waveform of a trumpet sound expressed as Fourier transforms of a sonogram
• Length of sides of a triangle
• Number of people expressing a particular preference for a movie
• Numerical representation (e.g., a Likert scale) of a feeling (e.g., 0 = no pain, 1 =
barely noticeable pain, 2 = slightly annoying pain, … 10 = “Angel of Death
please take me now.”)
Example:
Occupation and gender are categorical data; age and income are numerical data.
Examples of statistical problems:
1. Which profession is the most popular? Which is the best paid?
2. Is there a relation between age and income? Between gender and income?
3. How much will a 25-year-old male economist earn?
Occupation   Gender   Age   Income
Programmer   Male     25    100,000
Economist    Female   25     60,000
Lawyer       Male     45    100,000
Dentist      Female   40     80,000
Programmer   Female   35     90,000
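The first statistical problem above (most popular and best-paid profession) can be worked out directly from the five rows of the table. The following is a minimal sketch; the variable names are illustrative, and with only five observations the result says nothing reliable about any wider population.

```python
from collections import Counter, defaultdict
from statistics import mean

# Rows copied from the table: (occupation, gender, age, income).
rows = [
    ("Programmer", "Male", 25, 100_000),
    ("Economist", "Female", 25, 60_000),
    ("Lawyer", "Male", 45, 100_000),
    ("Dentist", "Female", 40, 80_000),
    ("Programmer", "Female", 35, 90_000),
]

# Categorical variable: count how often each occupation occurs.
occupation_counts = Counter(occupation for occupation, _, _, _ in rows)

# Numerical variable grouped by a categorical one: mean income per occupation.
incomes_by_occupation = defaultdict(list)
for occupation, _, _, income in rows:
    incomes_by_occupation[occupation].append(income)
mean_income = {occ: mean(vals) for occ, vals in incomes_by_occupation.items()}

print("Most frequent occupation:", occupation_counts.most_common(1))
for occ, value in sorted(mean_income.items(), key=lambda item: -item[1]):
    print(f"{occ:<12} {value:>9,.0f}")
```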
Tables
Figure 1 is a simple table (a matrix with labels for the rows down the side and
those for the columns across the top) showing some (invented)
qualitative and quantitative data.
The focus of the study is on different characteristics of the five divisions
in the company. Division can be called the identifier, and the
information about the different divisions can be called observations.
Tables
The different observations about each of the specific cases of the
identifier are called variables; in this case, we have a total of six
variables because the identifier is itself a variable. All the variables are
listed in columns in this table. Two of the variables are qualitative
(sometimes described as categorical because values are categories) and
four of the variables are quantitative. Quantitative variables may be
measured, counted, or ranked.
Qualitative variables in our study:
• The use of MBWA (management by walking around) – qualitative.
• The name or location of the division – qualitative.
Quantitative variables:
• Total employees in plant – quantitative (counted).
• Team rank in the company soccer competitions – quantitative (rank
order).
• Average monthly profit per employee in US dollars – quantitative
(measured).
• The percentage of Help Desk calls traced to a lack of training –
quantitative (measured).
Graphical representation of data
Example 1: grades for the 1st semester of XYZ university students in 2020
9, 5, 6, 6, 8, 3, 7, 5, 4, 4, 6, 7, 3, 9, 7, 4, 2, 5, 4, 8, 6, 5, 4, 5, 6, 1, 7, 5, 5, 4,
7, 7, 6, 6, 6, 6, 5, 7, 8, 4, 4, 6, 4, 5, 8, 3, 4, 6, 10, 10, 4, 7, 7, 2, 6, 8, 4, 5, 7,
5, 7, 7, 5, 4, 5, 6, 4, 7, 5, 5, 5, 1, 5, 6, 5, 4, 5, 5, 6, 7, 6, 4, 4, 9, 6, 6, 4, 4, 9,
1, 6, 8, 6, 7, 8, 10, 8, 2, 2, 6, 5, 6, 8, 4, 5, 4, 9, 4, 4, 5, 7, 4, 5, 7, 10, 6, 8, 2,
6, 5, 5, 2, 7, 9, 8, 4, 7, 8, 4, 7, 6, 5, 6, 6, 6, 10, 8, 2, 5, 6, 6, 6, 5, 2, 5, 9, 8, 5,
4, 2, 8, 10, 1, 5, 4, 4, 5, 6, 10, 1, 5, 4, 5, 5, 6, 6, 4, 3, 6, 8, 7, 4, 4, 4, 7, 7, 7,
4, 8, 7, 5, 8, 4, 1, 5, 5, 5, 1, 5, 7, 10, 3, 9, 4, 8, 5, 7, 4, 6, 4, 6, 7, 3, 4, 4, 7, 4,
5, 6, 7, 9, 4, 6, 2, 5, 4, 8, 2, 4, 5, 9, 7, 8, 9, 4, 3, 4, 4, 4, 4, 6, 6, 4, 5, 9, 6, 7,
6, 7, 7, 4, 5, 8, 6, 9, 4, 1, 6, 8, 7, 7, 9, 6, 6, 5, 6, 5, 5, 9, 7, 5, 4, 6, 1, 6, 5, 4,
5, 3, 4, 8, 6, 4, 5, 10, 5, 4, 9, 2, 4, 1, 6, 8
Looking at this list, it is difficult to draw conclusions.
Example 2: Largest companies in 2018 (by market capitalization)
Rank  Company     Country  Industry            Cap., $ bn
1     Apple       USA      Technology          851
2     Alphabet    USA      Technology          719
3     Microsoft   USA      Technology          703
4     Amazon.com  USA      Consumer Services   701
5     Tencent     China    Technology          496
6     Alibaba     China    Consumer Services   470
7     Facebook    USA      Technology          464
Bar Charts
A bar chart (or a bar graph) shows how many elements belong to each
of the categories. It is used for categorical data or numerical data with a
small number of different values.
Example 1: Number of students who got each grade
Example 2: Number of companies from the Top List in each industry
Example 2: The same bar chart, but sorted (Pareto chart)
Things to remember:
1. The heights of the bars should be proportional to the number of
elements in the categories
2. Don’t forget to label the axes
3. If necessary, sort the categories
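The counting behind Example 2 and its sorted version can be sketched as follows. This is an illustration only: it uses the Industry column of the Example 2 table and draws the bars as text so that it runs without a plotting library.

```python
from collections import Counter

# Industry column of the Example 2 table, in table order.
industries = [
    "Technology", "Technology", "Technology", "Consumer Services",
    "Technology", "Consumer Services", "Technology",
]

# Number of elements in each category.
counts = Counter(industries)

# most_common() lists the categories by decreasing count,
# which is the ordering used in the sorted chart.
for industry, count in counts.most_common():
    # The bar length is proportional to the count; the text on the left
    # plays the role of the axis label.
    print(f"{industry:<18} {'#' * count} {count}")
```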
Bar charts with relative frequencies
Instead of numbers of elements in each category, sometimes it is more
informative to provide proportions of values in each category.
“Numbers of elements” are also called absolute frequencies.
“Proportions of elements” are also called relative frequencies.
Example 1 (with relative frequencies)
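A relative frequency is the absolute frequency divided by the total number of observations. The sketch below uses only the first 30 grades of the Example 1 list to stay short, so its numbers describe that subset and not the full data set.

```python
from collections import Counter

# First 30 grades from the Example 1 list (a subset, to keep the sketch short).
grades = [9, 5, 6, 6, 8, 3, 7, 5, 4, 4, 6, 7, 3, 9, 7, 4, 2, 5, 4, 8,
          6, 5, 4, 5, 6, 1, 7, 5, 5, 4]

absolute = Counter(grades)      # absolute frequencies (counts)
n = len(grades)                 # total number of observations
relative = {g: absolute[g] / n for g in sorted(absolute)}  # proportions

print("grade  count  proportion")
for g, p in relative.items():
    print(f"{g:>5}  {absolute[g]:>5}  {p:>10.3f}")
```

The relative frequencies always sum to 1, which makes charts from data sets of different sizes comparable.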
Histograms
A histogram is like a bar chart, but for numerical data with many
possible values.
Instead of counting the number of values in each category, we count the
number of values in each range of values, for example (a, b].
Example 1: a bar chart looks uninformative for grades 0–100
Example 1: a histogram is much better
Example 2: a histogram of market capitalization
Things to remember
1. Data ranges must be of equal length; no gaps between bars
2. Choose data ranges that make the general shape of the data visible
3. Decide whether the ranges are closed on the right
or closed on the left
(Usually we prefer closed on the right, like a c.d.f.)
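The rules above can be sketched as a small counting routine with equal-width ranges closed on the right. The function name `histogram_counts` and the choice of ranges (400, 500], ..., (800, 900] are assumptions made for illustration; the values are the capitalizations from the Example 2 table.

```python
import math

def histogram_counts(values, low, width, n_bins):
    """Count values in equal-width ranges (low + k*width, low + (k+1)*width]."""
    counts = [0] * n_bins
    for x in values:
        # Right-closed ranges: a value equal to an upper edge belongs
        # to the range that ends at that edge.
        k = math.ceil((x - low) / width) - 1
        if not 0 <= k < n_bins:
            raise ValueError(f"value {x} lies outside the chosen ranges")
        counts[k] += 1
    return counts

# Capitalization values from the Example 2 table.
caps = [851, 719, 703, 701, 496, 470, 464]
counts = histogram_counts(caps, low=400, width=100, n_bins=5)

for k, count in enumerate(counts):
    left, right = 400 + k * 100, 400 + (k + 1) * 100
    print(f"({left}, {right}]  {'#' * count} {count}")
```

Because the ranges are adjacent and of equal length, the bar heights can be compared directly and there are no gaps between bars.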
Bad histograms
Left – no clear picture
Right – unequal ranges
Histograms with relative values
As with bar charts, it is possible to show not the absolute number of
elements in each range, but their proportion.
Normalized histograms and p.d.f.
If a population distribution is continuous, the shape of a histogram
resembles the graph of a p.d.f.
We can scale a histogram, so that the area under the histogram is 1 and
compare with the graph of the p.d.f. with estimated parameters (for
example, the normal p.d.f.).
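The scaling can be sketched as follows: dividing each count by n times the range width makes the total area of the bars equal to 1. The ranges (400, 500], ..., (800, 900] are an assumed choice, the values are the seven capitalizations from Example 2, and the normal distribution is fitted only to illustrate the comparison; seven values are far too few to expect a close match.

```python
from statistics import NormalDist

# Capitalization values from the Example 2 table.
caps = [851, 719, 703, 701, 496, 470, 464]

# Counts of these values in the assumed right-closed ranges
# (400, 500], (500, 600], (600, 700], (700, 800], (800, 900].
low, width = 400, 100
counts = [3, 0, 0, 3, 1]
n = len(caps)

# Scale the bar heights so that the total area of the bars is 1.
densities = [c / (n * width) for c in counts]
area = sum(d * width for d in densities)
print(f"area under the normalized histogram: {area:.3f}")

# Normal p.d.f. with parameters estimated from the data, for comparison.
fitted = NormalDist.from_samples(caps)
for k, d in enumerate(densities):
    mid = low + (k + 0.5) * width  # midpoint of the k-th range
    print(f"{mid:6.0f}  histogram {d:.5f}  normal p.d.f. {fitted.pdf(mid):.5f}")
```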
Statistical Graphs Lecture 1 - statistics for computer major.pptx
Other graphs
Pie chart
Grades for semester 1
Usually, a pie chart is less
informative than a bar chart.
Stem and leaf plots
A frequency distribution can be made more visually appealing by
turning it into a stem and leaf plot. Table 3.8 shows the percentage of
males and females who were literate in 37 African countries in 1994.
Stem Leaf
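The construction of a stem and leaf plot can be sketched as below. The literacy percentages from Table 3.8 are not reproduced in these notes, so the values used here are hypothetical two-digit numbers chosen only to show the mechanics: the tens digit is the stem and the units digit is the leaf.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group two-digit integers by tens digit (stem); units digits are the leaves."""
    stems = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        stems[stem].append(leaf)
    return dict(stems)

# Hypothetical values, NOT the data from Table 3.8.
values = [12, 15, 21, 27, 27, 34, 38, 45]
plot = stem_and_leaf(values)

print("Stem | Leaf")
for stem in sorted(plot):
    print(f"{stem:>4} | {' '.join(str(leaf) for leaf in plot[stem])}")
```

Unlike a histogram, the plot keeps every individual value readable while still showing the shape of the distribution.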
Cumulative frequencies plot
Shows the number of elements (or the proportion of elements) in the data
that are less than or equal to x. It is similar to a c.d.f.
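Cumulative frequencies are running totals of the absolute frequencies. The sketch below again uses only the first 30 grades of the Example 1 list, so the totals describe that subset.

```python
from collections import Counter
from itertools import accumulate

# First 30 grades from the Example 1 list (a subset, to keep the sketch short).
grades = [9, 5, 6, 6, 8, 3, 7, 5, 4, 4, 6, 7, 3, 9, 7, 4, 2, 5, 4, 8,
          6, 5, 4, 5, 6, 1, 7, 5, 5, 4]

counts = Counter(grades)
xs = sorted(counts)  # distinct grades in increasing order

# Number of grades less than or equal to each x (running total of counts).
cumulative = list(accumulate(counts[x] for x in xs))

# Proportion of grades less than or equal to each x.
cumulative_relative = [c / len(grades) for c in cumulative]

for x, c, p in zip(xs, cumulative, cumulative_relative):
    print(f"x = {x:>2}   count <= x: {c:>2}   proportion <= x: {p:.3f}")
```

The last cumulative count equals the number of observations, and the last proportion equals 1, just as a c.d.f. reaches 1.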
Scatter plot
Shows the relation between two variables.
