Homework #3 (February)
Please solve these problems, each in a separate file.
1. For the data 10, 15, 20, 25, 30, 35, 40, make a stemplot; find the mean, median, and five-number summary; do an outlier test; and find the standard deviation (round to 2 decimal places throughout).
2. For the data 3, 5, 7, 9, 11, 12, 15, find the mean, median, and five-number summary; do an outlier test; and find the standard deviation (round to 2 decimal places throughout).
3. Using the North Carolina data on page 39 of the book, find the mean, median, and five-number summary, and do an outlier test.

In-class work #3
1. PowerPoint Ch. 2, slide #6: find the mean, median, five-number summary, and boxplot, and do an outlier test.
2. Book Ch. 2, Example 2.7: find the standard deviation (round to 2 decimals).

Chapter 2
Describing Distributions with Numbers

IN THIS CHAPTER WE COVER…
■ Measuring center: the mean
■ Measuring center: the median
■ Comparing the mean and the median
■ Measuring spread: the quartiles
■ The five-number summary and boxplots
■ Spotting suspected outliers*
■ Measuring spread: the standard deviation
■ Choosing measures of center and spread
■ Using technology
■ Organizing a statistical problem

We saw in Chapter 1 (page 4) that the American Community Survey asks, among much else, workers’ travel times to work. Here are the travel times in minutes for 15 workers in North Carolina, chosen at random by the Census Bureau:[1]
30 20 10 40 25 20 10 60 15 40 5 30 12 10 10
We aren’t surprised that most people estimate their travel time in multiples of 5 minutes. Here is a stemplot of these data:
0 | 5
1 | 000025
2 | 005
3 | 00
4 | 00
5 |
6 | 0
The distribution is single-peaked and right-skewed. The longest travel time (60 minutes) may be an outlier. Our goal in this chapter is to describe with numbers the center and spread of this and other distributions.
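A stemplot like the one above can be produced mechanically. Below is a minimal Python sketch, assuming non-negative whole-number data so that the stem is the tens digit and the leaf is the units digit; the function name `stemplot` is our own and not part of any library. Empty stems (such as 5 here) are kept so that gaps in the data stay visible.

```python
def stemplot(data):
    """Return the lines of a stemplot: stem = tens digit, leaf = units digit.

    Assumes non-negative whole numbers (e.g. travel times in minutes).
    """
    values = sorted(data)
    lines = []
    # Include every stem from the smallest to the largest so gaps show up.
    for stem in range(values[0] // 10, values[-1] // 10 + 1):
        leaves = "".join(str(v % 10) for v in values if v // 10 == stem)
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines


# Travel times (minutes) for the 15 North Carolina workers
nc = [30, 20, 10, 40, 25, 20, 10, 60, 15, 40, 5, 30, 12, 10, 10]
for line in stemplot(nc):
    print(line)
```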
MEASURING CENTER: The Mean
The most common measure of center is the ordinary arithmetic average, or mean.
THE MEAN x̄
To find the mean of a set of observations, add their values and divide by the number of observations. If the n observations are x1, x2, . . . , xn, their mean is
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$
or, in more compact notation,
$$\bar{x} = \frac{1}{n}\sum x_i$$
The Σ (capital Greek sigma) in the formula for the mean is short for “add them all up.” The subscripts on the observations xi are just a way of keeping the n observations distinct. They do not necessarily indicate order or any other special facts about the data. The bar over the x indicates the mean of all the x-values. Pronounce the mean x̄ as “x-bar.” This notation is very common. When writers who are discussing data use x̄ or ȳ, they are talking about a mean.
Don’t hide the outliers
Data from an airliner’s control surfaces, such as the vertical tail rudder, go to cockpit instruments and then to the “black box” flight data recorder. To avoid confusing the pilots, short erratic movements in the data are “smoothed” so that the instruments show overall patterns. When a crash killed 260 people, investigators suspected a catastrophic movement of the tail rudder. But the black box contained only the smoothed data. Sometimes outliers are more important than the overall pattern.

EXAMPLE 2.1 Travel times to work
Data set: NCTRAVELTIME
The mean travel time of our 15 North Carolina workers is
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{30 + 20 + \cdots + 10}{15} = \frac{337}{15} \approx 22.5 \text{ minutes}$$
In practice, you can enter the data into your calculator and ask for the mean. You don’t have to actually add and divide. But you should know that this is what the calculator is doing.
Notice that only 6 of the 15 travel times are larger than the mean. If we leave out the longest single travel time, 60 minutes, the mean for the remaining 14 people is 19.8 minutes. That one observation raises the mean by 2.7 minutes. ■
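The arithmetic in Example 2.1 can be checked with a few lines of Python. This is a minimal sketch using only built-in functions; the variable names are our own. Note that 337/15 is 22.47 to two decimals, which the example rounds to 22.5.

```python
# Travel times (minutes) for the 15 North Carolina workers in Example 2.1
nc = [30, 20, 10, 40, 25, 20, 10, 60, 15, 40, 5, 30, 12, 10, 10]

# Mean: add the values, then divide by the number of observations.
x_bar = sum(nc) / len(nc)              # 337 / 15

# How many observations lie above the mean?
n_above = sum(1 for x in nc if x > x_bar)

# Mean of the remaining 14 workers after dropping the 60-minute trip
rest = [x for x in nc if x != 60]
x_bar_rest = sum(rest) / len(rest)     # 277 / 14

print(round(x_bar, 2))                 # 22.47
print(n_above)                         # 6
print(round(x_bar_rest, 2))            # 19.79
print(round(x_bar - x_bar_rest, 2))    # 2.68
```

Rounded to one decimal, these reproduce the 22.5, 19.8, and 2.7 minutes quoted in the example.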
Example 2.1 illustrates an important fact about the mean as a measure of
center: it is sensitive to the influence of a few extreme observations. These may
be outliers, but a skewed distribution that has no outliers will also pull the mean
toward its long tail. Because the mean cannot resist the influence of extreme
observations, we say that it is not a resistant measure of center.

APPLY YOUR KNOWLEDGE
2.1 Pulling wood apart. Example 1.9 (page 21) gives the breaking strength in pounds of 20 pieces of Douglas fir. Find the mean breaking strength. How many of the pieces of wood have strengths less than the mean? What feature of the stemplot (Figure 1.11, page 22) explains the fact that the mean is smaller than most of the observations?
Data set: WOOD
2.2 Health care spending. Table 1.3 (page 23) gives the 2007 health care expenditure per capita in 35 countries with the highest gross domestic product in 2007. The United States, at $7285 (PPP, international $) per person, is a high outlier. Find the mean health care spending in these nations with and without the United States. How much does the one outlier increase the mean?
Data set: HEALTHCARE
MEASURING CENTER: The Median
In Chapter 1, we used the midpoint of a distribution as an informal measure of
center. The median is the formal version of the midpoint, with a specific rule for
calculation.
THE MEDIAN M
The median M is the midpoint of a distribution, the number such that half the observations are smaller and the other half are larger. To find the median of a distribution:
1. Arrange all observations in order of size, from smallest to largest.
2. If the number of observations n is odd, the median M is the center observation in
the ordered list. If the number of observations n is even, the median M is midway
between the two center observations in the ordered list.
3. You can always locate the median in the ordered list of observations by counting up (n + 1)/2 observations from the start of the list.
Note that the formula (n + 1)/2 does not give the median, just the location of the median in the ordered list. Medians require little arithmetic, so they are easy to find by hand for small sets of data. Arranging even a moderate number of observations in order is very tedious, however, so finding the median by hand for larger sets of data is unpleasant. Even simple calculators have an x̄ button, but you will need to use software or a graphing calculator to automate finding the median.
EXAMPLE 2.2 Finding the median: odd n
What is the median travel time for our 15 North Carolina workers? Here are the data arranged in order, with the center observation in brackets:
5 10 10 10 10 12 15 [20] 20 25 30 30 40 40 60
The count of observations n = 15 is odd. The bracketed 20 is the center observation in the ordered list, with 7 observations to its left and 7 to its right. This is the median, M = 20 minutes.
Because n = 15, our rule for the location of the median gives
$$\text{location of } M = \frac{n + 1}{2} = \frac{16}{2} = 8$$
That is, the median is the 8th observation in the ordered list. It is faster to use this rule than to locate the center by eye. ■
EXAMPLE 2.3 Finding the median: even n
Data set: NYTRAVELTIME
Travel times to work in New York State are (on the average) longer than in North Carolina. Here are the travel times in minutes of 20 randomly chosen New York workers:
10 30 5 25 40 20 10 15 30 20 15 20 85 15 65 15 60 60 40 45
A stemplot not only displays the distribution but makes finding the median easy because it arranges the observations in order:
0 | 5
1 | 005555
2 | 0005
3 | 00
4 | 005
5 |
6 | 005
7 |
8 | 5
The distribution is single-peaked and right-skewed, with several travel times of an hour or more. There is no center observation, but there is a center pair. These are the last 20 and the 25 on the 2 stem (the 10th and 11th observations), which have 9 observations before them in the ordered list and 9 after them. The median is midway between these two observations:
$$M = \frac{20 + 25}{2} = 22.5 \text{ minutes}$$
With n = 20, the rule for locating the median in the list gives
$$\text{location of } M = \frac{n + 1}{2} = \frac{21}{2} = 10.5$$
The location 10.5 means “halfway between the 10th and 11th observations in the ordered list.” That agrees with what we found by eye. ■
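Examples 2.2 and 2.3 follow the same recipe: sort, then count up to location (n + 1)/2. Here is a minimal Python sketch of that rule; the function name `median_by_location` is hypothetical, chosen to avoid confusion with the standard library's `statistics.median`, which gives the same answers for these data.

```python
def median_by_location(data):
    """Median via the location rule (n + 1) / 2 on the ordered list."""
    ordered = sorted(data)
    n = len(ordered)
    location = (n + 1) / 2              # 1-based: 8 for n = 15, 10.5 for n = 20
    if location.is_integer():
        # Odd n: the median is the observation at this position.
        return ordered[int(location) - 1]
    # Even n: halfway between the observations on either side of the location.
    below = ordered[int(location) - 1]  # e.g. the 10th observation
    above = ordered[int(location)]      # e.g. the 11th observation
    return (below + above) / 2


nc = [30, 20, 10, 40, 25, 20, 10, 60, 15, 40, 5, 30, 12, 10, 10]
ny = [10, 30, 5, 25, 40, 20, 10, 15, 30, 20, 15, 20, 85, 15, 65, 15, 60, 60, 40, 45]

print(median_by_location(nc))   # 20
print(median_by_location(ny))   # 22.5
```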
COMPARING THE MEAN AND THE MEDIAN
Examples 2.1 and 2.2 illustrate an important difference between the mean and
the median. The median travel time (the midpoint of the distribution) is 20 minutes. The mean travel time is higher, 22.5 minutes. The mean is pulled toward
the right tail of this right-skewed distribution. The median, unlike the mean, is
resistant. If the longest travel time were 600 minutes rather than 60 minutes, the
mean would increase to more than 58 minutes but the median would not change
at all. The outlier just counts as one observation above the center, no matter how
far above the center it lies. The mean uses the actual value of each observation
and so will chase a single large observation upward. The Mean and Median applet
is an excellent way to compare the resistance of M and x̄.
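The 600-minute thought experiment is easy to run. This sketch uses Python's standard `statistics` module (`mean` and `median`); the list `nc_typo` is a hypothetical version of the North Carolina data in which the 60 was mistyped as 600.

```python
from statistics import mean, median

nc = [30, 20, 10, 40, 25, 20, 10, 60, 15, 40, 5, 30, 12, 10, 10]
# Hypothetical data-entry error: 60 minutes recorded as 600.
nc_typo = [600 if x == 60 else x for x in nc]

print(round(mean(nc), 2), median(nc))            # 22.47 20
print(round(mean(nc_typo), 2), median(nc_typo))  # 58.47 20
```

The mean jumps by about 36 minutes while the median does not move, because the outlier still counts as just one observation above the center.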
COMPARING THE MEAN AND THE MEDIAN
The mean and median of a roughly symmetric distribution are close together. If the distribution is exactly symmetric, the mean and median are exactly the same. In a skewed
distribution, the mean is usually farther out in the long tail than is the median.[2]
Many economic variables have distributions that are skewed to the right. For
example, the median endowment of colleges and universities in the United States
and Canada in 2009 was about $67 million—but the mean endowment was almost
$371 million. Most institutions have modest endowments, but a few are very
wealthy. Harvard’s endowment was over $35 billion.[3] The few wealthy institutions
pull the mean up but do not affect the median. Reports about incomes and other
strongly skewed distributions usually give the median (“midpoint”) rather than the
mean (“arithmetic average”). However, a county that is about to impose a tax of 1%
on the incomes of its residents cares about the mean income, not the median. The
tax revenue will be 1% of total income, and the total is the mean times the number
of residents. The mean and median measure center in different ways, and
both are useful. Don’t confuse the “average” value of a variable (the mean) with
its “typical” value, which we might describe by the median.
APPLY YOUR KNOWLEDGE
2.3 New York travel times. Find the mean of the travel times to work for the 20 New York workers in Example 2.3. Compare the mean and median for these data. What general fact does your comparison illustrate?
Data set: NYTRAVELTIME
2.4 New-house prices. The mean and median sales prices of new homes sold in the United States in November 2010 were $213,000 and $268,700.[4] Which of these numbers is the mean and which is the median? Explain how you know.
2.5 Carbon dioxide emissions. Table 1.6 (page 33) gives the 2007 carbon dioxide (CO2) emissions per person for countries with populations of at least 30 million. Find the mean and the median for these data. Make a histogram of the data. What features of the distribution explain why the mean is larger than the median?
Data set: CO2EMISSIONS
MEASURING SPREAD: The Quartiles
The mean and median provide two different measures of the center of a distribution. But a measure of center alone can be misleading. The Census Bureau reports that in 2009 the median income of American households was $49,777. Half of all households had incomes below $49,777, and half had higher incomes. The mean
was much higher, $67,976, because the distribution of incomes is skewed to the
right. But the median and mean don’t tell the whole story. The bottom 10% of
households had incomes less than $12,120, and households in the top 5% took
in more than $180,001.[5] We are interested in the spread or variability of
incomes as well as their center. The simplest useful numerical description of
a distribution requires both a measure of center and a measure of spread.
One way to measure spread is to give the smallest and largest observations. For
example, the travel times of our 15 North Carolina workers range from 5 minutes
to 60 minutes. These single observations show the full spread of the data, but
they may be outliers. We can improve our description of spread by also looking at
the spread of the middle half of the data. The quartiles mark out the middle half.
Count up the ordered list of observations, starting from the smallest. The first
quartile lies one-quarter of the way up the list. The third quartile lies three-quarters
of the way up the list. In other words, the first quartile is larger than 25% of the
observations, and the third quartile is larger than 75% of the observations. The
second quartile is the median, which is larger than 50% of the observations. That
is the idea of quartiles. We need a rule to make the idea exact. The rule for calculating the quartiles uses the rule for the median.
THE QUARTILES Q1 AND Q3
To calculate the quartiles:
1. Arrange the observations in increasing order and locate the median M in the
ordered list of observations.
2. The first quartile Q1 is the median of the observations whose position in the
ordered list is to the left of the location of the overall median.
3. The third quartile Q3 is the median of the observations whose position in the
ordered list is to the right of the location of the overall median.
Here are examples that show how the rules for the quartiles work for both odd
and even numbers of observations.
EXAMPLE 2.4 Finding the quartiles: odd n
Our North Carolina sample of 15 workers’ travel times, arranged in increasing order, is
5 10 10 10 10 12 15 [20] 20 25 30 30 40 40 60
There is an odd number of observations, so the median is the middle one, the bracketed 20 in the list. The first quartile is the median of the 7 observations to the left of the median. This is the 4th of these 7 observations, so Q1 = 10 minutes. If you want, you can use the rule for the location of the median with n = 7:
$$\text{location of } Q_1 = \frac{n + 1}{2} = \frac{7 + 1}{2} = 4$$
The third quartile is the median of the 7 observations to the right of the median, Q3 = 30 minutes. When there is an odd number of observations, leave out the overall median when you locate the quartiles in the ordered list.
The quartiles are resistant because they are not affected by a few extreme observations. For example, Q3 would still be 30 if the outlier were 600 rather than 60. ■
EXAMPLE 2.5 Finding the quartiles: even n
Here are the travel times to work of the 20 New York workers from Example 2.3,
arranged in increasing order:
5 10 10 15 15 15 15 20 20 20 | 25 30 30 40 40 45 60 60 65 85
There is an even number of observations, so the median lies midway between the middle pair, the 10th and 11th in the list. Its value is M = 22.5 minutes. We have marked the location of the median by |. The first quartile is the median of the first 10 observations, because these are the observations to the left of the location of the median. Check that Q1 = 15 minutes and Q3 = 42.5 minutes. When the number of observations is even, include all the observations when you locate the quartiles. ■
Be careful when, as in these examples, several observations take the same
numerical value. Write down all of the observations, arrange them in order, and
apply the rules just as if they all had distinct values.
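The quartile rule can be written directly in terms of medians. Below is a minimal Python sketch, assuming the convention described above (for odd n, the overall median is left out of both halves); `statistics.median` is from Python's standard library, while `quartiles` is our own helper. Software packages use several slightly different quartile conventions, so their output may not always match this rule.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, Q3): medians of the observations below and above M."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]              # positions left of the median's location
    upper = ordered[half + n % 2:]      # for odd n, skip the median itself
    return median(lower), median(upper)


nc = [30, 20, 10, 40, 25, 20, 10, 60, 15, 40, 5, 30, 12, 10, 10]
ny = [10, 30, 5, 25, 40, 20, 10, 15, 30, 20, 15, 20, 85, 15, 65, 15, 60, 60, 40, 45]

print(quartiles(nc))   # (10, 30)
print(quartiles(ny))   # (15.0, 42.5)
```

Repeated values need no special handling: sorting keeps every copy, exactly as the text recommends.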
THE FIVE-NUMBER SUMMARY AND BOXPLOTS
The smallest and largest observations tell us little about the distribution as a
whole, but they give information about the tails of the distribution that is missing
if we know only the median and the quartiles. To get a quick summary of both
center and spread, combine all five numbers.
THE FIVE-NUMBER SUMMARY
The five-number summary of a distribution consists of the smallest observation, the
first quartile, the median, the third quartile, and the largest observation, written in
order from smallest to largest. In symbols, the five-number summary is
Minimum   Q1   M   Q3   Maximum
These five numbers offer a reasonably complete description of center and
spread. The five-number summaries of travel times to work from Examples 2.4
and 2.5 are
                  Minimum   Q1   M      Q3     Maximum
North Carolina    5         10   20     30     60
New York          5         15   22.5   42.5   85
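Putting the pieces together, a five-number summary takes only a few lines. This is a sketch under the same quartile convention as Examples 2.4 and 2.5; the helper name `five_number_summary` is hypothetical.

```python
from statistics import median


def five_number_summary(data):
    """Return (minimum, Q1, M, Q3, maximum) using the textbook quartile rule."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    q1 = median(ordered[:half])             # median of the lower half
    q3 = median(ordered[half + n % 2:])     # median of the upper half
    return ordered[0], q1, median(ordered), q3, ordered[-1]


nc = [30, 20, 10, 40, 25, 20, 10, 60, 15, 40, 5, 30, 12, 10, 10]
ny = [10, 30, 5, 25, 40, 20, 10, 15, 30, 20, 15, 20, 85, 15, 65, 15, 60, 60, 40, 45]

print(five_number_summary(nc))   # (5, 10, 20, 30, 60)
print(five_number_summary(ny))   # (5, 15.0, 22.5, 42.5, 85)
```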
FIGURE 2.1 Boxplots comparing the travel times to work of samples of workers in North Carolina and New York. (Vertical axis: travel time to work, in minutes, from 0 to 90. The New York boxplot is annotated Minimum = 5, First quartile = 15, Median = 22.5, Third quartile = 42.5, Maximum = 85.)
The five-number summary of a distribution leads to a new graph, the boxplot.
Figure 2.1 shows boxplots comparing travel times to work in North Carolina
and New York.
BOXPLOT
A boxplot is a graph of the five-number summary.
■ A central box spans the quartiles Q1 and Q3.
■ A line in the box marks the median M.
■ Lines extend from the box out to the smallest and largest observations.
Because boxplots show less detail than histograms or stemplots, they are best
used for side-by-side comparison of more than one distribution, as in Figure 2.1.
Be sure to include a numerical scale in the graph. When you look at a boxplot,
first locate the median, which marks the center of the distribution. Then look
at the spread. The span of the central box shows the spread of the middle half
of the data, and the extremes (the smallest and largest observations) show the
spread of the entire data set. We see from Figure 2.1 that travel times to work are
in general a bit longer in New York than in North Carolina. The median, both
quartiles, and the maximum are all larger in New York. New York travel times are
also more variable, as shown by the span of the box and the spread between the
extremes. Note that the boxes with arrows in Figure 2.1 that indicate the location
of the five-number summary are not part of the boxplot, but are included purely
for illustration.
Finally, the New York data are more strongly right-skewed. In a symmetric distribution, the first and third quartiles are equally distant from the median. In most
distributions that are skewed to the right, on the other hand, the third quartile
will be farther above the median than the first quartile is below it. The extremes
behave the same way, but remember that they are just single observations and
may say little about the distribution as a whole.
APPLY YOUR KNOWLEDGE
2.6 The Pittsburgh Steelers. The 2010 roster of the Pittsburgh Steelers professional football team included 7 defensive linemen and 9 offensive linemen. The weights in pounds of the defensive linemen were
305 325 305 300 285 280 298
and the weights of the offensive linemen were
315 304 319 338 324 325 304 344 318
(a) Make a stemplot of the weights of the defensive linemen and find the five-number summary.
(b) Make a stemplot of the weights of the offensive linemen and find the five-number summary.
(c) Does either group contain one or more clear outliers? Which group of players tends to be heavier?
Data set: STEELERS
2.7 Fuel economy for midsize cars. The Department of Energy provides fuel economy ratings for all cars and light trucks sold in the United States. Here are the estimated miles per gallon for city driving for the 129 cars classified as midsize in 2010, arranged in increasing order:[6]
9 10 10 11 11 11 12 13 14 14 15 15 15 15 15
15 15 16 16 16 16 16 16 16 16 16 16 16 16 16
16 16 16 16 17 17 17 17 17 17 17 17 17 17 17
17 17 17 17 18 18 18 18 18 18 18 18 18 18 18
18 18 18 18 18 18 18 18 18 18 18 18 19 19 19
19 19 19 19 19 19 19 19 20 20 20 21 21 21 21
21 22 22 22 22 22 22 22 22 22 22 22 22 22 22
22 22 22 23 23 23 23 24 24 24 25 26 26 26 26
26 26 26 28 33 35 41 41 51
(a) Give the five-number summary of this distribution.
(b) Draw a boxplot of these data. What is the shape of the distribution shown by the boxplot? Which features of the boxplot led you to this conclusion? Are any observations unusually small or large?
Data set: MIDSIZECARS
How much is that house worth?
The town of Manhattan, Kansas, is sometimes called “the Little Apple” to distinguish it from that other Manhattan, “the Big Apple.” A few years ago, a house there appeared in the county appraiser’s records valued at $200,059,000. That would be quite a house even on Manhattan Island. As you might guess, the entry was wrong: the true value was $59,500. But before the error was discovered, the county, the city, and the school board had based their budgets on the total appraised value of real estate, which the one outlier jacked up by 6.5%. It can pay to spot outliers before you trust your data.
SPOTTING SUSPECTED OUTLIERS*
Look again at the stemplot of travel times to work in New York in Example 2.3.
The five-number summary for this distribution is
5   15   22.5   42.5   85
How shall we describe the spread of this distribution? The smallest and largest
observations are extremes that don’t describe the spread of the majority of the
data. The distance between the quartiles (the range of the center half of the
data) is a more resistant measure of spread. This distance is called the interquartile
range.
THE INTERQUARTILE RANGE IQR
The interquartile range IQR is the distance between the first and third quartiles,
$$IQR = Q_3 - Q_1$$
For our data on New York travel times, IQR = 42.5 − 15 = 27.5 minutes. However, no single numerical measure of spread, such as IQR, is very useful for describing skewed distributions. The two sides of a skewed distribution
have different spreads, so one number can’t summarize them. That’s why we give
the full five-number summary. The interquartile range is mainly used as the basis
for a rule of thumb for identifying suspected outliers. In some software, suspected
outliers are identified in a boxplot with a special plotting symbol such as *.
THE 1.5 × IQR RULE FOR OUTLIERS
Call an observation a suspected outlier if it falls more than 1.5 × IQR above the third quartile or below the first quartile.
EXAMPLE 2.6 Using the 1.5 × IQR rule
For the New York travel time data, IQR = 27.5 and
$$1.5 \times IQR = 1.5 \times 27.5 = 41.25$$
Any values not falling between
$$Q_1 - (1.5 \times IQR) = 15.0 - 41.25 = -26.25 \quad\text{and}\quad Q_3 + (1.5 \times IQR) = 42.5 + 41.25 = 83.75$$
are flagged as suspected outliers. Look again at the stemplot in Example 2.3: the only suspected outlier is the longest travel time, 85 minutes. The 1.5 × IQR rule suggests that the three next-longest travel times (60, 60, and 65 minutes) are just part of the long right tail of this skewed distribution. ■
*This short section is optional.
The 1.5 × IQR rule is not a replacement for looking at the data. It is most useful when large volumes of data are scanned automatically.
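Because the 1.5 × IQR rule is most useful for automatic scanning, here is a minimal Python sketch of it. It assumes the same quartile convention as Examples 2.4 and 2.5; `suspected_outliers` is a hypothetical helper name, and an observation is flagged only if it lies strictly outside the two fences.

```python
from statistics import median


def suspected_outliers(data):
    """Apply the 1.5 x IQR rule; return the two fences and the flagged values."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    q1 = median(ordered[:half])
    q3 = median(ordered[half + n % 2:])
    iqr = q3 - q1
    low = q1 - 1.5 * iqr     # below this: suspected low outlier
    high = q3 + 1.5 * iqr    # above this: suspected high outlier
    flagged = [x for x in ordered if x < low or x > high]
    return low, high, flagged


# New York travel times from Example 2.3
ny = [10, 30, 5, 25, 40, 20, 10, 15, 30, 20, 15, 20, 85, 15, 65, 15, 60, 60, 40, 45]
print(suspected_outliers(ny))   # (-26.25, 83.75, [85])
```

The fences match Example 2.6, and only the 85-minute trip is flagged.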
APPLY YOUR KNOWLEDGE
2.8 Travel time to work. In Example 2.1, we noted the influence of one long travel time of 60 minutes in our sample of 15 North Carolina workers. Does the 1.5 × IQR rule identify this travel time as a suspected outlier?
2.9 Fuel economy for midsize cars. Exercise 2.7 gives the estimated miles per gallon (mpg) for city driving for the 129 cars classified as midsize in 2010. In that exercise we noted that several of the mpg values were unusually large. Which of these are suspected outliers by the 1.5 × IQR rule? While outliers can be produced by errors or incorrectly recorded observations, they are often observations that differ from the others in some particular way. In this case, the cars producing the high outliers share a common feature. What do you think that is?
Data set: MIDSIZECARS
MEASURING SPREAD: The Standard Deviation
The five-number summary is not the most common numerical description of a
distribution. That distinction belongs to the combination of the mean to measure
center and the standard deviation to measure spread. The standard deviation and
its close relative, the variance, measure spread by looking at how far the observations are from their mean.
THE STANDARD DEVIATION s
The variance s² of a set of observations is an average of the squares of the deviations of the observations from their mean. In symbols, the variance of n observations x1, x2, . . ., xn is
$$s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}$$
or, more compactly,
$$s^2 = \frac{1}{n - 1}\sum (x_i - \bar{x})^2$$
The standard deviation s is the square root of the variance s²:
$$s = \sqrt{\frac{1}{n - 1}\sum (x_i - \bar{x})^2}$$
In practice, use software or your calculator to obtain the standard deviation
from keyed-in data. Doing an example step-by-step will help you understand how
the variance and standard deviation work, however.
EXAMPLE 2.7 Calculating the standard deviation
Data set: SATCR
Georgia Southern University had 2417 students with regular admission in their freshman class of 2010. For each student, data are available on their SAT and ACT scores (if taken), high school GPA, and the college within the university to which they were admitted.[7] In Exercise 3.49, the full data set for the SAT Critical Reading scores will be examined. Here are the first five observations from that data set:
650 490 580 450 570
We will compute x̄ and s for these students. First find the mean:
$$\bar{x} = \frac{650 + 490 + 580 + 450 + 570}{5} = \frac{2740}{5} = 548$$
Figure 2.2 displays the data as points above the number line, with their mean marked
by an asterisk (*). The arrows mark two of the deviations from the mean. The deviations show how spread out the data are about their mean. They are the starting point
for calculating the variance and the standard deviation.
Observations xi   Deviations xi − x̄   Squared deviations (xi − x̄)²
650               650 − 548 = 102     102² = 10,404
490               490 − 548 = −58     (−58)² = 3,364
580               580 − 548 = 32      32² = 1,024
450               450 − 548 = −98     (−98)² = 9,604
570               570 − 548 = 22      22² = 484
                  sum = 0             sum = 24,880
The variance is the sum of the squared deviations divided by one less than the number of observations:
$$s^2 = \frac{1}{n - 1}\sum (x_i - \bar{x})^2 = \frac{24{,}880}{4} = 6220$$
The standard deviation is the square root of the variance:
$$s = \sqrt{6220} \approx 78.87$$
■
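The table in Example 2.7 maps line for line onto a short Python sketch. Only the standard `math` module is used; the variable names are our own.

```python
import math

# First five SAT Critical Reading scores from Example 2.7
scores = [650, 490, 580, 450, 570]
n = len(scores)

x_bar = sum(scores) / n                        # 2740 / 5 = 548.0
deviations = [x - x_bar for x in scores]       # 102, -58, 32, -98, 22
squared = [d ** 2 for d in deviations]         # 10404, 3364, 1024, 9604, 484
variance = sum(squared) / (n - 1)              # 24880 / 4 = 6220.0
s = math.sqrt(variance)

print(x_bar, sum(deviations))   # 548.0 0.0
print(variance, round(s, 2))    # 6220.0 78.87
```

Printing `sum(deviations)` confirms that the deviations from the mean add to zero, which is the fact used in the degrees-of-freedom argument that follows.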
Notice that the “average” in the variance s² divides the sum by one fewer than the number of observations, that is, n − 1 rather than n. The reason is that the deviations xi − x̄ always sum to exactly 0, so that knowing n − 1 of them determines the last one. Only n − 1 of the squared deviations can vary freely, and we average by dividing the total by n − 1. The number n − 1 is called the degrees of freedom of the variance or standard deviation. Some calculators offer a choice between dividing by n and dividing by n − 1, so be sure to use n − 1.
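Python's standard `statistics` module offers both choices the paragraph warns about: `stdev` divides by n − 1 (the standard deviation used in this chapter), while `pstdev` divides by n. A short comparison on the Example 2.7 scores:

```python
from statistics import pstdev, stdev

scores = [650, 490, 580, 450, 570]   # Example 2.7

s_sample = stdev(scores)   # sqrt(24880 / (n - 1)) = sqrt(6220)
s_pop = pstdev(scores)     # sqrt(24880 / n) = sqrt(4976): not what this chapter uses

print(round(s_sample, 2))  # 78.87
print(round(s_pop, 2))     # 70.54
```

If a tool reports 70.54 for these five scores, it is dividing by n.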

FIGURE 2.2 SAT Critical Reading scores for five students, with their mean (*) and the deviations of two observations from the mean shown, for Example 2.7. (Horizontal axis: SAT Critical Reading score, from 400 to 700. The mean x̄ = 548 is marked; the deviations shown are 102 for x = 650 and −58 for x = 490.)
More important than the details of hand calculation are the properties that determine the usefulness of the standard deviation:
■ s measures spread about the mean and should be used only when the mean is chosen as the measure of center.
■ s is always zero or greater than zero. s = 0 only when there is no spread. This happens only when all observations have the same value. Otherwise, s > 0. As the observations become more spread out about their mean, s gets larger.
■ s has the same units of measurement as the original observations. For example, if you measure weight in kilograms, both the mean x̄ and the standard deviation s are also in kilograms. This is one reason to prefer s to the variance s², which would be in squared kilograms.
■ Like the mean x̄, s is not resistant. A few outliers can make s very large.
The use of squared deviations renders s even more sensitive than x̄ to a few extreme observations. For example, the standard deviation of the travel times for the 15 North Carolina workers in Example 2.1 is 15.23 minutes. (Use your calculator or software to verify this.) If we omit the high outlier, the standard deviation drops to 11.56 minutes.
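One way to do the suggested verification in software is with `statistics.stdev` from Python's standard library, which divides by n − 1 as in this chapter:

```python
from statistics import stdev

# Travel times (minutes) for the 15 North Carolina workers in Example 2.1
nc = [30, 20, 10, 40, 25, 20, 10, 60, 15, 40, 5, 30, 12, 10, 10]
without_60 = [x for x in nc if x != 60]   # drop the high outlier

print(round(stdev(nc), 2))          # 15.23
print(round(stdev(without_60), 2))  # 11.56
```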
If you feel that the importance of the standard deviation is not yet clear, you
are right. We will see in Chapter 3 that the standard deviation is the natural measure of spread for a very important class of symmetric distributions, the Normal
distributions. The usefulness of many statistical procedures is tied to distributions
of particular shapes. This is certainly true of the standard deviation.
CHOOSING MEASURES OF CENTER AND SPREAD
We now have a choice between two descriptions of the center and spread
of a distribution: the five-number summary, or x̄ and s. Because x̄ and s are
sensitive to extreme observations, they can be misleading when a distribution
is strongly skewed or has outliers. In fact, because the two sides of a skewed
distribution have different spreads, no single number describes the spread well.
The five-number summary, with its two quartiles and two extremes, does a
better job.
CHOOSING A SUMMARY
The five-number summary is usually better than the mean and standard deviation for
describing a skewed distribution or a distribution with strong outliers. Use x̄ and s
only for reasonably symmetric distributions that are free of outliers.
Outliers can greatly affect the values of the mean x̄ and the standard deviation
s, the most common measures of center and spread. Many more elaborate statistical procedures also can’t be trusted when outliers are present. Whenever
you find outliers in your data, try to find an explanation for them. Sometimes
the explanation is as simple as a typing error, such as typing 10.1 as 101.
Sometimes a measuring device broke down or a subject gave a frivolous response,
like the student in a class survey who claimed to study 30,000 minutes per night.
(Yes, that really happened.) In all these cases, you can simply remove the outlier
from your data. When outliers are “real data,” like the long travel times of some
New York workers, you should choose statistical methods that are not greatly
disturbed by the outliers. For example, use the five-number summary rather than
x̄ and s to describe a distribution with extreme outliers. We will meet other
examples later in the book.
Remember that a graph gives the best overall picture of a distribution. If
data have been entered into a calculator or statistical program, it is very
simple and quick to create several graphs to see all the different features
of a distribution. Numerical measures of center and spread report specific facts
about a distribution, but they do not describe its entire shape. Numerical summaries do not disclose the presence of multiple peaks or clusters, for example.
Exercise 2.11 shows how misleading numerical summaries can be. Always plot
your data.
APPLY YOUR KNOWLEDGE
2.10 x̄ and s by hand. Radon is a naturally occurring gas and is the second leading
cause of lung cancer in the United States.[8] It comes from the natural breakdown
of uranium in the soil and enters buildings through cracks and other holes in the
foundations. Found throughout the United States, levels vary considerably from
state to state. There are several methods to reduce the levels of radon in your
home, and the Environmental Protection Agency recommends using one of these if
the measured level in your home is above 4 picocuries per liter. Four readings from
Franklin County, Ohio, where the county average is 9.32 picocuries per liter, were
5.2, 13.8, 8.6, and 16.8.
(a) Find the mean step-by-step. That is, find the sum of the 4 observations and
divide by 4.
(b) Find the standard deviation step-by-step. That is, find the deviation of each
observation from the mean, square the deviations, then obtain the variance
and the standard deviation. Example 2.7 shows the method.
(c) Now enter the data into your calculator and use the mean and standard
deviation buttons to obtain x̄ and s. Do the results agree with your hand
calculations?
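Parts (a) and (b) follow the hand method of Example 2.7. As a minimal sketch, assuming Python 3 and only its standard library, the same steps can be written one line per step. The function name `mean_and_sd_by_hand` is our own (hypothetical), and `statistics.mean` and `statistics.stdev` stand in for the calculator buttons of part (c).

```python
import math
import statistics

def mean_and_sd_by_hand(xs):
    """Hand method: mean, deviations, squared deviations,
    variance with divisor n - 1, then the square root."""
    n = len(xs)
    mean = sum(xs) / n                       # (a) sum of observations / n
    deviations = [x - mean for x in xs]      # (b) deviation of each observation
    squared = [d ** 2 for d in deviations]   # square the deviations
    variance = sum(squared) / (n - 1)        # sample variance s^2
    return mean, math.sqrt(variance)         # standard deviation s

radon = [5.2, 13.8, 8.6, 16.8]  # Franklin County readings, picocuries per liter
x_bar, s = mean_and_sd_by_hand(radon)
print("by hand:", round(x_bar, 2), round(s, 2))

# (c) the library's "buttons" should agree with the hand calculation
print("library:", round(statistics.mean(radon), 2), round(statistics.stdev(radon), 2))
```

Dividing by n - 1 rather than n matches the definition of the sample variance used in this chapter; `statistics.pvariance` would divide by n instead.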
2.11 x̄ and s are not enough. The mean x̄ and standard deviation s measure
center and spread but are not a complete description of a distribution. Data
sets with different shapes can have the same mean and standard deviation.
To demonstrate this fact, use your calculator to find x̄ and s for these two small
data sets. Then make a stemplot of each and comment on the shape of each
distribution.
2DATASETS
Data A   9.14   8.14   8.74   8.77   9.26   8.10   6.13   3.10   9.13   7.26    4.74
Data B   6.58   5.76   7.71   8.84   8.47   7.04   5.25   5.56   7.91   6.89   12.50
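A quick way to see the point of Exercise 2.11 is to let software do the arithmetic. The sketch below (Python 3 standard library; the variable names are our own) prints x̄ and s for both data sets. The two means agree to two decimal places and the standard deviations nearly agree, even though the stemplots look quite different.

```python
import statistics

# The two data sets from Exercise 2.11
data_a = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
data_b = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 5.56, 7.91, 6.89, 12.50]

for name, data in [("A", data_a), ("B", data_b)]:
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation, divisor n - 1
    print(f"Data {name}: mean = {x_bar:.2f}, s = {s:.2f}")
```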
2.12 Choose a summary. The shape of a distribution is a rough guide to whether the
mean and standard deviation are a helpful summary of center and spread. For which
of the following distributions would x̄ and s be useful? In each case, give a reason for
your decision.
(a) Percents of high school graduates in the states taking the SAT, Figure 1.8
(page 18)
(b) Iowa Test scores, Figure 1.7 (page 17)
(c) New York travel times, Figure 2.1 (page 46)
USING TECHNOLOGY
Although a calculator with “two-variable statistics’’ functions will do the basic
calculations we need, more elaborate tools are helpful. Graphing calculators and
computer software will do calculations and make graphs as you command, freeing
you to concentrate on choosing the right methods and interpreting your results.
Figure 2.3 displays output describing the travel times to work of 20 people in
New York State (Example 2.3). Can you find x̄, s, and the five-number summary
in each output? The big message of this section is: once you know what to look for,
you can read output from any technological tool.
The displays in Figure 2.3 come from a Texas Instruments graphing calculator, the Minitab and CrunchIt! statistical programs, and the Microsoft
Excel spreadsheet program. Minitab allows you to choose what descriptive
measures you want, while the descriptive measures in the CrunchIt! output
are provided by default. Excel and the calculator give some things we don’t
need. Just ignore the extras. Excel’s “Descriptive Statistics’’ menu item
doesn’t give the quartiles. We used the spreadsheet’s separate quartile function to get Q1 and Q3.
[Texas Instruments Graphing Calculator: screen image]
Minitab, Descriptive Statistics: NYtime
Variable  Total Count  Mean   StDev  Variance  Minimum  Q1     Median  Q3     Maximum
NYtime    20           31.25  21.88  478.62    5.00     15.00  22.50   43.75  85.00
CrunchIt! (Export), NYtime
n 20; Sample Mean 31.25; Median 22.50; Standard Deviation 21.88; Max 85; Min 5; Q1 15; Q3 43.75
Microsoft Excel (column A: minutes)
Mean 31.25; Standard Error 4.891924064; Median 22.5; Mode 15; Standard Deviation 21.8773495; Sample Variance 478.6184211; Kurtosis 0.329884126; Skewness 1.040110836; Range 80; Minimum 5; Maximum 85; Sum 625; Count 20
QUARTILE(A2:A21,1) = 15; QUARTILE(A2:A21,3) = 42.5
FIGURE 2.3
Output from a graphing calculator, two statistical software packages, and a spreadsheet program
describing the data on travel times to work in New York State.
EXAMPLE 2.8 What is the third quartile?
In Example 2.5, we saw that the quartiles of the New York travel times are Q1 = 15
and Q3 = 42.5. Look at the output displays in Figure 2.3. The calculator and Excel
agree with our work. Minitab and CrunchIt! say that Q3 = 43.75. What
happened? There are several rules for finding the quartiles. Some calculators and
software use rules that give results different from ours for some sets of data. This is
true of Minitab, CrunchIt!, and also Excel, though Excel agrees with our work in this
example. Results from the various rules are always close to each other, so the differences
are never important in practice. Our rule is the simplest for hand calculation. ■
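To see how two quartile rules can disagree, here is a minimal Python sketch. The hypothetical helper `quartiles_book_rule` implements the rule used in this book: Q1 and Q3 are the medians of the lower and upper halves of the ordered data, with the overall median left out when n is odd. For contrast, `statistics.quantiles(..., method="exclusive")` from Python's standard library interpolates at position (n + 1)p in the ordered list. That is one common alternative rule; whether a particular package such as Minitab or CrunchIt! uses exactly this rule is an assumption we do not check here. The swim times of Exercise 2.15 serve only as sample data.

```python
import statistics

def median_of_sorted(values):
    """Median of an already sorted list."""
    n = len(values)
    mid = n // 2
    return values[mid] if n % 2 == 1 else (values[mid - 1] + values[mid]) / 2

def quartiles_book_rule(xs):
    """Q1 and Q3 as medians of the lower and upper halves of the ordered data;
    when n is odd the overall median belongs to neither half."""
    data = sorted(xs)
    n = len(data)
    return median_of_sorted(data[: n // 2]), median_of_sorted(data[(n + 1) // 2 :])

# Swim times in seconds (Exercise 2.15), used only as sample data
swim = [151.6, 173.2, 165.1, 177.6, 159.2, 174.3, 163.5, 164.1, 174.8, 171.4]

q1, q3 = quartiles_book_rule(swim)
print("halves rule:", q1, q3)

# Standard-library alternative: linear interpolation at position (n + 1)p
q1_alt, _, q3_alt = statistics.quantiles(swim, n=4, method="exclusive")
print("(n + 1)p interpolation:", round(q1_alt, 3), round(q3_alt, 3))
```

With these 10 observations the halves rule gives Q1 = 163.5 and Q3 = 174.3, while the interpolation rule gives 162.425 and 174.425: different, but close.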
ORGANIZING A STATISTICAL PROBLEM
Most of our examples and exercises have aimed to help you learn basic tools
(graphs and calculations) for describing and comparing distributions. You have
also learned principles that guide use of these tools, such as “start with a graph’’ and
“look for the overall pattern and striking deviations from the pattern.’’ The data
you work with are not just numbers—they describe specific settings such as water
depth in the Everglades or travel time to work. Because data come from a specific
setting, the final step in examining data is a conclusion for that setting. Water depth
in the Everglades has a yearly cycle that reflects Florida’s wet and dry seasons.
Travel times to work are generally longer in New York than in North Carolina.
As you learn more statistical tools and principles, you will face more complex
statistical problems. Although no framework accommodates all the varied issues
that arise in applying statistics to real settings, the following four-step thought
process gives useful guidance. In particular, the first and last steps emphasize that
statistical problems are tied to specific real-world settings and therefore involve
more than doing calculations and making graphs.
ORGANIZING A STATISTICAL PROBLEM:
A Four-Step Process
STATE: What is the practical question, in the context of the real-world setting?
PLAN: What specific statistical operations does this problem call for?
SOLVE: Make the graphs and carry out the calculations needed for this problem.
CONCLUDE: Give your practical conclusion in the setting of the real-world
problem.
To help you master the basics, many exercises will continue to tell you what to
do—make a histogram, find the five-number summary, and so on. Real statistical
problems don’t come with detailed instructions. From now on, especially in the
later chapters of the book, you will meet some exercises that are more realistic.
Use the four-step process as a guide to solving and reporting these problems. They
are marked with the four-step icon, as the following example illustrates.
EXAMPLE 2.9 Comparing tropical flowers
STATE: Ethan Temeles of Amherst College, with his colleague W. John Kress, studied
the relationship between varieties of the tropical flower Heliconia on the island of
Dominica and the different species of hummingbirds that fertilize the flowers.9 Over
time, the researchers believe, the lengths of the flowers and the forms of the hummingbirds’ beaks have evolved to match each other. If that is true, flower varieties fertilized
by different hummingbird species should have distinct distributions of length.
Table 2.1 gives length measurements (in millimeters) for samples of three varieties
of Heliconia, each fertilized by a different species of hummingbird. Do the three varieties display distinct distributions of length? How do the mean lengths compare?
PLAN: Use graphs and numerical descriptions to describe and compare these three
distributions of flower length.
SOLVE: We might use boxplots to compare the distributions, but stemplots preserve
more detail and work well for data sets of these sizes. Figure 2.4 displays stemplots with
the stems lined up for easy comparison. The lengths have been rounded to the nearest
tenth of a millimeter. The bihai and red varieties have somewhat skewed distributions,
so we might choose to compare the five-number summaries. But because the researchers plan to use x̄ and s for further analysis, we instead calculate these measures:
TROPICALFLOWER
Variety   Mean length   Standard deviation
bihai     47.60         1.213
red       39.71         1.799
yellow    36.18         0.975
CONCLUDE: The three varieties differ so much in flower length that there is little overlap among them. In particular, the flowers of bihai are longer than either red or yellow.
The mean lengths are 47.6 mm for H. bihai, 39.7 mm for H. caribaea red, and 36.2 mm
for H. caribaea yellow. ■
TABLE 2.1
Flower lengths (millimeters) for three Heliconia varieties
H. BIHAI
47.12 48.07 46.75 48.34 46.81 48.15 47.12 50.26
46.67 50.12 47.43 46.34 46.44 46.94 46.64 48.36
H. CARIBAEA RED
41.69 37.40 37.78 39.78 38.20 38.01 40.57 38.07
41.90 39.63 38.10 42.01 42.18 37.97 41.93 40.66
38.79 43.09 37.87 38.23 41.47 39.16 38.87
H. CARIBAEA YELLOW
35.45 34.57 38.13 34.63 37.10 36.78 35.17 37.02
36.82 36.52 36.66 36.11 35.68 36.03 36.03
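The numerical summaries in Example 2.9 can be recomputed from Table 2.1. Here is a minimal Python sketch using the standard library's `statistics` module; the dictionary layout and print format are our own choices.

```python
import statistics

# Flower lengths in millimeters, Table 2.1
lengths = {
    "bihai": [47.12, 48.07, 46.75, 48.34, 46.81, 48.15, 47.12, 50.26,
              46.67, 50.12, 47.43, 46.34, 46.44, 46.94, 46.64, 48.36],
    "red": [41.69, 37.40, 37.78, 39.78, 38.20, 38.01, 40.57, 38.07,
            41.90, 39.63, 38.10, 42.01, 42.18, 37.97, 41.93, 40.66,
            38.79, 43.09, 37.87, 38.23, 41.47, 39.16, 38.87],
    "yellow": [35.45, 34.57, 38.13, 34.63, 37.10, 36.78, 35.17, 37.02,
               36.82, 36.52, 36.66, 36.11, 35.68, 36.03, 36.03],
}

print(f"{'Variety':<8}{'Mean length':>13}{'Std. dev.':>11}")
for variety, xs in lengths.items():
    # x-bar to 2 decimals and s to 3 decimals, as in Example 2.9
    print(f"{variety:<8}{statistics.mean(xs):>13.2f}{statistics.stdev(xs):>11.3f}")
```

Rounded as in the example, the output should match the table of means and standard deviations given in the Solve step.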
stem | bihai         | red             | yellow
34   |               |                 | 6 6
35   |               |                 | 2 5 7
36   |               |                 | 0 0 1 5 7 8 8
37   |               | 4 8 9           | 0 1
38   |               | 0 0 1 1 2 2 8 9 | 1
39   |               | 2 6 8           |
40   |               | 6 7             |
41   |               | 5 7 9 9         |
42   |               | 0 2             |
43   |               | 1               |
44   |               |                 |
45   |               |                 |
46   | 3 4 6 7 8 8 9 |                 |
47   | 1 1 4         |                 |
48   | 1 2 3 4       |                 |
49   |               |                 |
50   | 1 3           |                 |
FIGURE 2.4
Stemplots comparing the distributions of flower lengths from Table 2.1,
for Example 2.9. The stems are whole
millimeters and the leaves are tenths
of a millimeter.
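Stemplots like those in Figure 2.4 are easy to build by machine. The sketch below shows one possible approach, not the method used to draw the figure. It assumes positive measurements recorded to hundredths, rounds each to the nearest tenth (halves rounding up), and keeps empty stems so that several plots can be lined up on the same stems. The function name `stemplot` is hypothetical.

```python
def stemplot(xs):
    """Return {stem: leaves} with whole-unit stems and tenth-unit leaves.

    Assumes positive values recorded to hundredths. Each value is rounded to
    the nearest tenth with halves rounding up; the rounding is done on integer
    hundredths so binary floating point cannot tip a tie the wrong way.
    """
    tenths = []
    for x in xs:
        hundredths = round(x * 100)             # e.g. 35.45 -> 3545
        tenths.append((hundredths + 5) // 10)   # 3545 -> 355, i.e. 35.5
    lo, hi = min(tenths) // 10, max(tenths) // 10
    plot = {stem: "" for stem in range(lo, hi + 1)}  # keep empty stems
    for t in sorted(tenths):
        plot[t // 10] += str(t % 10)            # append leaves in increasing order
    return plot

# H. caribaea yellow lengths (mm) from Table 2.1
yellow = [35.45, 34.57, 38.13, 34.63, 37.10, 36.78, 35.17, 37.02,
          36.82, 36.52, 36.66, 36.11, 35.68, 36.03, 36.03]

for stem, leaves in stemplot(yellow).items():
    print(f"{stem} | {' '.join(leaves)}")
```

To line up several varieties as in the figure, one could call `stemplot` on each and print the stems over their common range.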
APPLY YOUR KNOWLEDGE
2.13 Logging in the rain forest. “Conservationists have despaired over destruction of
tropical rain forest by logging, clearing, and burning.’’ These words begin a report on
a statistical study of the effects of logging in Borneo.10 Charles Cannon of Duke
University and his coworkers compared forest plots that had never been logged
(Group 1) with similar plots nearby that had been logged 1 year earlier (Group 2)
and 8 years earlier (Group 3). All plots were 0.1 hectare in area. Here are the counts
of trees for plots in each group:
LOGGING
Group 1   27 22 29 21 19 33 16 20 24 27 28 19
Group 2   12 12 15  9 20 18 17 14 14  2 17 19
Group 3   18  4 22 15 18 19 22 12 12
To what extent has logging affected the count of trees? Follow the four-step process
in reporting your work.
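As a starting point for the Solve step, here is a minimal Python sketch that computes the five-number summary of each group with the quartile rule used in this chapter and flags suspected outliers with the 1.5 × IQR rule (an observation more than 1.5 × IQR below Q1 or above Q3). The helper names are hypothetical, and the group labels are our own shorthand for the descriptions in the exercise.

```python
def median(sorted_xs):
    """Median of an already sorted list."""
    n = len(sorted_xs)
    mid = n // 2
    if n % 2 == 1:
        return sorted_xs[mid]
    return (sorted_xs[mid - 1] + sorted_xs[mid]) / 2

def five_number_summary(xs):
    """Minimum, Q1, median, Q3, maximum. The quartiles are medians of the
    halves below and above the median position (median excluded when n is odd)."""
    data = sorted(xs)
    n = len(data)
    lower = data[: n // 2]
    upper = data[(n + 1) // 2 :]
    return data[0], median(lower), median(data), median(upper), data[-1]

def suspected_outliers(xs):
    """Values more than 1.5 * IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(xs)
    step = 1.5 * (q3 - q1)
    return [x for x in sorted(xs) if x < q1 - step or x > q3 + step]

groups = {
    "never logged": [27, 22, 29, 21, 19, 33, 16, 20, 24, 27, 28, 19],
    "logged 1 year earlier": [12, 12, 15, 9, 20, 18, 17, 14, 14, 2, 17, 19],
    "logged 8 years earlier": [18, 4, 22, 15, 18, 19, 22, 12, 12],
}

for label, counts in groups.items():
    print(label, five_number_summary(counts), "suspected outliers:", suspected_outliers(counts))
```

Numbers alone are not the whole report: the Plan step should still call for a graph, such as side-by-side boxplots, and the Conclude step should interpret the comparison in terms of logging.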
2.14 Diplomatic scofflaws. Until Congress allowed some enforcement in 2002, the
thousands of foreign diplomats in New York City could freely violate parking
laws. Two economists looked at the number of unpaid parking tickets per diplomat
over a five-year period ending when enforcement reduced the problem.11 They
concluded that large numbers of unpaid tickets indicated a “culture of corruption’’
in a country and lined up well with more elaborate measures of corruption. The
data set for 145 countries is too large to print here, but look at the data file on
the text Web site and CD. The first 32 countries in the list (Australia to Trinidad
and Tobago) are classified by the World Bank as “developed.’’ The remaining
countries (Albania to Zimbabwe) are “developing.’’ The World Bank classification
is based only on national income and does not take into account measures of social
development.
SCOFFLAWS
Give a full description of the distribution of unpaid tickets for both groups of countries and identify any high outliers. Compare the two groups. Does national income
alone do a good job of distinguishing countries whose diplomats do and do not obey
parking laws?
CHAPTER 2
SUMMARY
CHAPTER SPECIFICS

• A numerical summary of a distribution should report at least its center and its spread
or variability.
• The mean x̄ and the median M describe the center of a distribution in different ways.
The mean is the arithmetic average of the observations, and the median is the midpoint of the values.
• When you use the median to indicate the center of the distribution, describe its spread
by giving the quartiles. The first quartile Q1 has one-fourth of the observations
below it, and the third quartile Q3 has three-fourths of the observations below it.
• The five-number summary consisting of the median, the quartiles, and the smallest
and largest individual observations provides a quick overall description of a distribution.
The median describes the center, and the quartiles and extremes show the spread.
• Boxplots based on the five-number summary are useful for comparing several distributions. The box spans the quartiles and shows the spread of the central half of the
distribution. The median is marked within the box. Lines extend from the box to the
extremes and show the full spread of the data.
• The variance s² and especially its square root, the standard deviation s, are common
measures of spread about the mean as center. The standard deviation s is zero when
there is no spread and gets larger as the spread increases.
• A resistant measure of any aspect of a distribution is relatively unaffected by changes
in the numerical value of a small proportion of the total number of observations, no
matter how large these changes are. The median and quartiles are resistant, but the
mean and the standard deviation are not.
• The mean and standard deviation are good descriptions for symmetric distributions
without outliers. They are most useful for the Normal distributions introduced in the
next chapter. The five-number summary is a better description for skewed distributions.
• Numerical summaries do not fully describe the shape of a distribution. Always plot
your data.
• A statistical problem has a real-world setting. You can organize many problems using
the following four steps: state, plan, solve, and conclude.
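The point about resistance can be checked numerically. In this hypothetical what-if, assuming Python's standard `statistics` module, we take the four radon readings from Exercise 2.10 and make the largest one more and more extreme. The median does not move, while the mean and the standard deviation are dragged along with the extreme value.

```python
import statistics

radon = [5.2, 13.8, 8.6, 16.8]  # Exercise 2.10 readings; 16.8 is the largest

rows = []
for new_max in [16.8, 168.0, 1680.0]:  # hypothetical replacements for the largest value
    data = radon[:3] + [new_max]
    rows.append((new_max, statistics.median(data),
                 statistics.mean(data), statistics.stdev(data)))

for new_max, med, mean, s in rows:
    print(f"largest = {new_max:7.1f}  median = {med:6.2f}  mean = {mean:7.2f}  s = {s:7.2f}")
```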
LINK IT
In this chapter we have continued our study of exploratory data analysis. Graphs are
an important visual tool for organizing and identifying patterns in data. They give a
fairly complete description of a distribution, although for many problems the important
information in your data can be described by a few numbers. These numerical summaries
can be useful for describing a single distribution as well as for comparing the distributions
from several groups of observations.
Two important features of a distribution are the center and the spread. For distributions that are approximately symmetric without outliers, the mean and standard deviation
are important numeric summaries for describing and comparing distributions. But if the
distribution is not symmetric and/or has outliers, the five-number summary often provides
a better description.
The boxplot gives a picture of the five-number summary that is useful for a simple
comparison of several distributions. Remember that the boxplot is based only on the five-number summary and does not have any information beyond these five numbers. Certain
features of a distribution that are revealed in histograms and stemplots will not be evident
from a boxplot alone. These include gaps in the data and the presence of several peaks.
You must be careful when reducing a distribution to a few numbers to make sure that
important information has not been lost in the process.
CHECK YOUR SKILLS
2.15 The respiratory system can be a limiting factor in
maximal exercise performance. Researchers from the United
Kingdom studied the effect of two breathing frequencies on
both performance times and several physiological parameters
in swimming.12 Subjects were 10 male collegiate swimmers.
Here are their times in seconds to swim 200 meters at 90% of
race pace when breathing every second stroke in front-crawl
swimming:
SWIMTIMES
151.6  173.2  165.1  177.6  159.2  174.3  163.5  164.1  174.8  171.4
The mean of these data is
(a) 165.10.
(b) 167.48.
(c) 168.25.
2.16 The median of the data in Exercise 2.15 is
(a) 167.48.
(b) 168.25.
(c) 174.00.
2.17 The five-number summary of the data in Exercise 2.15
is
(a) 151.6, 159.2, 167.48, 174.8, 177.6.
(b) 151.6, 163.5, 168.25, 174.3, 177.6.
(c) 151.6, 159.2, 168.25, 174.8, 177.6.
2.18 If a distribution is skewed to the right,
(a) the mean is less than the median.
(b) the mean and median are equal.
(c) the mean is greater than the median.
2.19 What percent of the observations in a distribution lie
between the first quartile and the third quartile?
(a) 25%
(b) 50%
(c) 75%
2.20 To make a boxplot of a distribution, you must know
(a) all of the individual observations.
(b) the mean and the standard deviation.
(c) the five-number summary.
2.21 The standard deviation of the 10 swim times in Exercise
2.15 (use your calculator) is about
(a) 7.4.
(b) 7.8.
(c) 8.2.
2.22 What are all the values that a standard deviation s can
possibly take?
(a) 0 ≤ s
(b) 0 ≤ s ≤ 1
(c) −1 ≤ s ≤ 1
2.23 The correct units for the standard deviation in Exercise
2.21 are
(a) no units—it’s just a number.
(b) seconds.
(c) seconds squared.
2.24 Which of the following is least affected if an extreme
high outlier is added to your data?
(a) The median
(b) The mean
(c) The standard deviation
CHAPTER 2 EXERCISES
2.25 Incomes of college grads. According to the Census
Bureau’s 2010 Current Population Survey, the mean and
median 2009 income of people at least 25 years old who had
a bachelor’s degree but no higher degree were $46,931 and
$58,762. Which of these numbers is the mean and which is
the median? Explain your reasoning.
2.26 Saving for retirement. Retirement seems a long
way off and we need money now, so saving for retirement is
hard. Once every three years, the Board of Governors of the
Federal Reserve System collects data on household assets and
liabilities through the Survey of Consumer Finances (SCF).
The most recent such survey was conducted in 2007, and the
survey results were released to the public in April 2009. The
survey presents data on household ownership of, and balances
in, retirement savings accounts. Only 53.6% of households
own retirement accounts. The mean value per household is
$148,579, but the median value is just $45,000. For households in which the head of household is under 35, 42.6% own
retirement accounts, the mean is $25,279, and the median is
$9600.¹³ What explains the differences between the two measures of center, both for all households and for the under-35
age group?
2.27 University endowments. The National Association
of College and University Business Officers collects data on
college endowments. In 2009, 842 colleges and universities
reported the value of their endowments. When the endowment values are arranged in order, what are the locations of
the median and the quartiles in this ordered list?
2.28 Pulling wood apart. Example 1.9 (page 21) gives the
breaking strengths of 20 pieces of Douglas fir.
WOOD
(a) Give the five-number summary of the distribution
of breaking strengths. (The stemplot, Figure 1.11, helps
because it arranges the data in order, but you should use the
unrounded values in numerical work.)
(b) The stemplot shows that the distribution is skewed to the
left. Does the five-number summary show the skew? Remember that only a graph gives a clear picture of the shape of a
distribution.
2.29 Comparing tropical flowers. An alternative presentation of the flower length data in Table 2.1 reports
the five-number summary and uses boxplots to display
the distributions. Do this. Do the boxplots fail to reveal
any important information visible in the stemplots in
Figure 2.4?
TROPICALFLOWER
2.30 How much fruit do adolescent girls eat? Figure
1.14 (page 30) is a histogram of the number of servings of fruit
per day claimed by 74 seventeen-year-old girls.
(a) With a little care, you can find the median and the quartiles from the histogram. What are these numbers? How did
you find them?
(b) With a little care, you can also find the mean number of
servings of fruit claimed per day. First use the information in
the histogram to compute the sum of the 74 observations, and
then use this to compute the mean. What is the relationship
between the mean and median? Is this what you expected?
2.31 Guinea pig survival times. Here are the survival
times in days of 72 guinea pigs after they were injected with
infectious bacteria in a medical experiment.14 Survival times,
whether of machines under stress or cancer patients after
treatment, usually have distributions that are skewed to the
right.
GUINEAPIGS
 43  45  53  56  56  57  58  66  67  73  74  79
 80  80  81  81  81  82  83  83  84  88  89  91
 91  92  92  97  99  99 100 100 101 102 102 102
103 104 107 108 109 113 114 118 121 123 126 128
137 138 139 144 145 147 156 162 174 178 179 184
191 198 211 214 243 249 329 380 403 511 522 598
(a) Graph the distribution and describe its main features.
Does it show the expected right-skew?
(b) Which numerical summary would you choose for these
data? Calculate your chosen summary. How does it reflect the
skewness of the distribution?
2.32 Weight of newborns. Page 61 gives the distribution
of the weight at birth for all babies born in the United States
in 2008:15
Weight (grams)     Count
Less than 500          6,581
500 to 999            23,292
1,000 to 1,499        31,900
1,500 to 1,999        67,140
2,000 to 2,499       218,296
2,500 to 2,999       788,148
3,000 to 3,499     1,663,512
3,500 to 3,999     1,120,642
4,000 to 4,499       280,270
4,500 to 4,999        39,109
5,000 to 5,499         4,443
(a) For comparison with other years and with other countries, we prefer a histogram of the percents in each weight class
rather than the counts. Explain why.
(b) How many babies were there?
(c) Make a histogram of the distribution, using percents on
the vertical scale.
(d) What are the locations of the median and quartiles in
the ordered list of all birth weights? In which weight classes
do the median and quartiles fall?
2.33 More on study times. In Exercise 1.38 (page 34)
you examined the nightly study time claimed by first-year
college men and women. The most common methods for
formal comparison of two groups use x̄ and s to summarize
the data.
STUDYTIMES
(a) What kinds of distributions are best summarized by x̄ and
s? Do you think these summary measures are appropriate in
this case?
(b) One student in each group claimed to study at least 300
minutes (five hours) per night. How much does removing
these observations change x̄ and s for each group? You will
need to compute x̄ and s for each group, both with and without the high outlier.
2.34 Making resistance visible. In the Mean and Median
applet, place three observations on the line by clicking
below it: two close together near the center of the line
and one somewhat to the right of these two.
(a) Pull the single rightmost observation out to the right. (Place
the cursor on the point, hold down a mouse button, and drag
the point.) How does the mean behave? How does the median
behave? Explain briefly why each measure acts as it does.
(b) Now drag the single rightmost point to the left as far
as you can. What happens to the mean? What happens to
the median as you drag this point past the other two (watch
carefully)?
2.35 Behavior of the median. Place five observations on the
line in the Mean and Median applet by clicking below it.
(a) Add one additional observation without changing
the median. Where is your new point?
(b) Use the applet to convince yourself that when you add
yet another observation (there are now seven in all), the
median does not change no matter where you put the seventh
point. Explain why this must be true.
2.36 Never on Sunday: also in Canada? Exercise 1.5
(page 11) gives the number of births in the United States
on each day of the week during an entire year. The boxplots
in Figure 2.5 (page 62) are based on more detailed data
from Toronto, Canada: the number of births on each of the
365 days in a year, grouped by day of the week.16 Based on
these plots, compare the day-of-the-week distributions using
shape, center, and spread. Summarize your findings.
2.37 Thinking about means. Table 1.1 (page 12) gives
the percent of foreign-born residents in each of the states. For
the nation as a whole, 12.5% of residents are foreign-born.
Find the mean of the 51 entries in Table 1.1. It is not 12.5%.
Explain carefully why this happens. (Hint: The states with
the largest populations are California, Texas, New York, and
Florida. Look at their entries in Table 1.1.)
2.38 Thinking about medians. A report says that “the
median credit card debt of American households is zero.’’
We know that many households have large amounts of credit
card debt. In fact, the mean household credit card debt is
close to $8000. Explain how the median debt can nonetheless be zero.
2.39 A standard deviation contest. This is a standard
deviation contest. You must choose four numbers from the
whole numbers 0 to 10, with repeats allowed.
(a) Choose four numbers that have the smallest possible
standard deviation.
(b) Choose four numbers that have the largest possible standard deviation.
(c) Is more than one choice possible in either (a) or (b)?
Explain.
FIGURE 2.5
[Boxplots, one for each day of the week, Monday through Sunday; vertical axis: Number of births (about 60 to 120); horizontal axis: Day of week]
Boxplots of the distributions of numbers of births in Toronto, Canada, on each day of the week
during a year, for Exercise 2.36.
2.40 Test your technology. This exercise requires a calculator with a standard deviation button or statistical software
on a computer. The observations
10,001
10,002
10,003
have mean x̄ = 10,002 and standard deviation s = 1. Adding
a 0 in the center of each number, the next set becomes
100,001
100,002
100,003
The standard deviation remains s = 1 as more 0s are added.
Use your calculator or software to find the standard deviation
of these numbers, adding extra 0s until you get an incorrect
answer. How soon did you go wrong? This demonstrates that
calculators and software cannot handle an arbitrary number
of digits correctly.
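One common reason for the failure in Exercise 2.40 is the order of operations. Here is a minimal Python sketch, assuming ordinary double-precision floating point. The hypothetical `sd_shortcut` uses the algebraically equivalent "shortcut" formula s² = (Σx² − n x̄²)/(n − 1), which subtracts two huge, nearly equal numbers. The hypothetical `sd_two_pass` follows the definition used in Example 2.7 and subtracts the mean first. We make no claim about which formula any particular calculator or program uses.

```python
import math

def sd_shortcut(xs):
    """Shortcut formula: (sum of x^2 - n * mean^2) / (n - 1).
    Algebraically correct, but the subtraction of two huge, nearly
    equal numbers lets floating-point rounding wipe out the answer."""
    n = len(xs)
    mean = sum(xs) / n
    ss = sum(x * x for x in xs) - n * mean * mean
    return math.sqrt(max(ss, 0.0) / (n - 1))  # rounding can even make ss negative

def sd_two_pass(xs):
    """Definition-based formula: subtract the mean first, then square."""
    n = len(xs)
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)
    return math.sqrt(ss / (n - 1))

for digits in range(5, 17):
    base = 10 ** (digits - 1)  # digits = 5 gives 10,001, 10,002, 10,003
    data = [float(base + 1), float(base + 2), float(base + 3)]
    print(digits, sd_shortcut(data), sd_two_pass(data))
```

On a typical machine the shortcut column eventually stops giving 1 as digits are added, while the two-pass column stays at exactly 1.0 over this range, because every value involved is an integer small enough to be stored exactly.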
2.41 You create the data. Create a set of 5 positive numbers (repeats allowed) that have median 7 and mean 10.
What thought process did you use to create your numbers?
2.42 You create the data. Give an example of a small set of
data for which the mean is smaller than the first quartile.
2.43 Adolescent obesity. Adolescent obesity is a serious
health risk affecting more than 5 million young people in the
United States alone. Laparoscopic adjustable gastric banding
has the potential to provide a safe and effective treatment.
Fifty adolescents between 14 and 18 years old with a body
mass index (BMI) higher than 35 were recruited from the
Melbourne, Australia, community for the study.17 Twenty-five were randomly selected to undergo gastric banding,
and the remaining twenty-five were assigned to a supervised
lifestyle intervention program involving diet, exercise, and
behavior modification. All subjects were followed for two
years. Here are the weight losses in kilograms for the subjects
who completed the study:
GASTRICBANDS
Gastric banding
35.6 81.4 57.6 32.8 31.0 37.6 36.5  5.4 27.9 49.0 64.8 39.0
43.0 33.9 29.7 20.2 15.2 41.7 53.4 13.4 24.8 19.4 32.3 22.0
Lifestyle intervention
 6.0 17.0  2.0  3.0  1.4  4.0 20.6 11.6 15.5  4.6 15.8 34.6
 6.0  3.1  4.3 16.7  1.8 12.8
(a) In the context of this study, what do the negative values
in the data set mean?
(b) Give a graphical comparison of the weight loss distribution for both groups using side-by-side boxplots. Provide
appropriate numerical summaries for the two distributions and identify any high outliers in either group. What
can you say about the effects of gastric banding versus
lifestyle intervention on weight loss for the subjects in this
study?
(c) The measured variable was weight loss in kilograms.
Would two subjects with the same weight loss always have
similar benefits from a weight reduction program? Does it
depend on their initial weights? Other variables considered
in this study were the percent of excess weight lost and
the reduction in BMI. Do you see any advantages to either
of these variables when comparing weight loss for two
groups?
(d) One subject from the gastric-banding group dropped
out of the study and seven subjects from the lifestyle group
dropped out. Of the seven dropouts in the lifestyle group,
six had gained weight at the time they dropped out. If
all subjects had completed the study, how do you think
it would have affected the comparison between the two
groups?
Exercises 2.44 to 2.49 ask you to analyze data without having the details outlined for you. The exercise statements give
you the State step of the four-step process. In your work,
follow the Plan, Solve, and Conclude steps as illustrated in
Example 2.9.
2.44 Athletes’ salaries. The Montreal Canadiens were
founded in 1909 and are the longest continuously
operating professional ice hockey team. They have
won 24 Stanley Cups, making them one of the most
successful professional sports teams of the traditional four
major sports of Canada and the United States. Table
2.2 gives the salaries of the 2010–2011 roster.18 Provide
the team owner with a full description of the distribution of salaries and a brief summary of its most important
features.
HOCKEYSALARIES
2.45 Returns on stocks. How well have stocks done
over the past generation? The Wilshire 5000 index
describes the average performance of all U.S. stocks.
The average is weighted by the total market value of each
company’s stock, so think of the index as measuring the
performance of the average investor. Page 64 gives the percent returns on the Wilshire 5000 index for the years from
1971 to 2010:
WILSHIRE5000
TABLE 2.2
Salaries for the 2010–2011 Montreal Canadiens
PLAYER             SALARY       PLAYER             SALARY       PLAYER             SALARY
Scott Gomez        $8,000,000   Andrei Markov      $5,750,000   Roman Hamrlik      $5,500,000
Mike Cammalleri    $5,000,000   Brian Gionta       $5,000,000   Tomas Plekanec     $5,000,000
Jaroslav Spacek    $3,833,000   Andrei Kostitsyn   $3,250,000   James Wisniewski   $3,250,000
Carey Price        $2,500,000   Hal Gill           $2,250,000   Travis Moen        $1,500,000
Benoit Pouliot     $1,350,000   Josh Gorges        $1,300,000   Alex Auld          $1,000,000
Max Pacioretty     $875,000     Lars Eller         $875,000     P. K. Subban       $875,000
Yannick Weber      $637,500     Jeff Halpern       $600,000     Alexandre Picard   $600,000
David Desharnais   $550,000     Mathieu Darche     $500,000     Tom Pyatt          $500,000
What can you say about the distribution of yearly returns
on stocks?
Wilshire index for the years 1971 to 2010
Year   Return    Year   Return
1971   16.19     1991   33.58
1972   17.34     1992    9.02
1973   18.78     1993   10.67
1974   27.87     1994    0.06
1975   37.38     1995   36.41
1976   26.77     1996   21.56
1977    2.97     1997   31.48
1978    8.54     1998   24.31
1979   24.40     1999   24.23
1980   33.21     2000   10.89
1981    3.98     2001   10.97
1982   20.43     2002   20.86
1983   22.71     2003   31.64
1984    3.27     2004   12.48
1985   31.46     2005    6.38
1986   15.61     2006   15.77
1987    1.75     2007    5.62
1988   17.59     2008   37.23
1989   28.53     2009   28.30
1990    6.03     2010   17.16
TABLE 2.3
Amount spent (euros) by customers in a restaurant when exposed
to odors
2.46 Do good smells bring good business? Businesses
know that customers often respond to background
music. Do they also respond to odors? Nicolas Guéguen and his colleagues studied this question in a small pizza
restaurant in France on Saturday evenings in May. On one
evening, a relaxing lavender odor was spread through the
restaurant; on another evening, a stimulating lemon odor;
a third evening served as a control, with no odor. Table
2.3 shows the amounts (in euros) that customers spent on
each of these evenings.19 Compare the three distributions.
Were both odors associated with increased customer
spending?
ODORS
2.47 Daily activity and obesity. People gain weight
when they take in more energy from food than
they expend. Table 2.4 (page 65) compares volunteer subjects who were lean with others who were mildly
obese. None of the subjects followed an exercise program.
The subjects wore sensors that recorded every move for
10 days. The table shows the average minutes per day spent
in activity (standing and walking) and in lying down.20
Compare the distributions of time spent actively for lean
and obese subjects and also the distributions of time spent
lying down. How does the behavior of lean and mildly obese
OBESITY
people differ?
NO ODOR
15.9
15.9
18.5
18.5
18.5
18.5
15.9
18.5
15.9
18.5
18.5
18.5
18.5
20.5
15.9
21.9
18.5
18.5
15.9
18.5
15.9
15.9
15.9
25.5
15.9
15.9
12.9
15.9
15.9
15.9
18.5
18.5
18.5
15.9
18.5
18.5
18.5
18.5
18.5
18.5
18.5
18.5
21.9
22.5
18.5
20.7
21.5
24.9
21.9
21.9
21.9
22.5
LEMON ODOR
18.5
15.9
25.9
15.9
18.5
15.9
18.5
21.5
15.9
18.5
15.9
15.9
18.5
21.9
18.5
15.9
15.9
18.5
LAVENDER ODOR
21.9
21.5
25.9
18.5
18.5
21.9
22.3
25.5
18.5
21.9
18.5
18.5
18.5
18.5
22.8
24.9
21.9
18.5
CHAPTER 11 Data Description and Probability Distributions
Matched Problem 3
Add the salary $100,000 to those in Example 3 and compute the median and mean for these eight salaries.
The median, as we have defined it, is easy to determine and is not influenced by extreme values. Our definition does have some minor handicaps, however. First, if the measurements we are analyzing were carried out in a laboratory and presented to us in a frequency table, we may not have access to the individual measurements. In that case we would not be able to compute the median using the above definition. Second, a set like 4, 4, 6, 7, 7, 7, 9 would have median 7 by our definition, but 7 does not possess the symmetry we expect of a "middle element," since there are three measurements below 7 but only one above.
To overcome these handicaps, we define a second concept, the median for grouped data. To guarantee that the median for grouped data exists and is unique, we assume that the frequency table for the grouped data has no classes of frequency 0.
DEFINITION Median for Grouped Data
The median for grouped data with no classes of frequency 0 is the number such that the histogram has the same area to the left of the median as to the right of the median (see Fig. 2).
Figure 2 The area to the left of the median equals the area to the right.
EXAMPLE 4 Median for Grouped Data Compute the median for the grouped data in Table 3.
SOLUTION First we draw the histogram of the data (Fig. 3). The total area of the histogram is 15. This is the sum of the frequencies, because every rectangle has a base of length 1. The area to the left of the median must be half the total area, that is, 15/2 = 7.5. Looking at Figure 3, we see that the median M lies between 6.5 and 7.5. Thus, the area to the left of M, which is the sum of the blue shaded areas in Figure 3, must be 7.5:
$$(1)(1) + (1)(3) + (1)(2) + (M - 6.5)(4) = 7.5$$
Solving for M gives M = 6.875. The median for the grouped data in Table 3 is 6.875.
Table 3
Class Interval   Frequency
3.5-4.5          1
4.5-5.5          3
5.5-6.5          2
6.5-7.5          4
7.5-8.5
8.5-9.5
Figure 3
Interval
0.5-2.5
2.5-4.5
4.5-6.5
6.5-8.5
5
1
State Gasoline Tax, 2007
2
Wisconsin
1.91
7.99
8.01
8.04
6.24
6.24
8.13
8.09
7.95
preference.
a
is
51
91
80
95
91
81
85
fair die.
44.5
Connecticut
Nebraska
35.5
Kansas
25
Texas
20
California
Florida
31.1
26.2
41.6
Life (Hours) of 50 Randomly Selected Lightbulbs
Interval
Frequency
60
expect the mean of the data set to be?
The median?
New York
15. Lightbulb lifetime. Find the mean and median for the data
in the following table.
formed by recording the results of 100 rolls of
(A) What would you
32.9
81.2
Which single measure of central tendency (mean, median, or mode) would you say best describes the following set of measurements? Discuss the factors that justify your
47
69
Tax (Cents)
State
preference.
9. A data set
Measures of Central Tendency
14. Gasoline tax. Find the mean, median, and mode for the
data in the following table.
Frequency
7. Which single measure of central tendency (mean, median, or mode) would you say best describes the following set of measurements? Discuss the factors that justify your
(B) Form such a data set by using a graphing calculator to simulate 100 rolls of a fair die, and find its mean and median.
799.5-899.5        3
899.5-999.5        10
999.5-1,099.5      24
1,099.5-1,199.5    12
1,199.5-1,299.5    1
10. A data set is formed by recording the sums on 200 rolls of a pair of fair dice.
(A) What would you expect the mean of the data set to be? The median?
(B) Form such a data set by using a graphing calculator to simulate 200 rolls of a pair of fair dice, and find the mean and median of the set.
11. (A) Construct a set of four numbers that has mean 300, median 250, and mode 175.
(B) Let m1 > m2 > m3. Devise and discuss a procedure for constructing a set of four numbers that has mean m1, median m2, and mode m3.
16. Price-earnings ratios. Find the mean and median for the data in the following table.
Price-Earnings Ratios of 100 Randomly Chosen Stocks from the New York Stock Exchange
Interval      Frequency
-0.5-4.5      5
4.5-9.5       54
9.5-14.5      25
14.5-19.5     9
19.5-24.5     4
24.5-29.5     1
29.5-34.5     2
12. (A) Construct a set of five numbers that has mean 200, median 150, and mode 50.
(B) Let m1 > m2 > m3. Devise and discuss a procedure for constructing a set of five numbers that has mean m1, median m2, and mode m3.
Average Federal Work-Study Award
Year
Award ($)
Applications
13. Price-earnings ratios. Find the mean, median, and mode for the data in the following table.
Price-Earnings Ratios for Eight Stocks in a Portfolio
5.3
12.9
10.1
8.4
17. Financial aid. Find the mean, median, and mode for the data on federal student financial assistance in the following table. (Source: College Board)
1995
1,087
1997
1,215
1999
1,252
2001
1,394
2003
‘ tq6
1,446
18.7
35.5
2005
16.2
10.1
2007
7
18. Tourism. Find the mean, median, and mode for the data in
the following table. (Source: The World Bank)
22. Grade-point averages. Find the mean and median for the
grouped data in the following table.
International Tourism Receipts, 2006
Country
United States
Spain
France
Great Britain
Germany
Italy
Graduating Class Grade-Point Averages
Interval
Frequency
Receipts (billion $)
122.9
2t
1.95-2.15
57.5
19
54.0
74
42.8
2.35-2.55
2.55-2.75
2.75-2.95
47.6
2.95-3.15
6
China
37.1
Canada
17.0
3.15-3.35
3.35-3.55
4
14.5
3.55-3.75
3
12.7
3.75-3.95
2
Greece
Belgium
43.0
19. Mouse weights. Find the mean and median for the data in the following table.
5
on page 518.
Presidents.
Frequency
7
45.5-47.5
47.5-49.5
13
49.5-51.5
19
51.5-53.5
53.5-55.5
17
15
55.5-57.5
7
Find the mean and median for the
grouped
data in the following table.
3
43.5-45.5
57.5-59.5
9
23. Entrance examination scores. Compute the median for the
grouped data of entrance examination scores given in Table 1
Mouse Weights (Grams)
Interval
41.5-43.5
t7
U.S. Presidents’ Ages at Inauguration
Age
Number
39.5-44.5
2
44.5-49.5
7
49.5-54.5
12
54.5-59.5
13
.7
59.5-64.5
2
Blood cholesterol levels. Find the mean and median for the data in the following table.
64.5-69.5
2
69.5-74.5
1
Blood Cholesterol Levels (Milligrams per Deciliter)
Interval
149.5-169.5
169.5-189.5
189.5-209.5
209.5-229.5
229.5-249.5
249.5-269.5
269.5-289.5
289.5-309.5
Frequency
4
11
15
25
13
7
3
2
Immigration. Find the mean, median, and mode for the
data in the following table. (Source: U.S. Census Bureau)
Top Ten Countries of Birth of U.S. Foreign-Born
Population, 2007
Country
Mexico
Number (thousands)
11,739
China
Philippines
1,930
India
El Salvador
Vietnam
1,502
Korea
Cuba
1,043
1,701
1,1M
1,101
983
Canada
830
Dominican Republic
756
Answers to Matched Problems
1. x̄ = 3.8
2. x̄ = 10.1
3. Median = $44,000; mean = $63,250
4. Median for grouped data = 6.8
5. Arrange each set of data in ascending order:
Set   Data                               Mode    Median   Mean
(A)   1, 1, 1, 1, 2, 2, 4, 5, 9          1       2        2.89
(B)   1, 2, 4, 5, 7, 8, 9                None    5        5.14
(C)   1, 1, 2, 3, 3, 3, 5, 6, 8, 8, 8    3, 8    3        4.36
CHAPTER 2:
Describing Distributions
with Numbers
The Basic Practice of Statistics
6th Edition
Moore / Notz / Fligner
Lecture PowerPoint Slides
Chapter 2 Concepts
2
• Measuring Center: Mean and Median
• Measuring Spread: Quartiles
• Five-Number Summary and Boxplots
• Spotting Suspected Outliers
• Measuring Spread: Standard Deviation
• Choosing Measures of Center and Spread
Chapter 2 Objectives
3
• Calculate and Interpret Mean and Median
• Compare Mean and Median
• Calculate and Interpret Quartiles
• Construct and Interpret the Five-Number Summary and Boxplots
• Determine Suspected Outliers
• Calculate and Interpret Standard Deviation
• Choose Appropriate Measures of Center and Spread
• Organize a Statistical Problem
Measuring Center: The Mean
The most common measure of center is the arithmetic
average, or mean.
To find the mean x̄ (pronounced "x-bar") of a set of observations, add their values and divide by the number of observations. If the n observations are x1, x2, x3, …, xn, their mean is:
$$\bar{x} = \frac{\text{sum of observations}}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$
or in more compact notation
$$\bar{x} = \frac{\sum x_i}{n}$$
4
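The formula x̄ = Σxᵢ / n is short enough to sketch directly in Python. The function name `sample_mean` is our own and does not come from the slides. Python's standard-library `statistics.mean` returns the same value.

```python
def sample_mean(observations):
    """Return x-bar: the sum of the observations divided by their count n."""
    n = len(observations)
    if n == 0:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(observations) / n


# Pets owned by nine children (the data in the standard deviation example)
print(sample_mean([1, 3, 4, 4, 4, 5, 7, 8, 9]))  # 5.0
```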
Measuring Center: The Median
5
Because the mean cannot resist the influence of extreme
observations, it is not a resistant measure of center.
Another common measure of center is the median.
The median M is the midpoint of a distribution, the number such
that half of the observations are smaller and the other half are
larger.
To find the median of a distribution:
1. Arrange all observations from smallest to largest.
2. If the number of observations n is odd, the median M is the
center observation in the ordered list.
3. If the number of observations n is even, the median M is the
average of the two center observations in the ordered list.
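The three steps for finding the median translate almost line for line into Python. The sketch below is a minimal illustration with a function name of our own choosing. It assumes the data are numbers, so that the two center values can be averaged.

```python
def sample_median(observations):
    """Median M, following the three steps above."""
    if not observations:
        raise ValueError("the median of an empty data set is undefined")
    ordered = sorted(observations)  # Step 1: arrange from smallest to largest
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]  # Step 2: n is odd, take the center observation
    # Step 3: n is even, average the two center observations
    return (ordered[middle - 1] + ordered[middle]) / 2


print(sample_median([9, 1, 4, 7, 4, 3, 8, 4, 5]))  # n = 9 (odd): 4
print(sample_median([30, 20, 10, 40]))              # n = 4 (even): 25.0
```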
Measuring Center
6
Use the data below to calculate the mean and median of the commuting times (in minutes) of 20 randomly selected New York workers.
10  30  5  25  40  20  10  15  30  20  15  20  85  15  65  15  60  60  40  45
$$\bar{x} = \frac{10 + 30 + 5 + 25 + \cdots + 40 + 45}{20} = 31.25 \text{ minutes}$$
0 | 5
1 | 005555
2 | 0005
3 | 00
4 | 005
5 |
6 | 005
7 |
8 | 5
Key: 4|5 represents a New York worker who reported a 45-minute travel time to work.
$$M = \frac{20 + 25}{2} = 22.5 \text{ minutes}$$
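The slide's stemplot, mean, and median can be checked with a short Python sketch that uses the standard-library `statistics` module. The helper `stemplot_lines` is our own illustration. It assumes whole-number data, with tens digits as stems and ones digits as leaves, which matches the slide's stemplot.

```python
import statistics

# Commuting times (minutes) for the 20 New York workers on this slide
ny_times = [10, 30, 5, 25, 40, 20, 10, 15, 30, 20,
            15, 20, 85, 15, 65, 15, 60, 60, 40, 45]


def stemplot_lines(values):
    """Stems are tens digits, leaves are ones digits (whole numbers only)."""
    leaves_by_stem = {}
    for v in sorted(values):
        leaves_by_stem.setdefault(v // 10, []).append(v % 10)
    lines = []
    # Include empty stems (here 5 and 7) so gaps in the data stay visible
    for stem in range(min(leaves_by_stem), max(leaves_by_stem) + 1):
        leaves = "".join(str(leaf) for leaf in leaves_by_stem.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines


for line in stemplot_lines(ny_times):
    print(line)
print("mean =", statistics.mean(ny_times))      # mean = 31.25
print("median =", statistics.median(ny_times))  # median = 22.5
```

The mean lies above the median, which fits the long right tail visible in the stemplot.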
Comparing the Mean and
Median
7

The mean and median measure center in different ways,
and both are useful.
Comparing the Mean and the Median
The mean and median of a roughly symmetric distribution are
close together.
If the distribution is exactly symmetric, the mean and median
are exactly the same.
In a skewed distribution, the mean is usually farther out in the
long tail than is the median.
Measuring Spread: Quartiles
8
• A measure of center alone can be misleading.
• A useful numerical description of a distribution requires both a measure of center and a measure of spread.
How to Calculate the Quartiles and the Interquartile Range
To calculate the quartiles:
1) Arrange the observations in increasing order and locate the
median M.
2) The first quartile Q1 is the median of the observations
located to the left of the median in the ordered list.
3) The third quartile Q3 is the median of the observations
located to the right of the median in the ordered list.
The interquartile range (IQR) is defined as: IQR = Q3 – Q1
Five-Number Summary
9

The minimum and maximum values alone tell us little about
the distribution as a whole. Likewise, the median and
quartiles tell us little about the tails of a distribution.

To get a quick summary of both center and spread,
combine all five numbers.
The five-number summary of a distribution consists of the
smallest observation, the first quartile, the median, the third
quartile, and the largest observation, written in order from
smallest to largest.
Minimum   Q1   M   Q3   Maximum
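The quartile rule and the five-number summary described on these slides can be sketched in Python as follows. The helper names are our own. The code follows the slides' convention that, when n is odd, the median is left out of both halves. Statistical software, including Python's `statistics.quantiles`, may use a different quartile convention and give slightly different quartiles.

```python
def median_of_sorted(ordered):
    """Median of a non-empty list that is already in increasing order."""
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


def five_number_summary(values):
    """Return (minimum, Q1, M, Q3, maximum).

    Q1 is the median of the observations to the left of M, and Q3 is the
    median of those to the right; when n is odd, M belongs to neither half.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower_half = ordered[:half]
    upper_half = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return (ordered[0], median_of_sorted(lower_half), median_of_sorted(ordered),
            median_of_sorted(upper_half), ordered[-1])


ny_times = [10, 30, 5, 25, 40, 20, 10, 15, 30, 20,
            15, 20, 85, 15, 65, 15, 60, 60, 40, 45]
minimum, q1, m, q3, maximum = five_number_summary(ny_times)
print(minimum, q1, m, q3, maximum)  # 5 15.0 22.5 42.5 85
print("IQR =", q3 - q1)             # IQR = 27.5
```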
Boxplots
10

The five-number summary divides the distribution roughly
into quarters. This leads to a new way to display
quantitative data, the boxplot.
How to Make a Boxplot
• Draw and label a number line that includes the
range of the distribution.
• Draw a central box from Q1 to Q3.
• Note the median M inside the box.
• Extend lines (whiskers) from the box out to the
minimum and maximum values that are not
outliers.
Suspected Outliers: The 1.5 × IQR Rule
11
In addition to serving as a measure of spread, the interquartile range (IQR) is used as part of a rule of thumb for identifying outliers.
The 1.5 × IQR Rule for Outliers
Call an observation an outlier if it falls more than 1.5 × IQR above the third quartile or below the first quartile.
In the New York travel time data, we found Q1 = 15 minutes, Q3 = 42.5 minutes, and IQR = 27.5 minutes.
For these data, 1.5 × IQR = 1.5(27.5) = 41.25
Q1 - 1.5 × IQR = 15 - 41.25 = -26.25
Q3 + 1.5 × IQR = 42.5 + 41.25 = 83.75
Any travel time shorter than -26.25 minutes or longer than 83.75 minutes is considered an outlier. Only the 85-minute travel time exceeds 83.75, so it is the one suspected outlier.
0 | 5
1 | 005555
2 | 0005
3 | 00
4 | 005
5 |
6 | 005
7 |
8 | 5
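The 1.5 × IQR arithmetic can be checked with a short Python sketch. The function name `iqr_rule` is our own, and Q1 = 15 and Q3 = 42.5 are taken from this slide. The sketch also reports where the boxplot whiskers would end, following the boxplot rule that whiskers stop at the most extreme values that are not outliers.

```python
def iqr_rule(values, q1, q3):
    """Apply the 1.5 x IQR rule given the first and third quartiles.

    Returns the two cutoffs, the suspected outliers, and the whisker ends
    (the smallest and largest observations that are not outliers).
    """
    iqr = q3 - q1
    low_cutoff = q1 - 1.5 * iqr
    high_cutoff = q3 + 1.5 * iqr
    suspected = sorted(x for x in values if x < low_cutoff or x > high_cutoff)
    inside = [x for x in values if low_cutoff <= x <= high_cutoff]
    whiskers = (min(inside), max(inside))
    return low_cutoff, high_cutoff, suspected, whiskers


ny_times = [10, 30, 5, 25, 40, 20, 10, 15, 30, 20,
            15, 20, 85, 15, 65, 15, 60, 60, 40, 45]
low, high, outliers, whiskers = iqr_rule(ny_times, q1=15, q3=42.5)
print(low, high)   # -26.25 83.75
print(outliers)    # [85]
print(whiskers)    # (5, 65)
```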
Boxplots
12

Consider our NY travel times data. Construct a boxplot.
10  30  5  25  40  20  10  15  30  20  15  20  85  15  65  15  60  60  40  45
In order: 5  10  10  15  15  15  15  20  20  20  25  30  30  40  40  45  60  60  65  85
M = 22.5
Measuring Spread: Standard
Deviation
13

The most common measure of spread looks at how far
each observation is from the mean. This measure is called
the standard deviation.
The standard deviation sx measures the average distance of the observations from their mean. It is calculated by finding an average of the squared distances and then taking the square root. This average squared distance is called the variance.
$$\text{variance} = s_x^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1} = \frac{1}{n - 1}\sum (x_i - \bar{x})^2$$
$$\text{standard deviation} = s_x = \sqrt{\frac{1}{n - 1}\sum (x_i - \bar{x})^2}$$
Calculating the Standard Deviation
14

Example: Consider the following data on the number of pets owned by a group of nine children.
1) Calculate the mean.
2) Calculate each deviation.
deviation = observation - mean
deviation: 1 - 5 = -4
deviation: 8 - 5 = 3
x̄ = 5
Calculating the Standard Deviation
15
3) Square each deviation.
4) Find the "average" squared deviation. Calculate the sum of the squared deviations divided by (n - 1); this is called the variance.
5) Calculate the square root of the variance; this is the standard deviation.
xi    (xi - mean)     (xi - mean)^2
1     1 - 5 = -4      (-4)^2 = 16
3     3 - 5 = -2      (-2)^2 = 4
4     4 - 5 = -1      (-1)^2 = 1
4     4 - 5 = -1      (-1)^2 = 1
4     4 - 5 = -1      (-1)^2 = 1
5     5 - 5 = 0       (0)^2 = 0
7     7 - 5 = 2       (2)^2 = 4
8     8 - 5 = 3       (3)^2 = 9
9     9 - 5 = 4       (4)^2 = 16
      Sum = 0         Sum = 52
"Average" squared deviation = 52/(9 - 1) = 6.5. This is the variance.
Standard deviation = square root of variance = √6.5 = 2.55
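The five steps above can be reproduced in Python as a minimal sketch. The variable names are our own. The standard-library functions `statistics.variance` and `statistics.stdev` also divide by n - 1, and they are used here only as a cross-check.

```python
import math
import statistics

pets = [1, 3, 4, 4, 4, 5, 7, 8, 9]  # pets owned by nine children

n = len(pets)
mean = sum(pets) / n                      # Step 1: x-bar = 5.0
deviations = [x - mean for x in pets]     # Step 2: observation - mean
squared = [d ** 2 for d in deviations]    # Step 3: square each deviation
variance = sum(squared) / (n - 1)         # Step 4: 52 / 8 = 6.5
std_dev = math.sqrt(variance)             # Step 5: square root of the variance

print(f"variance = {variance:.2f}, standard deviation = {std_dev:.2f}")
# variance = 6.50, standard deviation = 2.55

# Cross-check against the standard library's sample statistics
assert math.isclose(variance, statistics.variance(pets))
assert math.isclose(std_dev, statistics.stdev(pets))
```

The deviations always sum to 0, which is why they are squared before averaging.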
Choosing Measures of Center and Spread
16

• We now have a choice between two descriptions for center and spread:
   • Mean and Standard Deviation
   • Median and Interquartile Range
Choosing Measures of Center and Spread
• The median and IQR are usually better than the mean and standard deviation for describing a skewed distribution or a distribution with outliers.
• Use mean and standard deviation only for reasonably symmetric distributions that don’t have outliers.
• NOTE: Numerical summaries do not fully describe the shape of a distribution. ALWAYS PLOT YOUR DATA!
Organizing a Statistical Problem
17

• As you learn more about statistics, you will be asked to solve more complex problems.
• Here is a four-step process you can follow.
How to Organize a Statistical Problem: A Four-Step Process
State: What’s the practical question, in the context of the real-world setting?
Plan: What specific statistical operations does this problem call for?
Do: Make graphs and carry out calculations needed for the problem.
Conclude: Give your practical conclusion in the setting of the real-world problem.
Chapter 2 Objectives Review
18
• Calculate and Interpret Mean and Median
• Compare Mean and Median
• Calculate and Interpret Quartiles
• Construct and Interpret the Five-Number Summary and Boxplots
• Determine Suspected Outliers
• Calculate and Interpret Standard Deviation
• Choose Appropriate Measures of Center and Spread
• Organize a Statistical Problem
Student number: 1109285062
Employee name: 1109285062
Date: 2021/01/27 11:04:16 PM
SACM Student Progress Evaluation
Student Name: Alsufari, Rahaf K.A
Department: Department Of Biological Sciences
Current Major: Biology
MSU Student ID: 14079733
Degree level: BS
Minor (if any): N/A
8
Expected Date of Graduation: (FALL/SPRING year)
6
2
3
4
5
7
Minimum number of Semester Credits Required to Complete Program of Study:
120
(equals 3+6)
53
Total Number of Completed Semester Credits including Transfer:
Total Number of Credit Hours Counting toward full degree program of study:
8
Number of Semester Credits Accepted in Transfer:
Of Which, How Many will apply towards the Major, Gen. Ed or Open
8
Electives:
Number of Remaining Credits to complete program of study (including
76
registered hours):
Yes
Has Student Been Accepted into Major?
(estimate that could be impacted by many variables including course availability, student performance etc.)
Advisor’s Name: Ken Adams
E-mail address: kenneth.adams@mnsu.edu
Spring 23 at the earliest
Signature
Date:
01/20/21
Additional note if required: Graduation would require a minimum of 5 semesters given the prerequisite structure of the biology and chemistry classes.
Date Issued: 01/06/2021
Student Full Name: Alsufari, Rahaf K.A
Tech Id
14079733
Degree Level: BS
Current Major
Biology
Hybrid %
Online In class
A. Previously Taken Online (Hybrid, Web-enhanced, Blended) Class (s):
Online
100%
1- Course Title:
Course No
Credits
2- Course Title:
Course No
Credits
Semester/Yr
Semester/Yr
3- Course Title:
Course No
Credits
Semester/Yr
4- Course Title:
Course No
Credits
Semester/Yr
5- Course Title:
Course No
Credits
Printed Name
Joshua Woldt
Signature of Registrar
Semester/Yr
Printed Name
Chair Signature
1-19-21
Date Signed
Email Address
Date Signed
Joshua.woldt@mnsu.edu
Email Address
Hybrid %
In class
Credits
Semester/Yr
Semester/Yr
Online
100%
Printed Name
Chair Signature
Date Signed
1-19-21
Email Address
Date Signed
Joshua.woldt@mnsu.edu
Email Address
Signature of Registrar
Printed Name
Joshua Woldt
Credits
Course No
Course No
2- Course Title:
1- Course Title:
B. Currently Registered (or Preregistered) Online (Hybrid, Web-enhanced, Blended) Class: Online
Yes:
Yes:
Yes:
Yes:
Yes:
Yes:
No:
No:
No:
No:
No:
No:
Course # (2)
Is the course required in Student’s program of study?
Is this course available in face-to-face format?
Is there an available substitute face-to-face class for this course?
Could it be taken in coming semesters without conflict with degree plan?
Will graduation be delayed if course not taken in the semester requested?
Is this student graduating by the end of current semester?
Yes:
Yes:
Yes:
Yes:
Yes:
Yes:
No:
No:
No:
No:
No:
No:
Step 2: To be completed by the Student’s Advisor:
(Evaluation of Course Reported in Item B.)
Course # (1)
Is the course required in Student’s program of study?
Is this course available in face-to-face format?
Is there an available substitute face-to-face class for this course?
Could it be taken in coming semesters without conflict with degree plan?
Will graduation be delayed if course not taken in the semester requested?
Is this student graduating by the end of current semester?
Printed Name:
Signature of Advisor
Email Address:
Date signed:
Notes
The Basic Practice
of Statistics
SIXTH EDITION
DAVID S. MOORE
Purdue University
WILLIAM I. NOTZ
The Ohio State University
MICHAEL A. FLIGNER
The Ohio State University
W. H. Freeman and Company
New York
Publisher: Ruth Baruth
Acquisitions Editor: Karen Carson
Executive Marketing Manager: Jennifer Somerville
Developmental Editors: Andrew Sylvester and Leslie Lahr
Senior Media Acquisitions Editor: Roland Cheyney
Senior Media Editor: Laura Capuano
Associate Editor: Katrina Wilhelm
Assistant Media Editor: Catriona Kaplan
Editorial Assistant: Tyler Holzer
Photo Editor: Cecilia Varas
Photo Researcher: Elyse Rieder
Cover and Text Designer: Blake Logan
Senior Project Editor: Mary Louise Byrd
Illustrations: Macmillan Solutions
Production Coordinator: Susan Wein
Composition: Aptara®, Inc.
Printing and Binding: Quad Graphics
Library of Congress Control Number: 2011934674
Student Edition (Hardcover w/cd): ISBN-13: 978-1-4641-0254-7; ISBN-10: 1-4641-0254-6
Student Edition (Paperback w/cd): ISBN-13: 978-1-4641-0434-3; ISBN-10: 1-4641-0434-4
Student Edition (Looseleaf w/cd): ISBN-13: 978-1-4641-0433-6; ISBN-10: 1-4641-0433-6
© 2013, 2010, 2007, 2004 by W. H. Freeman and Company
All rights reserved
Printed in the United States of America
First printing
W. H. Freeman and Company
41 Madison Avenue
New York, NY 10010
Houndmills, Basingstoke RG21 6XS, England
www.whfreeman.com
Brief Contents
Part I Exploring Data 1
Exploring Data: Variables and Distributions
CHAPTER 1 Picturing Distributions with Graphs 3
CHAPTER 2 Describing Distributions with Numbers 39
CHAPTER 3 The Normal Distributions 69
Exploring Data: Relationships
CHAPTER 4 Scatterplots and Correlation 97
CHAPTER 5 Regression 125
CHAPTER 6 Two-Way Tables* 159
CHAPTER 7 Exploring Data: Part I Review 175
Part II From Exploration to Inference 197
Producing Data
CHAPTER 8 Producing Data: Sampling 199
CHAPTER 9 Producing Data: Experiments 223
Commentary: Data Ethics* 246
Probability and Sampling Distributions
CHAPTER 10 Introducing Probability 259
CHAPTER 11 Sampling Distributions 285
CHAPTER 12 General Rules of Probability* 307
CHAPTER 13 Binomial Distributions* 331
Foundations of Inference
CHAPTER 14 Confidence Intervals: The Basics 351
CHAPTER 15 Tests of Significance: The Basics 369
CHAPTER 16 Inference in Practice 391
CHAPTER 17 From Exploration to Inference: Part II Review 417
Part III Inference about Variables 435
Quantitative Response Variable
CHAPTER 18 Inference about a Population Mean 437
CHAPTER 19 Two-Sample Problems 465
Categorical Response Variable
CHAPTER 20 Inference about a Population Proportion 493
CHAPTER 21 Comparing Two Proportions 515
CHAPTER 22 Inference about Variables: Part III Review 533
Part IV Inference about Relationships 551
CHAPTER 23 Two Categorical Variables: The Chi-Square Test 553
CHAPTER 24 Inference for Regression 587
CHAPTER 25 One-Way Analysis of Variance: Comparing Several Means 623
Part V Optional Companion Chapters (available on the BPS CD and online)
CHAPTER 26 Nonparametric Tests 26-3
CHAPTER 27 Statistical Process Control 27-3
CHAPTER 28 Multiple Regression* 28-3
CHAPTER 29 More about Analysis of Variance 29-3
*Starred material is not required for later parts of the text.
FM.indd Page iv 11/9/11 3:58:32 PM user-s163
user-F452
Detailed Table of Contents
To the Instructor viii
Media and Supplements xix
About the Authors xxiv
To the Student xxvi
Pa r t I
CHAPTER 4
Scatterplots and Correlation
1
Exploring Data
CHAPTER 1
Picturing Distributions with Graphs 3
Individuals and variables 3
Categorical variables: pie charts and bar graphs
Quantitative variables: histograms 11
Interpreting histograms 15
Quantitative variables: stemplots 20
Time plots 23
6
Measuring center: the mean 40
Measuring center: the median 41
Comparing the mean and the median 42
Measuring spread: the quartiles 43
The five-number summary and boxplots 45
Spotting suspected outliers* 48
Measuring spread: the standard deviation 49
Choosing measures of center and spread 51
Using technology 53
Organizing a statistical problem 55
Regression lines 125
The least-squares regression line 128
Using technology 130
Facts about least-squares regression 132
Residuals 135
Influential observations 139
Cautions about correlation and regression 142
Association does not imply causation 144
CHAPTER 6
Two-Way Tables* 159
Marginal distributions 160
Conditional distributions 162
Simpson’s paradox 166
CHAPTER 7
Exploring Data: Part I Review
Part I summary 177
Test yourself 180
Supplementary exercises
175
191
69
Density curves 69
Describing density curves 73
Normal distributions 75
The 68–95–99.7 rule 77
The standard Normal distribution 80
Finding Normal proportions 81
Using the standard Normal table 83
Finding a value given a proportion 86
*Starred material is not required for later parts of the text.
iv
Explanatory and response variables 97
Displaying relationships: scatterplots 99
Interpreting scatterplots 101
Adding categorical variables to scatterplots 104
Measuring linear association: correlation 106
Facts about correlation 108
CHAPTER 5
Regression 125
CHAPTER 2
Describing Distributions with Numbers 39
CHAPTER 3
The Normal Distributions
97
Pa r t I I
From Exploration
to Inference
CHAPTER 8
Producing Data: Sampling
199
Population versus sample 199
How to sample badly 202
Simple random samples 203
197
FM.indd Page v 11/9/11 3:58:32 PM user-s163
user-F452

Inference about the population 208
Other sampling designs 209
Cautions about sample surveys 210
The impact of technology 213
CHAPTER 9
Producing Data: Experiments
CHAPTER 13
Binomial Distributions*
232
351
The reasoning of tests of significance 370
Stating hypotheses 372
P-value and statistical significance 374
Tests for a population mean 378
Significance from a table* 382
253
CHAPTER 16
Inference in Practice 391
Conditions for inference in practice 392
Cautions about confidence intervals 395
Cautions about significance tests 397
Planning studies: sample size for confidence intervals 401
Planning studies: the power of a statistical test* 402
268
CHAPTER 11
Sampling Distributions 285
Parameters and statistics 285
Statistical estimation and the law of large numbers
Sampling distributions 290 _
The sampling distribution of x 293
The central limit theorem 295
CHAPTER 12
General Rules of Probability*
CHAPTER 14
Confidence Intervals: The Basics
CHAPTER 15
Tests of Significance: The Basics 369
259
The idea of probability 260
The search for randomness* 262
Probability models 264
Probability rules 266
Finite and discrete probability models
Continuous probability models 271
Random variables 275
Personal probability* 276
331
The reasoning of statistical estimation 352
Margin of error and confidence level 354
Confidence intervals for a population mean 357
How confidence intervals behave 361
Commentary: Data Ethics* 246
Institutional review boards 248
Informed consent 248
Confidentiality 250
Clinical trials 252
Behavioral and social science experiments
v
The binomial setting and binomial distributions 331
Binomial distributions in statistical sampling 333
Binomial probabilities 334
Using technology 336
Binomial mean and standard deviation 338
The Normal approximation to binomial distributions 340
223
Observation versus experiment 223
Subjects, factors, treatments 225
How to experiment badly 228
Randomized comparative experiments 229
The logic of randomized comparative experiments
Cautions about experimentation 234
Matched pairs and other block designs 236
CHAPTER 10
Introducing Probability
D E T A I L E D T A BL E O F CON TE N TS
287
Part II summary 419
Test yourself 423
Supplementary exercises
Pa r t I I I
307
Independence and the multiplication rule
The general addition rule 312
Conditional probability 314
The general multiplication rule 316
Independence again 318
Tree diagrams 318
CHAPTER 17
From Exploration to Inference: Part II Review
431
Inference about
435
Variables
308
417
CHAPTER 18
Inference about a Population Mean
437
Conditions for inference about a mean 437
The t distributions 438
The one-sample t confidence interval 440
FM.indd Page vi 11/18/11 11:53:50 PM user-s163
vi
user-F452
DETA ILED TA B LE O F CO N T E N T S
The one-sample t test 443
Using technology 446
Matched pairs t procedures 449
Robustness of t procedures 452
The chi-square test statistic 560
Cell counts required for the chi-square test 561
Using technology 562
Uses of the chi-square test 567
The chi-square distributions 570
The chi-square test for goodness of fit* 572
CHAPTER 19
Two-Sample Problems 465
CHAPTER 24
Inference for Regression
Two-sample problems 465
Comparing two population means 466
Two-sample t procedures 469
Using technology 474
Robustness again 477
Details of the t approximation* 480
Avoid the pooled two-sample t procedures* 481
Avoid inference about standard deviations* 482
CHAPTER 25
One-Way Analysis of Variance: Comparing Several
Means 623
The sample proportion p̂ 494
Large-sample confidence intervals for a proportion 496
Accurate confidence intervals for a proportion 499
Choosing the sample size 502
Significance tests for a proportion 504
Comparing several means 625
The analysis of variance F test 625
Using technology 628
The idea of analysis of variance 631
Conditions for ANOVA 633
F distributions and degrees of freedom
Some details of ANOVA* 640
515
Two-sample problems: proportions 515
The sampling distribution of a difference between
proportions 516
Large-sample confidence intervals for comparing
proportions 517
Using technology 518
Accurate confidence intervals for comparing proportions
Significance tests for comparing proportions 522
Notes and Data Sources
Tables
520
CHAPTER 22
Inference about Variables: Part III Review 533
Part III summary 536
Test yourself 538
Supplementary exercises
587
Conditions for regression inference 589
Estimating the parameters 590
Using technology 593
Testing the hypothesis of no linear relationship 597
Testing lack of correlation 598
Confidence intervals for the regression slope 600
Inference about prediction 602
Checking the conditions for inference 607
CHAPTER 20
Inference about a Population Proportion 493
CHAPTER 21
Comparing Two Proportions

655
675
TABLE A
TABLE B
TABLE C
TABLE D
TABLE E
Standard Normal probabilities 676
Random digits 678
t distribution critical values 679
Chi-square distribution critical values 680
Critical values of the correlation r 681
Answers to Selected Exercises
Index
545
Inference about
Relationships
CHAPTER 23
Two Categorical Variables: The Chi-Square Test 553
Two-way tables 553
The problem of multiple comparisons 556
Expected counts in two-way tables 558
551
682
733
Pa r t V
Pa r t I V
637
Optional Companion
Chapters
(available on the BPS CD and online)
CHAPTER 26
Nonparametric Tests 26-3
Comparing two samples: the Wilcoxon rank sum test
The Normal approximation for W 26-8
26-4
FM.indd Page vii 11/9/11 3:58:32 PM user-s163
user-F452

Using technology 26-10
What hypotheses does Wilcoxon test? 26-13
Dealing with ties in rank tests 26-14
Matched pairs: the Wilcoxon signed rank test 26-19
The Normal approximation for W ⫹ 26-22
Dealing with ties in the signed rank test 26-24
Comparing several samples: the Kruskal-Wallis test 26-27
Hypotheses and conditions for the Kruskal-Wallis test 26-29
The Kruskal-Wallis test statistic 26-29
CHAPTER 27
Statistical Process Control
27-3
Processes 27-4
Describing processes 27-4
The
_ idea of statistical process control 27-9
x charts for process monitoring 27-10
s charts for process monitoring 27-16
Using control charts 27-23
Setting up control charts 27-25
Comments on statistical control 27-32
Don’t confuse control with capability! 27-34
Control charts for sample proportions 27-36
Control limits for p charts 27-37
D E T A I L E D T A BL E O F CON TE N TS
CHAPTER 28
Multiple Regression* 28-3
Parallel regression lines 28-4
Estimating parameters 28-8
Using technology 28-13
Inference for multiple regression 28-16
Interaction 28-26
The multiple linear regression model 28-32
The woes of regression coefficients 28-39
A case study for multiple regression 28-41
Inference for regression parameters 28-53
Checking the conditions for inference 28-58
CHAPTER 29
More about Analysis of Variance 29-3
Beyond one-way ANOVA 29-3
Follow-up analysis: Tukey pairwise multiple comparisons 29-8
Follow-up analysis: contrasts* 29-12
Two-way ANOVA: conditions, main effects, and interaction 29-16
Inference for two-way ANOVA 29-23
Some details of two-way ANOVA* 29-32
To the Instructor: About this Book
Welcome to the sixth edition of The Basic Practice of Statistics. This book is the culmination of 40 years of teaching undergraduates and 20 years of writing texts. Previous editions have been very successful, and we think that this new edition is the best yet. In this preface we describe for instructors the nature and features of the book and the changes in this sixth edition.
BPS is designed to be accessible to college and university students with limited
quantitative background—“just algebra” in the sense of being able to read and use
simple equations. It is usable with almost any level of technology for calculating
