Homework #3 (February)

Please solve each problem below in a separate file.

1. Use the following data to make a stemplot and to find the mean, median, five-number summary, outlier test, and standard deviation (round to 2 decimal places throughout): 10, 15, 20, 25, 30, 35, 40.
2. Use the following data to find the mean, median, five-number summary, outlier test, and standard deviation (round to 2 decimal places throughout): 3, 5, 7, 9, 11, 12, 15.
3. Use the North Carolina data on p. 39 of the book to find the mean, median, five-number summary, and outlier test.

--------------------------------------------------

In-class work #3

1. PowerPoint Ch. 2, slide #6: find the mean, median, five-number summary, and boxplot, and do the outlier test.
2. Book Ch. 2, Example 2.7: standard deviation (round to 2 decimals).

Describing Distributions with Numbers

We saw in Chapter 1 (page 4) that the American Community Survey asks, among much else, workers’ travel times to work. Here are the travel times in minutes for 15 workers in North Carolina, chosen at random by the Census Bureau:¹

30 20 10 40 25 20 10 60 15 40 5 30 12 10 10

We aren’t surprised that most people estimate their travel time in multiples

of 5 minutes. Here is a stemplot of these data:

0 | 5
1 | 000025
2 | 005
3 | 00
4 | 00
5 |
6 | 0

Chapter 2

IN THIS CHAPTER WE COVER…

■ Measuring center: the mean
■ Measuring center: the median
■ Comparing the mean and the median
■ Measuring spread: the quartiles
■ The five-number summary and boxplots
■ Spotting suspected outliers*
■ Measuring spread: the standard deviation
■ Choosing measures of center and spread
■ Using technology
■ Organizing a statistical problem

The distribution is single-peaked and right-skewed. The longest travel

time (60 minutes) may be an outlier. Our goal in this chapter is to describe

with numbers the center and spread of this and other distributions.
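The stemplot above can be rebuilt with a short script. The following is a minimal Python sketch. It assumes whole-number data, with the tens digit as the stem and the units digit as the leaf. The function name `stemplot` is our own and does not come from any library.

```python
def stemplot(data):
    """Return stemplot rows for nonnegative whole numbers.

    Assumption: stem = value // 10 (tens), leaf = value % 10 (units),
    which suits two-digit travel times like the North Carolina data.
    """
    values = sorted(data)
    lowest_stem = values[0] // 10
    highest_stem = values[-1] // 10
    rows = []
    for stem in range(lowest_stem, highest_stem + 1):
        # Units digits of every value on this stem, already in order.
        leaves = "".join(str(x % 10) for x in values if x // 10 == stem)
        # Empty stems are kept so gaps in the data stay visible.
        rows.append(f"{stem} | {leaves}".rstrip())
    return rows


nc_times = [30, 20, 10, 40, 25, 20, 10, 60, 15, 40, 5, 30, 12, 10, 10]
for row in stemplot(nc_times):
    print(row)
```

Keeping the empty 5 stem preserves the gap below the 60-minute value, and that gap is what makes the observation stand out as a possible outlier.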


MEASURING CENTER: The Mean

The most common measure of center is the ordinary arithmetic average, or mean.

THE MEAN $\bar{x}$

To find the mean of a set of observations, add their values and divide by the number of observations. If the n observations are $x_1, x_2, \ldots, x_n$, their mean is

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

or, in more compact notation,

$$\bar{x} = \frac{1}{n}\sum x_i$$

The Σ (capital Greek sigma) in the formula for the mean is short for “add them all up.” The subscripts on the observations $x_i$ are just a way of keeping the n observations distinct. They do not necessarily indicate order or any other special facts about the data. The bar over the x indicates the mean of all the x-values. Pronounce the mean $\bar{x}$ as “x-bar.” This notation is very common. When writers who are discussing data use $\bar{x}$ or $\bar{y}$, they are talking about a mean.

NCTRAVELTIME

EXAMPLE 2.1 Travel times to work

The mean travel time of our 15 North Carolina workers is

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{30 + 20 + \cdots + 10}{15} = \frac{337}{15} \approx 22.5 \text{ minutes}$$

In practice, you can enter the data into your calculator and ask for the mean. You don’t have to actually add and divide. But you should know that this is what the calculator is doing.

Notice that only 6 of the 15 travel times are larger than the mean. If we leave out the longest single travel time, 60 minutes, the mean for the remaining 14 people is 19.8 minutes. That one observation raises the mean by 2.7 minutes. ■

Don’t hide the outliers. Data from an airliner’s control surfaces, such as the vertical tail rudder, go to cockpit instruments and then to the “black box” flight data recorder. To avoid confusing the pilots, short erratic movements in the data are “smoothed” so that the instruments show overall patterns. When a crash killed 260 people, investigators suspected a catastrophic movement of the tail rudder. But the black box contained only the smoothed data. Sometimes outliers are more important than the overall pattern.

Example 2.1 illustrates an important fact about the mean as a measure of

center: it is sensitive to the influence of a few extreme observations. These may

be outliers, but a skewed distribution that has no outliers will also pull the mean

toward its long tail. Because the mean cannot resist the influence of extreme

observations, we say that it is not a resistant measure of center.
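You can see the mean's lack of resistance directly by recomputing Example 2.1 with and without the 60-minute trip. This Python sketch uses only the 15 travel times listed at the start of the chapter. The helper `mean` is defined here for illustration.

```python
def mean(values):
    """Arithmetic mean: add the observations and divide by how many there are."""
    return sum(values) / len(values)


nc_times = [30, 20, 10, 40, 25, 20, 10, 60, 15, 40, 5, 30, 12, 10, 10]

with_outlier = mean(nc_times)
# Drop the single longest travel time (60 minutes) and recompute.
without_outlier = mean([t for t in nc_times if t != 60])

print(round(with_outlier, 2))                    # 22.47
print(round(without_outlier, 2))                 # 19.79
print(round(with_outlier - without_outlier, 2))  # 2.68
```

Rounded to one decimal place, these values match the 22.5, 19.8, and 2.7 minutes quoted in Example 2.1.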

APPLY YOUR KNOWLEDGE

2.1 Pulling wood apart. Example 1.9 (page 21) gives the breaking strength in pounds of 20 pieces of Douglas fir. Find the mean breaking strength. How many of the pieces of wood have strengths less than the mean? What feature of the stemplot (Figure 1.11, page 22) explains the fact that the mean is smaller than most of the observations?
WOOD

2.2 Health care spending. Table 1.3 (page 23) gives the 2007 health care expenditure per capita in 35 countries with the highest gross domestic product in 2007. The United States, at $7285 (PPP, international $) per person, is a high outlier. Find the mean health care spending in these nations with and without the United States. How much does the one outlier increase the mean?
HEALTHCARE

MEASURING CENTER: The Median

In Chapter 1, we used the midpoint of a distribution as an informal measure of

center. The median is the formal version of the midpoint, with a specific rule for

calculation.

THE MEDIAN M

The median M is the midpoint of a distribution, the number such that half the observations are smaller and the other half are larger. To find the median of a distribution:

1. Arrange all observations in order of size, from smallest to largest.

2. If the number of observations n is odd, the median M is the center observation in

the ordered list. If the number of observations n is even, the median M is midway

between the two center observations in the ordered list.

3. You can always locate the median in the ordered list of observations by counting up $(n + 1)/2$ observations from the start of the list.

Note that the formula $(n + 1)/2$ does not give the median, just the location of the median in the ordered list. Medians require little arithmetic, so they are easy to find by hand for small sets of data. Arranging even a moderate number of observations in order is very tedious, however, so finding the median by hand for larger sets of data is unpleasant. Even simple calculators have an $\bar{x}$ button, but you will need to use software or a graphing calculator to automate finding the median.

EXAMPLE 2.2 Finding the median: odd n

What is the median travel time for our 15 North Carolina workers? Here are the data arranged in order, with the center observation in brackets:

5 10 10 10 10 12 15 [20] 20 25 30 30 40 40 60

The count of observations n = 15 is odd. The bracketed 20 is the center observation in the ordered list, with 7 observations to its left and 7 to its right. This is the median, M = 20 minutes.


Because n = 15, our rule for the location of the median gives

$$\text{location of } M = \frac{n + 1}{2} = \frac{16}{2} = 8$$

That is, the median is the 8th observation in the ordered list. It is faster to use this rule than to locate the center by eye. ■


EXAMPLE 2.3 Finding the median: even n

Travel times to work in New York State are (on the average) longer than in North

Carolina. Here are the travel times in minutes of 20 randomly chosen New York workers:

10 30 5 25 40 20 10 15 30 20 15 20 85 15 65 15 60 60 40 45

NYTRAVELTIME

A stemplot not only displays the distribution but makes finding the median easy because it arranges the observations in order:

0 | 5
1 | 005555
2 | 0005
3 | 00
4 | 005
5 |
6 | 005
7 |
8 | 5

The distribution is single-peaked and right-skewed, with several travel times of an hour or more. There is no center observation, but there is a center pair. These are the last 20 and the 25 on the 2 stem, which have 9 observations before them in the ordered list and 9 after them. The median is midway between these two observations:

$$M = \frac{20 + 25}{2} = 22.5 \text{ minutes}$$

With n = 20, the rule for locating the median in the list gives

$$\text{location of } M = \frac{n + 1}{2} = \frac{21}{2} = 10.5$$

The location 10.5 means “halfway between the 10th and 11th observations in the

ordered list.’’ That agrees with what we found by eye. ■
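The $(n + 1)/2$ location rule translates directly into code. This Python sketch handles both odd and even n and cross-checks the result against the standard library's `statistics.median`. The helper name `median` is our own.

```python
import statistics


def median(values):
    """Median found by counting (n + 1) / 2 places into the ordered list."""
    ordered = sorted(values)
    n = len(ordered)
    location = (n + 1) / 2  # 1-based: 8 when n = 15, 10.5 when n = 20
    if location.is_integer():
        # Odd n: the location lands on a single center observation.
        return ordered[int(location) - 1]
    # Even n: a location ending in .5 means halfway between two observations.
    lower = ordered[int(location) - 1]
    upper = ordered[int(location)]
    return (lower + upper) / 2


nc_times = [30, 20, 10, 40, 25, 20, 10, 60, 15, 40, 5, 30, 12, 10, 10]
ny_times = [10, 30, 5, 25, 40, 20, 10, 15, 30, 20, 15, 20, 85, 15, 65, 15, 60, 60, 40, 45]

print(median(nc_times), median(ny_times))  # 20 22.5
# The standard library agrees for these two data sets.
print(statistics.median(nc_times), statistics.median(ny_times))  # 20 22.5
```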

COMPARING THE MEAN AND THE MEDIAN

Examples 2.1 and 2.2 illustrate an important difference between the mean and

the median. The median travel time (the midpoint of the distribution) is 20 minutes. The mean travel time is higher, 22.5 minutes. The mean is pulled toward

the right tail of this right-skewed distribution. The median, unlike the mean, is

resistant. If the longest travel time were 600 minutes rather than 60 minutes, the


mean would increase to more than 58 minutes but the median would not change

at all. The outlier just counts as one observation above the center, no matter how

far above the center it lies. The mean uses the actual value of each observation

and so will chase a single large observation upward. The Mean and Median applet is an excellent way to compare the resistance of M and $\bar{x}$.
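The 600-minute thought experiment above is easy to check. This sketch uses Python's standard `statistics` module. The mistyped value is hypothetical and stands in for a data-entry error.

```python
from statistics import mean, median

nc_times = [30, 20, 10, 40, 25, 20, 10, 60, 15, 40, 5, 30, 12, 10, 10]
# Hypothetical data-entry error: the 60-minute trip recorded as 600 minutes.
mistyped = [600 if t == 60 else t for t in nc_times]

print(round(mean(nc_times), 2), median(nc_times))  # 22.47 20
print(round(mean(mistyped), 2), median(mistyped))  # 58.47 20
```

The mean more than doubles, while the median stays at 20 minutes because the extreme value still counts as just one observation above the center.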

COMPARING THE MEAN AND THE MEDIAN

The mean and median of a roughly symmetric distribution are close together. If the distribution is exactly symmetric, the mean and median are exactly the same. In a skewed

distribution, the mean is usually farther out in the long tail than is the median.²

Many economic variables have distributions that are skewed to the right. For

example, the median endowment of colleges and universities in the United States

and Canada in 2009 was about $67 million—but the mean endowment was almost

$371 million. Most institutions have modest endowments, but a few are very

wealthy. Harvard’s endowment was over $35 billion.³ The few wealthy institutions

pull the mean up but do not affect the median. Reports about incomes and other

strongly skewed distributions usually give the median (“midpoint’’) rather than the

mean (“arithmetic average’’). However, a county that is about to impose a tax of 1%

on the incomes of its residents cares about the mean income, not the median. The

tax revenue will be 1% of total income, and the total is the mean times the number

of residents. The mean and median measure center in different ways, and

both are useful. Don’t confuse the “average” value of a variable (the mean) with

its “typical” value, which we might describe by the median.

APPLY YOUR KNOWLEDGE

2.3 New York travel times. Find the mean of the travel times to work for the 20 New York workers in Example 2.3. Compare the mean and median for these data. What general fact does your comparison illustrate?
NYTRAVELTIME

2.4 New-house prices. The mean and median sales prices of new homes sold in the United States in November 2010 were $213,000 and $268,700.⁴ Which of these numbers is the mean and which is the median? Explain how you know.

2.5 Carbon dioxide emissions. Table 1.6 (page 33) gives the 2007 carbon dioxide (CO2) emissions per person for countries with populations of at least 30 million. Find the mean and the median for these data. Make a histogram of the data. What features of the distribution explain why the mean is larger than the median?
CO2EMISSIONS

MEASURING SPREAD: The Quartiles

The mean and median provide two different measures of the center of a distribution. But a measure of center alone can be misleading. The Census Bureau reports that in 2009 the median income of American households was $49,777. Half of all households had incomes below $49,777, and half had higher incomes. The mean was much higher, $67,976, because the distribution of incomes is skewed to the right. But the median and mean don’t tell the whole story. The bottom 10% of households had incomes less than $12,120, and households in the top 5% took in more than $180,001.⁵ We are interested in the spread or variability of incomes as well as their center. The simplest useful numerical description of a distribution requires both a measure of center and a measure of spread.

One way to measure spread is to give the smallest and largest observations. For

example, the travel times of our 15 North Carolina workers range from 5 minutes

to 60 minutes. These single observations show the full spread of the data, but

they may be outliers. We can improve our description of spread by also looking at

the spread of the middle half of the data. The quartiles mark out the middle half.

Count up the ordered list of observations, starting from the smallest. The first

quartile lies one-quarter of the way up the list. The third quartile lies three-quarters

of the way up the list. In other words, the first quartile is larger than 25% of the

observations, and the third quartile is larger than 75% of the observations. The

second quartile is the median, which is larger than 50% of the observations. That

is the idea of quartiles. We need a rule to make the idea exact. The rule for calculating the quartiles uses the rule for the median.

THE QUARTILES Q1 AND Q3

To calculate the quartiles:

1. Arrange the observations in increasing order and locate the median M in the

ordered list of observations.

2. The first quartile Q1 is the median of the observations whose position in the

ordered list is to the left of the location of the overall median.

3. The third quartile Q3 is the median of the observations whose position in the

ordered list is to the right of the location of the overall median.

Here are examples that show how the rules for the quartiles work for both odd

and even numbers of observations.

EXAMPLE 2.4 Finding the quartiles: odd n

Our North Carolina sample of 15 workers’ travel times, arranged in increasing order, is

5 10 10 10 10 12 15 [20] 20 25 30 30 40 40 60

There is an odd number of observations, so the median is the middle one, the bracketed 20 in the list. The first quartile is the median of the 7 observations to the left of the median. This is the 4th of these 7 observations, so Q1 = 10 minutes. If you want, you can use the rule for the location of the median with n = 7:

$$\text{location of } Q_1 = \frac{n + 1}{2} = \frac{7 + 1}{2} = 4$$


The third quartile is the median of the 7 observations to the right of the median, Q3 = 30 minutes. When there is an odd number of observations, leave out the overall median when you locate the quartiles in the ordered list.

The quartiles are resistant because they are not affected by a few extreme observations. For example, Q3 would still be 30 if the outlier were 600 rather than 60. ■

EXAMPLE 2.5 Finding the quartiles: even n

Here are the travel times to work of the 20 New York workers from Example 2.3,

arranged in increasing order:

5 10 10 15 15 15 15 20 20 20 | 25 30 30 40 40 45 60 60 65 85

There is an even number of observations, so the median lies midway between the middle pair, the 10th and 11th in the list. Its value is M = 22.5 minutes. We have marked the location of the median by |. The first quartile is the median of the first 10 observations, because these are the observations to the left of the location of the median. Check that Q1 = 15 minutes and Q3 = 42.5 minutes. When the number of observations is even, include all the observations when you locate the quartiles. ■

Be careful when, as in these examples, several observations take the same

numerical value. Write down all of the observations, arrange them in order, and

apply the rules just as if they all had distinct values.
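Here is one way to express the chapter's rule for Q1 and Q3 in Python. It is a sketch, and the helper names are our own. Not all statistical software uses this exact rule. Python's own `statistics.quantiles`, for example, interpolates between observations. As a result, software quartiles can differ slightly from hand calculations like those in Examples 2.4 and 2.5.

```python
def median_of_sorted(ordered):
    """Median of a list that is already in increasing order."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, Q3) using the chapter's rule.

    For odd n the overall median is left out of both halves;
    for even n the ordered list splits into two equal halves.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = ordered[:half]       # observations left of the median's location
    upper = ordered[n - half:]   # observations right of the median's location
    return median_of_sorted(lower), median_of_sorted(upper)


nc_times = [30, 20, 10, 40, 25, 20, 10, 60, 15, 40, 5, 30, 12, 10, 10]
ny_times = [10, 30, 5, 25, 40, 20, 10, 15, 30, 20, 15, 20, 85, 15, 65, 15, 60, 60, 40, 45]

print(quartiles(nc_times))  # (10, 30)
print(quartiles(ny_times))  # (15.0, 42.5)
```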

THE FIVE-NUMBER SUMMARY AND BOXPLOTS

The smallest and largest observations tell us little about the distribution as a

whole, but they give information about the tails of the distribution that is missing

if we know only the median and the quartiles. To get a quick summary of both

center and spread, combine all five numbers.

THE FIVE-NUMBER SUMMARY

The five-number summary of a distribution consists of the smallest observation, the

first quartile, the median, the third quartile, and the largest observation, written in

order from smallest to largest. In symbols, the five-number summary is

Minimum   Q1   M   Q3   Maximum

These five numbers offer a reasonably complete description of center and spread. The five-number summaries of travel times to work from Examples 2.4 and 2.5 are

                 Min   Q1    M      Q3     Max
North Carolina   5     10    20     30     60
New York         5     15    22.5   42.5   85
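Combining the extremes with the quartile rule gives the whole five-number summary. The sketch below is written in Python, with a hypothetical helper `five_number_summary`, and it reproduces both rows above. It assumes at least two observations so that each half of the data is nonempty.

```python
def median_of_sorted(ordered):
    """Median of a list that is already in increasing order."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 == 1 else (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(values):
    """Return (Minimum, Q1, M, Q3, Maximum); needs at least two observations."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    q1 = median_of_sorted(ordered[:half])      # lower half, median excluded if n is odd
    q3 = median_of_sorted(ordered[n - half:])  # upper half, median excluded if n is odd
    return ordered[0], q1, median_of_sorted(ordered), q3, ordered[-1]


nc_times = [30, 20, 10, 40, 25, 20, 10, 60, 15, 40, 5, 30, 12, 10, 10]
ny_times = [10, 30, 5, 25, 40, 20, 10, 15, 30, 20, 15, 20, 85, 15, 65, 15, 60, 60, 40, 45]

print(five_number_summary(nc_times))  # (5, 10, 20, 30, 60)
print(five_number_summary(ny_times))  # (5, 15.0, 22.5, 42.5, 85)
```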

FIGURE 2.1 Boxplots comparing the travel times to work of samples of workers in North Carolina and New York. [Figure: vertical axis "Travel time to work (minutes)" from 0 to 90; callouts on the New York boxplot mark Minimum = 5, First quartile = 15, Median = 22.5, Third quartile = 42.5, Maximum = 85.]

The five-number summary of a distribution leads to a new graph, the boxplot.

Figure 2.1 shows boxplots comparing travel times to work in North Carolina

and New York.

BOXPLOT

A boxplot is a graph of the five-number summary.

■ A central box spans the quartiles Q1 and Q3.

■ A line in the box marks the median M.

■ Lines extend from the box out to the smallest and largest observations.

Because boxplots show less detail than histograms or stemplots, they are best

used for side-by-side comparison of more than one distribution, as in Figure 2.1.

Be sure to include a numerical scale in the graph. When you look at a boxplot,

first locate the median, which marks the center of the distribution. Then look

at the spread. The span of the central box shows the spread of the middle half

of the data, and the extremes (the smallest and largest observations) show the

spread of the entire data set. We see from Figure 2.1 that travel times to work are

in general a bit longer in New York than in North Carolina. The median, both


quartiles, and the maximum are all larger in New York. New York travel times are

also more variable, as shown by the span of the box and the spread between the

extremes. Note that the boxes with arrows in Figure 2.1 that indicate the location

of the five-number summary are not part of the boxplot, but are included purely

for illustration.

Finally, the New York data are more strongly right-skewed. In a symmetric distribution, the first and third quartiles are equally distant from the median. In most

distributions that are skewed to the right, on the other hand, the third quartile

will be farther above the median than the first quartile is below it. The extremes

behave the same way, but remember that they are just single observations and

may say little about the distribution as a whole.

APPLY YOUR KNOWLEDGE

2.6 The Pittsburgh Steelers. The 2010 roster of the Pittsburgh Steelers professional football team included 7 defensive linemen and 9 offensive linemen. The weights in pounds of the defensive linemen were

305 325 305 300 285 280 298

and the weights of the offensive linemen were

338 324 325 304 344 318 315 304 319
STEELERS

(a) Make a stemplot of the weights of the defensive linemen and find the five-number summary.

(b) Make a stemplot of the weights of the offensive linemen and find the five-number summary.

(c) Does either group contain one or more clear outliers? Which group of players tends to be heavier?

2.7 Fuel economy for midsize cars. The Department of Energy provides fuel economy ratings for all cars and light trucks sold in the United States. Here are the estimated miles per gallon for city driving for the 129 cars classified as midsize in 2010, arranged in increasing order (read down each column):⁶
MIDSIZECARS

 9  15  16  17  18  19  21  22  26
10  15  16  17  18  19  22  22  26
10  16  16  17  18  19  22  22  26
11  16  16  17  18  19  22  23  28
11  16  17  18  18  19  22  23  33
11  16  17  18  18  19  22  23  35
12  16  17  18  18  19  22  23  41
13  16  17  18  18  19  22  24  41
14  16  17  18  18  20  22  24  51
14  16  17  18  18  20  22  24
15  16  17  18  18  20  22  25
15  16  17  18  18  21  22  26
15  16  17  18  19  21  22  26
15  16  17  18  19  21  22  26
15  16  17  18  19  21  22  26

(a) Give the five-number summary of this distribution.

(b) Draw a boxplot of these data. What is the shape of the distribution shown by the boxplot? Which features of the boxplot led you to this conclusion? Are any observations unusually small or large?

How much is that house worth? The town of Manhattan, Kansas, is sometimes called “the Little Apple” to distinguish it from that other Manhattan, “the Big Apple.” A few years ago, a house there appeared in the county appraiser’s records valued at $200,059,000. That would be quite a house even on Manhattan Island. As you might guess, the entry was wrong: the true value was $59,500. But before the error was discovered, the county, the city, and the school board had based their budgets on the total appraised value of real estate, which the one outlier jacked up by 6.5%. It can pay to spot outliers before you trust your data.

SPOTTING SUSPECTED OUTLIERS*

Look again at the stemplot of travel times to work in New York in Example 2.3.

The five-number summary for this distribution is

5   15   22.5   42.5   85

How shall we describe the spread of this distribution? The smallest and largest

observations are extremes that don’t describe the spread of the majority of the

data. The distance between the quartiles (the range of the center half of the

data) is a more resistant measure of spread. This distance is called the interquartile

range.

THE INTERQUARTILE RANGE IQR

The interquartile range IQR is the distance between the first and third quartiles,

$$\text{IQR} = Q_3 - Q_1$$

For our data on New York travel times, IQR = 42.5 - 15 = 27.5 minutes. However, no single numerical measure of spread, such as IQR, is very useful for describing skewed distributions. The two sides of a skewed distribution

have different spreads, so one number can’t summarize them. That’s why we give

the full five-number summary. The interquartile range is mainly used as the basis

for a rule of thumb for identifying suspected outliers. In some software, suspected

outliers are identified in a boxplot with a special plotting symbol such as *.

THE 1.5 × IQR RULE FOR OUTLIERS

Call an observation a suspected outlier if it falls more than 1.5 × IQR above the third quartile or below the first quartile.

EXAMPLE 2.6 Using the 1.5 × IQR rule

For the New York travel time data, IQR = 27.5 and

$$1.5 \times \text{IQR} = 1.5 \times 27.5 = 41.25$$

Any values not falling between

$$Q_1 - 1.5 \times \text{IQR} = 15.0 - 41.25 = -26.25 \quad\text{and}\quad Q_3 + 1.5 \times \text{IQR} = 42.5 + 41.25 = 83.75$$

are flagged as suspected outliers. Look again at the stemplot in Example 2.3: the only suspected outlier is the longest travel time, 85 minutes. The 1.5 × IQR rule suggests that the three next-longest travel times (60, 60, and 65 minutes) are just part of the long right tail of this skewed distribution. ■
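The 1.5 × IQR rule is simple enough to automate, and automated scanning is where it is most often used. In this Python sketch, `iqr_outliers` is our own function name. The quartiles are passed in from Example 2.5 rather than recomputed.

```python
def iqr_outliers(values, q1, q3):
    """Return the two fences and any observations beyond them (1.5 * IQR rule)."""
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # "More than 1.5 * IQR" beyond a quartile, so the comparisons are strict.
    suspects = [x for x in values if x < low_fence or x > high_fence]
    return low_fence, high_fence, suspects


ny_times = [10, 30, 5, 25, 40, 20, 10, 15, 30, 20, 15, 20, 85, 15, 65, 15, 60, 60, 40, 45]

# Quartiles from Example 2.5: Q1 = 15, Q3 = 42.5.
low, high, suspects = iqr_outliers(ny_times, q1=15, q3=42.5)
print(low, high, suspects)  # -26.25 83.75 [85]
```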

*This short section is optional.


The 1.5 × IQR rule is not a replacement for looking at the data. It is most useful when large volumes of data are scanned automatically.

APPLY YOUR KNOWLEDGE

2.8 Travel time to work. In Example 2.1, we noted the influence of one long travel time of 60 minutes in our sample of 15 North Carolina workers. Does the 1.5 × IQR rule identify this travel time as a suspected outlier?

2.9 Fuel economy for midsize cars. Exercise 2.7 gives the estimated miles per gallon (mpg) for city driving for the 129 cars classified as midsize in 2010. In that exercise we noted that several of the mpg values were unusually large. Which of these are suspected outliers by the 1.5 × IQR rule? While outliers can be produced by errors or incorrectly recorded observations, they are often observations that differ from the others in some particular way. In this case, the cars producing the high outliers share a common feature. What do you think that is?
MIDSIZECARS

MEASURING SPREAD: The Standard Deviation

The five-number summary is not the most common numerical description of a

distribution. That distinction belongs to the combination of the mean to measure

center and the standard deviation to measure spread. The standard deviation and

its close relative, the variance, measure spread by looking at how far the observations are from their mean.

THE STANDARD DEVIATION s

The variance $s^2$ of a set of observations is an average of the squares of the deviations of the observations from their mean. In symbols, the variance of n observations $x_1, x_2, \ldots, x_n$ is

$$s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}$$

or, more compactly,

$$s^2 = \frac{1}{n - 1}\sum (x_i - \bar{x})^2$$

The standard deviation s is the square root of the variance $s^2$:

$$s = \sqrt{\frac{1}{n - 1}\sum (x_i - \bar{x})^2}$$

In practice, use software or your calculator to obtain the standard deviation

from keyed-in data. Doing an example step-by-step will help you understand how

the variance and standard deviation work, however.


EXAMPLE 2.7 Calculating the standard deviation

SATCR

Georgia Southern University had 2417 students with regular admission in their freshman class of 2010. For each student, data are available on their SAT and ACT scores (if taken), high school GPA, and the college within the university to which they were admitted.⁷ In Exercise 3.49, the full data set for the SAT Critical Reading scores will be examined. Here are the first five observations from that data set:

650 490 580 450 570

We will compute $\bar{x}$ and s for these students. First find the mean:

$$\bar{x} = \frac{650 + 490 + 580 + 450 + 570}{5} = \frac{2740}{5} = 548$$

Figure 2.2 displays the data as points above the number line, with their mean marked

by an asterisk (*). The arrows mark two of the deviations from the mean. The deviations show how spread out the data are about their mean. They are the starting point

for calculating the variance and the standard deviation.

Observations   Deviations            Squared deviations
$x_i$          $x_i - \bar{x}$       $(x_i - \bar{x})^2$
650            650 - 548 = 102       102² = 10,404
490            490 - 548 = -58       (-58)² = 3,364
580            580 - 548 = 32        32² = 1,024
450            450 - 548 = -98       (-98)² = 9,604
570            570 - 548 = 22        22² = 484
               sum = 0               sum = 24,880

The variance is the sum of the squared deviations divided by one less than the number of observations:

$$s^2 = \frac{1}{n - 1}\sum (x_i - \bar{x})^2 = \frac{24{,}880}{4} = 6220$$

The standard deviation is the square root of the variance:

$$s = \sqrt{6220} \approx 78.87$$ ■
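You can trace the table and the two formulas above step by step in Python. This sketch mirrors the hand calculation for the five SAT scores and uses only `math.sqrt` from the standard library.

```python
import math

scores = [650, 490, 580, 450, 570]  # first five SAT Critical Reading scores
n = len(scores)

x_bar = sum(scores) / n                   # 2740 / 5 = 548.0
deviations = [x - x_bar for x in scores]  # 102.0, -58.0, 32.0, -98.0, 22.0
squared = [d ** 2 for d in deviations]    # 10404.0, 3364.0, 1024.0, 9604.0, 484.0

variance = sum(squared) / (n - 1)         # 24880.0 / 4 = 6220.0
s = math.sqrt(variance)

print(sum(deviations))  # 0.0, since deviations from the mean always sum to zero
print(round(s, 2))      # 78.87
```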

Notice that the “average” in the variance $s^2$ divides the sum by one fewer than the number of observations, that is, n - 1 rather than n. The reason is that the deviations $x_i - \bar{x}$ always sum to exactly 0, so that knowing n - 1 of them determines the last one. Only n - 1 of the squared deviations can vary freely, and we average by dividing the total by n - 1. The number n - 1 is called the degrees of freedom of the variance or standard deviation. Some calculators offer a choice between dividing by n and dividing by n - 1, so be sure to use n - 1.
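Python's `statistics` module offers both versions of the divisor, which makes the choice concrete. The functions `variance` and `stdev` divide by n - 1, while `pvariance` and `pstdev` divide by n. A sketch for the five SAT scores:

```python
from statistics import pstdev, pvariance, stdev, variance

scores = [650, 490, 580, 450, 570]  # SAT Critical Reading scores, Example 2.7

# Sample versions: divide the sum of squared deviations by n - 1 = 4.
print("n - 1:", variance(scores), round(stdev(scores), 2))

# Population versions: divide by n = 5 instead.
print("n:    ", pvariance(scores), round(pstdev(scores), 2))
```

The first line should match Example 2.7: variance 6220 and s ≈ 78.87. Dividing by 5 instead gives a smaller variance of 4976 and a standard deviation of about 70.54, which is not what the chapter's formula calls for.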

FIGURE 2.2 SAT Critical Reading scores for five students, with their mean (*) and the deviations of two observations from the mean shown, for Example 2.7. [Figure: the five scores 450, 490, 570, 580, and 650 plotted on a number line labeled "SAT Critical Reading Score" running from 400 to 700, with the mean $\bar{x}$ = 548 marked by *; arrows show deviation = 102 for x = 650 and deviation = -58 for x = 490.]

More important than the details of hand calculation are the properties that determine the usefulness of the standard deviation:

■ s measures spread about the mean and should be used only when the mean is chosen as the measure of center.

■ s is always zero or greater than zero. s = 0 only when there is no spread. This happens only when all observations have the same value. Otherwise, s > 0. As the observations become more spread out about their mean, s gets larger.

■ s has the same units of measurement as the original observations. For example, if you measure weight in kilograms, both the mean $\bar{x}$ and the standard deviation s are also in kilograms. This is one reason to prefer s to the variance $s^2$, which would be in squared kilograms.

■ Like the mean $\bar{x}$, s is not resistant. A few outliers can make s very large. The use of squared deviations renders s even more sensitive than $\bar{x}$ to a few extreme observations. For example, the standard deviation of the travel times for the 15 North Carolina workers in Example 2.1 is 15.23 minutes. (Use your calculator or software to verify this.) If we omit the high outlier, the standard deviation drops to 11.56 minutes.
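You can verify the drop from 15.23 to 11.56 minutes with Python's `statistics.stdev`, which divides by n - 1 as this chapter does. A minimal sketch:

```python
from statistics import stdev

nc_times = [30, 20, 10, 40, 25, 20, 10, 60, 15, 40, 5, 30, 12, 10, 10]
without_outlier = [t for t in nc_times if t != 60]  # drop the 60-minute trip

print(round(stdev(nc_times), 2))         # 15.23
print(round(stdev(without_outlier), 2))  # 11.56
```

Removing one of 15 observations cuts s by about a quarter, which illustrates how much more sensitive s is than the median or IQR.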

If you feel that the importance of the standard deviation is not yet clear, you

are right. We will see in Chapter 3 that the standard deviation is the natural measure of spread for a very important class of symmetric distributions, the Normal

distributions. The usefulness of many statistical procedures is tied to distributions

of particular shapes. This is certainly true of the standard deviation.

CHOOSING MEASURES OF CENTER AND SPREAD

We now have a choice between two descriptions of the center and spread of a distribution: the five-number summary, or $\bar{x}$ and s. Because $\bar{x}$ and s are sensitive to extreme observations, they can be misleading when a distribution

is strongly skewed or has outliers. In fact, because the two sides of a skewed

distribution have different spreads, no single number describes the spread well.

The five-number summary, with its two quartiles and two extremes, does a

better job.


CHOOSING A SUMMARY

The five-number summary is usually better than the mean and standard deviation for describing a skewed distribution or a distribution with strong outliers. Use $\bar{x}$ and s only for reasonably symmetric distributions that are free of outliers.

Outliers can greatly affect the values of the mean $\bar{x}$ and the standard deviation s, the most common measures of center and spread. Many more elaborate statistical procedures also can’t be trusted when outliers are present. Whenever

you find outliers in your data, try to find an explanation for them. Sometimes

the explanation is as simple as a typing error, such as typing 10.1 as 101.

Sometimes a measuring device broke down or a subject gave a frivolous response,

like the student in a class survey who claimed to study 30,000 minutes per night.

(Yes, that really happened.) In all these cases, you can simply remove the outlier

from your data. When outliers are “real data,’’ like the long travel times of some

New York workers, you should choose statistical methods that are not greatly

disturbed by the outliers. For example, use the five-number summary rather than $\bar{x}$ and s to describe a distribution with extreme outliers. We will meet other

examples later in the book.

Remember that a graph gives the best overall picture of a distribution. If

data have been entered into a calculator or statistical program, it is very

simple and quick to create several graphs to see all the different features

of a distribution. Numerical measures of center and spread report specific facts

about a distribution, but they do not describe its entire shape. Numerical summaries do not disclose the presence of multiple peaks or clusters, for example.

Exercise 2.11 shows how misleading numerical summaries can be. Always plot

your data.

APPLY YOUR KNOWLEDGE

2.10 $\bar{x}$ and s by hand. Radon is a naturally occurring gas and is the second leading cause of lung cancer in the United States.⁸ It comes from the natural breakdown of uranium in the soil and enters buildings through cracks and other holes in the foundations. Radon is found throughout the United States, and levels vary considerably from state to state. There are several methods to reduce the levels of radon in your home, and the Environmental Protection Agency recommends using one of these if the measured level in your home is above 4 picocuries per liter. Four readings from Franklin County, Ohio, where the county average is 9.32 picocuries per liter, were 5.2, 13.8, 8.6, and 16.8.

(a) Find the mean step-by-step. That is, find the sum of the 4 observations and

divide by 4.

(b) Find the standard deviation step-by-step. That is, find the deviation of each

observation from the mean, square the deviations, then obtain the variance

and the standard deviation. Example 2.7 shows the method.

user-F452

•

(c) Now enter the data into your calculator and use the mean and standard

deviation buttons to obtain x and s. Do the results agree with your hand

calculations?
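The step-by-step method of Exercise 2.10 can be sketched in Python as below. This is only an illustration: the variable names are our own, and in part (c) Python's standard `statistics` module stands in for a calculator's mean and standard deviation buttons.

```python
import math
import statistics

# Radon readings (picocuries per liter) from Exercise 2.10
readings = [5.2, 13.8, 8.6, 16.8]
n = len(readings)

# (a) Mean: add the observations and divide by n
mean = sum(readings) / n

# (b) Deviations from the mean, their squares, the variance, and s
deviations = [x - mean for x in readings]
squared = [d ** 2 for d in deviations]
variance = sum(squared) / (n - 1)   # divide by n - 1, not by n
s = math.sqrt(variance)

for x, d, d2 in zip(readings, deviations, squared):
    print(f"x = {x:5.1f}   x - mean = {d:6.2f}   squared = {d2:7.4f}")
print(f"mean = {mean:.2f}, variance = {variance:.4f}, s = {s:.2f}")

# (c) Library calculation, playing the role of the calculator buttons
print(statistics.mean(readings), statistics.stdev(readings))
```

If the hand work and the library disagree, the usual culprit is dividing the sum of squared deviations by n instead of n − 1.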

2.11 x̄ and s are not enough. The mean x̄ and standard deviation s measure center and spread but are not a complete description of a distribution. Data sets with different shapes can have the same mean and standard deviation. To demonstrate this fact, use your calculator to find x̄ and s for these two small data sets. Then make a stemplot of each and comment on the shape of each distribution.

Data A: 9.14  8.14  8.74  8.77  9.26  8.10  6.13  3.10  9.13  7.26  4.74
Data B: 6.58  5.76  7.71  8.84  8.47  7.04  5.25  5.56  7.91  6.89  12.50

Data file: 2DATASETS
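A sketch for Exercise 2.11, assuming Python and its standard `statistics` and `decimal` modules. The helper `stemplot` is our own illustration, not a library function: it rounds each value to tenths (half up), uses whole numbers as stems and tenths as leaves, and keeps stems with no leaves so that gaps stay visible.

```python
import statistics
from decimal import Decimal, ROUND_HALF_UP

data_a = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
data_b = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 5.56, 7.91, 6.89, 12.50]

def stemplot(data):
    """Hypothetical helper: stems are whole units, leaves are tenths."""
    # Round to tenths with round-half-up, working in decimal to avoid float surprises
    tenths = [int((Decimal(str(x)) * 10).quantize(Decimal("1"), rounding=ROUND_HALF_UP))
              for x in data]
    lines = []
    for stem in range(min(tenths) // 10, max(tenths) // 10 + 1):
        leaves = sorted(t % 10 for t in tenths if t // 10 == stem)
        lines.append((f"{stem:>3} | " + " ".join(map(str, leaves))).rstrip())
    return "\n".join(lines)

for name, data in (("Data A", data_a), ("Data B", data_b)):
    print(f"{name}: mean = {statistics.mean(data):.2f}, s = {statistics.stdev(data):.2f}")
    print(stemplot(data))
```

The two summary lines agree to two decimal places, while the printed stemplots show the shapes the exercise asks you to compare.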

2.12 Choose a summary. The shape of a distribution is a rough guide to whether the mean and standard deviation are a helpful summary of center and spread. For which of the following distributions would x̄ and s be useful? In each case, give a reason for your decision.
(a) Percents of high school graduates in the states taking the SAT, Figure 1.8 (page 18)
(b) Iowa Test scores, Figure 1.7 (page 17)
(c) New York travel times, Figure 2.1 (page 46)

USING TECHNOLOGY

Although a calculator with “two-variable statistics” functions will do the basic calculations we need, more elaborate tools are helpful. Graphing calculators and computer software will do calculations and make graphs as you command, freeing you to concentrate on choosing the right methods and interpreting your results.

Figure 2.3 displays output describing the travel times to work of 20 people in New York State (Example 2.3). Can you find x̄, s, and the five-number summary in each output? The big message of this section is: once you know what to look for, you can read output from any technological tool.

The displays in Figure 2.3 come from a Texas Instruments graphing calculator, the Minitab and CrunchIt! statistical programs, and the Microsoft Excel spreadsheet program. Minitab allows you to choose what descriptive measures you want, while the descriptive measures in the CrunchIt! output are provided by default. Excel and the calculator give some things we don’t need. Just ignore the extras. Excel’s “Descriptive Statistics” menu item doesn’t give the quartiles. We used the spreadsheet’s separate quartile function to get Q1 and Q3.

[Texas Instruments Graphing Calculator screen]

Minitab
Descriptive Statistics: NYtime
Variable | Total Count | Mean | StDev | Variance | Minimum | Q1 | Median | Q3 | Maximum
NYtime | 20 | 31.25 | 21.88 | 478.62 | 5.00 | 15.00 | 22.50 | 43.75 | 85.00

CrunchIt!
NYtime: n = 20, Sample Mean = 31.25, Median = 22.50, Standard Deviation = 21.88, Min = 5, Q1 = 15, Q3 = 43.75, Max = 85

Microsoft Excel (column A: minutes)
Mean | 31.25
Standard Error | 4.891924064
Median | 22.5
Mode | 15
Standard Deviation | 21.8773495
Sample Variance | 478.6184211
Kurtosis | 0.329884126
Skewness | 1.040110836
Range | 80
Minimum | 5
Maximum | 85
Sum | 625
Count | 20
QUARTILE(A2:A21,1) | 15
QUARTILE(A2:A21,3) | 42.5

FIGURE 2.3 Output from a graphing calculator, two statistical software packages, and a spreadsheet program describing the data on travel times to work in New York State.
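Python’s standard `statistics` module can produce most of the measures shown in Figure 2.3. The sketch below is illustrative only: `describe` is a hypothetical helper, and the list `ny_times` is our reproduction of the 20 New York travel times from Example 2.3 (that example is not in this section; the values are consistent with the count, sum, mean, standard deviation, and quartiles in Figure 2.3).

```python
import statistics

# Assumed reproduction of the 20 New York travel times (minutes) from Example 2.3
ny_times = [10, 30, 5, 25, 40, 20, 10, 15, 30, 20,
            15, 20, 85, 15, 65, 15, 60, 60, 40, 45]

def describe(data):
    """Hypothetical helper: descriptive measures like those in Figure 2.3."""
    n = len(data)
    s = statistics.stdev(data)          # sample standard deviation (n - 1 divisor)
    return {
        "Count": n,
        "Sum": sum(data),
        "Mean": statistics.mean(data),
        "Median": statistics.median(data),
        "Mode": statistics.mode(data),
        "Standard Deviation": s,
        "Sample Variance": statistics.variance(data),
        "Standard Error": s / n ** 0.5,  # s divided by the square root of n
        "Minimum": min(data),
        "Maximum": max(data),
        "Range": max(data) - min(data),
    }

for label, value in describe(ny_times).items():
    print(f"{label:<20}{value}")
```

Quartiles are deliberately left out of `describe`, because different programs use different quartile rules for the same data.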

EXAMPLE 2.8 What is the third quartile?

In Example 2.5, we saw that the quartiles of the New York travel times are Q1 = 15 and Q3 = 42.5. Look at the output displays in Figure 2.3. The calculator and Excel agree with our work. Minitab and CrunchIt! say that Q3 = 43.75. What happened? There are several rules for finding the quartiles. Some calculators and software use rules that give results different from ours for some sets of data. This is true of Minitab, CrunchIt!, and also Excel, though Excel agrees with our work in this example. Results from the various rules are always close to each other, so the differences are never important in practice. Our rule is the simplest for hand calculation. ■
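The two quartile rules in Example 2.8 can be compared directly. In the sketch below, `quartiles_book_rule` is our own implementation of this chapter’s rule (Q1 and Q3 are the medians of the lower and upper halves of the ordered data, with the center observation left out of both halves when n is odd). For comparison it calls Python’s `statistics.quantiles` with `method="exclusive"`, an interpolating rule that treats the i-th of n ordered values as having a fraction i/(n + 1) of the distribution below it; for these data it reproduces the Q3 = 43.75 reported by Minitab and CrunchIt!. Whether those programs use exactly this rule for every data set is an assumption we have not checked. The travel times are again our reproduction of Example 2.3.

```python
import statistics

# Assumed reproduction of the 20 New York travel times (minutes) from Example 2.3
ny_times = [10, 30, 5, 25, 40, 20, 10, 15, 30, 20,
            15, 20, 85, 15, 65, 15, 60, 60, 40, 45]

def quartiles_book_rule(data):
    """Q1 and Q3 as medians of the lower and upper halves of the ordered data.

    When n is odd, the middle observation belongs to neither half.
    """
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    lower = xs[: n // 2]
    upper = xs[(n + 1) // 2 :]
    return statistics.median(lower), statistics.median(upper)

book_q1, book_q3 = quartiles_book_rule(ny_times)

# Interpolating rule: the i-th ordered value sits at fraction i / (n + 1)
alt_q1, _, alt_q3 = statistics.quantiles(ny_times, n=4, method="exclusive")

print("chapter rule:   Q1 =", book_q1, "  Q3 =", book_q3)
print("exclusive rule: Q1 =", alt_q1, "  Q3 =", alt_q3)
```

Both rules agree on Q1 here and differ only slightly on Q3, which is the point of the example.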

ORGANIZING A STATISTICAL PROBLEM

Most of our examples and exercises have aimed to help you learn basic tools (graphs and calculations) for describing and comparing distributions. You have also learned principles that guide use of these tools, such as “start with a graph” and “look for the overall pattern and striking deviations from the pattern.” The data you work with are not just numbers; they describe specific settings such as water depth in the Everglades or travel time to work. Because data come from a specific setting, the final step in examining data is a conclusion for that setting. Water depth in the Everglades has a yearly cycle that reflects Florida’s wet and dry seasons. Travel times to work are generally longer in New York than in North Carolina.

As you learn more statistical tools and principles, you will face more complex statistical problems. Although no framework accommodates all the varied issues that arise in applying statistics to real settings, the following four-step thought process gives useful guidance. In particular, the first and last steps emphasize that statistical problems are tied to specific real-world settings and therefore involve more than doing calculations and making graphs.

ORGANIZING A STATISTICAL PROBLEM: A Four-Step Process

STATE: What is the practical question, in the context of the real-world setting?
PLAN: What specific statistical operations does this problem call for?
SOLVE: Make the graphs and carry out the calculations needed for this problem.
CONCLUDE: Give your practical conclusion in the setting of the real-world problem.

To help you master the basics, many exercises will continue to tell you what to do: make a histogram, find the five-number summary, and so on. Real statistical problems don’t come with detailed instructions. From now on, especially in the later chapters of the book, you will meet some exercises that are more realistic. Use the four-step process as a guide to solving and reporting these problems. They are marked with the four-step icon, as the following example illustrates.


EXAMPLE 2.9 Comparing tropical flowers

STATE: Ethan Temeles of Amherst College, with his colleague W. John Kress, studied the relationship between varieties of the tropical flower Heliconia on the island of Dominica and the different species of hummingbirds that fertilize the flowers.⁹ Over time, the researchers believe, the lengths of the flowers and the forms of the hummingbirds’ beaks have evolved to match each other. If that is true, flower varieties fertilized by different hummingbird species should have distinct distributions of length.

Table 2.1 gives length measurements (in millimeters) for samples of three varieties of Heliconia, each fertilized by a different species of hummingbird. Do the three varieties display distinct distributions of length? How do the mean lengths compare?

Data file: TROPICALFLOWER

PLAN: Use graphs and numerical descriptions to describe and compare these three distributions of flower length.

SOLVE: We might use boxplots to compare the distributions, but stemplots preserve more detail and work well for data sets of these sizes. Figure 2.4 displays stemplots with the stems lined up for easy comparison. The lengths have been rounded to the nearest tenth of a millimeter. The bihai and red varieties have somewhat skewed distributions, so we might choose to compare the five-number summaries. But because the researchers plan to use x̄ and s for further analysis, we instead calculate these measures:

Variety | Mean length (mm) | Standard deviation (mm)
bihai | 47.60 | 1.213
red | 39.71 | 1.799
yellow | 36.18 | 0.975

CONCLUDE: The three varieties differ so much in flower length that there is little overlap among them. In particular, the flowers of bihai are longer than either red or yellow. The mean lengths are 47.6 mm for H. bihai, 39.7 mm for H. caribaea red, and 36.2 mm for H. caribaea yellow. ■

TABLE 2.1 Flower lengths (millimeters) for three Heliconia varieties

H. bihai (16 flowers): 47.12  48.07  46.75  48.34  46.81  48.15  47.12  50.26  46.67  50.12  47.43  46.34  46.44  46.94  46.64  48.36
H. caribaea red (23 flowers): 41.90  39.63  38.10  42.01  42.18  37.97  41.93  40.66  38.79  43.09  37.87  38.23  41.47  39.16  38.87  41.69  37.40  37.78  39.78  38.20  38.01  40.57  38.07
H. caribaea yellow (15 flowers): 36.78  35.17  37.02  36.82  36.52  36.66  36.11  35.68  36.03  36.03  35.45  34.57  38.13  34.63  37.10

Stem | bihai | red | yellow
34 | | | 6 6
35 | | | 2 5 7
36 | | | 0 0 1 5 7 8 8
37 | | 4 8 9 | 0 1
38 | | 0 0 1 1 2 2 8 9 | 1
39 | | 2 6 8 |
40 | | 6 7 |
41 | | 5 7 9 9 |
42 | | 0 2 |
43 | | 1 |
44 | | |
45 | | |
46 | 3 4 6 7 8 8 9 | |
47 | 1 1 4 | |
48 | 1 2 3 4 | |
49 | | |
50 | 1 3 | |

FIGURE 2.4 Stemplots comparing the distributions of flower lengths from Table 2.1, for Example 2.9. The stems are whole millimeters and the leaves are tenths of a millimeter.
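The means and standard deviations reported in Example 2.9 can be recomputed from Table 2.1 with Python’s standard `statistics` module. This is a sketch for checking the arithmetic; the dictionary layout and the helper `mean_and_sd` are our own choices.

```python
import statistics

# Flower lengths (mm) from Table 2.1
heliconia = {
    "bihai": [47.12, 48.07, 46.75, 48.34, 46.81, 48.15, 47.12, 50.26,
              46.67, 50.12, 47.43, 46.34, 46.44, 46.94, 46.64, 48.36],
    "red": [41.90, 39.63, 38.10, 42.01, 42.18, 37.97, 41.93, 40.66,
            38.79, 43.09, 37.87, 38.23, 41.47, 39.16, 38.87, 41.69,
            37.40, 37.78, 39.78, 38.20, 38.01, 40.57, 38.07],
    "yellow": [36.78, 35.17, 37.02, 36.82, 36.52, 36.66, 36.11, 35.68,
               36.03, 36.03, 35.45, 34.57, 38.13, 34.63, 37.10],
}

def mean_and_sd(lengths):
    """Sample mean and sample standard deviation (n - 1 divisor)."""
    return statistics.mean(lengths), statistics.stdev(lengths)

for variety, lengths in heliconia.items():
    xbar, s = mean_and_sd(lengths)
    print(f"{variety:<7} n = {len(lengths):2d}   mean = {xbar:.2f}   s = {s:.3f}")
```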

APPLY YOUR KNOWLEDGE

2.13 Logging in the rain forest. “Conservationists have despaired over destruction of tropical rain forest by logging, clearing, and burning.” These words begin a report on a statistical study of the effects of logging in Borneo.¹⁰ Charles Cannon of Duke University and his coworkers compared forest plots that had never been logged (Group 1) with similar plots nearby that had been logged 1 year earlier (Group 2) and 8 years earlier (Group 3). All plots were 0.1 hectare in area. Here are the counts of trees for plots in each group:

Group 1: 27  22  29  21  19  33  16  20  24  27  28  19
Group 2: 12  12  15  9  20  18  17  14  14  2  17  19
Group 3: 18  4  22  15  18  19  22  12  12

To what extent has logging affected the count of trees? Follow the four-step process in reporting your work.

Data file: LOGGING
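For the Solve step of Exercise 2.13, a minimal Python sketch is below. `five_number_summary` is a hypothetical helper that applies this chapter’s quartile rule (medians of the lower and upper halves); other software may report slightly different quartiles. The State, Plan, and Conclude steps are still yours to write.

```python
import statistics

# Tree counts per 0.1-hectare plot, from Exercise 2.13
groups = {
    "Group 1 (never logged)": [27, 22, 29, 21, 19, 33, 16, 20, 24, 27, 28, 19],
    "Group 2 (logged 1 year earlier)": [12, 12, 15, 9, 20, 18, 17, 14, 14, 2, 17, 19],
    "Group 3 (logged 8 years earlier)": [18, 4, 22, 15, 18, 19, 22, 12, 12],
}

def five_number_summary(data):
    """Minimum, Q1, median, Q3, maximum using the halves rule of this chapter."""
    xs = sorted(data)
    n = len(xs)
    lower = xs[: n // 2]          # observations below the median position
    upper = xs[(n + 1) // 2 :]    # observations above the median position
    return (xs[0], statistics.median(lower), statistics.median(xs),
            statistics.median(upper), xs[-1])

for name, counts in groups.items():
    summary = five_number_summary(counts)
    print(f"{name:<34} mean = {statistics.mean(counts):5.2f}   five-number summary = {summary}")
```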

2.14 Diplomatic scofflaws. Until Congress allowed some enforcement in 2002, the thousands of foreign diplomats in New York City could freely violate parking laws. Two economists looked at the number of unpaid parking tickets per diplomat over a five-year period ending when enforcement reduced the problem.¹¹ They concluded that large numbers of unpaid tickets indicated a “culture of corruption” in a country and lined up well with more elaborate measures of corruption. The data set for 145 countries is too large to print here, but look at the data file on the text Web site and CD. The first 32 countries in the list (Australia to Trinidad and Tobago) are classified by the World Bank as “developed.” The remaining countries (Albania to Zimbabwe) are “developing.” The World Bank classification is based only on national income and does not take into account measures of social development.

Give a full description of the distribution of unpaid tickets for both groups of countries and identify any high outliers. Compare the two groups. Does national income alone do a good job of distinguishing countries whose diplomats do and do not obey parking laws?

Data file: SCOFFLAWS

CHAPTER 2 SUMMARY

CHAPTER SPECIFICS

■ A numerical summary of a distribution should report at least its center and its spread or variability.
■ The mean x̄ and the median M describe the center of a distribution in different ways. The mean is the arithmetic average of the observations, and the median is the midpoint of the values.
■ When you use the median to indicate the center of the distribution, describe its spread by giving the quartiles. The first quartile Q1 has one-fourth of the observations below it, and the third quartile Q3 has three-fourths of the observations below it.
■ The five-number summary consisting of the median, the quartiles, and the smallest and largest individual observations provides a quick overall description of a distribution. The median describes the center, and the quartiles and extremes show the spread.
■ Boxplots based on the five-number summary are useful for comparing several distributions. The box spans the quartiles and shows the spread of the central half of the distribution. The median is marked within the box. Lines extend from the box to the extremes and show the full spread of the data.
■ The variance s² and especially its square root, the standard deviation s, are common measures of spread about the mean as center. The standard deviation s is zero when there is no spread and gets larger as the spread increases.
■ A resistant measure of any aspect of a distribution is relatively unaffected by changes in the numerical value of a small proportion of the total number of observations, no matter how large these changes are. The median and quartiles are resistant, but the mean and the standard deviation are not.
■ The mean and standard deviation are good descriptions for symmetric distributions without outliers. They are most useful for the Normal distributions introduced in the next chapter. The five-number summary is a better description for skewed distributions.
■ Numerical summaries do not fully describe the shape of a distribution. Always plot your data.
■ A statistical problem has a real-world setting. You can organize many problems using the following four steps: state, plan, solve, and conclude.
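The resistance bullet in this summary can be checked numerically. The sketch below uses a small made-up data set (not from the text) and changes only its largest value from 5 to 500, as a typing error might; Python’s standard `statistics` module does the arithmetic.

```python
import statistics

original = [1, 2, 3, 4, 5]     # hypothetical data
changed = [1, 2, 3, 4, 500]    # same data with one value made extreme

for label, data in (("original", original), ("changed", changed)):
    print(f"{label:<9} median = {statistics.median(data)}   "
          f"mean = {statistics.mean(data)}   s = {statistics.stdev(data):.2f}")
```

Changing one value out of five leaves the median at 3 but moves the mean from 3 to 102 and multiplies s by more than 100.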

LINK IT

In this chapter we have continued our study of exploratory data analysis. Graphs are an important visual tool for organizing and identifying patterns in data. They give a fairly complete description of a distribution, although for many problems the important information in your data can be described by a few numbers. These numerical summaries can be useful for describing a single distribution as well as for comparing the distributions from several groups of observations.

Two important features of a distribution are the center and the spread. For distributions that are approximately symmetric without outliers, the mean and standard deviation are important numeric summaries for describing and comparing distributions. But if the distribution is not symmetric and/or has outliers, the five-number summary often provides a better description.

The boxplot gives a picture of the five-number summary that is useful for a simple comparison of several distributions. Remember that the boxplot is based only on the five-number summary and does not have any information beyond these five numbers. Certain features of a distribution that are revealed in histograms and stemplots will not be evident from a boxplot alone. These include gaps in the data and the presence of several peaks. You must be careful when reducing a distribution to a few numbers to make sure that important information has not been lost in the process.

CHECK YOUR SKILLS

2.15 The respiratory system can be a limiting factor in maximal exercise performance. Researchers from the United Kingdom studied the effect of two breathing frequencies on both performance times and several physiological parameters in swimming.¹² Subjects were 10 male collegiate swimmers. Here are their times in seconds to swim 200 meters at 90% of race pace when breathing every second stroke in front-crawl swimming:

151.6  173.2  165.1  177.6  159.2  174.3  163.5  164.1  174.8  171.4

Data file: SWIMTIMES

The mean of these data is
(a) 165.10.
(b) 167.48.
(c) 168.25.

2.16 The median of the data in Exercise 2.15 is
(a) 167.48.
(b) 168.25.
(c) 174.00.

2.17 The five-number summary of the data in Exercise 2.15 is
(a) 151.6, 159.2, 167.48, 174.8, 177.6.
(b) 151.6, 163.5, 168.25, 174.3, 177.6.
(c) 151.6, 159.2, 168.25, 174.8, 177.6.

2.18 If a distribution is skewed to the right,
(a) the mean is less than the median.
(b) the mean and median are equal.
(c) the mean is greater than the median.

2.19 What percent of the observations in a distribution lie between the first quartile and the third quartile?
(a) 25%
(b) 50%
(c) 75%

2.20 To make a boxplot of a distribution, you must know
(a) all of the individual observations.
(b) the mean and the standard deviation.
(c) the five-number summary.

2.21 The standard deviation of the 10 swim times in Exercise 2.15 (use your calculator) is about
(a) 7.4.
(b) 7.8.
(c) 8.2.

2.22 What are all the values that a standard deviation s can possibly take?
(a) 0 ≤ s
(b) 0 ≤ s ≤ 1
(c) −1 ≤ s ≤ 1

2.23 The correct units for the standard deviation in Exercise 2.21 are
(a) no units; it’s just a number.
(b) seconds.
(c) seconds squared.

2.24 Which of the following is least affected if an extreme high outlier is added to your data?
(a) The median
(b) The mean
(c) The standard deviation
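To check hand or calculator work on Exercises 2.15 to 2.17 and 2.21, here is a short Python sketch. It relies on the standard `statistics` module and finds the quartiles with this chapter’s halves rule; the variable names are our own.

```python
import statistics

# Swim times (seconds) from Exercise 2.15
swim_times = [151.6, 173.2, 165.1, 177.6, 159.2, 174.3, 163.5, 164.1, 174.8, 171.4]

xs = sorted(swim_times)
n = len(xs)
mean = statistics.mean(xs)
median = statistics.median(xs)
q1 = statistics.median(xs[: n // 2])          # median of the lower half
q3 = statistics.median(xs[(n + 1) // 2 :])    # median of the upper half
s = statistics.stdev(xs)                      # sample standard deviation, in seconds

print(f"mean = {mean:.2f}, median = {median:.2f}")
print("five-number summary:", xs[0], q1, median, q3, xs[-1])
print(f"s = {s:.2f} seconds")
```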


CHAPTER 2 EXERCISES

2.25 Incomes of college grads. According to the Census Bureau’s 2010 Current Population Survey, the mean and median 2009 income of people at least 25 years old who had a bachelor’s degree but no higher degree were $46,931 and $58,762. Which of these numbers is the mean and which is the median? Explain your reasoning.

2.26 Saving for retirement. Retirement seems a long way off and we need money now, so saving for retirement is hard. Once every three years, the Board of Governors of the Federal Reserve System collects data on household assets and liabilities through the Survey of Consumer Finances (SCF). The most recent such survey was conducted in 2007, and the survey results were released to the public in April 2009. The survey presents data on household ownership of, and balances in, retirement savings accounts. Only 53.6% of households own retirement accounts. The mean value per household is $148,579, but the median value is just $45,000. For households in which the head of household is under 35, 42.6% own retirement accounts, the mean is $25,279, and the median is $9600.¹³ What explains the differences between the two measures of center, both for all households and for the under-35 age group?

2.27 University endowments. The National Association of College and University Business Officers collects data on college endowments. In 2009, 842 colleges and universities reported the value of their endowments. When the endowment values are arranged in order, what are the locations of the median and the quartiles in this ordered list?

2.28 Pulling wood apart. Example 1.9 (page 21) gives the breaking strengths of 20 pieces of Douglas fir.
(a) Give the five-number summary of the distribution of breaking strengths. (The stemplot, Figure 1.11, helps because it arranges the data in order, but you should use the unrounded values in numerical work.)
(b) The stemplot shows that the distribution is skewed to the left. Does the five-number summary show the skew? Remember that only a graph gives a clear picture of the shape of a distribution.

Data file: WOOD

2.29 Comparing tropical flowers. An alternative presentation of the flower length data in Table 2.1 reports the five-number summary and uses boxplots to display the distributions. Do this. Do the boxplots fail to reveal any important information visible in the stemplots in Figure 2.4?

Data file: TROPICALFLOWER

2.30 How much fruit do adolescent girls eat? Figure 1.14 (page 30) is a histogram of the number of servings of fruit per day claimed by 74 seventeen-year-old girls.
(a) With a little care, you can find the median and the quartiles from the histogram. What are these numbers? How did you find them?
(b) With a little care, you can also find the mean number of servings of fruit claimed per day. First use the information in the histogram to compute the sum of the 74 observations, and then use this to compute the mean. What is the relationship between the mean and median? Is this what you expected?

2.31 Guinea pig survival times. Here are the survival times in days of 72 guinea pigs after they were injected with infectious bacteria in a medical experiment.¹⁴ Survival times, whether of machines under stress or cancer patients after treatment, usually have distributions that are skewed to the right.

43   45   53   56   56   57   58   66   67   73   74   79
80   80   81   81   81   82   83   83   84   88   89   91
91   92   92   97   99   99  100  100  101  102  102  102
103  104  107  108  109  113  114  118  121  123  126  128
137  138  139  144  145  147  156  162  174  178  179  184
191  198  211  214  243  249  329  380  403  511  522  598

(a) Graph the distribution and describe its main features. Does it show the expected right-skew?
(b) Which numerical summary would you choose for these data? Calculate your chosen summary. How does it reflect the skewness of the distribution?

Data file: GUINEAPIGS

2.32 Weight of newborns. Page 61 gives the distribution of the weight at birth for all babies born in the United States in 2008:¹⁵

Weight (grams) | Count
Less than 500 | 6,581
500 to 999 | 23,292
1,000 to 1,499 | 31,900
1,500 to 1,999 | 67,140
2,000 to 2,499 | 218,296
2,500 to 2,999 | 788,148
3,000 to 3,499 | 1,663,512
3,500 to 3,999 | 1,120,642
4,000 to 4,499 | 280,270
4,500 to 4,999 | 39,109
5,000 to 5,499 | 4,443

(a) For comparison with other years and with other countries, we prefer a histogram of the percents in each weight class rather than the counts. Explain why.
(b) How many babies were there?
(c) Make a histogram of the distribution, using percents on the vertical scale.
(d) What are the locations of the median and quartiles in the ordered list of all birth weights? In which weight classes do the median and quartiles fall?

2.33 More on study times. In Exercise 1.38 (page 34) you examined the nightly study time claimed by first-year college men and women. The most common methods for formal comparison of two groups use x̄ and s to summarize the data.
(a) What kinds of distributions are best summarized by x̄ and s? Do you think these summary measures are appropriate in this case?
(b) One student in each group claimed to study at least 300 minutes (five hours) per night. How much does removing these observations change x̄ and s for each group? You will need to compute x̄ and s for each group, both with and without the high outlier.

Data file: STUDYTIMES

2.34 Making resistance visible. In the Mean and Median applet, place three observations on the line by clicking below it: two close together near the center of the line and one somewhat to the right of these two.
(a) Pull the single rightmost observation out to the right. (Place the cursor on the point, hold down a mouse button, and drag the point.) How does the mean behave? How does the median behave? Explain briefly why each measure acts as it does.
(b) Now drag the single rightmost point to the left as far as you can. What happens to the mean? What happens to the median as you drag this point past the other two (watch carefully)?

2.35 Behavior of the median. Place five observations on the line in the Mean and Median applet by clicking below it.
(a) Add one additional observation without changing the median. Where is your new point?
(b) Use the applet to convince yourself that when you add yet another observation (there are now seven in all), the median does not change no matter where you put the seventh point. Explain why this must be true.

2.36 Never on Sunday: also in Canada? Exercise 1.5 (page 11) gives the number of births in the United States on each day of the week during an entire year. The boxplots in Figure 2.5 (page 62) are based on more detailed data from Toronto, Canada: the number of births on each of the 365 days in a year, grouped by day of the week.¹⁶ Based on these plots, compare the day-of-the-week distributions using shape, center, and spread. Summarize your findings.

2.37 Thinking about means. Table 1.1 (page 12) gives the percent of foreign-born residents in each of the states. For the nation as a whole, 12.5% of residents are foreign-born. Find the mean of the 51 entries in Table 1.1. It is not 12.5%. Explain carefully why this happens. (Hint: The states with the largest populations are California, Texas, New York, and Florida. Look at their entries in Table 1.1.)

2.38 Thinking about medians. A report says that “the median credit card debt of American households is zero.” We know that many households have large amounts of credit card debt. In fact, the mean household credit card debt is close to $8000. Explain how the median debt can nonetheless be zero.

2.39 A standard deviation contest. This is a standard deviation contest. You must choose four numbers from the whole numbers 0 to 10, with repeats allowed.
(a) Choose four numbers that have the smallest possible standard deviation.
(b) Choose four numbers that have the largest possible standard deviation.
(c) Is more than one choice possible in either (a) or (b)? Explain.
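Exercise 2.39 is small enough to settle by brute force. The sketch below (Python standard library only) lists every choice of four whole numbers from 0 to 10, with repeats allowed and order ignored, and finds which choices give the smallest and largest sample standard deviation. The dictionary `sd` and the other names are our own.

```python
import itertools
import statistics

# Every multiset of four whole numbers from 0 to 10 (repeats allowed, order ignored)
choices = list(itertools.combinations_with_replacement(range(11), 4))
sd = {c: statistics.stdev(c) for c in choices}

smallest = min(sd.values())
largest = max(sd.values())
best_small = [c for c, s in sd.items() if s == smallest]
best_large = [c for c, s in sd.items() if s == largest]

print(len(choices), "possible choices")
print("smallest s =", smallest, "achieved by", len(best_small), "choices, e.g.", best_small[0])
print(f"largest s = {largest:.4f}, achieved by", best_large)
```

The number of choices that reach each extreme bears directly on part (c).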

2.40 Test your technology. This exercise requires a calculator with a standard deviation button or statistical software on a computer. The observations

10,001   10,002   10,003

have mean x̄ = 10,002 and standard deviation s = 1. Adding a 0 in the center of each number, the next set becomes

100,001   100,002   100,003

The standard deviation remains s = 1 as more 0s are added. Use your calculator or software to find the standard deviation of these numbers, adding extra 0s until you get an incorrect answer. How soon did you go wrong? This demonstrates that calculators and software cannot handle an arbitrary number of digits correctly.

FIGURE 2.5 Boxplots of the distributions of numbers of births in Toronto, Canada, on each day of the week during a year, for Exercise 2.36. (Vertical axis: Number of births, about 60 to 120; horizontal axis: Day of week, Monday through Sunday.)
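One way software can go wrong in Exercise 2.40 is catastrophic cancellation in floating-point arithmetic. The sketch below illustrates that mechanism under our own assumptions; it does not describe how any particular calculator works. The hypothetical `naive_sd` uses the algebraically equivalent shortcut s² = (Σx² − n·x̄²)/(n − 1), which subtracts two huge, nearly equal numbers, while `two_pass_sd` first finds x̄ and then sums squared deviations, as in Example 2.7.

```python
import math

def naive_sd(xs):
    """Shortcut formula s^2 = (sum of x^2 - n * mean^2) / (n - 1).
    Loses precision when the values are large relative to their spread."""
    n = len(xs)
    mean = sum(xs) / n
    ss = sum(x * x for x in xs) - n * mean * mean
    return math.sqrt(max(ss, 0.0) / (n - 1))   # rounding can even make ss negative

def two_pass_sd(xs):
    """Find the mean first, then sum squared deviations from it."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))

# 10,001 10,002 10,003, then 100,001 100,002 100,003, and so on
for k in range(4, 13):
    xs = [float(10 ** k + i) for i in (1, 2, 3)]
    print(f"10^{k:<2}  naive s = {naive_sd(xs):<12.6g}  two-pass s = {two_pass_sd(xs)}")
```

In this sketch the shortcut is exact for the smaller sets and fails for the larger ones; the exact point of failure depends on the arithmetic used, so we do not state it here.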

2.41 You create the data. Create a set of 5 positive numbers (repeats allowed) that have median 7 and mean 10. What thought process did you use to create your numbers?

2.42 You create the data. Give an example of a small set of data for which the mean is smaller than the first quartile.

2.43 Adolescent obesity. Adolescent obesity is a serious health risk affecting more than 5 million young people in the United States alone. Laparoscopic adjustable gastric banding has the potential to provide a safe and effective treatment. Fifty adolescents between 14 and 18 years old with a body mass index (BMI) higher than 35 were recruited from the Melbourne, Australia, community for the study.¹⁷ Twenty-five were randomly selected to undergo gastric banding, and the remaining twenty-five were assigned to a supervised lifestyle intervention program involving diet, exercise, and behavior modification. All subjects were followed for two years. Here are the weight losses in kilograms for the subjects who completed the study:

Gastric banding: 35.6  81.4  57.6  32.8  31.0  37.6  36.5  5.4  27.9  49.0  64.8  39.0  43.0  33.9  29.7  20.2  15.2  41.7  53.4  13.4  24.8  19.4  32.3  22.0
Lifestyle intervention: 6.0  17.0  2.0  3.0  1.4  4.0  20.6  11.6  15.5  4.6  15.8  34.6  6.0  3.1  4.3  16.7  1.8  12.8

Data file: GASTRICBANDS


(a) In the context of this study, what do the negative values in the data set mean?
(b) Give a graphical comparison of the weight loss distribution for both groups using side-by-side boxplots. Provide appropriate numerical summaries for the two distributions and identify any high outliers in either group. What can you say about the effects of gastric banding versus lifestyle intervention on weight loss for the subjects in this study?
(c) The measured variable was weight loss in kilograms. Would two subjects with the same weight loss always have similar benefits from a weight reduction program? Does it depend on their initial weights? Other variables considered in this study were the percent of excess weight lost and the reduction in BMI. Do you see any advantages to either of these variables when comparing weight loss for two groups?
(d) One subject from the gastric-banding group dropped out of the study and seven subjects from the lifestyle group dropped out. Of the seven dropouts in the lifestyle group, six had gained weight at the time they dropped out. If all subjects had completed the study, how do you think it would have affected the comparison between the two groups?

Exercises 2.44 to 2.49 ask you to analyze data without having the details outlined for you. The exercise statements give you the State step of the four-step process. In your work, follow the Plan, Solve, and Conclude steps as illustrated in Example 2.9.

2.44 Athletes’ salaries. The Montreal Canadiens were founded in 1909 and are the longest continuously operating professional ice hockey team. They have won 24 Stanley Cups, making them one of the most successful professional sports teams of the traditional four major sports of Canada and the United States. Table 2.2 gives the salaries of the 2010–2011 roster.¹⁸ Provide the team owner with a full description of the distribution of salaries and a brief summary of its most important features.

Data file: HOCKEYSALARIES
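A sketch of the Solve step for Exercise 2.44, using the 24 salaries listed in Table 2.2. The helper `five_number_summary` is hypothetical and follows this chapter’s quartile rule, and the outlier check applies the 1.5 × IQR rule for suspected outliers from earlier in this chapter; both are illustrations rather than the only reasonable choices.

```python
import statistics

# 2010-2011 Montreal Canadiens salaries in dollars (Table 2.2)
salaries = [
    8_000_000, 5_000_000, 3_833_000, 2_500_000, 1_350_000, 875_000, 637_500, 550_000,
    5_750_000, 5_000_000, 3_250_000, 2_250_000, 1_300_000, 875_000, 600_000, 500_000,
    5_500_000, 5_000_000, 3_250_000, 1_500_000, 1_000_000, 875_000, 600_000, 500_000,
]

def five_number_summary(data):
    """Minimum, Q1, median, Q3, maximum using the halves rule of this chapter."""
    xs = sorted(data)
    n = len(xs)
    return (xs[0], statistics.median(xs[: n // 2]), statistics.median(xs),
            statistics.median(xs[(n + 1) // 2 :]), xs[-1])

low, q1, med, q3, high = five_number_summary(salaries)
iqr = q3 - q1
fences = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)   # 1.5 x IQR rule for suspected outliers
outliers = [x for x in salaries if x < fences[0] or x > fences[1]]

print("five-number summary:", (low, q1, med, q3, high))
print(f"mean = {statistics.mean(salaries):,.0f}  (median = {med:,.0f})")
print("suspected outliers:", outliers)
```

A mean well above the median is one numerical sign of right-skew; a graph is still needed to describe the shape fully.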

2.45 Returns on stocks. How well have stocks done over the past generation? The Wilshire 5000 index describes the average performance of all U.S. stocks. The average is weighted by the total market value of each company’s stock, so think of the index as measuring the performance of the average investor. Page 64 gives the percent returns on the Wilshire 5000 index for the years from 1971 to 2010.

Data file: WILSHIRE5000

Salaries for the 2010—2011 Montreal Canadiens

PLAYER

Scott Gomez

Mike Cammalleri

Jaroslav Spacek

Carey Price

Benoit Pouliot

Max Pacioretty

Yannick Weber

David Desharnais

SALARY

$8,000,000

$5,000,000

$3,833,000

$2,500,000

$1,350,000

$875,000

$637,500

$550,000

PLAYER

Andrei Markov

Brian Gionta

Andrei Kostitsyn

Hal Gill

Josh Gorges

Lars Eller

Jeff Halpern

Mathieu Darche

SALARY

$5,750,000

$5,000,000

$3,250,000

$2,250,000

$1,300,000

$875,000

$600,000

$500,000

PLAYER

Roman Hamrlik

Tomas Plekanec

James Wisniewski

Travis Moen

Alex Auld

P. K. Subban

Alexandre Picard

Tom Pyatt

SALARY

$5,500,000

$5,000,000

$3,250,000

$1,500,000

$1,000,000

$875,000

$600,000

$500,000
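The Python sketch below is one possible starting point for Exercise 2.44. It uses only the standard library to compute the five-number summary, the mean, and the 1.5 IQR outlier check for the 24 salaries in Table 2.2. The quartiles follow this chapter's rule, where Q1 and Q3 are the medians of the lower and upper halves of the ordered list. Software that uses a different quartile convention may report slightly different values. The function name five_number_summary is only illustrative.

```python
import statistics

# Table 2.2: 2010–2011 Montreal Canadiens salaries in dollars (all 24 players).
SALARIES = [
    8_000_000, 5_000_000, 3_833_000, 2_500_000, 1_350_000, 875_000, 637_500, 550_000,
    5_750_000, 5_000_000, 3_250_000, 2_250_000, 1_300_000, 875_000, 600_000, 500_000,
    5_500_000, 5_000_000, 3_250_000, 1_500_000, 1_000_000, 875_000, 600_000, 500_000,
]


def five_number_summary(data):
    """(min, Q1, M, Q3, max); Q1 and Q3 are medians of the lower and upper
    halves of the ordered data (the middle value is excluded when n is odd)."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half + n % 2:]
    return (xs[0], statistics.median(lower), statistics.median(xs),
            statistics.median(upper), xs[-1])


summary = five_number_summary(SALARIES)
mean = statistics.mean(SALARIES)
print("Five-number summary:", summary)
print(f"Mean: {mean:,.2f}")

# 1.5 IQR rule applied to the salaries.
q1, q3 = summary[1], summary[3]
iqr = q3 - q1
outliers = [s for s in SALARIES if s < q1 - 1.5 * iqr or s > q3 + 1.5 * iqr]
print("Suspected outliers:", outliers)
```

In this output the mean (about $2.52 million) is well above the median ($1.425 million). That is the pattern the chapter describes for a right-skewed distribution. The 1.5 IQR rule flags no salary as a suspected outlier.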

Wilshire index for the years 1971 to 2010

Year    Return    Year    Return
1971    16.19     1991    33.58
1972    17.34     1992    9.02
1973    18.78     1993    10.67
1974    27.87     1994    0.06
1975    37.38     1995    36.41
1976    26.77     1996    21.56
1977    2.97      1997    31.48
1978    8.54      1998    24.31
1979    24.40     1999    24.23
1980    33.21     2000    10.89
1981    3.98      2001    10.97
1982    20.43     2002    20.86
1983    22.71     2003    31.64
1984    3.27      2004    12.48
1985    31.46     2005    6.38
1986    15.61     2006    15.77
1987    1.75      2007    5.62
1988    17.59     2008    37.23
1989    28.53     2009    28.30
1990    6.03      2010    17.16

What can you say about the distribution of yearly returns on stocks?

2.46 Do good smells bring good business? Businesses know that customers often respond to background music. Do they also respond to odors? Nicolas Guéguen and his colleagues studied this question in a small pizza restaurant in France on Saturday evenings in May. On one evening, a relaxing lavender odor was spread through the restaurant; on another evening, a stimulating lemon odor; a third evening served as a control, with no odor. Table 2.3 shows the amounts (in euros) that customers spent on each of these evenings.19 Compare the three distributions. Were both odors associated with increased customer spending?

ODORS

2.47 Daily activity and obesity. People gain weight when they take in more energy from food than they expend. Table 2.4 (page 65) compares volunteer subjects who were lean with others who were mildly obese. None of the subjects followed an exercise program. The subjects wore sensors that recorded every move for 10 days. The table shows the average minutes per day spent in activity (standing and walking) and in lying down.20 Compare the distributions of time spent actively for lean and obese subjects and also the distributions of time spent lying down. How does the behavior of lean and mildly obese people differ?

OBESITY

TABLE 2.3 Amount spent (euros) by customers in a restaurant when exposed to odors

NO ODOR
15.9  15.9  18.5  18.5  18.5  18.5  15.9  18.5  15.9  18.5
18.5  18.5  18.5  20.5  15.9  21.9  18.5  18.5  15.9  18.5
15.9  15.9  15.9  25.5  15.9  15.9  12.9  15.9  15.9  15.9
18.5  18.5  18.5  15.9  18.5  18.5  18.5  18.5  18.5  18.5
18.5  18.5  21.9  22.5  18.5  20.7  21.5  24.9  21.9  21.9
21.9  22.5

LEMON ODOR
18.5  15.9  25.9  15.9  18.5  15.9  18.5  21.5  15.9  18.5
15.9  15.9  18.5  21.9  18.5  15.9  15.9  18.5

LAVENDER ODOR
21.9  21.5  25.9  18.5  18.5  21.9  22.3  25.5  18.5  21.9
18.5  18.5  18.5  18.5  22.8  24.9  21.9  18.5

CHAPTER 11 Data Description and Probability Distributions

Matched Problem 3

Add the salary $100,000 to those in Example 3 and compute the median and mean for these eight salaries.

The median, as we have defined it, is easy to determine and is not influenced by extreme values. Our definition does have some minor handicaps, however. First, if the measurements we are analyzing were carried out in a laboratory and presented to us in a frequency table, we may not have access to the individual measurements. In that case we would not be able to compute the median using the above definition. Second, a set like 4, 4, 6, 7, 7, 7, 9 would have median 7 by our definition, but 7 does not possess the symmetry we expect of a "middle element," since there are three measurements below 7 but only one above.

To overcome these handicaps, we define a second concept, the median for grouped data. To guarantee that the median for grouped data exists and is unique, we assume that the frequency table for the grouped data has no classes of frequency 0.

DEFINITION Median: Grouped Data

The median for grouped data with no classes of frequency 0 is the number such that the histogram has the same area to the left of the median as to the right of the median (see Fig. 2).

Figure 2 The area to the left of the median equals the area to the right.

EXAMPLE 4 Median for Grouped Data

Compute the median for the grouped data in Table 3.

SOLUTION First we draw the histogram of the data (Fig. 3). The total area of the histogram is 15, which is just the sum of the frequencies, since all rectangles have a base of length 1. The area to the left of the median must be half the total area, that is, 15/2 = 7.5. Looking at Figure 3, we see that the median M lies between 6.5 and 7.5. Thus, the area to the left of M, which is the sum of the blue shaded areas in Figure 3, must be 7.5:

$$(1)(3) + (1)(1) + (1)(2) + (M - 6.5)(4) = 7.5$$

Solving for M gives M = 6.5 + (7.5 − 6)/4 = 6.875. The median for the grouped data in Table 3 is 6.875.

Table 3

Class Interval    Frequency
3.5–4.5           3
4.5–5.5           1
5.5–6.5           2
6.5–7.5           4
7.5–8.5           [illegible]
8.5–9.5           [illegible]
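The area-balancing definition reduces to linear interpolation inside the class that contains the median. Below is a minimal Python sketch. The function name grouped_median and the (lower, upper, frequency) input format are assumptions made for illustration. The first demonstration uses the lightbulb table from Exercise 15. For Example 4, only 10 of the 15 observations in Table 3 are legible. The remaining 5 all lie above 7.5, so the sketch lumps them into a single class, which cannot change the median.

```python
def grouped_median(classes):
    """Median for grouped data: the point M that splits the histogram area in half.

    `classes` is a list of (lower, upper, frequency) tuples in increasing order,
    with no zero frequencies. Each rectangle is treated as having area equal to
    its frequency, so the area left of M is the cumulative frequency of the
    classes below M's class plus the fraction of M's class that lies left of M.
    """
    total = sum(freq for _, _, freq in classes)
    half = total / 2
    cumulative = 0
    for lower, upper, freq in classes:
        if cumulative + freq >= half:
            # Solve cumulative + freq * (M - lower) / (upper - lower) = half for M.
            return lower + (half - cumulative) / freq * (upper - lower)
        cumulative += freq
    raise ValueError("empty frequency table")


# Life (hours) of 50 randomly selected lightbulbs, from Exercise 15.
lightbulbs = [
    (799.5, 899.5, 3),
    (899.5, 999.5, 10),
    (999.5, 1099.5, 24),
    (1099.5, 1199.5, 12),
    (1199.5, 1299.5, 1),
]
print(grouped_median(lightbulbs))   # 1049.5

# Example 4: the last 15 - 10 = 5 observations are lumped into one class above 7.5.
example4 = [(3.5, 4.5, 3), (4.5, 5.5, 1), (5.5, 6.5, 2), (6.5, 7.5, 4), (7.5, 9.5, 5)]
print(grouped_median(example4))     # 6.875
```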

Measures of Central Tendency

Interval      Frequency
0.5–2.5       [illegible]
2.5–4.5       [illegible]
4.5–6.5       [illegible]
6.5–8.5       [illegible]

7. Which single measure of central tendency (mean, median, or mode) would you say best describes the following set of measurements? Discuss the factors that justify your preference. [data illegible]

8. Which single measure of central tendency (mean, median, or mode) would you say best describes the following set of measurements? Discuss the factors that justify your preference. [data illegible]

9. A data set is formed by recording the results of 100 rolls of a fair die.
(A) What would you expect the mean of the data set to be? The median?
(B) Form such a data set by using a graphing calculator to simulate 100 rolls of a fair die, and find its mean and median.

10. A data set is formed by recording the sums on 200 rolls of a pair of fair dice.
(A) What would you expect the mean of the data set to be? The median?
(B) Form such a data set by using a graphing calculator to simulate 200 rolls of a pair of fair dice, and find the mean and median of the set.

11. (A) Construct a set of four numbers that has mean 300, median 250, and mode 175.
(B) Let m1 > m2 > m3. Devise and discuss a procedure for constructing a set of four numbers that has mean m1, median m2, and mode m3.

12. (A) Construct a set of five numbers that has mean 200, median 150, and mode 50.
(B) Let m1 > m2 > m3. Devise and discuss a procedure for constructing a set of five numbers that has mean m1, median m2, and mode m3.

Applications

13. Price-earnings ratios. Find the mean, median, and mode for the data in the following table.

Price-Earnings Ratios for Eight Stocks in a Portfolio
5.3   12.9   10.1   8.4   18.7   35.5   16.2   10.1

14. Gasoline tax. Find the mean, median, and mode for the data in the following table.

State Gasoline Tax, 2007
State          Tax (Cents)
Wisconsin      [illegible]
Connecticut    [illegible]
Nebraska       [illegible]
Kansas         [illegible]
Texas          [illegible]
California     [illegible]
Florida        [illegible]
New York       [illegible]

15. Lightbulb lifetime. Find the mean and median for the data in the following table.

Life (Hours) of 50 Randomly Selected Lightbulbs
Interval           Frequency
799.5–899.5        3
899.5–999.5        10
999.5–1,099.5      24
1,099.5–1,199.5    12
1,199.5–1,299.5    1

16. Price-earnings ratios. Find the mean and median for the data in the following table.

Price-Earnings Ratios of 100 Randomly Chosen Stocks from the New York Stock Exchange
Interval      Frequency
−0.5–4.5      5
4.5–9.5       54
9.5–14.5      25
14.5–19.5     9
19.5–24.5     4
24.5–29.5     1
29.5–34.5     2

17. Financial aid. Find the mean, median, and mode for the data on federal student financial assistance in the following table. (Source: College Board)

Average Federal Work-Study Award
Year    Award ($)
1995    1,087
1997    1,215
1999    1,252
2001    1,394
2003    [illegible]
2005    1,446
2007    [illegible]

18. Tourism. Find the mean, median, and mode for the data in the following table. (Source: The World Bank)

International Tourism Receipts, 2006
Country          Receipts (billion $)
United States    [illegible]
Spain            57.5
France           54.0
Great Britain    42.8
Germany          47.6
Italy            43.0
China            37.1
Canada           17.0
Greece           14.5
Belgium          12.7

19. Mouse weights. Find the mean and median for the data in the following table.

Mouse Weights (Grams)
Interval     Frequency
41.5–43.5    [illegible]
43.5–45.5    [illegible]
45.5–47.5    [illegible]
47.5–49.5    13
49.5–51.5    19
51.5–53.5    [illegible]
53.5–55.5    [illegible]
55.5–57.5    7
57.5–59.5    [illegible]

20. Presidents. Find the mean and median for the grouped data in the following table.

U.S. Presidents' Ages at Inauguration
Age          Number
39.5–44.5    2
44.5–49.5    7
49.5–54.5    12
54.5–59.5    13
59.5–64.5    [illegible]
64.5–69.5    2
69.5–74.5    1

21. Blood cholesterol levels. Find the mean and median for the data in the following table.

Blood Cholesterol Levels (Milligrams per Deciliter)
Interval       Frequency
149.5–169.5    4
169.5–189.5    11
189.5–209.5    15
209.5–229.5    25
229.5–249.5    13
249.5–269.5    7
269.5–289.5    3
289.5–309.5    2

22. Grade-point averages. Find the mean and median for the grouped data in the following table.

Graduating Class Grade-Point Averages
Interval     Frequency
1.95–2.15    [illegible]
2.15–2.35    [illegible]
2.35–2.55    [illegible]
2.55–2.75    [illegible]
2.75–2.95    [illegible]
2.95–3.15    6
3.15–3.35    [illegible]
3.35–3.55    4
3.55–3.75    3
3.75–3.95    2

23. Entrance examination scores. Compute the median for the grouped data of entrance examination scores given in Table 1 on page 518.

24. Immigration. Find the mean, median, and mode for the data in the following table. (Source: U.S. Census Bureau)

Top Ten Countries of Birth of U.S. Foreign-Born Population, 2007
Country               Number (thousands)
Mexico                11,739
China                 1,930
Philippines           1,701
India                 1,502
El Salvador           [illegible]
Vietnam               1,101
Korea                 1,043
Cuba                  983
Canada                830
Dominican Republic    756

Answers to Matched Problems

1. x̄ = 3.8

2. x̄ = 10.1

3. Median = $44,000; mean = $63,250

4. Median for grouped data = 6.8

5. Arrange each set of data in ascending order:

Set    Data                                 Mode    Median    Mean
(A)    1, 1, 1, 1, 2, 2, 4, 5, 9            1       2         2.89
(B)    1, 2, 4, 5, 7, 8, 9                  None    5         5.14
(C)    1, 1, 2, 3, 3, 3, 5, 6, 8, 8, 8      3, 8    3         4.36
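The Python sketch below is a quick way to check answer 5, including the rounding of the means to two decimals. It relies on the standard statistics module. The helper name modes is an assumption of this sketch, as is the rule that a set with no repeated value has no mode. That rule matches the "None" given for set (B).

```python
import statistics
from collections import Counter


def modes(data):
    """All most-frequent values, or an empty list when no value repeats."""
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(value for value, count in counts.items() if count == top)


data_sets = {
    "A": [1, 1, 1, 1, 2, 2, 4, 5, 9],
    "B": [1, 2, 4, 5, 7, 8, 9],
    "C": [1, 1, 2, 3, 3, 3, 5, 6, 8, 8, 8],
}

results = {}
for name, data in data_sets.items():
    # (mode(s), median, mean rounded to 2 decimal places)
    results[name] = (modes(data), statistics.median(data), round(statistics.mean(data), 2))
    print(name, results[name])
```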

CHAPTER 2: Describing Distributions with Numbers

The Basic Practice of Statistics, 6th Edition

Moore / Notz / Fligner

Lecture PowerPoint Slides

Chapter 2 Concepts

2

Measuring Center: Mean and Median

Measuring Spread: Quartiles

Five-Number Summary and Boxplots

Spotting Suspected Outliers

Measuring Spread: Standard Deviation

Choosing Measures of Center and Spread

Chapter 2 Objectives

3

Calculate and Interpret Mean and Median

Compare Mean and Median

Calculate and Interpret Quartiles

Construct and Interpret the Five-Number Summary and Boxplots

Determine Suspected Outliers

Calculate and Interpret Standard Deviation

Choose Appropriate Measures of Center and Spread

Organize a Statistical Problem

Measuring Center: The Mean

The most common measure of center is the arithmetic average, or mean.

To find the mean x̄ (pronounced "x-bar") of a set of observations, add their values and divide by the number of observations. If the n observations are x1, x2, x3, …, xn, their mean is:

$$\bar{x} = \frac{\text{sum of observations}}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

or in more compact notation

$$\bar{x} = \frac{\sum x_i}{n}$$

4

Measuring Center: The Median

5

Because the mean cannot resist the influence of extreme observations, it is not a resistant measure of center.

Another common measure of center is the median.

The median M is the midpoint of a distribution, the number such that half of the observations are smaller and the other half are larger.

To find the median of a distribution:

1. Arrange all observations from smallest to largest.

2. If the number of observations n is odd, the median M is the center observation in the ordered list.

3. If the number of observations n is even, the median M is the average of the two center observations in the ordered list.
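The mean formula and the three median steps translate directly into code. Below is a minimal Python sketch. The helper names mean and median are illustrative; Python's statistics.mean and statistics.median compute the same quantities. The sketch is applied to the 20 New York commuting times used on the following slides.

```python
def mean(xs):
    """Arithmetic mean: add the observations and divide by n."""
    return sum(xs) / len(xs)


def median(xs):
    """Median M, following the three steps on the slide."""
    ordered = sorted(xs)                 # step 1: smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                       # step 2: odd n, take the center observation
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # step 3: even n, average the two centers


# Travel times (minutes) of 20 New York workers.
ny = [10, 30, 5, 25, 40, 20, 10, 15, 30, 20, 15, 20, 85, 15, 65, 15, 60, 60, 40, 45]
print(mean(ny), median(ny))              # 31.25 22.5

# The mean is not resistant: turn the 85-minute time into an extreme 850 minutes.
extreme = [850 if x == 85 else x for x in ny]
print(mean(extreme), median(extreme))    # 69.5 22.5
```

Changing one observation moves the mean from 31.25 to 69.5 minutes, but the median stays at 22.5 minutes. This is what it means for the median to be resistant and the mean not.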

Measuring Center

6

Use the data below to calculate the mean and median of the commuting times (in minutes) of 20 randomly selected New York workers.

10  30  5  25  40  20  10  15  30  20  15  20  85  15  65  15  60  60  40  45

$$\bar{x} = \frac{10 + 30 + 5 + 25 + \cdots + 40 + 45}{20} = 31.25 \text{ minutes}$$

0 | 5
1 | 005555
2 | 0005
3 | 00
4 | 005
5 |
6 | 005
7 |
8 | 5

Key: 4|5 represents a New York worker who reported a 45-minute travel time to work.

$$M = \frac{20 + 25}{2} = 22.5 \text{ minutes}$$
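The Python sketch below builds a stemplot like the one on this slide. It works under these assumptions: the data are non-negative whole numbers, stems are tens digits, and leaves are units digits. Empty stems are printed so that gaps in the distribution stay visible. The function name stemplot is hypothetical.

```python
from collections import defaultdict


def stemplot(data):
    """Return stemplot rows for non-negative whole numbers (stem = tens, leaf = ones)."""
    leaves = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(x, 10)
        leaves[stem].append(str(leaf))
    # Include every stem from the smallest to the largest, even those with no leaves.
    return [f"{stem} | {''.join(leaves[stem])}".rstrip()
            for stem in range(min(leaves), max(leaves) + 1)]


ny = [10, 30, 5, 25, 40, 20, 10, 15, 30, 20, 15, 20, 85, 15, 65, 15, 60, 60, 40, 45]
for row in stemplot(ny):
    print(row)
print("Key: 4 | 5 represents a 45-minute travel time")
```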

Comparing the Mean and Median

7

The mean and median measure center in different ways, and both are useful.

Comparing the Mean and the Median

The mean and median of a roughly symmetric distribution are close together.

If the distribution is exactly symmetric, the mean and median are exactly the same.

In a skewed distribution, the mean is usually farther out in the long tail than is the median.

Measuring Spread: Quartiles

8

A measure of center alone can be misleading.

A useful numerical description of a distribution requires both a measure of center and a measure of spread.

How to Calculate the Quartiles and the Interquartile Range

To calculate the quartiles:

1) Arrange the observations in increasing order and locate the median M.

2) The first quartile Q1 is the median of the observations located to the left of the median in the ordered list.

3) The third quartile Q3 is the median of the observations located to the right of the median in the ordered list.

The interquartile range (IQR) is defined as: IQR = Q3 − Q1
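A minimal Python sketch of the quartile steps above follows. The function name quartiles is illustrative. When n is odd, the median itself is left out of both halves, as the slide's "to the left" and "to the right" wording implies. Spreadsheet and calculator quartile functions may follow other conventions and give slightly different values.

```python
import statistics


def quartiles(data):
    """Return (Q1, M, Q3): Q1 and Q3 are the medians of the observations
    to the left and to the right of the median in the ordered list."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]              # observations left of the median
    upper = xs[half + n % 2:]      # when n is odd, skip the median itself
    return statistics.median(lower), statistics.median(xs), statistics.median(upper)


# New York travel times (minutes).
ny = [10, 30, 5, 25, 40, 20, 10, 15, 30, 20, 15, 20, 85, 15, 65, 15, 60, 60, 40, 45]
q1, m, q3 = quartiles(ny)
iqr = q3 - q1
print(q1, m, q3, iqr)   # 15.0 22.5 42.5 27.5
```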

Five-Number Summary

9

The minimum and maximum values alone tell us little about the distribution as a whole. Likewise, the median and quartiles tell us little about the tails of a distribution.

To get a quick summary of both center and spread, combine all five numbers.

The five-number summary of a distribution consists of the smallest observation, the first quartile, the median, the third quartile, and the largest observation, written in order from smallest to largest.

Minimum    Q1    M    Q3    Maximum

Boxplots

10

The five-number summary divides the distribution roughly into quarters. This leads to a new way to display quantitative data, the boxplot.

How to Make a Boxplot

• Draw and label a number line that includes the range of the distribution.

• Draw a central box from Q1 to Q3.

• Note the median M inside the box.

• Extend lines (whiskers) from the box out to the minimum and maximum values that are not outliers.

Suspected Outliers: The 1.5 IQR Rule

11

In addition to serving as a measure of spread, the interquartile range (IQR) is used as part of a rule of thumb for identifying outliers.

The 1.5 IQR Rule for Outliers

Call an observation an outlier if it falls more than 1.5 IQR above the third quartile or below the first quartile.

In the New York travel time data, we found Q1 = 15 minutes, Q3 = 42.5 minutes, and IQR = 27.5 minutes.

For these data, 1.5 IQR = 1.5(27.5) = 41.25

Q1 − 1.5 IQR = 15 − 41.25 = −26.25

Q3 + 1.5 IQR = 42.5 + 41.25 = 83.75

Any travel time shorter than −26.25 minutes or longer than 83.75 minutes is considered an outlier. In the stemplot below, only the 85-minute travel time falls outside these limits.

0 | 5
1 | 005555
2 | 0005
3 | 00
4 | 005
5 |
6 | 005
7 |
8 | 5
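The rule above takes only Q1 and Q3 as inputs, so it can be sketched in a few lines of Python. The function names outlier_fences and suspected_outliers are hypothetical. Q1 = 15 and Q3 = 42.5 are taken directly from the slide rather than recomputed.

```python
def outlier_fences(q1, q3):
    """Cutoffs of the 1.5 IQR rule: Q1 - 1.5 IQR and Q3 + 1.5 IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def suspected_outliers(data, q1, q3):
    """Observations below the lower cutoff or above the upper cutoff."""
    low, high = outlier_fences(q1, q3)
    return [x for x in data if x < low or x > high]


# New York travel times (minutes).
ny = [10, 30, 5, 25, 40, 20, 10, 15, 30, 20, 15, 20, 85, 15, 65, 15, 60, 60, 40, 45]
low, high = outlier_fences(15, 42.5)          # Q1 and Q3 as found on the slide
print(low, high)                              # -26.25 83.75
print(suspected_outliers(ny, 15, 42.5))       # [85]
```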

Boxplots

12

Consider our NY travel times data. Construct a boxplot.

10  30  5  25  40  20  10  15  30  20  15  20  85  15  65  15  60  60  40  45

Ordered:

5  10  10  15  15  15  15  20  20  20  25  30  30  40  40  45  60  60  65  85

M = 22.5
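The boxplot recipe on slide 10 says the whiskers stop at the most extreme values that are not outliers. The Python sketch below computes each piece of such a boxplot for the NY data rather than drawing it. The function name boxplot_parts and its dictionary output are assumptions made for illustration. Quartiles follow the same halves-of-the-ordered-list rule as the slides.

```python
import statistics


def boxplot_parts(data):
    """Pieces of a boxplot whose whiskers end at the most extreme
    observations that are not 1.5 IQR outliers."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    q1 = statistics.median(xs[:half])
    q3 = statistics.median(xs[half + n % 2:])
    iqr = q3 - q1
    low_cut, high_cut = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in xs if low_cut <= x <= high_cut]
    return {
        "box": (q1, q3),                        # box drawn from Q1 to Q3
        "median": statistics.median(xs),        # line inside the box
        "whiskers": (inside[0], inside[-1]),    # smallest and largest non-outliers
        "outliers": [x for x in xs if x < low_cut or x > high_cut],
    }


ny = [10, 30, 5, 25, 40, 20, 10, 15, 30, 20, 15, 20, 85, 15, 65, 15, 60, 60, 40, 45]
parts = boxplot_parts(ny)
print(parts)
```

For these data the box runs from 15 to 42.5 with the median line at 22.5. The whiskers reach 5 and 65, and the 85-minute time is plotted separately as a suspected outlier.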

Measuring Spread: Standard Deviation

13

The most common measure of spread looks at how far each observation is from the mean. This measure is called the standard deviation.

The standard deviation sx measures the average distance of the observations from their mean. It is calculated by finding an average of the squared distances and then taking the square root. This average squared distance is called the variance.

$$\text{variance} = s_x^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1} = \frac{1}{n - 1}\sum (x_i - \bar{x})^2$$

$$\text{standard deviation} = s_x = \sqrt{\frac{1}{n - 1}\sum (x_i - \bar{x})^2}$$

Calculating the Standard Deviation

14

Example: Consider the following data on the number of pets owned by a group of nine children: 1, 3, 4, 4, 4, 5, 7, 8, 9.

1) Calculate the mean: x̄ = 5

2) Calculate each deviation.

deviation = observation − mean

deviation: 1 − 5 = −4

deviation: 8 − 5 = 3

Calculating the Standard Deviation

15

3) Square each deviation.

4) Find the "average" squared deviation. Calculate the sum of the squared deviations divided by (n − 1); this is called the variance.

5) Calculate the square root of the variance; this is the standard deviation.

xi    (xi − mean)     (xi − mean)²
1     1 − 5 = −4      (−4)² = 16
3     3 − 5 = −2      (−2)² = 4
4     4 − 5 = −1      (−1)² = 1
4     4 − 5 = −1      (−1)² = 1
4     4 − 5 = −1      (−1)² = 1
5     5 − 5 = 0       (0)² = 0
7     7 − 5 = 2       (2)² = 4
8     8 − 5 = 3       (3)² = 9
9     9 − 5 = 4       (4)² = 16
      Sum = ?         Sum = ?

"Average" squared deviation = 52/(9 − 1) = 6.5. This is the variance.

Standard deviation = square root of variance = √6.5 = 2.55
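The five steps above map one-to-one onto the short Python sketch below. The variable names are illustrative. Python's statistics.stdev also divides by n − 1, so it is used only as a cross-check. The first printed line fills in the two "Sum = ?" cells of the table.

```python
import math
import statistics

# Number of pets owned by nine children (the xi column of the table).
pets = [1, 3, 4, 4, 4, 5, 7, 8, 9]
n = len(pets)

xbar = sum(pets) / n                          # step 1: mean (5.0)
deviations = [x - xbar for x in pets]         # step 2: observation - mean
squared = [d ** 2 for d in deviations]        # step 3: square each deviation
variance = sum(squared) / (n - 1)             # step 4: "average" squared deviation
sd = math.sqrt(variance)                      # step 5: square root of the variance

print(sum(deviations), sum(squared))          # 0.0 52.0
print(variance, round(sd, 2))                 # 6.5 2.55

# Cross-check against the standard library's sample standard deviation.
check = statistics.stdev(pets)
```

The deviations always sum to 0, which is why the standard deviation squares them before averaging.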

Choosing Measures of Center and Spread

16

We now have a choice between two descriptions for center and spread:

Mean and Standard Deviation

Median and Interquartile Range

Choosing Measures of Center and Spread

• The median and IQR are usually better than the mean and standard deviation for describing a skewed distribution or a distribution with outliers.

• Use mean and standard deviation only for reasonably symmetric distributions that don't have outliers.

• NOTE: Numerical summaries do not fully describe the shape of a distribution. ALWAYS PLOT YOUR DATA!

Organizing a Statistical Problem

17

As you learn more about statistics, you will be asked to solve more complex problems.

Here is a four-step process you can follow.

How to Organize a Statistical Problem: A Four-Step Process

State: What's the practical question, in the context of the real-world setting?

Plan: What specific statistical operations does this problem call for?

Do: Make graphs and carry out calculations needed for the problem.

Conclude: Give your practical conclusion in the setting of the real-world problem.

Chapter 2 Objectives Review

18

Calculate and Interpret Mean and Median

Compare Mean and Median

Calculate and Interpret Quartiles

Construct and Interpret the Five-Number Summary and Boxplots

Determine Suspected Outliers

Calculate and Interpret Standard Deviation

Choose Appropriate Measures of Center and Spread

Organize a Statistical Problem

Student number: 1109285062

Employee name: 1109285062

Date: 2021/01/27 11:04:16 PM


SACM Student Progress Evaluation

Student Name: Alsufari, Rahaf K.A

MSU Student ID: 14079733

Department: Department Of Biological Sciences

Degree level: BS

Current Major: Biology

Minor (if any): N/A

Expected Date of Graduation (FALL/SPRING year): Spring 23 at the earliest (estimate that could be impacted by many variables including course availability, student performance, etc.)

Minimum number of Semester Credits Required to Complete Program of Study: 120

Total Number of Completed Semester Credits including Transfer: 53

Total Number of Credit Hours Counting toward full degree program of study (equals 3+6):

Number of Semester Credits Accepted in Transfer: 8

Of Which, How Many will apply towards the Major, Gen. Ed or Open Electives: 8

Number of Remaining Credits to complete program of study (including registered hours): 76

Has Student Been Accepted into Major? Yes

Advisor's Name: Ken Adams

E-mail address: kenneth.adams@mnsu.edu

Signature

Date: 01/20/21

Additional note if required: Graduation would require a minimum of 5 semesters given the prerequisite structure of the biology and chemistry classes.

Date Issued: 01/06/2021

Student Full Name: Alsufari, Rahaf K.A

Tech Id: 14079733

Degree Level: BS

Current Major: Biology

A. Previously Taken Online (Hybrid, Web-enhanced, Blended) Class(es):
Columns: Online 100% / Hybrid % / In class

1- Course Title:    Course No:    Credits:    Semester/Yr:
2- Course Title:    Course No:    Credits:    Semester/Yr:
3- Course Title:    Course No:    Credits:    Semester/Yr:
4- Course Title:    Course No:    Credits:    Semester/Yr:
5- Course Title:    Course No:    Credits:    Semester/Yr:

Printed Name: Joshua Woldt    Signature of Registrar    Date Signed: 1-19-21    Email Address: Joshua.woldt@mnsu.edu
Printed Name:    Chair Signature    Date Signed:    Email Address:

B. Currently Registered (or Preregistered) Online (Hybrid, Web-enhanced, Blended) Class:
Columns: Online 100% / Hybrid % / In class

1- Course Title:    Course No:    Credits:    Semester/Yr:
2- Course Title:    Course No:    Credits:    Semester/Yr:

Printed Name: Joshua Woldt    Signature of Registrar    Date Signed: 1-19-21    Email Address: Joshua.woldt@mnsu.edu
Printed Name:    Chair Signature    Date Signed:    Email Address:

Step 2: To be completed by the Student's Advisor:
(Evaluation of Course Reported in Item B.)

Course # (1)
Is the course required in Student's program of study?    Yes:    No:
Is this course available in face-to-face format?    Yes:    No:
Is there an available substitute face-to-face class for this course?    Yes:    No:
Could it be taken in coming semesters without conflict with degree plan?    Yes:    No:
Will graduation be delayed if course not taken in the semester requested?    Yes:    No:
Is this student graduating by the end of current semester?    Yes:    No:

Course # (2)
Is the course required in Student's program of study?    Yes:    No:
Is this course available in face-to-face format?    Yes:    No:
Is there an available substitute face-to-face class for this course?    Yes:    No:
Could it be taken in coming semesters without conflict with degree plan?    Yes:    No:
Will graduation be delayed if course not taken in the semester requested?    Yes:    No:
Is this student graduating by the end of current semester?    Yes:    No:

Printed Name:
Signature of Advisor
Email Address:
Date signed:

Notes


The Basic Practice of Statistics

SIXTH EDITION

DAVID S. MOORE
Purdue University

WILLIAM I. NOTZ
The Ohio State University

MICHAEL A. FLIGNER
The Ohio State University

W. H. Freeman and Company
New York

Publisher: Ruth Baruth
Acquisitions Editor: Karen Carson
Executive Marketing Manager: Jennifer Somerville
Developmental Editors: Andrew Sylvester and Leslie Lahr
Senior Media Acquisitions Editor: Roland Cheyney
Senior Media Editor: Laura Capuano
Associate Editor: Katrina Wilhelm
Assistant Media Editor: Catriona Kaplan
Editorial Assistant: Tyler Holzer
Photo Editor: Cecilia Varas
Photo Researcher: Elyse Rieder
Cover and Text Designer: Blake Logan
Senior Project Editor: Mary Louise Byrd
Illustrations: Macmillan Solutions
Production Coordinator: Susan Wein
Composition: Aptara®, Inc.
Printing and Binding: Quad Graphics

Library of Congress Control Number: 2011934674

Student Edition (Hardcover w/cd): ISBN-13: 978-1-4641-0254-7; ISBN-10: 1-4641-0254-6
Student Edition (Paperback w/cd): ISBN-13: 978-1-4641-0434-3; ISBN-10: 1-4641-0434-4
Student Edition (Looseleaf w/cd): ISBN-13: 978-1-4641-0433-6; ISBN-10: 1-4641-0433-6

© 2013, 2010, 2007, 2004 by W. H. Freeman and Company
All rights reserved
Printed in the United States of America
First printing

W. H. Freeman and Company
41 Madison Avenue
New York, NY 10010
Houndmills, Basingstoke RG21 6XS, England
www.whfreeman.com

Brief Contents

Part I Exploring Data 1

Exploring Data: Variables and Distributions
CHAPTER 1 Picturing Distributions with Graphs 3
CHAPTER 2 Describing Distributions with Numbers 39
CHAPTER 3 The Normal Distributions 69

Exploring Data: Relationships
CHAPTER 4 Scatterplots and Correlation 97
CHAPTER 5 Regression 125
CHAPTER 6 Two-Way Tables* 159
CHAPTER 7 Exploring Data: Part I Review 175

Part II From Exploration to Inference 197

Producing Data
CHAPTER 8 Producing Data: Sampling 199
CHAPTER 9 Producing Data: Experiments 223
Commentary: Data Ethics* 246

Probability and Sampling Distributions
CHAPTER 10 Introducing Probability 259
CHAPTER 11 Sampling Distributions 285
CHAPTER 12 General Rules of Probability* 307
CHAPTER 13 Binomial Distributions* 331

Foundations of Inference
CHAPTER 14 Confidence Intervals: The Basics 351
CHAPTER 15 Tests of Significance: The Basics 369
CHAPTER 16 Inference in Practice 391
CHAPTER 17 From Exploration to Inference: Part II Review 417

Part III Inference about Variables 435

Quantitative Response Variable
CHAPTER 18 Inference about a Population Mean 437
CHAPTER 19 Two-Sample Problems 465

Categorical Response Variable
CHAPTER 20 Inference about a Population Proportion 493
CHAPTER 21 Comparing Two Proportions 515
CHAPTER 22 Inference about Variables: Part III Review 533

Part IV Inference about Relationships 551

CHAPTER 23 Two Categorical Variables: The Chi-Square Test 553
CHAPTER 24 Inference for Regression 587
CHAPTER 25 One-Way Analysis of Variance: Comparing Several Means 623

Part V Optional Companion Chapters (available on the BPS CD and online)

CHAPTER 26 Nonparametric Tests 26-3
CHAPTER 27 Statistical Process Control 27-3
CHAPTER 28 Multiple Regression* 28-3
CHAPTER 29 More about Analysis of Variance 29-3

*Starred material is not required for later parts of the text.

FM.indd Page iv 11/9/11 3:58:32 PM user-s163

user-F452

Detailed Table of Contents

To the Instructor viii

Media and Supplements xix

About the Authors xxiv

To the Student xxvi

Pa r t I

CHAPTER 4

Scatterplots and Correlation

1

Exploring Data

CHAPTER 1

Picturing Distributions with Graphs 3

Individuals and variables 3

Categorical variables: pie charts and bar graphs

Quantitative variables: histograms 11

Interpreting histograms 15

Quantitative variables: stemplots 20

Time plots 23

6

Measuring center: the mean 40

Measuring center: the median 41

Comparing the mean and the median 42

Measuring spread: the quartiles 43

The five-number summary and boxplots 45

Spotting suspected outliers* 48

Measuring spread: the standard deviation 49

Choosing measures of center and spread 51

Using technology 53

Organizing a statistical problem 55

Regression lines 125

The least-squares regression line 128

Using technology 130

Facts about least-squares regression 132

Residuals 135

Influential observations 139

Cautions about correlation and regression 142

Association does not imply causation 144

CHAPTER 6

Two-Way Tables* 159

Marginal distributions 160

Conditional distributions 162

Simpson’s paradox 166

CHAPTER 7

Exploring Data: Part I Review

Part I summary 177

Test yourself 180

Supplementary exercises

175

191

69

Density curves 69

Describing density curves 73

Normal distributions 75

The 68–95–99.7 rule 77

The standard Normal distribution 80

Finding Normal proportions 81

Using the standard Normal table 83

Finding a value given a proportion 86

*Starred material is not required for later parts of the text.

iv

Explanatory and response variables 97

Displaying relationships: scatterplots 99

Interpreting scatterplots 101

Adding categorical variables to scatterplots 104

Measuring linear association: correlation 106

Facts about correlation 108

CHAPTER 5

Regression 125

CHAPTER 2

Describing Distributions with Numbers 39

CHAPTER 3

The Normal Distributions

97

Pa r t I I

From Exploration

to Inference

CHAPTER 8

Producing Data: Sampling

199

Population versus sample 199

How to sample badly 202

Simple random samples 203

197

FM.indd Page v 11/9/11 3:58:32 PM user-s163

user-F452

•

Inference about the population 208

Other sampling designs 209

Cautions about sample surveys 210

The impact of technology 213

CHAPTER 9

Producing Data: Experiments

CHAPTER 13

Binomial Distributions*

232

351

The reasoning of tests of significance 370

Stating hypotheses 372

P-value and statistical significance 374

Tests for a population mean 378

Significance from a table* 382

253

CHAPTER 16

Inference in Practice 391

Conditions for inference in practice 392

Cautions about confidence intervals 395

Cautions about significance tests 397

Planning studies: sample size for confidence intervals 401

Planning studies: the power of a statistical test* 402

268

CHAPTER 11

Sampling Distributions 285

Parameters and statistics 285

Statistical estimation and the law of large numbers

Sampling distributions 290 _

The sampling distribution of x 293

The central limit theorem 295

CHAPTER 12

General Rules of Probability*

CHAPTER 14

Conﬁdence Intervals: The Basics

CHAPTER 15

Tests of Signiﬁcance: The Basics 369

259

The idea of probability 260

The search for randomness* 262

Probability models 264

Probability rules 266

Finite and discrete probability models

Continuous probability models 271

Random variables 275

Personal probability* 276

331

The reasoning of statistical estimation 352

Margin of error and confidence level 354

Confidence intervals for a population mean 357

How confidence intervals behave 361

Commentary: Data Ethics* 246

Institutional review boards 248

Informed consent 248

Confidentiality 250

Clinical trials 252

Behavioral and social science experiments

v

The binomial setting and binomial distributions 331

Binomial distributions in statistical sampling 333

Binomial probabilities 334

Using technology 336

Binomial mean and standard deviation 338

The Normal approximation to binomial distributions 340

223

Observation versus experiment 223

Subjects, factors, treatments 225

How to experiment badly 228

Randomized comparative experiments 229

The logic of randomized comparative experiments

Cautions about experimentation 234

Matched pairs and other block designs 236

CHAPTER 10

Introducing Probability

D E T A I L E D T A BL E O F CON TE N TS

287

Part II summary 419

Test yourself 423

Supplementary exercises

Pa r t I I I

307

Independence and the multiplication rule

The general addition rule 312

Conditional probability 314

The general multiplication rule 316

Independence again 318

Tree diagrams 318

CHAPTER 17

From Exploration to Inference: Part II Review

431

Inference about

435

Variables

308

417

CHAPTER 18

Inference about a Population Mean

437

Conditions for inference about a mean 437

The t distributions 438

The one-sample t confidence interval 440

FM.indd Page vi 11/18/11 11:53:50 PM user-s163

vi

user-F452

DETA ILED TA B LE O F CO N T E N T S

The one-sample t test 443

Using technology 446

Matched pairs t procedures 449

Robustness of t procedures 452

The chi-square test statistic 560

Cell counts required for the chi-square test 561

Using technology 562

Uses of the chi-square test 567

The chi-square distributions 570

The chi-square test for goodness of fit* 572

CHAPTER 19

Two-Sample Problems 465

CHAPTER 24

Inference for Regression

Two-sample problems 465

Comparing two population means 466

Two-sample t procedures 469

Using technology 474

Robustness again 477

Details of the t approximation* 480

Avoid the pooled two-sample t procedures* 481

Avoid inference about standard deviations* 482

CHAPTER 25

One-Way Analysis of Variance: Comparing Several Means 623

The sample proportion p̂ 494

Large-sample confidence intervals for a proportion 496

Accurate confidence intervals for a proportion 499

Choosing the sample size 502

Significance tests for a proportion 504

Comparing several means 625

The analysis of variance F test 625

Using technology 628

The idea of analysis of variance 631

Conditions for ANOVA 633

F distributions and degrees of freedom

Some details of ANOVA* 640

515

Two-sample problems: proportions 515

The sampling distribution of a difference between proportions 516

Large-sample confidence intervals for comparing proportions 517

Using technology 518

Accurate confidence intervals for comparing proportions 520

Significance tests for comparing proportions 522

Notes and Data Sources

Tables

CHAPTER 22

Inference about Variables: Part III Review 533

Part III summary 536

Test yourself 538

Supplementary exercises

587

Conditions for regression inference 589

Estimating the parameters 590

Using technology 593

Testing the hypothesis of no linear relationship 597

Testing lack of correlation 598

Confidence intervals for the regression slope 600

Inference about prediction 602

Checking the conditions for inference 607

CHAPTER 20

Inference about a Population Proportion 493

CHAPTER 21

Comparing Two Proportions

655

675

TABLE A Standard Normal probabilities 676

TABLE B Random digits 678

TABLE C t distribution critical values 679

TABLE D Chi-square distribution critical values 680

TABLE E Critical values of the correlation r 681

Answers to Selected Exercises

Index

545

Inference about Relationships

CHAPTER 23

Two Categorical Variables: The Chi-Square Test 553

Two-way tables 553

The problem of multiple comparisons 556

Expected counts in two-way tables 558

551

682

733

Part V

Part IV

637

Optional Companion Chapters (available on the BPS CD and online)

CHAPTER 26

Nonparametric Tests 26-3

Comparing two samples: the Wilcoxon rank sum test 26-4

The Normal approximation for W 26-8

Using technology 26-10

What hypotheses does Wilcoxon test? 26-13

Dealing with ties in rank tests 26-14

Matched pairs: the Wilcoxon signed rank test 26-19

The Normal approximation for W⁺ 26-22

Dealing with ties in the signed rank test 26-24

Comparing several samples: the Kruskal-Wallis test 26-27

Hypotheses and conditions for the Kruskal-Wallis test 26-29

The Kruskal-Wallis test statistic 26-29

CHAPTER 27

Statistical Process Control 27-3

Processes 27-4

Describing processes 27-4

The idea of statistical process control 27-9

x̄ charts for process monitoring 27-10

s charts for process monitoring 27-16

Using control charts 27-23

Setting up control charts 27-25

Comments on statistical control 27-32

Don’t confuse control with capability! 27-34

Control charts for sample proportions 27-36

Control limits for p charts 27-37

CHAPTER 28

Multiple Regression* 28-3

Parallel regression lines 28-4

Estimating parameters 28-8

Using technology 28-13

Inference for multiple regression 28-16

Interaction 28-26

The multiple linear regression model 28-32

The woes of regression coefficients 28-39

A case study for multiple regression 28-41

Inference for regression parameters 28-53

Checking the conditions for inference 28-58

CHAPTER 29

More about Analysis of Variance 29-3

Beyond one-way ANOVA 29-3

Follow-up analysis: Tukey pairwise multiple comparisons 29-8

Follow-up analysis: contrasts* 29-12

Two-way ANOVA: conditions, main effects, and interaction 29-16

Inference for two-way ANOVA 29-23

Some details of two-way ANOVA* 29-32

To the Instructor: About this Book

Welcome to the sixth edition of The Basic Practice of Statistics. This book is the culmination of 40 years of teaching undergraduates and 20 years of writing texts. Previous editions have been very successful, and we think that this new edition is the best yet. In this preface we describe for instructors the nature and features of the book and the changes in this sixth edition.

BPS is designed to be accessible to college and university students with limited quantitative background: “just algebra” in the sense of being able to read and use simple equations. It is usable with almost any level of technology for calculating

…
