In This Chapter

• Understanding correlation from a nonparametric point of view

• Finding and interpreting Spearman’s rank correlation

Data analysts commonly look for and try to quantify relationships between two variables, X and Y. Depending on the type of data you’re dealing with in X and Y, there are different procedures to use for quantifying their relationship.

When the X and Y variables are quantitative (that is, their possible outcomes are measurements or counts), the correlation coefficient (also known as Pearson’s correlation coefficient) measures the strength and direction of their linear relationship. (See Chapter 4 for all the info on Pearson’s correlation coefficient, denoted by r.) If X and Y are both categorical variables (their possible outcomes are categories that have no numerical meaning; for example, male and female), you use Chi-square procedures and conditional probabilities to look for and describe their relationship. All of that machinery is laid out in Chapters 13 and 14.

Then there is a third type of variable, the ordinal variable. Its values fall into categories, but the possible values can be placed into an order and given a numerical value that has some meaning; for example, grades on a scale of A = 4, B = 3, C = 2, D = 1, and E = 0, or a student’s evaluation of a teacher on a scale from best [5] to worst [1]. To look for a relationship between two ordinal variables like these, use Spearman’s rank correlation; it’s the nonparametric counterpart to Pearson’s correlation coefficient (Chapter 4). In this chapter, you see why ordinal variables don’t meet Pearson’s conditions, and you see how to use and interpret Spearman’s rank correlation to correctly quantify the relationship between two ordinal variables.

Pickin’ on Pearson and His Precious Conditions

Pearson’s correlation coefficient is the most common correlation measure out there, and many data analysts think it’s the only one out there. Trouble is, Pearson’s correlation has certain conditions that must be met before using it. If those conditions are not met, Spearman’s correlation is waiting in the wings. In this section, you see the conditions for Pearson’s correlation and how they are easy pickin’s for Spearman’s rank correlation.

The Pearson correlation coefficient r (the correlation) is a number that measures the direction and strength of the linear relationship between two variables X and Y. (For more info on the correlation, see Chapter 4.)

Several conditions have to be met for ol’ Pearson:

The variables X and Y must have a linear relationship (as shown on a scatterplot; see Chapter 4).

Both variables X and Y must be numerical (or quantitative). That is, they must represent measurements with no restriction on their level of precision. For example, numbers with many places after the decimal point (such as 12.322 or 0.219) must be possible.

The Y values must have a normal distribution for each X, with the same variance at each X.

One of the most common instances where Pearson’s conditions aren’t met is when the two variables are ordinal. Ordinal data comes in categories that can be assigned numerical values that make sense. However, for simplicity’s sake, ordinal variables typically offer only a few categories. This means there won’t be enough distinct numerical values to build a linear regression model for two ordinal variables like you can with two quantitative variables, so Pearson’s conditions aren’t met. It also makes condition three (a normal distribution of Y at each X) impossible.

As well, if you have a gender variable with categories male and female, you can assign the numbers 1 and 2 to each gender, but those numbers have no numerical meaning. Gender isn’t an ordinal variable; rather, it is a categorical variable (a variable that places individuals into categories only). Categorical variables, such as gender, also don’t lend themselves to linear relationships, so they don’t meet Pearson’s conditions either. (To explore relationships between categorical variables, see Chapter 14.)

Some people are lucky enough to have a statistic actually named after them. Typically, the person who came up with the statistic in the first place, recognizing a need for it and coming up with a solution, gets the honor. If the new statistic gets picked up and used by others, it eventually takes on the name of its inventor.

Spearman’s rank correlation is named after its inventor, Charles Edward Spearman, who lived from 1863 to 1945. He was an English psychologist who studied experimental psychology and worked in the area of human intelligence. He was a professor for many years at University College London. Spearman closely followed the work of Francis Galton, who originally developed the concept of correlation. Spearman developed his rank correlation in 1904.

Pearson’s correlation coefficient was developed several years prior, in 1893, by Karl Pearson, one of Spearman’s colleagues at University College London and another follower of Galton. Pearson and Spearman didn’t get along. Pearson had an especially strong and volatile personality and, in fact, had problems getting along with quite a few people. Such is the way of some of the more brilliant people of the 19th century.

Scoring with Spearman’s Rank Correlation

Spearman’s rank correlation doesn’t require the relationship between the variables X and Y to be linear, nor does it require the variables to be numerical. You use Spearman’s rank when the variables are ordinal and/or quantitative. Rather than examining a linear relationship between X and Y, Spearman’s rank correlation tests whether two ordinal and/or quantitative variables are dependent (in other words, related to each other).

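Note: Spearman’s rank correlation needs data that can be put in order, so it applies to ordinal and quantitative variables but not to categorical variables that have no natural order. To test whether two categorical (non-ordinal) variables are independent, you use a Chi-square test; see Chapter 14.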

Spearman’s rank correlation is the same as Pearson’s correlation except that it’s calculated based on the ranks of the X variable and the ranks of the Y variable rather than their actual values. You interpret the value of Spearman’s rank correlation, rs, the same way you interpret Pearson’s correlation, r (see Chapter 4). The values of rs can go between –1 and +1. The higher the magnitude of rs (in the positive or negative direction), the stronger the relationship between X and Y. If X and Y are independent, you expect rs to be near zero. However, a value of rs near zero doesn’t by itself prove that X and Y are independent; it says only that their ranks show no consistent increasing or decreasing pattern.
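To see why working with ranks frees Spearman’s correlation from the linearity condition, here is a minimal Python sketch. The data are made up for illustration (Y is simply X cubed), and the helper names pearson and simple_ranks are hypothetical; the ranking helper assumes there are no ties.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson's r computed from deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def simple_ranks(values):
    """Ranks 1..n for data with no ties (ties are NOT handled here)."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

# Hypothetical data: y always rises with x, but along a curve.
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]

r = pearson(x, y)                               # correlation of the raw values
rs = pearson(simple_ranks(x), simple_ranks(y))  # correlation of the ranks
print(f"Pearson r = {r:.3f}, Spearman rs = {rs:.3f}")
```

Because the ranks of X and Y line up perfectly, rs comes out at 1 even though the raw values don’t fall on a straight line and r falls short of 1.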

In this section, you see how to calculate and interpret Spearman’s rank correlation and apply it to an example.

Figuring Spearman’s rank correlation

The notation for Spearman’s rank correlation is rs, where s stands for Spearman. To find rs, you do the steps listed in this section. Minitab does the work for you in steps two through six, although some professors may ask you to do the work by hand (not me of course).

1. Collect the data in the form of pairs of values X and Y.

2. Rank the data from the X variable where 1 = lowest to n = highest, where n is the number of pairs of data in the data set. (This gives you a new set of data for the X variable called the ranks of the x-values.)

If any of the values appear more than once, Minitab assigns each tied value the average of the ranks they would normally be given if they were not tied.

3. Complete step two with the data from the Y variable. (This gives you a new data set called the ranks of the y-values.)

4. Find the standard deviation of the ranks of the x-values, using the usual formula for standard deviation,

$s_x = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$; call it sx.

In a similar manner find the standard deviation of the ranks of the y-values using

$s_y = \sqrt{\frac{\sum (y - \bar{y})^2}{n - 1}}$; call it sy.

(Note that n is the sample size, $\bar{x}$ is the mean of the ranks of the x-values, and $\bar{y}$ is the mean of the ranks of the y-values.)

5. Find the covariance of the x-y values, using the formula

$\mathrm{Cov}(x, y) = \frac{\sum (x - \bar{x})(y - \bar{y})}{n - 1}$; call it sxy.

The covariance of x and y is a measure of the total deviation of the x- and y-values from the point $(\bar{x}, \bar{y})$.

6. Calculate the value of Spearman’s rank correlation by using the formula $r_s = \frac{s_{xy}}{s_x s_y}$.

Notice that the formula for Spearman’s rank correlation is just the same as the formula for Pearson’s correlation coefficient, except the data Spearman uses for his correlation formula is the ranks of X and the ranks of Y, rather than the original x- and y-values as used by Pearson. So Spearman just cares about the order of the values of the X’s and the Y’s, not their actual values.

To calculate Spearman’s rank correlation straightaway by using Minitab, rank the x-values, rank the y-values, and then find the correlation of the ranks. That is, go to Data>Rank and click on the X variable to get the X ranks. Then do the same thing to get the Y ranks. Now go to Stat>Basic Statistics>Correlation, click on the two columns representing ranks, and click OK.

Watching Spearman at Work: Relating aptitude to performance

Knowing the process of how to calculate Spearman’s rank correlation is one thing, but if you can apply it to real-world situations, you’ll be the golden child of the statistics world (or at least your intermediate stats class). So, try to put yourself in this section’s scenario to get the full effect of Spearman’s rank correlation.

You’re a statistics professor, and you give exams every now and then (it’s a dirty job, but someone’s got to do it). After looking at students’ final grades over the years (yes, you’re an old professor, or at least in your mid-forties), you notice that students who do well in your class tend to have a better aptitude (background ability) for math and statistics. You want to check out this theory, so you give students a math and statistics aptitude test on the first day of the course; you want to compare students’ aptitude test scores with their final grades at the end of the course.

Now for the specifics. Your variables are X = aptitude test score (using a 100-point pretest on the first day of the course) and Y = final grade, on a scale from 1 to 5 where 1 = F (failed the course); 2 = D (passed); 3 = C (average); 4 = B (above average); and 5 = A (excellent). The Y variable, final grade, is an ordinal variable, and the X variable, aptitude, is a numerical variable. You want to find out whether there’s a relationship between X and Y. You collect data on a random sample of 20 students; the data are shown in Table 20-1. This is step one of the process of calculating Spearman’s rank correlation (from the steps listed in the previous section).

Table 20-1 Aptitude Test Scores and Final Grades in Statistics

Student   Aptitude   Final Grade
1         59         3
2         47         2
3         58         4
4         66         3
5         77         2
6         57         4
7         62         3
8         68         3
9         69         5
10        36         1
11        48         3
12        65         3
13        51         2
14        61         3
15        40         3
16        67         4
17        60         2
18        56         3
19        76         3
20        71         5

Using Minitab for the aptitudes and final grades example, you get a correlation of 0.379. The following discussion walks you through steps two through six as you do this correlation yourself. This is likely what you may be asked to do on an exam.

Steps two and three of finding Spearman’s rank correlation are to rank the aptitude test scores (x) from lowest (1) to highest; then rank the final grades (y) from lowest (1) to highest. Note that the final grades have several ties, so you use average ranks. For example, in the Final Grade column of Table 20-1 you see a single 1, which gets rank 1. Then you see four 2s. Their ranks, had they not been tied, would be 2, 3, 4, and 5. The average of these four ranks is $\frac{2 + 3 + 4 + 5}{4} = \frac{14}{4} = 3.5$. Each of the 2s in that column, therefore, receives rank 3.5.
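The tie rule is easy to get wrong by hand, so here is a small Python sketch of it, applied to the final grades from Table 20-1. The function name average_ranks is made up for this illustration; it is one straightforward way to do average ranking, not necessarily how Minitab does it internally.

```python
def average_ranks(values):
    """Rank from 1 (lowest) to n; tied values share the average of their would-be ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Grow the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end + 2) / 2  # mean of the 1-based positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

# Final grades from Table 20-1, in student order 1 through 20.
grades = [3, 2, 4, 3, 2, 4, 3, 3, 5, 1, 3, 3, 2, 3, 3, 4, 2, 3, 3, 5]
grade_ranks = average_ranks(grades)
print(grade_ranks)
```

The single 1 gets rank 1, each of the four 2s gets 3.5, each of the ten 3s gets 10.5, each of the three 4s gets 17, and the two 5s get 19.5.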

Table 20-2 shows the original data, the ranks of the aptitude scores (x), and the ranks of the final grades (y) as calculated by Minitab.

Table 20-2 Aptitude Test Scores, Final Exam Grades, and Rank

Student   Aptitude   Rank of Aptitude   Final Grade   Rank of Final Grade
1         59         9                  3             10.5
2         47         3                  2             3.5
3         58         8                  4             17.0
4         66         14                 3             10.5
5         77         20                 2             3.5
6         57         7                  4             17.0
7         62         12                 3             10.5
8         68         16                 3             10.5
9         69         17                 5             19.5
10        36         1                  1             1.0
11        48         4                  3             10.5
12        65         13                 3             10.5
13        51         5                  2             3.5
14        61         11                 3             10.5
15        40         2                  3             10.5
16        67         15                 4             17.0
17        60         10                 2             3.5
18        56         6                  3             10.5
19        76         19                 3             10.5
20        71         18                 5             19.5

For step four of the process of finding Spearman’s rank correlation, you have Minitab calculate the standard deviation of the aptitude test score ranks (the Rank of Aptitude column of Table 20-2) and the standard deviation of the final grade ranks (the Rank of Final Grade column of Table 20-2). In step five, you have Minitab calculate the covariance of the aptitude test score ranks and the final grade ranks. These statistics are shown in Figure 20-1.

Figure 20-1: Standard deviations and covariance of ranks of aptitude (x) and final grade (y).

For the sixth and final step of finding Spearman’s rank correlation, calculate rs by taking the covariance of the ranks of X and Y, divided by the standard deviation of the ranks of X (sx) times the standard deviation of the ranks of Y (sy).

You get $r_s = \frac{s_{xy}}{s_x s_y} = \frac{12.34}{5.92 \times 5.50} = 0.379$. This matches the value for Spearman’s correlation that was found by Minitab straightaway.
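If you’d like to check steps two through six without Minitab, the following Python sketch walks through them on the data from Table 20-1. The helper names (average_ranks, sample_sd, sample_cov) are invented for this illustration, and the code is one plain way to follow the steps, not a reproduction of Minitab’s routine.

```python
from math import sqrt

def average_ranks(values):
    """Rank from 1 (lowest) to n; tied values share the average of their would-be ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end + 2) / 2  # mean of the 1-based positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def sample_sd(v):
    """Usual sample standard deviation (divides by n - 1)."""
    m = sum(v) / len(v)
    return sqrt(sum((a - m) ** 2 for a in v) / (len(v) - 1))

def sample_cov(x, y):
    """Sample covariance (divides by n - 1)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

# Step 1: the pairs from Table 20-1, in student order 1 through 20.
aptitude = [59, 47, 58, 66, 77, 57, 62, 68, 69, 36,
            48, 65, 51, 61, 40, 67, 60, 56, 76, 71]
grade = [3, 2, 4, 3, 2, 4, 3, 3, 5, 1, 3, 3, 2, 3, 3, 4, 2, 3, 3, 5]

rank_x = average_ranks(aptitude)   # step 2
rank_y = average_ranks(grade)      # step 3
sx = sample_sd(rank_x)             # step 4
sy = sample_sd(rank_y)
sxy = sample_cov(rank_x, rank_y)   # step 5
rs = sxy / (sx * sy)               # step 6

print(f"sx = {sx:.2f}, sy = {sy:.2f}, sxy = {sxy:.2f}, rs = {rs:.3f}")
# → sx = 5.92, sy = 5.50, sxy = 12.34, rs = 0.379
```

The ranks produced here agree with Table 20-2, and the final value agrees with the 0.379 that Minitab reports.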

This correlation of 0.379 is fairly low, indicating a weak relationship between aptitude scores before the course and final grades at the end of the course. The moral of the story? If you aren’t the sharpest tack in the bunch, you can still hope, and if you come in on top, you may not go out the same way. Although, there is still something to be said about working hard during the course (buying Intermediate Statistics For Dummies certainly doesn’t hurt!).


In This Chapter

• Comparing more than two population medians with the Kruskal-Wallis test

• Determining which populations are different by using the Wilcoxon rank sum test

Statisticians who are in the nonparametrics business make it their job to find a nonparametric equivalent (one that doesn’t depend on the normal distribution) for every parametric procedure. And in the case of comparing more than two populations, these stats superheroes didn’t let us down. In this chapter, you see how the Kruskal-Wallis test works to compare more than two populations as a nonparametric procedure. If Kruskal-Wallis tells you at least two populations differ, you also figure out how to use the Wilcoxon rank sum test to determine which populations are different.

Doing the Kruskal-Wallis Test to Compare More than Two Populations

The Kruskal-Wallis test compares the medians of several (more than two) populations to see whether or not they are different. The basic idea of Kruskal-Wallis is to collect a sample from each population, rank all the combined data from smallest to largest, and then look for a pattern in how those ranks are distributed among the various samples. For example, if one sample gets all the low ranks and another sample gets all the high ranks, perhaps their population medians are different. Or if all the samples have an equal mix of all the ranks, perhaps the medians of the populations are all deemed to be the same. In this section, you see exactly how the Kruskal-Wallis test is conducted using ranks and sums and all that good stuff, and you see it applied to an example comparing airline ratings.

Suppose your boss flies a lot, and she wants you to determine which of three airlines gets the best ratings from customers. You know that ratings involve data that is just not normal (pun intended), so you opt to use the Kruskal-Wallis test. You take three random samples of nine people each from three different airlines. You ask each person to rate his satisfaction with the one airline for which you chose that person to rate. Each person uses a scale from 1 (the worst) to 4 (the best). You can see the data from your samples in Table 19-1.

Table 19-1 Customer Ratings of Three Airlines

Airline A Rating   Airline B Rating   Airline C Rating
4                  2                  2
3                  3                  3
4                  3                  3
4                  3                  2
3                  4                  2
3                  4                  1
2                  3                  3
3                  4                  2
4                  3                  2

In looking at the data in Table 19-1, it appears that airlines A and B have better ratings than airline C. However, the data has a lot of variability in it, so you have to conduct a hypothesis test before you can make any general conclusions beyond this data set.

You may be thinking of using ANOVA to analyze this data (the test that compares the means of several populations and is found in Chapter 9). But the data from each airline is ratings from 1 to 4, and this blows the strongest condition of ANOVA: the data from each population must follow a normal distribution. (A normal distribution is continuous, meaning it takes on all real numbers in a certain range. Data that are whole numbers like 1, 2, 3, and 4 don’t fall under this category.)

But don’t sweat; a nonparametric alternative fits the bill. The Kruskal-Wallis test compares the medians of several (more than two) populations to see whether they are all the same or not. In other words, it’s like ANOVA, except it’s done with medians not means.

In this section, you discover how to check the conditions of the Kruskal-Wallis test, set it up, and carry it out step by step.

Checking the conditions

Following are all of the conditions of the Kruskal-Wallis test that must be met:

The random samples taken from each population are independent. (This means matched-pairs data like in Chapter 17 are out of this picture.)

All the populations have the same distribution. (That is, their shapes are the same as seen on a histogram.)

The variances of the populations are the same. That means the amount of spread in the population values is the same from one population to the next.

Note that these conditions mention shape and spread, but they don’t mention the center of the distributions. That’s what the test is trying to determine, whether the populations are centered at the same place.

In nonparametrics, you often see the word location in reference to a population distribution rather than the center, although the two words mean about the same thing. Location indicates where the distribution is sitting on the number line. If you have two bell-shaped curves with the same variance, and one has mean 10 and the other has mean 15, the second distribution is located five units to the right of the first. In other words, its location is a five-unit shift to the right of the first distribution. In nonparametrics, where you don’t have bell-shaped distributions, you typically use the median as a measure of location (center) of a distribution. So throughout this discussion, you could use the word median instead of location (although location leaves it a bit more open).

Regarding the airline survey, you know that the samples are independent, because you didn’t use the same person to rate more than one airline. The other two conditions have to do with the distributions the samples came from; each population must have the same shape and the same spread. You can examine both conditions by looking at boxplots of the data (see Figure 19-1) and descriptive statistics, such as the median, standard deviation, and the rest of the summary statistics making up the boxplots (see Figure 19-2).

The boxplots in Figure 19-1 all have the same shape, and their standard deviations, shown in Figure 19-2, are very close. All of this evidence taken together allows you to go ahead with the Kruskal-Wallis test. (Now looking at the overlap in the boxplots for airlines A and B, in Figure 19-1, you can also make an early prediction that airlines A and B have similar ratings; whether C is different enough from A and B is impossible to say without running the hypothesis test.)

Figure 19-1: Boxplots comparing the ratings of three airlines.

Figure 19-2: Descriptive statistics comparing the ratings of three airlines.

Descriptive Statistics: Rating

Variable   Airline   StDev   Minimum   Q1      Median   Q3      Maximum
Rating     A         0.707   2.000     3.000   3.000    4.000   4.000
           B         0.667   2.000     3.000   3.000    4.000   4.000
           C         0.667   1.000     2.000   2.000    3.000   3.000
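If you don’t have Minitab handy, a few lines of Python can produce the same kind of summary for checking the spread condition. This is a minimal sketch that uses the ratings from Table 19-1 and Python’s standard statistics module; the dictionary names ratings and summary are just for illustration.

```python
from statistics import median, stdev

# Ratings from Table 19-1, one list per airline.
ratings = {
    "A": [4, 3, 4, 4, 3, 3, 2, 3, 4],
    "B": [2, 3, 3, 3, 4, 4, 3, 4, 3],
    "C": [2, 3, 3, 2, 2, 1, 3, 2, 2],
}

# For each airline: (median, sample standard deviation, minimum, maximum).
summary = {
    name: (median(vals), stdev(vals), min(vals), max(vals))
    for name, vals in ratings.items()
}

for name, (med, sd, low, high) in summary.items():
    print(f"Airline {name}: median = {med}, StDev = {sd:.3f}, min = {low}, max = {high}")
```

The standard deviations come out at about 0.707, 0.667, and 0.667, which agrees with Figure 19-2 and supports the equal-spread condition.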

Either a boxplot or a histogram can tell you about the shape and spread of a distribution (as well as the center). The boxplot is a common type of graph to use for nonparametric procedures because it displays the median (the nonparametric statistic of choice) rather than the mean. A histogram is at its best showing the shape of the data; it doesn’t directly tell where the center is, so you just have to eyeball it. For the airline data, go with the boxplot rather than the histogram.

To make boxplots of each sample of data show up side by side on one graph (called side-by-side boxplots, cleverly) in Minitab, click on Graph>Box Plots and select the Multiple Y’s Simple version. In the left-hand box, click on each of the column names for your data sets. They each appear in the Graph Variables window on the right. Click OK and you get a set of boxplots that are side by side, all on the same graph using the same scale (slick, huh?).

Setting up the test

The Kruskal-Wallis test assesses Ho: All k populations have the same location versus Ha: The locations of at least two of the k populations are different. (Here, k is the number of populations you’re comparing.)

In Ho, you see that all the populations have the same location (which means they all sit on top of each other on the number line and are in essence the same population). Ha is looking for the opposite situation in this case. However, the opposite of "the locations are all equal" isn’t "the locations are all different." The opposite is that at least two of them are different. Failure to recognize this difference will lead you to believe all the populations differ when, in reality, there may only be two that differ, and the rest are all the same. That’s why you see Ha stated the way it is in the Kruskal-Wallis test. (The same idea holds for comparing means using ANOVA; see Chapter 9.)

For the airline satisfaction example (see Table 19-1), your setup looks like this: Ho: The satisfaction ratings of all three airlines have the same median versus Ha: The median satisfaction ratings of at least two airlines are different.

Conducting the test step by step

After you’ve determined your hypotheses, and checked the conditions, you must carry out the test. Here are the steps for conducting the Kruskal-Wallis test, using the airline example to show how each step works:

1. Rank all the numbers in the entire data set from smallest to largest (using all samples combined); in the case of ties, use the average of the ranks that the values would have normally been given.

For an example of a tie, say that on a scale from 1 to 4, the observations 1, 1, 1 would normally have gotten ranks 1, 2, 3 if they were different, but because they’re equal, give each one the average of 1, 2, 3, which is $\frac{1 + 2 + 3}{3} = 2$. Figure 19-3 shows the results for ranking and summing the data in the airline example.

In Figure 19-3, you can see how to rank the ties. For example, you have only one 1, which is given rank 1. Then you have seven 2s, which normally would have gotten ranks 2, 3, 4, 5, 6, 7, and 8. Because the 2s are all equal, you give each of them the average of all these ranks, which is $\frac{2 + 3 + 4 + 5 + 6 + 7 + 8}{7} = 5$. Similarly, you see twelve 3s, whose ranks would be 9 through 20. Because they’re all equal, give them each a rank equal to $\frac{9 + 10 + \cdots + 20}{12} = 14.5$. Finally, you see seven 4s, each with rank 24, which is the average of their would-be ranks, ranging from 21 to 27.

Figure 19-3: Rankings and rank sum for the airline example.

Airline A           Airline B           Airline C
Rating   Rank       Rating   Rank       Rating   Rank
4        24         2        5          2        5
3        14.5       3        14.5       3        14.5
4        24         3        14.5       3        14.5
4        24         3        14.5       2        5
3        14.5       4        24         2        5
3        14.5       4        24         1        1
2        5          3        14.5       3        14.5
3        14.5       4        24         2        5
4        24         3        14.5       2        5
T1 = 159            T2 = 149.5          T3 = 69.5

2. Total the ranks for each of the samples; call those totals T1, T2, . . ., Tk, where k is the number of populations.

The totals of the ranks in each column of Figure 19-3 for the airline data are T1 = 159, T2 = 149.5, and T3 = 69.5. In the steps that follow, you use these rank totals in the Kruskal-Wallis test statistic (denoted KW). (Note T1 and T2 are close to equal, but T3 is much lower, giving the idea that airline C may be the odd man out.)

3. Calculate the Kruskal-Wallis test statistic,

$KW = \frac{12}{n(n + 1)} \sum_{j=1}^{k} \frac{T_j^2}{n_j} - 3(n + 1)$,

where n is the total number of observations (all sample sizes combined) and $n_j$ is the size of the jth sample. Continuing with the airline example, the Kruskal-Wallis test statistic is

$KW = \frac{12}{27(27 + 1)} \left( \frac{159^2}{9} + \frac{149.5^2}{9} + \frac{69.5^2}{9} \right) - 3(27 + 1)$,

which equals 0.0159 * 5,829.056 – 3(28) = 8.52.

4. Find the p-value.

You find the p-value for your KW test statistic by comparing it to the Chi-square distribution with k – 1 degrees of freedom (Table A-3 in the Appendix). For the airline example, you look at the Chi-square table and find the row with 3 – 1 = 2 degrees of freedom. Then look at where your test statistic (8.52) falls in that row. Because 8.52 lies between 7.38 and 9.21 (shown on the table in row two), the p-value for 8.52 lies between 0.025 and 0.010 (shown in their respective column headings).

5. Make your conclusion about whether you can reject Ho by examining the p-value.

You can reject Ho: All populations have the same location, in favor of Ha: At least two populations have differing locations, if the p-value associated with KW is < α, where α is 0.05 (or your prespecified α level). Otherwise, you must fail to reject Ho.

Following the airline example, because the p-value is between 0.010 and 0.025, which are both less than α = 0.05, you can reject Ho. You conclude that the ratings of at least two of the three airlines are different.
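The five steps above can be sketched in Python for the airline data in Table 19-1. Treat this as an illustration under a couple of assumptions: the helper average_ranks is a made-up name, and the p-value uses the fact that the upper-tail area of a Chi-square distribution with exactly 2 degrees of freedom is exp(–KW/2), a shortcut that works only for this three-group case. No adjustment for ties is made here.

```python
from math import exp

def average_ranks(values):
    """Rank from 1 (lowest) to n; tied values share the average of their would-be ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end + 2) / 2  # mean of the 1-based positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

ratings = {
    "A": [4, 3, 4, 4, 3, 3, 2, 3, 4],
    "B": [2, 3, 3, 3, 4, 4, 3, 4, 3],
    "C": [2, 3, 3, 2, 2, 1, 3, 2, 2],
}

# Step 1: rank all the samples combined.
pooled = [v for vals in ratings.values() for v in vals]
ranks = average_ranks(pooled)

# Step 2: total the ranks belonging to each sample.
rank_sums, pos = {}, 0
for name, vals in ratings.items():
    rank_sums[name] = sum(ranks[pos:pos + len(vals)])
    pos += len(vals)

# Step 3: the Kruskal-Wallis statistic (not adjusted for ties).
n = len(pooled)
kw = 12 / (n * (n + 1)) * sum(
    rank_sums[name] ** 2 / len(vals) for name, vals in ratings.items()
) - 3 * (n + 1)

# Step 4: p-value from Chi-square with k - 1 = 2 degrees of freedom.
p_value = exp(-kw / 2)  # valid only because df = 2 here

# Step 5: compare with the chosen alpha level.
print(rank_sums, f"KW = {kw:.2f}, p = {p_value:.3f}, reject Ho: {p_value < 0.05}")
```

The rank totals match T1 = 159, T2 = 149.5, and T3 = 69.5, and the statistic rounds to the 8.52 found by hand, with a p-value of about 0.014.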

To conduct the Kruskal-Wallis test by using Minitab, enter your data in two columns: the first column holds the actual data values, and the second column shows which population each value came from (for example, 1, 2, 3). Then click on Stat>Nonparametrics>Kruskal-Wallis. In the left-hand box, click on column one; it appears on the right side as your Response variable. Then click on column two in the left-hand box. This column appears on the right side as the Factor variable. Click OK, and the KW test is done. The main results of the KW test are shown in the last two lines of the Minitab output.

The results of the Minitab data analysis of the airline data are shown in Figure 19-4. On the second-to-last line of Figure 19-4, you can see the KW test statistic for the airline example is 8.52, which matches the one you found by hand (whew!). The exact p-value from Minitab is 0.014.

Figure 19-4: Comparing ratings of three airlines by using the Kruskal-Wallis test.

Kruskal-Wallis Test: Rating versus Airline

Kruskal-Wallis Test on Rating

Airline   N    Median   Ave Rank   Z
A         9    3.000    17.7       1.70
B         9    3.000    16.6       1.21
C         9    2.000    7.7        -2.91
Overall   27            14.0

H = 8.52   DF = 2   P = 0.014
H = 9.70   DF = 2   P = 0.008 (adjusted for ties)

However, quite a few ties are in this data set, and the formulas adjust a bit for that (in ways that go outside the scope of this book). Taking those ties into account, the computer gives you KW = 9.70 with a p-value of 0.008. The total evidence here says the same result loud and clear — reject Ho: The ratings for the three airlines have the same location. You conclude that the ratings of at least two of the airlines are different. (But which ones? The answer comes in the next section.)
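For the curious, here is a hedged Python sketch of one commonly used tie adjustment: divide KW by 1 – Σ(t³ – t)/(n³ – n), where t is the number of observations tied at each distinct value. That this is the exact adjustment Minitab applies is an assumption on my part, although it does reproduce the numbers in Figure 19-4. The rank totals are taken from Figure 19-3, and the p-value again relies on the 2-degrees-of-freedom shortcut exp(–KW/2).

```python
from collections import Counter
from math import exp

# All 27 ratings from Table 19-1 (airlines A, B, and C combined).
pooled = [4, 3, 4, 4, 3, 3, 2, 3, 4,
          2, 3, 3, 3, 4, 4, 3, 4, 3,
          2, 3, 3, 2, 2, 1, 3, 2, 2]
rank_sums = [159.0, 149.5, 69.5]   # T1, T2, T3 from Figure 19-3
sample_sizes = [9, 9, 9]

n = len(pooled)
kw = 12 / (n * (n + 1)) * sum(
    t ** 2 / size for t, size in zip(rank_sums, sample_sizes)
) - 3 * (n + 1)

# Tie adjustment: t is how many observations share each distinct value.
tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
correction = 1 - tie_term / (n ** 3 - n)
kw_adjusted = kw / correction

p_adjusted = exp(-kw_adjusted / 2)  # Chi-square upper tail, valid for df = 2 only

print(f"KW = {kw:.2f}, adjusted KW = {kw_adjusted:.2f}, p = {p_adjusted:.3f}")
```

With one 1, seven 2s, twelve 3s, and seven 4s, the adjusted statistic rounds to 9.70 and the p-value to 0.008.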

Most people want life, from football to food portions, to be fair. And nothing appears to be more unfair than car insurance rates, right? You’ve heard the ads; one company claims to offer the lowest possible rates one day and a competing company makes the same claim the very next day. Who can you believe? You decide to grab the wheel and run your own test. You take a random sample of 20 different car and driver combinations (for example, a 40-year-old female with a Ford pickup, or a 78-year-old lady driving a Caddy) and you get the corresponding car insurance estimates from each company for each car and driver combo based on a six-month premium. Knowing that the distribution of prices for each company has no real reason to be normal (as in distribution), you go for the Kruskal-Wallis test of their medians. You rank all the premiums from smallest to largest, you sum the ranks that correspond to estimates from each company, and you compare them using the KW statistic. In the end, you might very well find that the companies’ prices don’t look that different after all, because the prices they talk about in their advertisements represent a selective sample of the population of all their prices, and your sample gets more at the heart of the pricing that is actually going on overall. The moral of the story is don’t listen to everything you hear about car insurance rates. Get a cross section of prices and do the Kruskal-Wallis. Your pocketbook will thank you for it.

Pinpointing the Differences: The Wilcoxon Rank Sum Test

Suppose you reject Ho in the Kruskal-Wallis test. That means you have enough evidence to conclude that at least two of the populations have different medians. But you don’t know which ones are different. When someone finds that a set of populations don’t all share the same median, the next question is very likely to be, "Well then, which ones are different?" To find out which populations are different after the Kruskal-Wallis test has rejected Ho, you can use the Wilcoxon rank sum test (also known as the Mann-Whitney test; refer to Chapter 18).

You can’t go looking for differences in specific pairs of populations until you’ve first established that the populations aren’t all the same (that is, Ho is rejected in the Kruskal-Wallis test). If you don’t make this check first, you can encounter a ton of problems, not the least of which is a much-increased chance of making the wrong decision.

In the following sections, you see how pairwise comparisons are conducted and interpreted in order to find out where the differences lie among the k population medians you’re studying.

Pairing off with pairwise comparisons

The rank sum test is a nonparametric test that compares two population locations (for example, their medians). When you have more than two populations, you conduct the rank sum test on every pair of populations in order to see whether differences exist. This procedure is called conducting pairwise comparisons or multiple comparisons. (See Chapter 10 for info on the parametric version of multiple comparisons.) For example, because you’re comparing three airlines in the airline satisfaction example (see Table 19-1), you have to run the rank sum test three times to compare airlines A and B, A and C, and B and C, respectively. So you need three pairwise comparisons to figure out which populations are different.

To determine how many pairs of comparisons you need if you’re given k populations, you use the formula $\frac{k(k - 1)}{2}$. You have k populations to choose from first, and then k – 1 populations left to compare them with. Finally, you don’t care what the order is among the populations (as long as you keep track of them); so you divide by two because you have two ways to order any pair (for example, comparing A and B gives you the same results as comparing B and A). In the airlines example, you have k = 3 populations, so you should have $\frac{3(3 - 1)}{2} = \frac{6}{2} = 3$ pairs of populations to compare, which matches what was determined previously. (For more information and examples on how to count the number of ways to choose or order a group of items by using permutations and combinations, see another book I authored, Probability For Dummies [Wiley].)
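As a quick check on the counting, this short Python sketch lists the pairs for the three airlines by using the standard itertools.combinations function; the variable names are only for illustration.

```python
from itertools import combinations

airlines = ["A", "B", "C"]
k = len(airlines)

# Every unordered pair of populations, so A-B and B-A count once.
pairs = list(combinations(airlines, 2))
expected_count = k * (k - 1) // 2

print(pairs, expected_count)
```

The list holds A and B, A and C, and B and C, which is the k(k – 1)/2 = 3 comparisons from the formula.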

Carrying out comparison tests to see who’s different

The Wilcoxon rank sum test assesses Ho: The two populations have the same location versus Ha: The two populations have different locations. Here are the general steps for using the Wilcoxon rank sum test for making comparisons (for detailed step-by-step instructions for the Wilcoxon rank sum test see Chapter 18):

1. Check the conditions for the test by using descriptive statistics and histograms for the last two and proper sampling procedures for the first one:

• The two samples must be from independent populations

• The populations must have the same distribution (shape)

• The populations must have the same variance

2. Set up your Ho: Medians are equal versus Ha: Medians aren’t equal.

3. Combine all the data and rank the values from smallest to largest.

4. Add up all the ranks from the first sample (or the smallest sample if the sample sizes are not equal).

This result is your test statistic, T.

5. Compare T to the critical values in Table A-4 (Appendix) in the row and column corresponding to the two sample sizes.

If T is at or beyond the critical values (less than or equal to the lower one or greater than or equal to the upper one), reject Ho and conclude the two population medians are different. Otherwise, you can’t reject Ho.

6. Repeat Steps 1-5 on every pair of samples in the data set and draw conclusions.

Sort through all the results to see the overall picture of which pairs of populations have the same median and which ones don’t.

To conduct the Wilcoxon rank sum test for pairwise comparisons in Minitab, refer to Chapter 18. Note that Minitab calls this test by its other name, the Mann-Whitney test.

You can see the Minitab results of the three Wilcoxon rank sum tests comparing airlines A and B, A and C, and B and C, respectively, in Figures 19-5a, 19-5b, and 19-5c.

Before you make any judgments about your hypotheses, you must analyze your data. Figure 19-5a compares the ratings of airlines A and B. The p-value (adjusted for ties) is 0.7325, which is much higher than the α level of 0.05, so you can’t reject Ho. That means you can’t conclude that airlines A and B have satisfaction ratings with different medians. Figure 19-5b shows that the p-value for comparing airlines A and C is 0.0078. Because this p-value is a lot smaller than the typical α level of 0.05, this is very convincing evidence that airlines A and C don’t have the same median ratings. Figure 19-5c also has a small p-value (0.0107), which gives evidence that airlines B and C have significantly different ratings.
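The three decisions above all follow the same rule: compare each tie-adjusted p-value to your α level. Here is a small Python sketch of that bookkeeping. The p-values are the ones quoted from Figures 19-5a through 19-5c; the helper name reject_h0 and the dictionary layout are assumptions made for illustration.

```python
ALPHA = 0.05  # significance level chosen before looking at the data

# Tie-adjusted p-values quoted from Figures 19-5a, 19-5b, and 19-5c.
p_values = {
    ("A", "B"): 0.7325,
    ("A", "C"): 0.0078,
    ("B", "C"): 0.0107,
}


def reject_h0(p_value: float, alpha: float = ALPHA) -> bool:
    """Reject Ho when the p-value is at or below alpha."""
    return p_value <= alpha


decisions = {pair: reject_h0(p) for pair, p in p_values.items()}

for (first, second), rejected in decisions.items():
    verdict = "medians differ" if rejected else "no difference detected"
    print(f"Airlines {first} and {second}: {verdict}")
```

Only the A-versus-B comparison fails to reject Ho; both comparisons involving airline C reject it.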

Examining the medians to see how they’re different

Now that you know two or more populations have different medians, the next question to answer is how they are different: which one has the higher median, and which one has the lower median. In this section, you see how to take the results of your pairwise comparisons combined with some descriptive statistics to get your answers.

Figure 19-5: Wilcoxon rank sum tests comparing ratings of two airlines at a time.

(a)

Point estimate for ETA1-ETA2 is -0.000
95.8 Percent CI for ETA1-ETA2 is (-1.000,1.000)
W = 89.5
Test of ETA1 = ETA2 vs ETA1 not = ETA2 is significant at 0.7573
The test is significant at 0.7325 (adjusted for ties)

(b)

Mann-Whitney Test and CI: Airline A, Airline C

     N   Median
A    9   3.000
C    9   2.000

Point estimate for ETA1-ETA2 is 1.000
95.8 Percent CI for ETA1-ETA2 is (0.000,2.000)
W = 114.5
Test of ETA1 = ETA2 vs ETA1 not = ETA2 is significant at 0.0118
The test is significant at 0.0078 (adjusted for ties)

(c)

Mann-Whitney Test and CI: Airline B, Airline C

     N   Median
B    9   3.000
C    9   2.000

Point estimate for ETA1-ETA2 is 1.000
95.8 Percent CI for ETA1-ETA2 is (0.000,2.000)
W = 113.0
Test of ETA1 = ETA2 vs ETA1 not = ETA2 is significant at 0.0171
The test is significant at 0.0107 (adjusted for ties)

Rejecting Ho in a multiple comparison means the two populations you examined have different medians. There are two ways to proceed from here to see how the medians differ:

• You can look at side-by-side boxplots of all the samples and compare their medians (located at the line in the middle of each box).

• You can calculate the median of each sample and see which ones are higher and which ones are lower (from the populations you have concluded are statistically different).

From the previous section, you see that the pairwise comparisons for the airline data conducted by Wilcoxon rank sum tests conclude that the ratings of airlines A and B aren’t found to be different, but both of them are found to be different from airline C.

But you can say even more; you can say how the differing airline compares to the others. Going back to Figure 19-2, you see the medians of both airlines A and B are 3.0, while the median of airline C is only 2.0. That difference means airlines A and B have similar ratings, but airline C has lower ratings than A and B.

The boxplots in Figure 19-1 confirm these results. By looking at these boxplots first, you may have had an idea that A and B were the same, but you didn’t know whether airline C was statistically significantly different from airlines A and B. And now you know it is.


Figure 18-1b shows that the median for Suzy (131 days on the market) is less than the median for Tommy (175 days). It may appear Suzy sells homes faster than Tommy. However, the results aren’t exactly clear-cut. A portion of the two boxplots (Figure 18-1a) overlap with each other. You may not be able to declare Suzy the clear winner as being the fastest real estate agent. You need a hypothesis test to make that final determination.

Testing the hypotheses

The null hypothesis for the real estate agent test (from previous sections) is Ho: η1 = η2, where η1 = median days on the market for the population of all Suzy’s homes sold in the last year, and η2 = median days on the market for the population of all Tommy’s homes sold in the last year. The alternative hypothesis is Ha: η1 ≠ η2.

After you looked at the data, you developed a hunch that if one of the agents sold homes faster, it was Suzy. However, before you saw the data, you had no preconceived notion as to who was faster. You must base your Ho and Ha on what your thoughts were before you looked at the data, not after. Setting up your hypotheses after you collect the data is unfair and unethical.

After you determine your Ho and Ha, the time has come to test your data. So, keep reading to figure out what this test looks like in a real-life example.

Combining and ranking

The first step in the data analysis is to combine all the data together and rank the days on the market from lowest (rank = 1) to highest. You can see the overall ranks for the combined data in Table 18-2.

In the case of ties, you give both of the values the average of the ranks they normally would have received. You can see in Table 18-2 that two values of 145 are in the data set. Because they represent the sixth and seventh numbers in the ordered data set, you give each of them the same rank of (6 + 7)/2 = 6.5.

Table 18-2 Ranks of Combined Data from the Real Estate Example

Suzy Sellfast   Overall Rank   Tommy Nowait   Overall Rank
48              1              109            4
97              2              145            6.5
103             3              160            9
117             5              165            10
145             6.5            185            11
151             8              250            13
220             12             251            14
300             15             350            16
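The ranking rule, including the average-rank treatment of ties, can be sketched in a few lines of Python. The function name average_ranks is made up for illustration; the two lists hold the days-on-the-market values from Table 18-2.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) up; tied values share their average rank."""
    ordered = sorted(values)
    rank_of = {}
    position = 0  # 0-based index of the first copy of the current value
    while position < len(ordered):
        value = ordered[position]
        count = ordered.count(value)
        # Tied copies occupy ranks position + 1 through position + count;
        # the average of those ranks is (2 * position + count + 1) / 2.
        rank_of[value] = (2 * position + count + 1) / 2
        position += count
    return [rank_of[v] for v in values]


# Days on the market, taken from Table 18-2.
suzy = [48, 97, 103, 117, 145, 151, 220, 300]
tommy = [109, 145, 160, 165, 185, 250, 251, 350]

ranks = average_ranks(suzy + tommy)
suzy_ranks = ranks[:len(suzy)]
tommy_ranks = ranks[len(suzy):]

print(suzy_ranks)   # [1.0, 2.0, 3.0, 5.0, 6.5, 8.0, 12.0, 15.0]
print(tommy_ranks)  # [4.0, 6.5, 9.0, 10.0, 11.0, 13.0, 14.0, 16.0]
```

Both values of 145 come out with rank 6.5, matching the table.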

Finding the test statistic

After you’ve ranked your data, you can determine which group is group one, so you can find your test statistic, T. Because the sample sizes are equal, let group one be Suzy, because her data is given first. Now sum the ranks from Suzy’s data set. The sum of Suzy’s ranks is 1 + 2 + 3 + 5 + 6.5 + 8 + 12 + 15 = 52.5; this value of T is your rank sum test statistic.
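Here is one way to compute T directly from the raw data in Python, as a sketch rather than a definitive implementation. The function name rank_sum_statistic is hypothetical. It picks group one as the smaller sample (or the first sample when the sizes are equal) and ranks each value by counting how many combined values are smaller and how many are tied with it, which gives the same average ranks used for ties.

```python
def rank_sum_statistic(first, second):
    """Sum of the combined-data ranks for group one.

    Group one is the smaller sample, or the first sample when sizes are equal.
    """
    group_one = second if len(second) < len(first) else first
    combined = first + second

    def midrank(x):
        smaller = sum(v < x for v in combined)  # values ranked below x
        tied = sum(v == x for v in combined)    # copies of x, including itself
        return smaller + (tied + 1) / 2         # average rank of the tied copies

    return sum(midrank(x) for x in group_one)


# Days on the market, taken from Table 18-2.
suzy = [48, 97, 103, 117, 145, 151, 220, 300]
tommy = [109, 145, 160, 165, 185, 250, 251, 350]

T = rank_sum_statistic(suzy, tommy)
print(T)  # 52.5
```

Swapping the argument order makes Tommy group one and returns the sum of his ranks instead, so the order in which you pass equal-sized samples matters.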

Determining whether you can reject Ho

Suppose you want to use α = 0.05 for this test; using this cutoff means that you use Table A-4 (see Appendix), because you have a two-sided test at level α = 0.05. Looking at Table A-4, you go to the column for n1 = 8 and the row for n2 = 8. You see TL = 49 and TU = 87. You reject Ho if T is outside this range; in other words, reject Ho if T < TL = 49 or if T > TU = 87. Your statistic T = 52.5 doesn’t fall outside this range; you don’t have enough evidence to reject Ho at the α = 0.05 level. So you can’t say that you see a difference in the median days on the market for Suzy and Tommy.
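The decision rule in this passage is easy to express in code. The sketch below follows the strict inequalities stated here (T below TL or above TU) and uses the critical values quoted from Table A-4 for two samples of size eight; the function name rank_sum_decision is an assumption for illustration.

```python
def rank_sum_decision(t, t_lower, t_upper):
    """Return True when Ho is rejected, that is, when T is outside the critical range."""
    return t < t_lower or t > t_upper


T = 52.5                   # rank sum for Suzy's homes
T_LOWER, T_UPPER = 49, 87  # quoted from Table A-4 for n1 = n2 = 8, two-sided, alpha = 0.05

reject = rank_sum_decision(T, T_LOWER, T_UPPER)
print("reject Ho" if reject else "cannot reject Ho")  # cannot reject Ho
```

Because 52.5 sits between 49 and 87, the function returns False and Ho stands.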

These results may seem very strange given the fact that the medians for the two data sets were so different: 131 days on the market for Suzy compared to 175 days on the market for Tommy. However, you have two strikes against you in terms of being able to find a real difference here:

• The sample sizes are quite small (only eight in each group). A small sample size makes it very hard to get enough evidence to reject Ho.

• The standard deviations are both in the high 70s, which is quite large compared to the medians.

Both of these problems make it hard for the test to actually find anything through all the variability the data shows.
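You can check both strikes yourself with Python's built-in statistics module. This sketch assumes the standard deviations quoted above are sample standard deviations (dividing by n - 1), which is what stdev computes; the data are the values from Table 18-2.

```python
from statistics import median, stdev

# Days on the market, taken from Table 18-2.
suzy = [48, 97, 103, 117, 145, 151, 220, 300]
tommy = [109, 145, 160, 165, 185, 250, 251, 350]

summary = {
    "Suzy": (len(suzy), median(suzy), stdev(suzy)),
    "Tommy": (len(tommy), median(tommy), stdev(tommy)),
}

for agent, (n, med, sd) in summary.items():
    # Sample size, sample median, and sample standard deviation (n - 1 divisor).
    print(f"{agent}: n = {n}, median = {med}, standard deviation = {sd:.1f}")
```

Each agent has only eight homes, and the spread in each sample is more than half the size of Suzy's median, which is why the test has trouble seeing past the noise.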

To conduct the rank sum test by using Minitab, click on Stat>Nonparametric>Mann-Whitney. Select your two samples and choose your alternative Ha as >, <, or ≠. The Confidence Level is equal to one minus your value of α. After you make all of these settings, click on OK.

Figure 18-2 shows the Minitab output when you conduct the rank sum test on the real estate data. To interpret the results in Figure 18-2, you must note that the Mann-Whitney test is just another name for the rank sum test. Also, Minitab writes ETA rather than η for the medians. The results at the bottom of the output say that the test for equal (versus nonequal) medians is significant at the level 0.1149, when adjusting for ties. This is your p-value adjusted for ties. (Note that if no ties are present in your data, you use the results just above that line. That gives you the p-value not adjusted for ties.)

Figure 18-2: Using the rank sum test to figure out who sells homes faster.

Mann-Whitney Test and CI: Suzy, Tommy

        N   Median
Suzy    8   131.0
Tommy   8   175.0

Point estimate for ETA1-ETA2 is -49.0
95.9 Percent CI for ETA1-ETA2 is (-137.0, 36.0)
W = 52.5
Test of ETA1 = ETA2 vs ETA1 not = ETA2 is significant at 0.1152
The test is significant at 0.1149 (adjusted for ties)

To make your final conclusion, compare your p-value to your pre-specified level of α (typically 0.05). If your α level is 0.1149 (or larger), you reject Ho; otherwise you can’t. In this case, because 0.1149 is greater than 0.05, you can’t reject Ho. That means you don’t have enough evidence to say the population medians for days on the market for Suzy’s versus Tommy’s houses are different based on this data. These results confirm your conclusions from the previous section.

The Minitab output in Figure 18-2 also provides a confidence interval for the difference in the medians between the two populations, based on the data from these two samples. The difference in the sample medians (Suzy – Tommy) is 131.0 – 175.0 = -44.0. Adding and subtracting the margin of error (these calculations are beyond the scope of this book), Minitab finds the confidence interval for the difference in medians (Suzy – Tommy) is (-137.0, 36.0). In other words, the difference in the population medians could be anywhere from -137.0 to 36.0. Because 0, the value in Ho, is in this interval, you can’t reject Ho in this case. So again, you can’t say that the medians are different, based on this (limited) data set.
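A short Python sketch can tie these numbers together. It computes the simple difference in sample medians (-44.0) and checks whether 0 falls inside the interval quoted from the output. It also computes the median of all 64 pairwise differences (each of Suzy's values minus each of Tommy's). That second calculation is an assumption about how the point estimate in Figure 18-2 is formed, not something stated in the text, but it does reproduce the -49.0 shown there.

```python
from statistics import median

# Days on the market, taken from Table 18-2.
suzy = [48, 97, 103, 117, 145, 151, 220, 300]
tommy = [109, 145, 160, 165, 185, 250, 251, 350]

# Simple difference in sample medians (Suzy - Tommy).
difference_of_medians = median(suzy) - median(tommy)

# Assumption: the output's point estimate is the median of all pairwise differences.
pairwise_differences = [s - t for s in suzy for t in tommy]
median_of_differences = median(pairwise_differences)

# Confidence interval quoted from Figure 18-2.
ci_lower, ci_upper = -137.0, 36.0
zero_inside = ci_lower <= 0 <= ci_upper

print(difference_of_medians)   # -44.0
print(median_of_differences)   # -49.0
print("cannot reject Ho" if zero_inside else "reject Ho")
```

Either way you summarize the gap between the agents, the interval still covers 0, so the conclusion doesn't change.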

Rank sum tests can be used to compare two groups of judges of a competition, to see whether there is a difference in their scores. For example, in the Olympic ice-skating events, the gender of the judges is sometimes suspected to play a role in the scores they give to certain skaters. Suppose you have a men’s ice-skating competition and you have ten judges: five males and five females. You want to know whether male and female judges score the competitors in the same way, so you do a rank sum test to compare their median scores. Your hypotheses are Ho: male and female judges have the same median score versus Ha: they have different median scores. For your sample, you let each judge score the same individual. You rank their scores in order from lowest to highest and label M for a male judge and F for a female judge. Your results are the following: F M M M M F F F F M. The value of the test statistic T is the sum of the ranks for group one (say the males), which gives you T = 2 + 3 + 4 + 5 + 10 = 24. Now compare that to the critical values in Table A-4 (Appendix), where both sample sizes equal five, and you get TL = 18 and TU = 37. Because your test statistic, T = 24, is inside this interval, you fail to reject Ho: judging is the same for male and female judges. You just don’t have enough evidence to say that they differ.
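Because the judges are already listed in order from lowest score to highest, each judge's rank is just his or her position in the list. The sketch below, with a hypothetical helper named rank_sum_for, adds up the positions of the M labels and compares the total to the critical values quoted from Table A-4.

```python
# Judges ordered from lowest score to highest; position in the list is the rank.
order = "F M M M M F F F F M".split()


def rank_sum_for(label, ordered_labels):
    """Sum the 1-based positions (ranks) at which the given label appears."""
    return sum(
        rank
        for rank, who in enumerate(ordered_labels, start=1)
        if who == label
    )


T = rank_sum_for("M", order)  # group one: the male judges
T_LOWER, T_UPPER = 18, 37     # quoted from Table A-4 for n1 = n2 = 5

reject = T < T_LOWER or T > T_UPPER
print(T)                                               # 24
print("reject Ho" if reject else "fail to reject Ho")  # fail to reject Ho
```

This shortcut works only because there are no tied scores in the ordering; with ties you would need average ranks instead of raw positions.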
