Using Chi-Square Tests for Goodness-of-Fit (Your Data, Not Your Jeans)

15 May

In This Chapter

• Understanding what goodness-of-fit really means

• Using the Chi-square model to test for goodness-of-fit

• Looking at the conditions for goodness-of-fit tests

Many phenomena in life may appear to be random in the short term, but actually occur according to some preconceived, preselected, or predestined model over the long term. For example, while you don’t know whether it will rain tomorrow, your local meteorologist can give you her model for the percentage of days that it rains, snows, is sunny, or is cloudy, based on the last five years. Whether or not this model is still relevant this year is anyone’s guess, but it’s a model nonetheless. As another example, a biologist can produce a model for predicting the number of goslings raised by a pair of geese per year, even though you have no idea what the pair in your backyard will do. Is his model correct? Here’s your chance to find out.

In this chapter, you build models for the proportion of outcomes that fall into each category for a categorical variable. You then test these models by collecting data and comparing what you observe in your data to what you expect from the model. You do this through a goodness-of-fit test that’s based on the Chi-square distribution. In a way, a goodness-of-fit test is like a reality check of a model for categorical data.

Finding the Goodness-of-Fit Statistic

The general idea of a goodness-of-fit procedure involves determining what you expect to find and comparing it to what you actually observe in your own sample through the use of a test statistic. This test statistic is called the goodness-of-fit test statistic, because it measures how well your model (what you expected) fits your actual data (what you observed).

In this section, you see how to figure out the numbers that you should expect in each category given your proposed model, and you also see how to put those expected values together with your observed values to form the goodness-of-fit test statistic.

What’s observed versus what’s expected

For an example of something that can be observed versus what’s expected, look no further than a bag of tasty M&M’S Milk Chocolate Candies. (A ton of different kinds of M&M’S are out there, and each kind has its own variation of colors and tastes. But for this study, any reference I give to M&M’S is to the original milk chocolate candy – my favorite.) The percentage of each color of M&M’S that appears in a bag is something Mars (the company that makes M&M’S) spends a lot of time thinking about. Mars has specific percentages of each color that it wants in its M&M’S bags, which it determines through comprehensive marketing research based on what people like and want to see. Mars then posts its current percentages for each color of M&M’S on its Web site. Table 15-1 shows the percentage of M&M’S of each color in 2006.

Table 15-1: Expected Percentage of Each Color of M&M’S Milk Chocolate Candies (2006)

Color    Percentage
Brown    13%
Yellow   14%
Red      13%
Blue     24%
Orange   20%
Green    16%

Now that you know what to expect from a bag of M&M’S, the next question is: how does Mars deliver? If you open a bag of M&M’S right now, would you get the percentages of each color that you’re supposed to get? You know from your previous studies in statistics that sample results vary (for a quick review of this idea, see Chapter 3). So you can’t expect each bag of M&M’S to have exactly the correct number of each color of M&M’S as listed in Table 15-1. However, in order to keep customers happy, Mars should get close to the expectations. How can you determine how close they do get?

You now know what percentages are expected to fall into each category in the entire population of all M&M’S (that means every single M&M’S Milk Chocolate Candy that’s currently being made), from Table 15-1. This set of percentages is called the expected model for the data. You want to see whether the percentages in the expected model are actually occurring in the packages you buy. To start this process, you can take a sample of M&M’S (after all, you can’t check every single one in the population) and make a table showing what percentage of each color you observed. Then you can compare this table of observed percentages to the expected model.

The expected percentages are either given to you, as they are for the M&M’S, or you can figure them out by using math techniques. For example, if you’re examining a single die to determine whether or not it’s a fair die, you know that if the die is fair, you should expect 1/6 of the outcomes to fall into each category of 1, 2, 3, 4, 5, and 6.

As an example, I examined one 1.69-ounce bag of plain, milk-chocolate M&M’S (tough job, but someone has to do it), and you can see my results in Table 15-2. (Think of this bag as a random sample of M&M’S, even though it’s not technically the same as reaching into a silo filled with M&M’S and pulling out a true random sample of 1.69 ounces. For the sake of argument, one bag is okay.)

Table 15-2: Percentage of M&M’S Observed in One Bag (1.69 oz.)

Color    Number Observed   Percentage Observed
Brown    4                 7.14%
Yellow   10                17.86%
Red      4                 7.14%
Blue     10                17.86%
Orange   15                26.79%
Green    13                23.21%
TOTAL    56                100.00%

Now you look at what I observed in my sample (Table 15-2) and compare it to what I expected to get (Table 15-1, last column). Notice that I observed a lower percentage of brown, red, and blue M&M’S than expected. I also observed a higher percentage of yellow, orange, and green M&M’S than expected. You know that sample results vary by random chance, from sample to sample, and that the difference I observed may just be due to this chance variation. But could the differences indicate that the expected percentages, reported by Mars, aren’t being followed?
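The comparison in this paragraph can be checked with a few lines of Python. This is only a sketch: the dictionary and variable names are my own, and the numbers are copied from Tables 15-1 and 15-2.

```python
# Expected percentages (Table 15-1) and observed counts (Table 15-2).
expected_pct = {"Brown": 13, "Yellow": 14, "Red": 13,
                "Blue": 24, "Orange": 20, "Green": 16}
observed_count = {"Brown": 4, "Yellow": 10, "Red": 4,
                  "Blue": 10, "Orange": 15, "Green": 13}

total = sum(observed_count.values())  # number of candies in the bag

# Observed percentage for each color, and whether it sits above or below the model.
comparison = {}
for color, count in observed_count.items():
    observed_pct = 100 * count / total
    direction = "higher" if observed_pct > expected_pct[color] else "lower"
    comparison[color] = (round(observed_pct, 2), direction)

for color, (pct, direction) in comparison.items():
    print(f"{color:<7} observed {pct:5.2f}%  expected {expected_pct[color]}%  -> {direction}")
```

The sketch only reports the direction of each difference. Whether those differences are large enough to matter is a separate question.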

It stands to reason that if the differences between what you observed and what you expected are small, you should attribute that difference to chance and let the expected model stand. On the other hand, if the differences between what you observed and what you expected are large enough, you may have enough evidence to indicate that the expected model has some problems. How do you know which conclusion to make? The operative phrase is "if the differences are large enough." You need to quantify this term "large enough." Doing so takes a bit more machinery, so keep reading.

Calculating the goodness-of-fit statistic

The goodness-of-fit statistic is one number that puts together the total amount of difference between what you expect in each cell and the number you observe. The term cell is used to express each individual category within a table format. For example, with the M&M’S example, the first column of Tables 15-1 and 15-2 contains six cells, one for each color of M&M’S. For any cell, the number of items you observe in that cell is called the observed cell count. The number of items you expect in that cell (under the given model) is called the expected cell count for that cell. You get the expected cell count by taking the expected cell percentage times the sample size.

The expected cell count is just a proportion of the total, so it doesn’t have to be a whole number. For example, if you roll a fair die 200 times, you should expect to roll ones 1/6, or 16.67 percent, of the time. In terms of the number of ones you expect, it should be 1/6 * 200 = 33.33. Use the 33.33 in your calculations for goodness-of-fit; don’t round to a whole number. Your final answer is more accurate that way.

The reason the goodness-of-fit statistic is based on the number in each cell rather than the percentage in each cell is that percents are a bit deceiving. If you know that 8 out of 10 people support a certain view, that’s 80 percent. But 80 out of 100 is also 80 percent. Which one would you feel is a more precise statistic? The 80 out of 100, because it uses more information. Using percents alone disregards the sample size. Using the counts (the number in each group) keeps track of the amount of precision you have.

For example, if you roll a fair die, you expect the proportion of ones to be 1/6. If you roll that fair die 600 times, the expected number of ones will be 1/6 * 600 = 100. That number (100) is the expected cell count for the cell that represents the outcome of one. If you roll this die 600 times and get 95 ones, then 95 is the observed cell count for that cell.
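As a small illustration of the die arithmetic, here is a sketch in Python. The function name `expected_cell_count` is made up for this example, and `Fraction` is used only so that 1/6 stays exact.

```python
from fractions import Fraction

def expected_cell_count(proportion, sample_size):
    """Expected count = expected proportion times sample size (need not be whole)."""
    return proportion * sample_size

one_sixth = Fraction(1, 6)  # chance of rolling a one with a fair die

print(float(expected_cell_count(one_sixth, 200)))  # about 33.33, left unrounded
print(expected_cell_count(one_sixth, 600))         # 100
```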

The formula for the goodness-of-fit statistic is given by the following:

$$\sum_{\text{all cells}} \frac{(O - E)^2}{E}$$

where E is the expected number in a cell and O is the observed number in a cell. The steps for this calculation are as follows:

1. For the first cell, find the expected number for that cell (E) by taking the percentage expected in that cell times the sample size.

2. Take the observed value in the first cell (O) minus the number of items that are expected in that cell (E).

3. Square that difference.

4. Divide the answer by the number that’s expected in that cell.

5. Repeat steps 1 through 4 for each cell.

6. Add up the results to get the goodness-of-fit statistic.
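The six steps above can be sketched as a short Python function. The function name and the list layout are assumptions made for illustration; the counts and proportions are the M&M’S figures from Tables 15-2 and 15-1.

```python
def goodness_of_fit_statistic(observed_counts, expected_proportions):
    """Sum of (O - E)^2 / E over all cells, following steps 1 through 6."""
    sample_size = sum(observed_counts)
    total = 0.0
    for observed, proportion in zip(observed_counts, expected_proportions):
        expected = proportion * sample_size      # step 1
        difference = observed - expected         # step 2
        total += difference ** 2 / expected      # steps 3 and 4, accumulated for step 6
    return total                                 # step 5 is the loop itself

# Order: brown, yellow, red, blue, orange, green.
observed = [4, 10, 4, 10, 15, 13]                # Table 15-2
model = [0.13, 0.14, 0.13, 0.24, 0.20, 0.16]     # Table 15-1

statistic = goodness_of_fit_statistic(observed, model)
print(round(statistic, 2))
```

Because this sketch keeps full precision in every cell, its result can differ in the second decimal place from a hand calculation that rounds each cell before adding.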

The reason you divide by the expected cell count in the goodness-of-fit statistic (step four) is to take into account the magnitude of any differences you find. For example, if you expect 100 items to fall in a certain cell and you get 95, the difference is 5. But in terms of a percentage, this difference is only 5/100 = 5 percent. However, if you expected 10 items to fall into that cell and you observed 5 items, the difference is still 5, but in terms of a percentage, it’s 5/10 = 50 percent. This difference is much larger in terms of its impact. The goodness-of-fit statistic operates much like a percentage difference. The only added element is to square the difference to make it positive. (That’s done because whether you expected 10 and got 15 or expected 10 and got 5 makes no difference; either way, you’re off by 50 percent.)
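To see the effect of dividing by the expected count, here are the two cases from this paragraph worked through as single-cell contributions to the statistic. This is just the formula applied to the numbers already given:

```latex
\[
\frac{(95 - 100)^2}{100} = \frac{25}{100} = 0.25
\qquad\text{versus}\qquad
\frac{(5 - 10)^2}{10} = \frac{25}{10} = 2.5
\]
```

The raw difference is 5 in both cases, but the cell where only 10 items were expected contributes ten times as much to the total.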

Table 15-3 shows the step-by-step calculation of the goodness-of-fit statistic for the M&M’S example, where O indicates observed cell counts and E indicates expected cell counts. To get the expected cell counts, you take the expected percentages shown in Table 15-1 and multiply by 56, because 56 is the number of M&M’S I had in my sample. The observed cell counts are the ones found in my sample, shown in Table 15-2.

Table 15-3: Goodness-of-Fit Statistic for M&M’S Example

Color    O    E                    O – E                 (O – E)^2   (O – E)^2 / E
Brown    4    0.13 * 56 = 7.28     4 – 7.28 = -3.28      10.76       1.48
Yellow   10   0.14 * 56 = 7.84     10 – 7.84 = 2.16      4.67        0.60
Red      4    0.13 * 56 = 7.28     4 – 7.28 = -3.28      10.76       1.48
Blue     10   0.24 * 56 = 13.44    10 – 13.44 = -3.44    11.83       0.88
Orange   15   0.20 * 56 = 11.20    15 – 11.20 = 3.80     14.44       1.29
Green    13   0.16 * 56 = 8.96     13 – 8.96 = 4.04      16.32       1.82
TOTAL    56   56                                                     7.55

The goodness-of-fit statistic for the M&M’S example turns out to be 7.55, the total in the lower-right corner of Table 15-3. This number represents the total squared difference between what I expected and what I observed, adjusted for the magnitude of each expected cell count. The next question is how to interpret this value of 7.55. Is it large enough to indicate that colors of M&M’S in the bag aren’t following the percentages posted by Mars? The next section addresses how to make sense of these results.

Interpreting the Goodness-of-Fit Statistic By Using Chi-Square

After you get your goodness-of-fit statistic, your next job is to interpret it. To do this, you need to figure out the possible values you could have gotten and where your statistic fits in among them. You can accomplish this task with a Chi-square goodness-of-fit test.

The values of a goodness-of-fit statistic actually follow a Chi-square distribution with K - 1 degrees of freedom, where K is the number of categories in your particular population (see Chapter 14 for the full details on Chi-square). You can use the Chi-square table (Table A-3 in the Appendix) to determine how far out your particular goodness-of-fit statistic is, compared to all the others that were possible to get. If your Chi-square statistic is large compared to other values on the Chi-square distribution, the model doesn’t fit; there’s too much of a difference between what you observed and what you expected under the model. However, if your goodness-of-fit statistic is small, you can’t reject the model. (What constitutes a high or low value of a Chi-square test statistic varies for each problem.) This section provides the details on using the Chi-square distribution to test for goodness-of-fit.

The goodness-of-fit statistic follows the main characteristics of the Chi-square distribution. The smallest possible value of the goodness-of-fit statistic is zero. If the M&M’S found in my sample (continuing the example from the previous section) followed the exact percentages found in Table 15-1, the observed counts and the expected counts would be the same. Every value of the observed cell count minus the expected cell count would then be zero, and so would the goodness-of-fit statistic.

The largest possible value of Chi-square isn’t specified, although some values are more likely to occur than others. Each Chi-square distribution has its own set of likely values, as you can see in Figure 15-1, which shows a simulated Chi-square distribution with 6 – 1 = 5 degrees of freedom (relevant to the M&M’S example). This figure basically gives a breakdown of all the possible values you could have for the goodness-of-fit statistic in this situation and how often they occur. You can see on Figure 15-1 that a Chi-square test statistic of 7.55 isn’t unusually high, indicating that the model for M&M’S colors probably can’t be rejected. However, more particulars are needed before you can formally make that conclusion.
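One way to get a feel for the range of values the statistic can take is to simulate it. The sketch below is not how Figure 15-1 was produced (the chapter doesn’t say); it is my own approximation. It draws many random bags of 56 candies from the model in Table 15-1, computes the goodness-of-fit statistic for each, and reports how often the result is at least 7.55. The seed, the number of simulated bags, and all names are arbitrary choices.

```python
import random

COLORS = ["Brown", "Yellow", "Red", "Blue", "Orange", "Green"]
MODEL = [0.13, 0.14, 0.13, 0.24, 0.20, 0.16]   # Table 15-1
BAG_SIZE = 56                                   # candies in the sampled bag
OBSERVED_STATISTIC = 7.55                       # from Table 15-3

def statistic_for_random_bag(rng):
    """Draw one bag from the model and return its goodness-of-fit statistic."""
    bag = rng.choices(COLORS, weights=MODEL, k=BAG_SIZE)
    total = 0.0
    for color, proportion in zip(COLORS, MODEL):
        expected = proportion * BAG_SIZE
        total += (bag.count(color) - expected) ** 2 / expected
    return total

rng = random.Random(2006)        # arbitrary seed so the run is repeatable
simulated = [statistic_for_random_bag(rng) for _ in range(5000)]

# Share of simulated bags at least as far from the model as the real bag.
share_beyond = sum(s >= OBSERVED_STATISTIC for s in simulated) / len(simulated)
print(f"Share of simulated statistics >= {OBSERVED_STATISTIC}: {share_beyond:.3f}")
```

If a sizable share of the simulated bags land at or beyond 7.55, that supports the reading that 7.55 isn’t unusually high for this model.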

Checking the conditions before you start

Every statistical technique seems to have a catch, and this case is no exception. In order to use the Chi-square distribution to interpret your goodness-of-fit statistic, you have to be sure you have enough information to work with in each cell. The stat gurus usually recommend that the expected count for each cell be greater than or equal to five. If it isn’t, one option is to combine categories to increase the numbers.

In the M&M’S example, the expected cell counts are all above seven (see Table 15-3), so the condition is met. If this weren’t the case, you could use a larger sample size, because you calculate the expected cell counts by taking the expected percentage in that cell times the sample size. If you increase the sample size, you increase the expected cell count. A larger sample size also increases your chances of detecting a real deviation from the model. This idea is related to the power of the test (see Chapter 3 for information on power).

After you collect your data, it’s not really right to go back and take a new and larger sample. It’s best to set up your sample size ahead of time, and you can do this by determining what sample size you need to get the expected cell counts to be at least five. For example, if you roll a fair die, you expect 1/6 of the outcomes to be ones. If you only take a sample of six rolls, you have an expected cell count of 1/6 * 6 = 1, which isn’t enough. However, if you roll the die 30 times, your expected cell count is 1/6 * 30 = 5, which is just enough to meet the condition.
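The condition check and the sample-size planning described here can be sketched in Python as follows. The function names and the constant `MIN_EXPECTED` are my own, and exact fractions are used so that a borderline case such as 1/6 * 30 = 5 isn’t thrown off by floating-point rounding.

```python
from fractions import Fraction
from math import ceil

MIN_EXPECTED = 5  # rule-of-thumb threshold quoted in the text

def condition_met(expected_proportions, sample_size):
    """True if every expected cell count is at least MIN_EXPECTED."""
    return all(p * sample_size >= MIN_EXPECTED for p in expected_proportions)

def smallest_sample_size(expected_proportions):
    """Smallest n with n * (smallest proportion) >= MIN_EXPECTED."""
    return ceil(MIN_EXPECTED / min(expected_proportions))

fair_die = [Fraction(1, 6)] * 6
mm_model = [Fraction(p, 100) for p in (13, 14, 13, 24, 20, 16)]  # Table 15-1

print(condition_met(fair_die, 6))       # False: expected count is 1 per face
print(condition_met(fair_die, 30))      # True: expected count is exactly 5
print(smallest_sample_size(fair_die))   # 30
print(condition_met(mm_model, 56))      # True: smallest expected count is 7.28
print(smallest_sample_size(mm_model))   # 39
```

The smallest category drives the answer, so for the M&M’S model it is the 13 percent colors (brown and red) that set the minimum sample size.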

The steps of the Chi-square goodness-of-fit test

Assuming the necessary condition is met (see the previous section), you can get down to actually conducting a formal goodness-of-fit test.

The general version of the null hypothesis for the goodness-of-fit test is Ho: The model holds for all categories, versus the alternative hypothesis Ha: The model doesn’t hold for at least one category. Each situation will dictate what proportions should be listed in Ho for each category. (For example, if you’re rolling a fair die, you have Ho: proportion of 1s = 1/6; proportion of 2s = 1/6; . . . ; proportion of 6s = 1/6.)

Following are the general steps for the Chi-square goodness-of-fit test, with the M&M’S example illustrating how you can carry out each step:

1. Write down Ho using the percentages that you expect in your model for each category.

Using a subscript to indicate the proportion (p) of M&M’S you expect to fall into each category (see Table 15-1), your null hypothesis is Ho: pBrown = 0.13, pYellow = 0.14, pRed = 0.13, pBlue = 0.24, pOrange = 0.20, and pGreen = 0.16. All these proportions must hold in order for the model to be upheld.

2. Write your Ha: This model doesn’t hold for at least one of the percentages.

Your alternative hypothesis, Ha, in this case, would be: One (or more) of the probabilities given in Ho isn’t correct. In other words, at least one of the colors of M&M’S has a different proportion than what is stated in the model.

3. Calculate the goodness-of-fit statistic using the steps in the previous section.

The goodness-of-fit statistic for M&M’S, from the previous section, is 7.55. As a reminder, you take the observed number in each cell minus the expected number in that cell, square it, and divide by the expected number in that cell. Do that for every cell in the table and add up the results. For the M&M’S example that total is equal to 7.55, the goodness-of-fit statistic.

4. Look up the Chi-square distribution with K - 1 degrees of freedom, where K is the number of categories you have (use Table A-3 in the Appendix).

You compare this statistic (7.55) to the Chi-square distribution with 6 – 1 = 5 degrees of freedom (because you have K = 6 possible colors of M&M’S).

Looking at Figure 15-1 you can see that the value of 7.55 is nowhere near the high end of this distribution, so you likely don’t have enough evidence to reject the model provided by Mars for M&M’S colors.

5. Find the p-value of your goodness-of-fit statistic.

You can use Table A-3 in the Appendix to find the p-value (the probability of being beyond your test statistic; see Chapter 3) of your test statistic using the Chi-square distribution. (For more info on the Chi-square distribution, see Chapter 14.)

Because the Chi-square table (Table A-3 in the Appendix) can only list a certain number of results for each of the degrees of freedom, the exact p-value for your test statistic may fall between two p-values listed on the table.

To find the p-value for the test statistic in the M&M’S example (7.55), you go to Table A-3 (Appendix) and find the row for 5 degrees of freedom and look at the numbers (the degrees of freedom is K - 1 = 6 – 1 = 5, where K is the number of categories). You see that the number 7.55 is less than the first value in the row (9.24), which has a p-value of 0.10. (Find the p-value by looking at the column heading above the number.) So the p-value for 7.55, which is the area to the right of 7.55 on Figure 15-1, must be greater than 0.10, because 7.55 is to the left of 9.24 on that Chi-square distribution.

Many computer programs exist (online or via a graphing calculator) that will find exact p-values for a Chi-square test, saving time and headaches when you have access to them (the technology, not the headaches). Using one such online "p-value calculator" I found that the exact p-value for the goodness-of-fit test for the M&M’S example (test statistic 7.55, 5 degrees of freedom for Chi-square) is 0.1828, or about 0.18. To find online p-value calculators, simply type in the name of the distribution and the word p-value in an Internet search engine. For this example, type in Chi-square p-value.

6. If your p-value is less than your predetermined cutoff (α), reject Ho. The model doesn’t hold. If your p-value is greater than α, you can’t reject the model.

A typical value of α is 0.05. Some data analysts might use a higher value (up to 0.10) and others might go lower (for example, 0.01). See Chapter 3 for more information on choosing α and comparing your p-value to it.

Going again to the M&M’S example, the p-value, 0.18, is greater than 0.05, so you fail to reject Ho. You can’t say the model is wrong. So, Mars does appear to deliver on the percentages of M&M’S of each color, as advertised. At least you can’t say they don’t. (I’m sure Mars already knew that.)
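The six steps of the test can be sketched end to end in Python using only the standard library. Everything here is an illustration under stated assumptions. The function names are mine, and the right-tail probability is built from a textbook recurrence on the degrees of freedom rather than looked up in Table A-3. A statistics package (for example, SciPy’s `scipy.stats.chi2.sf`) would normally do this part for you.

```python
from math import erfc, exp, gamma, sqrt

def chi_square_right_tail(x, df):
    """P(chi-square with df degrees of freedom >= x), for whole-number df."""
    half = x / 2
    # Closed-form starting point: df = 1 for odd df, df = 2 for even df.
    if df % 2 == 1:
        tail, nu = erfc(sqrt(half)), 1
    else:
        tail, nu = exp(-half), 2
    # Step up two degrees of freedom at a time until reaching df.
    while nu < df:
        tail += half ** (nu / 2) * exp(-half) / gamma(nu / 2 + 1)
        nu += 2
    return tail

def goodness_of_fit_test(observed, model, alpha=0.05):
    """Return (statistic, df, p_value, reject_h0) for the goodness-of-fit test."""
    n = sum(observed)
    statistic = sum((o - p * n) ** 2 / (p * n) for o, p in zip(observed, model))
    df = len(observed) - 1
    p_value = chi_square_right_tail(statistic, df)
    return statistic, df, p_value, p_value < alpha

# M&M'S data in the order brown, yellow, red, blue, orange, green.
observed = [4, 10, 4, 10, 15, 13]                 # Table 15-2
model = [0.13, 0.14, 0.13, 0.24, 0.20, 0.16]      # Table 15-1

statistic, df, p_value, reject = goodness_of_fit_test(observed, model)
print(f"statistic = {statistic:.2f}, df = {df}, p-value = {p_value:.4f}, reject Ho: {reject}")
print(f"p-value for the rounded statistic 7.55: {chi_square_right_tail(7.55, 5):.4f}")
```

The sketch computes the statistic without rounding each cell, so its p-value can differ slightly from the one for 7.55. Either way the p-value is well above 0.05 and the conclusion is the same: you fail to reject Ho.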

While some hypothesis tests are two-sided tests, the goodness-of-fit test is always a right-tailed test, meaning that you have a greater than sign (>) in the alternative hypothesis, Ha (see Chapter 3 for the skinny on hypothesis testing). You’re only looking at the right tail of the Chi-square distribution when you’re doing a goodness-of-fit test. That’s because a small value of the goodness-of-fit statistic means that the observed data and the expected model don’t differ much, so you stick with the model. If the value of the goodness-of-fit statistic is way out on the right tail of the Chi-square distribution, however, that’s a different story. That situation means the difference between what you observed and what you expected is larger than what you should get by chance, and, therefore, you have enough evidence to say the expected model is wrong.
