PEHL 557

Class Notes

Becoming Acquainted with Statistical Concepts

Student Learning Outcomes

At the completion of this instructional unit students will be able to:

  1. State, explain, and provide practical examples of the three types of observations that different statistical procedures allow.
  2. Explain differences between positive and negative correlations.
  3. Describe and explain differences between three different sampling methods.
  4. Give examples to illustrate understanding of the two principal questions that statistics answer.
  5. Explain the concept of probability and how this concept is used when conducting experimental research.
  6. Show an understanding of type I and type II errors by providing a practical example.

Need to Understand

Even if you aren't planning to conduct experimental research you need to be able to understand it.

Q. Statistics provide an objective way of interpreting a collection of observations. What do we mean by objective and how else might we interpret?

A. Objective implies reliability. Two observers would draw the same conclusions. In contrast subjective observations imply observer bias.

Q. What are three different types of observations that different statistical procedures allow?

A. 1. description of characteristics of data

2. testing of relationships between sets of data

3. testing of differences among sets of data

Q. Write 3 samples of studies that would use each of these types of statistical procedures

A. e.g. The average score of students in this class on the midterm would use the mean. Examining the relationship between fitness and test scores in class would use correlational statistics. Examining differences between midterm scores of males and females in this class would use ANOVA. (Those were my answers, but could you tell me in class three examples of your own, huh?)

Q. What would you conclude if I told you that a correlation showed that for every five points a student increased on the fitness scale she increased 3 points on test scores.

A. A perfect relationship existed. Represented graphically it would appear as a straight line. In figures it would be a perfect positive correlation, or r = 1.00.

Q. Could you conclude that fitness "caused" the improved test scores? If not, why not?
A. No way, dude...but why not?

Q. What would you conclude if I said that for every 5 point increase in fitness your test scores decreased 3 points?

A. Jocks are dumb? Also, a perfect negative correlation.
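
The two perfect correlations described above can be checked with a few lines of Python. This is only a minimal sketch: the scores are made up for illustration (each 5-point change in fitness goes with a 3-point change in test score), and `pearson_r` is a helper name invented here, not something from the text.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of the deviations from each mean
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: fitness moves 5 points at a time, test score 3 points
fitness = [50, 55, 60, 65, 70]
test_up = [70, 73, 76, 79, 82]    # test score rises with fitness
test_down = [82, 79, 76, 73, 70]  # test score falls as fitness rises

r_positive = pearson_r(fitness, test_up)
r_negative = pearson_r(fitness, test_down)
print(round(r_positive, 2), round(r_negative, 2))
```

Every point falls exactly on a straight line in both cases, so r comes out at the two extremes of its range. Notice that nothing in the calculation says anything about what caused what.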

Q. What would you conclude if I said I'd done a correlation between these scores and found that r = .60?

A. Some relationship but rather low. Why is it low? .6 x .6 = .36, so only 36% of the variation in one set of scores is accounted for by the other. Rather a low relationship.
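
To see how quickly the squared value drops off as r gets smaller, here is a small sketch of the same arithmetic for several correlations. The function name `shared_variance` and the list of r values are chosen here for illustration only.

```python
def shared_variance(r):
    """Square a correlation to get the proportion of variance the two variables share."""
    return r ** 2

# Compare a few correlation values, including the r = .60 from the question
for r in (0.30, 0.60, 0.90, 1.00):
    print(f"r = {r:.2f}  ->  r squared = {shared_variance(r):.2f}")
```

Only a perfect correlation accounts for all of the variation; anything less leaves some of it unexplained.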

Q. What would you suggest if I told you I wanted to see if there was a "relationship" between the teaching method I use with one class and the different teaching method I use with another class?

A. Use of word "relationship" is misleading because I am really interested in differences.

Sample Selection

Q. When researchers conduct studies they work with samples. What is a sample? What do we need to be concerned about when choosing a sample?

A. Sample is a group of subjects. Ideally they should be representative of the population of interest. Examples?

Q. What are the names of three methods for selecting subjects? Can you explain them?

A. Random sampling - use table rather than hat pin.

Q. See p. 426. Who can explain how to select subjects using this table?

Q. Why might you want to use the method called stratified random sampling?

A. Because of special interest in certain types of people (males, college students, Eskimos, etc.)

Q. Why use systematic sampling?

A. Useful when selecting from large population. Can choose every 100th person on a list.
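
If you would rather let a computer stand in for the random numbers table, the three selection methods can be sketched roughly as follows. This is an illustration under stated assumptions, not the procedure from the text: the population of 1,000 people, the "student" / "non-student" strata, and all function names are hypothetical.

```python
import random

random.seed(557)  # fixed seed so the sketch gives the same draw each run

# Hypothetical population: 1,000 people, each tagged with a stratum
population = [
    {"id": i, "group": "student" if i % 4 == 0 else "non-student"}
    for i in range(1000)
]

def simple_random_sample(pop, n):
    """Random sampling: every member has an equal chance of selection."""
    return random.sample(pop, n)

def stratified_random_sample(pop, key, n_per_stratum):
    """Stratified random sampling: draw randomly within each type of person."""
    strata = {}
    for person in pop:
        strata.setdefault(person[key], []).append(person)
    sample = []
    for members in strata.values():
        sample.extend(random.sample(members, n_per_stratum))
    return sample

def systematic_sample(pop, step):
    """Systematic sampling: random starting point, then every step-th person."""
    start = random.randrange(step)
    return pop[start::step]

simple = simple_random_sample(population, 20)
stratified = stratified_random_sample(population, "group", 10)
systematic = systematic_sample(population, 100)  # every 100th person on the list
print(len(simple), len(stratified), len(systematic))
```

The stratified version guarantees that the group you have a special interest in shows up in the sample, which simple random sampling cannot promise.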

Realistically, it's rarely possible to get true random sampling unless the population you are interested in is very small. In graduate student research it's common to use existing groups.

Q. What limitation do you think we are placing on our study because of these difficulties? It's a threat to validity that we discussed earlier. What type of validity and why?

A. External validity. The ability to generalize.

Q. After selecting our subjects we need to get them into experimental groups. What technique do we use for this?

A. Random assignment. How to do it? Use random #s table again.
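
Random assignment can also be sketched in code: shuffle the subjects, then deal them out to the groups in turn. The subject labels, group names, and the function name `random_assignment` below are all made up for illustration.

```python
import random

def random_assignment(subjects, group_names, seed=None):
    """Shuffle the subjects, then deal them out to the groups one at a time."""
    rng = random.Random(seed)
    shuffled = list(subjects)  # copy so the original list is left alone
    rng.shuffle(shuffled)
    groups = {name: [] for name in group_names}
    for i, subject in enumerate(shuffled):
        groups[group_names[i % len(group_names)]].append(subject)
    return groups

subjects = [f"S{n:02d}" for n in range(1, 21)]  # 20 hypothetical subjects
groups = random_assignment(subjects, ["Method A", "Method B"], seed=1)
for name, members in groups.items():
    print(name, members)
```

Dealing in turn keeps the groups the same size, while the shuffle makes sure chance alone decides who lands where.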

Q. Post hoc explanations are what?

A. After the fact. An explanation that occurs following the experiment rather than prior to.

What statistics can tell us

1. Is the effect or relationship a reliable one? Is it significant?

2. How meaningful is the effect or relationship?

Q. Text tells us that #1 always takes precedence over #2. Can you see why?

A. Because if it is not significant we have no interest in meaningfulness.

The importance of this question was confirmed by one former student of this class who remarked that her relationships were also mostly significant but not very meaningful!

Q. We also learn that #2 should always be asked if #1 is true. Why?

A. Because very small differences can be significant but in reality not be very meaningful. For example, a 5-day a week running program might produce significantly better improvements in VO2 uptake than a 2-day a week, but if the differences are very small is it worth the much greater effort to get them?

Categories of statistical techniques

Q. Although the text makes a clear distinction between statistical techniques to test relationships and statistical techniques to test differences, the authors of the text urged caution. Why?

A. Because the distinction was more a matter of convenience rather than difference. Both techniques are based on similar theoretical principles. Main point to realize is that correlation between two variables does not indicate causation. In fact no statistic alone indicates causation.

Q. Thus you could have a significant t or F value and still not be able to conclude causation. Why?

A. Causation depends on the total experimental setting. There may be factors affecting internal validity. For example?

Probability

Q. Probability is a term we use conversationally in relation to the likelihood of some event occurring, e.g. rain, snow, Seahawks winning, etc. What can you tell me about this term as used in experimental settings?

A. Relates to the confidence we have in our findings. When our statistics indicate that something is significant by referring to probability we can indicate our confidence in this finding.

Q. So my research yields a t of 4.35 which I am told is significant at p < .05 (is less than). What does this mean?

A. That if there were really no difference (Ho true), a finding this large would turn up by chance fewer than 5 times out of 100 repetitions of the experiment.

Q. This level of chance is known as the xxxx level?

A. Alpha level

Q. When do we usually decide on our alpha level?

A. Set it prior to the experiment although very common for researchers to report the alpha level at which their findings were significant. However, as your authors note you can also report the meaningfulness of any statistic by calculating effect size. In the end though do remember that any statistics have to be evaluated in relation to the entire experimental arrangements before drawing any conclusions. (What's that expression? There are lies, damn lies, and statistics)
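
As one illustration of an effect size calculation, here is a minimal sketch of Cohen's d, the difference between two group means divided by their pooled standard deviation. Treat the choice of d as an assumption: your text may present a different effect size measure. The improvement scores and the function name `cohens_d` are hypothetical.

```python
import math
import statistics

def cohens_d(group_a, group_b):
    """Difference between two means expressed in pooled standard deviation units."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (divides by n - 1)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

# Hypothetical improvement scores for two training programs
five_day = [4.1, 3.8, 4.5, 4.0, 4.3, 3.9]
two_day = [3.9, 3.6, 4.2, 3.8, 4.1, 3.7]
d = cohens_d(five_day, two_day)
print(f"Effect size d = {d:.2f}")
```

Because d is expressed in standard deviation units, it gives you a way to talk about how big a difference is, separately from whether it reached significance.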

Q. Tell me about Type I and Type II errors

A. Relates to chances of making incorrect decisions in the experiment. Accepting false Ho (Type II) or rejecting true Ho (Type I).

Q. See chart below. Who can explain it to me?

A. You should be able to give me practical examples.

            Ho True             Ho False
Accept      Correct decision    Type II Error
Reject      Type I Error        Correct decision

Q. Which is more important? What is the relationship?

A. Depends on the question being studied. Is it desirable to detect small differences or only to detect large differences?

The two errors work in opposition. As you control one more you increase your chances of making the other. Must consider the consequences of making an error.

Here is more on probability than you ever wished to know...

Type I and Type II Errors - The Inside Story

Suppose you are interested in studying whether free weights are a more effective way of developing strength than using weight machines. You randomly select subjects for the study then randomly assign them to one of two groups - labeled "Free Weights" or "Weight Machines". Each group trains twice a week for eight weeks before you conduct a test of their strength. You collect all the individual scores from each group, total them, then analyze the data in a t-test to evaluate whether the means of the groups do differ significantly.
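
For the curious, the t-test step can be sketched in a few lines of Python. This is a minimal sketch, assuming the pooled-variance (equal variances) form of the t-test for two independent groups; the strength scores and the function name `independent_t` are made up for illustration.

```python
import math
import statistics

def independent_t(group_a, group_b):
    """Pooled-variance t statistic and degrees of freedom for two independent groups."""
    n_a, n_b = len(group_a), len(group_b)
    # Weighted average of the two sample variances
    pooled_var = ((n_a - 1) * statistics.variance(group_a)
                  + (n_b - 1) * statistics.variance(group_b)) / (n_a + n_b - 2)
    # Standard error of the difference between the two means
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / std_error
    return t, n_a + n_b - 2

# Hypothetical strength test scores after eight weeks of training
free_weights = [112, 118, 109, 121, 115, 117, 120, 111, 116, 119]
machines = [108, 113, 105, 116, 110, 112, 114, 107, 111, 115]

t_value, df = independent_t(free_weights, machines)
print(f"t = {t_value:.2f} with {df} degrees of freedom")
```

You would then compare the t you obtain with the critical value for those degrees of freedom in a t table at the probability level you set before the experiment.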

Now, as graduate students in PEHLS you know that prior to beginning your experiment you need to set a probability level (Your statistics require you to do this because it will be used in your calculations!). Typical figures used are .05 and .01. What do these mean? Well, if you use .05 it means that if this study was repeated 100 times and the two training methods were really no different, you could expect on about 5 occasions to observe significant differences between the two groups that were due to chance factors rather than the training method. What, you ask, are "chance factors"? You really don't know, because if you did you'd probably try to control for them before they had their effect on your experiment. For example, even when using random selection it's quite possible that on a few occasions you have a selection bias. Maybe you happened to assign a bunch of enthusiastic weight trainers into your free weights group. Or perhaps, some of the subjects in your machine group spend the evening before the test at the Best Western and don't perform as well on your strength test as they are really capable of performing. These are possible reasons but basically you won't ever know all the chance factors influencing your study. Just accept that chance factors exist and can sometimes be responsible for the results you observe.

Okay, you conduct the experiment and using a p<.05 you find that your statistics show that there were significant differences between the means of the two groups and that the free weight group developed more strength. Barring any extraneous factors you are aware of that caused these findings you would be correct to conclude that free weights (within the limitations of this study) were a more effective training method for developing strength than weight machines.

Another way of looking at your findings is to say that you are 95% confident that the differences in strength that you observed were due to the differing effects of the two training methods. Remember, by setting your probability level at .05 you are saying that if you repeated the study 100 times, 95 of those times the differences you observe would be due to training method. However, a 5% probability exists that you would observe differences that were not the result of the training methods but solely due to chance factors (factors that are not controllable). The problem is - you can never be sure if the one occasion you conducted your study was one of the 95 times that it would be correct to attribute differences to training methods or one of the 5 times that the differences should be attributed to chance. You don't know this and you can never know this. That's why you can never be absolutely certain about your experimental findings but have to include a probability level.

How does all this relate to Type I and Type II errors? Read on, it gets even more exciting. Let's go back to your study and the fact that you found a significant difference. You are 95% confident that free weights were a more effective method of improving strength. But there still exists that 5% chance that you are making an erroneous conclusion (relax, remember we just said it was because of chance factors, not because you did anything wrong). This, folks, is a Type I error. Look at the chart in your text. You are making a decision to reject the null hypothesis (Ho) when in fact it is true (but those darn chance factors messed you up!).
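
If you want to watch chance factors at work, the following sketch repeats an imaginary study thousands of times in which the two training methods truly do NOT differ, and counts how often the result is "significant" anyway. It rests on simplifying assumptions that are not from the text: scores are drawn from a normal distribution with a known standard deviation of 1, and a two-tailed z test stands in for the t-test. The function name and all settings are hypothetical.

```python
import math
import random
from statistics import NormalDist

def type_1_error_rate(alpha, n_per_group=20, trials=4000, seed=557):
    """Fraction of simulated studies that reject Ho even though Ho is true."""
    rng = random.Random(seed)
    critical_z = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed cutoff
    std_error = math.sqrt(2 / n_per_group)  # SE of the mean difference when SD = 1
    rejections = 0
    for _ in range(trials):
        # Both groups come from the SAME population: no real training effect
        free_weights = [rng.gauss(0, 1) for _ in range(n_per_group)]
        machines = [rng.gauss(0, 1) for _ in range(n_per_group)]
        diff = sum(free_weights) / n_per_group - sum(machines) / n_per_group
        if abs(diff / std_error) > critical_z:
            rejections += 1  # "significant" purely by chance
    return rejections / trials

type_1_rate = type_1_error_rate(alpha=0.05)
print(f"Proportion of chance-only studies declared significant: {type_1_rate:.3f}")
```

The proportion should land close to the probability level you set, which is the whole point: the alpha level is the rate of Type I errors you have agreed to live with.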

Okay, so you say, no one's going to make a monkey out of me. When I make conclusions I'm going to be darn sure that they're correct. What do you do? You decide to adopt the more stringent probability level of .01. This means that if you repeated the study 100 times, and found differences each time between the training methods, only once would the difference be attributable to chance. Ninety-nine percent of the time the differences you observe would indeed be due to the fact that free weights do improve strength more than weight machines. This sounds better, doesn't it - oh, that life were that simple!

You probably proposed this study because you found evidence to suggest that free weights were a more effective way of developing strength than weight machines. This is what you think and what you want to find. And although you'd be delighted to find that free weights are far more effective, you'd also be interested in discovering that they are just a little more effective. How easily you can discover little differences as well as big differences is a question that relates to the probability level you set for the experiment.

If you set a probability level of .01, while you are protecting yourself from concluding that differences exist when in fact they may be due to chance, you are also limiting your prospects of finding any differences. You see, the probability level of .01 means that unless you find big differences you are likely to end up concluding there are no differences. In effect you are excluding the possibility of finding little differences. And if in reality differences (albeit little) did exist between the two training methods, to conclude that there were no differences would also be an error - in fact a Type II error. Look at the chart in your text and notice that you are making the decision to accept the Ho when differences do really exist. If instead of using a .01 probability level you used a probability level of .05 you would be much more likely to see little differences as well as big differences.
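
The price of the stricter probability level can be put into rough numbers. The sketch below calculates the chance of a Type II error when a small real difference exists, once at .05 and once at .01. The same caveats apply as with any illustration: it assumes normally distributed scores with a known standard deviation and a two-tailed z test rather than a t-test, and the size of the true difference (half a standard deviation), the group size, and the function name are all made up.

```python
import math
from statistics import NormalDist

def type_2_error_rate(true_diff, alpha, n_per_group, sd=1.0):
    """Probability of accepting Ho when a real difference of true_diff exists."""
    z = NormalDist()
    critical_z = z.inv_cdf(1 - alpha / 2)  # two-tailed cutoff for this alpha
    std_error = sd * math.sqrt(2 / n_per_group)
    shift = true_diff / std_error  # the true difference in standard error units
    # Power: chance the observed z falls beyond either cutoff
    power = (1 - z.cdf(critical_z - shift)) + z.cdf(-critical_z - shift)
    return 1 - power

for alpha in (0.05, 0.01):
    beta = type_2_error_rate(true_diff=0.5, alpha=alpha, n_per_group=20)
    print(f"alpha = {alpha:.2f}: chance of a Type II error = {beta:.2f}")
```

With everything else held constant, the .01 level gives the larger Type II error rate, which is exactly the trade-off described above.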

Maybe you can now begin to see that you as a researcher are being forced into a compromise situation. While you may have concerns with making mistaken conclusions about the effectiveness of free weights you also don't want to eliminate the possibility of finding some effect. As you limit possibilities for making Type I errors you increase the likelihood of making Type II errors - and vice versa. A lot depends on the question you are examining. In some research it is valuable to find any differences even at the risk that chance factors might be responsible. For example, in developing general fitness training programs you might be happy to accept the risk of errors. In contrast, with some research it's essential to be as certain as possible that the independent variable is solely responsible for any effect observed on the dependent variable (e.g. using drugs or medical procedures where costs are high and there is concern about the risk of side effects). And that, folks, is how they get the milk in the coconut.

(Revised 2/3/99)




Page constructed by Stephen C. Jefferies

jefferis@cwu.edu