On Probability & Frequency of Distribution

When describing the normal distribution as the underlying distribution of events in a sample, the researcher is really describing the probability that any given value or event occurs within the distribution itself. At first, one can become mired in the fact that a distribution has to be generated for each data set on a particular research topic. That changes once one realizes that the relative frequency of a value in a distribution of n observations is in fact “its probability” (Hinkle, Wiersma, & Jurs, 2003, p. 160). From there, one can begin to make inferences from the data that yield insights into future performance or behavior in a similar sample, depending on the research performed.
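As a minimal sketch of the idea that relative frequency can be read as probability, the snippet below tabulates a small sample. The scores and the helper name `relative_frequencies` are made up for illustration and do not come from Hinkle et al.

```python
from collections import Counter

# Hypothetical quiz scores for a small class (illustrative data only).
scores = [3, 4, 4, 5, 5, 5, 6, 6, 7, 5]

def relative_frequencies(values):
    """Map each distinct value to its share of the sample (count / n)."""
    n = len(values)
    counts = Counter(values)
    return {value: count / n for value, count in sorted(counts.items())}

rel_freq = relative_frequencies(scores)
for value, p in rel_freq.items():
    print(f"score {value}: relative frequency {p:.2f}")
```

Four of the ten scores are a 5, so within this sample the relative frequency of a 5 is 0.40, and the relative frequencies across all scores sum to 1, just as probabilities do.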

For any practitioner, it is important to realize this fact when conducting basic research so as to maximize the understanding of data collected. Among educational staff, in a K-12 district for example, being able to better understand what is actually analyzed, how, and what it all means is crucial to creating interventions or program enhancements that actually work. Using inferential statistics and sampling correctly can lead to more trustworthy findings and, therefore, more effective outcomes for the population studied in the first place.

Many introductory statistics examples involve drawing slips of paper (Hinkle et al., 2003). The powerful insight from this exercise comes when the researcher realizes that drawing slips of paper stands in for the inferences themselves: in a future setting, where data is used for prediction, the same relative frequencies are what one expects to see.
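The slip-drawing exercise can be simulated in a few lines. This is only a sketch under assumed conditions: a hypothetical hat with three red slips and one blue slip, drawn with replacement, and a fixed seed so the run is repeatable. The function name `draw_slips` is invented for this example.

```python
import random

def draw_slips(slips, n_draws, seed=0):
    """Draw n_draws slips with replacement and return each label's relative frequency."""
    rng = random.Random(seed)
    draws = [rng.choice(slips) for _ in range(n_draws)]
    return {label: draws.count(label) / n_draws for label in sorted(set(slips))}

# Hypothetical hat: 3 red slips and 1 blue slip, so P(red) = 0.75.
hat = ["red", "red", "red", "blue"]
for n in (10, 100, 10_000):
    print(n, draw_slips(hat, n))
```

With only a handful of draws the relative frequencies can wander, but as the number of draws grows they tend to settle near the underlying probabilities, which is what makes them useful for prediction.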


Hinkle, D. E., Wiersma, W., & Jurs, S. G. (2003). Applied Statistics for the Behavioral Sciences (5th ed.). Boston, MA: Houghton Mifflin.



On Correlation vs. Causation

Using linear regression techniques to accurately predict a score on a selected distribution of scores is tremendously helpful to K-12 educators, especially central office personnel.

Assessment data, while abundant, is often not used to make predictions, nor is performance in one subject routinely correlated with other variables. For example, where scores in two subjects are correlated, performance in one can be used to predict performance in the other using linear regression.
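To make the prediction idea concrete, here is a minimal least-squares sketch. The six pairs of math and science scores are invented for illustration, and `fit_line` is a hypothetical helper rather than anything from Hinkle et al.

```python
from statistics import mean

def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical scores for six students (illustrative data only).
math_scores = [60, 65, 70, 75, 80, 85]
science_scores = [62, 66, 73, 74, 83, 86]

slope, intercept = fit_line(math_scores, science_scores)
predicted = intercept + slope * 78
print(f"predicted science score for a math score of 78: {predicted:.1f}")
```

A prediction like this is only as good as the relationship in the data it was fitted to, and it says nothing about why the two sets of scores move together.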

It is important to note, however, that for many compared variables, correlation does not indicate causation. “The researcher must still interpret the statistics in the context of the situation” (Hinkle, Wiersma, & Jurs, 2003, p. 122), and therefore, for the educational practitioner, making inferences about data can be challenging, and the resulting conclusions can be erroneous.

Correlation vs. causation. The rain didn’t cause the kid to put on rain boots. It’s correlated to the rain. In fact, his mom put the rain boots on, and the rain had nothing to do with it…


For example, a practitioner might notice a correlation between students' scores on the ACT and their scores on the SAT. It might then be tempting to make predictions about success in college based on the positive correlation between the two distributions (just because both are used for college entrance purposes) and to claim that high scores on the SAT yield a high graduation rate from college.

Likely, no such claim can be made. College graduation was never one of the two distributions under study, so the correlation between ACT and SAT scores supports neither a prediction about graduation nor a causal claim. Many fall into this trap, and it would be best if practitioners better understood that correlation between X and Y “does not imply” (Hinkle et al., 2003, p. 122) causation.
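One way two measures can correlate without either causing the other is a shared underlying factor. The simulation below is a sketch under invented assumptions: a hidden "preparation" variable feeds two hypothetical test scores, and neither test influences the other. None of the numbers come from real ACT or SAT data.

```python
import random
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(42)
# Hypothetical hidden factor (e.g. general preparation) that feeds both tests.
preparation = [rng.gauss(0, 1) for _ in range(2000)]
test_a = [p + rng.gauss(0, 0.5) for p in preparation]
test_b = [p + rng.gauss(0, 0.5) for p in preparation]

r = pearson_r(test_a, test_b)
print(f"r between the two tests: {r:.2f}")
```

The two simulated tests come out strongly correlated even though, by construction, one never affects the other. The statistic alone cannot tell the practitioner which story is true.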

Hinkle, D. E., Wiersma, W., & Jurs, S. G. (2003). Applied Statistics for the Behavioral Sciences (5th ed.). Boston, MA: Houghton Mifflin.



Leisure Time Use and Academic Correlates of Alcohol Abuse among Teenagers in Rural Pennsylvania

In the provided article (see attached at the bottom) on leisure time use and academic correlates of alcohol abuse among teenagers in rural Pennsylvania, we learn about the connections between alcohol and social and vocational attitudes.

According to the article, “adolescents’ use of alcohol is becoming more problematic” (Pendorf, 1992) but “it may not be prudent for adults to suggest that alcohol use is entirely bad” when considering its connections to social growth such as self-discovery, acquiring self-esteem and friends (Pendorf, 1992).

Here, we explore a few of the correlations between variables.

Which two variables are most strongly related? Explain.

The two most strongly related variables are consumption of hard liquor (measured by frequency of use on a continuous scale) and participation in social entertainment activities such as attending parties and movies. The correlation between hard liquor and entertainment is 0.37, indicating that students who drink hard liquor more often also tend to take part in these activities more often. The next strongest correlation, 0.28, is between beer consumption (again, by frequency of use) and entertainment activities. Students who reported “use of any type of alcohol weekly or more often were classified as heavy users, or abusers” (Pendorf, 1992), and because that classification pools all types of alcohol, the separate measures likely overlap, which may skew the individual correlations somewhat.

What type of relationship exists between those involved in vocational activities and the consumption or reported consumption of beer, wine, and hard liquor? Explain.

The relationship between the consumption of beer, wine, and hard liquor and engaging in vocational activities, such as holding or searching for a job, is positive in every case. Beer correlates with vocational activities at 0.22, wine at 0.17, and hard liquor at 0.18.
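To get a feel for how strong these relationships are, one common aid is to square each correlation, which gives the proportion of variance the two variables share. The sketch below applies that to the coefficients quoted above. The r-squared reading is an interpretive aid added here, not an analysis from Pendorf (1992), and the dictionary labels are shorthand invented for this example.

```python
# Correlations quoted in the text above (Pendorf, 1992).
reported_r = {
    "hard liquor vs. entertainment": 0.37,
    "beer vs. entertainment": 0.28,
    "beer vs. vocational": 0.22,
    "wine vs. vocational": 0.17,
    "hard liquor vs. vocational": 0.18,
}

def shared_variance(r):
    """Coefficient of determination: proportion of variance two variables share."""
    return r ** 2

# List the pairs from strongest to weakest correlation.
for pair, r in sorted(reported_r.items(), key=lambda item: -item[1]):
    print(f"{pair}: r = {r:.2f}, r^2 = {shared_variance(r):.3f}")
```

Even the strongest pair, at r = 0.37, shares only about 14% of its variance, so these are modest relationships rather than tight ones.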

One plausible explanation is that vocational income makes it easier to procure alcohol of some kind, either through social connections at the workplace or through money that can be spent on alcohol from some source. According to the data, it certainly seems that individuals who report being involved with working, or “who are at least looking for work, are heavier users of alcohol” (Pendorf, 1992).

Is involvement in religious activity related to consumption or reported consumption of beer, wine, and hard liquor? Explain.

Involvement in religious activity is negatively correlated with consumption of alcohol, regardless of type (beer, wine, or hard liquor). All three correlations are negative, meaning that, in this sample of high school teenagers, students who report consuming alcohol more often are less likely to engage in religious activities.

The reported correlations do not break consumption down by frequency (yearly, weekly, or daily). It would certainly be interesting to see whether heavy users (daily) and light users (annually) differ in how consumption correlates with the various activities.


Pendorf, J. E. (1992). Leisure time use and academic correlates of alcohol abuse among high school students. Journal of Alcohol and Drug Education.

Original Article: Pendorf, J. E. (1992). Leisure time use and academic correlates of alcohol abuse among high school students. Journal of Alcohol and Drug Education


On the Standard Normal Distribution

Standardized testing confuses many people, especially in education, where K-12 students are required (at least in the State of Ohio) to sit for annual exams in various subjects including language arts, mathematics, social studies, and science. The resulting scores from these tests are standardized and normalized against all other scores in the population tested (in this case, all students in the State of Ohio). Many have mixed feelings about standardized testing, but most likely do not fully understand what a standard score actually is.

In order to accurately compare assessment scores across multiple distributions (in this case, scores on the language arts test with scores on the social studies test), it is necessary to standardize, and in some cases normalize, the scores themselves for analysis. From Hinkle, Wiersma, & Jurs (2003) we know that a standard score, or z score, is the raw score minus the mean, divided by the standard deviation of the selected distribution. “The z score indicates the number of standard deviations a corresponding raw score is above or below the mean” (Hinkle, Wiersma, & Jurs, 2003, p. 71), and therefore we can tell whether a score is above or below the mean, and by how far.
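The z score calculation is short enough to sketch directly. The class results below are invented for illustration, `z_score` is a hypothetical helper, and the use of the population standard deviation (`pstdev`) is an assumption of this sketch. A sample standard deviation would give slightly different values.

```python
from statistics import mean, pstdev

def z_score(raw, scores):
    """Number of standard deviations that raw lies above (+) or below (-) the mean of scores."""
    return (raw - mean(scores)) / pstdev(scores)

# Hypothetical class results (illustrative data only).
language_arts = [70, 75, 80, 85, 90]
social_studies = [50, 60, 70, 80, 90]

# The same raw score of 85 sits differently in each distribution.
z_la = z_score(85, language_arts)
z_ss = z_score(85, social_studies)
print(f"language arts z = {z_la:.2f}, social studies z = {z_ss:.2f}")
```

In this made-up data a raw 85 is a stronger result in social studies than in language arts, because it lies further above the mean relative to the spread of scores. That is exactly the kind of comparison raw scores hide.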

Normalizing the score brings in the normal distribution itself. The normal distribution “is not determined by any specific event in nature, and it does not reflect a specific law of nature” (Hinkle, Wiersma, & Jurs, 2003, p. 80); rather, it is a model describing the normal, or usual, distribution of many sets of data. By normalizing standard scores, one can compare distributions across many sets of data and, in so doing, better analyze trends, outliers, and central tendencies. Furthermore, in the standard normal distribution model, it is possible to ascertain what share of the scores in the distribution falls within one or more standard deviations of the mean.
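Once a score is expressed as a z score, the standard normal model can translate it into a percentile. The sketch below assumes the scores really are approximately normal, which is a modeling assumption rather than a given. The helper name `percentile_from_z` is invented for this example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def percentile_from_z(z):
    """Percentage of a normal distribution that falls at or below a given z score."""
    return 100 * standard_normal.cdf(z)

for z in (-1.0, 0.0, 1.0, 2.0):
    print(f"z = {z:+.1f}: about {percentile_from_z(z):.1f}% of scores fall at or below it")
```

Under this model a z score of 0 sits at the 50th percentile and a z score of 1 sits at roughly the 84th, so a student one standard deviation above the mean has outscored about 84% of the group.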

In manufacturing, for example, many producers strive for defect rates at the six-sigma level, or 3.4 defects per one million units produced. Such rates are extremely small and sit in the far tail of the normal distribution, but they are helpful in making decisions about production, quality assurance, and interventions to fix defects.

In K-12 education, we would be lucky to achieve numbers even well short of the six-sigma range. Three standard deviations on either side of the mean include about 99.7% of all scores in a normal distribution; in six-sigma operations, the corresponding figure is 99.99966%. For example, if the metric were high school graduation, and a school district graduated 99% of its student population, it would still be performing at only about the three-sigma level.
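These percentages can be checked against the standard normal model. The sketch below makes two assumptions worth flagging: the within-k figures are two-sided shares of a normal distribution, while the six-sigma defect rate uses the conventional one-sided calculation with a 1.5-sigma shift, which is where the familiar 3.4 per million figure comes from. The function names are invented for this example.

```python
from statistics import NormalDist

standard_normal = NormalDist()

def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

def defects_per_million(sigma_level, shift=1.5):
    """One-sided defect rate under the common six-sigma convention of a 1.5-sigma shift."""
    return 1_000_000 * (1 - standard_normal.cdf(sigma_level - shift))

for k in (1, 2, 3):
    print(f"within {k} SD: {100 * share_within(k):.2f}%")
print(f"six sigma: {defects_per_million(6):.1f} defects per million")
```

Because the two conventions differ, a "sigma level" quoted in a quality-control setting is not directly interchangeable with a count of standard deviations on a test score distribution, so comparisons between the two are best treated as rough analogies.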

Understanding what a normal distribution is, and what standardized data is all about would serve any K-12 educator or practitioner well, and better prepare them to discuss what scores actually mean with colleagues, parents, and community members.

Most educators (classroom teachers, etc.) use a simple percentage scale for scores, which often translates into grades, but simply comparing achievement in one subject area with another is not a fair comparison, since the scores are neither standardized nor normalized. Perhaps we should consider standardizing and normalizing all scores and grades within K-12 in order to better understand and compare student performance across subjects?

More on this later…


Hinkle, D. E., Wiersma, W., & Jurs, S. G. (2003). Applied Statistics for the Behavioral Sciences (5th ed.). Boston, MA: Houghton Mifflin.