Analysis of Variance (ANOVA)
The summary from class is outlined below. It follows our textbook: Lind (2005), Statistical Techniques in Business & Economics (11th ed.), New York: McGraw-Hill, Chapter 8; (12th ed.) Chapter 12.
Analysis of variance (ANOVA) is used under these conditions:
- The sampled populations follow the normal distribution.
- The populations have equal standard deviations.
- The samples are randomly selected and are independent.
- Two or more samples are being compared.
Characteristics of the F distribution:
- The F distribution is used to test whether two or more sample means came from populations with equal means.
- There is a family of F distributions.
- Each member of the family is determined by two parameters: the numerator degrees of freedom and the denominator degrees of freedom.
- F cannot be negative, and it is a continuous distribution.
- The F distribution is positively skewed.
- Its values range from 0 to infinity. As F goes to infinity, the curve approaches the X-axis.
Contributed by Gary Hale, UoP, 2005
Chapter 12: Analysis of Variance (ANOVA)
The probability distribution used in this chapter is the F distribution. It serves as the distribution of the test statistic in many situations. It is used to test whether two samples come from populations having equal variances, and it is also applied when several population means must be compared simultaneously. The simultaneous comparison of several population means is called analysis of variance (ANOVA). In both of these situations, the populations must follow a normal distribution, and the data must be at least interval-scale (Lind, D.M., Marchal, W.G. & Wathen, S.A., 2004).
There are five characteristics of the F distribution:
- There is a “family” of F distributions. A particular member of the family is determined by two parameters: the degrees of freedom in the numerator and the degrees of freedom in the denominator. The shape of the distribution is illustrated by the graph below. There is one F distribution for the combination of 29 degrees of freedom in the numerator and 28 degrees of freedom in the denominator. There is another F distribution for 19 degrees of freedom in the numerator and 6 degrees of freedom in the denominator. The shape of the curve changes as the degrees of freedom change (Lind et al., 2004).

- The F distribution is continuous, which means that it can assume an infinite number of values between zero and positive infinity (Lind et al., 2004).
- The F distribution cannot be negative. The smallest value F can assume is 0 (Lind et al., 2004).
- It is positively skewed. The long tail of the distribution is on the right-hand side. As the number of degrees of freedom increases in both the numerator and the denominator, the distribution approaches a normal distribution (Lind et al., 2004).
- It is asymptotic. As the values of X increase, the F curve approaches the X-axis but never touches it. This is similar to the behavior of the normal distribution (Lind et al., 2004).
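A short simulation can make these properties concrete. The sketch below assumes the standard construction of an F variable: the ratio of two independent chi-square variables, each divided by its degrees of freedom. It uses the two degree-of-freedom pairs mentioned above. The function name simulate_f, the seed, and the number of draws are illustrative choices. The sketch checks that no draw is negative and that the mean exceeds the median, which is what a long right tail produces.

```python
import random
import statistics


def simulate_f(df_num, df_den, n_draws=10_000, seed=1):
    """Simulate draws from the F distribution with (df_num, df_den) degrees of freedom.

    Each draw is (chi-square(df_num) / df_num) / (chi-square(df_den) / df_den),
    where a chi-square variable with d degrees of freedom is built as the sum
    of d squared standard normal values.
    """
    rng = random.Random(seed)

    def chi_square(d):
        return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(d))

    return [
        (chi_square(df_num) / df_num) / (chi_square(df_den) / df_den)
        for _ in range(n_draws)
    ]


for df_num, df_den in [(29, 28), (19, 6)]:
    draws = simulate_f(df_num, df_den)
    print(
        f"F({df_num}, {df_den}):",
        "never negative =", min(draws) >= 0,
        "| mean > median (right skew) =",
        statistics.mean(draws) > statistics.median(draws),
    )
```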
The F distribution is used to test the hypothesis that the variance of one normal population equals the variance of another normal population. It is also used to check the assumptions of some statistical tests. For example, certain tests assume that the variances of two normal populations are the same. The F distribution provides a means for conducting a test regarding the variances of two normal populations (Lind et al., 2004).
Whether we want to determine if one population has more variation than another or to validate an assumption for a statistical test, we first state the null hypothesis. The null hypothesis is that the variance of one normal population equals the variance of the other normal population. The alternate hypothesis could be that the variances differ. The null hypothesis and the alternate hypothesis for a two-sided test are:

$$H_0: \sigma_1^2 = \sigma_2^2 \qquad H_1: \sigma_1^2 \neq \sigma_2^2$$

To conduct the test, we select a random sample of n1 observations from one population and a sample of n2 observations from the second population. The test statistic is defined as:

$$F = \frac{s_1^2}{s_2^2}$$

The terms $s_1^2$ and $s_2^2$ are the respective sample variances. If the null hypothesis is true, the test statistic follows the F distribution with n1 – 1 and n2 – 1 degrees of freedom. To reduce the size of the table of critical values, the larger sample variance is placed in the numerator. Therefore, the tabled F ratio is always at least 1.00, and only the right-tail critical value is required. The critical value of F for a two-tailed test is found by dividing the significance level in half (α/2) and then referring to the appropriate degrees of freedom (Lind et al., 2004).
The usual practice is to compute the F ratio with the larger of the two sample variances in the numerator. This forces the F ratio to be at least 1.00. We can then always use the right tail of the F distribution and avoid the need for more extensive F tables (Lind et al., 2004).
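The two-sample variance test can be sketched minimally as follows. The function computes the F ratio with the larger sample variance in the numerator and reports the matching degrees of freedom. The function name variance_ratio and the sample data are hypothetical. The critical value still has to come from an F table at α/2 for a two-tailed test.

```python
import statistics


def variance_ratio(sample_a, sample_b):
    """F statistic for equal variances, with the larger sample variance on top.

    Returns (F, numerator degrees of freedom, denominator degrees of freedom).
    """
    var_a = statistics.variance(sample_a)  # sample variance (divisor n - 1)
    var_b = statistics.variance(sample_b)
    if var_a >= var_b:
        return var_a / var_b, len(sample_a) - 1, len(sample_b) - 1
    return var_b / var_a, len(sample_b) - 1, len(sample_a) - 1


# Hypothetical samples; the second one has the larger variance.
f_ratio, df_num, df_den = variance_ratio([3, 4, 5], [2, 4, 4, 4, 5, 5, 7, 9])
print(f_ratio, df_num, df_den)
```

Here the sample variance is 1 for the first sample and 32/7 (about 4.57) for the second. The sketch therefore puts the second sample on top, giving F of about 4.57 with 7 and 2 degrees of freedom. The null hypothesis would be rejected if this value exceeds the right-tail table value.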
Test for Equal Variances

- The null hypothesis is rejected if the computed value of the F test
statistic is greater than the critical value from the table.
Analysis of Variance Procedure
- Ho: the null hypothesis states that the population means are all the same.
- Ha: the alternative hypothesis states that at least one of the means is different.
- The test statistic follows the F distribution.
- The decision rule is to reject the null hypothesis if F (computed) is greater than F (table) with k – 1 numerator and n – k denominator degrees of freedom.
To compute the F value, organize the calculations in an ANOVA table:
| Source | Sum of Squares | Degrees of Freedom | Mean Square | F | P |
|---|---|---|---|---|---|
| Treatments | SST | k – 1 | MST = SST/(k – 1) | MST/MSE | P |
| Error | SSE | n – k | MSE = SSE/(n – k) | | |
| Total | SS total | n – 1 | | | |
- If there are k populations being sampled, the numerator degrees of
freedom is k – 1.
- If there are a total of n observations, the denominator degrees of
freedom is n – k.
- The test statistic is computed by:

$$F = \frac{MST}{MSE}$$
- SS total is the total sum of squares:

$$SS_{total} = \sum X^2 - \frac{(\sum X)^2}{n}$$

- SST is the treatment sum of squares:

$$SST = \sum \left(\frac{T_c^2}{n_c}\right) - \frac{(\sum X)^2}{n}$$

where T_c is the column total, n_c is the number of observations in each column, ΣX is the sum of all the observations, and n is the total number of observations.
- SSE is the error sum of squares:

$$SSE = SS_{total} - SST$$
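The one-way formulas can be checked by hand on a small data set. This sketch assumes the data are given as one list per treatment (column). It applies the computational formulas for SS total and SST and then fills in the rest of the ANOVA table. The name one_way_anova and the data are hypothetical.

```python
def one_way_anova(columns):
    """One-way ANOVA using the computational sums of squares.

    columns: one list of observations per treatment (column).
    """
    all_obs = [x for col in columns for x in col]
    n = len(all_obs)                       # total number of observations
    k = len(columns)                       # number of treatments
    correction = sum(all_obs) ** 2 / n     # (sum of X)^2 / n
    ss_total = sum(x * x for x in all_obs) - correction
    # Treatment sum of squares: sum of Tc^2 / nc over the columns
    sst = sum(sum(col) ** 2 / len(col) for col in columns) - correction
    sse = ss_total - sst
    mst = sst / (k - 1)
    mse = sse / (n - k)
    return {
        "SStotal": ss_total, "SST": sst, "SSE": sse,
        "df": (k - 1, n - k, n - 1),
        "MST": mst, "MSE": mse, "F": mst / mse,
    }


# Hypothetical data: three treatments with three observations each
result = one_way_anova([[2, 3, 4], [4, 5, 6], [6, 7, 8]])
for name, value in result.items():
    print(name, value)
```

For these data the sketch gives SS total = 30, SST = 24, SSE = 6, MST = 12, MSE = 1, and F = 12 with 2 and 6 degrees of freedom. Following the decision rule, that F would be compared with the table value for 2 numerator and 6 denominator degrees of freedom.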
We then make inferences about treatment means:
- If we reject the null hypothesis, we conclude that not all of the treatment means are equal. When there are more than two samples, we may want to know which treatment means differ. One of the simplest procedures for identifying which treatment means differ is to use confidence intervals.
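- Confidence interval for the difference between two means:

$$(\bar{X}_1 - \bar{X}_2) \pm t\sqrt{MSE\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

where t is obtained from the t table with n – k degrees of freedom and MSE = SSE/(n – k). If the interval does not include zero, we conclude that the two treatment means differ.
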
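Here is a minimal sketch of this confidence interval. It assumes that MSE and the t value have already been obtained, with the t value coming from a t table with n – k degrees of freedom. The function name, the two samples, MSE = 1.0, and t = 2.447 (the two-tailed 95% value for 6 degrees of freedom) are illustrative assumptions.

```python
import math


def mean_difference_interval(sample_1, sample_2, mse, t_value):
    """Confidence interval for the difference between two treatment means.

    t_value must be looked up in a t table with n - k degrees of freedom,
    the error degrees of freedom from the ANOVA table.
    """
    n1, n2 = len(sample_1), len(sample_2)
    diff = sum(sample_1) / n1 - sum(sample_2) / n2
    half_width = t_value * math.sqrt(mse * (1 / n1 + 1 / n2))
    return diff - half_width, diff + half_width


# Hypothetical inputs: MSE = 1.0 from an ANOVA with n - k = 6,
# and t = 2.447 (two-tailed, 95% confidence, 6 degrees of freedom).
low, high = mean_difference_interval([2, 3, 4], [6, 7, 8], mse=1.0, t_value=2.447)
print(round(low, 2), round(high, 2))
# True when zero lies outside the interval, i.e. the two means differ
print(low > 0 or high < 0)
```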
Two-Factor ANOVA
- For the two-factor ANOVA, we test whether there is a significant difference among the treatment means and whether there is a significant difference among the block means.
- Let B_r be the block totals (r for rows).
- Let SSB represent the sum of squares for the blocks, where:

$$SSB = \sum \left(\frac{B_r^2}{k}\right) - \frac{(\sum X)^2}{n}$$

| Source | Sum of Squares | Degrees of Freedom | Mean Square | F | P |
|---|---|---|---|---|---|
| Treatments | SST | k – 1 | MST = SST/(k – 1) | MST/MSE | P |
| Blocks | SSB | b – 1 | MSB = SSB/(b – 1) | MSB/MSE | |
| Error | SSE | (k – 1)(b – 1) | MSE = SSE/((k – 1)(b – 1)) | | |
| Total | SS total | n – 1 | | | |
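The two-factor calculations can be sketched the same way. This version assumes one observation for each block-treatment combination, with blocks as rows and treatments as columns. It uses the standard relation SSE = SS total – SST – SSB for the error sum of squares, which the notes do not state explicitly. The name two_factor_anova and the data are hypothetical.

```python
def two_factor_anova(data):
    """Two-factor (randomized block) ANOVA.

    data[r][c] is the observation for block r (row) and treatment c (column).
    """
    b = len(data)             # number of blocks (rows)
    k = len(data[0])          # number of treatments (columns)
    n = b * k                 # total number of observations
    all_obs = [x for row in data for x in row]
    correction = sum(all_obs) ** 2 / n            # (sum of X)^2 / n
    ss_total = sum(x * x for x in all_obs) - correction
    # Treatment totals Tc: each column total sums b observations
    sst = sum(sum(row[c] for row in data) ** 2 / b for c in range(k)) - correction
    # Block totals Br: each row total sums k observations
    ssb = sum(sum(row) ** 2 / k for row in data) - correction
    sse = ss_total - sst - ssb
    mst = sst / (k - 1)
    msb = ssb / (b - 1)
    mse = sse / ((k - 1) * (b - 1))
    return {
        "SST": sst, "SSB": ssb, "SSE": sse, "SStotal": ss_total,
        "df": (k - 1, b - 1, (k - 1) * (b - 1), n - 1),
        "F_treatments": mst / mse, "F_blocks": msb / mse,
    }


# Hypothetical data: 3 blocks (rows) by 3 treatments (columns)
result = two_factor_anova([[1, 4, 7], [2, 6, 7], [3, 5, 9]])
print(result["df"], round(result["F_treatments"], 2), round(result["F_blocks"], 2))
```

Each F ratio is then compared with the table value for its own numerator degrees of freedom (k – 1 for treatments, b – 1 for blocks) and (k – 1)(b – 1) denominator degrees of freedom.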