The separation of variance attributable to one cause from the variance attributable to others. By partitioning the total variance of a set of observations into parts due to particular factors, for example sex or treatment group, and comparing variances (mean squares) by way of F-tests, differences between means can be assessed. The simplest analysis of this type involves a one-way design, in which N subjects are allocated, usually at random, to the k different levels of a single factor. The total variation in the observations is then divided into a part due to differences between level means (the between groups sum of squares) and a part due to the differences between subjects in the same group (the within groups sum of squares, also known as the residual sum of squares). These terms are usually arranged as an analysis of variance table.
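The partition described above can be sketched in a few lines of Python. This is a minimal illustration, not a full analysis: the function name `one_way_anova` and the three small groups of data are made up for the example, and the F statistic is simply the ratio of the between groups mean square to the within groups mean square.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (ss_between, ss_within, f_statistic) for a one-way design.

    `groups` is a list of lists, one list of observations per factor level.
    (Hypothetical helper written for this illustration.)
    """
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)            # number of factor levels
    n_total = len(all_values)  # N, the total number of subjects

    # Between groups sum of squares: spread of level means about the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within groups (residual) sum of squares: spread of subjects about their level mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ss_between, ss_within, ms_between / ms_within

# Illustrative data only: three levels with three subjects each.
groups = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [8.0, 9.0, 10.0]]
ss_b, ss_w, f_stat = one_way_anova(groups)
print(ss_b, ss_w, f_stat)
```

For these data the between groups and within groups sums of squares add up to the total sum of squares about the grand mean, which is the partition that the analysis of variance table records.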

If the means of the populations represented by the factor levels are the same, then, within the limits of random variation, the between groups mean square and the within groups mean square should be the same. Whether this is so can, if certain assumptions are met, be assessed by a suitable F-test. The assumptions are that the response variable is normally distributed in each population and that the populations have the same variance. The analysis is essentially an example of a generalized linear model with an identity link function and normally distributed errors.

(Syn: therapeutic trial) A research activity that involves the administration of a test regimen to humans to evaluate its efficacy and safety. The term is subject to wide variation in usage, from the first use in humans without any control treatment to a rigorously designed and executed experiment involving test and control treatments and randomization. Several phases of clinical trials are distinguished:

Phase I trial - Safety and pharmacologic profiles. The first introduction of a candidate vaccine or a drug into a human population to determine its safety and mode of action. In drug trials, this phase may include studies of dose and route of administration. Phase I trials usually involve fewer than 100 healthy volunteers.

Phase II trial - Pilot efficacy studies. Initial trial to examine efficacy, usually in 200 to 500 volunteers; with vaccines, the focus is on immunogenicity, and with drugs, on demonstration of safety and efficacy in comparison to other existing regimens. Usually, but not always, subjects are randomly allocated to study and control groups.

Phase III trial - Extensive clinical trial. This phase is intended for complete assessment of safety and efficacy. It involves larger numbers, perhaps thousands, of volunteers, usually with random allocation to study and control groups, and may be a multicenter trial.

Phase IV trial - With drugs, this phase is conducted after the national drug registration authority (e.g., the Food and Drug Administration in the United States) has approved the drug for distribution or marketing. Phase IV trials may include research designed to explore a specific pharmacologic effect, to establish the incidence of adverse reactions, or to determine the effects of long-term use. Ethical review is required for phase IV clinical trials, but not for routine postmarketing surveillance.

The probability that an event occurs given the outcome of some other event. Usually written Pr(A | B). For example, the probability of a person being colour blind given that the person is male is about 0.1, and the corresponding probability given that the person is female is approximately 0.0001. It is not, of course, necessary that Pr(A | B) = Pr(B | A); the probability of having spots given that a patient has measles, for example, is very high, whereas the probability of measles given that a patient has spots is much less.
If Pr(A | B) = Pr(A) then the events A and B are said to be independent.

A confounding variable is an extraneous variable in a statistical model that correlates (positively or negatively) with both the dependent variable and the independent variable. The methodologies of scientific studies therefore need to control for these factors to avoid what is known as a type 1 error: a 'false positive' conclusion that the dependent variables are in a causal relationship with the independent variable. Such a relation between two observed variables is termed a spurious relationship. Thus, confounding is a major threat to the validity of inferences made about cause and effect, i.e. internal validity, as the observed effects should be attributed to the confounder rather than the independent variable.

By definition, a confounding variable is associated with both the probable cause and the outcome. The confounder is not allowed to lie in the causal pathway between the cause and the outcome: if A is thought to be the cause of disease C, the confounding variable B may not be solely caused by behaviour A, and behaviour B shall not always lead to behaviour C. An example: being female does not always lead to smoking tobacco, and smoking tobacco does not always lead to cancer. Therefore, any study that tries to elucidate the relation between being female and cancer should take smoking into account as a possible confounder. In addition, a confounder is always a risk factor that has a different prevalence in two risk groups (e.g. females/males) (Hennekens, Buring & Mayrent, 1987).

An index that quantifies the linear relationship between a pair of variables. In a bivariate normal distribution, for example, the parameter ρ. An estimator of ρ obtained from n sample values of the two variables of interest, (x1, y1), (x2, y2), ..., (xn, yn), is Pearson's product moment correlation coefficient, r, given by

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}$$

where $\bar{x}$ and $\bar{y}$ are the sample means of the two variables.

The coefficient takes values between -1 and 1, with the sign indicating the direction of the relationship and the numerical magnitude its strength. Values of -1 and 1 indicate that the sample values fall on a straight line. A value of zero indicates the lack of any linear relationship between the two variables.

The range of possible values for a measurement (e.g. the set of possible responses to a question, the physically possible range for a set of body weights). Measurement scales can be classified according to the quantitative character of the scale:

- dichotomous scale - one that arranges items into either of two mutually exclusive categories, e.g. yes/no, alive/dead.

- nominal scale - classification into unordered qualitative categories, e.g. race, religion, country of birth. Measurements of individual attributes are purely nominal scales, as there is no inherent order to their categories.

- ordinal scale - classification into ordered qualitative categories, e.g. grade, where the values have a distinct order but their categories are qualitative in that there is no natural (numerical) distance between their possible values.

- interval scale - an (equal) interval scale involves assignment of values with a natural distance between them, so that a particular distance (interval) between two values in one region of the scale meaningfully represents the same distance between two values in another region of the scale. Examples include Celsius and Fahrenheit temperature, date of birth.

- ratio scale - a ratio scale is an interval scale with a true zero point, so that ratios between values are meaningfully defined. Examples are absolute temperature, weight, height, blood count, and income, as in each case it is meaningful to speak of one value as being so many times greater or less than another value.

A method that allows the hazard function to be modeled on a set of explanatory variables without making restrictive assumptions about the dependence of the hazard function on time. The model involved is

$$h(t) = a(t) \exp(\beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_q x_q)$$

where x1, x2, ..., xq are the explanatory variables of interest, and h(t) the hazard function. The so-called baseline hazard function, a(t), is an arbitrary function of time. For any two individuals at any point in time the ratio of the hazard functions is a constant. Because the baseline hazard function, a(t), does not have to be specified explicitly, the procedure is essentially a distribution free method. Estimates of the parameters in the model, i.e. β1, β2, ..., βq, are usually obtained by maximum likelihood estimation, and depend only on the order in which events occur, not on the exact times of their occurrence.

A selected subset of a population. A sample may be random or nonrandom and may be representative or nonrepresentative. Several types of samples exist:

area sample - a method of sampling that can be used when the numbers in the population are unknown. The total area to be sampled is divided into subareas, e.g. by means of a grid that produces squares on a map; these subareas are then numbered and sampled, using a table of random numbers.

cluster sample - each unit selected is a group of persons (all persons in a city block, a family, a school, etc.) rather than an individual.

grab sample (sample of convenience) - samples selected by easily employed but basically nonprobabilistic methods. It is improper to generalize from the results of a survey based upon such a sample, for there is no way of knowing what types of bias may have been present.

probability (random) sample - all individuals have a known chance of selection. They may all have an equal chance of being selected, or, if a stratified sampling method is used, the rate at which individuals from several subsets are sampled can be varied so as to produce greater representation of some classes than others.

simple random sample - a form of sampling design in which n distinct units are selected from the N units in the population in such a way that every possible combination of n units is equally likely to be the sample selected. With this type of sampling design the probability that the ith population unit is included in the sample is n/N, so that the inclusion probability is the same for each unit. Designs other than this one may also give each unit equal probability of being included, but only here does each possible sample of n units have the same probability.

stratified random sample - this involves dividing the population into distinct subgroups according to some important characteristic, such as age or socioeconomic status, and selecting a random sample out of each subgroup. If the proportion of the sample drawn from each of the subgroups, or strata, is the same as the proportion of the total population contained in each stratum, then all strata will be fairly represented with regard to numbers of persons in the sample.

systematic sample - the procedure of selecting according to some simple, systematic rule, such as all persons whose names begin with specified alphabetic letters, born on certain dates, or located at specified points on a list. A systematic sample may lead to errors that invalidate generalizations.

Slope is used to describe the steepness, incline, gradient, or grade of a straight line. A higher slope value indicates a steeper incline. The slope is defined as the ratio of the "rise" divided by the "run" between two points on a line, or in other words, the ratio of the altitude change to the horizontal distance between any two points on the line. The slope of a line in the plane containing the x and y axes is generally represented by the letter m, and is defined as the change in the y coordinate divided by the corresponding change in the x coordinate, between two distinct points on the line. This is described by the following equation:

m = Δy / Δx
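As a small illustration of the equation above, the following Python sketch computes the slope from two points. The function name `slope` and the sample points are assumptions made for the example; the second call checks that a different pair of points on the same line gives the same value.

```python
def slope(p1, p2):
    """Return the slope m = Δy / Δx between two distinct points (x, y).

    Hypothetical helper for illustration; a vertical line has no defined slope.
    """
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("vertical line: slope is undefined")
    return (y2 - y1) / (x2 - x1)

# Illustrative points, all chosen to lie on one straight line.
m_first = slope((1.0, 3.0), (4.0, 9.0))   # rise of 6 over a run of 3
m_second = slope((0.0, 1.0), (2.0, 5.0))  # another pair on the same line
print(m_first, m_second)
```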

If y is a linear function of x, then the coefficient of x is the slope of the line created by plotting the function. Therefore, if the equation of the line is given in the form y = mx + b then m is the slope. This form of a line's equation is called the slope-intercept form, because b can be interpreted as the y-intercept of the line, the y-coordinate where the line intersects the y-axis.

A concept in inferential statistics and descriptive statistics. More properly, it is "the sum of the squared deviations". Mathematically, it is an unscaled, or unadjusted, measure of variability. When scaled for the number of degrees of freedom, it estimates the variance, or spread of the observations about their mean value. The distance from any point in a collection of data to the mean of the data is the deviation. This can be written as Xi - Xbar, where Xi is the ith data point and Xbar is the estimate of the mean. If all such deviations are squared, then summed, as in:

$$\sum_{i=1}^{n} (X_i - \bar{X})^2$$

we have the "sum of squares" for these data.

(C) There could be some nonlinear relationship between weight and demispan.

The justification is that the Pearson correlation only looks at linear relationships. A value of zero means that there is no linear relation, but there could be a nonlinear one.

For example, if the points are (-3,9), (-2,4), (-1,1), (0,0), (1,1), (2,4), (3,9), then the Pearson correlation is zero even though Y is exactly X squared.
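The example can be checked numerically. The sketch below computes the product moment correlation directly from its definition; the function name `pearson_r` is an assumption made for the illustration, and the straight-line data in the second call are added only for contrast.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's product moment correlation computed from its definition.

    Hypothetical helper for illustration; assumes neither variable is constant.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]            # the points from the example, Y = X squared
r_parabola = pearson_r(xs, ys)

# For contrast (illustrative data): points on a straight line.
r_line = pearson_r(xs, [2 * x + 1 for x in xs])
print(r_parabola, r_line)
```

The positive and negative halves of the parabola cancel in the cross-product sum, which is why the coefficient is zero here despite the exact functional relationship.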