# "Math, statistics & psychology" course essay topic. Please explain the following in simple layman's terms: How can you assess the amount of variation in one variable that is accounted for by the other? Also, why is this important to know?

In an experimental study (in the field of psychology or indeed any scientific field) variables are broadly split into two types: dependent and independent.

The dependent variable is the outcome variable of interest in the study. Changes in the dependent variable across various scenarios are measured, the scenarios being governed by experimenter-controlled adjustments to the independent variables.

Hence the independent variables are altered and controlled in a measured way in the experiment and the effect on the dependent variable is measured accordingly.

An assumption made when carrying out a scientific study is that there is an underlying relationship or equation relating the dependent and independent variables. There is also an understanding that, when observing real-world values of the variables, random noise will be naturally present. The experimenter hopes to see through the experimental noise to the underlying relationship between the variables. Bias in the measurement process should either be anticipated with accuracy or be kept to a minimum by taking sufficient precautions. Controlled randomized studies are the best way to achieve this, where the independent variables are fully controlled by the experimenter, study subjects are randomized to reduce individual bias and outcomes are measured with accuracy and good forward planning.

The decision as to which variables to choose as independent ones and which as dependent is governed of course by practical constraints, but most importantly the causal direction should make logical sense. It should be that the experimenter is assessing how the independent variables cause associated change in the dependent variable(s) and not the other way round.

The mathematical method to assess the causal relationship between dependent and independent variables is called statistical regression analysis.

In this, the dependent variable (Y) is regressed on the independent variable (X). X and Y might each be a single variable or could be vector-valued, that is, have more than one component variable. By using the Gaussian method of least squares, the measured variation in Y conditional on the controlled variation in X is quantified so that, in the most basic case, we can write the linear relationship

Y ~ a + bX

that is, Y is approximately (where approximate means stochastic as opposed to deterministic) related causally to X by a straight line with intercept 'a' and slope 'b'. If X describes/causes Y in a statistically significant way, the slope coefficient 'b' is significantly different from zero. This is found if the F-test for the regression analysis gives a p-value less than or equal to the chosen type I error rate alpha (alpha is usually set to 5%, i.e. 0.05, or tighter still at 1%, i.e. 0.01).
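To make the straight-line fit concrete, here is a minimal sketch in plain Python. The data are invented purely for illustration, and the function names `fit_line` and `f_statistic` are my own, not part of any library. It finds the least-squares intercept 'a' and slope 'b', then computes the F statistic that the F-test is based on.

```python
# Toy data, invented for illustration only (not from any real study).
x = [1.0, 2.0, 3.0, 4.0, 5.0]   # independent variable, set by the experimenter
y = [2.1, 3.9, 6.2, 7.8, 10.1]  # dependent variable, measured with some noise


def fit_line(x, y):
    """Return (a, b) that minimise the sum of squared gaps between y and a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b


def f_statistic(x, y, a, b):
    """F statistic for a one-predictor regression: explained vs. leftover variation."""
    n = len(x)
    mean_y = sum(y) / n
    fitted = [a + b * xi for xi in x]
    ss_regression = sum((fi - mean_y) ** 2 for fi in fitted)
    ss_residual = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    # 1 degree of freedom for the slope, n - 2 for the residuals
    return (ss_regression / 1) / (ss_residual / (n - 2))


a, b = fit_line(x, y)
f = f_statistic(x, y, a, b)
print(f"intercept a = {a:.2f}, slope b = {b:.2f}, F = {f:.1f}")
```

The larger F is, the smaller the p-value. With five data points, the tabulated 5% critical value for F with 1 and 3 degrees of freedom is about 10.13, so an F above that corresponds to p < 0.05. In practice a statistics routine such as `scipy.stats.linregress` reports the slope, intercept and p-value directly.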

If X has a significant effect on Y then it accounts for variation observed when measuring Y. In that case 'b' would be statistically significantly different from zero. In cases where there are a few independent variables included in a study (this is of course common), those with numerically larger 'b' coefficients (amongst variables of comparable significance, and measured on comparable scales) are of more practical significance in that they result in bigger changes in Y.

If the slope 'b' for any independent variable is not seen to be significantly different from zero in the regression analysis, this suggests that the study has found no evidence that the independent variable in question provides relevant measurable information about Y.

The aim of scientific studies is to find independent variables that explain away (stochastic) variability in the dependent variable(s). If all independent variables of value are found then simple white noise should be left once the variability due to the independent variables is subtracted out of the total variability of the dependent variable. This is a perfect statistical model, rarely seen in practice.
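A common way to put a single number on "the amount of variation in Y accounted for by X" is the coefficient of determination, R-squared. It is the share of the total variability in Y that remains once the unexplained (residual) variability has been subtracted out. The sketch below is only an illustration on invented numbers, and the function name `r_squared` is my own choice.

```python
# Toy data, invented for illustration only (not from any real study).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]


def r_squared(x, y):
    """Proportion of the variation in y accounted for by a straight-line fit on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Least-squares slope and intercept
    b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
        (xi - mean_x) ** 2 for xi in x
    )
    a = mean_y - b * mean_x
    # Total variability of y around its own average
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    # Variability left over after the line has been fitted (the "noise")
    ss_residual = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return 1 - ss_residual / ss_total


r2 = r_squared(x, y)
print(f"R-squared = {r2:.3f}")
```

An R-squared close to 1 means almost all of the variation in Y is accounted for by X. A value close to 0 means X explains very little and most of the variation is left over as noise or as the effect of variables not in the model.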

Once dependent variables can be stochastically modelled by the independent variables they depend causally on, the dependent variables can be used as surrogates for the independent variables. This is particularly useful if, say, the independent variables are expensive or difficult to measure. More generally, causal trees of measurable variables build up a picture of a field of study. If any key variable on the tree is missing in future studies, its value can be imputed using knowledge of the causal tree worked on in earlier studies. Meta-analysis, where research results from similar competing studies are merged together, aims to rework the tree so that it becomes more accurate and hence more useful.

I attach a link to a BBC programme website, Cats versus Dogs, where they worked with teams of scientists to compare differences in behaviour between cats and dogs. The key independent variable is the type of animal, and the dependent variables in each experiment address various attributes we typically associate with one or the other animal. These experiments challenge traditional perceptions about the traits the two animals have.