A test is reliable if it gives the same value every time it is repeated. A test is valid if it is truly measuring what the researcher thinks it is measuring.
A test can be reliable but not valid. Let's say I wanted to measure how smart people are by measuring their heads. I'd get the same value (in inches or centimeters or whatever) every time I measured their head so the test would be reliable. But I wouldn't be actually measuring their intelligence so the test wouldn't be valid.
A test cannot be valid if it's not reliable. If the test is not reliable, that means it gives different results every time I do it. If it keeps giving different results, it cannot possibly be measuring what I think it is. Let's say I give a person the same intelligence test several times and they get wildly different results each time. Clearly, the test is not really measuring intelligence, because if it truly measured intelligence it would have to yield results that were nearly the same every time (because we assume a person's intelligence doesn't change from moment to moment).
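One common way to put numbers on these two ideas is with correlations: how closely two rounds of the same measurement agree (reliability), and how closely the measurement tracks a separate measure of the thing you care about (validity). The sketch below does this for the head-measuring example. All of the numbers are invented for illustration, and the names `pearson`, `head_cm_first`, `head_cm_second`, and `other_intelligence_score` are assumptions of this sketch, not part of the answer above.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Invented numbers for illustration only (not real data).
# Head circumference in cm for six people, measured on two occasions.
head_cm_first = [55.0, 57.5, 56.0, 58.0, 54.5, 56.5]
head_cm_second = [55.1, 57.4, 56.0, 58.1, 54.4, 56.6]
# Hypothetical scores for the same six people on a separate intelligence measure.
other_intelligence_score = [100, 95, 110, 108, 104, 97]

# Reliability: do repeated measurements agree with each other?
reliability = pearson(head_cm_first, head_cm_second)
# Validity: does the measurement track what we actually care about?
validity = pearson(head_cm_first, other_intelligence_score)

print(f"reliability (first vs. second measurement): {reliability:.2f}")
print(f"validity (head size vs. other measure): {validity:.2f}")
```

With these made-up numbers the first correlation comes out close to 1 and the second close to 0, which is the "reliable but not valid" pattern described above.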
The answer above gives an excellent definition of reliable and unreliable tests, what they are, and how they work. Because of this, I am going to elaborate a little more on your question. An unreliable test is simply unreliable, but how and why can a reliable test become or be unreliable?
First, I will talk about how a test is conducted. (Even though your question is about the Social Sciences, it is a little easier to understand if I talk about Science. A bit further down I will change the process over to the Social Sciences.)
Let us begin with an elementary science experiment: I place Plant A in the sunlight. I place Plant B under a box. Over the course of a week, I monitor both plants in order to discover how important light is to a plant's growth.
Part of the reliability of the test depends on whether all the control factors are the same. Jimmy, who is running the test, must keep all other variables the same. Both plants need to be watered on the same schedule, one that is optimal for the plant type. Jimmy must also maintain the integrity of the test: he cannot leave the box off on some days because he "forgets." If he does, the test becomes unreliable because its integrity has not been maintained; it is invalid. Invalid tests are unreliable because no conclusions can be drawn from them.
If Jimmy maintains the integrity of the test, then it becomes valid, and thus reliable on the surface. However, researchers know that a single test could give misleading results. (Maybe one plant is particularly hardy or has a genetic mutation.) Repeating the experiment and getting the same result each time makes the test more reliable. If I conducted the plant experiment three more times and all three results matched my first experiment, I would increase my test's reliability. If the test is run 300 times and half of the time the plant under the box lived and half of the time it died, the test would be deemed unreliable. Researchers would recognize that something in the experiment is askew and must be resolved, and that the experiment must be re-run before conclusions can be drawn. The closer the experiment gets to 100% (the same result every time), the more reliable the experiment.
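The idea of "how close to 100% the same" can be turned into a simple number: the share of runs that ended with the most common outcome. This is only a rough sketch of that idea, not a standard reliability statistic, and the function name `agreement_rate` and the outcome labels are made up for the example.

```python
from collections import Counter

def agreement_rate(outcomes):
    """Share of runs that ended with the most common outcome (0.0 to 1.0)."""
    if not outcomes:
        raise ValueError("need at least one outcome")
    most_common_count = Counter(outcomes).most_common(1)[0][1]
    return most_common_count / len(outcomes)

# Hypothetical outcomes for the plant under the box across 300 runs.
split_runs = ["lived"] * 150 + ["died"] * 150   # half lived, half died
consistent_runs = ["died"] * 300                # same result every time

print(agreement_rate(split_runs))       # 0.5
print(agreement_rate(consistent_runs))  # 1.0
```

A rate near 0.5 with two possible outcomes means the result is no more consistent than a coin flip; a rate near 1.0 means the experiment keeps giving the same answer.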
In the Social Sciences, tests are not as easy to control, because most of the time you are dealing with some form of human nature. Every effort must be made to keep the integrity of the test: using a control group, selecting subjects at random, following blind or double-blind protocols, and maintaining ethics. If these are not followed, the test is considered invalid and thus unreliable.
Results from a test that is invalid (unreliable) cannot be made reliable. The process whereby the results were obtained is corrupted, thus they cannot be used.
However, if the experiment is conducted properly and on the surface is considered valid, but the results are conflicting, it becomes unreliable. Performing multiple tests increases the reliability of the test. As with the plant experiment, if the test is performed 300 times and 90% of the results are the same, the reliability is established. The larger the discrepancy between experiments, the more unreliable the test becomes.
For example: Let us say I observe a kindergarten class to determine whether self-control is an important skill for classroom learning in kindergarten. My research assistant notes the behavior of each child: focus, following directions, and behavior issues (temper tantrums) are observed and documented over the course of a month. A different research assistant (who knows nothing of the first research assistant's work) is then assigned to take each kindergartner into a room and put a cookie on a plate. He then tells the kindergartner that he is going to leave the room and will come back in three minutes. If the cookie is still on the plate, the child will get two cookies instead of one, but if the child eats the cookie, he/she will not get another. The research assistant leaves the room, and the child is observed (filmed or monitored) by a third research assistant. The child's reaction to the test is noted and the end result of the experiment is documented. (Who received two cookies and who ate the one?) The results are compared to the first researcher's notes. Did the students who did better in class wait for the second cookie?
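The final comparison step could be sketched as follows, assuming the first assistant's notes have been boiled down to a single classroom score per child. That scoring step is an assumption of this sketch (the example above does not say how the notes are summarized), and the records, the 0-10 scale, and the name `mean_score_by_outcome` are all invented for illustration.

```python
from statistics import mean

def mean_score_by_outcome(records):
    """Average classroom score for children who waited vs. those who ate the cookie."""
    waited = [r["classroom_score"] for r in records if r["waited"]]
    ate = [r["classroom_score"] for r in records if not r["waited"]]
    return {
        "waited": mean(waited) if waited else None,
        "ate": mean(ate) if ate else None,
    }

# Invented records for illustration only (not real data).
# classroom_score: hypothetical 0-10 summary of the first assistant's notes.
# waited: True if the cookie was still on the plate after three minutes.
records = [
    {"child": "A", "classroom_score": 8, "waited": True},
    {"child": "B", "classroom_score": 4, "waited": False},
    {"child": "C", "classroom_score": 7, "waited": True},
    {"child": "D", "classroom_score": 5, "waited": False},
    {"child": "E", "classroom_score": 6, "waited": True},
]

summary = mean_score_by_outcome(records)
print(summary)
```

If the children who waited have a clearly higher average classroom score than the children who ate the cookie, that is the pattern the question at the end of the example is asking about. A real study would need far more children and a proper statistical test before drawing any conclusion.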
As long as the protocols are maintained, this is a valid experiment. However, in order to establish the reliability of the research, the study should be conducted multiple times with exactly the same protocols. The higher the frequency of similar results, the more reliable the test and its results. If the results vary widely with each experiment, or if a separate researcher finds conflicting results in a second experiment, the test is considered unreliable. However, if a third and fourth researcher find the same results as the first researcher, the reliability is restored and the second researcher's results are called into question. (This is common in the Social Sciences.)