You hear a lot today about "data dredging." What is it, and how can it be used or useful? Is it something that individuals or companies do? What is this term all about?
Data dredging is a form of mining, or fishing, through large volumes of information in order to find matching variables and establish a correlation that would help prove that one thing affects or influences another. To dredge means "to scoop out," and this is exactly what some companies do with data when they want to prove a claim true. It may sound like a common practice, and it is done a lot, but it is neither traditional nor always well-intended. If you need so badly to make a connection between one thing and another, chances are that you are trying to connect two dots that do not go together except by chance. Why, then, try to force a correlation that does not emerge naturally through quantitative research?
In typical quantitative research, you first establish a hypothesis or theory, then check for trends and identify the key variables that affect each other, whether the design is linear, comparative, volumetric, or hierarchical. The point is that you let the data speak for itself by allowing the variables to come together under a controlled scenario.
Dredging works backwards: it basically means diving into all the available data and hoping that a connection or two comes up. Once something does come up, whether the connection popped up by chance or was created under iffy circumstances, unethical researchers will claim a correlation and, to make it seem legitimate, invoke quantitative research with the dreaded words "research shows" or "scientific analysis shows." In reality, nothing concrete, predictive, or valid has actually come up; just a fluke.
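To see how easily a fluke can pop up, here is a minimal sketch in Python. Everything in it is a made-up illustration: the `dredge` function, the 20 subjects, the 200 candidate variables, and the cutoff of roughly 0.444 (approximately the value of |r| that counts as "significant" at the 5% level for 20 observations) are all assumptions, and every number is pure random noise, so any "correlation" it finds is meaningless by construction.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def dredge(n_subjects=20, n_variables=200, threshold=0.444, seed=0):
    """Compare one random outcome against many unrelated random variables.

    Returns (variable_index, r) for every variable whose |r| exceeds the
    threshold. All data is noise, so every hit is a chance finding.
    """
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_subjects)]
    hits = []
    for i in range(n_variables):
        candidate = [rng.gauss(0, 1) for _ in range(n_subjects)]
        r = pearson_r(candidate, outcome)
        if abs(r) > threshold:
            hits.append((i, r))
    return hits


hits = dredge()
print(f"{len(hits)} of 200 unrelated variables 'correlate' with the outcome")
```

With a 5% cutoff you would expect roughly one in twenty of the unrelated variables to clear the bar. A dredger reports only those and stays quiet about the rest.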
An example: think about those diet companies that love to correlate a magic plant or fruit with immediate weight loss. Since their products are not FDA approved, they can fly under the medical radar without consequences. However, those who want to claim FDA approval on their labels must submit to specific research parameters. Enter dredging. Those weight-loss companies will look through years and years of reports until they find ONE that shows a connection (however big or small) between the magical fruit and weight loss. What do they do next? They put those "scientific research" claims on their labels and then sell the product at a higher price because it is "research-based."
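The arithmetic behind "keep looking until ONE report agrees" can be sketched in a few lines of Python. This is a simplified illustration, not a model of any real company's process: the function name `chance_of_false_hit` is hypothetical, and it assumes the fruit has no real effect, the reports are independent of each other, and each one has a 5% chance of showing a connection purely by luck.

```python
def chance_of_false_hit(n_reports, alpha=0.05):
    """Probability that at least one of n_reports shows a chance 'connection'.

    Assumes independent reports, no real effect, and a per-report
    false-positive rate of alpha.
    """
    return 1 - (1 - alpha) ** n_reports


for n in (1, 20, 100):
    print(f"{n:>3} reports searched: {chance_of_false_hit(n):.1%} chance of a fluke hit")
```

Under these assumptions, the more reports you dig through, the closer you get to a near certainty of finding one that "proves" the claim.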
How could dredging be useful? Dredging can definitely be useful to the medical community during a sudden outbreak, epidemic, or pandemic, when there is an urgent need to look through years of medical data to determine whether there has ever been a historical connection between two variables. For example, when Ebola seemed to come out of nowhere and began to kill people in large numbers, the medical community surely needed to search quickly through years of records to see when else Ebola outbreaks had been reported and whether there were any known connections to what may have caused them. This type of dredging is excellent.
Therefore, since the data is "there" and dredging, as a practice, is not illegal, some research groups surely engage in it for the purposes discussed previously, that is, to drag out a correlation or two that helps support a claim. Yet a properly followed scientific process precludes this practice because, when research is done well, dredging is completely unnecessary.