Validity refers to the point up to which the test that is being taken is actually testing and measuring what is supposed to. There is no specific definition other than this, considering that the etymology of the word "valid" is a disambiguation of the word "strength" or "strong". So the word valid entails the "substance" of the material contained in the test.
The most important aspects of validity for the tester should be the factors that will guarantee the tester that the assessment tool that is being used is going to generate data that is needed to best serve the tester.
Messick (1989, 1996) is your "go-to" source when it comes to aspects of validity. According to his article Validity of Performance Assessment he cites the following six importance factors of validity:
- Content validity- Does assessment task represent the actual construct that it is testing? Are the questions really asking what they are supposed to be asking?
- Substantial/Substansive validity- Does the test sample questions from different types of domains and skills or is it testing just one skill in many different ways? If so, the data will be eschewed.
- How are the tasks being evaluated/assessed? (Assessment validity/Structure Scoring)- There should be enough content knowledge in order to be able to appropriately assess and "score" a task.
- Does the task have Generalizability?- Tests should never be "too exclusive". The tasks within should correlate to other tasks and the test should be like a science project:it can be flexibly changed and contoured to whatever change is needed.
- External Factors-How are the empirical relationships between the scores of this assessment and other assessments.
- Consequential Aspects of Validity-Is the data interpretation free from bias and pre-conceived notions? Is the data going to be used for relevant information related to the tested tasks?
All of these aspects of statistical validity ensure that the assessment is asking what it is supposed to ask, and that the information contained targets the skills that need to be tested. It also prevents biased interpretation of data or irrelevant use of data for ulterior purposes.