Classical and Item-Response Theories
Comparing Classical and Item-Response Theories
Item analysis is one of the most important elements of test construction. Statistical techniques can be used to carefully examine how a test item functions. This analysis can reveal how easy or difficult an item is and how well the item discriminates between test-takers. Item analysis can also reveal whether an item functions similarly when administered to different populations or when it is translated into another language. Classical test theory (CTT) and item-response theory (IRT) are the two commonly used methods of assessing test item characteristics.
Read a selection of your colleagues’ postings.
Respond by Day 6 to at least two of your colleagues’ postings in one or more of the following ways:
- Ask a probing question.
- Share an insight from having read your colleague’s posting.
- Offer and support an opinion.
- Validate an idea with your own experience.
- Make a suggestion.
- Expand on your colleague’s posting.
Classmate 1 (Samantha):
“CTT and IRT are both widely used methods of item analysis in testing. CTT, or classical test theory, aims to develop the reliability of psychological tests through the performance of the test-taker, as well as through the difficulty level of the test items (Pearson, 2015). This theory works with three scores: the test score (observed score), the true score, and the error score. The test score, or observed score, is just that: the raw score the test-taker received on the assessment. The true score is the score the test-taker would be expected to achieve. Finally, the error score is the difference between the observed score and the true score. Together, these three scores give an overall view of an individual’s performance on a testing instrument. CTT has been widely used across decades of research and is available to most testers (Hambleton & Jones, 1993).
CTT has three main disadvantages. First, it is dependent on population samples for its scores. This means that reliability estimates and item difficulty rely on test scores from a sample population (Pearson, 2015). Second, scores are difficult to compare across different test measures, because the scaling does not carry over from one test to another (Kean & Reilly, 2014). Third, the scale reliability, computed by Cronbach’s coefficient alpha, increases as more test items are added (Kean & Reilly, 2014), which can lead to an unnecessarily long assessment.
IRT, or item response theory, differs from CTT by addressing each individual test item rather than the test as a whole. Just as its name suggests, IRT focuses on the relationship between the difficulty of each item and the test-taker’s abilities. Latent traits are assumed to influence how the test-taker answers the questions (Olufemi, 2013). A major benefit of IRT is the ability to tailor the questions while still being able to compare results across individuals (Pearson, 2015).
One disadvantage of IRT is that it is complex. It requires time and effort to examine each test item, which can be challenging if a quick analysis is required. Another disadvantage is the large sample size needed for estimation, compared to the relatively small sample size utilized in CTT (Hambleton & Jones, 1993).
For my final project, I’m trying to discern whether honesty is an inherent trait or a learned behavior. IRT would allow me to tailor my questions to fit more populations. For instance, I could compare different cultures that may hold different values on honesty. I could also include “marker” questions, which would help to detect any false “good” answers on the test, thus decreasing erroneous or biased answers. Being able to tailor the test is a real benefit when dealing with many different populations.
A test is quicker to analyze under CTT, and within the confines of this class, time is of the essence. CTT is simpler in concept, and I believe the test I’ve created aligns more appropriately with CTT. Utilizing a Likert scale, I have an ordinal measure; that is, the scores can be ranked (5 scores higher than 4, 4 scores higher than 3, etc.). That ordinal measure fits with the CTT model. With 50 questions, my test is quite lengthy. However, this increases the reliability of my test in CTT (Kean & Reilly, 2014).”
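The point above about test length and Cronbach’s alpha can be illustrated with a small simulation. This is only a sketch under stated assumptions, not an analysis of anyone’s actual questionnaire: the data are simulated, each item score is generated as a shared true score plus independent random error (the CTT idea of observed score = true score + error), and the function names `cronbach_alpha` and `simulate_responses` are hypothetical helpers written for this example.

```python
import random
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(scores[0])  # number of items
    # Variance of each item across respondents.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Variance of the total (summed) score across respondents.
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def simulate_responses(n_people, n_items, seed=0):
    """Simulated data (assumption): item score = true score + random error."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_people):
        true_score = rng.gauss(0, 1)
        data.append([true_score + rng.gauss(0, 1) for _ in range(n_items)])
    return data


alpha_short = cronbach_alpha(simulate_responses(500, 5))
alpha_long = cronbach_alpha(simulate_responses(500, 50))
print(f"alpha with 5 items:  {alpha_short:.2f}")
print(f"alpha with 50 items: {alpha_long:.2f}")
```

Under these assumptions, alpha should come out noticeably higher for the 50-item version even though the individual items are no better, which is the trade-off between reliability and test length described above.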
Classmate 2 (Amy):
“Classical Test Theory (CTT) is a psychometric theory used to develop the reliability of psychological tests and assessments. CTT works from an individual’s test-taking ability and the difficulty level of the questions. Test reliability provides a more in-depth score, which is essentially the main aim of the classical theory. An advantage of CTT is that this type of testing provides organizations with more valuable ways of assessing a group working together versus individuals. A disadvantage of CTT is that mistakes can occur within the process of testing, such as having too few candidates, which does not fit well with the CTT assessment model. A second disadvantage is that scoring in classical test theory does not take item difficulty into account. A third disadvantage concerns adaptive testing: CTT does not support it in most cases.
Item Response Theory (IRT) differs from CTT in that it assesses the individual test items rather than the entire test, which supports accurate test scoring and the development of assessments and questionnaires (Anastasi, 1997). An advantage of IRT is that it is more accurate than CTT and can improve individuals’ test scores, as it is targeted on abilities, attitudes, or other traits (Thissen, 1988). IRT is utilized in the school system and is most recognizable in standardized tests such as the Scholastic Aptitude Test (SAT). A disadvantage of IRT is its complexity and the excessive amount of research information involved (Clark, 2006).
My final project will measure volunteers’ skills and talents versus the positions they volunteer in, to find out whether they are in positions that utilize those skills and talents. I have constructed a questionnaire, which will be scored using a Likert scale. While IRT would provide a better assessment of the questionnaire results, CTT would be the better model to use because of time constraints and because it is possibly less complex.”
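The item-level focus of IRT described in both postings can be made concrete with a short Python sketch. It assumes the two-parameter logistic (2PL) model for an item scored right/wrong, which is only one of several IRT models; Likert-type items like those in the projects above would need a polytomous model such as the graded response model instead. The function name `prob_correct` and the item parameter values are made up for illustration.

```python
import math


def prob_correct(theta, a, b):
    """2PL item response function.

    theta: test-taker's latent trait level
    a:     item discrimination (how sharply the item separates test-takers)
    b:     item difficulty (trait level at which the probability is 0.5)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item parameters, chosen only for illustration.
easy_item = {"a": 1.0, "b": -1.0}
hard_item = {"a": 1.0, "b": 1.5}

for theta in (-2.0, 0.0, 2.0):
    p_easy = prob_correct(theta, **easy_item)
    p_hard = prob_correct(theta, **hard_item)
    print(f"theta={theta:+.1f}  easy={p_easy:.2f}  hard={p_hard:.2f}")
```

Because each item carries its own difficulty and discrimination parameters on the same trait scale, test-takers who answer different sets of items can still be compared, which is what makes tailored or adaptive testing possible under IRT.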
I had the same understanding of IRT as you indicated, particularly regarding its item-by-item examination of responses. This makes it easier to identify the impact of each response and thereby determine its validity in a test (Suzuki et al. 345). However, as you put it, it is considerably complex…