Hansen, Mary A
(2004)
Predicting the Distribution of a Goodness-Of-Fit Statistic Appropriate For Use With Performance-Based Assessments.
Doctoral Dissertation, University of Pittsburgh.
(Unpublished)
Abstract
One aspect of evaluating model-data fit in the context of Item Response Theory involves assessing item fit using chi-square goodness-of-fit tests. In the current study, a goodness-of-fit statistic appropriate for assessing item fit on performance-based assessments was investigated. The statistic used a pseudo-observed score distribution that drew on examinees' entire posterior distributions of ability to form item fit tables. Because of dependencies in the pseudo-observed score distribution, or pseudocounts, the statistic could not be tested for significance against a theoretical chi-square distribution. However, past research suggested that the Pearson and likelihood ratio forms of the pseudocounts-based statistic (χ²* and G²*) may follow scaled chi-square distributions.
The purpose of this study was to determine whether item and sample characteristics could be used to predict the scaling corrections needed to rescale χ²* and G²* statistics so that significance tests against theoretical chi-square distributions were possible. Test length (12, 24, and 36 items) and number of item score category levels (2- to 5-category items) were manipulated. Sampling distributions of χ²* and G²* statistics were generated, and scaling corrections obtained using the method of moments were applied to the simulated distributions. Two multilevel equations for predicting the scaling corrections (a scaling factor and a degrees of freedom value for each item) were then estimated from the simulated data.
Overall, when scaling corrections were obtained with the method of moments, sampling distributions of rescaled χ²* and G²* statistics closely approximated theoretical chi-square distributions across test configurations. Scaling corrections obtained using the multilevel prediction equations did not adequately rescale simulated χ²* distributions for 2- to 5-category tests, or simulated G²* distributions for 2- and 3-category tests.
Applications to real items showed that the prediction equations were inadequate across score category levels when χ²* was used, and for 2- and 3-category items when G²* was used. However, for 4- and 5-category tests, the predicted scaling corrections did adequately rescale empirical sampling distributions of G²* statistics. In addition, applications to real items indicated that use of the multilevel prediction equations with G²* would result in correct identification of item misfit for 5-category items, and potentially for 4-category items.
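As a rough illustration of the method-of-moments rescaling mentioned in the abstract, the sketch below assumes a fit statistic X that follows a scaled chi-square distribution, X ≈ γ·χ²(ν). Matching the first two moments (mean = γν, variance = 2γ²ν) gives γ = variance / (2·mean) and ν = 2·mean² / variance. The function name `moment_scaling`, the simulated sample, and the chosen values of γ and ν are hypothetical and for illustration only. The abstract does not spell out the dissertation's exact estimator, so this is the textbook two-moment version, not necessarily the author's procedure.

```python
import random
import statistics


def moment_scaling(stats):
    """Estimate (scale, df) by matching the sample mean and variance
    to those of a scaled chi-square variable, scale * chi2(df)."""
    mean = statistics.fmean(stats)
    var = statistics.variance(stats)
    scale = var / (2.0 * mean)       # gamma = variance / (2 * mean)
    df = 2.0 * mean ** 2 / var       # nu = 2 * mean^2 / variance
    return scale, df


# Hypothetical sampling distribution: 0.6 * chi2(8), drawn as a gamma
# variate, since chi2(nu) is Gamma(shape=nu/2, scale=2).
rng = random.Random(0)
true_scale, true_df = 0.6, 8.0
sample = [true_scale * rng.gammavariate(true_df / 2.0, 2.0)
          for _ in range(20000)]

scale, df = moment_scaling(sample)

# Dividing each statistic by the estimated scale gives values that can
# be referred to a chi-square distribution with df degrees of freedom.
rescaled = [x / scale for x in sample]
```

In this setup the rescaled statistic X/γ is compared against a chi-square distribution with ν degrees of freedom, where ν need not be an integer.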
Details
Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors: Hansen, Mary A
Date: 13 December 2004
Date Type: Completion
Defense Date: 14 July 2004
Approval Date: 13 December 2004
Submission Date: 11 December 2004
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Institution: University of Pittsburgh
Schools and Programs: School of Education > Psychology in Education
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Goodness-of-Fit; Graded Response Model; Item Fit; Item Response Theory
Other ID: http://etd.library.pitt.edu/ETD/available/etd-12112004-230948/, etd-12112004-230948
Date Deposited: 10 Nov 2011 20:10
Last Modified: 19 Dec 2016 14:38
URI: http://d-scholarship.pitt.edu/id/eprint/10311