Research Article

Comparing Parameters of Many-Facet Rasch Measurement Model and Hierarchical Rater Model

Year 2017, Volume: 6 Issue: 2, 777 - 798, 25.04.2017

Abstract

This study aims to estimate parameters with the many-facet Rasch measurement model (MFRMM) and the hierarchical rater model (HRM), and to evaluate jointly the rater severity/leniency and variability parameters obtained from both models when responses given by the same examinees to open-ended items are scored by multiple raters. The data consist of the scores assigned by five secondary school mathematics teachers to the responses of 380 students, aged 15, from 10 schools in the Çankaya district of Ankara province, to eight open-ended items during the second semester of the 2012-2013 academic year. The study revealed that the rater parameters of the MFRMM and the HRM were similar in general. According to the deviance information criterion values for both models, it was concluded that the HRM fits the data better than the MFRMM and that the structure of multiple scores assigned to a single response to a single item is reflected better by the HRM.
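
For orientation, the formulations below give a minimal sketch of the two models and of the comparison index named in the abstract, written in conventional notation drawn from the cited literature (Linacre, 1989; Patz, Junker, Johnson & Mariano, 2002; Spiegelhalter, Thomas, Best & Lunn, 2003). The symbols are generic; the exact parameterization, priors, and estimation settings used in this study may differ.

% Many-facet Rasch measurement model (rating-scale form): log-odds of examinee n
% receiving category k rather than k-1 on item i from rater j, with
% theta_n = examinee ability, delta_i = item difficulty,
% lambda_j = rater severity/leniency, tau_k = threshold of category k.
\[
\log\frac{P_{nijk}}{P_{nij(k-1)}} = \theta_n - \delta_i - \lambda_j - \tau_k
\]

% Hierarchical rater model: an "ideal" rating xi_{ni} for examinee n on item i
% follows a polytomous IRT model (e.g., a partial credit model); the observed
% score X_{nij} from rater j is then generated with a severity/leniency shift
% phi_j and a rater variability psi_j.
\[
P\bigl(X_{nij}=k \mid \xi_{ni}\bigr) \propto
\exp\!\left\{-\frac{\bigl[k - (\xi_{ni} + \phi_j)\bigr]^{2}}{2\psi_j^{2}}\right\}
\]

% Deviance information criterion: smaller DIC indicates better fit after
% penalizing the effective number of parameters p_D.
\[
\mathrm{DIC} = \overline{D(\theta)} + p_D, \qquad
p_D = \overline{D(\theta)} - D(\bar{\theta}), \qquad
D(\theta) = -2 \log L(\theta \mid y)
\]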

References

  • Airasian, P.W. (2001). Classroom assessment: Concepts and applications. Boston: McGraw-Hill.
  • Akın, Ö. & Baştürk, R. (2012). Keman Eğitiminde Temel Becerilerin Rasch Ölçme Modeli İle Değerlendirilmesi. Pamukkale Üniversitesi Eğitim Fakültesi Dergisi, 31 (31), 175-187.
  • Andrich, D. (1978). A rating formulation for ordered response categories. Psychometrika, 43(4), 561-573.
  • Atılgan, H. (2005b). Müzik öğretmenliği özel yetenek seçme sınavının çok-yüzeyli rasch modeli ile analizi (İnönü üniversitesi örneği). Eurasian Journal of Educational Measurement, 20, 62 – 73.
  • Brennan, R.L. (1992). Generalizability theory. Educational Measurement: Issues and Practice, 11(4), 27-34.
  • Brennan, R.L. (1997). A perspective on the history of generalizability theory. Educational Measurement: Issues and Practice, 16(4), 14-20.
  • Brennan, R.L. (2010). Generalizability theory and classical test theory. Applied Measurement in Education. 24(1), 1-21.
  • Cardinet, J., Tourneur, Y. & Allal, L. (1981). Extension of generalizability theory and its applications in educational measurement. Journal of Educational Measurement, 18(4), 183-204.
  • Casabianca, J.M. & Junker, B. (2013). Hierarchical rater models for longitudinal assessments. Paper presented at the annual meeting of the National Council on Measurement in Education, San Francisco, California.
  • Casabianca, J.M. & Junker, B. (2014). The hierarchical rater model for evaluating changes in traits over time. Paper presented at the 121st Annual Convention of the American Psychological Association, Division 5: Evaluation, Measurement and Statistics, Washington, DC.
  • Christensen, R., Johnson, W., Branscum, A. & Hanson, T.E. (2011). Bayesian ideas and data analysis: An introduction for scientists and statisticians. CRC Press, USA.
  • DeCarlo, L.T. (2005). A model of rater behavior in essay grading based on signal detection theory. Journal of Educational Measurement, 42(1), 53-76.
  • DeCarlo, L.T. (2010). Studies of a latent class signal detection model for constructed response scoring II: Incomplete and hierarchical designs. ETS Research Report Series, (08). Princeton, NJ: Educational Testing Service.
  • DeCarlo, L.T., Kim, Y.K. & Johnson, M.S. (2011). A hierarchical rater model for constructed responses, with a signal detection rater model. Journal of Educational Measurement, 48(3), 333-356.
  • Eckes, T. (2005). Examining rater effects in TestDaF writing and speaking performance assessments: A many-facet Rasch analysis. Language Assessment Quarterly: An International Journal, 2(3), 197-221.
  • Engelhard, G. (1994). Examining rater errors in the assessment of written composition with a many-faceted Rasch model. Journal of Educational Measurement, 31(2), 93-112.
  • Engelhard, G. & Myford, C.M. (2003). Monitoring faculty consultant performance in the Advanced Placement English Literature and Composition Program with a many-faceted Rasch model. ETS Research Report Series, (01). Princeton, NJ: Educational Testing Service.
  • Gelman, A., Carlin, J.B., Stern, H.S. & Rubin, D.B. (1995). Bayesian data analysis. New York, NY: Chapman & Hall.
  • Iramaneerat, C., Myford, C.M., Yudkowsky, R. & Lowenstein, T. (2009). Evaluating the effectiveness of rating instruments for a communication skills assessment of medical residents. Advances in Health Sciences Education, 14(4), 575-594.
  • Jonsson, A. & Svingby, G. (2007). The use of scoring rubrics: Reliability, validity and educational consequences. Educational Research Review, 2(2), 130-144.
  • Kastner, M. & Stangla, B. (2011). Multiple choice and constructed response tests: Do test format and scoring matter? Procedia-Social and Behavioral Sciences, 12, 263-273.
  • Kéry, M. (2010). Introduction to WinBUGS for ecologists: Bayesian approach to regression, ANOVA, mixed models and related analyses. USA: Academic Press.
  • Kim, Y.K. (2009). Combining constructed response items and multiple choice items using a hierarchical rater model (Doctoral dissertation). Teachers College, Columbia University.
  • Liddle, A.R. (2007). Information criteria for astrophysical model selection. Monthly Notices of the Royal Astronomical Society: Letters, 377(1), 74-78.
  • Linacre, J.M. (1989). Many-facet Rasch measurement (Doctoral dissertation). University of Chicago, Chicago.
  • Linacre, J.M., Wright B.D. & Lunz M.E. (1990). A Facets Model of Judgmental Scoring. Memo 61. MESA Psychometric Laboratory. University of Chicago. www.rasch.org/memo61.html.
  • Linacre, J.M. (1994). Many-facet Rasch measurement. Chicago: Mesa Press.
  • Linacre, J.M. (2003). The hierarchical rater model from a Rasch perspective. Rasch Measurement Transactions (Transactions of the Rasch Measurement SIG American Educational Research Association), 17(2), 928.
  • Lund, J.L. & Veal, M.L. (2013). Assessment-driven instruction in physical education with web resource: A standards-based approach to promoting and documenting learning. Human Kinetics.
  • Lynch, B.K. & McNamara, T.F. (1998). Using G-theory and many-facet Rasch measurement in the development of performance assessments of the ESL speaking skills of immigrants. Language Testing, 15(2), 158-180.
  • Mariano, L.T. (2002). Information accumulation, model selection and rater behavior in constructed response student assessments (Doctoral dissertation). Carnegie Mellon University, Pennsylvania.
  • Mariano, L.T. & Junker, B.W. (2007). Covariates of the rating process in hierarchical models for multiple ratings of test items. Journal of Educational and Behavioral Statistics, 32, 287–314.
  • Mertler, C.A. (2001). Designing scoring rubrics for your classroom. Practical Assessment, Research & Evaluation, 7(25), 1-10.
  • Messick, S. (1994). The interplay of evidence and consequences in the validation of performance assessments. Educational Researcher, 23(2), 13-23.
  • Nakamura, Y. (2000). Many-facet Rasch based analysis of communicative language testing results. Journal of Communication Students, 12, 3-13.
  • Patz, R.J. & Junker, B.W. (1999a). The hierarchical rater model for rated test items and its application to large-scale assessment data. Paper presented at the annual meeting of the American Educational Research Association, Montreal, Quebec, Canada.
  • Patz, R.J. & Junker, B.W. (1999b). A straightforward approach to Markov chain Monte Carlo methods for item response models. Journal of Educational and Behavioral Statistics, 24(2), 146-178.
  • Patz, R.J., Junker, B.W. & Johnson, M.S. (2000). The hierarchical rater model for rated test items and its application to large-scale educational assessment data. Revised AERA paper.
  • Patz, R.J., Junker, B.W., Johnson, M.S. & Mariano, L.T. (2002). The hierarchical rater model for rated test items and its application to large-scale educational assessment data. Journal of Educational and Behavioral Statistics, 27(4), 341-384.
  • Popham, W.J. (1997). What's wrong-and what's right-with rubrics. Educational Leadership, 55, 72-75.
  • Popham, W.J. (2008). Classroom assessment: What teachers need to know. USA: Pearson Education.
  • Quinlan, A.M. (2011). A complete guide to rubrics: Assessment made easy for teachers, K-college. R&L Education.
  • Roid, G.H. & Haladyna, T.M. (1982). A technology for test-item writing. New York: Academic Press.
  • Rodriguez, M.C. (2002). Choosing an item format. In G. Tindal & T.M. Haladyna (Eds.), Large-scale assessment programs for all students (pp. 213-231). New Jersey: Lawrence Erlbaum Associates Publishers.
  • Spiegelhalter, D., Thomas, A., Best, N. & Lunn, D. (2003). WinBUGS user manual.
  • Stevens, D. & Levi, A. (2005). Introduction to rubrics. Sterling, Va.: Stylus Pub.
  • Sudweeks, R.R., Reeve, S. & Bradshaw, W.S. (2004). A comparison of generalizability theory and many-facet Rasch measurement in an analysis of college sophomore writing. Assessing Writing, 9(3), 239-261.
  • Turner, J. (2003). Examining an art portfolio assessment using a many-facet Rasch measurement model (Unpublished doctoral dissertation). Boston College, Boston.
  • Verhelst, N. & Verstralen, H. (2001). IRT models for multiple raters. In A. Boomsma, T. Snijders & M. van Duijn (Eds.), Essays on item response theory. New York: Springer-Verlag.
  • Wilson, M. & Hoskens, M. (2001). The rater bundle model. Journal of Educational and Behavioral Statistics, 26, 283–306.

Çok Değişkenlik Kaynaklı Rasch Ölçme Modeli ve Hiyerarşik Puanlayıcı Modeli İle Kestirilen Puanlayıcı Parametrelerinin Karşılaştırılması

Year 2017, Volume: 6 Issue: 2, 777 - 798, 25.04.2017

Abstract

This study aims to estimate the rater severity/leniency and variability parameters with the many-facet Rasch measurement model (MFRMM) and the hierarchical rater model (HRM), and to evaluate the parameters of both models jointly, for the case in which responses given by the same examinees to open-ended items are scored by more than one rater. The data of this basic research consist of the scores assigned by five secondary school mathematics teachers to the responses of 380 students, aged 15, attending 10 schools in the Çankaya district of Ankara province, to eight open-ended items during the second semester of the 2012-2013 academic year. The study found that the rater parameter results of the MFRMM and the HRM were similar in general. According to the deviance information criterion values of the two models, it was concluded that the HRM fits the research data better than the MFRMM and that the structure of the multiple scores assigned to a single response to a single item is reflected better by the HRM.


Details

Journal Section: Articles
Authors

Müge Uluman

Ezel Tavşancıl

Publication Date: April 25, 2017
Published in Issue: Year 2017, Volume: 6, Issue: 2

Cite

APA Uluman, M., & Tavşancıl, E. (2017). Çok Değişkenlik Kaynaklı Rasch Ölçme Modeli ve Hiyerarşik Puanlayıcı Modeli İle Kestirilen Puanlayıcı Parametrelerinin Karşılaştırılması. İnsan Ve Toplum Bilimleri Araştırmaları Dergisi, 6(2), 777-798. https://doi.org/10.15869/itobiad.296489

Journal of the Human and Social Science Researches is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY NC).