A Comparison of Different Designs in Scoring of PISA 2009 Reading Open Ended Items According to Generalizability Theory

Meral Alkan; Nuri Doğan

doi:10.21031/epod.1210917

Research Article

Year 2023, Volume 14, Issue 2, pp. 106–117, 30.06.2023


Abstract

This study uses generalizability theory to compare two scoring designs for the open-ended items of the PISA 2009 reading literacy test. The designs come from four raters scoring the items either all together or alternately. The sample consisted of 362 of the 4,996 students who participated in PISA 2009. These students answered the reading items and were scored by more than one rater. Two generalizability theory designs were specified. The first was the fully crossed s × i × r (student × item × rater) design, in which every rater scores every student on the same skills. The second was the nested (r:s) × i design, in which each rater scored only a subset of students. In this design raters are nested within students, and items are crossed with both. The relative and absolute error variances estimated for the (r:s) × i design were smaller than those for the s × i × r design, so the G and Phi coefficients were larger. In the D studies, increasing the number of raters raised the G and Phi coefficients in both designs. For Booklet 2, acceptable G and Phi coefficients were still reached when the number of raters was halved. For Booklet 8, increasing the number of raters appeared more appropriate.

Keywords

Generalizability theory, reliability, G study, D study, PISA 2009
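The D-study comparison summarized in the abstract can be sketched in Python using the usual generalizability theory error formulas, which are assumed here rather than taken from the article.

For the crossed s × i × r design, the relative error is σ²(δ) = σ²(si)/n_i + σ²(sr)/n_r + σ²(sir,e)/(n_i·n_r). The absolute error σ²(Δ) adds σ²(i)/n_i + σ²(r)/n_r + σ²(ir)/(n_i·n_r) to that.

For the nested (r:s) × i design, σ²(δ) = σ²(si)/n_i + σ²(r:s)/n_r + σ²(ir:s,e)/(n_i·n_r), and σ²(Δ) adds only σ²(i)/n_i.

In both designs, G = σ²(s)/(σ²(s) + σ²(δ)) and Phi = σ²(s)/(σ²(s) + σ²(Δ)).

The class and function names, the item count n_i = 10, and all variance components below are hypothetical placeholders, not estimates from the PISA 2009 data.

```python
from dataclasses import dataclass


@dataclass
class CrossedComponents:
    """G-study variance components for the crossed s x i x r design."""
    s: float      # students (object of measurement)
    i: float      # items
    r: float      # raters
    si: float     # student x item
    sr: float     # student x rater
    ir: float     # item x rater
    sir_e: float  # student x item x rater, confounded with residual error


@dataclass
class NestedComponents:
    """G-study variance components for the nested (r:s) x i design."""
    s: float       # students (object of measurement)
    i: float       # items
    r_s: float     # raters within students (r and sr confounded)
    si: float      # student x item
    ir_s_e: float  # item x raters-within-students (ir, sir) plus residual error


def d_study_crossed(vc: CrossedComponents, n_i: int, n_r: int) -> tuple[float, float]:
    """Return (G, Phi) for a crossed D study with n_i items and n_r raters."""
    rel_err = vc.si / n_i + vc.sr / n_r + vc.sir_e / (n_i * n_r)
    abs_err = rel_err + vc.i / n_i + vc.r / n_r + vc.ir / (n_i * n_r)
    return vc.s / (vc.s + rel_err), vc.s / (vc.s + abs_err)


def d_study_nested(vc: NestedComponents, n_i: int, n_r: int) -> tuple[float, float]:
    """Return (G, Phi) for a nested D study with n_i items and n_r raters per student."""
    rel_err = vc.si / n_i + vc.r_s / n_r + vc.ir_s_e / (n_i * n_r)
    abs_err = rel_err + vc.i / n_i  # only the item main effect is added
    return vc.s / (vc.s + rel_err), vc.s / (vc.s + abs_err)


if __name__ == "__main__":
    # Hypothetical placeholder components, NOT values reported in the study.
    crossed = CrossedComponents(s=0.30, i=0.05, r=0.01, si=0.20, sr=0.02, ir=0.01, sir_e=0.10)
    nested = NestedComponents(s=0.30, i=0.05, r_s=0.03, si=0.20, ir_s_e=0.11)
    for n_r in (1, 2, 4, 8):
        g_c, phi_c = d_study_crossed(crossed, n_i=10, n_r=n_r)
        g_n, phi_n = d_study_nested(nested, n_i=10, n_r=n_r)
        print(f"n_r={n_r}: crossed G={g_c:.3f} Phi={phi_c:.3f} | "
              f"nested G={g_n:.3f} Phi={phi_n:.3f}")
```

The loop holds n_i fixed and varies n_r. This mirrors the kind of D study the abstract describes, where the number of raters is halved or increased. With real G-study estimates in place of the placeholders, the same loop would show the rater count at which each design's coefficients reach the level a user considers acceptable.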

There are 37 citations in total.

Details

Primary Language	English
Subjects	Test Theories
Journal Section	Articles
Authors	Meral Alkan (ORCID 0000-0001-9497-3660), Nuri Doğan (ORCID 0000-0001-6274-2016)
Publication Date	June 30, 2023
Acceptance Date	June 12, 2023
Published in Issue	Year 2023 Volume: 14 Issue: 2

Cite

APA	Alkan, M., & Doğan, N. (2023). A Comparison of Different Designs in Scoring of PISA 2009 Reading Open Ended Items According to Generalizability Theory. Journal of Measurement and Evaluation in Education and Psychology, 14(2), 106-117. https://doi.org/10.21031/epod.1210917
