Research Article

A dialectic on validity: Explanation-focused and the many ways of being human

Year 2023, Volume: 10, Issue: Special Issue, 1–96, 27.12.2023
https://doi.org/10.21449/ijate.1406304

Abstract

In line with the journal volume's theme, this essay considers lessons from the past and visions for the future of test validity. The first part of the essay describes historical trends in test validity since the early 1900s, leading to the natural question of whether the discipline has progressed in its definition and description of test validity. There is no single agreed-upon definition of test validity; however, there is a marked coalescing of explanation-centered views at the meta-level. The second part of the essay focuses on the author's development of an explanation-focused view of validity theory with aligned validation methods. It traces the confluence of ideas that motivated and influenced a coherent view in which test validity is the explanation of test score variation, and validation is the process of developing and testing that explanation, guided by abductive methods and inference to the best explanation. This part also includes a new re-interpretation of true scores in classical test theory, afforded by the author's measure-theoretic development of mental test theory: for a particular test-taker, the variation in observed scores includes both measurement error and variation attributable to the different ecological testing settings. This re-interpretation aligns with the explanation-focused view, wherein item and test performance are the objects of explanatory analysis. The final main section of the essay describes several methodological innovations in explanation-focused validity that respond to the tensions and changes in assessment over the last 25 years.
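
As a concrete gloss on the true-score remark above, the following is a minimal sketch in classical test theory notation. It is an illustration only: the additive setting effect $U_{ps}$ and the conditional-variance decomposition are assumptions introduced here for exposition, not the author's exact measure-theoretic construction (cf. Kroc & Zumbo, 2020, on the variations of $X = T + E$):

$$X_{ps} = T_p + U_{ps} + E_{ps}, \qquad \mathbb{E}\!\left[E_{ps} \mid p, s\right] = 0,$$

where $X_{ps}$ is the observed score of test-taker $p$ in ecological testing setting $s$, $T_p$ is that test-taker's true score, $U_{ps}$ is variation attributable to the setting, and $E_{ps}$ is measurement error. If $U_{ps}$ and $E_{ps}$ are uncorrelated given $p$, the observed-score variance for a fixed test-taker decomposes as

$$\operatorname{Var}\!\left(X_{ps} \mid p\right) = \operatorname{Var}\!\left(U_{ps} \mid p\right) + \operatorname{Var}\!\left(E_{ps} \mid p\right),$$

so a single person's score variation is not pure error: part of it is systematic variation across testing settings, which is precisely the kind of variation that an explanation-focused analysis of item and test performance takes as its object.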

References

  • Addey, C., Maddox, B., & Zumbo, B.D. (2020). Assembled validity: Rethinking Kane’s argument-based approach in the context of International Large-Scale Assessments (ILSAs). Assessment in Education: Principles, Policy & Practice, 27(6), 588–606. https://doi.org/10.1080/0969594X.2020.1843136
  • American Educational Research Association, American Psychological Association, and National Council on Measurement in Education. (1974). Standards for educational and psychological tests. American Psychological Association.
  • American Educational Research Association, American Psychological Association, and National Council on Measurement in Education [AERA, APA, & NCME]. (1999). Standards for educational and psychological testing. American Educational Research Association.
  • American Educational Research Association, American Psychological Association, and National Council on Measurement in Education. (2014). Standards for educational and psychological testing. American Educational Research Association. https://www.testingstandards.net/open-access-files.html
  • American Psychological Association. (1954). Technical recommendations for psychological tests and diagnostic techniques. Psychological Bulletin, 51(2, Pt.2), 1–38. https://doi.org/10.1037/h0053479
  • Anastasi, A. (1950). The concept of validity in the interpretation of test scores. Educational and Psychological Measurement, 10, 67–78. https://doi.org/10.1177/001316445001000105
  • Anastasi, A. (1954). Psychological testing (1st ed.). Macmillan.
  • Angoff, W.H. (1988). Validity: An evolving concept. In: H. Wainer & H.I. Braun (Eds.), Test validity (pp. 19-32). Lawrence Erlbaum Associates.
  • Bazire, M., & Brézillon, P. (2005). Understanding context before using it. In: Dey, A., Kokinov, B., Leake, D., Turner, R. (eds) Modeling and using context. CONTEXT 2005. Lecture notes in computer science, vol. 3554. Springer. https://doi.org/10.1007/11508373_3
  • Bingham, W.V. (1937). Aptitudes and aptitude testing. Harper.
  • Borsboom, D., Mellenbergh, G.J., & van Heerden, J. (2004). The concept of validity. Psychological Review, 111(4), 1061–1071. https://doi.org/10.1037/0033-295X.111.4.1061
  • Borsboom, D., Cramer, A.O.J., Kievit, R.A., Scholten, A.Z., & Franić, S. (2009). The end of construct validity. In R.W. Lissitz (Ed.), The concept of validity: Revisions, new directions, and applications (pp. 135–170). IAP Information Age Publishing.
  • Bronfenbrenner, U. (1979). The ecology of human development. Harvard University Press.
  • Bronfenbrenner, U. (1994). Ecological models of human development. In T. Husén & T.N. Postlethwaite (Eds.), International encyclopedia of education, 2nd ed., Vol. 3 (pp. 1643–1647). Elsevier Science.
  • Buckingham, B.R. (1921). Intelligence and its measurement: A symposium. Journal of Educational Psychology, 12, 271–275.
  • Campbell, D.T., & Fiske, D.W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56(2), 81–105. https://doi.org/10.1037/h0046016
  • Carnap R. (1935). Philosophy and logical syntax. American Mathematical Society.
  • Chen, M.Y., & Zumbo, B.D. (2017). Ecological framework of item responding as validity evidence: An application of multilevel DIF modeling using PISA data. In: Zumbo, B., Hubley, A. (eds) Understanding and investigating response processes in validation research. Springer, Cham. https://doi.org/10.1007/978-3-319-56129-5_4
  • ChoGlueck, C. (2018). The error is in the gap: Synthesizing accounts for societal values in science. Philosophy of Science, 85(4), 704-725. https://doi.org/10.1086/699191
  • Clark, A. (1998). Being there: Putting brain, body, and world together again. MIT press.
  • Clark, A. (2011). Supersizing the mind: Embodiment, action, and cognitive extension. Oxford University Press.
  • Courtis, S.A. (1921). Report of the standardization committee. Journal of Educational Research, 4(1), 78–90.
  • Cronbach, L.J. (1971). Test validation. In: R.L. Thorndike (ed.) Educational measurement, 2nd ed. (pp. 443-507). American Council on Education.
  • Cronbach, L.J. (1988). Five perspectives on the validity argument. In H. Wainer & H.I. Braun (Eds.), Test validity (pp. 3–17). Lawrence Erlbaum Associates, Inc.
  • Cronbach, L.J. (1989). Construct validation after thirty years. In R.L. Linn (ed.) Intelligence: Measurement, theory, and public policy: Proceedings of a symposium in honor of Lloyd G. Humphreys (pp. 147-171). University of Illinois Press.
  • Cronbach, L.J., & Meehl, P.E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52(4), 281–302. https://doi.org/10.1037/h0040957
  • Danziger, K. (1990). Constructing the subject: Historical origins of psychological research. Cambridge University Press. https://doi.org/10.1017/CBO9780511524059
  • de Ayala, R.J. (2009). [Review of Handbook of Statistics, Volume 26: Psychometrics, by C.R. Rao & S. Sinharay]. Journal of the American Statistical Association, 104(487), 1281–1283. http://www.jstor.org/stable/40592308
  • Dewey, J. (1938). Logic: the theory of inquiry. Holt.
  • Douglas, H. (2000). Inductive risk and values in science. Philosophy of Science, 67, 559–579. https://doi.org/10.1086/392855
  • Douglas, H. (2003). The moral responsibilities of scientists (tensions between autonomy and responsibility). American Philosophical Quarterly, 40(1), 59–68. http://www.jstor.org/stable/20010097
  • Douglas, H. (2004). The Irreducible Complexity of Objectivity. Synthese 138, 453–473. https://doi.org/10.1023/B:SYNT.0000016451.18182.91
  • Douglas, H. (2009). Science, policy, and the value-free ideal. University of Pittsburgh Press.
  • Douglas, H. (2016). Values in science. In P. Humphreys (Ed.), The Oxford handbook of philosophy of science (pp. 609–630). Oxford University Press. https://doi.org/10.1093/oxfordhb/9780199368815.013.28
  • Eid, M. (1996). Longitudinal confirmatory factor analysis for polytomous item responses: Model definition and model selection on the basis of stochastic measurement theory. Methods of Psychological Research Online, 1(4), 65-85.
  • Eid, M. (2000). A multitrait-multimethod model with minimal assumptions. Psychometrika, 65, 241-261. https://doi.org/10.1007/BF02294377
  • Elliott, K. (2011). Is a little pollution good for you?: incorporating societal values in environmental research. Oxford University Press.
  • Embretson, S.E. (Whitely). (1983). Construct validity: Construct representation versus nomothetic span. Psychological Bulletin, 93(1), 179–197. https://doi.org/10.1037/0033-2909.93.1.179
  • Embretson, S. (1984). A general latent trait model for response processes. Psychometrika, 49(2), 175–186. https://doi.org/10.1007/BF02294171
  • Embretson, S. (1993). Psychometric models for learning and cognitive processes. In N. Frederiksen, R.J., Mislevy, & I.I. Bejar (Eds.), Test theory for a new generation of tests (pp. 125– 150). Erlbaum.
  • Embretson, S.E. (1998). A cognitive design system approach to generating valid tests: Application to abstract reasoning. Psychological Methods, 3(3), 380–396. https://doi.org/10.1037/1082-989X.3.3.380
  • Embretson, S.E. (2007). Construct validity: A universal validity system or just another test evaluation procedure? Educational Researcher, 36(8), 449–455. https://doi.org/10.3102/0013189X07311600
  • Embretson, S.E. (2016). Understanding examinees’ responses to items: Implications for measurement. Educational Measurement: Issues and Practice, 35, 6–22. https://doi.org/10.1111/emip.12117
  • Embretson, S., Schneider, L.M., & Roth, D.L. (1986). Multiple processing strategies and the construct validity of verbal reasoning tests. Journal of Educational Measurement, 23, 13–32. https://doi.org/10.1111/j.1745-3984.1986.tb00231.x
  • Fine, A.I. (1984). The natural ontological attitude. In J. Leplin (Ed.), Scientific realism (pp. 261–277). University of California Press.
  • Fox, J., Pychyl, T., & Zumbo, B.D. (1997). An investigation of background knowledge in the assessment of language proficiency. In A. Huhta, V. Kohonen, L. Kurki-Suonio, & S. Luoma (Eds.), Current developments and alternatives in language assessment: Proceedings of LTRC 1996 (pp. 367–383). University of Jyvaskyla Press.
  • Friedman, M. (1974). Explanation and scientific understanding. The Journal of Philosophy, 71(1), 5–19. https://doi.org/10.2307/2024924
  • Galupo, M.P., Mitchell, R.C., & Davis, K.S. (2018). Face validity ratings of sexual orientation scales by sexual minority adults: Effects of sexual orientation and gender identity. Archives of Sexual Behavior, 47(4), 1241–1250. https://doi.org/10.1007/s10508-017-1037-y
  • Geiser, C., & Lockhart, G. (2012). A comparison of four approaches to account for method effects in latent state-trait analyses. Psychological Methods, 17(2), 255–283. https://doi.org/10.1037/a0026977
  • Giere, R.N. (1999). Science without Laws. University of Chicago Press.
  • Giere, R.N. (2006). Scientific perspectivism. University of Chicago Press. https://doi.org/10.7208/chicago/9780226292144.001.0001
  • Giere, R.N. (2010). Explaining science: A cognitive approach. University of Chicago Press.
  • Gigerenzer, G., Swijtink, Z.G., Porter, T.M., Daston, L., Beatty, J., & Krüger, L. (1989). The empire of chance: How probability changed science and everyday life. Cambridge University Press.
  • Goffman, E. (1959). The presentation of self in everyday life. Doubleday.
  • Goffman, E. (1964). The neglected situation. American Anthropologist, 66(6), 133–136. http://www.jstor.org/stable/668167
  • Goldstein, H. (1980). Dimensionality, bias, independence and measurement scale problems in latent trait test score models. British Journal of Mathematical and Statistical Psychology, 33(2), 234–246. https://doi.org/10.1111/j.2044-8317.1980.tb00610.x
  • Goldstein, H. (1994). Recontextualizing mental measurement. Educational Measurement: Issues and Practice, 12(1), 16-19, 43.
  • Goldstein H. (1995). Multilevel statistical models (2nd edition). Edward Arnold/Halstead Press.
  • Goldstein, H., & Wood, R. (1989). Five decades of item response modelling. British Journal of Mathematical and Statistical Psychology, 42(2), 139–167. https://doi.org/10.1111/j.2044-8317.1989.tb00905.x
  • Green, B.F. (1990). A comprehensive assessment of measurement. Contemporary Psychology, 35, 850–851.
  • Green, C.D. (2015). Why psychology isn’t unified, and probably never will be. Review of General Psychology, 19(3), 207-214. https://doi.org/10.1037/gpr0000051
  • Guilford, J.P. (1946). New standards for test evaluation. Educational and Psychological Measurement, 6(4), 427-438. https://doi.org/10.1177/001316444600600401
  • Guion, R.M. (1980). On trinitarian doctrines of validity. Professional Psychology, 11(3), 385–398. https://doi.org/10.1037/0735-7028.11.3.385
  • Gulliksen, H. (1950a). Intrinsic validity. American Psychologist, 5(10), 511–517. https://doi.org/10.1037/h0054604
  • Gulliksen, H. (1950b). Theory of mental tests. John Wiley & Sons Inc. https://doi.org/10.1037/13240-000
  • Gulliksen, H. (1961). Measurement of learning and mental abilities. Psychometrika 26, 93–107. https://doi.org/10.1007/BF02289688
  • Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10, 255–282. https://doi.org/10.1007/BF02288892
  • Haig, B.D. (1999). Construct validation and clinical assessment. Behaviour Change, 16, 64–73.
  • Haig, B.D. (2005a). Exploratory factor analysis, theory generation, and scientific method. Multivariate Behavioral Research, 40(3), 303-329.
  • Haig, B.D. (2005b). An abductive theory of scientific method. Psychological Methods, 10(4), 371–388. https://doi.org/10.1037/1082-989X.10.4.371
  • Haig, B.D. (2009). Inference to the best explanation: A neglected approach to theory appraisal in psychology. The American Journal of Psychology, 122(2), 219–234.
  • Haig, B.D. (2014). Investigating the psychological world: Scientific method in the behavioral sciences. MIT Press.
  • Haig, B.D. (2018). Exploratory factor analysis, theory generation, and scientific method. In: Method matters in psychology (pp. 65–88). Studies in applied philosophy, epistemology and rational ethics, vol 45. Springer, Cham.
  • Haig, B.D. (2019). The importance of scientific method for psychological science. Psychology, Crime & Law, 25(6), 527–541. https://doi.org/10.1080/1068316X.2018.1557181
  • Haig, B.D. (in press). Repositioning construct validity theory: From nomological networks to pragmatic theories, and their evaluation by explanatory means. Perspectives on Psychological Science.
  • Haig, B.D., & Evers, C.W. (2016). Realist inquiry in social science. Sage.
  • Hattie, J., & Leeson, H. (2013). Future directions in assessment and testing in education and psychology. In K.F. Geisinger, B.A. Bracken, J.F. Carlson, J.-I. C. Hansen, N.R. Kuncel, S.P. Reise, & M.C. Rodriguez (Eds.), APA handbook of testing and assessment in psychology, vol. 3. testing and assessment in school psychology and education (pp. 591–622). American Psychological Association. https://doi.org/10.1037/14049-028
  • Hempel, C.G. (1965). Aspects of scientific explanation and other essays in the philosophy of science. The Free Press.
  • Hicks, D.J. (2014). A new direction for science and values. Synthese, 191(14), 3271–3295. http://www.jstor.org/stable/24026188
  • Higgins, N.C., Zumbo, B.D., & Hay, J.L. (1999). Construct validity of attributional style: Modeling context-dependent item sets in the attributional style questionnaire. Educational and Psychological Measurement, 59(5), 804–820. https://doi.org/10.1177/00131649921970152
  • Holman, B., & Wilholt, T. (2022). The new demarcation problem. Studies in history and philosophy of science, 91, 211-220. https://doi.org/10.1016/j.shpsa.2021.11.011
  • Hubley, A.M., & Zumbo, B.D. (1996). A dialectic on validity: Where we have been and where we are going. The Journal of General Psychology, 123(3), 207 215. https://doi.org/10.1080/00221309.1996.9921273
  • Hubley, A.M., & Zumbo, B.D. (2011). Validity and the consequences of test interpretation and use. Social Indicators Research, 103(2), 219–230. https://doi.org/10.1007/s11205-011-9843-4
  • Hubley, A.M., & Zumbo, B.D. (2013). Psychometric characteristics of assessment procedures: An overview. In K.F. Geisinger (Ed.), APA handbook of testing and assessment in psychology, Vol. 1 (pp. 3–19). American Psychological Association Press. https://doi.org/10.1037/14047-001
  • Hubley, A.M., & Zumbo, B.D. (2017). Response processes in the context of validity: Setting the stage. In B.D. Zumbo & A.M. Hubley (Eds.), Understanding and investigating response processes in validation research (pp. 1–12). Springer International Publishing/Springer Nature. https://doi.org/10.1007/978-3-319-56129-5_1
  • Hull, C.L. (1935). The conflicting psychologies of learning: A way out. Psychological Review, 42(6), 491–516. https://doi.org/10.1037/h0058665
  • Jonson, J.L., & Plake, B.S. (1998). A historical comparison of validity standards and validity practices. Educational and Psychological Measurement, 58(5), 736–753. https://doi.org/10.1177/0013164498058005002
  • Kaldis, B. (2013). Kinds: Natural kinds versus human kinds. In Encyclopedia of philosophy and the social sciences, Vol. 2 (pp. 515–518). SAGE Publications, Inc. https://doi.org/10.4135/9781452276052
  • Kane, M. (1992). An argument-based approach to validity. Psychological Bulletin, 112(3), 527–535. https://doi.org/10.1037/0033-2909.112.3.527
  • Kane, M. (2001). Current concerns in validity theory. Journal of Educational Measurement, 38(4), 319-342. https://doi.org/10.1111/j.1745-3984.2001.tb01130.x
  • Kane, M. (2004). Certification testing as an illustration of argument-based validation. Measurement: Interdisciplinary Research and Perspective, 2(3), 135–170. https://doi.org/10.1207/s15366359mea0203_1
  • Kane, M. (2006). Validation. In R. Brennan (Ed.) Educational measurement (4th ed., pp. 17–64). American Council on Education and Praeger.
  • Kane, M. (2012). Validating score interpretations and uses. Language Testing, 29(1), 3-17. https://doi.org/10.1177/0265532211417210
  • Kane, M. (2013). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50, 1-73. https://doi.org/10.1111/jedm.12000
  • Kane, M. (2016). Explicating validity. Assessment in Education: Principles, Policy & Practice, 23(2), 198–211. https://doi.org/10.1080/0969594X.2015.1060192
  • Kincaid, H. (2000). Global arguments and local realism about the social sciences. Philosophy of Science, 67(S3), S667-S678. https://doi.org/10.1086/392854
  • Koch, T., Eid, M., & Lochner, K. (2018). Multitrait-multimethod-analysis: The psychometric foundation of CFA-MTMM models. In P. Irwing, T. Booth, & D.J. Hughes (Eds.), The Wiley handbook of psychometric testing: A multidisciplinary reference on survey, scale and test development (pp. 781–846). Wiley Blackwell. https://doi.org/10.1002/9781118489772.ch25
  • Koch, T., Schultze, M., Eid, M., & Geiser, C. (2014). A longitudinal multilevel CFA-MTMM model for interchangeable and structurally different methods. Frontiers in Psychology, 5, Article 311. https://doi.org/10.3389/fpsyg.2014.00311
  • Kroc, E., & Zumbo, B.D. (2018). Calibration of measurements. Journal of Modern Applied Statistical Methods, 17(2), eP2780. https://digitalcommons.wayne.edu/jmasm/vol17/iss2/17/
  • Kroc, E., & Zumbo, B.D. (2020). A transdisciplinary view of measurement error models and the variations of X = T + E. Journal of Mathematical Psychology, 98, 102372. https://doi.org/10.1016/j.jmp.2020.102372
  • Kuhn, T.S. (1962). The structure of scientific revolutions. University of Chicago Press.
  • Kuhn, T.S. (1970). The structure of scientific revolutions (2nd ed.). University of Chicago Press.
  • Kuhn, T.S. (1977). The essential tension: Selected studies in scientific tradition and change. University of Chicago Press.
  • Kuhn, T.S. (1996). The structure of scientific revolutions (3rd ed.). University of Chicago Press.
  • Lakatos, I. (1976). Falsification and the methodology of scientific research programmes. In: Can theories be refuted? (pp. 205–259). Springer.
  • Lane, S., Zumbo, B.D., Abedi, J., Benson, J., Dossey, J., Elliott, S.N., Kane, M., Linn, R., Paredes-Ziker, C., Rodriguez, M., Schraw, G., Slattery, J., Thomas, V., & Willhoft, J. (2009). Prologue: An Introduction to the Evaluation of NAEP. Applied Measurement in Education, 22(4), 309-316. https://doi.org/10.1080/08957340903221436
  • Lennon, R.T. (1956). Assumptions underlying the use of content validity. Educational and Psychological Measurement, 16(3), 294–304. https://doi.org/10.1177/001316445601600303
  • Lewis, C. (1986). Test theory and psychometrika: The past twenty-five years. Psychometrika, 51(1), 11–22. https://doi.org/10.1007/BF02293995
  • Li, Z., & Zumbo, B.D. (2009). Impact of differential item functioning on subsequent statistical conclusions based on observed test score data. Psicológica, 30(2), 343–370. https://www.uv.es/psicologica/articulos2.09/11LI.pdf
  • Lipton, P. (2004). Inference to the best explanation (2nd ed.). Routledge. https://doi.org/10.4324/9780203470855
  • Lissitz, R.W., & Samuelsen, K. (2007). A suggested change in terminology and emphasis regarding validity and education. Educational Researcher, 36(8), 437–448. https://doi.org/10.3102/0013189X07311286
  • Loevinger, J. (1957). Objective tests as instruments of psychological theory. Psychological Reports, 3, 635–694 (Monograph Supp. 9).
  • Lord, F.M., & Novick, M.R. (1968). Statistical theories of mental test scores. Addison-Wesley.
  • Maddox, B. (2015). The neglected situation: assessment performance and interaction in context. Assessment in Education: Principles, Policy & Practice, 22(4), 427-443. https://doi.org/10.1080/0969594X.2015.1026246
  • Maddox, B., Zumbo, B.D. (2017). Observing testing situations: Validation as Jazz. In: B.D. Zumbo, A.M. Hubley (eds) Understanding and investigating response processes in validation research. Springer, Cham. https://doi.org/10.1007/978-3-319-56129-5_10
  • Maddox, B., Zumbo, B.D., Tay-Lim, B. S.-H., & Demin Qu, I. (2015). An anthropologist among the psychometricians: Assessment events, ethnography and DIF in the Mongolian Gobi. International Journal of Testing, 15(4), 291–309. https://doi.org/10.1080/15305058.2015.1017103
  • MacCorquodale, K., & Meehl, P.E. (1948). On a distinction between hypothetical constructs and intervening variables. Psychological Review, 55(2), 95–107. https://doi.org/10.1037/h0056029
  • Markus, K.A. (1998). Science, measurement, and validity: Is completion of Samuel Messick's synthesis possible? Social Indicators Research, 45, 7–34. https://doi.org/10.1023/A:1006960823277
  • Mehrens, W.A. (1997). The consequences of consequential validity. Educational Measurement: Issues and Practice, 16(2), 16-18.
  • Messick, S. (1972). Beyond structure: In search of functional models of psychological process. Psychometrika, 37(4, Pt. 1), 357–375. https://doi.org/10.1007/BF02291215
  • Messick, S. (1975). The standard problem: Meaning and values in measurement and evaluation. American Psychologist, 30, 955–966.
  • Messick, S. (1980). Test validity and the ethics of assessment. American Psychologist, 35, 1012-1027.
  • Messick, S. (1988). The once and future issues of validity: Assessing the meaning and consequences of measurement. In: H. Wainer & H.I. Braun (Eds.), Test validity (pp. 33-45). Lawrence Erlbaum Associates.
  • Messick, S. (1989). Validity. In R.L. Linn (Ed.), Educational measurement (3rd ed., pp. 13-103). Macmillan.
  • Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons’ responses and performances as scientific inquiry into score meaning. American Psychologist, 50, 741-749. https://doi.org/10.1037/0003-066X.50.9.741
  • Messick, S. (1998). Test validity: A matter of consequence [Special issue]. Social Indicators Research, 45, 35-44. https://doi.org/10.1023/A:1006964925094
  • Messick, S. (2000). Consequences of test interpretation and use: The fusion of validity and values in psychological assessment. In: Goffin, R.D., Helmes, E. (eds) Problems and solutions in human assessment. Springer. https://doi.org/10.1007/978-1-4615-4397-8_1
  • Millman, J. (1979). Reliability and validity of criterion-referenced test scores. In: R. Traub (Ed.), New directions for testing and measurement: Methodological developments. Jossey-Bass.
  • Mosier, C.I. (1947). A critical examination of the concepts of face validity. Educational and Psychological Measurement, 7(2), 191–205. https://doi.org/10.1177/001316444700700201
  • Nickles, T. (2017). Cognitive illusions and nonrealism: Objections and replies. In: Agazzi, E. (eds) Varieties of Scientific Realism: Objectivity and truth in science (pp. 151–163). Springer, Cham. https://doi.org/10.1007/978-3-319-51608-0_8
  • Novick, M.R. (1966). The axioms and principal results of classical test theory. Journal of Mathematical Psychology, 3(1), 1–18. https://doi.org/10.1016/0022-2496(66)90002-2
  • O'Leary, T.M., Hattie, J.A.C., & Griffin, P. (2017). Actual interpretations and use of scores as aspects of validity. Educational Measurement: Issues and Practice, 36, 16-23. https://doi.org/10.1111/emip.12141
  • Padilla, J.L., & Benítez, I. (2014). Validity evidence based on response processes. Psicothema, 26, 136–144. https://doi.org/10.7334/psicothema2013.259
  • Padilla, J.L., & Benítez, I. (2017). A rationale for and demonstration of the use of DIF and mixed methods. In: Zumbo, B.D., Hubley, A.M. (eds) Understanding and investigating response processes in validation research (pp. 193–210). Springer, Cham. https://doi.org/10.1007/978-3-319-56129-5_1
  • Pellicano, E., & den Houting, J. (2022). Annual research review: Shifting from “normal science” to neurodiversity in autism science. Journal of Child Psychology and Psychiatry, 63, 381–396. https://doi.org/10.1111/jcpp.13534
  • Persson, J., & Ylikoski, P. (Eds.). (2007). Rethinking explanation (Boston Studies in the Philosophy of Science, Vol. 252). Springer.
  • Pitt, J.C. (Ed.) (1988). Theories of explanation. Oxford University Press.
  • Popham, W.J. (1997). Consequential validity: Right concern – wrong concept. Educational Measurement: Issues and Practice, 16(2), 9-13.
  • Psillos, S. (2022). Realism and theory change in science. In: Zalta, E.N., Nodelman, U. (eds.) The Stanford encyclopedia of philosophy. https://plato.stanford.edu/archives/fall2022/entries/realism-theory-change/
  • Rao, C.R., & Sinharay, S. (Eds.). (2007). Handbook of statistics, Volume 26: Psychometrics. Elsevier.
  • Raykov, T. (1992). On structural models for analyzing change. Scandinavian Journal of Psychology, 33, 247–265. https://doi.org/10.1111/j.1467-9450.1992.tb00914.x
  • Raykov, T. (1998a). Coefficient alpha and composite reliability with interrelated nonhomogeneous items. Applied Psychological Measurement, 22(4), 375-385. https://doi.org/10.1177/014662169802200407
  • Raykov, T. (1998b). A method for obtaining standard errors and confidence intervals of composite reliability for congeneric items. Applied Psychological Measurement, 22(4), 369-374. https://doi.org/10.1177/014662169802200406
  • Raykov, T. (1999). Are simple change scores obsolete? An approach to studying correlates and predictors of change. Applied Psychological Measurement, 23(2), 120-126. https://doi.org/10.1177/01466219922031248
  • Raykov, T. (2001). Estimation of congeneric scale reliability using covariance structure analysis with nonlinear constraints. British Journal of Mathematical and Statistical Psychology, 54, 315–323. https://doi.org/10.1348/000711001159582
  • Raykov, T., & Marcoulides, G.A. (2011). Introduction to psychometric theory. Routledge.
  • Raykov, T., & Marcoulides, G.A. (2016). On the relationship between classical test theory and item response theory: From one to the other and back. Educational and Psychological Measurement, 76(2), 325–338. https://doi.org/10.1177/0013164415576958
  • Reichenbach H. (1977). Philosophie der Raum-Zeit-Lehre. In: Kamlah, A., Reichenbach, M. (eds) Philosophie der Raum-Zeit-Lehre. Hans Reichenbach, vol 2. Vieweg+Teubner Verlag, Wiesbaden.
  • Roberts, B.W. (2007). Contextualizing personality psychology. Journal of Personality, 75(6), 1071–1082. https://doi.org/10.1111/j.1467-6494.2007.00467.x
  • Rome, L., & Zhang, B. (2018). Investigating the effects of differential item functioning on proficiency classification. Applied Psychological Measurement, 42(4), 259–274. https://doi.org/10.1177/0146621617726789
  • Rozeboom, W.W. (1966). Foundations of the theory of prediction. Dorsey.
  • Rulon, P.J. (1946). On the validity of educational tests. Harvard Educational Review, 16, 290-296.
  • Salmon, W. (1990). Four decades of scientific explanation. University of Minnesota Press.
  • Schaffner, K.F. (1993). Discovery and explanation in biology and medicine. University of Chicago Press.
  • Schaffner, K.F. (2020). A comparison of two neurobiological models of fear and anxiety: A “construct validity” application? Perspectives on Psychological Science, 15(5), 1214–1227. https://doi.org/10.1177/1745691620920860
  • Searle, J.R. (1969). Speech acts: An essay in the philosophy of language. Cambridge University Press.
  • Searle, J.R. (1979). Expression and meaning: Studies in the theory of speech acts. Cambridge University Press. https://doi.org/10.1017/CBO9780511609213
  • Sells, S.B. (ed.) (1963). Stimulus determinants of behavior. Ronald Press.
  • Shear, B.R., Zumbo, B.D. (2014). What counts as evidence: A review of validity studies in educational and psychological measurement. In: Zumbo, B.D., Chan, E.K.H. (eds) Validity and validation in social, behavioral, and health sciences (pp. 91-111). Springer, Cham. https://doi.org/10.1007/978-3-319-07794-9_6
  • Shepard, L.A. (1993). Evaluating test validity. Review of Research in Education, 19(1), 405-450. https://doi.org/10.3102/0091732X019001405
  • Shepard, L.A. (1997). The centrality of test use and consequences for test validity. Educational Measurement: Issues and Practice, 16, 5-8, 13, 24.
  • Sinnott-Armstrong, W., & Fogelin, R.J. (2010). Understanding arguments: An introduction to informal logic. Wadsworth Cengage Learning.
  • Sireci, S.G. (1998). The construct of content validity [Special issue]. Social Indicators Research, 45, 83–117. https://doi.org/10.1023/A:1006985528729
  • Sireci, S.G. (2009). Packing and unpacking sources of validity evidence: History repeats itself again. In R.W. Lissitz (Ed.), The concept of validity: Revisions, new directions, and applications (pp. 19–37). IAP Information Age Publishing.
  • Sireci, S.G. (2013). Agreeing on validity arguments. Journal of Educational Measurement, 50, 99-104. https://doi.org/10.1111/jedm.12005
  • Sireci, S.G. (2020). De-“constructing” test validation. Chinese/English Journal of Educational Measurement and Evaluation, 1(1), Article 3. https://www.ce-jeme.org/journal/vol1/iss1/3
  • Slaney, K.L., & Racine, T.P. (2013). What’s in a name? Psychology’s ever evasive construct. New Ideas in Psychology, 31(1), 4–12. https://doi.org/10.1016/j.newideapsych.2011.02.003
  • Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1), 72–101. https://doi.org/10.2307/1412159
  • Steyer, R. (1988). Conditional expectations: An introduction to the concept and its applications in empirical sciences. Methodika, 2, 53-78.
  • Steyer, R. (1989). Models of classical psychometric test theory as stochastic measurement models: representation, uniqueness, meaningfulness, identifiability, and testability. Methodika, 3, 25-60.
  • Steyer, R., Ferring, D., & Schmitt, M.J. (1992). States and traits in psychological assessment. European Journal of Psychological Assessment, 8(2), 79–98.
  • Steyer, R., Majcen, A.-M., Schwenkmezger, P., & Buchner, A. (1989). A latent state-trait anxiety model and its application to determine consistency and specificity coefficients. Anxiety Research, 1(4), 281–299. https://doi.org/10.1080/08917778908248726
  • Steyer, R., & Schmitt, M. (1990). Latent state-trait models in attitude research. Quality & Quantity, 24, 427–445. https://doi.org/10.1007/BF00152014
  • Steyer, R., Schmitt, M., & Eid, M. (1999). Latent state–trait theory and research in personality and individual differences. European Journal of Personality, 13(5), 389–408. https://doi.org/10.1002/(SICI)1099-0984(199909/10)13:5<389::AID-PER361>3.0.CO;2-A
  • Stone, J., & Zumbo, B.D. (2016). Validity as a pragmatist project: A global concern with local application. In: Aryadoust V., & Fox J. (eds.) Trends in language assessment research and practice (pp. 555–573). Cambridge Scholars Publishing.
  • Suppes, P. (1969). Models of data. In: Studies in the methodology and foundations of science. Synthese Library, vol 22. Springer. https://doi.org/10.1007/978-94-017-3173-7_2
  • Thagard, P. (1989). Explanatory coherence. Behavioral and Brain Sciences, 12(3), 435-467. https://doi.org/10.1017/S0140525X00057046
  • Thagard, P. (1992). Conceptual revolutions. Princeton University Press. http://www.jstor.org/stable/j.ctv36zq4g
  • Tolman, C.W. (1991). [Review of the book Constructing the subject: Historical origins of psychological research, by K. Danziger]. Canadian Psychology, 32(4), 650–652. https://doi.org/10.1037/h0084651
  • Toulmin, S. (1958). The uses of argument. Cambridge University Press.
  • van Fraassen, B.C. (1980). The scientific image. Oxford University Press. https://doi.org/10.1093/0198244274.001.0001
  • van Fraassen, B.C. (1985). Empiricism in the philosophy of science. In: Churchland P.M., & Hooker C.A. (eds.) Images of science: Essays on realism and empiricism (pp. 245-308). University of Chicago Press.
  • van Fraassen, B.C. (2008). Scientific representation: Paradoxes of perspective. Oxford University Press.
  • van Fraassen, B.C. (2012). Modeling and measurement: The criterion of empirical grounding. Philosophy of Science, 79(5), 773–784. https://doi.org/10.1086/667847
  • Varela, F.J., Thompson, E., & Rosch, E. (1991). The embodied mind: Cognitive science and human experience. The MIT Press. https://doi.org/10.7551/mitpress/6730.001.0001
  • Wallin, A. (2007). Explanation and environment. In: Persson, J., Ylikoski, P. (eds) Rethinking explanation. Boston studies in the philosophy of science, vol 252 (pp. 163–175). Springer, Dordrecht. https://doi.org/10.1007/978-1-4020-5581-2_12
  • Wapner, S., & Demick, J. (2002). The increasing contexts of context in the study of environment-behavior relations. In R.B. Bechtel & A. Churchman (eds.) Handbook of environmental psychology (pp. 3–14). John Wiley & Sons, Inc.
  • Watson, J.B. (1913). Psychology as the behaviorist views it. Psychological Review, 20(2), 158–177. https://doi.org/10.1037/h0074428
  • Whitely (Embretson), S.E. (1977). Information-processing on intelligence test items: Some response components. Applied Psychological Measurement, 1, 465–476. https://doi.org/10.1177/014662167700100402
  • Wiley, D.E. (1991). Test validity and invalidity reconsidered. In: R.E. Snow & D.E. Wiley (Eds.), Improving inquiry in social science: a volume in honor of Lee J. Cronbach (pp. 75-107). Erlbaum.
  • Woitschach, P., Zumbo, B.D., & Fernández-Alonso, R. (2019). An ecological view of measurement: Focus on multilevel model explanation of differential item functioning. Psicothema, 31(2), 194–203. https://doi.org/10.7334/psicothema2018.303
  • Woodward, J. (1989). Data and phenomena. Synthese, 79, 393–472. https://doi.org/10.1007/BF00869282
  • Wu, A.D., & Zumbo, B.D. (2008). Understanding and using mediators and moderators. Social Indicators Research, 87, 367–392. https://doi.org/10.1007/s11205-007-9143-1
  • Wu, A.D., Zumbo, B.D., & Marshall, S.K. (2014). A method to aid in the interpretation of EFA results: An application of Pratt’s measures. International Journal of Behavioral Development, 38(1), 98-110. https://doi.org/10.1177/0165025413506143
  • Yang, Y., Read, S.J., & Miller, L.C. (2009). The concept of situations. Social and Personality Psychology Compass, 3(6), 1018–1037. https://doi.org/10.1111/j.1751-9004.2009.00236.x
  • Zimmerman, D.W. (1975). Probability spaces, Hilbert spaces, and the axioms of test theory. Psychometrika, 40(3), 395-412. https://doi.org/10.1007/BF02291765
  • Zimmerman, D.W., & Zumbo, B.D. (2001). The geometry of probability, statistics, and test theory. International Journal of Testing, 1(3-4), 283–303. https://doi.org/10.1080/15305058.2001.9669476
  • Zumbo, B.D. (Ed.). (1998). Validity theory and the methods used in validation: Perspectives from the social and behavioral sciences [Special volume]. Social Indicators Research: An International and Interdisciplinary Journal for Quality-of-Life Measurement, 45(1-3). Springer International Publishing.
  • Zumbo, B.D. (1999). The simple difference score as an inherently poor measure of change: Some reality, much mythology. Advances in social science methodology, 5(1), 269-304.
  • Zumbo, B.D. (2005, July). Reflections on validity at the intersection of psychometrics, scaling, philosophy of inquiry, and language testing [Samuel J. Messick Memorial Award Lecture]. LTRC, the 27th Language Testing Research Colloquium, Ottawa, Canada.
  • Zumbo, B.D. (2007a). Validity: Foundational Issues and Statistical Methodology. In C.R. Rao & S. Sinharay (Eds.), Handbook of statistics (Vol. 26, pp. 45–79). Elsevier.
  • Zumbo, B.D. (2007b). Three Generations of DIF Analyses: Considering Where It Has Been, Where It Is Now, and Where It Is Going. Language Assessment Quarterly, 4(2), 223-233. https://doi.org/10.1080/15434300701375832
  • Zumbo, B.D. (2009). Validity as contextualized and pragmatic explanation, and its implications for validation practice. In R.W. Lissitz (ed.) The concept of validity: Revisions, new directions, and applications (pp. 65–82). IAP Information Age Publishing.
  • Zumbo, B.D. (2010, September). Measurement validity and validation: A meditation on where we have come from and the state of the art today [Invited address]. Presented at the International conference on outcomes measurement, US National Institutes of Health, Bethesda, MD.
  • Zumbo, B.D. (2015, November). Consequences, side effects and the ecology of testing: Keys to considering assessment “in vivo” [Plenary address]. Annual Meeting of the Association for Educational Assessment – Europe (AEAEurope), Glasgow, Scotland. https://youtu.be/0L6Lr2BzuSQ
  • Zumbo, B.D. (2016). Standard Setting Methodology [Invited address]. “Applied Physiology Physical Employment Standards - Current Issues and Challenges” at the Canadian Society for Exercise Physiology (CSEP) conference, Victoria, Canada.
  • Zumbo, B.D. (2017). Trending away from routine procedures, toward an ecologically informed in vivo view of validation practices. Measurement: Interdisciplinary Research and Perspectives, 15(3-4), 137–139. https://doi.org/10.1080/15366367.2017.1404367
  • Zumbo, B.D. (2018a, April). Methodologies used to ensure fairness and equity in the assessment of students’ educational outcomes [Invited presentation and panel session]. AERA Presidential Symposium “Methodology and equity: An international perspective” at the Annual Meeting of the American Educational Research Association (AERA), New York, NY.
  • Zumbo, B.D. (2018b, July). The reports of DIF’s death are greatly exaggerated; It is like a Phoenix rising from the ashes [Keynote Address]. The 11th Conference of the International Test Commission, Montreal, Canada.
  • Zumbo, B.D. (2019). Foreword: Tensions, Intersectionality, and What Is on the Horizon for International Large-Scale Assessments in Education. In B. Maddox (Ed.), International large-scale assessments in education: Insider research perspectives (pp. xii–xiv). Bloomsbury Publishing. https://doi.org/10.5040/9781350023635
  • Zumbo, B.D. (2021). A novel multimethod approach to investigate whether tests delivered at a test centre are concordant with those delivered remotely online [Research Monograph]. UBC Psychometric Research Series, University of British Columbia. http://dx.doi.org/10.14288/1.0400581
  • Zumbo, B.D. (2023a). Validity theories, frameworks and practices in using tests and measures: an over-the-shoulder look back at validity while also looking to the horizon [Invited Address]. Ciclo Formazione Metodologica (FORME), Dipartimento di Psicologia, Università Cattolica Del Sacro Cuore. https://brunozumbo.com/?page_id=31
  • Zumbo, B.D. (2023b). Test validation and Bayesian statistical frameworks to estimate the magnitude and corresponding uncertainty of washback effects of test preparation [Research Monograph]. UBC Psychometric Research Series, University of British Columbia. https://dx.doi.org/10.14288/1.0435197
  • Zumbo, B.D. (2023c, October). The Challenges and Promise of Embracing the Many Ways of Being Human: Toward an Ecologically Informed In Vivo View of Validation Practices [Invited Address]. Symposium on Inclusive Educational Assessment, Neurodiversity and Disability. Hughes Hall, University of Cambridge.
  • Zumbo, B.D., & Chan, E.K.H. (Eds.). (2014a). Validity and validation in social, behavioral, and health sciences. Springer International Publishing/Springer Nature. https://doi.org/10.1007/978-3-319-07794-9
  • Zumbo, B.D., & Chan, E.K.H. (2014b). Reflections on validation practices in the social, behavioral, and health sciences. In: Zumbo, B.D., Chan, E.K.H. (eds) Validity and validation in social, behavioral, and health sciences (pp. 321-327). Springer, Cham. https://doi.org/10.1007/978-3-319-07794-9_19
  • Zumbo, B.D., & Chan, E.K.H. (2014c). Setting the stage for validity and validation in social, behavioral, and health sciences: Trends in validation practices. In: Zumbo, B.D., Chan, E.K.H. (eds) Validity and validation in social, behavioral, and health sciences (pp. 3-8). Springer, Cham. https://doi.org/10.1007/978-3-319-07794-9_1
  • Zumbo, B.D., & Forer, B. (2011). Testing and measurement from a multilevel view: Psychometrics and validation. In J.A. Bovaird, K.F. Geisinger, & C.W. Buckendahl (Eds.), High-stakes testing in education: Science and practice in K–12 settings (pp. 177–190). American Psychological Association. https://doi.org/10.1037/12330-011
  • Zumbo, B.D., & Gelin, M.N. (2005). A matter of test bias in educational policy research: Bringing the context into picture by investigating sociological/community moderated (or mediated) test and item bias. Journal of Educational Research and Policy Studies, 5, 1–23. https://files.eric.ed.gov/fulltext/EJ846827.pdf
  • Zumbo, B.D., & Hubley, A.M. (2016). Bringing consequences and side effects of testing and assessment to the foreground. Assessment in Education: Principles, Policy & Practice, 23(2), 299–303. https://doi.org/10.1080/0969594X.2016.1141169
  • Zumbo, B.D., & Hubley, A.M. (Eds.). (2017). Understanding and investigating response processes in validation research. Springer International Publishing/Springer Nature. https://doi.org/10.1007/978-3-319-56129-5
  • Zumbo, B.D., & Kroc, E. (2019). A measurement is a choice and Stevens’ scales of measurement do not help make it: A response to Chalmers. Educational and Psychological Measurement, 79(6), 1184–1197. https://doi.org/10.1177/0013164419844305
  • Zumbo, B.D., Liu, Y., Wu, A.D., Forer, B., Shear, B.R. (2017). National and international educational achievement testing: A case of multi-level validation framed by the ecological model of item responding. In B.D. Zumbo & A.M. Hubley (Eds.), Understanding and investigating response processes in validation research (pp. 341-362). Springer International Publishing/Springer Nature. https://doi.org/10.1007/978-3-319-56129-5_18
  • Zumbo, B.D., Liu, Y., Wu, A.D., Shear, B.R., Olvera Astivia, O.L., & Ark, T.K. (2015). A methodology for Zumbo’s third generation DIF analyses and the ecology of item responding. Language Assessment Quarterly, 12(1), 136–151. https://doi.org/10.1080/15434303.2014.972559
  • Zumbo, B.D., Maddox, B., & Care, N.M. (2023). Process and product in computer-based assessments: Clearing the ground for a holistic validity framework. European Journal of Psychological Assessment, 39(4), 252–262. https://doi.org/10.1027/1015-5759/a000748
  • Zumbo, B.D., & Padilla, J.-L. (2020). The interplay between survey research and psychometrics, with a focus on validity theory. In P.C. Beatty, D. Collins, L. Kaye, J.-L. Padilla, G.B. Willis, & A. Wilmot (Eds.), Advances in questionnaire design, development, evaluation and testing (pp. 593–612). John Wiley & Sons, Inc. https://doi.org/10.1002/9781119263685.ch24
  • Zumbo, B.D., Pychyl, T.A., & Fox, J.A. (1993). Psychometric properties of the CAEL assessment, II: An examination of the dependability/reliability of placement decisions. Carleton Papers in Applied Language Studies, 10, 13-27.
  • Zumbo, B.D., & Rupp, A.A. (2004). Responsible modeling of measurement data for appropriate inferences: important advances in reliability and validity theory. In David Kaplan (ed.) The SAGE handbook of quantitative methodology for the social sciences (pp. 74-93). SAGE Publications, Inc. https://doi.org/10.4135/9781412986311
  • Zumbo, B.D., & Shear, B.R. (2011, October). The concept of validity and some novel validation methods [Lecture/Workshop, half-day]. The 42nd annual Northeastern Educational Research Association (NERA) meeting, Rocky Hill, CT.

A dialectic on validity: Explanation-focused and the many ways of being human

Yıl 2023, Cilt: 10 Sayı: Special Issue, 1 - 96, 27.12.2023
https://doi.org/10.21449/ijate.1406304

Öz

In line with the journal volume’s theme, this essay considers lessons from the past and visions for the future of test validity. In the first part of the essay, a description of historical trends in test validity since the early 1900s leads to the natural question of whether the discipline has progressed in its definition and description of test validity. There is no single agreed-upon definition of test validity; however, there is a marked coalescing of explanation-centered views at the meta-level. The second part of the essay focuses on the author's development of an explanation-focused view of validity theory with aligned validation methods. The confluence of ideas that motivated and influenced the development of a coherent view of test validity as the explanation for the test score variation and validation is the process of developing and testing the explanation guided by abductive methods and inference to the best explanation. This description also includes a new re-interpretation of true scores in classical test theory afforded by the author’s measure-theoretic mental test theory development—for a particular test-taker, the variation in observed test-taker scores includes measurement error and variation attributable to the different ecological testing settings, which aligns with the explanation-focused view wherein item and test performance are the object of explanatory analyses. The final main section of the essay describes several methodological innovations in explanation-focused validity that are in response to the tensions and changes in assessment in the last 25 years.

Kaynakça

  • Addey, C., Maddox, B., & Zumbo, B.D. (2020) Assembled validity: Rethinking Kane’s argument-based approach in the context of International Large-Scale Assessments (ILSAs), Assessment in Education: Principles, Policy & Practice, 27(6), 588-606. https://doi.org/10.1080/0969594X.2020.1843136
  • American Educational Research Association, American Psychological Association, and National Council on Measurement in Education. (1974). Standards for educational and psychological tests. American Psychological Association.
  • American Educational Research Association, American Psychological Association, and National Council on Measurement in Education [AERA, APA, & NCME]. (1999). Standards for educational and psychological testing. American Educational Research Association.
  • American Educational Research Association, American Psychological Association, and National Council on Measurement in Education. (2014). Standards for educational and psychological testing. American Educational Research Association. https://www.testingstandards.net/open-access-files.html
  • American Psychological Association. (1954). Technical recommendations for psychological tests and diagnostic techniques. Psychological Bulletin, 51(2, Pt.2), 1 38. https://doi.org/10.1037/h0053479
  • Anastasi, A. (1950). The concept of validity in the interpretation of test scores. Educational and Psychological Measurement, 10, 67–78. https://doi.org/10.1177/001316445001000105
  • Anastasi, A. (1954). Psychological testing (1st ed.). Macmillan.
  • Angoff, W.H. (1988). Validity: An evolving concept. In: H. Wainer & H.I. Braun (Eds.), Test validity (pp. 19-32). Lawrence Erlbaum Associates.
  • Bazire, M., & Brézillon, P. (2005). Understanding Context Before Using It. In: Dey, A., Kokinov, B., Leake, D., Turner, R. (eds) modeling and using context. CONTEXT 2005. Lecture notes in computer science, vol. 3554. Springer. https://doi.org/10.1007/11508373_3
  • Bingham, W.V. (1937). Aptitudes and aptitude testing. Harper.
  • Borsboom, D., Mellenbergh, G.J., & van Heerden, J. (2004). The concept of validity. Psychological Review, 111(4), 1061 1071. https://doi.org/10.1037/0033 295X.111.4.1061
  • Borsboom, D., Cramer, A.O.J., Kievit, R.A., Scholten, A.Z., & Franić, S. (2009). The end of construct validity. In R.W. Lissitz (Ed.), The concept of validity: Revisions, new directions, and applications (pp. 135–170). IAP Information Age Publishing.
  • Bronfenbrenner, U. (1979). The ecology of human development. Harvard University Press.
  • Bronfenbrenner, U. (1994). Ecological models of human development. In T. Huston & T.N. Postlethwaith (Eds.), International enclyclopedia of education, 2nd ed., Vol. 3 (pp. 1643-1647). Elsevier Science.
  • Buckingham, B.R. (1921). Intelligence and its measurement: A symposium. Journal of Educational Psychology, 12, 271–275.
  • Campbell, D.T., & Fiske, D.W. (1959). Convergent and discriminant validation by the multitrait multimethod matrix. Psychological Bulletin, 56(2), 81 105. https://doi.org/10.1037/h0046016
  • Carnap R. (1935). Philosophy and logical syntax. American Mathematical Society.
  • Chen, M.Y., & Zumbo, B.D. (2017). Ecological framework of item responding as validity evidence: An application of multilevel DIF modeling using PISA data. In: Zumbo, B., Hubley, A. (eds) Understanding and investigating response processes in validation research. Springer, Cham. https://doi.org/10.1007/978-3-319-56129-5_4
  • ChoGlueck, C. (2018). The error is in the gap: Synthesizing accounts for societal values in science. Philosophy of Science, 85(4), 704-725. https://doi.org/10.1086/699191
  • Clark, A. (1998). Being there: Putting brain, body, and world together again. MIT press.
  • Clark, A. (2011). Supersizing the mind: Embodiment, action, and cognitive extension. Oxford University Press.
  • Courtis, S.A. (1921). Report of the standardization committee. Journal of Educational Research, 4(1), 78–90.
  • Cronbach, L.J. (1971). Test validation. In: R.L. Thorndike (ed.) Educational measurement, 2nd ed. (pp. 443-507). American Council on Education.
  • Cronbach, L.J. (1988). Five perspectives on the validity argument. In H. Wainer & H.I. Braun (Eds.), Test validity (pp. 3–17). Lawrence Erlbaum Associates, Inc.
  • Cronbach, L.J. (1989). Construct validation after thirty years. In R.L. Linn (ed.) Intelligence: Measurement, theory, and public policy: Proceedings of a symposium in honor of Lloyd G. Humphreys (pp. 147-171). University of Illinois Press.
  • Cronbach, L.J., & Meehl, P.E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52(4), 281–302. https://doi.org/10.1037/h0040957
  • Danziger, K. (1990). Constructing the subject: Historical origins of psychological research. Cambridge University Press. https://doi.org/10.1017/CBO9780511524059
  • de Ayala, R.J. (2009). [Review of Handbook of Statistics, Volume 26: Psychometrics, by C.R. Rao & S. Sinharay]. Journal of the American Statistical Association, 104(487), 1281–1283. http://www.jstor.org/stable/40592308
  • Dewey, J. (1938). Logic: the theory of inquiry. Holt.
  • Douglas H. (2000) Inductive risk and values in science. Philosophy of Science, 67, 559–79. https://doi.org/10.1086/392855
  • Douglas, H. (2003). The Moral Responsibilities of Scientists (Tensions between Autonomy and Responsibility). American Philosophical Quarterly, 40(1), 59 68. http://www.jstor.org/stable/20010097
  • Douglas, H. (2004). The Irreducible Complexity of Objectivity. Synthese 138, 453–473. https://doi.org/10.1023/B:SYNT.0000016451.18182.91
  • Douglas, H. (2009). Science, policy, and the value-free ideal. University of Pittsburgh Press.
  • Douglas, H. (2016), Values in science. In P. Humphries (ed.), The Oxford Handbook of Philosophy of Science (pp. 609 630). Oxford University Press. https://doi.org/10.1093/oxfordhb/9780199368815.013.28
  • Eid, M. (1996). Longitudinal confirmatory factor analysis for polytomous item responses: Model definition and model selection on the basis of stochastic measurement theory. Methods of Psychological Research Online, 1(4), 65-85.
  • Eid, M. (2000). A multitrait-multimethod model with minimal assumptions. Psychometrika, 65, 241-261. https://doi.org/10.1007/BF02294377
  • Elliott, K. (2011). Is a little pollution good for you?: incorporating societal values in environmental research. Oxford University Press.
  • Embretson S.E. (Whitely). (1983). Construct validity: Construct representation versus nomothetic span. Psychological Bulletin, 93(1), 179–197. https://doi.org/10.1037/0033-2909.93.1.179
  • Embretson, S. (1984). A general latent trait model for response processes. Psychometrika, 49(2), 175–186. https://doi.org/10.1007/BF02294171
  • Embretson, S. (1993). Psychometric models for learning and cognitive processes. In N. Frederiksen, R.J., Mislevy, & I.I. Bejar (Eds.), Test theory for a new generation of tests (pp. 125– 150). Erlbaum.
  • Embretson, S.E. (1998). A cognitive design system approach to generating valid tests: Application to abstract reasoning. Psychological Methods, 3(3), 380 396. https://doi.org/10.1037/1082-989X.3.3.380
  • Embretson, S.E. (2007). Construct validity: A universal validity system or just another test evaluation procedure? Educational Researcher, 36(8), 449 455. https://doi.org/10.3102/0013189X07311600
  • Embretson, S.E. (2016), Understanding Examinees’ Responses to Items: Implications for Measurement. Educational Measurement: Issues and Practice, 35, 6 22. https://doi.org/10.1111/emip.12117
  • Embretson, S., Schneider, L.M., & Roth, D.L. (1986). Multiple processing strategies and the construct validity of verbal reasoning tests. Journal of Educational Measurement, 23, 13–32. https://doi.org/10.1111/j.1745-3984.1986.tb00231.x
  • Fine, A.I. (1984). The natural ontological attitude (pp. 261-277). In J. Leplin (ed.), Scientific realism. University of California Press.
  • Fox, J., Pychyl, T., & Zumbo, B.D. (1997). An investigation of background knowledge in the assessment of language proficiency. In A. Huhta, V. Kohonen, L. Kurki-Suonio, & S. Luoma, (Eds.), Current developments and alternatives in language assessment: Proceedings of LTRC 1996 (pp. 367 – 383). University of Jyvaskyla Press.
  • Friedman, M. (1974). Explanation and scientific understanding. The Journal of Philosophy, 71(1), 5–19. https://doi.org/10.2307/2024924
  • Galupo, M.P., Mitchell, R.C., & Davis, K.S. (2018). Face validity ratings of sexual orientation scales by sexual minority adults: Effects of sexual orientation and gender identity. Archives of Sexual Behavior, 47(4), 1241–1250. https://doi.org/10.1007/s10508-017-1037-y
  • Geiser, C., & Lockhart, G. (2012). A comparison of four approaches to account for method effects in latent state trait analyses. Psychological Methods, 17(2), 255 283. https://doi.org/10.1037/a0026977
  • Giere, R.N. (1999). Science without Laws. University of Chicago Press.
  • Giere, R.N. (2006). Scientific perspectivism. University of Chicago Press. https://doi.org/10.7208/chicago/9780226292144.001.0001
  • Giere, R.N. (2010). Explaining science: A cognitive approach. University of Chicago Press.
  • Gigerenzer, G., Swijtink, Z.G., Porter, T.M., Daston, L., Beatty, J., & Krüger, L. (1989). The empire of chance: How probability changed science and everyday life. Cambridge University Press.
  • Goffman, E. (1959). The presentation of self in everyday life. Doubleday.
  • Goffman, E. (1964). The Neglected Situation. American Anthropologist, 66(6), 133–136. http://www.jstor.org/stable/668167
  • Goldstein, H. (1980). Dimensionality, bias, independence and measurement scale problems in latent trait test score models. British Journal of Mathematical and Statistical Psychology, 33(2), 234–246. https://doi.org/10.1111/j.2044-8317.1980.tb00610.x
  • Goldstein, H. (1994). Recontextualizing mental measurement. Educational Measurement: Issues and Practice, 12(1), 16-19, 43.
  • Goldstein H. (1995). Multilevel statistical models (2nd edition). Edward Arnold/Halstead Press.
  • Goldstein, H., & Wood, R. (1989). Five decades of item response modelling. British Journal of Mathematical and Statistical Psychology, 42(2), 139–167. https://doi.org/10.1111/j.2044-8317.1989.tb00905.x
  • Green, B. F. (1990). A comprehensive assessment of measurement. Contemporary Psychology, 35, 850-851.
  • Green, C.D. (2015). Why psychology isn’t unified, and probably never will be. Review of General Psychology, 19(3), 207-214. https://doi.org/10.1037/gpr0000051
  • Guilford, J.P. (1946). New standards for test evaluation. Educational and Psychological Measurement, 6(4), 427-438. https://doi.org/10.1177/001316444600600401
  • Guion, R.M. (1980). On trinitarian doctrines of validity. Professional Psychology, 11(3), 385–398. https://doi.org/10.1037/0735-7028.11.3.385
  • Gulliksen, H. (1950a). Intrinsic validity. American Psychologist, 5(10), 511–517. https://doi.org/10.1037/h0054604
  • Gulliksen, H. (1950b). Theory of mental tests. John Wiley & Sons Inc. https://doi.org/10.1037/13240-000
  • Gulliksen, H. (1961). Measurement of learning and mental abilities. Psychometrika 26, 93–107. https://doi.org/10.1007/BF02289688
  • Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10, 255–282. https://doi.org/10.1007/BF02288892
  • Haig, B.D. (1999). Construct validation and clinical assessment. Behaviour Change, 16, 64–73.
  • Haig, B.D. (2005a). Exploratory factor analysis, theory generation, and scientific method. Multivariate Behavioral Research, 40(3), 303-329.
  • Haig, B.D. (2005b). An abductive theory of scientific method. Psychological Methods, 10(4), 371–388. https://doi.org/10.1037/1082-989X.10.4.371
  • Haig, B.D. (2009). Inference to the best explanation: A neglected approach to theory appraisal in psychology. The American Journal of Psychology, 122(2), 219-234.
  • Haig, B.D. (2014). Investigating the psychological world: Scientific method in the behavioral sciences. MIT Press.
  • Haig, B.D. (2018). Exploratory factor analysis, theory generation, and scientific method. In Method matters in psychology (Studies in applied philosophy, epistemology and rational ethics, Vol. 45, pp. 65-88). Springer, Cham.
  • Haig, B.D. (2019). The importance of scientific method for psychological science. Psychology, Crime & Law, 25(6), 527–541. https://doi.org/10.1080/1068316X.2018.1557181
  • Haig, B.D. (in press). Repositioning construct validity theory: From nomological networks to pragmatic theories, and their evaluation by explanatory means. Perspectives on Psychological Science.
  • Haig, B.D., & Evers, C.W. (2016). Realist inquiry in social science. Sage.
  • Hattie, J., & Leeson, H. (2013). Future directions in assessment and testing in education and psychology. In K.F. Geisinger, B.A. Bracken, J.F. Carlson, J.-I. C. Hansen, N.R. Kuncel, S.P. Reise, & M.C. Rodriguez (Eds.), APA handbook of testing and assessment in psychology, vol. 3. testing and assessment in school psychology and education (pp. 591–622). American Psychological Association. https://doi.org/10.1037/14049-028
  • Hempel, C.G. (1965). Aspects of scientific explanation and other essays in the philosophy of science. The Free Press.
  • Hicks, D.J. (2014). A new direction for science and values. Synthese, 191(14), 3271–3295. http://www.jstor.org/stable/24026188
  • Higgins, N.C., Zumbo, B.D., & Hay, J.L. (1999). Construct validity of attributional style: Modeling context-dependent item sets in the attributional style questionnaire. Educational and Psychological Measurement, 59(5), 804–820. https://doi.org/10.1177/00131649921970152
  • Holman, B., & Wilholt, T. (2022). The new demarcation problem. Studies in History and Philosophy of Science, 91, 211-220. https://doi.org/10.1016/j.shpsa.2021.11.011
  • Hubley, A.M., & Zumbo, B.D. (1996). A dialectic on validity: Where we have been and where we are going. The Journal of General Psychology, 123(3), 207–215. https://doi.org/10.1080/00221309.1996.9921273
  • Hubley, A.M., & Zumbo, B.D. (2011). Validity and the consequences of test interpretation and use. Social Indicators Research, 103(2), 219–230. https://doi.org/10.1007/s11205-011-9843-4
  • Hubley, A.M., & Zumbo, B.D. (2013). Psychometric characteristics of assessment procedures: An overview. In K.F. Geisinger (Ed.), APA handbook of testing and assessment in psychology (Vol. 1, pp. 3–19). American Psychological Association Press. https://doi.org/10.1037/14047-001
  • Hubley, A.M., & Zumbo, B.D. (2017). Response processes in the context of validity: Setting the stage. In B.D. Zumbo & A.M. Hubley (Eds.), Understanding and investigating response processes in validation research (pp. 1–12). Springer International Publishing/Springer Nature. https://doi.org/10.1007/978-3-319-56129-5_1
  • Hull, C.L. (1935). The conflicting psychologies of learning: A way out. Psychological Review. 42(6), 491–516. https://doi.org/10.1037/h0058665
  • Jonson, J.L., & Plake, B.S. (1998). A historical comparison of validity standards and validity practices. Educational and Psychological Measurement, 58(5), 736–753. https://doi.org/10.1177/0013164498058005002
  • Kaldis, B. (2013). Kinds: Natural kinds versus human kinds. In Encyclopedia of philosophy and the social sciences (Vol. 2, pp. 515–518). SAGE Publications, Inc. https://doi.org/10.4135/9781452276052
  • Kane, M. (1992). An argument-based approach to validity. Psychological Bulletin, 112(3), 527–535. https://doi.org/10.1037/0033-2909.112.3.527
  • Kane, M. (2001). Current concerns in validity theory. Journal of Educational Measurement, 38(4), 319-342. https://doi.org/10.1111/j.1745-3984.2001.tb01130.x
  • Kane, M. (2004). Certification testing as an illustration of argument-based validation. Measurement: Interdisciplinary Research and Perspectives, 2(3), 135–170. https://doi.org/10.1207/s15366359mea0203_1
  • Kane, M. (2006). Validation. In R. Brennan (Ed.) Educational measurement (4th ed., pp. 17–64). American Council on Education and Praeger.
  • Kane, M. (2012). Validating score interpretations and uses. Language Testing, 29(1), 3-17. https://doi.org/10.1177/0265532211417210
  • Kane, M. (2013). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50, 1-73. https://doi.org/10.1111/jedm.12000
  • Kane, M. (2016). Explicating validity. Assessment in Education: Principles, Policy & Practice, 23(2), 198–211. https://doi.org/10.1080/0969594X.2015.1060192
  • Kincaid, H. (2000). Global arguments and local realism about the social sciences. Philosophy of Science, 67(S3), S667-S678. https://doi.org/10.1086/392854
  • Koch, T., Eid, M., & Lochner, K. (2018). Multitrait-multimethod-analysis: The psychometric foundation of CFA-MTMM models. In P. Irwing, T. Booth, & D.J. Hughes (Eds.), The Wiley handbook of psychometric testing: A multidisciplinary reference on survey, scale and test development (pp. 781–846). Wiley Blackwell. https://doi.org/10.1002/9781118489772.ch25
  • Koch, T., Schultze, M., Eid, M., & Geiser, C. (2014). A longitudinal multilevel CFA-MTMM model for interchangeable and structurally different methods. Frontiers in Psychology, 5, Article 311. https://doi.org/10.3389/fpsyg.2014.00311
  • Kroc, E., & Zumbo, B.D. (2018). Calibration of measurements. Journal of Modern Applied Statistical Methods, 17(2), eP2780. https://digitalcommons.wayne.edu/jmasm/vol17/iss2/17/
  • Kroc, E., & Zumbo, B.D. (2020). A transdisciplinary view of measurement error models and the variations of X= T+ E. Journal of Mathematical Psychology, 98, 102372. https://doi.org/10.1016/j.jmp.2020.102372
  • Kuhn, T.S. (1962). The structure of scientific revolutions. University of Chicago Press.
  • Kuhn, T.S. (1970). The structure of scientific revolutions (2nd ed.). University of Chicago Press.
  • Kuhn, T.S. (1977). The essential tension: Selected studies in scientific tradition and change. University of Chicago Press.
  • Kuhn, T.S. (1996). The structure of scientific revolutions (3rd ed.). University of Chicago Press.
  • Lakatos, I. (1976). Falsification and the methodology of scientific research programmes. In Can theories be refuted? (pp. 205–259). Springer.
  • Lane, S., Zumbo, B.D., Abedi, J., Benson, J., Dossey, J., Elliott, S.N., Kane, M., Linn, R., Paredes-Ziker, C., Rodriguez, M., Schraw, G., Slattery, J., Thomas, V., & Willhoft, J. (2009). Prologue: An Introduction to the Evaluation of NAEP. Applied Measurement in Education, 22(4), 309-316. https://doi.org/10.1080/08957340903221436
  • Lennon, R.T. (1956). Assumptions underlying the use of content validity. Educational and Psychological Measurement, 16(3), 294–304. https://doi.org/10.1177/001316445601600303
  • Lewis, C. (1986). Test theory and psychometrika: The past twenty-five years. Psychometrika, 51(1), 11–22. https://doi.org/10.1007/BF02293995
  • Li, Z., & Zumbo, B.D. (2009). Impact of differential item functioning on subsequent statistical conclusions based on observed test score data. Psicológica, 30(2), 343–370. https://www.uv.es/psicologica/articulos2.09/11LI.pdf
  • Lipton, P. (2004). Inference to the best explanation (2nd ed.). Routledge. https://doi.org/10.4324/9780203470855
  • Lissitz, R.W., & Samuelsen, K. (2007). A suggested change in terminology and emphasis regarding validity and education. Educational Researcher, 36(8), 437–448. https://doi.org/10.3102/0013189X07311286
  • Loevinger, J. (1957). Objective tests as instruments of psychological theory. Psychological Reports, 3, 635-694 (Monograph Supp. 9).
  • Lord, F.M., & Novick, M.R. (1968). Statistical theories of mental test scores. Addison-Wesley.
  • Maddox, B. (2015). The neglected situation: assessment performance and interaction in context. Assessment in Education: Principles, Policy & Practice, 22(4), 427-443. https://doi.org/10.1080/0969594X.2015.1026246
  • Maddox, B., Zumbo, B.D. (2017). Observing testing situations: Validation as Jazz. In: B.D. Zumbo, A.M. Hubley (eds) Understanding and investigating response processes in validation research. Springer, Cham. https://doi.org/10.1007/978-3-319-56129-5_10
  • Maddox, B., Zumbo, B.D., Tay-Lim, B. S.-H., & Qu, D. (2015). An anthropologist among the psychometricians: Assessment events, ethnography and DIF in the Mongolian Gobi. International Journal of Testing, 15(4), 291–309. https://doi.org/10.1080/15305058.2015.1017103
  • Markus, K.A. (1998). Science, measurement, and validity: Is completion of Samuel Messick's synthesis possible? Social Indicators Research, 45, 7–34. https://doi.org/10.1023/A:1006960823277
  • MacCorquodale, K., & Meehl, P.E. (1948). On a distinction between hypothetical constructs and intervening variables. Psychological Review, 55(2), 95–107. https://doi.org/10.1037/h0056029
  • Mehrens, W.A. (1997). The consequences of consequential validity. Educational Measurement: Issues and Practice, 16(2), 16-18.
  • Messick, S. (1972). Beyond structure: In search of functional models of psychological process. Psychometrika, 37(4, Pt. 1), 357–375. https://doi.org/10.1007/BF02291215
  • Messick, S. (1975). The standard problem: Meaning and values in measurement and evaluation. American Psychologist, 30, 955–966.
  • Messick, S. (1980). Test validity and the ethics of assessment. American Psychologist, 35, 1012-1027.
  • Messick, S. (1988). The once and future issues of validity: Assessing the meaning and consequences of measurement. In: H. Wainer & H.I. Braun (Eds.), Test validity (pp. 33-45). Lawrence Erlbaum Associates.
  • Messick, S. (1989). Validity. In R.L. Linn (Ed.), Educational measurement (3rd ed., pp. 13-103). Macmillan.
  • Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons’ responses and performances as scientific inquiry into score meaning. American Psychologist, 50, 741-749. https://doi.org/10.1037/0003-066X.50.9.741
  • Messick, S. (1998). Test validity: A matter of consequence [Special issue]. Social Indicators Research, 45, 35-44. https://doi.org/10.1023/A:1006964925094
  • Messick, S. (2000). Consequences of test interpretation and use: The fusion of validity and values in psychological assessment. In: Goffin, R.D., Helmes, E. (eds) Problems and solutions in human assessment. Springer. https://doi.org/10.1007/978-1-4615-4397-8_1
  • Millman, J. (1979). Reliability and validity of criterion-referenced test scores. In: R. Traub (Ed.), New directions for testing and measurement: Methodological developments. Jossey-Bass.
  • Mosier, C.I. (1947). A critical examination of the concepts of face validity. Educational and Psychological Measurement, 7(2), 191–205. https://doi.org/10.1177/001316444700700201
  • Nickles, T. (2017). Cognitive illusions and nonrealism: Objections and replies. In: Agazzi, E. (eds) Varieties of Scientific Realism: Objectivity and truth in science (pp. 151–163). Springer, Cham. https://doi.org/10.1007/978-3-319-51608-0_8
  • Novick, M.R. (1966). The axioms and principal results of classical test theory. Journal of Mathematical Psychology, 3(1), 1–18. https://doi.org/10.1016/0022-2496(66)90002-2
  • O'Leary, T.M., Hattie, J.A.C., & Griffin, P. (2017). Actual interpretations and use of scores as aspects of validity. Educational Measurement: Issues and Practice, 36, 16-23. https://doi.org/10.1111/emip.12141
  • Padilla, J.L., & Benítez, I. (2014). Validity evidence based on response processes. Psicothema, 26, 136–144. https://doi.org/10.7334/psicothema2013.259
  • Padilla, J.L., & Benítez, I. (2017). A rationale for and demonstration of the use of DIF and mixed methods. In: Zumbo, B.D., Hubley, A.M. (eds) Understanding and investigating response processes in validation research (pp. 193–210). Springer, Cham. https://doi.org/10.1007/978-3-319-56129-5_1
  • Pellicano, E., & den Houting, J. (2022). Annual research review: Shifting from “normal science” to neurodiversity in autism science. Journal of Child Psychology and Psychiatry, 63, 381–396. https://doi.org/10.1111/jcpp.13534
  • Persson, J., & Ylikoski, P. (Eds.). (2007). Rethinking explanation (Boston Studies in the Philosophy of Science, Vol. 252). Springer.
  • Pitt, J.C. (Ed.) (1988). Theories of explanation. Oxford University Press.
  • Popham, W.J. (1997). Consequential validity: Right concern – wrong concept. Educational Measurement: Issues and Practice, 16(2), 9-13.
  • Psillos, S. (2022). Realism and theory change in science. In: Zalta, E.N., Nodelman, U. (eds.) The Stanford encyclopedia of philosophy. https://plato.stanford.edu/archives/fall2022/entries/realism-theory-change/
  • Rao, C.R., & Sinharay, S. (Eds.). (2007). Handbook of statistics, Volume 26: Psychometrics. Elsevier.
  • Raykov, T. (1992), On structural models for analyzing change. Scandinavian Journal of Psychology, 33, 247-265. https://doi.org/10.1111/j.1467-9450.1992.tb00914.x
  • Raykov, T. (1998a). Coefficient alpha and composite reliability with interrelated nonhomogeneous items. Applied Psychological Measurement, 22(4), 375-385. https://doi.org/10.1177/014662169802200407
  • Raykov, T. (1998b). A method for obtaining standard errors and confidence intervals of composite reliability for congeneric items. Applied Psychological Measurement, 22(4), 369-374. https://doi.org/10.1177/014662169802200406
  • Raykov, T. (1999). Are simple change scores obsolete? An approach to studying correlates and predictors of change. Applied Psychological Measurement, 23(2), 120-126. https://doi.org/10.1177/01466219922031248
  • Raykov, T. (2001), Estimation of congeneric scale reliability using covariance structure analysis with nonlinear constraints. British Journal of Mathematical and Statistical Psychology, 54, 315-323. https://doi.org/10.1348/000711001159582
  • Raykov, T., & Marcoulides, G.A. (2011). Introduction to psychometric theory. Routledge.
  • Raykov, T., & Marcoulides, G.A. (2016). On the relationship between classical test theory and item response theory: From one to the other and back. Educational and Psychological Measurement, 76(2), 325–338. https://doi.org/10.1177/0013164415576958
  • Reichenbach, H. (1977). Philosophie der Raum-Zeit-Lehre. In: Kamlah, A., Reichenbach, M. (eds) Philosophie der Raum-Zeit-Lehre. Hans Reichenbach, vol 2. Vieweg+Teubner Verlag, Wiesbaden.
  • Roberts, B.W. (2007). Contextualizing personality psychology. Journal of Personality, 75(6), 1071–1082. https://doi.org/10.1111/j.1467-6494.2007.00467.x
  • Rome, L., & Zhang, B. (2018). Investigating the effects of differential item functioning on proficiency classification. Applied Psychological Measurement, 42(4), 259–274. https://doi.org/10.1177/0146621617726789
  • Rozeboom, W.W. (1966). Foundations of the theory of prediction. Dorsey.
  • Rulon, P.J. (1946). On the validity of educational tests. Harvard Educational Review, 16, 290-296.
  • Salmon, W. (1990). Four decades of scientific explanation. University of Minnesota Press.
  • Schaffner, K.F. (1993). Discovery and explanation in biology and medicine. University of Chicago Press.
  • Schaffner, K.F. (2020). A comparison of two neurobiological models of fear and anxiety: A “construct validity” application? Perspectives on Psychological Science, 15(5), 1214-1227. https://doi.org/10.1177/1745691620920860
  • Searle, J.R. (1969). Speech acts: An essay in the philosophy of language. Cambridge University Press.
  • Searle, J.R. (1979). Expression and meaning: Studies in the theory of speech acts. Cambridge University Press. https://doi.org/10.1017/CBO9780511609213
  • Sells, S.B. (ed.) (1963). Stimulus determinants of behavior. Ronald Press.
  • Shear, B.R., Zumbo, B.D. (2014). What counts as evidence: A review of validity studies in educational and psychological measurement. In: Zumbo, B.D., Chan, E.K.H. (eds) Validity and validation in social, behavioral, and health sciences (pp. 91-111). Springer, Cham. https://doi.org/10.1007/978-3-319-07794-9_6
  • Shepard, L.A. (1993). Evaluating test validity. Review of Research in Education, 19(1), 405-450. https://doi.org/10.3102/0091732X019001405
  • Shepard, L.A. (1997). The centrality of test use and consequences for test validity. Educational Measurement: Issues and Practice, 16, 5-8, 13, 24.
  • Sinnott-Armstrong, W., & Fogelin, R.J. (2010). Understanding arguments: An introduction to informal logic. Wadsworth Cengage Learning.
  • Sireci, S.G. (1998). The construct of content validity [Special issue]. Social Indicators Research, 45, 83–117. https://doi.org/10.1023/A:1006985528729
  • Sireci, S.G. (2009). Packing and unpacking sources of validity evidence: History repeats itself again. In R.W. Lissitz (Ed.), The concept of validity: Revisions, new directions, and applications (pp. 19–37). IAP Information Age Publishing.
  • Sireci, S.G. (2013). Agreeing on validity arguments. Journal of Educational Measurement, 50, 99-104. https://doi.org/10.1111/jedm.12005
  • Sireci, S.G. (2020). De-“constructing” test validation. Chinese/English Journal of Educational Measurement and Evaluation, 1(1), Article 3. https://www.ce-jeme.org/journal/vol1/iss1/3
  • Slaney, K.L., & Racine, T.P. (2013). What’s in a name? Psychology’s ever-evasive construct. New Ideas in Psychology, 31(1), 4–12. https://doi.org/10.1016/j.newideapsych.2011.02.003
  • Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1), 72–101. https://doi.org/10.2307/1412159
  • Steyer, R. (1988). Conditional expectations: An introduction to the concept and its applications in empirical sciences. Methodika, 2, 53-78.
  • Steyer, R. (1989). Models of classical psychometric test theory as stochastic measurement models: representation, uniqueness, meaningfulness, identifiability, and testability. Methodika, 3, 25-60.
  • Steyer, R., Ferring, D., & Schmitt, M.J. (1992). States and traits in psychological assessment. European Journal of Psychological Assessment, 8(2), 79–98.
  • Steyer, R., Majcen, A.-M., Schwenkmezger, P., & Buchner, A. (1989). A latent state-trait anxiety model and its application to determine consistency and specificity coefficients. Anxiety Research, 1(4), 281–299. https://doi.org/10.1080/08917778908248726
  • Steyer, R., & Schmitt, M. (1990). Latent state-trait models in attitude research. Quality & Quantity, 24, 427–445. https://doi.org/10.1007/BF00152014
  • Steyer, R., Schmitt, M., & Eid, M. (1999). Latent state–trait theory and research in personality and individual differences. European Journal of Personality, 13(5), 389-408. https://doi.org/10.1002/(SICI)1099-0984(199909/10)13:5<389::AID-PER361>3.0.CO;2-A
  • Stone, J., & Zumbo, B.D. (2016). Validity as a pragmatist project: A global concern with local application. In: Aryadoust, V., & Fox, J. (eds.) Trends in language assessment research and practice (pp. 555–573). Cambridge Scholars Publishing.
  • Suppes, P. (1969). Models of data. In: Studies in the methodology and foundations of science. Synthese Library, vol 22. Springer. https://doi.org/10.1007/978-94-017-3173-7_2
  • Thagard, P. (1989). Explanatory coherence. Behavioral and Brain Sciences, 12(3), 435-467. https://doi.org/10.1017/S0140525X00057046
  • Thagard, P. (1992). Conceptual revolutions. Princeton University Press. http://www.jstor.org/stable/j.ctv36zq4g
  • Tolman, C.W. (1991). Review of constructing the subject: Historical origins of psychological research [Review of the book Constructing the subject: Historical origins of psychological research, by K. Danziger]. Canadian Psychology, 32(4), 650–652. https://doi.org/10.1037/h0084651
  • Toulmin, S. (1958). The uses of argument. Cambridge University Press.
  • van Fraassen, B.C. (1980). The scientific image. Oxford University Press. https://doi.org/10.1093/0198244274.001.0001
  • van Fraassen, B.C. (1985). Empiricism in the philosophy of science. In: Churchland, P.M., & Hooker, C.A. (eds.) Images of science: Essays on realism and empiricism (pp. 245-308). University of Chicago Press.
  • van Fraassen, B.C. (2008). Scientific representation: Paradoxes of perspective. Oxford University Press.
  • van Fraassen, B.C. (2012). Modeling and measurement: The criterion of empirical grounding. Philosophy of Science, 79(5), 773–784. https://doi.org/10.1086/667847
  • Varela, F.J., Thompson, E., & Rosch, E. (1991). The embodied mind: Cognitive science and human experience. The MIT Press. https://doi.org/10.7551/mitpress/6730.001.0001
  • Wallin, A. (2007). Explanation and environment. In: Persson, J., Ylikoski, P. (eds) Rethinking explanation. Boston studies in the philosophy of science, (pp. 163-175), vol 252. Springer, Dordrecht. https://doi.org/10.1007/978-1-4020-5581-2_12
  • Wapner, S., & Demick, J. (2002). The increasing contexts of context in the study of environment behavior relations. In R.B. Bechtel & A. Churchman (eds.) Handbook of environmental psychology (pp. 3–14). John Wiley & Sons, Inc.
  • Watson, J.B. (1913). Psychology as the behaviorist views it. Psychological Review, 20(2), 158–177. https://doi.org/10.1037/h0074428
  • Whitely (Embretson), S.E. (1977). Information-processing on intelligence test items: Some response components. Applied Psychological Measurement, 1, 465–476. https://doi.org/10.1177/014662167700100402
  • Wiley, D.E. (1991). Test validity and invalidity reconsidered. In: R.E. Snow & D.E. Wiley (Eds.), Improving inquiry in social science: a volume in honor of Lee J. Cronbach (pp. 75-107). Erlbaum.
  • Woitschach, P., Zumbo, B.D., & Fernández-Alonso, R. (2019). An ecological view of measurement: Focus on multilevel model explanation of differential item functioning. Psicothema, 31(2), 194–203. https://doi.org/10.7334/psicothema2018.303
  • Woodward, J. (1989). Data and phenomena. Synthese, 79, 393–472. https://doi.org/10.1007/BF00869282
  • Wu, A.D., & Zumbo, B.D. (2008). Understanding and using mediators and moderators. Social Indicators Research, 87, 367–392. https://doi.org/10.1007/s11205-007-9143-1
  • Wu, A.D., Zumbo, B.D., & Marshall, S.K. (2014). A method to aid in the interpretation of EFA results: An application of Pratt’s measures. International Journal of Behavioral Development, 38(1), 98-110. https://doi.org/10.1177/0165025413506143
  • Yang, Y., Read, S.J., & Miller, L.C. (2009). The concept of situations. Social and Personality Psychology Compass, 3(6), 1018–1037. https://doi.org/10.1111/j.1751-9004.2009.00236.x
  • Zimmerman, D.W. (1975). Probability spaces, Hilbert spaces, and the axioms of test theory. Psychometrika, 40(3), 395-412. https://doi.org/10.1007/BF02291765
  • Zimmerman, D.W., & Zumbo, B.D. (2001). The geometry of probability, statistics, and test theory. International Journal of Testing, 1(3-4), 283–303. https://doi.org/10.1080/15305058.2001.9669476
  • Zumbo, B.D. (Ed.). (1998). Validity theory and the methods used in validation: Perspectives from the social and behavioral sciences [Special volume]. Social Indicators Research: An International and Interdisciplinary Journal for Quality-of-Life Measurement, 45(1-3). Springer International Publishing.
  • Zumbo, B.D. (1999). The simple difference score as an inherently poor measure of change: Some reality, much mythology. Advances in social science methodology, 5(1), 269-304.
  • Zumbo, B.D. (2005, July). Reflections on validity at the intersection of psychometrics, scaling, philosophy of inquiry, and language testing [Samuel J. Messick Memorial Award Lecture]. LTRC, the 27th Language Testing Research Colloquium, Ottawa, Canada.
  • Zumbo, B.D. (2007a). Validity: Foundational issues and statistical methodology. In C.R. Rao & S. Sinharay (Eds.), Handbook of statistics (Vol. 26, pp. 45–79). Elsevier.
  • Zumbo, B.D. (2007b). Three generations of DIF analyses: Considering where it has been, where it is now, and where it is going. Language Assessment Quarterly, 4(2), 223-233. https://doi.org/10.1080/15434300701375832
  • Zumbo, B.D. (2009). Validity as contextualized and pragmatic explanation, and its implications for validation practice. In R.W. Lissitz (ed.) The concept of validity: Revisions, new directions, and applications (pp. 65–82). IAP Information Age Publishing.
  • Zumbo, B.D. (2010, September). Measurement validity and validation: A meditation on where we have come from and the state of the art today [Invited address]. Presented at the International conference on outcomes measurement, US National Institutes of Health, Bethesda, MD.
  • Zumbo, B.D. (2015, November). Consequences, side effects and the ecology of testing: Keys to considering assessment “in vivo” [Plenary address]. Annual Meeting of the Association for Educational Assessment – Europe (AEAEurope), Glasgow, Scotland. https://youtu.be/0L6Lr2BzuSQ
  • Zumbo, B.D. (2016). Standard Setting Methodology [Invited address]. “Applied Physiology Physical Employment Standards - Current Issues and Challenges” at the Canadian Society for Exercise Physiology (CSEP) conference, Victoria, Canada.
  • Zumbo, B.D. (2017). Trending away from routine procedures, toward an ecologically informed in vivo view of validation practices. Measurement: Interdisciplinary Research and Perspectives, 15(3-4), 137–139. https://doi.org/10.1080/15366367.2017.1404367
  • Zumbo, B.D. (2018a, April). Methodologies used to ensure fairness and equity in the assessment of students’ educational outcomes [Invited presentation and panel session]. AERA Presidential Symposium “Methodology and equity: An international perspective” at the Annual Meeting of the American Educational Research Association (AERA), New York, NY.
  • Zumbo, B.D. (2018b, July). The reports of DIF’s death are greatly exaggerated; It is like a Phoenix rising from the ashes [Keynote Address]. The 11th Conference of the International Test Commission, Montreal, Canada.
  • Zumbo, B.D. (2019). Foreword: Tensions, Intersectionality, and What Is on the Horizon for International Large-Scale Assessments in Education. In B. Maddox (Ed.), International large-scale assessments in education: Insider research perspectives (pp. xii–xiv). Bloomsbury Publishing. https://doi.org/10.5040/9781350023635
  • Zumbo, B.D. (2021). A novel multimethod approach to investigate whether tests delivered at a test centre are concordant with those delivered remotely online [Research Monograph]. UBC Psychometric Research Series, University of British Columbia. http://dx.doi.org/10.14288/1.0400581
  • Zumbo, B.D. (2023a). Validity theories, frameworks and practices in using tests and measures: an over-the-shoulder look back at validity while also looking to the horizon [Invited Address]. Ciclo Formazione Metodologica (FORME), Dipartimento di Psicologia, Università Cattolica Del Sacro Cuore. https://brunozumbo.com/?page_id=31
  • Zumbo, B.D. (2023b). Test validation and Bayesian statistical frameworks to estimate the magnitude and corresponding uncertainty of washback effects of test preparation [Research Monograph]. UBC Psychometric Research Series, University of British Columbia. https://dx.doi.org/10.14288/1.0435197
  • Zumbo, B.D. (2023c, October). The Challenges and Promise of Embracing the Many Ways of Being Human: Toward an Ecologically Informed In Vivo View of Validation Practices [Invited Address]. Symposium on Inclusive Educational Assessment, Neurodiversity and Disability. Hughes Hall, University of Cambridge.
  • Zumbo, B.D., & Chan, E.K.H. (Eds.). (2014a). Validity and validation in social, behavioral, and health sciences. Springer International Publishing/Springer Nature. https://doi.org/10.1007/978-3-319-07794-9
  • Zumbo, B.D., & Chan, E.K.H. (2014b). Reflections on validation practices in the social, behavioral, and health sciences. In: Zumbo, B.D., Chan, E.K.H. (eds) Validity and validation in social, behavioral, and health sciences (pp. 321-327). Springer, Cham. https://doi.org/10.1007/978-3-319-07794-9_19
  • Zumbo, B.D., & Chan, E.K.H. (2014c). Setting the stage for validity and validation in social, behavioral, and health sciences: Trends in validation practices. In: Zumbo, B.D., Chan, E.K.H. (eds) Validity and validation in social, behavioral, and health sciences (pp. 3-8). Springer, Cham. https://doi.org/10.1007/978-3-319-07794-9_1
  • Zumbo, B.D., & Forer, B. (2011). Testing and measurement from a multilevel view: Psychometrics and validation. In J.A. Bovaird, K.F. Geisinger, & C.W. Buckendahl (Eds.), High-stakes testing in education: Science and practice in K–12 settings (pp. 177–190). American Psychological Association. https://doi.org/10.1037/12330-011
  • Zumbo, B.D., & Gelin, M.N. (2005). A matter of test bias in educational policy research: Bringing the context into picture by investigating sociological/community moderated (or mediated) test and item bias. Journal of Educational Research and Policy Studies, 5, 1–23. https://files.eric.ed.gov/fulltext/EJ846827.pdf
  • Zumbo, B. D., & Hubley, A. M. (2016). Bringing consequences and side effects of testing and assessment to the foreground. Assessment in Education: Principles, Policy & Practice, 23(2), 299–303. https://doi.org/10.1080/0969594X.2016.1141169
  • Zumbo, B.D., & Hubley, A.M. (Eds.). (2017). Understanding and investigating response processes in validation research. Springer International Publishing/Springer Nature. https://doi.org/10.1007/978-3-319-56129-5
  • Zumbo, B.D., & Kroc, E. (2019). A measurement is a choice and Stevens’ scales of measurement do not help make it: A response to Chalmers. Educational and Psychological Measurement, 79(6), 1184–1197. https://doi.org/10.1177/0013164419844305
  • Zumbo, B.D., Liu, Y., Wu, A.D., Forer, B., Shear, B.R. (2017). National and international educational achievement testing: A case of multi-level validation framed by the ecological model of item responding. In B.D. Zumbo & A.M. Hubley (Eds.), Understanding and investigating response processes in validation research (pp. 341-362). Springer International Publishing/Springer Nature. https://doi.org/10.1007/978-3-319-56129-5_18
  • Zumbo, B.D., Liu, Y., Wu, A.D., Shear, B.R., Olvera Astivia, O.L., & Ark, T.K. (2015). A methodology for Zumbo’s third generation DIF analyses and the ecology of item responding. Language Assessment Quarterly, 12(1), 136–151. https://doi.org/10.1080/15434303.2014.972559
  • Zumbo, B.D., Maddox, B., & Care, N.M. (2023). Process and product in computer-based assessments: Clearing the ground for a holistic validity framework. European Journal of Psychological Assessment, 39(4), 252–262. https://doi.org/10.1027/1015-5759/a000748
  • Zumbo, B.D., & Padilla, J.-L. (2020). The interplay between survey research and psychometrics, with a focus on validity theory. In P.C. Beatty, D. Collins, L. Kaye, J.-L. Padilla, G.B. Willis, & A. Wilmot (Eds.), Advances in questionnaire design, development, evaluation and testing (pp. 593–612). John Wiley & Sons, Inc. https://doi.org/10.1002/9781119263685.ch24
  • Zumbo, B.D., Pychyl, T.A., & Fox, J.A. (1993). Psychometric properties of the CAEL assessment, II: An examination of the dependability/reliability of placement decisions. Carleton Papers in Applied Language Studies, 10, 13-27.
  • Zumbo, B.D., & Rupp, A.A. (2004). Responsible modeling of measurement data for appropriate inferences: important advances in reliability and validity theory. In David Kaplan (ed.) The SAGE handbook of quantitative methodology for the social sciences (pp. 74-93). SAGE Publications, Inc. https://doi.org/10.4135/9781412986311
  • Zumbo, B.D., & Shear, B.R. (2011, October). The concept of validity and some novel validation methods [Lecture/Workshop, half-day]. The 42nd annual Northeastern Educational Research Association (NERA) meeting, Rocky Hill, CT.
There are 229 references in total.

Details

Primary Language: English
Subjects: Measurement Theories and Applications in Education and Psychology, Scale Development, Psychological Methodology, Design and Analysis
Section: Special Issue 2023
Authors

Bruno D. Zumbo 0000-0003-2885-5724

Publication Date: December 27, 2023
Submission Date: December 18, 2023
Acceptance Date: December 19, 2023
Published in Issue: Year 2023, Volume 10, Special Issue

How to Cite

APA Zumbo, B. D. (2023). A dialectic on validity: Explanation-focused and the many ways of being human. International Journal of Assessment Tools in Education, 10(Special Issue), 1-96. https://doi.org/10.21449/ijate.1406304
