SPECIALTY GRAND CHALLENGE article

Front. Educ., 13 February 2017
Sec. Assessment, Testing and Applied Measurement

The Future of Assessment as a Human and Social Endeavor: Addressing the Inconvenient Truth of Error

Gavin T. L. Brown*

  • Faculty of Education and Social Work, School of Learning, Development, and Professional Practice, The University of Auckland, Auckland, New Zealand

Assessment faces continuing challenges. These challenges arise predominantly from the inherent errors we make when designing, administering, analyzing, and interpreting assessments. A widely held assumption is that our psychometric methods lead to reliable and valid scores; however, this premise depends on students exercising 100% effort throughout a test event, not cheating, and having had sufficient personal and environmental support to produce their best possible results (Dorans, 2012).
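
To make the notion of error concrete, classical test theory offers the simplest textbook formulation (a standard model, invoked here for illustration rather than drawn from this article): each observed score is the sum of a true score and an error term, and reliability is the share of observed-score variance attributable to true scores:

X = T + E, \qquad \rho_{XX'} = \frac{\sigma^2_T}{\sigma^2_X}

The convenient assumption is that E behaves as random noise with a mean of zero. Low effort and cheating violate that assumption by introducing systematic bias, which is why they contaminate scores rather than merely adding noise that averages out.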

Inconveniently, research makes clear that cheating and lack of effort contaminate scores (Murdock et al., 2016; Wise and Smith, 2016). This is especially the case in low-stakes testing situations, such as institutional evaluations (Wise and Cotten, 2009), leading to inappropriate conclusions about the state of an organization or jurisdiction. Hence, while it is convenient to presume that statistical advances will account for such systematic sources of error, the reality is that much assessment takes place both “in vivo” and “in situ” during classroom activities (Zumbo, 2015). While psychometric methods work reasonably well in high-stakes examination or standardized testing contexts (i.e., “in vitro”), there is little guarantee that their assumptions hold in classroom contexts. The psychometric and testing industry therefore has much to do to develop methods for describing and accounting for the myriad complexities of classroom- and school-based dynamics.

This matters because a widespread policy framework of using assessment to guide or inform improvement (i.e., “assessment for learning” or “formative assessment”) requires teachers to assess students in order to identify the quality of student learning and the appropriate changes to classroom practices. UK experts tend to argue that this can be done only through teacher–student interaction in the classroom or by involving students in considering the merits of their own or peers’ work (Black et al., 2003; Harlen, 2007; Swaffield, 2011). Others consider that tests can contribute information about changes to teaching that lead to better learning outcomes, provided the tests go beyond rank-order or total-score reporting (Brown and Hattie, 2012) or teachers spend time analyzing strengths and weaknesses (Carless, 2011).

Regardless of the type of assessment method, it is very difficult for pre-service teachers to learn how to assess formatively (Hill and Eyers, 2016). Indeed, even practicing teachers need expertise in curriculum and pedagogy to exercise command of multiple methods of assessment in such a way that all learners are helped to overcome the, sometimes idiosyncratic, challenges they face (Cowie and Harrison, 2016; Moon, 2016). Teachers in New Zealand and the Netherlands have learned to use achievement data to guide school-wide improvements, provided experts help them (Lai and Schildkamp, 2016). However, such efforts often take 2–3 years before changes can be seen in student performance. Thus, despite multiple studies showing that teachers believe in using assessment formatively (Barnes et al., 2015; Bonner, 2016), putting in place the policy and resources to support formative assessment is difficult; formative assessment is not a quick fix for improving outcomes for all learners.

The formative assessment policy agenda challenges the dominance of formal testing and teacher-centric methods of assessment, with expectations that effective learning takes place as students engage with learning targets, outcomes, or objectives, take ownership of their work, cooperate with peers, understand more deeply what quality is, and receive and generate appropriate feedback (Leahy et al., 2005). Inconveniently, involving students in assessment presents considerable challenges due to psychological and social factors that interfere with students’ ability to self-evaluate accurately (Andrade and Brown, 2016) or to peer evaluate and collaborate constructively (Panadero, 2016; Strijbos, 2016). Indeed, evidence that student involvement in assessment develops self-regulatory abilities is weak (Dinsmore and Wilson, 2016). Feedback processes are complex, belying the simple notion that student “horses” will automatically learn once they are led to the “water” of feedback (Lipnevich et al., 2016). And while novel assessment methods are being developed, especially through the introduction of ICT (Katz and Gorin, 2016), students are not necessarily fans of new ways of being assessed, fearing that their performance will suffer (Struyven and Devesa, 2016).

A second widespread policy initiative is to use assessments, especially standardized tests, to evaluate teachers, schools, and systems (Lingard and Lewis, 2016; Teltemann and Klieme, 2016). It is clear that such policies tend to have a largely negative impact on the quality of teaching (Hamilton, 2003; Nichols and Harris, 2016), perhaps more so among minority and lower socio-economic communities. Nonetheless, public acceptance of the legitimacy of using assessment scores to ascertain quality in schooling is reasonably high (Buckendahl, 2016), and using tests to evaluate schools and teaching is a relatively quick and low-cost political process (Linn, 2000). However, summative accountability uses of assessment create tensions for teachers (Bonner, 2016), with many teachers in high-stakes accountability environments holding very negative views of such uses (Deneen and Brown, 2016). Using assessments formatively requires discovering what students have “failed” to be good at, so as to inform further instruction (Hattie and Brown, 2008). This implies that a formative assessment ought to reveal lack of success, a problematic event if external accountability consequences are attached to the same result. If the consequences of low scores are seen as unfair, it is not surprising that teachers use multiple methods to ensure that scores increase. And if accountability assessment scores are inflated through construct-irrelevant processes, then the meaning of an accountability assessment becomes problematic.

The choice of policy priorities within different jurisdictions strongly shapes the nature and power of assessment practices. For example, Arabic- and Chinese-language societies strongly prioritize memorization of content as the dominant model of schooling and attach substantial social and economic benefits to successful performance on formal examinations (Hargreaves, 1997; OECD, 2011; Gebril, 2016; Kennedy, 2016). Anglo-Commonwealth countries strongly prioritize a child-centered, student-involved approach (Stobart, 2006), in which interactive teacher assessment practices are prioritized as a means of improving learning outcomes (Black and Wiliam, 1998). The United States has strong legal protections for special needs students (IDEA, 1997), who are entitled to differentiated assessment and evaluation practices (Tomlinson, 1999). These differences in the social uses and styles of assessment complicate the meaning of a grade or score and create challenges for psychometric models that attempt universal explanations of performance.

Within societies that are highly homogeneous in ethnic and linguistic make-up (e.g., Finland, Japan, China), it may be reasonable to expect that common psychological and social factors influence assessment, which simplifies predicting and modeling those factors. However, when comparisons are made among culturally distinct groups in multicultural societies, increasingly the norm in economically developed nations (Van de Vijver, 2016), the psychological factors influencing student responses, teacher judgments, or test performance can vary significantly. For example, tendencies toward self-effacement or self-enhancement are not equal across cultural groups (Suzuki et al., 2008), so the meaning of self-assessment has to be carefully evaluated (e.g., among collectivist groups, modest self-reporting enhances group belongingness). In multicultural contexts, assessments that depend on classroom interactions between and among students and teachers are likely to be affected by these differing cultural standards for how best to communicate an evaluation of work. Moreover, the capacity of teachers to appropriately collect, analyze, and plan in response to both formal and informal assessment data is generally weak (Xu and Brown, 2016), and quite prolonged, intensive professional development is needed to produce “assessment capable” teachers (Smith et al., 2014). Thus, assessors and assessments are challenged by the varying and subtle effects of cultural difference.
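
The conventional psychometric check on whether an instrument supports such cross-group comparisons is measurement invariance testing, a standard technique that this article does not name but that addresses exactly this problem. In a multi-group factor model, item responses for group g are modeled as

x^{(g)} = \tau^{(g)} + \Lambda^{(g)} \xi + \varepsilon^{(g)}

and scores are comparable across groups only insofar as the loadings \Lambda^{(g)} and intercepts \tau^{(g)} can be constrained to equality across groups. Response styles such as self-effacement typically surface as intercept (scalar) non-invariance, flagging that identical raw scores carry different meanings in different groups.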

Even the introduction of technological solutions that increase the authenticity, diversity, and efficiency of formal testing (Csapó et al., 2012; Katz and Gorin, 2016) does not necessarily improve student performance or solve problems in scoring. Students’ enthusiasm for a computerized activity does not automatically lead to valid conclusions about their proficiency. Students are often concerned that novel assessment practices (including peer assessment, self-assessment, portfolio, performance, or computer-based assessments) will hurt their performance simply because they are unsure how well they will do on a new method of evaluation (Struyven and Devesa, 2016). Consequently, students tend to retreat into a strong preference for conventional assessment practices (e.g., essays or multiple-choice questions). Furthermore, technology now permits data sharing and long-term tracking of student performance, which ought to improve our understanding of how, and in which areas, students are improving. However, the existence of these electronic data raises concerns about privacy and protection; imagine the negative implications if early poor performance is kept on record and used in evaluative decision-making despite substantial subsequent progress (Tierney and Koch, 2016).

Thus, inconveniently, the field of testing, applied psychometrics, measurement, and assessment faces complex problems that are not restricted to any one form of assessment or any one society in which assessment is deployed. The inconveniences outlined here apply especially if we accept that the goal of assessment is to inform improvement and support valid decisions about learners and teachers. Accurate diagnostic prescriptions that teachers, students, and/or parents can use to inform improvement are paramount, and these prescriptions need to be made close to, and responsive to, the real-time processes of classroom learning and teaching, which is a substantial problem. The great contribution of psychometrics to education has been explicit attention to the problem of error in all testing, measurement, and assessment processes. However, few tools are currently available to robustly estimate and account for the kinds of error that occur in real-time classroom observations, interactions, and interpretations. The inconvenient challenge for educators who would minimize the role assessment plays in curriculum is that high-quality tests and measurements are necessary for justice, fairness, and the well-being of individuals and society. The inconvenient challenge for policy makers is that many assessment processes are neither reliable nor dependable (e.g., essay examinations; Brown, 2010), nor do they account well for the many factors outlined here. Thus, many policy decisions based on inadequate tools or processes are invalid.
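
One established framework that points in the needed direction, though it is not invoked by name here, is generalizability theory, which decomposes observed-score variance into multiple facets of error rather than a single undifferentiated term. For a design crossing persons (p) with tasks (t) and raters (r), for instance:

\sigma^2(X_{ptr}) = \sigma^2_p + \sigma^2_t + \sigma^2_r + \sigma^2_{pt} + \sigma^2_{pr} + \sigma^2_{tr} + \sigma^2_{ptr,e}

The unsolved problem the paragraph identifies is extending such decompositions to the fluid interactions of classrooms, where facets such as occasion, interlocutor, and stakes are neither fully crossed nor controlled.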

The future of assessment requires that we no longer ignore these inconvenient problems facing assessment, testing, and applied measurement. Rather, the field has to turn constructively to deeply insightful investigation of these perennial problems. Teachers and students need to know where learning stands and what comes next. Policy makers and parents have a right to know what is working, who is learning, who needs help, and what needs to change. Assessment and testing are how we as humans discover the answers to these questions. Hence, good schooling and good education need good testing and assessment, good in the sense of being both high quality and rightly done.

Leaning heavily on validity theory (Messick, 1989; Kane, 2006), good assessment leads to defensible interpretations and actions. These uses depend on robust arguments grounded in relevant theories of curriculum, teaching, learning, and measurement, and on trustworthy empirical evidence that has been subjected to scrutiny (i.e., statistical and/or social moderation). Bringing greater skill and insight to the assessments that inform classroom practice is essential. The success of the whole superstructure of schooling relies on the quality of the judgments and evaluations carried out every day in the millions of classrooms of the world. If this work is not done well, and if we do not know that it is not done well, we fail.

Hence, we must engage with the difficult challenge of how assessment can help education while also making a credible case for the scores or judgments that assessments generate. Leaving this work only to educational statisticians would be a mistake. Testing and measurement need to be integrated with classroom teaching, learning, and curriculum if they are to support schooling and prevent politicians from making simplistic but wrong interpretations and uses of assessment. This is the Grand Challenge for this section of the journal Frontiers in Education: How can assessment be made flexible enough to support real learning in vivo, while fulfilling all the diverse expectations society has for it? As Section Editor, I look forward to your contributions.

Author Contributions

The author confirms being the sole contributor of this work and approved it for publication.

Conflict of Interest Statement

The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

This paper draws heavily on Brown and Harris (2016). An earlier version of this paper, presented as an inaugural professorial lecture at the Faculty of Education and Social Work, The University of Auckland, can be seen at doi: 10.17608/k6.auckland.4238792.v1.

References

Andrade, H. L., and Brown, G. T. L. (2016). “Student self-assessment in the classroom,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 319–334.

Barnes, N., Fives, H., and Dacey, C. M. (2015). “Teachers’ beliefs about assessment,” in International Handbook of Research on Teacher Beliefs, eds H. Fives and M. Gregoire Gill (New York: Routledge), 284–300.

Black, P., Harrison, C., Lee, C., Marshall, B., and Wiliam, D. (2003). Assessment for Learning: Putting It into Practice. Maidenhead: Open University Press.

Black, P., and Wiliam, D. (1998). Assessment and classroom learning. Assess. Educ. 5, 7–74. doi:10.1080/0969595980050102

Bonner, S. M. (2016). “Teachers’ perceptions about assessment: competing narratives,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 21–39.

Brown, G. T. L. (2010). The validity of examination essays in higher education: issues and responses. High. Educ. Q. 64, 276–291. doi:10.1111/j.1468-2273.2010.00460.x

Brown, G. T. L., and Harris, L. R. (2016). “The future of assessment research as a human and social endeavour,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 506–523.

Brown, G. T. L., and Hattie, J. A. (2012). “The benefits of regular standardized assessment in childhood education: guiding improved instruction and learning,” in Contemporary Debates in Childhood Education and Development, eds S. Suggate and E. Reese (London: Routledge), 287–292.

Buckendahl, C. W. (2016). “Public perceptions about assessment in education,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 454–471.

Carless, D. (2011). From Testing to Productive Student Learning: Implementing Formative Assessment in Confucian-Heritage Settings. London: Routledge.

Cowie, B., and Harrison, C. (2016). “Classroom processes that support effective assessment,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 335–350.

Csapó, B., Ainley, J., Bennett, R. E., Latour, T., and Law, N. (2012). “Technological issues for computer-based assessment,” in Assessment and Teaching of 21st Century Skills, eds P. Griffin, B. McGaw, and E. Care (Dordrecht, NL: Springer), 143–230.

Deneen, C. C., and Brown, G. T. L. (2016). The impact of conceptions of assessment on assessment literacy in a teacher education program. Cogent Educ. 3, 1225380. doi:10.1080/2331186X.2016.1225380

Dinsmore, D. L., and Wilson, H. E. (2016). “Student participation in assessment: does it influence self-regulation?” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 145–168.

Dorans, N. J. (2012). The contestant perspective on taking tests: emanations from the statue within. Educ. Meas. 31, 20–37. doi:10.1111/j.1745-3992.2012.00250.x

Gebril, A. (2016). “Educational assessment in Muslim countries: values, policies, and practices,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 420–435.

Hamilton, L. (2003). Assessment as a policy tool. Rev. Res. Educ. 27, 25–68. doi:10.3102/0091732X027001025

Hargreaves, E. (1997). The diploma disease in Egypt: learning, teaching and the monster of the secondary leaving certificate. Assess. Educ. 4, 161–176. doi:10.1080/0969594970040111

Harlen, W. (2007). Assessment of Learning. Los Angeles: SAGE.

Hattie, J. A., and Brown, G. T. L. (2008). Technology for school-based assessment and assessment for learning: development principles from New Zealand. J. Educ. Technol. Syst. 36, 189–201. doi:10.2190/ET.36.2.g

Hill, M. F., and Eyers, G. (2016). “Moving from student to teacher: changing perspectives about assessment through teacher education,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 57–76.

IDEA. (1997). Individuals with Disabilities Education Act, Pub. L. 101-476, 20 U.S.C. §§ 1400 et seq.

Kane, M. T. (2006). “Validation,” in Educational Measurement, 4th Edn, ed. R. L. Brennan (Westport, CT: Praeger), 17–64.

Katz, I. R., and Gorin, J. S. (2016). “Computerising assessment: impacts on education stakeholders,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 472–489.

Kennedy, K. J. (2016). “Exploring the influence of culture on assessment: the case of teachers’ conceptions of assessment in Confucian-Heritage Cultures,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 404–419.

Lai, M. K., and Schildkamp, K. (2016). “In-service teacher professional learning: use of assessment in data-based decision-making,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 77–94.

Leahy, S., Lyon, C., Thompson, M., and Wiliam, D. (2005). Classroom assessment minute by minute, day by day. Educ. Leadersh. 63, 18–24.

Lingard, B., and Lewis, S. (2016). “Globalization of the Anglo-American approach to top-down, test-based educational accountability,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 387–403.

Linn, R. L. (2000). Assessments and accountability. Educ. Res. 29, 4–16. doi:10.3102/0013189X029003004

Lipnevich, A. A., Berg, D. A. G., and Smith, J. K. (2016). “Toward a model of student response to feedback,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 169–185.

Messick, S. (1989). “Validity,” in Educational Measurement, 3rd Edn, ed. R. L. Linn (Old Tappan, NJ: MacMillan), 13–103.

Moon, T. R. (2016). “Differentiated instruction and assessment: an approach to classroom assessment in conditions of student diversity,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 284–301.

Murdock, T. B., Stephens, J. M., and Groteweil, M. M. (2016). “Student dishonesty in the face of assessment: who, why, and what we can do about it,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 186–203.

Nichols, S. L., and Harris, L. R. (2016). “Accountability assessment’s effects on teachers and schools,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 40–56.

OECD. (2011). Strong Performers and Successful Reformers in Education: Lessons from PISA for the United States. Paris, FR: OECD Publishing.

Panadero, E. (2016). “Is it safe? Social, interpersonal, and human effects of peer assessment: a review and future directions,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 247–266.

Smith, L. F., Hill, M. F., Cowie, B., and Gilmore, A. (2014). “Preparing teachers to use the enabling power of assessment,” in Designing Assessment for Quality Learning, eds C. M. Wyatt-Smith, V. Klenowski, and P. Colbert (Dordrecht, NL: Springer), 303–323.

Stobart, G. (2006). “The validity of formative assessment,” in Assessment and Learning, ed. J. Gardner (London: SAGE), 133–146.

Strijbos, J. W. (2016). “Assessment of collaborative learning,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 302–318.

Struyven, K., and Devesa, J. (2016). “Students’ perceptions of novel forms of assessment,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 129–144.

Suzuki, L. K., Davis, H. M., and Greenfield, P. M. (2008). Self-enhancement and self-effacement in reaction to praise and criticism: the case of multiethnic youth. Ethos 36, 78–97. doi:10.1111/j.1548-1352.2008.00005.x

Swaffield, S. (2011). Getting to the heart of authentic assessment for learning. Assess. Educ. 18, 433–449. doi:10.1080/0969594X.2011.582838

Teltemann, J., and Klieme, E. (2016). “The impact of international testing projects on policy and practice,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 369–386.

Tierney, R. D., and Koch, M. J. (2016). “Privacy in classroom assessment,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 267–283.

Tomlinson, C. A. (1999). The Differentiated Classroom: Responding to the Needs of All Learners. Alexandria, VA: ASCD.

Van de Vijver, F. (2016). “Assessment in education in multicultural populations,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 436–453.

Wise, S. L., and Cotten, M. R. (2009). “Test-taking effort and score validity: the influence of student conceptions of assessment,” in Student Perspectives on Assessment: What Students Can Tell Us About Assessment for Learning, eds D. M. McInerney, G. T. L. Brown, and G. A. D. Liem (Charlotte, NC: Information Age Publishing), 187–205.

Wise, S. L., and Smith, L. F. (2016). “The validity of assessment when students don’t give good effort,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York: Routledge), 204–220.

Xu, Y., and Brown, G. T. L. (2016). Teacher assessment literacy in practice: a reconceptualization. Teach. Teach. Educ. 58, 149–162. doi:10.1016/j.tate.2016.05.010

Zumbo, B. D. (2015). “Consequences, side effects and the ecology of testing: keys to considering assessment in vivo,” in Plenary Address to the 2015 Annual Conference of the Association for Educational Assessment—Europe (AEA-E), Glasgow, Scotland.

Keywords: assessment, psychometrics, classroom assessment, formative assessment, error, culture, social behavior, psychological tests

Citation: Brown GTL (2017) The Future of Assessment as a Human and Social Endeavor: Addressing the Inconvenient Truth of Error. Front. Educ. 2:3. doi: 10.3389/feduc.2017.00003

Received: 27 November 2016; Accepted: 30 January 2017;
Published: 13 February 2017

Edited by:

Anastasiya A. Lipnevich, The City University of New York, USA

Reviewed by:

Eva Marie Ingeborg Hartell, KTH Royal Institute of Technology, Sweden

Copyright: © 2017 Brown. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Gavin T. L. Brown, gt.brown@auckland.ac.nz
