- 1 Department of Applied Educational Sciences, University of Umeå, Umeå, Sweden
- 2 Faculty of Education and Social Work, The University of Auckland, Auckland, New Zealand
- 3 Department of Applied Linguistics, The American University in Cairo, Cairo, Egypt
- 4 Department of Psychology, University of Cyprus, Nicosia, Cyprus
How teachers conceive of the nature and purpose of assessment matters to the implementation of classroom assessment and the preparation of students for high-stakes external examinations or qualifications. It is highly likely that teacher beliefs arise from the historical, cultural, social, and policy contexts within which teachers operate. Hence, there may be no globally homogeneous construct of teacher conceptions of assessment; instead, any statistical model of teacher conceptions of assessment may always be a local expression. Thus, the objective of this study was to determine whether any of the published models of teacher assessment conceptions could be generalized across data sets from multiple jurisdictions. Research originating in New Zealand with the Teacher Conceptions of Assessment self-report inventory has been replicated in multiple locations and languages (i.e., English in New Zealand, Queensland, Hong Kong, and India; Greek in Cyprus; Arabic in Egypt; Spanish in Spain and Ecuador) and at different levels of instructional contexts (Primary, Secondary, Senior Secondary, and Teacher Education). This study reports secondary data analyses in which eight previously published models of teacher conceptions of assessment were systematically compared across 11 available data sets. Nested multi-group confirmatory factor analysis (using Amos v25) was carried out to test sequentially for configural, metric, and scalar equivalence between models. Results indicate that only one model (i.e., India) had configural invariance across all 11 data sets, and even this model did not achieve metric equivalence. These results indicate that, while the inventory can be used cross-culturally after localized adaptations, there is no single global model. Context, culture, and local factors shape teacher conceptions of assessment.
How teachers understand the purpose and function of assessment is closely related to how they implement it in their classroom practice. While using assessment to improve teaching and learning may be a sine qua non of being a teacher, the enactment of that belief depends on the socio-cultural context and policy framework within which teachers operate. Variation in those contexts is likely to change teacher conceptions of assessment, meaning that while purposes (e.g., accountability or improvement) may be universal, their manifestation is unlikely to be so. This discrepancy creates significant problems for cross-cultural research that seeks to compare teachers working in different contexts. The lack of invariance in a statistical model is often taken to indicate that the inventory eliciting responses is problematic. However, it might instead reflect the many variations in instructional contexts, which preclude a universal statistical model. Building on this idea, the purpose of this paper is to examine teacher responses from 11 data sets collected in eight different jurisdictions to a common self-report inventory on the purposes and nature of assessment (i.e., the Teacher Conceptions of Assessment version 3 abridged).
Literature
Conceptions of Assessment
When educational policy needs to be implemented by teachers, how teachers conceive of that policy controls the focus of their attention and their understanding of it, as well as influencing their behavioral responses to the policy (Fives and Buehl, 2012). Educational policy around assessment often seeks to use evaluative processes to improve educational outcomes. However, the same policies usually expect evaluation to indicate the quality of teaching and student learning. These two purposes strongly influence teacher conceptions of assessment.
The term conceptions is used in this study to refer to the cognitive beliefs about and affective attitudes toward assessment that teachers espouse, presumably in response to the policy and practice environments in which they work. This is consistent with the notion that teacher conceptions of educational processes and policies will shape decision making so that it makes sense and contributes to successful functioning within a specific environment (Rieskamp and Reimer, 2007).
A significant thread of research into the varying conceptions of assessment teachers might hold can be seen in the work Brown and his colleagues have conducted. Brown's (2002) doctoral dissertation examined the conceptions of assessment held by New Zealand primary school teachers. That work developed a self-administered, self-report survey form that examined four inter-correlated purposes of assessment (i.e., improvement, irrelevance, school accountability, and student accountability) (Brown, 2003). He reported (Brown, 2004b) that teachers conceived of assessment as primarily about supporting improvement in teaching and student learning and as clearly not irrelevant to their instructional activities. They accepted that it was somewhat about making students accountable, but rejected it as something that should make teachers and schools accountable.
Subsequently, replication studies have been conducted in multiple jurisdictions. Other researchers have reported interview, focus group, and survey studies using frameworks different from that used by Brown. Consequently, major reviews of the research into teacher conceptions of assessment (Barnes et al., 2015; Fulmer et al., 2015; Bonner, 2016; Brown, 2016) have made it clear that teachers are aware of and react to the strong tension between using assessment to improve outcomes and processes in classrooms, and assessment being used to hold teachers and schools accountable for outcomes by employers or funders. The more pressure teachers are under to raise assessment scores, the less likely they are to see assessment as a formative process in which they might discover and experiment with different practices (Brown and Harris, 2009). Conversely, where educational policies keep consequences associated with assessment relatively low, such as in New Zealand (Crooks, 2010), the endorsement of assessment as a formative tool to support improvement is much greater.
Thus, because policy frameworks globally are seeking to increase the possibility of using assessment for improved outcomes (Berry, 2011a) and because researchers are willing to use previously reported inventories, there has been increasing research into teacher conceptions of assessment using the New Zealand Teacher Conceptions of Assessment inventory (Brown, 2003). Following the same line of research, this paper exploits a series of replication studies conducted by Brown and his colleagues in a wide variety of contexts internationally. Statistical models fitted to each data set were sometimes similar, but non-identical. The purpose of the current project was to conduct a systematic invariance study to determine if there are any generalizable models across the jurisdictions. This analysis could help us understand how teachers' beliefs about assessment are shaped across different jurisdictions and could also provide some guidelines for those working in international instructional contexts.
Contexts
Before examining the data, it is important to describe the policy contexts in which they were collected. The descriptions are correct for the time period in which the data were collected, but may no longer accurately describe current realities. The contexts are grouped according to whether each jurisdiction is characterized as a relatively low-stakes assessment environment (i.e., New Zealand, Queensland, Cyprus, and Catalonia) or as examination-dominated (i.e., Hong Kong, Egypt, India, and Ecuador).
Low-Stakes Assessment Environments
The following section includes a description of the low-stakes instructional contexts from which data were collected, including New Zealand, Queensland, Cyprus, and Catalonia.
New Zealand
At the time of this study, the New Zealand Ministry of Education required schools to use assessment for improving the learning outcomes of students and to provide guiding information to managers, parents, and governments about the status of student learning (Ministry of Education, 1994). Learning outcomes in all subject areas (e.g., language, mathematics, science, etc.) were defined by eight curriculum levels broken into multiple strands. The national policy required school assessments to indicate student performance relative to the expected curriculum level outcomes for each year of schooling (Ministry of Education, 1993, 2007). A range of nationally standardized but voluntary-use assessment tools was available for teachers to administer as appropriate (Crooks, 2010). Additionally, the Ministry of Education provided professional development programmes that focused on teachers' use of formative assessment for learning.
New Zealand primary school teachers made extensive use of informal and formal assessment methods, primarily to change the way they taught their students and, as a complement, to evaluate their own teaching programmes. In contrast, the secondary school assessment environment, despite being governed by the same policy framework as the primary school system, was dominated by the National Qualifications Framework (NQF). Officially, school qualifications assessment (i.e., National Certificate of Educational Achievement [NCEA] Level 1) begins in the third year of secondary schooling (students nominally aged 15) (Crooks, 2010). However, the importance of the school qualifications has meant considerable washback effects, with much adoption of qualifications assessment systems in the first 2 years of secondary schooling. Furthermore, approximately half of the content in each subject was evaluated through school-based teacher assessments of student performances (i.e., internal assessments). This means that teachers act as assessors as well as instructors throughout the three levels of the NCEA administered in New Zealand secondary schools.
Queensland
At the time of the study, Queensland, similar to New Zealand, had an outcomes-based curriculum framework, limited use of mandatory national testing, and a highly-skilled teaching force. Primary school assessment policies (years 1–7) differed from those of secondary schooling (years 8–12). In general, years 1–10 were an “assessment-free zone” in that there were no common achievement standards or compulsory common assessments. There were formal tests of literacy and numeracy at years 3, 5, and 7 used for system-wide monitoring and reporting to the Federal Government. Because the tests were administered late in the school year (to maximize results for the year) and reporting happened at the start of the following school year, the impact of the tests on schools or teachers was relatively minimal.
Only in the final 2 years of senior secondary school (i.e., years 11 and 12) was there a rigorous system of externally moderated school-based assessments indexed to state-wide standards. These in-school assessments for end-of-schooling certification were largely designed and implemented by secondary school teachers themselves, who also acted as moderators. Most senior secondary-school teachers also taught classes in lower secondary. Therefore, it is highly likely that the role of being a teacher-assessor for the qualifications system influenced teachers' assessment practices and beliefs, even for junior secondary classes in years 8–10.
Cyprus
The Greek-Cypriot education system aims for the gradual introduction and development of children's cognitive, value, psychomotor, and socialization domains (Cyprus Ministry of Education and Culture, 2002). The major function of assessment is formative, operating within the teaching–learning cycle with the goal of improving outcomes for students and teacher practices. Assessments, while aiming to provide valid and reliable measurements, avoid selection or rejection of students through norm-referencing (Papanastasiou and Michaelides, 2019). This is achieved through the qualitative notes and observations teachers make, the use of student self-assessments, and a combination of standardized and teacher-developed tests (Cyprus Ministry of Education and Culture, 2002).
Consequently, the assessment function within the Cypriot education system is relatively low stakes (Michaelides, 2014). In primary school, assessments are mainly classroom-based, with grades recorded by the teachers for each student primarily for internal monitoring of student progress and achievement rather than for formal reporting. The Ministry of Education provides tests for a number of subjects, which teachers use alongside their own assessments. However, in Grades 7–9 (i.e., gymnasium), formal testing increases through teacher-designed tests and school-wide end-of-year exams in core subjects (Solomonidou and Michaelides, 2017). This practice continues into senior secondary Grades 10–12 in both the lyceums and technical schools. While the government does not mandate compulsory, large-scale assessments, senior students voluntarily participate in international exams or national competitions.
Grade 12 culminates in high stakes national examinations that certify high school graduation and generate scores for access to public universities and tertiary institutions in both Cyprus and Greece. Unsurprisingly, these end-of-high-school national exams are favorably evaluated by students and the public in general (Michaelides, 2008, 2014).
Catalonia
Catalonia is an autonomous community within Spain, and the data for this study were collected there. The Catalonian school system has 6 years of primary school and 4 years of secondary school (Ley de Ordenación General del Sistema Educativo, 1990). At the end of 10 years of compulsory schooling, students enroll in basic vocational education, technical vocational education, or a 2-year high school programme in preparation for university or superior vocational education. Assessment policy prioritizes low-stakes, school-based, continuous, formative, and holistic practices. Promotion decisions after Grades 6 and 10 are based on teaching staff consensus concerning students' holistic learning progress, without recourse to external evaluation. Assessments in vocational and technical education emphasize authentic and practical skills. However, at the end of the post-compulsory university-preparation track, a university entrance examination is administered.
High-Stakes Examination Jurisdictions
The following section describes the high-stakes instructional contexts from which data were collected, including Hong Kong, Egypt, India, and Ecuador. Teachers in these contexts face substantial challenges if called upon to implement policies that seek to modify or change the role of summative examinations.
Hong Kong
Since the end of British rule in Hong Kong in 1997, the education system has systematically worked toward ensuring access for all students to 12 years of schooling (achieved in 2012). Like other jurisdictions influenced by the UK Assessment Reform Group, Hong Kong has discussed extensively the use of assessment for learning (Berry, 2011b), while at the same time maintaining a strong examination system and culture (Choi, 1999). Dependence on the validity of formal examinations has arisen from multiple factors, including the British public examinations and a strong sense that without examinations a meritocratic society would not be possible (Cheung, 2008). Unsurprisingly, the “assessment for learning” agenda, despite formal support from government agencies [Curriculum Development Council (CDC), 2001], has struggled to gain a foothold against the hegemony of examinations; a case of soft vs. hard policy (Kennedy et al., 2011). Carless (2011), a strong advocate of assessment for learning, has accepted that summative testing is inevitable but has called for the formative and diagnostic use of summative testing.
Egypt
Education in Egypt is dominated by examinations used to select students for access to further opportunities (Hargreaves, 2001; Gebril and Eid, 2017). At the time of this study, end-of-year exams were the only mechanism in public schools to move students from one educational stage to the next. Higher exam scores result in placement in better schools at the end of primary education (Grade 6), while scores on the Grade 9 final exam place students in either the higher-esteemed general secondary schools or technical/vocational schools. Finally, the end-of-secondary-school exam determines the university and academic programme that students can join. In addition to the benefits individual students experience through high exam scores, schools themselves gain rewards when their students appear in the lists of high-performing students. Thus, high achievement gains respect for and from families and schools.
Comprehensive Assessment (CA) was introduced by the Ministry of Education to balance the overwhelming effect of summative examinations. The CA initiative expected teachers to embed assessment activities within instruction and make assessment an ongoing, learning-oriented process making use of alternative assessment tools. Despite the potential of this approach, the CA policy was stopped because of many challenges, including teachers' lack of assessment literacy and difficult working conditions in schools. Nonetheless, the Egyptian government is still seeking to modify instructional and assessment practices by removing all formal exams in primary schools before Grade 4 (Gebril, 2019).
India
Consistent with other federal systems, education in India is a state-level responsibility. The post-primary school system (NUEPA, 2014) consists of secondary (Classes 9–10) and upper secondary (Classes 11–12) schools. Teachers are largely highly qualified, with many holding postgraduate or higher qualifications. However, classes are large, with an average of 50 pupils per room. Unfortunately, enrolment beyond elementary schooling is not universal, with drop-out more pronounced among girls. Secondary school qualifications are generally administered by various central boards (e.g., Central Board of Secondary Education, CBSE; Indian Certificate of Secondary Education, ICSE; Senior Secondary Certificate, SSC). Despite their unique flavors, central boards generally have similar evaluative processes making use of high-stakes summative examinations at the end of Class 10 (end of secondary) and Class 12 (end of upper secondary).
Efforts to diversify student evaluation beyond examination performance have resulted in central boards developing assessment schemes for determining children's all-round development. For example, Continuous and Comprehensive Evaluation (CCE), developed by the CBSE, is a school-based assessment scheme that uses frequent and periodic assessments to supplement end-of-year final examinations (Ashita, 2013). Thus, although there is school-based assessment in Indian schools, it is a form of summative assessment that combines coursework, mid-course tests, and final examinations to create an overall grade.
Ecuador
Ecuador is a multilingual and multicultural South American country with more than 16,000,000 inhabitants. Most people (>60%) live in urban areas. Recent Ecuadorian legislation (2011) provides schooling up to age 15, made up of 6 years of primary and 4 years of secondary schooling. The final 2 years are spent in either a general or a vocational senior high school (OECD, 2016). In 2006, an immense renewal project was launched by the government that created many new schools and provided a high level of technological resources.
Schooling is generally characterized by strongly traditional examination and pedagogical practices. Teachers are regarded as having strong authority over the classroom. Promotion at the end of the year depends on gaining at least 70% on the end-of-year examination.
Methods
Participants
Practicing teachers were surveyed in all jurisdictions except in the Catalonia study. Table 1 provides descriptive information concerning when data were collected, sample size, scales collected simultaneously with the TCoA, the sampling mechanism, the subjects taught, and the representativeness of the sample relative to the population of the jurisdiction. Data were collected between 2002 and 2017, and in most studies teachers responded to additional scales. The New Zealand and Queensland sampling was representative via the school, not the teacher. Sampling otherwise was by convenience but was national in Cyprus, India, and Ecuador.
Testing of sample equivalence to the population was rare (New Zealand, Cyprus, and India) and was generally limited to teacher sex. Only five studies identified the subject taught by the teacher. Unsurprisingly, languages (English, Chinese, and Arabic), mathematics, and science accounted for the largest proportions of subjects taught.
Instrument
The Teacher Conceptions of Assessment inventory (TCoA-III) was developed iteratively in New Zealand with primary school teachers to investigate how they understand and use assessments (Brown, 2003). The TCoA inventory is a self-report survey that allows teachers to indicate their level of agreement with statements related to the four main purposes of assessment. The inventory allows teachers to indicate whether and how much they agree that assessment is used to improve teaching and learning, that assessment evaluates students, that assessment evaluates schools and teachers, or that assessment is irrelevant. The 50-item New Zealand model consisted of nine factors, seven of which were subordinate to improvement and irrelevance (Figure 1). The superordinate improvement and irrelevance factors were correlated with the two accountability factors (i.e., school and student). The structure and items of the full TCoA-III are available in a data codebook and dictionary (Brown, 2017). An abridged version of 27 items (TCoA-IIIA), which has the same structure as the full version, consists of three items per factor and was validated with a large sample of Queensland primary teachers (Brown, 2006). The items for the TCoA-IIIA are listed in Table 2.
Figure 1. TCoA-III Model with New Zealand Primary School Results. (Figure 1 from Brown (2004b) reprinted by permission of Taylor and Francis Ltd, http://www.tandfonline.com).
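To make the hierarchical structure concrete, the sketch below expresses the abridged nine-factor model in the lavaan-style model syntax used by several SEM packages (e.g., lavaan in R, semopy in Python). The factor labels follow those mentioned in the text where available; the "Ignore" label and the item codes (imp1, irr1, etc.) are hypothetical placeholders rather than the actual item labels from Table 2, and the snippet only assembles and prints the specification string rather than fitting it to data.

```python
# Illustrative only: lavaan-style specification of the abridged TCoA-IIIA structure.
# Item codes (e.g., "imp1") are hypothetical placeholders for the 27 items in Table 2.

first_order = {
    # Four first-order factors subordinate to Improvement
    "Describe":        ["imp1", "imp2", "imp3"],
    "StudentLearning": ["imp4", "imp5", "imp6"],
    "Teaching":        ["imp7", "imp8", "imp9"],
    "Valid":           ["imp10", "imp11", "imp12"],
    # Three first-order factors subordinate to Irrelevance
    "Bad":             ["irr1", "irr2", "irr3"],
    "Ignore":          ["irr4", "irr5", "irr6"],
    "Inaccurate":      ["irr7", "irr8", "irr9"],
    # Two stand-alone accountability factors
    "SchoolAccountability":  ["sch1", "sch2", "sch3"],
    "StudentAccountability": ["stu1", "stu2", "stu3"],
}

second_order = {
    "Improvement": ["Describe", "StudentLearning", "Teaching", "Valid"],
    "Irrelevance": ["Bad", "Ignore", "Inaccurate"],
}

def build_spec() -> str:
    """Assemble a lavaan-style measurement-model string for the TCoA-IIIA sketch."""
    lines = [f"{factor} =~ {' + '.join(items)}" for factor, items in first_order.items()]
    lines += [f"{macro} =~ {' + '.join(subs)}" for macro, subs in second_order.items()]
    # The two superordinate factors covary with the two accountability factors.
    lines += [
        "Improvement ~~ Irrelevance + SchoolAccountability + StudentAccountability",
        "Irrelevance ~~ SchoolAccountability + StudentAccountability",
        "SchoolAccountability ~~ StudentAccountability",
    ]
    return "\n".join(lines)

if __name__ == "__main__":
    print(build_spec())
```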
In all studies using the TCoA reported in this paper, participants indicated their level of agreement or disagreement on a bipolar ordinal rating scale. The Hong Kong study used a four-point balanced rating scale in which 1 = strongly agree, 2 = agree, 3 = disagree, and 4 = strongly disagree. In Cyprus, a balanced six-point agreement scale was used, coded: 1 = completely disagree, 2 = disagree to a large degree, 3 = disagree somewhat, 4 = agree somewhat, 5 = agree to a large degree, and 6 = completely agree. In all other jurisdictions (i.e., New Zealand, Queensland, Catalonia, Egypt, India, and Ecuador), a six-point, positively packed, agreement rating scale was used. This scale has two negative options (i.e., strongly disagree, mostly disagree) and four positive options (i.e., slightly agree, moderately agree, mostly agree, and strongly agree). Positive packing has been shown to increase variance when it is likely that participants are positively biased toward a phenomenon (Lam and Klockars, 1982; Brown, 2004a). Such a bias is likely when teachers are asked to evaluate the assessment policies and practices of the jurisdiction in which they work. Successful publication of the inventory (Brown, 2004b) led to a number of replication studies, including New Zealand secondary teachers (Brown, 2011), Queensland primary and secondary teachers (Brown et al., 2011b), Hong Kong (Brown et al., 2009), Cyprus (Brown and Michaelides, 2011), Egypt (Gebril and Brown, 2014), Catalonia (Brown and Remesal, 2012), Indian secondary and senior secondary teachers (Brown et al., 2015), and Ecuador (Brown and Remesal, 2017). Hence, data from 11 different sets of teachers from eight different jurisdictions are available for this study.
Data Models
When the TCoA-III was administered in new jurisdictions, different configural models arose out of the data. In some cases, the differences were small, involving the addition of a few paths or the trimming of items. For other jurisdictions, substantial changes were required to generate a valid model. These best-fit models for each jurisdiction are briefly described here. Note that all studies were conducted and published individually with ethical clearances obtained by each study's author team, usually by the author resident in the jurisdiction. The analyses reported in this paper are all based on secondary analysis of anonymized data; hence, no further ethical clearance was required.
Queensland
The Queensland primary teacher model was identical to the New Zealand TCoA-IIIA model (Brown, 2006). However, to include the secondary teachers, two additional paths were required to obtain satisfactory fit (Brown et al., 2011b): one from Student Accountability to Describe and a second from Student Learning to Inaccurate. Otherwise, the hierarchical structure and the item-to-factor paths were all identical to the New Zealand model.
Hong Kong
The New Zealand model did not fit, but by deleting two of the first-order factors and having the items load directly onto the second-order factor an acceptable model was recovered (Brown et al., 2009). Under Improvement the Describe factor and under Irrelevance the Inaccurate factor were eliminated. Nonetheless, the model was otherwise equivalent with the same items, same first-order factors, and the same hierarchical structure.
Cyprus
The New Zealand model was not admissible due to large negative error variances, suggesting that too many factors had been proposed (Brown and Michaelides, 2011). Instead, by deleting three items, a hierarchical inter-correlated two-factor model was constructed. The two factors, labeled Positive and Negative, had three and two first-order factors, respectively. Overall, 20 of the 24 items aggregated into the same logically consistent factors as proposed in the New Zealand model. The Cyprus model was shown to be strongly equivalent with both groups of New Zealand teachers.
Catalonia
This study compared undergraduate students learning about educational assessment in Catalonia and New Zealand (Brown and Remesal, 2012). The New Zealand model was inadmissible due to large negative error variances. A model, based on an exploratory factor analysis of the New Zealand data, had all 27 items in five inter-correlated factors, of which three had hierarchical structure. One of the four factors of improvement (Valid) moved to School Accountability, while the Bad factor of Irrelevance moved out from under it to become one of the five main inter-correlated factors. This model had only configural invariance between Catalonia and New Zealand.
Egypt
The Egypt study involved a large number of pre-service and in-service teachers split 60:40 in favor of pre-service teachers (Gebril and Brown, 2014). Again, the New Zealand model was inadmissible because of large negative error variance and a correlation r > 1.00 between two of the factors; this indicates too many factors had been specified. A hierarchical model of three inter-correlated factors retained all 27 items according to the original factors. The Student Accountability factor moved under the Improvement factor as a subordinate first-order factor instead of being a stand-alone factor. The items for Describe loaded directly onto Improvement. Hence, this model retained all items and all factors as per the New Zealand model, except for the changed location of one factor. The model had strong equivalence between pre-service and in-service teachers.
India
The India study involved a large sample of high school and senior high school teachers working in private schools (Brown et al., 2015). The New Zealand model was inadmissible because of large negative error variances. The study had used a set of items developed as part of the Chinese Teacher Conceptions of Assessment inventory (Brown et al., 2011a). Three of those items, along with two new items, created a new factor. From the 27 TCoA-IIIA items, three inter-correlated factors were formed (i.e., 8 items from Improvement, 7 items from Irrelevance, and 6 items from School Accountability, Student Accountability, and Improvement). Thus, 21 of the original 27 items were retained, but only the Improvement and Irrelevance items were retained within their original macro-constructs.
Ecuador
The Ecuador study was conducted in Spanish and attempted to validate the TCoA-IIIA with Remesal's Qualitative Model of Conceptions of Assessment (Brown and Remesal, 2017). The hierarchical structure of the TCoA was inadmissible due to negative error variances and a not positive definite covariance matrix. After removing three of the first-order factors beneath Improvement (retaining only Teaching), merging the two accountability factors, and splitting the Irrelevance factor into Irrelevance and Caution, an acceptably fitting model was determined, consisting of 25 items organized in four factors with one subordinate factor. The Teaching factor was predicted by both Improvement and Caution, and the Irrelevance factor pointed to the three Student Accountability items, one item in Teaching, and one item in Improvement.
Analyses
Confirmatory factor analysis (CFA) is a sophisticated causal-correlational technique used to evaluate the quality of a theoretically informed model relative to a data set of responses. CFA explicitly specifies in advance the proposed paths among factors and items, normally limiting each item to only one factor and setting its loadings on all other factors to zero (Hoyle, 1995; Klem, 2000; Byrne, 2001). Unlike correlational or regression analyses, CFA determines the estimates of all parameters (i.e., regressions from factors to items, the intercept of items at the factor, the covariance of factors, and the unexplained variances or residuals in the model) simultaneously, and provides statistical tests that reveal how close the model is to the data set (Klem, 2000). Although the response options are ordinal, all models were estimated with maximum likelihood because this estimator is appropriate for scales with five or more options (Finney and DiStefano, 2006; Rhemtulla et al., 2012).
The determination as to whether a statistical model accurately reflects the characteristics of the data requires inspection of multiple fit indices (Hu and Bentler, 1999; Fan and Sivo, 2005). Unfortunately, not all fit indices are stable under different model conditions (e.g., the χ2 test is very sensitive in large models, the CFI rejects complex models, the RMSEA rejects simple models; Fan and Sivo, 2007). Two levels of fit are generally discussed; “acceptable” fit can be imputed if RMSEA is < 0.08, SRMR is ≈ 0.06, gamma hat and CFI are > 0.90, and χ2/df is < 3.80; while, “good” fit can be imputed when RMSEA is < 0.05, SRMR < 0.06, gamma hat and CFI are > 0.95, and χ2/df is < 3.00 (Cheung and Rensvold, 2002; Hoyle and Duvall, 2004; Marsh et al., 2004; Fan and Sivo, 2007). If the model fits the data, then the model does not need to be rejected as an accurate simplification of the data.
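As an illustration of how these cutoffs operate together, the following sketch encodes the two levels of fit as a simple decision function. It is a minimal sketch of the decision rules quoted above, assuming the indices have already been obtained from SEM software (e.g., Amos); the example values at the end are hypothetical, not taken from the study.

```python
from dataclasses import dataclass

@dataclass
class FitIndices:
    chi2_df: float    # normed chi-square (chi-square / degrees of freedom)
    rmsea: float      # root mean square error of approximation
    srmr: float       # standardized root mean square residual
    cfi: float        # comparative fit index
    gamma_hat: float  # gamma hat

def classify_fit(fit: FitIndices) -> str:
    """Classify model-data fit using the cutoffs cited in the text."""
    good = (fit.rmsea < 0.05 and fit.srmr < 0.06 and
            fit.cfi > 0.95 and fit.gamma_hat > 0.95 and fit.chi2_df < 3.00)
    acceptable = (fit.rmsea < 0.08 and fit.srmr <= 0.06 and
                  fit.cfi > 0.90 and fit.gamma_hat > 0.90 and fit.chi2_df < 3.80)
    if good:
        return "good"
    if acceptable:
        return "acceptable"
    return "not acceptable"

# Hypothetical example values (not from the study):
example = FitIndices(chi2_df=2.8, rmsea=0.046, srmr=0.05, cfi=0.96, gamma_hat=0.96)
print(classify_fit(example))  # -> "good"
```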
Since each study had developed a variation on the original TCoA-IIIA statistical model, it was decided that each model should be tested in confirmatory factor analysis for equivalence. The invariance of a model across subgroups can be tested using a multiple-group CFA (MGCFA) approach with nested model comparisons (Vandenberg and Lance, 2000). Equivalence in a model between groups is accepted if the difference in model parameters between groups is so small that the difference is attributable to chance (Hoyle and Smith, 1994; Wu et al., 2007). If the model is statistically invariant between groups, then it can be argued that any differences in factor scores are attributable to characteristics of the groups rather than to any deficiencies of the statistical model or inventory. Furthermore, invariance indicates that the two groups are drawn from equivalent populations (Wu et al., 2007), making comparisons appropriate. The greater the difference in context for each population, the less likelihood participants will respond in an equivalent fashion, suggesting that context changes responding to and meaning of items across jurisdictions.
In order to make mean score comparisons between groups, a series of nested tests is conducted. First, the pattern of fixed and free factor loadings among and between factors and items has to be the same (i.e., configural invariance) for each group (Vandenberg and Lance, 2000; Cheung and Rensvold, 2002). Second, the regression weights from factors to items should vary only by chance; equivalent regression weights (i.e., metric invariance) are indicated if the change in CFI compared to the previous model is small (i.e., ΔCFI ≤ 0.01) (Cheung and Rensvold, 2002). Third, the regression intercepts of items upon factors should vary only by chance; again, equivalent intercepts (i.e., scalar invariance) are indicated if ΔCFI ≤ 0.01. Equivalence analysis stops if any subsequent test fails or if the model is shown to be improper for either group. Strictly, configural, metric, and scalar invariance are all required to indicate invariance of measurement and permit group comparisons (Vandenberg and Lance, 2000).
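The nested decision sequence can be summarized as a short routine. This is a sketch of the decision logic only, assuming the analyst supplies the CFI of each nested multi-group model from the SEM output; it does not estimate any models itself.

```python
from typing import Optional

def invariance_level(cfi_configural: Optional[float],
                     cfi_metric: Optional[float] = None,
                     cfi_scalar: Optional[float] = None,
                     delta: float = 0.01) -> str:
    """Return the highest level of measurement invariance supported.

    Each argument is the CFI of the corresponding nested multi-group model;
    pass None if that model was inadmissible or not estimated. A step is
    accepted when the drop in CFI from the previous step is <= delta
    (Cheung and Rensvold, 2002).
    """
    if cfi_configural is None:
        return "none (configural model inadmissible or not estimated)"
    if cfi_metric is None or (cfi_configural - cfi_metric) > delta:
        return "configural only"
    if cfi_scalar is None or (cfi_metric - cfi_scalar) > delta:
        return "metric (weak) invariance"
    return "scalar (strong) invariance: latent mean comparisons permitted"

# Example: the India model across the 11 data sets had configural CFI = 0.82
# and a drop of 0.031 when loadings were constrained -> "configural only".
print(invariance_level(cfi_configural=0.82, cfi_metric=0.82 - 0.031))
```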
When negative error variances and not positive definite covariance matrices are discovered, the model is not admissible for the group concerned. However, negative error variances can occur through chance processes; these can be corrected to a small positive value (e.g., 0.005) if twice the standard error exceeds the magnitude of the negative estimate, indicating that the 95% confidence interval crosses the zero line into positive territory (Chen et al., 2001).
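A minimal sketch of that correction rule is given below, under the assumption that the negative variance estimate and its standard error have been read from the SEM output; the 0.005 replacement value follows the convention cited above (Chen et al., 2001).

```python
def correct_negative_variance(estimate: float, std_error: float,
                              replacement: float = 0.005) -> float:
    """Replace a chance-level negative error variance with a small positive value.

    The estimate is treated as fixable when twice its standard error exceeds its
    magnitude, i.e., the 95% confidence interval crosses zero into positive
    territory (Chen et al., 2001). Otherwise the negative estimate is returned
    unchanged and the model remains inadmissible.
    """
    if estimate < 0 and 2 * std_error > abs(estimate):
        return replacement
    return estimate

# Hypothetical values (not from the study):
print(correct_negative_variance(estimate=-0.02, std_error=0.03))  # fixable -> 0.005
print(correct_negative_variance(estimate=-0.40, std_error=0.05))  # not fixable -> -0.40
```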
Models that are not admissible in one or more groups cannot be used to compare groups. Likewise, those which do not meet conventional standards of fit cannot be used to compare groups. Finally, those that are not scalar equivalent cannot be used to compare groups. Each model reported here was tested in an 11-group confirmatory factor analysis seeking to establish its degree of admissibility and equivalence. Further, because jurisdictions can be classified as low-stakes or high-stakes exam societies, equivalence was also tested within each group of countries.
Results
Analyses were conducted with eight different jurisdictions, 11 data sets (two teacher levels were present in three jurisdictions), and 10 different statistical models. Three of the models were from New Zealand (i.e., hierarchical nine inter-correlated factors, nine inter-correlated factors, and four inter-correlated factors), while the seven remaining models came from the seven other jurisdictions.
Models Across Jurisdictions
Each of the 10 models was tested with MGCFA on the 11 data sets (Table 3). Out of the 110 possible results, there were 20 instances of a not positive definite covariance matrix, 18 cases of factor inter-correlations >1.00, 30 negative error variances that could be fixed because the 95% confidence interval crossed zero, 12 cases of non-fixable negative error variances, and one unspecified inadmissible solution. All but one model had at least one source of inadmissibility in at least one group, meaning that those models could not be deemed usable across all data sets. Ten cells were characterized by only fixable negative error variances; however, at least one other jurisdiction had a non-fixable error for that model, meaning that even if the error variance were fixed the model would still not be admissible for at least one group. Interestingly, one model was unidentifiable (i.e., the four-factor NZ model) and only one model was admissible across all data sets (i.e., the India model).
The India model had the fewest factors and items, which may have contributed to its admissibility across jurisdictions. In an 11-group MGCFA, this model had acceptable levels of fit (χ2/df = 4.32, p = 0.04; RMSEA (90% CI) = 0.023 (0.023–0.024); CFI = 0.82; gamma hat = 0.91; SRMR = 0.058; AIC = 9829.79). This indicates that there was configural invariance; however, constraining regression weights to equivalence produced ΔCFI = 0.031. Consequently, this model is not invariant across the groups despite its admissibility and configural equivalence. Thus, none of these models works across all data sets, clearly supporting the idea that context makes the statistical models different and non-comparable.
Models Across Jurisdictions Within Low- or High-Stakes Conditions
Each of the four jurisdictions within low and high-stakes conditions was tested with MGCFA on the statistical models originating within those jurisdictions. This means that the four low-stakes environments were tested with three New Zealand models and three additional models arising from those three other jurisdictions. As expected, restricting the number of datasets being compared did not change the problems identified with models per jurisdiction indicated in Table 3.
Extending this comparative logic, it was decided to forego jurisdictional information and merge all the data according to whether the country was classified as low or high-stakes. This produced a two-group comparison for low- or high-stakes conditions (Table 4). In this circumstance, after eliminating between country differences, five different models were admissible. However, those models had poor (i.e., New Zealand and Hong Kong) to acceptable (i.e., Catalonia, Ecuador, and India) levels of fit. Inspection of fit indices for these five models indicates that the India model had the best fit by large margins (i.e., ΔAIC = 2333.07; Burnham and Anderson, 2004). Invariance testing of the India model with the low vs. high-stakes groups failed to demonstrate metric equivalence (i.e., ΔCFI = 0.011) compared to the unconstrained configurally equivalent model.
Table 4. Model admissibility by overall stakes with unconstrained fit statistics for admissible models.
Discussion
Data from a range of pre-service and in-service teachers have been obtained using the TCoA-IIIA inventory. This analysis has made use of data from eight different educational jurisdictions, with 11 samples of teachers (i.e., primary, secondary, and senior secondary), and with 10 different statistical models. It is worth noting that, except for a few jurisdictions, the samples were largely obtained through convenience processes, reducing the generalizability of the results even to the jurisdiction from which the data were obtained. Thus, the published results may not be representative, and this status may contribute to the inability to derive a universal model.
All models, except one, were inadmissible for a variety of reasons (i.e., covariance matrices that were not positive definite and inter-factor correlations >1.00). These problems most likely arise as a consequence of having too many factors specified in the statistical model. The India model, consisting of just three factors and 21 items, was admissible across all jurisdictions, with acceptable to good levels of fit. However, this model was only configurally invariant across the 11 data sets. When multiple groups are considered, measurement invariance usually fails (Marsh et al., 2018). Thus, the comparability of a model across such diverse contexts and populations is understandably unlikely.
It is also worth noting that the standards used in this study to evaluate equivalence in multi-group comparisons rely on conventional standards and approaches (Byrne, 2004). More recently, the use of permutation tests has been proposed as a superior method for testing metric and scalar invariance, because permutations can control Type I error rates better than the conventional approaches used in this study (Jorgensen et al., 2018; Kite et al., 2018). Combined with determination of effect sizes for metric and scalar equivalence tests (dMACS) (Nye and Drasgow, 2011), these are methodological approaches that may lead future analyses to identification of more universal results.
It is noteworthy that ignoring the specifics of individual jurisdictions by aggregating responses according to the assessment policy framework (i.e., low-stakes vs. high-stakes) led to five admissible models. Again, the India model was the best fitting, with better fit than in the 11 data set analysis. However, even in this situation, no metric or weak equivalence between the two groups was achieved. This suggests that the notion of high vs. low-stakes may be too coarse a framework for identifying patterns in teacher conceptions of assessment. It is possible that teachers' conceptions of teaching (Pratt, 1992), learning (Tait et al., 1998), or curriculum (Cheung, 2000) might be more effective in identifying commonalities in how teachers conceive of assessment across nations, levels, or regions. Clustering New Zealand primary school teachers on their mean scores for assessment, teaching, learning, curriculum, and self-efficacy revealed five clusters with very different patterns of scores (Brown, 2008). This approach focuses much more at the level of the individual than the system and may be productive in understanding conceptions of assessment.
It is possible to see in the various studies reported here that many of the items in the TCoA-IIIA do inter-correlate according to the original factor model. This suggests that the items have strong within-factor coherence. The India model aggregates 21 of the 27 TCoA-IIIA items into three major categories, two of which (i.e., Improvement and Irrelevance) are made up of items drawn only from the same major factors described in the original New Zealand TCoA-IIIA analyses. This set of 15 items gives some grounds for suggesting that there is a core set of items which are potentially universally generalizable. There is also some suggestion that an accountability use of assessment factor could be constructed from the student and school accountability items. Future cross-cultural research could plausibly rely on those two scales in efforts to investigate how teachers conceive of assessment.
Thus, it seems the more complex a model is, the less likely it is to generalize across contexts. The India model has just three inter-correlated factors and only 21 items, while most of the other models are hierarchical, creating complex inter-connections among factors and thus indicating quite nuanced and complex ideas among teachers. The briefer India model also sacrifices some of the richness available in the larger instrument set. It seems likely that even small differences in environments can produce non-invariance in statistical models. For example, the New Zealand primary teacher model replicated with New Zealand secondary teachers (Brown, 2011) and Queensland primary teachers (Brown, 2006), but not with Queensland secondary teachers (Brown et al., 2011b). Thus, the less informative and subtle a model is, the more likely it is to replicate. However, the researcher is then likely to lose useful and important information within the new context.
The conclusion that has to be drawn here is that the original statistical model of the TCoA-IIIA inventory, developed in New Zealand and validated with Queensland primary teachers, is not universal or generalizable. This underscores the importance of systematically evaluating research inventories when they are adopted as research tools in new contexts; a point made clear in a New Zealand-Louisiana comparative study (Brown et al., 2017). Hence, in most of the studies reported here, different models were necessary to capture the impact of the different ecology upon teacher conceptions of assessment, and all but one of those models was non-invariant in other contexts. The various revised models of the TCoA-IIIA published in the studies included here show that the make-up and inter-relationships of the proposed scales are highly sensitive to ecological priorities and practices in the specific environment and even within the specific teacher group. This lack of equivalence appears across jurisdictions with different contextual frameworks. Moreover, even within societies, teachers at different levels of schooling vary with respect to the structure of their assessment conceptions.
Hence, comparative research with the TCoA-IIIA inventory makes clear that how teachers conceive of assessment depends on the specificity of contexts in which teachers work. Nonetheless, it would appear, consistent with general reviews of this field (Barnes et al., 2015; Bonner, 2016), that the TCoA-IIIA items related to the improvement and irrelevance functions of assessment, especially captured in the India model, have substantial power as research tools in a wide variety of educational contexts.
Data Availability
The datasets generated for this study are available on request to the corresponding author. NZ primary data are available online at: https://doi.org/10.17608/k6.auckland.4264520.v1; NZ secondary data are available online at: https://doi.org/10.17608/k6.auckland.4284509.v1.
Ethics Statement
The original research into the TCoA was conducted with approval from the University of Auckland Human Participants Ethics Committee 2001/1596. The current paper uses secondary analysis of data made available to the authors.
Author Contributions
GB devised TCoA, conducted analyses, and drafted the manuscript. AG and MM contributed data, verified analyses, reviewed and approved the manuscript.
Funding
Funding for APC fees received from University of Umeå Library.
Conflict of Interest Statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Acknowledgments
The assistance of Joohyun Park, doctoral student at The University of Auckland, in running multiple iterations of the data analysis and preparing data tables is acknowledged.
References
Ashita, R. (2013). Beyond testing and grading: using assessment to improve teaching-learning. Res. J. Educ. Sci. 1, 2–7.
Barnes, N., Fives, H., and Dacey, C. M. (2015). “Teachers' beliefs about assessment,” in International Handbook of Research on Teacher Beliefs, eds H. Fives and M. Gregoire Gill (New York, NY: Routledge), 284–300.
Berry, R. (2011a). “Assessment reforms around the world,” in Assessment Reform in Education: Policy and Practice, eds R. Berry and B. Adamson (Dordrecht, NL: Springer), 89–102.
Berry, R. (2011b). Assessment trends in Hong Kong: seeking to establish formative assessment in an examination culture. Assess. Educ. Policy Princ. Pract. 18, 199–211. doi: 10.1080/0969594X.2010.527701
Bonner, S. M. (2016). “Teachers' perceptions about assessment: competing narratives,” in Handbook of Human and Social Conditions in Assessment, eds G. T. L. Brown and L. R. Harris (New York, NY: Routledge), 21–39.
Brown, G. T. L., and Remesal, A. (2012). Prospective teachers' conceptions of assessment: a cross-cultural comparison. Span. J. Psychol. 15, 75–89. doi: 10.5209/rev_SJOP.2012.v15.n1.37286
Brown, G. T. L. (2002). Teachers' Conceptions of Assessment. Unpublished doctoral dissertation, University of Auckland, Auckland. Available online at: http://researchspace.auckland.ac.nz/handle/2292/63
Brown, G. T. L. (2003). Teachers' Conceptions of Assessment Inventory-Abridged (TCoA Version 3A) [Measurement Instrument]. figshare. Auckland: University of Auckland. doi: 10.17608/k6.auckland.4284506.v1
Brown, G. T. L. (2004a). Measuring attitude with positively packed self-report ratings: comparison of agreement and frequency scales. Psychol. Rep. 94, 1015–1024. doi: 10.2466/pr0.94.3.1015-1024
Brown, G. T. L. (2004b). Teachers' conceptions of assessment: implications for policy and professional development. Assess. Educ. Princ. Policy Pract. 11, 301–318. doi: 10.1080/0969594042000304609
Brown, G. T. L. (2006). Teachers' conceptions of assessment: validation of an abridged instrument. Psychol. Rep. 99, 166–170. doi: 10.2466/pr0.99.1.166-170
Brown, G. T. L. (2008). Conceptions of Assessment: Understanding What Assessment Means to Teachers and Students. New York, NY: Nova Science Publishers.
Brown, G. T. L. (2011). Teachers' conceptions of assessment: comparing primary and secondary teachers in New Zealand. Assess. Matters 3, 45–70.
Brown, G. T. L. (2016). “Improvement and accountability functions of assessment: impact on teachers' thinking and action,” in Encyclopedia of Educational Philosophy and Theory, ed A. M. Peters (Singapore: Springer), 1–6.
Brown, G. T. L. (2017). Codebook/Data Dictionary Teacher Conceptions of Assessment. figshare. Auckland: The University of Auckland. doi: 10.17608/k6.auckland.4284512.v3
Brown, G. T. L., Chaudhry, H., and Dhamija, R. (2015). The impact of an assessment policy upon teachers' self-reported assessment beliefs and practices: a quasi-experimental study of Indian teachers in private schools. Int. J. Educ. Res. 71, 50–64. doi: 10.1016/j.ijer.2015.03.001
Brown, G. T. L., and Harris, L. R. (2009). Unintended consequences of using tests to improve learning: how improvement-oriented resources engender heightened conceptions of assessment as school accountability. J. Multidiscip. Eval. 6, 68–91.
Brown, G. T. L., Harris, L. R., O'Quin, C., and Lane, K. E. (2017). Using multi-group confirmatory factor analysis to evaluate cross-cultural research: identifying and understanding non-invariance. Int. J. Res. Method Educ. 40, 66–90. doi: 10.1080/1743727X.2015.1070823
Brown, G. T. L., Hui, S. K. F., Yu, F. W. M., and Kennedy, K. J. (2011a). Teachers' conceptions of assessment in Chinese contexts: a tripartite model of accountability, improvement, and irrelevance. Int. J. Educ. Res. 50, 307–320. doi: 10.1016/j.ijer.2011.10.003
Brown, G. T. L., Kennedy, K. J., Fok, P. K., Chan, J. K. S., and Yu, W. M. (2009). Assessment for improvement: understanding Hong Kong teachers' conceptions and practices of assessment. Assess. Educ. Princ. Policy Pract. 16, 347–363. doi: 10.1080/09695940903319737
Brown, G. T. L., Lake, R., and Matters, G. (2011b). Queensland teachers' conceptions of assessment: the impact of policy priorities on teacher attitudes. Teach. Teach. Educ. 27, 210–220. doi: 10.1016/j.tate.2010.08.003
Brown, G. T. L., and Michaelides, M. (2011). Ecological rationality in teachers' conceptions of assessment across samples from Cyprus and New Zealand. Eur. J. Psychol. Educ. 26, 319–337. doi: 10.1007/s10212-010-0052-3
Brown, G. T. L., and Remesal, A. (2017). Teachers' conceptions of assessment: comparing two inventories with Ecuadorian teachers. Stud. Educ. Eval. 55, 68–74. doi: 10.1016/j.stueduc.2017.07.003
Burnham, K. P., and Anderson, D. R. (2004). Multimodel inference: understanding AIC and BIC in model selection. Sociol. Methods Res. 33, 261–304. doi: 10.1177/0049124104268644
Byrne, B. M. (2001). Structural Equation Modeling With AMOS: Basic Concepts, Applications, and Programming. Mahwah, NJ: LEA.
Byrne, B. M. (2004). Testing for multigroup invariance Using AMOS graphics: a road less traveled. Struct. Equat. Model. 11, 272–300. doi: 10.1207/s15328007sem1102_8
Carless, D. (2011). From Testing to Productive Student Learning: Implementing Formative Assessment in Confucian-Heritage Settings. London: Routledge.
Chen, F., Bollen, K. A., Paxton, P., Curran, P. J., and Kirby, J. B. (2001). Improper solutions in structural equation models: causes, consequences, and strategies. Sociol. Methods Res. 29, 468–508. doi: 10.1177/0049124101029004003
Cheung, D. (2000). Measuring teachers' meta-orientations to curriculum: application of hierarchical confirmatory analysis. J. Exp. Educ. 68, 149–165. doi: 10.1080/00220970009598500
Cheung, G. W., and Rensvold, R. B. (2002). Evaluating goodness-of-fit indexes for testing measurement invariance. Struct. Equat. Model. 9, 233–255. doi: 10.1207/S15328007SEM0902_5
Cheung, T. K.-Y. (2008). An assessment blueprint in curriculum reform. J. Qual. Sch. Educ. 5, 23–37.
Choi, C.-C. (1999). Public examinations in Hong Kong. Assess. Educ. Princ. Policy Pract. 6, 405–417.
Crooks, T. J. (2010). “Classroom assessment in policy context (New Zealand),” in The International Encyclopedia of Education, 3rd Edn, eds B. McGraw, P. Peterson, and E. L. Baker (Oxford: Elsevier), 443–448.
Curriculum Development Council (CDC) (2001). Learning to Learn: Lifelong Learning and Whole-Person Development. Hong Kong: CDC.
Cyprus Ministry of Education and Culture (2002). Elementary Education Curriculum Programs [Analytika Programmata Dimotikis ekpedefsis]. Nicosia: Ministry of Education and Culture.
Fan, X., and Sivo, S. A. (2005). Sensitivity of fit indexes to misspecified structural or measurement model components: rationale of two-index strategy revisited. Struct. Equat. Model. 12, 343–367. doi: 10.1207/s15328007sem1203_1
Fan, X., and Sivo, S. A. (2007). Sensitivity of fit indices to model misspecification and model types. Multivar. Behav. Res. 42, 509–529. doi: 10.1080/00273170701382864
Finney, S. J., and DiStefano, C. (2006). “Non-normal and categorical data in structural equation modeling,” in Structural Equation Modeling: A Second Course, eds G. R. Hancock and R. D. Mueller (Greenwich, CT: Information Age Publishing), 269–314.
Fives, H., and Buehl, M. M. (2012). “Spring cleaning for the “messy” construct of teachers' beliefs: What are they? Which have been examined? What can they tell us?” in APA Educational Psychology Handbook: Individual Differences and Cultural and Contextual Factors, Vol. 2, eds K. R. Harris, S. Graham, and T. Urdan (Washington, DC: APA), 471–499.
Fulmer, G. W., Lee, I. C. H., and Tan, K. H. K. (2015). Multi-level model of contextual factors and teachers' assessment practices: an integrative review of research. Assess. Educ. Princ. Policy Pract. 22, 475–494. doi: 10.1080/0969594X.2015.1017445
Gebril, A. (2019). “Assessment in primary schools in Egypt,” in Bloomsbury Education and Childhood Studies, ed M. Toprak (London: Bloomsbury).
Gebril, A., and Brown, G. T. L. (2014). The effect of high-stakes examination systems on teacher beliefs: Egyptian teachers' conceptions of assessment. Assess. Educ. Princ. Policy Pract. 21, 16–33. doi: 10.1080/0969594X.2013.831030
Gebril, A., and Eid, M. (2017). Test preparation beliefs and practices: a teacher's perspective. Lang. Assess. Q. 14, 360–379. doi: 10.1080/15434303.2017.1353607
Hargreaves, E. (2001). Assessment in Egypt. Assess. Educ. Princ. Policy Pract. 8, 247–260. doi: 10.1080/09695940124261
Hoyle, R. H. (1995). “The structural equation modeling approach: basic concepts and fundamental issues,” in Structural Equation Modeling: Concepts, Issues, and Applications, ed R. H. Hoyle (Thousand Oaks, CA: Sage), 1–15.
Hoyle, R. H., and Duvall, J. L. (2004). “Determining the number of factors in exploratory and confirmatory factor analysis,” in The SAGE Handbook of Quantitative Methodology for Social Sciences, ed D. Kaplan (Thousand Oaks, CA: Sage), 301–315.
Hoyle, R. H., and Smith, G. T. (1994). Formulating clinical research hypotheses as structural equation models - a conceptual overview. J. Consult. Clin. Psychol. 62, 429–440. doi: 10.1037/0022-006X.62.3.429
Hu, L.-T., and Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives. Struct. Equat. Model. 6, 1–55. doi: 10.1080/10705519909540118
Jorgensen, T. D., Kite, B. A., Chen, P. Y., and Short, S. D. (2018). Permutation randomization methods for testing measurement equivalence and detecting differential item functioning in multiple-group confirmatory factor analysis. Psychol. Methods 23, 708–728. doi: 10.1037/met0000152
Kennedy, K. J., Chan, J. K. S., and Fok, P. K. (2011). Holding policy-makers to account: exploring ‘soft’ and ‘hard’ policy and the implications for curriculum reform. Lond. Rev. Educ. 9, 41–54. doi: 10.1080/14748460.2011.550433
Kite, B. A., Jorgensen, T. D., and Chen, P. Y. (2018). Random permutation testing applied to measurement invariance testing with ordered-categorical indicators. Struct. Equat. Model. 25, 573–587. doi: 10.1080/10705511.2017.1421467
Klem, L. (2000). “Structural equation modeling,” in Reading and Understanding More Multivariate Statistics, eds L. G. Grimm and P. R. Yarnold (Washington, DC: APA), 227–260.
Lam, T. C. M., and Klockars, A. J. (1982). Anchor point effects on the equivalence of questionnaire items. J. Educ. Meas. 19, 317–322. doi: 10.1111/j.1745-3984.1982.tb00137.x
Ley de Ordenación General del Sistema Educativo (1990). Ley Orgánica 1/1990, de 3 de octubre, BOE n° 238, de 4 de octubre de 1990.
Marsh, H. W., Guo, J., Parker, P. D., Nagengast, B., Asparouhov, T., Muthén, B., et al. (2018). What to do when scalar invariance fails: the extended alignment method for multi-group factor analysis comparison of latent means across many groups. Psychol. Methods 23, 524–545. doi: 10.1037/met0000113
Marsh, H. W., Hau, K.-T., and Wen, Z. (2004). In search of golden rules: comment on hypothesis-testing approaches to setting cutoff values for fit indexes and dangers in overgeneralizing Hu and Bentler's (1999) findings. Struct. Equat. Model. 11, 320–341. doi: 10.1207/s15328007sem1103_2
Michaelides, M. P. (2008). “Test-takers perceptions of fairness in high-stakes examinations with score transformations,” in Paper Presented at the Biannual Meeting of the International Testing Commission (Liverpool).
Michaelides, M. P. (2014). Validity considerations ensuing from examinees' perceptions about high-stakes national examinations in Cyprus. Assess. Educ. Princ. Policy Pract. 21, 427–441. doi: 10.1080/0969594X.2014.916655
Ministry of Education (1993). The New Zealand Curriculum Framework: Te Anga Marautanga o Aotearoa. Wellington, NZ: Learning Media.
Ministry of Education (2007). The New Zealand Curriculum for English-Medium Teaching and Learning in Years 1-13. Wellington, NZ: Learning Media.
NUEPA (2014). Secondary Education in India: Progress Toward Universalisation (Flash Statistics). New Delhi: National University of Educational Planning and Administration.
Nye, C. D., and Drasgow, F. (2011). Effect size indices for analyses of measurement equivalence: understanding the practical importance of differences between groups. J. Appl. Psychol. 96, 966–980. doi: 10.1037/a0022955
OECD (2016). Making Education Count for Development: Data Collection and Availability in Six PISA for Development Countries. Paris: OECD.
Papanastasiou, E., and Michaelides, M. P. (2019). “Issues of perceived fairness in admissions assessments in small countries: the case of the Republic of Cyprus,” in Higher Education Admissions Practices: An International Perspective, eds M. E. Oliveri and C. Wendler (Cambridge University Press).
Pratt, D. D. (1992). Conceptions of teaching. Adult Educ. Q. 42, 203–220. doi: 10.1177/074171369204200401
Rhemtulla, M., Brosseau-Liard, P. E., and Savalei, V. (2012). When can categorical variables be treated as continuous? A comparison of robust continuous and categorical SEM estimation methods under suboptimal conditions. Psychol. Methods 17, 354–373. doi: 10.1037/a0029315
Rieskamp, J., and Reimer, T. (2007). “Ecological rationality,” in Encyclopedia of Social Psychology, eds R. F. Baumeister and K. D. Vohs (Thousand Oaks, CA: Sage), 273–275.
Solomonidou, G., and Michaelides, M. P. (2017). Students' conceptions of assessment purposes in a low stakes secondary-school context: a mixed methodology approach. Stud. Educ. Eval. 52, 35–41. doi: 10.1016/j.stueduc.2016.12.001
Tait, H., Entwistle, N. J., and McCune, V. (1998). “ASSIST: a reconceptualisation of the approaches to studying inventory,” in Improving Student Learning: Improving Students as Learners, ed C. Rust (Oxford: Oxford Centre for Staff and Learning Development), 262–271.
Vandenberg, R. J., and Lance, C. E. (2000). A review and synthesis of the measurement invariance literature: suggestions, practices, and recommendations for organizational research. Organ. Res. Methods 3, 4–70. doi: 10.1177/109442810031002
Keywords: teachers, cross-cultural comparison, conceptions of assessment, confirmatory factor analysis, invariance testing
Citation: Brown GTL, Gebril A and Michaelides MP (2019) Teachers' Conceptions of Assessment: A Global Phenomenon or a Global Localism. Front. Educ. 4:16. doi: 10.3389/feduc.2019.00016
Received: 18 October 2018; Accepted: 15 February 2019;
Published: 07 March 2019.
Edited by: Mustafa Asil, University of Otago, New Zealand
Reviewed by: Anthony Joseph Nitko, University of Pittsburgh, United States; Yong Luo, National Center for Assessment in Higher Education (Qiyas), Saudi Arabia
Copyright © 2019 Brown, Gebril and Michaelides. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Gavin T. L. Brown, gavin.brown@umu.se; gt.brown@auckland.ac.nz