Measuring Quality in Kindergarten Classrooms: Structural Analysis of the Classroom Assessment Scoring System (CLASS K–3)

Lia E. Sandilos

Department of Communication Sciences and Disorders, Temple University

James C. DiPerna

Department of Educational Psychology, Counseling, and Special Education, The Pennsylvania State University

The Family Life Project Key Investigators

The Pennsylvania State University and University of North Carolina at Chapel Hill

Research Findings: The purpose of the current study was to evaluate the structural validity of scores

on a measure of global classroom quality, the Classroom Assessment Scoring System (CLASS K–3;

Pianta, La Paro, & Hamre, 2008). Using observational data from a sample of 417 kindergarten

classrooms from the southern and mid-Atlantic regions of the United States, we used confirmatory

factor analysis to examine the structural validity of the CLASS K–3. Factor analytic findings supported

a 3-factor and 10-dimension structure for the CLASS K–3; however, some modifications

were made to the original CLASS model. Practice or Policy: Although the overall structure of

the CLASS has been generally consistent across validation studies, some facets of the model may

be less stable than others. Additional examination of alternative factor structures is needed to further

clarify the relationships among the CLASS dimensions and domains. Current psychometric evidence

provides support for continued use of the CLASS to guide intervention, instruction, and professional

development.

Children’s experiences in early childhood classrooms have been identified as an important

predictor of future academic and social-emotional functioning (La Paro & Pianta, 2000;

Zaslow, Martinez-Beck, Tout, & Halle, 2011). The enrollment of children in prekindergarten

programs nationwide increased steadily from 20% in 1970 to 53% in 2010, and nearly all

children (94%) now attend a part- or full-day kindergarten program (National Center for

Education Statistics, 2012). In response to the growing need for early childhood education

(ECE), standards emerged for ECE licensing and accreditation, and the federal government

invested funds in early education through large-scale programs (e.g., the Head Start Child Care

and Development Fund; Zaslow et al., 2011). Simultaneously, educational researchers furthered

efforts to establish a substantive operational definition of high-quality early education by

examining theoretical models of education and creating instruments to measure the quality of

early childhood classroom experiences.

Correspondence regarding this article should be addressed to Lia E. Sandilos, Temple University, 1701 N. 13th

Street, 352A Weiss Hall, Philadelphia, PA 19122. E-mail: [email protected]

Early Education and Development, 25: 894–914

Copyright © 2014 Taylor & Francis Group, LLC

ISSN: 1040-9289 print/1556-6935 online

DOI: 10.1080/10409289.2014.883588

Developing instruments that can accurately evaluate characteristics of high-quality teaching

and document the contextual factors that may influence effective instruction could improve

the quality of education. Systematic observation is one method of assessment often used by

educators and educational researchers to assess the quality of a classroom environment. The

focus of the current study was to assess the factor structure of a widely used observation scale

developed to evaluate the quality of early classroom environments—the Classroom Assessment

Scoring System (CLASS K–3; Pianta, La Paro, & Hamre, 2008).

THEORETICAL FOUNDATION AND FRAMEWORK FOR THE CLASS K–3

The CLASS observation system (Pianta et al., 2008) was developed to provide a research-based

framework for assessing teacher–child interactions and resulting instructional quality in

prekindergarten and primary classroom environments (Hamre & Pianta, 2007). Since its

publication, the CLASS has been used extensively in evaluation and research in more than

3,000 early childhood classrooms (Hamre, Goffin, & Kraft-Sayre, 2009). As a part of the

Improving Head Start for School Readiness Act of 2007, the Office of Head Start selected

the CLASS as one of the primary observation scales piloted to assess the quality of Head Start

classrooms nationwide (Early Childhood Learning and Knowledge Center, 2008). Head Start

now utilizes CLASS scores to determine the accreditation of new prekindergarten centers

around the nation (Hamre, Hatfield, Jamil, & Pianta, 2013). The growing popularity of the

CLASS framework in research and practice has led to national and international studies of

the psychometric properties of the CLASS, as well as evaluation of the relationship between

CLASS scores and a variety of academic and behavioral outcomes (Pianta et al., 2008).

The purpose of the CLASS is to measure the quality of teachers’ interactions with their

students (La Paro, Pianta, & Stuhlman, 2004). Toddler, prekindergarten, elementary, and

secondary versions of the CLASS are available; however, the focus of the current study

is the K–3 version. The primary theoretical foundation for the CLASS framework is the

developmental systems model of early learning (Pianta, 1999), which considers children’s

interactions with their teacher and the classroom environment to be crucial for academic

success. CLASS factors also were developed through a review of research on high-quality

teaching and an extensive review of existing observation measures commonly used in

early childhood and elementary classrooms (La Paro et al., 2004). Within the published

CLASS framework, instructional quality is assessed in three primary domains: Emotional

Support, Classroom Organization, and Instructional Support (Pianta et al., 2008). These

domains are further divided into 10 dimensions. Emotional Support consists of Positive

Climate, Negative Climate, Teacher Sensitivity, and Regard for Student Perspectives.

Classroom Organization consists of Behavior Management, Productivity, and Instructional

Learning Formats. Instructional Support is composed of Concept Development, Quality of

Feedback, and Language Modeling.

Emotional Support

Research regarding effective didactic practices in early childhood has emphasized that the quality

of teacher–student relationships in early education has a significant influence on student learning

and future academic success (La Paro & Pianta, 2000; Pianta, La Paro, Payne, Cox, &

Bradley, 2002). Previous literature also indicates that children often enter school without important

social-emotional skills. Rimm-Kaufman, Pianta, and Cox (2000) found that 20% of kindergarten

teachers reported that approximately half of their students lack the social skills needed to

achieve early academic success. Fostering social-emotional support in ECE classrooms is particularly

crucial, as children with behavioral and emotional issues have been found to be less

receptive to intervention as early as age 8 (Eron, 1990).

Hamre, Pianta, Mashburn, and Downer (2007) cited attachment theory as guiding the development

of the Emotional Support domain because child–caregiver relationships are emphasized

within this domain. Specifically, this domain assesses the level of positive/negative teacher–student

and peer–peer interaction as well as the degree to which the teacher demonstrates

awareness of and responsiveness to students’ academic and emotional needs and the teacher’s emphasis

on student interest and autonomy (Pianta et al., 2008).

Classroom Organization

Recent studies also have identified key aspects of classroom organization that positively impact

learning (Domínguez, Vitiello, Maier, & Greenfield, 2010; Rimm-Kaufman, Curby, Grimm,

Nathanson, & Brock, 2009). Specifically, effective classroom management strategies (e.g., clear

behavioral expectations and learning objectives, consistent routines, varied learning modalities)

have been linked to higher levels of self-regulatory and adaptive behaviors in kindergarten

students (Rimm-Kaufman et al., 2009) and improved reading skills in first grade (Ponitz,

Rimm-Kaufman, Brock, & Nathanson, 2009). Furthermore, proactive redirection of misbehavior

has long been considered more effective than reactive behavior management strategies (Sugai &

Horner, 2006; Yates & Yates, 1990).

The Classroom Organization domain of the CLASS draws from research on behavior

management and self-regulation (Hamre et al., 2007), as the use of behavioral reinforcement

strategies (rewarding/recognizing positive behaviors), routines, and methods to improve student

engagement yields higher scores on this construct. Within this domain, teachers are evaluated on

their ability to proactively manage behavior, effectively make use of learning time, and maintain

student attention and participation in instruction.

Instructional Support

High-quality instruction and feedback have a significant impact on the development of higher-order

thinking skills (Bierman et al., 2008; Yates & Yates, 1990). In addition, frequent conversation

and exposure to literacy have a profound influence on children’s early language development

(Copple & Bredekamp, 2009). However, kindergartners enter school with wide variation in

their level of exposure to language and literacy (Hart & Risley, 2003). Thus, the CLASS K–3

Instructional Support domain emphasizes teachers’ use of techniques to promote analytical

thinking skills, provide feedback to strengthen skills, and facilitate language development.

Because the materials available in early education programs can vary widely, the Instructional

Support domain is distinctive in that it assesses what teachers do with what they have, and it

does not evaluate the quantity or physical quality of the curricular materials accessible in the

environment (Pianta et al., 2008). Behavioral, metacognitive, and constructivist learning theories

are incorporated into the Instructional Support domain through the evaluation of scaffolding,

modeling, rehearsal, and elaboration (Hamre et al., 2007).

STRUCTURAL VALIDATION OF THE CLASS

Several studies have been conducted to examine the structural validity of the CLASS. Hamre

et al. (2007), for example, tested one-, two-, and three-factor CLASS models using data

from a sample of 4,000 prekindergarten through fifth-grade U.S. classrooms. The sample of

classrooms was taken from several large-scale studies that occurred between 1998 and 2005.

The scales used during the observations were versions of the CLASS prekindergarten through

third-grade frameworks and a precursor to the CLASS (i.e., the Classroom Observation

System). Structural analyses indicated that the three-factor model (Emotional Support,

Classroom Organization, and Instructional Support) demonstrated a better overall fit in prekindergarten

through third-grade classrooms than the one- and two-factor models that were tested

(Hamre et al., 2007).

There were some notable limitations to this study, however. Most significantly, when the

published CLASS model (three factors and 10 dimensions) was tested with prekindergarten

classrooms, several fit indices suggested inadequate fit and the presence of error in the model

(Browne & Cudeck, 1993; Hu & Bentler, 1999; Schermelleh-Engel, Moosbrugger, & Muller,

2003). In addition, although the framework of three overarching domains (i.e., Emotional

Support, Classroom Organization, and Instructional Support) was consistently evaluated across

grades, there was variability in the dimensions within those domains because different versions

of the CLASS and Classroom Observation System frameworks were used during the longitudinal

data collection. For example, an earlier version of the CLASS, which excluded the Language

Modeling and Regard for Student Perspectives dimensions, was tested on the kindergarten

sample in this study (Hamre et al., 2007). As a result, direct comparison of the internal structure

of the CLASS across grade levels was not possible, warranting further research regarding the fit

of the CLASS K–3 model with classrooms in the primary grades.

The CLASS framework also has been examined internationally. A study conducted in

Finland examined the structural validity of the CLASS Pre-K using data from 49 Finnish

kindergarten classrooms (Pakarinen et al., 2010). Results of an initial confirmatory factor analysis

(CFA) indicated that the three-factor model hypothesized by Pianta et al. (2008) did not fit

the Finnish classroom data. In addition, Negative Climate displayed poor discriminant validity

within the Emotional Support domain. A final CFA was conducted with Negative Climate

removed from the model and with the residuals correlated between Behavior Management

and Productivity and between Concept Development and Quality of Feedback.

Although most of the indices demonstrated good fit in the final revised model, the root mean

square error of approximation (RMSEA) value was inflated above acceptable fit. Furthermore,

the three domain factors (Emotional Support, Classroom Organization, and Instructional


Support) demonstrated multicollinearity (>.90). Because the three domains correlated highly,

a one-factor model of global classroom quality was also tested. Results, though, indicated that

the one-factor model also did not fit the data. In addition, a significant limitation of this study

was the small number of classrooms in the sample (n = 49), as a small sample size limits

statistical power and can lead to unreliable results (Comrey & Lee, 1992; Kline, 2005). Thus,

both national and international examinations of the three-factor CLASS framework have

featured less than ideal methodological and structural outcomes.

To address the aforementioned limitations of the structural evidence accrued for the CLASS,

researchers recently have begun to explore alternative conceptual and structural models. For

example, Hamre et al. (2013) tested a bifactor model of the CLASS with a prekindergarten

sample. A bifactor model is composed of a general factor that loads on all indicators and uncorrelated

factors that load on select indicators (Chen, West, & Sousa, 2006; Hamre et al., 2013).

Hamre and colleagues (2013) found that the best fitting bifactor CLASS model consisted of

a general factor (Responsive Teaching) and two domain factors (Positive Management and

Routines, and Cognitive Facilitation). This bifactor CLASS model demonstrated a subtle improvement

in fit over the original three-factor model by bringing all indices within acceptable

ranges with the exception of RMSEA, which was slightly larger than the recommended maximum

threshold for mediocre fit. The bifactor model represents an entirely new conceptualization

of the CLASS framework, and Hamre et al. (2013) encouraged replication of the bifactor model.

RATIONALE

Classroom quality is a construct that has received increasing attention in ECE research, as

high-quality teaching can have a significant effect on student academic and behavioral outcomes

(Bierman et al., 2008; Howes et al., 2008). The CLASS K–3 is a widely used measure of

classroom quality that has the potential to provide valuable data regarding effective teaching

techniques and to facilitate professional development in school districts (Hamre et al., 2010).

However, because of some notable limitations (e.g., small sample size, suboptimal fit indices,

different versions of the CLASS evaluated simultaneously) of previous validity studies (Hamre

et al., 2007; Pakarinen et al., 2010), as well as the exploration of completely new structural models

for the CLASS (e.g., a bifactor model; Hamre et al., 2013), further evaluation of the structural

validity of the CLASS is necessary. The present study also provides an independent examination

of the psychometric properties of CLASS scores by authors who were not involved in the development

of the measure. Thus, the primary purpose of this study was to examine the internal

structure of the CLASS K–3.

METHOD

Participants

Participating teachers in the current study were part of a longitudinal observational study of the

cognitive, social, and emotional development of young children. The study has followed 1,292

children and their families since the participating children were born in 2004. Participants in the

sample reside in North Carolina and central Pennsylvania (Vernon-Feagans, Cox, & The Family

Life Project Key Investigators, 2011).

Data for the current study were drawn from a sample of 426 classrooms (North Carolina = 232, Pennsylvania = 194) within 190 different schools that enrolled all participating children

during their kindergarten year. The majority of the classrooms (94%) were located in public

elementary schools. Schools were recruited in the year prior to data collection, and monetary

incentives were provided to teachers and principals in the form of gift cards. Demographic

information on teachers was collected through a self-report questionnaire. Of the total sample,

412 teachers were women and 14 were men. Teachers ranged in age from 22 to 66 years

(M = 41.5, SD = 11.2), with an average of 9.4 years of teaching experience (range = less than

1 year to 38 years). Nearly half (44%) of the teachers reported having a bachelor’s degree,

and teachers’ median annual income (pretax) was $30,000–$40,000. In addition, 88% of

teachers were Caucasian, 98% spoke English as their first language, 92% were certified in

elementary education, and 82% reported having at least one classroom assistant.

Demographic data regarding the composition of students within classrooms were also

collected from the teacher questionnaire. The median number of students per classroom was

20 (female = 48%, male = 52%). The average student racial composition within each classroom

was as follows: 62% Caucasian, 24% African American, 11% Hispanic, 1% Asian/Pacific

Islander, and 2% other. Half of the teachers (51%) reported having at least one student in

their classroom who spoke a language other than English, with the most common language

spoken in the home being Spanish (14%).

Measures

CLASS K–3. The CLASS K–3 assesses the quality of the classroom environment through

three primary domains and 10 dimensions (Emotional Support domain = Positive Climate,

Negative Climate,¹ Teacher Sensitivity, and Regard for Student Perspectives; Classroom Organization

domain = Behavior Management, Productivity, and Instructional Learning Formats;

Instructional Support domain = Concept Development, Quality of Feedback, and Language

Modeling). Each dimension is rated on a 7-point scale ranging from low (1–2) to middle (3–5)

to high (6–7). CLASS dimension scores are calculated by averaging ratings across cycles within an

observation. To calculate each domain score, one computes the mean for the dimensions that

fall within that domain. In this study, the dimension scores were used for the statistical analyses.
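The scoring rules described above can be sketched in a few lines of Python. This is a minimal illustration rather than the published scoring procedure: the function names are hypothetical, and reverse-scoring Negative Climate as 8 minus the averaged rating is an assumption based on the 7-point scale (it does reproduce the Emotional Support mean reported in Table 1 from the dimension means).

```python
from statistics import mean

# Dimension-to-domain mapping as listed in the Measures section.
DOMAINS = {
    "Emotional Support": [
        "Positive Climate", "Negative Climate",
        "Teacher Sensitivity", "Regard for Student Perspectives",
    ],
    "Classroom Organization": [
        "Behavior Management", "Productivity", "Instructional Learning Formats",
    ],
    "Instructional Support": [
        "Concept Development", "Quality of Feedback", "Language Modeling",
    ],
}


def dimension_scores(cycles):
    """Average each dimension's 1-7 ratings across observation cycles."""
    return {dim: mean(cycle[dim] for cycle in cycles) for dim in cycles[0]}


def domain_scores(dims):
    """Mean of the dimensions in each domain, with Negative Climate reversed."""
    adjusted = dict(dims)
    # Assumption: reverse-scoring on a 1-7 scale is computed as 8 - score.
    adjusted["Negative Climate"] = 8 - dims["Negative Climate"]
    return {
        domain: mean(adjusted[name] for name in names)
        for domain, names in DOMAINS.items()
    }
```

Calling `domain_scores(dimension_scores([cycle_1, cycle_2]))` with two dictionaries of ratings (one per cycle) mirrors the two-cycle design used in this study.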

Raters can observe a classroom for one to six cycles. A cycle consists of 20 min of observation

and 10 min of coding. In the current study, each classroom observation consisted of

two cycles. The 1-hr observation (i.e., two cycles) has been endorsed by the authors of the

CLASS, as studies have indicated that data from two cycles correlate highly with data from four

cycles (rs = .89–.95; Pianta et al., 2008). The authors of the CLASS (Pianta et al., 2008) reported

adequate interrater agreement (.78–.96) and internal consistency (.76–.90). In the current study,

interobserver agreement data were collected in the field approximately midway through the

data collection window, and all data collectors maintained the 80% interrater reliability criterion

achieved during the CLASS certification training.

¹The averaged Negative Climate dimension is reverse-scored before the Emotional Support domain score is

computed.


Procedure

CLASS observations were completed in kindergarten classrooms during the fall semester

(October through December) of the academic year. Observers consisted of part-time graduate

assistants and full-time data collectors. Different data collection teams conducted CLASS

observations in each state, but all observers had to undergo the same CLASS training. Prior

to the start of data collection, observers at both sites were formally trained and certified to

use the CLASS K–3. To meet initial certification requirements, all observers attended a 2-day

formal training conducted by a certified CLASS trainer. After the training, the observers viewed

and coded videos of early elementary classrooms (developed by the authors of the CLASS) and

completed a real-time CLASS observation in a prekindergarten classroom. The trainees’ scores

on the videos were compared to criterion scores provided by CLASS developers, and scores

on the real-time observation were compared to those of the CLASS trainer, who observed

simultaneously. All trainees had to meet the interrater agreement accuracy standard specified

by the authors of the CLASS (percent-within-one agreement ≥ .80; Pianta et al., 2008)

before conducting observations for the study. In addition to being established by the CLASS

developers (i.e., Pianta et al., 2008), the 80% criterion is supported by other sources regarding

reliability of low-stakes assessments (e.g., Nunnally & Bernstein, 1994; Salvia & Ysseldyke,

2007; Sattler, 2001).
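As an illustration of the percent-within-one agreement standard, the following Python sketch counts how many of a trainee's ratings fall within one scale point of the criterion ratings. The function names are hypothetical, and pooling all dimension ratings into one list is an assumption about how the percentage is computed.

```python
def percent_within_one(trainee, criterion):
    """Share of ratings that fall within one scale point of the criterion."""
    if len(trainee) != len(criterion) or not trainee:
        raise ValueError("rating lists must be non-empty and equally long")
    hits = sum(abs(t - c) <= 1 for t, c in zip(trainee, criterion))
    return hits / len(trainee)


def meets_standard(trainee, criterion, threshold=0.80):
    """True when agreement reaches the .80 criterion described in the text."""
    return percent_within_one(trainee, criterion) >= threshold
```

With 10 paired ratings, for example, a trainee could differ from the criterion by more than one point on at most two of them and still meet the standard.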

The certified CLASS observers traveled to various classroom sites in North Carolina

and central Pennsylvania over a 16-week period. All observers used the CLASS K–3 observation

scale to assess the kindergarten classroom environment and completed two 30-min cycles

in each classroom. One observer conducted the CLASS observation in each classroom, and

observations were conducted at the start of the school day during circle time and morning

academic activities.

Design and Data Analyses

CFA was used to evaluate the structural validity of the CLASS K–3. Specifically, AMOS 19.0

software was used to estimate each model. Tanaka (1993) recommended using multiple fit

indices to assess the overall fit of a model. Thus, a variety of model fit and model comparison

indices were examined in the present study. Model fit indices measure how well the proposed

model represents the data drawn from the current sample (Kline, 2005). The model fit indices

examined in this study consisted of the RMSEA, the goodness-of-fit index (GFI), and the standardized

root-mean-square residual (SRMR). Criteria for model fit were as follows: RMSEA < .05 = good fit, .05–.08 = adequate fit, .08–.10 = mediocre fit, > .10 = unacceptable (Browne &

Cudeck, 1993); GFI > .90 = good fit (Hu & Bentler, 1999); SRMR < .08 = good fit, .08–.10 = acceptable (Hu & Bentler, 1999; Schermelleh-Engel et al., 2003).

Model comparison indices, which represent improvement in a model after modifications have

been made and the likelihood that the model is replicable, also were examined (Kline, 2005).

The model comparison indices examined in this study were the comparative fit index (CFI),

Bentler–Bonett normed fit index (NFI), Tucker–Lewis index (TLI), Akaike information

criterion (AIC), and the consistent Akaike information criterion (CAIC). For the CFI, NFI,

and TLI, values ≥ .90 indicate acceptable fit (Bentler & Bonett, 1980; Hu & Bentler, 1999;

Kline, 2005), whereas values ≥ .95 indicate good fit (Schermelleh-Engel et al., 2003). For the AIC

and CAIC, lower values are preferred (Kline, 2005). Given that chi-square is particularly

sensitive to sample size (Kline, 2005; Schermelleh-Engel et al., 2003), we deemphasized this

statistic when evaluating the fit of each model.
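The cutoffs listed above can be expressed as simple lookup functions. The sketch below is illustrative only: the function names and the "below criterion" label are hypothetical, and the handling of values that fall exactly on a boundary (e.g., RMSEA = .08) is an assumption, because the ranges in the text share their endpoints.

```python
def classify_rmsea(value):
    """RMSEA bands following the Browne and Cudeck (1993) criteria in the text."""
    if value < 0.05:
        return "good"
    if value <= 0.08:  # assumption: boundary values go to the better band
        return "adequate"
    if value <= 0.10:
        return "mediocre"
    return "unacceptable"


def classify_srmr(value):
    """SRMR bands: < .08 good, .08-.10 acceptable."""
    if value < 0.08:
        return "good"
    if value <= 0.10:
        return "acceptable"
    return "below criterion"


def classify_incremental(value):
    """CFI, NFI, and TLI: >= .95 good, >= .90 acceptable."""
    if value >= 0.95:
        return "good"
    if value >= 0.90:
        return "acceptable"
    return "below criterion"
```

Keeping one function per index family makes it easy to swap in different cutoffs, which matters because published recommendations for these thresholds vary.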

RESULTS

The CLASS K–3 observations were used for structural analyses. Three of these cases (<1% of

the total sample) had one or more missing data points. Using the Mahalanobis distance test

(Tabachnick & Fidell, 2007), we identified nine multivariate outliers (p < .001). No systematic

patterns of missing data or outliers were evident among the identified cases; thus, the cases were

deleted listwise (Field, 2009). The final sample size (n = 417) was sufficient for the planned

analyses.
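To make the outlier screen concrete, here is a deliberately simplified Python sketch of a Mahalanobis distance test. It is not the procedure used in the study: it handles only two variables (the study screened all 10 dimension scores), so that the covariance matrix can be inverted in closed form and the chi-square p value with 2 degrees of freedom reduces to exp(-d²/2). The function name is hypothetical.

```python
import math


def mahalanobis_p_values(rows):
    """Squared Mahalanobis distance and chi-square p value for (x, y) rows.

    Simplification: two variables only, so df = 2 for the chi-square test.
    """
    n = len(rows)
    mx = sum(x for x, _ in rows) / n
    my = sum(y for _, y in rows) / n
    # Sample covariance matrix entries (n - 1 denominator).
    sxx = sum((x - mx) ** 2 for x, _ in rows) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in rows) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in rows) / (n - 1)
    det = sxx * syy - sxy ** 2
    results = []
    for x, y in rows:
        dx, dy = x - mx, y - my
        # d^2 = [dx dy] S^-1 [dx dy]^T using the closed-form 2x2 inverse.
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        # For df = 2 the chi-square survival function is exp(-d2 / 2).
        results.append((d2, math.exp(-d2 / 2)))
    return results
```

A case would be flagged as a multivariate outlier when its p value falls below .001, the criterion reported above. With more than two variables, a general matrix inverse and chi-square distribution function would be needed.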

Descriptive statistics for each CLASS K–3 dimension and domain are provided in Table 1.

Dimension scores aggregated across two cycles were used as items or indicators in the CFA.

Prior to conducting the CFA, we examined the data to determine whether the necessary assumptions

had been met. Normality was determined through a visual inspection of normal probability

plots and an examination of skewness and kurtosis values. Standardized indices were considered

highly skewed or kurtotic at >2.0 and >7.0, respectively (Fabrigar, Wegener, MacCallum, &

Strahan, 1999). Based on these criteria, only one CLASS dimension, Negative Climate, demonstrated

severe skewness and severe kurtosis (see Table 1). A log transformation was used

to reduce the non-normality of the Negative Climate data (transformed skewness = 2.4, transformed

kurtosis = 5.8). Linearity of the data, as examined via a visual inspection of scatter plots, was

met. The presence of multicollinearity or singularity was not a significant concern as only

one correlation, between the Quality of Feedback dimension and the corresponding Instructional

Support domain, had a value of .90 (see Table 2).

TABLE 1

Means, Standard Deviations, Skewness, and Kurtosis for Classroom Assessment Scoring System K–3 Dimension and Domain Scores

Dimension/Domain                   M      SD     Skewness   Kurtosis
Positive Climate                   5.35   1.03   −0.22      −0.46
Negative Climateᵃ                  1.16   3.43    3.43      13.97
Teacher Sensitivity                4.90   1.11   −0.09      −0.50
Regard for Student Perspectives    3.97   1.11   −0.13      −0.36
Behavior Management                5.41   0.98   −0.52       0.19
Productivity                       5.42   0.96   −0.46       0.10
Instructional Learning Formats     4.64   0.96   −0.16      −0.28
Concept Development                2.60   1.07    0.32      −0.82
Quality of Feedback                3.39   1.08    0.12      −0.71
Language Modeling                  3.09   0.98    0.21      −0.33
Emotional Support                  5.26   0.76   −0.33      −0.07
Classroom Organization             5.16   0.80   −0.57       0.31
Instructional Support              3.03   0.92    0.13      −0.59

ᵃNegative Climate was log transformed.

TABLE 2

Correlations for Classroom Assessment Scoring System K–3 Domains and Dimensions for Factor Analytic Sample

Variable                             1      2      3      4      5      6      7      8      9      10     11     12     13
1. Positive Climate                  —      −.43** .74**  .58**  .57**  .46**  .57**  .18**  .31**  .31**  .88**  .65**  .30**
2. Negative Climate                         —      −.37** −.27** −.47** −.28** −.24** −.03   −.14** −.09   −.53** −.40** −.10*
3. Teacher Sensitivity                             —      .61**  .51**  .42**  .64**  .19**  .37**  .33**  .89**  .63**  .34**
4. Regard for Student Perspectives                        —      .24**  .19**  .56**  .36**  .36**  .33**  .82**  .40**  .40**
5. Behavior Management                                           —      .63**  .39**  .07    .23**  .17**  .54**  .82**  .18**
6. Productivity                                                         —      .56**  .07    .27**  .26**  .42**  .88**  .22**
7. Instructional Learning Formats                                              —      .37**  .52**  .46**  .67**  .79**  .51**
8. Concept Development                                                                —      .67**  .62**  .27**  .21**  .87**
9. Quality of Feedback                                                                       —      .70**  .39**  .41**  .90**
10. Language Modeling                                                                               —      .36**  .36**  .87**
11. Emotional Support                                                                                      —      .65**  .38**
12. Classroom Organization                                                                                        —      .37**
13. Instructional Support                                                                                                —

*p < .05. **p < .01.
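The skewness and kurtosis screen described in this section can be sketched as follows. This is an illustration under stated assumptions: it uses simple moment-based estimators (statistical packages often apply small-sample corrections, so values may differ slightly from those in Table 1), it reports excess kurtosis, and the function names are hypothetical.

```python
def skewness_kurtosis(values):
    """Moment-based skewness and excess kurtosis (no small-sample correction)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2 - 3  # excess kurtosis: 0 for a normal distribution
    return skew, kurt


def violates_normality(values, skew_cut=2.0, kurt_cut=7.0):
    """Apply the >2.0 (skewness) and >7.0 (kurtosis) cutoffs cited in the text."""
    skew, kurt = skewness_kurtosis(values)
    return abs(skew) > skew_cut or abs(kurt) > kurt_cut
```

A dimension such as Negative Climate, where nearly every classroom receives the lowest rating and a few receive high ratings, is the kind of distribution this screen flags.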

CFAs (maximum likelihood estimation) were conducted for six models. The first model

tested was identical to the factor structure tested by Hamre et al. (2007) with the CLASS

standardization sample and reported in the published version (Pianta et al., 2008) of the CLASS

manual. The second model was the revised CLASS model, which included five modifications

to the published structure. Then, based on the abbreviated model proposed by Pakarinen et al.

(2010), a model excluding the Negative Climate dimension was tested. Next, a model with

10 dimensions loading on one global domain, a model with 10 dimensions loading on two

domains (Emotional Support and Instructional Support), and a bifactor model based on the work

of Hamre et al. (2013) were tested.

The original CLASS K–3 framework (Hamre et al., 2007; Pianta et al., 2008) was tested with

the current sample, and results indicated that the model did not fit the data well (see Table 3;

Figure 1). Based on modification indices, five changes were made to improve the model. First,

the residuals of Productivity and Behavior Management were correlated, as these dimensions are

conceptually related (Pianta et al., 2008), and this modification was consistent with previous

structural validity research (Pakarinen et al., 2010). Second, the residuals of Behavior Management

and Negative Climate were correlated. Although these dimensions were part of different

domains, Classroom Organization and Emotional Support were highly related factors, and

modification indices revealed that Behavior Management shared a significant amount of

residual error with the Emotional Support indicators. Third, the residuals of Regard for Student

Perspectives and Concept Development were correlated, as these indicators demonstrated

a moderate correlation in the present study and were found to be highly correlated in previous

research (Pianta et al., 2008). Fourth, a direct pathway was inserted from Emotional Support

to Behavior Management based on the moderate to strong relationship between the four

Emotional Support indicators and Behavior Management. Fifth, the pathway from Classroom

Organization to Behavior Management was removed, as the weight of this pathway was

negligible after we incorporated the other modifications (see Figure 2). In the revised CLASS

model (see Figure 2), the GFI, SRMR, CFI, NFI, and TLI all fell within the range of acceptable

to good fit, and the RMSEA fell just within the .10 threshold for mediocre fit (Browne &

Cudeck, 1993). The AIC and CAIC produced the lowest values after all five modifications were

incorporated into the original CLASS model (see Table 3).

TABLE 3

Fit Indices for the Structural Models

Model                                       χ²     df   SRMR   RMSEA   GFI    CFI    NFI    TLI    AIC     CAIC
Original CLASS K–3                          359.9  32   .087   .157    .841   .851   .839   .790   405.9   521.6
Revised CLASS K–3                           142.3  29   .060   .097    .936   .948   .936   .920   194.3   325.2
Revised Pakarinen model                     146.4  20   .065   .123    .927   .939   .930   .889   196.4   322.2
One factor                                  839.6  35   .140   .235    .684   .633   .625   .528   879.6   980.3
Two factors (emotional and instructional)   412.2  34   .091   .164    .835   .828   .816   .772   454.2   559.5
Bifactor (Hamre et al., 2013)               165.2  28   .066   .136    .896   .901   .891   .840   298.5   434.4

Note. All chi-square values are significant at p < .01. SRMR = standardized root-mean-square residual; RMSEA = root mean square error of approximation; GFI = goodness-of-fit index; CFI = comparative fit index; NFI = Bentler–Bonett normed fit index; TLI = Tucker–Lewis index; AIC = Akaike information criterion; CAIC = consistent Akaike information criterion; CLASS K–3 = Classroom Assessment Scoring System.
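Several of the indices in Table 3 are functions of the chi-square value, its degrees of freedom, and the sample size, so they can be recomputed as a check. The Python sketch below uses formulas that are assumptions about the software's conventions (RMSEA with N − 1 in the denominator, AIC = χ² + 2q, CAIC = χ² + q(ln N + 1), where q is the number of free parameters); these conventions reproduce the tabled values for the original and revised CLASS K–3 models. The function names, and the derivation of q from the number of indicators and df, are hypothetical helpers.

```python
import math


def free_parameters(n_indicators, df):
    """Free parameters q, assuming a covariance-only model (no mean structure)."""
    return n_indicators * (n_indicators + 1) // 2 - df


def rmsea(chi2, df, n):
    """Root mean square error of approximation with N - 1 in the denominator."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def aic(chi2, q):
    """Akaike information criterion in its chi-square-based form."""
    return chi2 + 2 * q


def caic(chi2, q, n):
    """Consistent AIC in its chi-square-based form."""
    return chi2 + q * (math.log(n) + 1)


# Original CLASS K-3 model: chi-square = 359.9, df = 32, N = 417, 10 indicators.
q_original = free_parameters(10, 32)
original = (rmsea(359.9, 32, 417), aic(359.9, q_original), caic(359.9, q_original, 417))
```

Because the AIC and CAIC penalize each additional free parameter, the revised model's lower values indicate that its improvement in chi-square outweighs the cost of the added parameters.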

FIGURE 1 Classroom Assessment Scoring System K–3 original model (Hamre et al., 2007; Pianta et al., 2008).

Another model tested in this study was based on the results of the Pakarinen et al. (2010)

study (see Figure 3). First, Negative Climate was removed from the original CLASS K–3 model;

however, the resulting model did not demonstrate good fit with the data. Second, the residual

errors of Behavior Management and Productivity were correlated, which created a slight

improvement in fit. However, the third modification recommended by Pakarinen et al. (i.e.,

correlated residuals of Quality of Feedback and Concept Development) resulted in a covariance

matrix that was not positive definite, so the exact Pakarinen model could not be tested as part of

this study. Instead, three additional revisions were made to the model based on modification

indices: (a) correlating residuals of Regard for Student Perspectives and Concept Development,

(b) correlating residuals of Positive Climate and Behavior Management, and (c) setting a direct

pathway from Emotional Support to Productivity. In the revised Pakarinen model, the GFI,

SRMR, CFI, and NFI met criteria for acceptable to good fit. In addition, the AIC and CAIC

produced the lowest values after all modifications were made to the Pakarinen model. The

TLI was just below the criterion threshold for acceptable fit, and the RMSEA was inflated

beyond the mediocre threshold (see Table 3).

FIGURE 2 Revised Classroom Assessment Scoring System K–3 model.

FIGURE 3 Revised Pakarinen et al. (2010) Classroom Assessment Scoring System K–3 model.

FIGURE 4 One-factor model of the Classroom Assessment Scoring System K–3.

FIGURE 5 Two-factor model of the Classroom Assessment Scoring System K–3.

Three additional alternative factor structures (i.e., one-factor, two-factor, and bifactor models)

were tested. First, given the moderate to high correlations observed between the latent factors

and dimensions in this study and in previous research (e.g., Hamre et al., 2007; Pakarinen

et al., 2010), a one-factor model was tested with the CLASS data (see Figure 4). Results of this

CFA, however, did not provide strong support for such a model (see Table 3).

FIGURE 6 Bifactor Classroom Assessment Scoring System model (Hamre et al., 2013).


Second, a two-factor model was tested with Emotional Support and Classroom Organization

combined into a single factor while the Instructional Support domain remained intact as the

second factor (see Figure 5). Results indicated that the two-factor model improved upon the

one-factor model, but it still demonstrated worse fit than the original three-factor CLASS model

and the Pakarinen model.

Finally, the Hamre et al. (2013) bifactor model was tested (see Figure 6). The overall fit of the

bifactor model was an improvement upon the original three-factor CLASS model, but the model

demonstrated a worse fit than the revised CLASS K–3 model (see Table 3).

Overall, the revised CLASS model (see Figure 2), which had several minor modifications

(i.e., correlating residuals) and one substantive modification (i.e., moving Behavior Management

from an indicator of Classroom Organization to an indicator of Emotional Support), was determined to be the best

fitting model with the current sample.
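
The comparison across candidate models can be illustrated by ranking the Table 3 entries on a single information criterion. The short Python sketch below ranks the models by AIC (lower is better). It is an illustration rather than the authors' decision procedure, which weighed several fit indices together. Note that the revised Pakarinen model begins by removing Negative Climate, so its AIC rests on a different set of indicators and is not strictly comparable to the others. The variable names are hypothetical.

```python
# AIC values copied from Table 3 (lower values indicate better relative fit)
aic_by_model = {
    "Original CLASS K-3": 405.9,
    "Revised CLASS K-3": 194.3,
    "Revised Pakarinen model": 196.4,
    "One factor": 879.6,
    "Two factors": 454.2,
    "Bifactor (Hamre et al., 2013)": 298.5,
}

# Sort models from lowest to highest AIC
ranked = sorted(aic_by_model.items(), key=lambda item: item[1])
best_name, best_aic = ranked[0]

for name, value in ranked:
    # Delta AIC: difference from the best-ranked model
    print(f"{name:32s} AIC={value:6.1f}  delta={value - best_aic:6.1f}")
```

The ordering produced by this ranking matches the narrative above: the revised CLASS K–3 model ranks first and the one-factor model ranks last.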

DISCUSSION

This study tested the original structure of the CLASS K–3 with a sample of classrooms in North

Carolina and Pennsylvania. Fit indices for the original CLASS model did not meet either the

recommended (Hu & Bentler, 1999; Kline, 2005) or the less stringent (Browne & Cudeck,

1993; Schermelleh-Engel et al., 2003) criterion thresholds in the current sample. This finding is

somewhat consistent with previous CLASS structural validity research (e.g., Hamre et al., 2007,

2013; Pakarinen et al., 2010), and the limited fit suggests a pattern of error emerging across studies.

In the current study, several minor modifications (i.e., correlating residuals) were made to the

original three-factor CLASS model. Although the correlation of residuals was considered to be

a relatively minor modification, it still presents a concern for the CLASS model, as the correlations

reveal associations among the dimensions that are not being explained by the three domain factors.

These associations could result from key characteristics measured by the

indicators that are not explained by the current factors, such as the potential impact of different

student needs and interactions, teacher social-emotional functioning, and observer characteristics.

The substantive modification (i.e., the placement of a direct pathway between Emotional

Support and Behavior Management) resulted in a revised CLASS model that demonstrated

reasonable fit with the data. Across the current and previous (e.g., Hamre et al., 2013) studies,

a relationship among the emotional and classroom management indicators appears to be

emerging, with Behavior Management in particular demonstrating a strong presence within these

domains. The placement of Behavior Management in the Emotional Support domain in the

revised CLASS K–3 model is also consistent with Hamre and colleagues’ (2013) bifactor model,

in which the dimensions of Emotional Support and Classroom Organization are combined to

form one factor (i.e., Positive Management and Routines). In addition, Behavior Management

demonstrated the strongest factor loading on the Positive Management and Routines domain

within the bifactor model in both the current study and the study by Hamre et al. (2013).

One potential explanation for these findings is that the Behavior Management dimension

yields higher scores when teachers utilize positive strategies that enable students to self-regulate,

assist students in understanding the feelings of others (e.g., perspective taking), and use

subtle cues to redirect behavior as opposed to overt disciplinary actions (Pianta et al., 2008).

The emotionally supportive strategies encompassed within the operationalization of Behavior

Management may be contributing to its relationship with the Emotional Support domain in the

current study. Mosier (2001) recommended that developmentally appropriate classroom management

techniques foster understanding of social consciousness and prosocial behavior (e.g.,

perspective taking, sharing). The implementation of such behavior management techniques

requires a degree of emotional sensitivity on the part of the teacher.

The aforementioned results must be considered within the context of several limitations. First,

the sample used in this study was not drawn from a nationally representative population of

kindergarten teachers, which limits the generalizability of the results. Second, the sample only

included kindergarten classrooms, although the CLASS K–3 can be used in kindergarten

through third-grade classrooms. Third, to improve model fit, modifications were made to the

original 10-dimension, three-domain CLASS structure, and fit indices were interpreted using

less stringent criteria (Browne & Cudeck, 1993; Schermelleh-Engel et al., 2003) than some

(Hu & Bentler, 1999; Kline, 2005) have recommended for testing structural equation models.

Fourth, all of the alternative models presented within this article add complexity to the scoring

process beyond the simple hand calculations required for the published version of the CLASS.

Finally, the current study focused solely on a select form of validity evidence for CLASS K–3

scores in kindergarten classrooms. Although structural validity is important to consider, the

examination of additional types of score validity and reliability evidence (e.g., concurrent/predictive validity, internal consistency reliability) is essential to justify the use of scores from

a measurement system such as the CLASS K–3.

The findings of this study have implications for research and practice. ECE and its contribution

to children’s cognitive, social, and emotional development have come to the forefront

of educational research and policy. Information obtained from an observation system such

as the CLASS K–3 could provide valuable and constructive feedback to practitioners and researchers

regarding effective teaching practices. The data from the CLASS observation systems are

already being utilized throughout the United States for the purpose of enhancing ECE. Thus,

it is critical that scores yielded from this scale demonstrate strong psychometric properties.

CFAs revealed some misfit in the structure of the scale, as demonstrated by the fit

indices for the original CLASS model. This finding has potential implications for the validity of

score interpretation, as the fit of the original model raises questions about the organizational

structure of the CLASS. Findings suggest that within the current sample of kindergarten

classrooms, the Behavior Management dimension may be a better reflection of Emotional

Support than Classroom Organization. Replication of this modification in nationally representative

samples of kindergarten classrooms is needed to further substantiate this finding. Nevertheless,

the strong relationship between Emotional Support and Behavior Management has potential

implications for teacher training in the early grades, as the implementation of social-emotional

training for students (e.g., teaching children to express feelings effectively) may be most

effective for managing and preventing problem behaviors in the primary grades.

Although the results of this study suggest that a modified three-factor model demonstrated

slightly better fit with data from a kindergarten classroom sample than a bifactor model, one

advantage of the bifactor model is that it yields uncorrelated factors, which is beneficial when

using scores from CLASS dimensions as statistical predictors of student or class outcomes.

Regardless, both the Hamre et al. (2013) and current results suggest that the CLASS factor

structure may need modification to accurately represent the quality of classroom interactions.

Future validity research regarding the extent to which the factor structure of the CLASS changes


depending on the grade level observed would be beneficial, as this information may better

elucidate the impact of students’ age/grade on classroom management practices.

Moreover, an analysis of the CLASS K–3 across samples drawn from various schools,

observers, socioeconomic status levels, and community types (e.g., urban, suburban, rural) will

provide further information regarding the nature of CLASS K–3 domains and dimensions across

varying subpopulations. Multilevel CFA is a recent approach that can be used to test nested

data wherein scores may be affected by common characteristics (e.g., teachers within the same

schools; Kline, 2013). Previous structural validity studies do not appear to have considered the

nested nature of CLASS data through multilevel CFA. Within the current study, multilevel CFA

could not be tested because many of the schools in the sample had only one participating teacher,

but future examinations of the CLASS structure may wish to consider this method. At present,

the current structure of the scale may be best used in conjunction with additional measures

hypothesized to provide information related to each CLASS domain, such as anecdotal observations,

student feedback, and academic data.

Although the structural validity of a scale is critical, it is also essential to consider the utility

of scores for guiding intervention. For example, previous research has linked the use of the

CLASS within the context of professional development to increases in student reading scores

(Hamre et al., 2010). In addition to continued psychometric studies, further examination of

the relationship between the use of CLASS scores to inform teacher professional development

and student achievement will provide valuable insight regarding the relationships among the

dimensions and domains and the overall utility of the scale.

As noted in the literature review, the dimensions and domains of the CLASS have substantial

theoretical support and reflect current research regarding effective practices in early childhood

settings. However, the elevated error in the structural fit of the CLASS model when it is used

in early childhood settings indicates that some aspects of the model may be less stable across

classrooms than others. Moreover, emerging evidence indicates that there may be a need for

some reconceptualization of the CLASS factors. Further inquiry into the structural validity of

CLASS scores with diverse populations and other primary grades to identify consistencies or

variations in structure based on these characteristics will help clarify the need (or not) for such

reconceptualization.

REFERENCES

Bentler, P. M., & Bonett, D. G. (1980). Significance tests and goodness of fit in the analysis of covariance structures.

Psychological Bulletin, 88, 588–606.

Bierman, K. L., Domitrovich, C. E., Nix, R. L., Gest, S. D., Welsh, J. A., Greenberg, M. T., . . . Gill, S. (2008).

Promoting academic and social-emotional school readiness: The Head Start REDI Program. Child Development,

79, 1802–1817. doi:10.1111/j.1467-8624.2008.01227

Browne, M. W., & Cudeck, R. (1993). Alternative ways of assessing model fit. In K. A. Bollen & J. S. Long (Eds.),

Testing structural equation models (pp. 136–162). Newbury Park, CA: Sage.

Chen, F. F., West, S. G., & Sousa, K. (2006). A comparison of bifactor and second-order models of quality of life.

Multivariate Behavioral Research, 41, 189–225.

Comrey, A. L., & Lee, H. B. (1992). A first course in factor analysis (2nd ed.). Hillsdale, NJ: Erlbaum.

Copple, C., & Bredekamp, S. (Eds.). (2009). Developmentally appropriate practice in early childhood programs: Serving children from birth through age 8 (3rd ed.). Washington, DC: National Association for the Education of

Young Children.

Domínguez, X., Vitiello, V. E., Maier, M. F., & Greenfield, D. B. (2010). A longitudinal examination of young

children’s learning behavior: Child-level and classroom-level predictors of change throughout the preschool year.

School Psychology Review, 39, 29–47.

Early Childhood Learning and Knowledge Center. (2008). Classroom Assessment Scoring System (CLASS). Retrieved

from http://www.acf.hhs.gov/programs/opre/research/topic/overview/head-start

Eron, L. D. (1990). Understanding aggression. Bulletin of the International Society for Research on Aggression, 12, 5–9.

Fabrigar, L. R., Wegener, D. T., MacCallum, R. C., & Strahan, E. J. (1999). Evaluating the use of exploratory factor

analysis in psychological research. Psychological Methods, 4, 272–299. doi:10.1037//1082-989X.4.3.272

Field, A. (2009). Discovering statistics using SPSS (3rd ed.). London, England: Sage.

Hamre, B. K., Goffin, S. G., & Kraft-Sayre, M. (2009). Classroom Assessment Scoring System implementation guide:

Measuring and improving classroom interactions in early childhood settings. Retrieved from the Center for

Advanced Study of Teaching and Learning website: http://curry.virginia.edu/research/centers/castl/publications

Hamre, B. K., Hatfield, B. E., Jamil, F., & Pianta, R. C. (2013). Evidence for general and domain specific elements of

teacher-child interactions: Associations with preschool children’s development. Child Development. Advance online

publication. doi:10.1111/cdev.12184

Hamre, B. K., Justice, L. M., Pianta, R. C., Kilday, C., Sweeney, B., Downer, J. T., & Leach, A. (2010). Implementation

fidelity of MyTeachingPartner literacy and language activities: Association with preschoolers’ language and literacy

growth. Early Childhood Research Quarterly, 25, 329–347. doi:10.1016/j.ecresq.2009.07.002

Hamre, B. K., & Pianta, R. C. (2007). Learning opportunities in preschool and early elementary classrooms. In R. C.

Pianta, M. J. Cox, & K. Snow (Eds.), School readiness and the transition to school (pp. 49–84). Baltimore, MD:

Brookes.

Hamre, B. K., Pianta, R. C., Mashburn, A. J., & Downer, J. T. (2007). Building a science of classrooms: Application of

the CLASS framework in over 4,000 early childhood and elementary classrooms. Retrieved from the Foundation for

Child Development website: http://fcd-us.org/resources/building-science-classrooms-application-class-framework-

over-4000-us-early-childhood-and-e?destination=resources%2Fsearch%3Ftopic%3D0%26authors%3DHamre%26

keywords%3D

Hart, B., & Risley, T. R. (2003). The early catastrophe. Education Review, 17, 110–118.

Howes, C., Burchinal, M., Pianta, R., Bryant, D., Early, D., Clifford, R., & Barbarin, O. (2008). Ready to learn?

Children’s pre-academic achievement in pre-kindergarten programs. Early Childhood Research Quarterly, 23,

27–50. doi:10.1016/j.ecresq.2007.05.002

Hu, L., & Bentler, P. M. (1999). Cut off criteria for fit indices in covariance structure analysis: Conventional criteria

versus new alternatives. Structural Equation Modeling, 6, 1–55. doi:10.1080/10705519909540118

Kline, R. B. (2005). Principles and practice of structural equation modeling. New York, NY: Guilford Press.

Kline, R. B. (2013). Exploratory and confirmatory factor analysis. In Y. Petscher, C. Schatschneider, & D. L. Compton

(Eds.), Applied quantitative analysis in education and the social sciences (pp. 202–203). New York, NY: Routledge.

La Paro, K. M., & Pianta, R. C. (2000). Predicting children’s competence in the early school years: A meta-analytic

review. Review of Educational Research, 70, 443–484. doi:10.1016/j.jsp.2006.01.003

La Paro, K. M., Pianta, R. C., & Stuhlman, M. (2004). The Classroom Assessment Scoring System: Findings from the

pre-kindergarten year. The Elementary School Journal, 104, 409–426. doi:10.1086/499760

Mosier, W. (2001). Developmentally appropriate child guidance: Helping children gain self-control. Retrieved from

http://www.childcarequarterly.com/spring09_story1a.html

National Center for Education Statistics. (2012). The condition of education (NCES Publication No. 2012045). Retrieved

from http://nces.ed.gov/programs/coe

Nunnally, J., & Bernstein, I. (1994). Psychometric theory (3rd ed.). New York, NY: McGraw-Hill.

Pakarinen, E., Lerkkanen, M., Poikkeus, A., Kiuru, N., Siekkinen, M., Rasku-Puttonen, H., & Nurmi, J. (2010).

A validation of the Classroom Assessment Scoring System in Finnish kindergartens. Early Education & Development,

21, 95–124. doi:10.1080/10409280902858764

Pianta, R. C. (1999). Enhancing relationships between children and teachers. Washington, DC: American Psychological

Association. doi:10.1037/10314-000

Pianta, R. C., La Paro, K. M., & Hamre, B. K. (2008). The Classroom Assessment Scoring System manual, K-3.

Baltimore, MD: Brookes.

Pianta, R. C., La Paro, K. M., Payne, C., Cox, M. J., & Bradley, R. (2002). The relationship of kindergarten classroom

environment to teacher, family, and school characteristics and child outcomes. Elementary School Journal, 102,

225–238. doi:10.1086/499701


Ponitz, C. C., Rimm-Kaufman, S. E., Brock, L. L., & Nathanson, L. (2009). Early adjustment, gender differences,

and classroom organizational climate in first grade. The Elementary School Journal, 110, 142–162.

Rimm-Kaufman, S. E., Curby, T. W., Grimm, K. J., Nathanson, L., & Brock, L. L. (2009). The contribution of children’s

self-regulation and classroom quality to children’s adaptive behaviors in the kindergarten classroom. Developmental

Psychology, 45, 958–972. doi:10.1037/a0015861

Rimm-Kaufman, S. E., Pianta, R. C., & Cox, M. J. (2000). Teachers’ judgments of problems in the transition to

kindergarten. Early Childhood Research Quarterly, 12, 363–385. doi:10.1016/S0885-2006(00)00049-1

Salvia, J., & Ysseldyke, J. E. (2007). Assessment in special and inclusive education (10th ed.). New York, NY:

Houghton Mifflin.

Sattler, J. M. (2001). Assessment of children: Cognitive applications (4th ed.). La Mesa, CA: Jerome Sattler.

Schermelleh-Engel, K., Moosbrugger, H., & Muller, H. (2003). Evaluating the fit of structural equation models: Tests of

significance and descriptive goodness-of-fit measures. Methods of Psychological Research Online, 8, 23–74.

Sugai, G., & Horner, R. R. (2006). A promising approach for expanding and sustaining school-wide positive behavior

support. School Psychology Review, 35, 245–259.

Tabachnick, B. G., & Fidell, L. S. (2007). Using multivariate statistics (5th ed.). Needham Heights, MA: Allyn & Bacon.

Tanaka, J. S. (1993). Multifaceted conceptions of fit in structural equation models. In K. A. Bollen & J. S. Long (Eds.),

Testing structural equation models (pp. 10–39). Newbury Park, CA: Sage.

Vernon-Feagans, L., Cox, M., & The Family Life Project Key Investigators. (2011). The Family Life Project:

An epidemiological and developmental study of young children living in poor rural communities [Monograph].

Retrieved from http://www.fpg.unc.edu/~flp/exec/!Final%20Monograph%20submitted%20to%20SRCD.pdf

Yates, G. C., & Yates, S. M. (1990). Teacher effectiveness research: Towards describing user-friendly classroom

instruction. Educational Psychology, 10, 225–238. doi:10.1080/0144341900100304

Zaslow, M., Martinez-Beck, I., Tout, K., & Halle, T. (2011). Quality measurement in early childhood settings. Baltimore, MD: Brookes.
