Measuring Quality in Kindergarten Classrooms: Structural Analysis of the Classroom Assessment
Scoring System (CLASS K–3)
Lia E. Sandilos
Department of Communication Sciences and Disorders, Temple University
James C. DiPerna
Department of Educational Psychology, Counseling, and Special Education, The Pennsylvania State University
The Family Life Project Key Investigators
The Pennsylvania State University and University of North Carolina at Chapel Hill
Research Findings: The purpose of the current study was to evaluate the structural validity of scores
on a measure of global classroom quality, the Classroom Assessment Scoring System (CLASS K–3;
Pianta, La Paro, & Hamre, 2008). Using observational data from a sample of 417 kindergarten
classrooms from the southern and mid-Atlantic regions of the United States, we used confirmatory
factor analysis to examine the structural validity of the CLASS K–3. Factor analytic findings
supported a 3-factor and 10-dimension structure for the CLASS K–3; however, some modifications
were made to the original CLASS model. Practice or Policy: Although the overall structure of
the CLASS has been generally consistent across validation studies, some facets of the model may
be less stable than others. Additional examination of alternative factor structures is needed to further
clarify the relationships among the CLASS dimensions and domains. Current psychometric evidence
provides support for continued use of the CLASS to guide intervention, instruction, and professional
development.
Children’s experiences in early childhood classrooms have been identified as an important
predictor of future academic and social-emotional functioning (La Paro & Pianta, 2000;
Zaslow, Martinez-Beck, Tout, & Halle, 2011). The enrollment of children in prekindergarten
programs nationwide increased steadily from 20% in 1970 to 53% in 2010, and nearly all
children (94%) now attend a part- or full-day kindergarten program (National Center for
Education Statistics, 2012). In response to the growing need for early childhood education
(ECE), standards emerged for ECE licensing and accreditation, and the federal government
Correspondence regarding this article should be addressed to Lia E. Sandilos, Temple University, 1701 N. 13th
Street, 352A Weiss Hall, Philadelphia, PA 19122. E-mail: [email protected]
Early Education and Development, 25: 894–914
Copyright © 2014 Taylor & Francis Group, LLC
ISSN: 1040-9289 print/1556-6935 online
DOI: 10.1080/10409289.2014.883588
invested funds in early education through large-scale programs (e.g., Head Start, the Child Care
and Development Fund; Zaslow et al., 2011). Simultaneously, educational researchers furthered
efforts to establish a substantive operational definition of high-quality early education by
examining theoretical models of education and creating instruments to measure the quality of
early childhood classroom experiences.
Developing instruments that can accurately evaluate characteristics of high-quality teaching
and document the contextual factors that may influence effective instruction could improve
the quality of education. Systematic observation is one method of assessment often used by
educators and educational researchers to assess the quality of a classroom environment. The
focus of the current study was to assess the factor structure of a widely used observation scale
developed to evaluate the quality of early classroom environments—the Classroom Assessment
Scoring System (CLASS K–3; Pianta, La Paro, & Hamre, 2008).
THEORETICAL FOUNDATION AND FRAMEWORK FOR THE CLASS K–3
The CLASS observation system (Pianta et al., 2008) was developed to provide a research-based
framework for assessing teacher–child interactions and resulting instructional quality in
prekindergarten and primary classroom environments (Hamre & Pianta, 2007). Since its
publication, the CLASS has been used extensively in evaluation and research in more than
3,000 early childhood classrooms (Hamre, Goffin, & Kraft-Sayre, 2009). As a part of the
Improving Head Start for School Readiness Act of 2007, the Office of Head Start selected
the CLASS as one of the primary observation scales piloted to assess the quality of Head Start
classrooms nationwide (Early Childhood Learning and Knowledge Center, 2008). Head Start
now utilizes CLASS scores to determine the accreditation of new prekindergarten centers
around the nation (Hamre, Hatfield, Jamil, & Pianta, 2013). The growing popularity of the
CLASS framework in research and practice has led to national and international studies of
the psychometric properties of the CLASS, as well as evaluation of the relationship between
CLASS scores and a variety of academic and behavioral outcomes (Pianta et al., 2008).
The purpose of the CLASS is to measure the quality of teachers’ interactions with their
students (La Paro, Pianta, & Stuhlman, 2004). Toddler, prekindergarten, elementary, and
secondary versions of the CLASS are available; however, the focus of the current study
is the K–3 version. The primary theoretical foundation for the CLASS framework is the
developmental systems model of early learning (Pianta, 1999), which considers children’s
interactions with their teacher and the classroom environment to be crucial for academic
success. CLASS factors also were developed through a review of research on high-quality
teaching and an extensive review of existing observation measures commonly used in
early childhood and elementary classrooms (La Paro et al., 2004). Within the published
CLASS framework, instructional quality is assessed in three primary domains: Emotional
Support, Classroom Organization, and Instructional Support (Pianta et al., 2008). These
domains are further divided into 10 dimensions. Emotional Support consists of Positive
Climate, Negative Climate, Teacher Sensitivity, and Regard for Student Perspectives.
Classroom Organization consists of Behavior Management, Productivity, and Instructional
Learning Formats. Instructional Support is composed of Concept Development, Quality of
Feedback, and Language Modeling.
Emotional Support
Research regarding effective didactic practices in early childhood has emphasized that the quality
of teacher–student relationships in early education has a significant influence on student learning
and future academic success (La Paro & Pianta, 2000; Pianta, La Paro, Payne, Cox, &
Bradley, 2002). Previous literature also indicates that children often enter school without
important social-emotional skills. Rimm-Kaufman, Pianta, and Cox (2000) found that 20% of
kindergarten teachers reported that approximately half of their students lack the social skills needed to
achieve early academic success. Fostering social-emotional support in ECE classrooms is
particularly crucial, as children with behavioral and emotional issues have been found to be less
receptive to intervention as early as age 8 (Eron, 1990).
Hamre, Pianta, Mashburn, and Downer (2007) cited attachment theory as guiding the
development of the Emotional Support domain because child–caregiver relationships are emphasized
within this domain. Specifically, this domain assesses the level of positive/negative
teacher–student and peer–peer interaction as well as the degree to which the teacher demonstrates
awareness/responsiveness to students’ academic and emotional needs and the teacher’s
emphasis on student interest and autonomy (Pianta et al., 2008).
Classroom Organization
Recent studies also have identified key aspects of classroom organization that positively impact
learning (Domínguez, Vitiello, Maier, & Greenfield, 2010; Rimm-Kaufman, Curby, Grimm,
Nathanson, & Brock, 2009). Specifically, effective classroom management strategies (e.g., clear
behavioral expectations and learning objectives, consistent routines, varied learning modalities)
have been linked to higher levels of self-regulatory and adaptive behaviors in kindergarten
students (Rimm-Kaufman et al., 2009) and improved reading skills in first grade (Ponitz,
Rimm-Kaufman, Brock, & Nathanson, 2009). Furthermore, proactive redirection of misbehavior
has long been considered more effective than reactive behavior management strategies (Sugai &
Horner, 2006; Yates & Yates, 1990).
The Classroom Organization domain of the CLASS draws from research on behavior
management and self-regulation (Hamre et al., 2007), as the use of behavioral reinforcement
strategies (rewarding/recognizing positive behaviors), routines, and methods to improve student
engagement yields higher scores on this construct. Within this domain, teachers are evaluated on
their ability to proactively manage behavior, effectively make use of learning time, and maintain
student attention and participation in instruction.
Instructional Support
High-quality instruction and feedback have a significant impact on the development of higher
order thinking skills (Bierman et al., 2008; Yates & Yates, 1990). In addition, frequent
conversation and exposure to literacy have a profound influence on children’s early language
development (Copple & Bredekamp, 2009). However, kindergartners enter school with wide variation in
their level of exposure to language and literacy (Hart & Risley, 2003). Thus, the CLASS K–3
Instructional Support domain emphasizes teachers’ use of techniques to promote analytical
thinking skills, provide feedback to strengthen skills, and facilitate language development.
Because the materials available in early education programs can vary widely, the Instructional
Support domain is distinctive in that it assesses what teachers do with what they have, and it
does not evaluate the quantity or physical quality of the curricular materials accessible in the
environment (Pianta et al., 2008). Behavioral, metacognitive, and constructivist learning theories
are incorporated into the Instructional Support domain through the evaluation of scaffolding,
modeling, rehearsal, and elaboration (Hamre et al., 2007).
STRUCTURAL VALIDATION OF THE CLASS
Several studies have been conducted to examine the structural validity of the CLASS. Hamre
et al. (2007), for example, tested one-, two-, and three-factor CLASS models using data
from a sample of 4,000 prekindergarten through fifth-grade U.S. classrooms. The sample of
classrooms was taken from several large-scale studies that occurred between 1998 and 2005.
The scales used during the observations were versions of the CLASS prekindergarten through
third-grade frameworks and a precursor to the CLASS (i.e., the Classroom Observation
System). Structural analyses indicated that the three-factor model (Emotional Support,
Classroom Organization, and Instructional Support) demonstrated a better overall fit in prekin-
dergarten through third-grade classrooms than the one- and two-factor models that were tested
(Hamre et al., 2007).
There were some notable limitations to this study, however. Most significantly, when the
published CLASS model (three factors and 10 dimensions) was tested with prekindergarten
classrooms, several fit indices suggested inadequate fit and the presence of error in the model
(Browne & Cudeck, 1993; Hu & Bentler, 1999; Schermelleh-Engel, Moosbrugger, & Müller,
2003). In addition, although the framework of three overarching domains (i.e., Emotional
Support, Classroom Organization, and Instructional Support) was consistently evaluated across
grades, there was variability in the dimensions within those domains because different versions
of the CLASS and Classroom Observation System frameworks were used during the
longitudinal data collection. For example, an earlier version of the CLASS, which excluded the Language
Modeling and Regard for Student Perspectives dimensions, was tested on the kindergarten
sample in this study (Hamre et al., 2007). As a result, direct comparison of the internal structure
of the CLASS across grade levels was not possible, warranting further research regarding the fit
of the CLASS K–3 model with classrooms in the primary grades.
The CLASS framework also has been examined internationally. A study conducted in
Finland examined the structural validity of the CLASS Pre-K using data from 49 Finnish
kindergarten classrooms (Pakarinen et al., 2010). Results of an initial confirmatory factor
analysis (CFA) indicated that the three-factor model hypothesized by Pianta et al. (2008) did not fit
the Finnish classroom data. In addition, Negative Climate displayed poor discriminant validity
within the Emotional Support domain. A final CFA was conducted with Negative Climate
removed from the model and with the residuals correlated between Behavior Management
and Productivity and between Concept Development and Quality of Feedback.
Although most of the indices demonstrated good fit in the final revised model, the root mean
square error of approximation (RMSEA) value was inflated above acceptable fit. Furthermore,
the three domain factors (Emotional Support, Classroom Organization, and Instructional
Support) demonstrated multicollinearity (>.90). Because the three domains correlated highly,
a one-factor model of global classroom quality was also tested. Results, though, indicated that
the one-factor model also did not fit the data. In addition, a significant limitation of this study
was the small number of classrooms in the sample (n = 49), as a small sample size limits
statistical power and can lead to unreliable results (Comrey & Lee, 1992; Kline, 2005). Thus,
both national and international examinations of the three-factor CLASS framework have
featured less than ideal methodological and structural outcomes.
To address the aforementioned limitations of the structural evidence accrued for the CLASS,
researchers recently have begun to explore alternative conceptual and structural models. For
example, Hamre et al. (2013) tested a bifactor model of the CLASS with a prekindergarten
sample. A bifactor model is composed of a general factor on which all indicators load and
uncorrelated specific factors on which only select indicators load (Chen, West, & Sousa, 2006; Hamre et al., 2013).
Hamre and colleagues (2013) found that the best fitting bifactor CLASS model consisted of
a general factor (Responsive Teaching) and two domain factors (Positive Management and
Routines, and Cognitive Facilitation). This bifactor CLASS model demonstrated a subtle
improvement in fit over the original three-factor model by bringing all indices within acceptable
ranges with the exception of RMSEA, which was slightly larger than the recommended
maximum threshold for mediocre fit. The bifactor model represents an entirely new conceptualization
of the CLASS framework, and Hamre et al. (2013) encouraged replication of the bifactor model.
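As a rough sketch of the bifactor structure described above, the measurement model for indicator i can be written as follows. The notation (x_i for an observed dimension score, G for the general factor, S_k for the specific factors, λ for loadings, ε_i for the residual) is assumed here for illustration and is not taken from Hamre et al. (2013).

```latex
x_i = \lambda_{iG}\, G + \sum_{k=1}^{K} \lambda_{ik}\, S_k + \varepsilon_i,
\qquad
\operatorname{Cov}(G, S_k) = 0, \quad
\operatorname{Cov}(S_k, S_l) = 0 \ (k \neq l)
```

In this sketch, the loading λ_{ik} is fixed to 0 whenever indicator i is not assigned to specific factor S_k, so each indicator reflects the general factor and, at most, one specific factor.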
RATIONALE
Classroom quality is a construct that has received increasing attention in ECE research, as
high-quality teaching can have a significant effect on student academic and behavioral outcomes
(Bierman et al., 2008; Howes et al., 2008). The CLASS K–3 is a widely used measure of
classroom quality that has the potential to provide valuable data regarding effective teaching
techniques and to facilitate professional development in school districts (Hamre et al., 2010).
However, because of some notable limitations (e.g., small sample size, suboptimal fit indices,
different versions of the CLASS evaluated simultaneously) of previous validity studies (Hamre
et al., 2007; Pakarinen et al., 2010), as well as the exploration of completely new structural
models for the CLASS (e.g., a bifactor model; Hamre et al., 2013), further evaluation of the structural
validity of the CLASS is necessary. The present study also provides an independent examination
of the psychometric properties of CLASS scores by authors who were not involved in the
development of the measure. Thus, the primary purpose of this study was to examine the internal
structure of the CLASS K–3.
METHOD
Participants
Participating teachers in the current study were part of a longitudinal observational study of the
cognitive, social, and emotional development of young children. The study has followed 1,292
children and their families since the participating children were born in 2004. Participants in the
sample reside in North Carolina and central Pennsylvania (Vernon-Feagans, Cox, & The Family
Life Project Key Investigators, 2011).
Data for the current study were drawn from a sample of 426 classrooms (North Carolina = 232, Pennsylvania = 194) within 190 different schools that enrolled all participating children
during their kindergarten year. The majority of the classrooms (94%) were located in public
elementary schools. Schools were recruited in the year prior to data collection, and monetary
incentives were provided to teachers and principals in the form of gift cards. Demographic
information on teachers was collected through a self-report questionnaire. Of the total sample,
412 teachers were women and 14 were men. Teachers ranged in age from 22 to 66 years
(M = 41.5, SD = 11.2), with an average of 9.4 years of teaching experience (range = less than
1 year to 38 years). Nearly half (44%) of the teachers reported having a bachelor’s degree,
and teachers’ median annual income (pretax) was $30,000–$40,000. In addition, 88% of
teachers were Caucasian, 98% spoke English as their first language, 92% were certified in
elementary education, and 82% reported having at least one classroom assistant.
Demographic data regarding the composition of students within classrooms were also
collected from the teacher questionnaire. The median number of students per classroom was
20 (female = 48%, male = 52%). The average student racial composition within each classroom
was as follows: 62% Caucasian, 24% African American, 11% Hispanic, 1% Asian/Pacific
Islander, and 2% other. Half of the teachers (51%) reported having at least one student in
their classroom who spoke a language other than English, with the most common language
spoken in the home being Spanish (14%).
Measures
CLASS K–3. The CLASS K–3 assesses the quality of the classroom environment through
three primary domains and 10 dimensions (Emotional Support domain = Positive Climate,
Negative Climate,¹ Teacher Sensitivity, and Regard for Student Perspectives; Classroom
Organization domain = Behavior Management, Productivity, and Instructional Learning Formats;
Instructional Support domain = Concept Development, Quality of Feedback, and Language
Modeling). Each dimension is rated on a 7-point scale ranging from low (1–2) to middle (3–5)
to high (6–7). CLASS dimension scores are calculated by averaging scores across cycles within an
observation. To calculate each domain score, one computes the mean for the dimensions that
fall within that domain. In this study, the dimension scores were used for the statistical analyses.
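The scoring rule described above can be sketched in a few lines. This is a minimal illustration, not the CLASS developers' scoring software: the function names and the example ratings are hypothetical, and the reversal formula for Negative Climate (8 minus the score on the 7-point scale) is an assumption about how the reverse-scoring noted in the article's footnote is carried out.

```python
from statistics import mean

# Dimension-to-domain mapping as listed in the Measures section.
DOMAINS = {
    "Emotional Support": ["Positive Climate", "Negative Climate",
                          "Teacher Sensitivity", "Regard for Student Perspectives"],
    "Classroom Organization": ["Behavior Management", "Productivity",
                               "Instructional Learning Formats"],
    "Instructional Support": ["Concept Development", "Quality of Feedback",
                              "Language Modeling"],
}


def dimension_scores(cycles):
    """Average each dimension's rating across the observation cycles."""
    return {dim: mean(cycle[dim] for cycle in cycles) for dim in cycles[0]}


def domain_scores(dim_scores, scale_max=7):
    """Average the dimension scores that fall within each domain.

    Assumption: Negative Climate is reversed as (scale_max + 1) - score
    before the Emotional Support mean is taken.
    """
    adjusted = dict(dim_scores)
    adjusted["Negative Climate"] = (scale_max + 1) - dim_scores["Negative Climate"]
    return {domain: mean(adjusted[d] for d in dims)
            for domain, dims in DOMAINS.items()}


# Hypothetical ratings from two cycles in one classroom.
cycle_1 = {"Positive Climate": 5, "Negative Climate": 1, "Teacher Sensitivity": 5,
           "Regard for Student Perspectives": 4, "Behavior Management": 6,
           "Productivity": 5, "Instructional Learning Formats": 4,
           "Concept Development": 2, "Quality of Feedback": 3, "Language Modeling": 3}
cycle_2 = {"Positive Climate": 6, "Negative Climate": 1, "Teacher Sensitivity": 4,
           "Regard for Student Perspectives": 4, "Behavior Management": 5,
           "Productivity": 6, "Instructional Learning Formats": 5,
           "Concept Development": 3, "Quality of Feedback": 4, "Language Modeling": 3}

dims = dimension_scores([cycle_1, cycle_2])
domains = domain_scores(dims)
print(dims)
print(domains)
```

Because only the dimension scores served as indicators in the analyses, the domain means in this sketch correspond to the descriptive domain scores rather than to the latent factors.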
Raters can observe a classroom for one to six cycles. A cycle consists of 20 min of
observation and 10 min of coding. In the current study, each classroom observation consisted of
two cycles. The 1-hr observation (i.e., two cycles) has been endorsed by the authors of the
CLASS, as studies have indicated that data from two cycles correlate highly with data from four
cycles (rs = .89–.95; Pianta et al., 2008). The authors of the CLASS (Pianta et al., 2008) reported
adequate interrater agreement (.78–.96) and internal consistency (.76–.90). In the current study,
interobserver agreement data were collected in the field approximately midway through the
data collection window, and all data collectors maintained the 80% interrater reliability criterion
achieved during the CLASS certification training.
¹ The averaged Negative Climate dimension is reverse-scored before the Emotional Support domain score is
computed.
Procedure
CLASS observations were completed in kindergarten classrooms during the fall semester
(October through December) of the academic year. Observers consisted of part-time graduate
assistants and full-time data collectors. Different data collection teams conducted CLASS
observations in each state, but all observers had to undergo the same CLASS training. Prior
to the start of data collection, observers at both sites were formally trained and certified to
use the CLASS K–3. To meet initial certification requirements, all observers attended a 2-day
formal training conducted by a certified CLASS trainer. After the training, the observers viewed
and coded videos of early elementary classrooms (developed by the authors of the CLASS) and
completed a real-time CLASS observation in a prekindergarten classroom. The trainees’ scores
on the videos were compared to criterion scores provided by CLASS developers, and scores
on the real-time observation were compared to those of the CLASS trainer, who observed
simultaneously. All trainees had to meet the interrater agreement accuracy standard specified
by the authors of the CLASS (percent-within-one agreement ≥ .80; Pianta et al., 2008)
before conducting observations for the study. In addition to being established by the CLASS
developers (i.e., Pianta et al., 2008), the 80% criterion is supported by other sources regarding
reliability of low-stakes assessments (e.g., Nunnally & Bernstein, 1994; Salvia & Ysseldyke,
2007; Sattler, 2001).
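The percent-within-one agreement standard described above can be illustrated with a short sketch. The function names and the example codes are hypothetical; the only details taken from the text are the within-one-point rule and the .80 criterion.

```python
def percent_within_one(trainee_codes, criterion_codes):
    """Proportion of ratings that fall within 1 scale point of the criterion."""
    if not trainee_codes or len(trainee_codes) != len(criterion_codes):
        raise ValueError("Both raters need the same, nonzero number of codes.")
    hits = sum(1 for t, c in zip(trainee_codes, criterion_codes) if abs(t - c) <= 1)
    return hits / len(trainee_codes)


def meets_standard(trainee_codes, criterion_codes, threshold=0.80):
    """True when agreement reaches the .80 criterion described in the text."""
    return percent_within_one(trainee_codes, criterion_codes) >= threshold


# Hypothetical codes for the 10 dimensions of one training video.
trainee = [5, 6, 3, 4, 2, 7, 5, 4, 3, 6]
criterion = [5, 5, 5, 4, 2, 6, 5, 2, 3, 6]

agreement = percent_within_one(trainee, criterion)
print(agreement, meets_standard(trainee, criterion))
```

In this example, two of the ten codes differ from the criterion by 2 points, so agreement is 8 of 10 and the trainee just meets the standard.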
The certified CLASS observers traveled to various classroom sites in North Carolina
and central Pennsylvania over a 16-week period. All observers used the CLASS K–3
observation scale to assess the kindergarten classroom environment and completed two 30-min cycles
in each classroom. One observer conducted the CLASS observation in each classroom, and
observations were conducted at the start of the school day during circle time and morning
academic activities.
Design and Data Analyses
CFA was used to evaluate the structural validity of the CLASS K–3. Specifically, AMOS 19.0
software was used to estimate each model. Tanaka (1993) recommended using multiple fit
indices to assess the overall fit of a model. Thus, a variety of model fit and model comparison
indices were examined in the present study. Model fit indices measure how well the proposed
model represents the data drawn from the current sample (Kline, 2005). The model fit indices
examined in this study consisted of the RMSEA, the goodness-of-fit index (GFI), and the
standardized root-mean-square residual (SRMR). Criteria for model fit were as follows: RMSEA < .05 = good fit, .05–.08 = adequate fit, .08–.10 = mediocre fit, > .10 = unacceptable (Browne &
Cudeck, 1993); GFI > .90 = good fit (Hu & Bentler, 1999); SRMR < .08 = good fit, .08–.10 = acceptable (Hu & Bentler, 1999; Schermelleh-Engel et al., 2003).
Model comparison indices, which represent improvement in a model after modifications have
been made and the likelihood that the model is replicable, also were examined (Kline, 2005).
The model comparison indices examined in this study were the comparative fit index (CFI),
Bentler–Bonett normed fit index (NFI), Tucker–Lewis index (TLI), Akaike information
criterion (AIC), and the consistent Akaike information criterion (CAIC). For the CFI, NFI,
and TLI, values ≥ .90 indicate acceptable fit (Bentler & Bonett, 1980; Hu & Bentler, 1999;
Kline, 2005), whereas values ≥ .95 indicate good fit (Schermelleh-Engel et al., 2003). For the AIC
and CAIC, lower values are preferred (Kline, 2005). Given that chi-square is particularly
sensitive to sample size (Kline, 2005; Schermelleh-Engel et al., 2003), we deemphasized this
statistic when evaluating the fit of each model.
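The cutoffs listed in this section can be collected into a small lookup, sketched below. The function names, the labels for values that miss every criterion ("poor", "below criterion"), and the handling of values that fall exactly on a boundary are assumptions made for illustration; the numeric thresholds are the ones stated in the text.

```python
def classify_rmsea(value):
    """RMSEA bands after Browne and Cudeck (1993), as stated in the text."""
    if value < 0.05:
        return "good"
    if value <= 0.08:
        return "adequate"
    if value <= 0.10:
        return "mediocre"
    return "unacceptable"


def classify_srmr(value):
    """SRMR: below .08 is good, .08 to .10 is acceptable."""
    if value < 0.08:
        return "good"
    if value <= 0.10:
        return "acceptable"
    return "poor"  # label assumed; the text gives no name for this band


def classify_gfi(value):
    """GFI: above .90 is good."""
    return "good" if value > 0.90 else "below criterion"


def classify_incremental(value):
    """CFI, NFI, and TLI: at least .90 is acceptable, at least .95 is good."""
    if value >= 0.95:
        return "good"
    if value >= 0.90:
        return "acceptable"
    return "below criterion"


# Illustrative values only.
print(classify_rmsea(0.097), classify_srmr(0.060),
      classify_gfi(0.936), classify_incremental(0.948))
```

Because several indices are examined together, a model is judged on the overall pattern rather than on any single label.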
RESULTS
The CLASS K–3 observations were used for structural analyses. Three of these cases (<1% of
the total sample) had one or more missing data points. Using the Mahalanobis distance test
(Tabachnick & Fidell, 2007), we identified nine multivariate outliers (p < .001). No systematic
patterns of missing data or outliers were evident among the identified cases; thus, the cases were
deleted listwise (Field, 2009). The final sample size (n = 417) was sufficient for the planned
analyses.
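The outlier screen described above can be sketched as follows. To keep the example self-contained, it uses only two variables, for which the covariance matrix can be inverted in closed form and the chi-square critical value at a given alpha equals -2 ln(alpha). The study itself screened the multivariate CLASS data, so this is a simplified illustration under that assumption, and the data and function names are hypothetical.

```python
import math


def mahalanobis_squared_2d(points):
    """Squared Mahalanobis distance of each (x, y) point from the centroid."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Sample variances and covariance (n - 1 denominator).
    s_xx = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    s_yy = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    det = s_xx * s_yy - s_xy ** 2
    distances = []
    for x, y in points:
        dx, dy = x - mean_x, y - mean_y
        # Closed form of d' S^-1 d for a 2 x 2 covariance matrix S.
        distances.append((s_yy * dx * dx - 2 * s_xy * dx * dy + s_xx * dy * dy) / det)
    return distances


def flag_outliers(points, alpha=0.001):
    """Indices of points whose squared distance exceeds the critical value."""
    # With 2 variables, the chi-square upper-tail critical value is -2 ln(alpha).
    critical = -2 * math.log(alpha)
    return [i for i, d2 in enumerate(mahalanobis_squared_2d(points)) if d2 > critical]


# Hypothetical data: a 5 x 5 grid of typical cases plus one extreme case.
grid = [(x, y) for x in range(-2, 3) for y in range(-2, 3)]
points = grid + [(50, 50)]
flagged = flag_outliers(points)
print(flagged)
```

With more than two variables the same logic applies, but the covariance matrix must be inverted numerically and the critical value taken from a chi-square distribution with degrees of freedom equal to the number of variables.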
Descriptive statistics for each CLASS K–3 dimension and domain are provided in Table 1.
Dimension scores aggregated across two cycles were used as items or indicators in the CFA.
Prior to conducting the CFA, we examined the data to determine whether the necessary
assumptions had been met. Normality was determined through a visual inspection of normal probability
plots and an examination of skewness and kurtosis values. Standardized indices were considered
highly skewed or kurtotic at > 2.0 and > 7.0, respectively (Fabrigar, Wegener, MacCallum, &
Strahan, 1999). Based on these criteria, only one CLASS dimension, Negative Climate,
demonstrated severe skewness and severe kurtosis (see Table 1). A log-linear transformation was used
to successfully normalize the Negative Climate data (transformed skewness = 2.4, transformed
kurtosis = 5.8). Linearity of the data, as examined via a visual inspection of scatter plots, was
met. The presence of multicollinearity or singularity was not a significant concern as only
TABLE 1
Means, Standard Deviations, Skewness, and Kurtosis for Classroom Assessment Scoring
System K–3 Dimension and Domain Scores
Dimension/Domain                  M      SD     Skewness   Kurtosis
Positive Climate                  5.35   1.03   −0.22      −0.46
Negative Climateᵃ                 1.16   3.43   3.43       13.97
Teacher Sensitivity               4.90   1.11   −0.09      −0.50
Regard for Student Perspectives   3.97   1.11   −0.13      −0.36
Behavior Management               5.41   0.98   −0.52      0.19
Productivity                      5.42   0.96   −0.46      0.10
Instructional Learning Formats    4.64   0.96   −0.16      −0.28
Concept Development               2.60   1.07   0.32       −0.82
Quality of Feedback               3.39   1.08   0.12       −0.71
Language Modeling                 3.09   0.98   0.21       −0.33
Emotional Support                 5.26   0.76   −0.33      −0.07
Classroom Organization            5.16   0.80   −0.57      0.31
Instructional Support             3.03   0.92   0.13       −0.59
ᵃ Negative Climate was log transformed.
TABLE 2
Correlations for Classroom Assessment Scoring System K–3 Domains and Dimensions for Factor Analytic Sample
Variable                            1       2       3       4       5       6       7       8       9       10      11      12      13
1. Positive Climate                 —       −.43**  .74**   .58**   .57**   .46**   .57**   .18**   .31**   .31**   .88**   .65**   .30**
2. Negative Climate                         —       −.37**  −.27**  −.47**  −.28**  −.24**  −.03    −.14**  −.09    −.53**  −.40**  −.10*
3. Teacher Sensitivity                              —       .61**   .51**   .42**   .64**   .19**   .37**   .33**   .89**   .63**   .34**
4. Regard for Student Perspectives                          —       .24**   .19**   .56**   .36**   .36**   .33**   .82**   .40**   .40**
5. Behavior Management                                              —       .63**   .39**   .07     .23**   .17**   .54**   .82**   .18**
6. Productivity                                                             —       .56**   .07     .27**   .26**   .42**   .88**   .22**
7. Instructional Learning Formats                                                   —       .37**   .52**   .46**   .67**   .79**   .51**
8. Concept Development                                                                      —       .67**   .62**   .27**   .21**   .87**
9. Quality of Feedback                                                                              —       .70**   .39**   .41**   .90**
10. Language Modeling                                                                                       —       .36**   .36**   .87**
11. Emotional Support                                                                                               —       .65**   .38**
12. Classroom Organization                                                                                                  —       .37**
13. Instructional Support                                                                                                           —
*p < .05. **p < .01.
one correlation, between the Quality of Feedback dimension and the corresponding Instructional
Support domain, had a value of .90 (see Table 2).
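The normality screen described in this paragraph can be sketched as follows. The moment-based formulas, the use of excess kurtosis, and the use of absolute values against the 2.0 and 7.0 cutoffs are assumptions for illustration; the article does not state which estimators the software produced. The data are hypothetical.

```python
import math


def _central_moment(values, order):
    """Average of (value - mean) raised to the given power."""
    center = sum(values) / len(values)
    return sum((v - center) ** order for v in values) / len(values)


def skewness(values):
    """Moment-based skewness: m3 / m2^(3/2)."""
    return _central_moment(values, 3) / _central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2^2 - 3."""
    return _central_moment(values, 4) / _central_moment(values, 2) ** 2 - 3


def is_severely_nonnormal(values, skew_cut=2.0, kurt_cut=7.0):
    """Apply the cutoffs cited in the text (2.0 for skewness, 7.0 for kurtosis)."""
    return abs(skewness(values)) > skew_cut or abs(excess_kurtosis(values)) > kurt_cut


# Hypothetical floor-heavy ratings: nearly every classroom at the scale minimum.
floor_heavy = [1] * 19 + [7]
print(is_severely_nonnormal(floor_heavy))

# Hypothetical right-skewed scores whose natural log is evenly spaced.
raw = [1, 2, 4, 8, 16]
logged = [math.log(v) for v in raw]
print(skewness(raw), skewness(logged))
```

The second example shows why a log transformation is a common remedy for positive skew: it compresses the upper tail, so scores that grow multiplicatively become evenly spaced.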
CFAs (maximum likelihood extraction) were conducted for six models. The first model
tested was identical to the factor structure tested by Hamre et al. (2007) with the CLASS
standardization sample and reported in the published version (Pianta et al., 2008) of the CLASS
manual. The second model was the revised CLASS model, which included five modifications
to the published structure. Then, based on the abbreviated model proposed by Pakarinen et al.
(2010), a model excluding the Negative Climate dimension was tested. Next, a model with
10 dimensions loading on one global domain, a model with 10 dimensions loading on two
domains (Emotional Support and Instructional Support), and a bifactor model based on the work
of Hamre et al. (2013) were tested.
The original CLASS K–3 framework (Hamre et al., 2007; Pianta et al., 2008) was tested with
the current sample, and results indicated that the model did not fit the data well (see Table 3;
Figure 1). Based on modification indices, five changes were made to improve the model. First,
the residuals of Productivity and Behavior Management were correlated, as these dimensions are
conceptually related (Pianta et al., 2008), and this modification was consistent with previous
structural validity research (Pakarinen et al., 2010). Second, the residuals of Behavior
Management and Negative Climate were correlated. Although these dimensions were part of different
domains, Classroom Organization and Emotional Support were highly related factors, and
modification indices revealed that Behavior Management shared a significant amount of
residual error with the Emotional Support indicators. Third, the residuals of Regard for Student
Perspectives and Concept Development were correlated, as these indicators demonstrated
a moderate correlation in the present study and were found to be highly correlated in previous
research (Pianta et al., 2008). Fourth, a direct pathway was inserted from Emotional Support
to Behavior Management based on the moderate to strong relationship between the four
Emotional Support indicators and Behavior Management. Fifth, the pathway from Classroom
Organization to Behavior Management was removed, as the weight of this pathway was
negligible after we incorporated the other modifications (see Figure 2). In the revised CLASS
model (see Figure 2), the GFI, SRMR, CFI, NFI, and TLI all fell within the range of acceptable
TABLE 3
Fit Indices for the Structural Models
Model                                      χ²      df   SRMR   RMSEA  GFI    CFI    NFI    TLI    AIC     CAIC
Original CLASS K–3                         359.9   32   .087   .157   .841   .851   .839   .790   405.9   521.6
Revised CLASS K–3                          142.3   29   .060   .097   .936   .948   .936   .920   194.3   325.2
Revised Pakarinen model                    146.4   20   .065   .123   .927   .939   .930   .889   196.4   322.2
One factor                                 839.6   35   .140   .235   .684   .633   .625   .528   879.6   980.3
Two factors (emotional and instructional)  412.2   34   .091   .164   .835   .828   .816   .772   454.2   559.5
Bifactor (Hamre et al., 2013)              165.2   28   .066   .136   .896   .901   .891   .840   298.5   434.4
Note. All chi-square values are significant at p < .01. SRMR = standardized root-mean-square residual; RMSEA = root mean square error of approximation; GFI = goodness-of-fit index; CFI = comparative fit index; NFI = Bentler–Bonett normed fit index; TLI = Tucker–Lewis index; AIC = Akaike information criterion; CAIC = consistent Akaike information criterion; CLASS K–3 = Classroom Assessment Scoring System.
to good fit, and the RMSEA fell just within the .10 threshold for mediocre fit (Browne &
Cudeck, 1993). The AIC and CAIC produced the lowest values after all five modifications were
incorporated into the original CLASS model (see Table 3).
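To make the relationship among these indices concrete, the sketch below recomputes RMSEA, AIC, and CAIC from a model's chi-square, its degrees of freedom, and the sample size. The formulas are the ones commonly reported by structural equation modeling software and are assumed here, because the article does not state them; the count of free parameters is inferred as the number of distinct variances and covariances among the 10 indicators minus the degrees of freedom.

```python
import math


def rmsea(chi_square, df, n):
    """Root mean square error of approximation."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


def free_parameters(n_indicators, df):
    """Distinct variances and covariances minus the model degrees of freedom."""
    return n_indicators * (n_indicators + 1) // 2 - df


def aic(chi_square, q):
    """Akaike information criterion in chi-square form, q free parameters."""
    return chi_square + 2 * q


def caic(chi_square, q, n):
    """Consistent AIC, which adds a sample-size penalty per parameter."""
    return chi_square + q * (math.log(n) + 1)


N = 417  # final sample size reported in the Results
for label, chi_square, df in [("Original CLASS K-3", 359.9, 32),
                              ("Revised CLASS K-3", 142.3, 29)]:
    q = free_parameters(10, df)
    print(label, round(rmsea(chi_square, df, N), 3),
          round(aic(chi_square, q), 1), round(caic(chi_square, q, N), 1))
```

Under these assumptions, the sketch closely reproduces the RMSEA, AIC, and CAIC values reported for the original and revised models, which illustrates how the three modifications that each freed a parameter (three fewer degrees of freedom) were outweighed by the drop in chi-square.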
FIGURE 1 Classroom Assessment Scoring System K–3 original model (Hamre et al., 2007; Pianta et al., 2008).
Another model tested in this study was based on the results of the Pakarinen et al. (2010)
study (see Figure 3). First, Negative Climate was removed from the original CLASS K–3 model;
however, the resulting model did not demonstrate good fit with the data. Second, the residual
errors of Behavior Management and Productivity were correlated, which created a slight
improvement in fit. However, the third modification recommended by Pakarinen et al. (i.e.,
FIGURE 2 Revised Classroom Assessment Scoring System K–3 model.
correlated residuals of Quality of Feedback and Concept Development) resulted in a covariance
matrix that was not positive definite, so the exact Pakarinen model could not be tested as part of
this study. Instead, three additional revisions were made to the model based on modification
FIGURE 3 Revised Pakarinen et al. (2010) Classroom Assessment Scoring System K–3 model.
FIGURE 4 One-factor model of the Classroom Assessment Scoring System K–3.
indices: (a) correlating residuals of Regard for Student Perspectives and Concept Development,
(b) correlating residuals of Positive Climate and Behavior Management, and (c) setting a direct
pathway from Emotional Support to Productivity. In the revised Pakarinen model, the GFI,
SRMR, CFI, and NFI met criteria for acceptable to good fit. In addition, the AIC and CAIC
produced the lowest values after all modifications were made to the Pakarinen model. The
TLI was just below the criterion threshold for acceptable fit, and the RMSEA was inflated
beyond the mediocre threshold (see Table 3).
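The non-positive-definite covariance matrix reported for the exact Pakarinen model can be illustrated with a small check. The article does not say which matrix was affected or how the software detected the problem, so this is only a sketch under assumptions: a symmetric matrix is treated as positive definite when a Cholesky factorization succeeds, and the two matrices and the function name `is_positive_definite` are hypothetical. A typical symptom in practice is an estimated residual covariance that implies a correlation outside the range −1 to 1, or a negative variance.

```python
import math


def is_positive_definite(matrix: list[list[float]], tol: float = 1e-12) -> bool:
    """Return True if a symmetric matrix admits a Cholesky factorization."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - partial
                if pivot <= tol:
                    return False  # zero or negative pivot: not positive definite
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return True


# Hypothetical 2 x 2 covariance matrices for two residuals with unit variances.
admissible = [[1.0, 0.6], [0.6, 1.0]]    # implied correlation of .6
inadmissible = [[1.0, 1.2], [1.2, 1.0]]  # implied correlation of 1.2, impossible

print(is_positive_definite(admissible))
print(is_positive_definite(inadmissible))
```

The second matrix fails because its off-diagonal element exceeds what the two variances allow, which is one way an added correlated residual can render a solution inadmissible.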
FIGURE 5 Two-factor model of the Classroom Assessment Scoring System K–3.
Three additional alternative factor structures (i.e., one-factor, two-factor, and bifactor models)
were tested. First, given the moderate to high correlations observed between the latent factors
and dimensions in this study and in previous research (e.g., Hamre et al., 2007; Pakarinen
et al., 2010), a one-factor model was tested with the CLASS data (see Figure 4). Results of this
CFA, however, did not provide strong support for such a model (see Table 3).
FIGURE 6 Bifactor Classroom Assessment Scoring System model (Hamre et al., 2013).
Second, a two-factor model was tested with Emotional Support and Classroom Organization
combined into a single factor while the Instructional Support domain remained intact as the
second factor (see Figure 5). Results indicated that the two-factor model improved upon the
one-factor model, but it still demonstrated worse fit than the original three-factor CLASS model
and the Pakarinen model.
Finally, the Hamre et al. (2013) bifactor model was tested (see Figure 6). The overall fit of the
bifactor model was an improvement upon the original three-factor CLASS model, but the model
demonstrated a worse fit than the revised CLASS K–3 model (see Table 3).
Overall, the revised CLASS model (see Figure 2), which had several minor modifications
(i.e., correlating residuals) and one substantive modification (i.e., moving Behavior Management
from an indicator of Classroom Organization to an indicator of Emotional Support), was
determined to be the best fitting model with the current sample.
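The AIC and CAIC comparisons used above to rank models can be made concrete with a short sketch. It assumes the chi-square-based definitions that SEM programs commonly report, AIC = chi-square + 2q and CAIC = chi-square + (1 + ln N)q, where q is the number of free parameters; the article does not state which definitions its software applied, and the function names are hypothetical. Under these definitions the chi-square cancels in CAIC − AIC, so q can be recovered from the two tabled values and the sample size alone.

```python
import math


def aic(chi_square: float, free_params: int) -> float:
    """Chi-square-based AIC: chi-square plus 2 per free parameter."""
    return chi_square + 2 * free_params


def caic(chi_square: float, free_params: int, n: int) -> float:
    """Chi-square-based CAIC: penalty of (1 + ln n) per free parameter."""
    return chi_square + (1 + math.log(n)) * free_params


def implied_free_params(aic_value: float, caic_value: float, n: int) -> float:
    """Back out the parameter count; the chi-square cancels in CAIC - AIC."""
    return (caic_value - aic_value) / (math.log(n) - 1)


# Bifactor row of Table 3: AIC = 298.5, CAIC = 434.4; N = 417 classrooms.
q = implied_free_params(298.5, 434.4, 417)
print(round(q, 1))
```

If these assumed definitions hold, the bifactor row implies roughly 27 free parameters. Together with the 28 that appears to be that model's degrees of freedom, this would account for the 55 unique variances and covariances among the 10 dimensions. Because CAIC penalizes each parameter more heavily than AIC, the two indices can disagree when models differ greatly in complexity.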
DISCUSSION
This study tested the original structure of the CLASS K–3 with a sample of classrooms in North
Carolina and Pennsylvania. Fit indices for the original CLASS model did not meet either the
recommended (Hu & Bentler, 1999; Kline, 2005) or the less stringent (Browne & Cudeck,
1993; Schermelleh-Engel et al., 2003) criterion thresholds in the current sample. This finding is
somewhat consistent with previous CLASS structural validity research (e.g., Hamre et al., 2007,
2013; Pakarinen et al., 2010), and the limited fit suggests a pattern of error emerging across studies.
In the current study, several minor modifications (i.e., correlating residuals) were made to the
original three-factor CLASS model. Although the correlation of residuals was considered to be
a relatively minor modification, it still presents a concern for the CLASS model, as the correlations
reveal associations among the dimensions that are not being explained by the three domain factors.
These associations could result from characteristics captured by the indicators but not
explained by the current factors, such as differing student needs and interactions, teacher
social-emotional functioning, and observer characteristics.
The substantive modification (i.e., the placement of a direct pathway between Emotional
Support and Behavior Management) resulted in a revised CLASS model that demonstrated
reasonable fit with the data. Across the current and previous (e.g., Hamre et al., 2013) studies,
a relationship among the emotional and classroom management indicators appears to be
emerging, with Behavior Management in particular demonstrating a strong presence within these
domains. The placement of Behavior Management in the Emotional Support domain in the
revised CLASS K–3 model is also consistent with Hamre and colleagues’ (2013) bifactor model,
in which the dimensions of Emotional Support and Classroom Organization are combined to
form one factor (i.e., Positive Management and Routines). In addition, Behavior Management
demonstrated the strongest factor loading on the Positive Management and Routines domain
within the bifactor model in both the current study and the study by Hamre et al. (2013).
One potential explanation for these findings is that the Behavior Management dimension
yields higher scores when teachers utilize positive strategies that enable students to self-regulate,
assist students in understanding the feelings of others (e.g., perspective taking), and use
subtle cues to redirect behavior as opposed to overt disciplinary actions (Pianta et al., 2008).
The emotionally supportive strategies encompassed within the operationalization of Behavior
Management may be contributing to its relationship with the Emotional Support domain in the
current study. Mosier (2001) recommended that developmentally appropriate classroom management
techniques foster understanding of social consciousness and prosocial behavior (e.g.,
perspective taking, sharing). The implementation of such behavior management techniques
requires a degree of emotional sensitivity on the part of the teacher.
The aforementioned results must be considered within the context of several limitations. First,
the sample used in this study was not drawn from a nationally representative population of
kindergarten teachers, which limits the generalizability of the results. Second, the sample only
included kindergarten classrooms, although the CLASS K–3 can be used in kindergarten
through third-grade classrooms. Third, to improve model fit, modifications were made to the
original 10-dimension, three-domain CLASS structure, and fit indices were interpreted using
less stringent criteria (Browne & Cudeck, 1993; Schermelleh-Engel et al., 2003) than some
(Hu & Bentler, 1999; Kline, 2005) have recommended for testing structural equation models.
Fourth, all of the alternative models presented within this article add complexity to the scoring
process beyond the simple hand calculations required for the published version of the CLASS.
Finally, the current study focused solely on a select form of validity evidence for CLASS K–3
scores in kindergarten classrooms. Although structural validity is important to consider, the
examination of additional types of score validity and reliability evidence (e.g., concurrent/predictive
validity, internal consistency reliability) is essential to justify the use of scores from
a measurement system such as the CLASS K–3.
The findings of this study have implications for research and practice. ECE and its contribution
to children’s cognitive, social, and emotional development have come to the forefront
of educational research and policy. Information obtained from an observation system such
as the CLASS K–3 could provide valuable and constructive feedback to practitioners and researchers
regarding effective teaching practices. The data from the CLASS observation systems are
already being utilized throughout the United States for the purpose of enhancing ECE. Thus,
it is critical that scores yielded from this scale demonstrate strong psychometric properties.
CFAs revealed the presence of some error in the structure of the scale, as demonstrated by fit
indices in the original CLASS model. This finding has potential implications for the validity of
score interpretation, as the fit of the original model raises questions about the organizational
structure of the CLASS. Findings suggest that within the current sample of kindergarten
classrooms, the Behavior Management dimension may be a better reflection of Emotional
Support than Classroom Organization. Replication of this modification in nationally representative
samples of kindergarten classrooms is needed to further substantiate this finding. Nevertheless,
the strong relationship between Emotional Support and Behavior Management has potential
implications for teacher training in the early grades, as the implementation of social-emotional
training for students (e.g., teaching children to express feelings effectively) may be most
effective for managing and preventing problem behaviors in the primary grades.
Although the results of this study suggest that a modified three-factor model demonstrated
slightly better fit with data from a kindergarten classroom sample than a bifactor model, one
advantage of the bifactor model is that it yields uncorrelated factors, which is beneficial when
using scores from CLASS dimensions as statistical predictors of student or class outcomes.
Regardless, both the Hamre et al. (2013) and current results suggest that the CLASS factor
structure may need modification to accurately represent the quality of classroom interactions.
Future validity research regarding the extent to which the factor structure of the CLASS changes
depending on the grade level observed would be beneficial, as this information may better
elucidate the impact of students’ age/grade on classroom management practices.
Moreover, an analysis of the CLASS K–3 across samples drawn from various schools,
observers, socioeconomic status levels, and community types (e.g., urban, suburban, rural) will
provide further information regarding the nature of CLASS K–3 domains and dimensions across
varying subpopulations. Multilevel CFA is a recent approach that can be used to model nested
data wherein scores may be affected by common characteristics (e.g., teachers within the same
schools; Kline, 2013). Previous structural validity studies do not appear to have considered the
nested nature of CLASS data through multilevel CFA. Within the current study, multilevel CFA
could not be tested because many of the schools in the sample had only one participating teacher,
but future examinations of the CLASS structure may wish to consider this method. At present,
the current structure of the scale may be best used in conjunction with additional measures
hypothesized to provide information related to each CLASS domain, such as anecdotal observations,
student feedback, and academic data.
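The obstacle to multilevel CFA noted above, many schools contributing a single teacher, can be screened for before modeling. The sketch below is a hypothetical illustration: the function name, the school identifiers, and the summary statistics are assumptions, not part of the study's procedure. When most clusters contain only one unit, variation within schools cannot be separated from variation between schools, which is why such a summary is informative.

```python
from collections import Counter


def cluster_size_summary(school_ids: list[str]) -> dict[str, float]:
    """Summarize how many teachers (classrooms) were observed per school."""
    if not school_ids:
        raise ValueError("at least one teacher record is required")
    sizes = Counter(school_ids)  # school -> number of participating teachers
    n_schools = len(sizes)
    singletons = sum(1 for count in sizes.values() if count == 1)
    return {
        "schools": n_schools,
        "mean_teachers_per_school": len(school_ids) / n_schools,
        "share_single_teacher_schools": singletons / n_schools,
    }


# Hypothetical school identifiers, one entry per participating teacher.
teachers = ["A", "B", "B", "C", "D", "E", "E", "E", "F"]
print(cluster_size_summary(teachers))
```

A high share of single-teacher schools, as in this made-up example, would point toward a single-level analysis of the kind reported here.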
Although the structural validity of a scale is critical, it is also essential to consider the utility
of scores for guiding intervention. For example, previous research has linked the use of the
CLASS within the context of professional development to increases in student reading scores
(Hamre et al., 2010). In addition to continued psychometric studies, further examination of
the relationship between the use of CLASS scores to inform teacher professional development
and student achievement will provide valuable insight regarding the relationships among the
dimensions and domains and the overall utility of the scale.
As noted in the literature review, the dimensions and domains of the CLASS have substantial
theoretical support and reflect current research regarding effective practices in early childhood
settings. However, the elevated error in the structural fit of the CLASS model when it is used
in early childhood settings indicates that some aspects of the model may be less stable across
classrooms than others. Moreover, emerging evidence indicates that there may be a need for
some reconceptualization of the CLASS factors. Further inquiry into the structural validity of
CLASS scores with diverse populations and other primary grades to identify consistencies or
variations in structure based on these characteristics will help clarify the need (or not) for such
reconceptualization.
REFERENCES
Bentler, P. M., & Bonett, D. G. (1980). Significance tests and goodness of fit in the analysis of covariance structures.
Psychological Bulletin, 88, 588–606.
Bierman, K. L., Domitrovich, C. E., Nix, R. L., Gest, S. D., Welsh, J. A., Greenberg, M. T., . . . Gill, S. (2008).
Promoting academic and social-emotional school readiness: The Head Start REDI Program. Child Development,
79, 1802–1817. doi:10.1111/j.1467-8624.2008.01227
Browne, M. W., & Cudeck, R. (1993). Alternative ways of assessing model fit. In K. A. Bollen & J. S. Long (Eds.),
Testing structural equation models (pp. 136–162). Newbury Park, CA: Sage.
Chen, F. F., West, S. G., & Sousa, K. (2006). A comparison of bifactor and second-order models of quality of life.
Multivariate Behavioral Research, 41, 189–225.
Comrey, A. L., & Lee, H. B. (1992). A first course in factor analysis (2nd ed.). Hillsdale, NJ: Erlbaum.
Copple, C., & Bredekamp, S. (Eds.). (2009). Developmentally appropriate practice in early childhood programs:
Serving children from birth through age 8 (3rd ed.). Washington, DC: National Association for the Education of
Young Children.
Domínguez, X., Vitiello, V. E., Maier, M. F., & Greenfield, D. B. (2010). A longitudinal examination of young
children’s learning behavior: Child-level and classroom-level predictors of change throughout the preschool year.
School Psychology Review, 39, 29–47.
Early Childhood Learning and Knowledge Center. (2008). Classroom Assessment Scoring System (CLASS). Retrieved
from http://www.acf.hhs.gov/programs/opre/research/topic/overview/head-start
Eron, L. D. (1990). Understanding aggression. Bulletin of the International Society for Research on Aggression, 12, 5–9.
Fabrigar, L. R., Wegener, D. T., MacCallum, R. C., & Strahan, E. J. (1999). Evaluating the use of exploratory factor
analysis in psychological research. Psychological Methods, 4, 272–299. doi:10.1037//1082-989X.4.3.272
Field, A. (2009). Discovering statistics using SPSS (3rd ed.). London, England: Sage.
Hamre, B. K., Goffin, S. G., & Kraft-Sayre, M. (2009). Classroom Assessment Scoring System implementation guide:
Measuring and improving classroom interactions in early childhood settings. Retrieved from the Center for
Advanced Study of Teaching and Learning website: http://curry.virginia.edu/research/centers/castl/publications
Hamre, B. K., Hatfield, B. E., Jamil, F., & Pianta, R. C. (2013). Evidence for general and domain specific elements of
teacher-child interactions: Associations with preschool children’s development. Child Development. Advance online
publication. doi:10.1111/cdev.12184
Hamre, B. K., Justice, L. M., Pianta, R. C., Kilday, C., Sweeney, B., Downer, J. T., & Leach, A. (2010). Implementation
fidelity of MyTeachingPartner literacy and language activities: Association with preschoolers’ language and literacy
growth. Early Childhood Research Quarterly, 25, 329–347. doi:10.1016/j.ecresq.2009.07.002
Hamre, B. K., & Pianta, R. C. (2007). Learning opportunities in preschool and early elementary classrooms. In R. C.
Pianta, M. J. Cox, & K. Snow (Eds.), School readiness and the transition to school (pp. 49–84). Baltimore, MD:
Brookes.
Hamre, B. K., Pianta, R. C., Mashburn, A. J., & Downer, J. T. (2007). Building a science of classrooms: Application of
the CLASS framework in over 4,000 early childhood and elementary classrooms. Retrieved from the Foundation for
Child Development website: http://fcd-us.org/resources/building-science-classrooms-application-class-framework-
over-4000-us-early-childhood-and-e?destination=resources%2Fsearch%3Ftopic%3D0%26authors%3DHamre%26
keywords%3D
Hart, B., & Risley, T. R. (2003). The early catastrophe. Education Review, 17, 110–118.
Howes, C., Burchinal, M., Pianta, R., Bryant, D., Early, D., Clifford, R., & Barbarin, O. (2008). Ready to learn?
Children’s pre-academic achievement in pre-kindergarten programs. Early Childhood Research Quarterly, 23,
27–50. doi:10.1016/j.ecresq.2007.05.002
Hu, L., & Bentler, P. M. (1999). Cut off criteria for fit indices in covariance structure analysis: Conventional criteria
versus new alternatives. Structural Equation Modeling, 6, 1–55. doi:10.1080/10705519909540118
Kline, R. B. (2005). Principles and practice of structural equation modeling. New York, NY: Guilford Press.
Kline, R. B. (2013). Exploratory and confirmatory factor analysis. In Y. Petscher, C. Schatschneider, & D. L. Compton
(Eds.), Applied quantitative analysis in education and the social sciences (pp. 202–203). New York, NY: Routledge.
La Paro, K. M., & Pianta, R. C. (2000). Predicting children’s competence in the early school years: A meta-analytic
review. Review of Educational Research, 70, 443–484. doi:10.1016/j.jsp.2006.01.003
La Paro, K. M., Pianta, R. C., & Stuhlman, M. (2004). The Classroom Assessment Scoring System: Findings from the
pre-kindergarten year. The Elementary School Journal, 104, 409–426. doi:10.1086/499760
Mosier, W. (2001). Developmentally appropriate child guidance: Helping children gain self-control. Retrieved from
http://www.childcarequarterly.com/spring09_story1a.html
National Center for Education Statistics. (2012). The condition of education (NCES Publication No. 2012045). Retrieved
from http://nces.ed.gov/programs/coe
Nunnally, J., & Bernstein, I. (1994). Psychometric theory (3rd ed.). New York, NY: McGraw-Hill.
Pakarinen, E., Lerkkanen, M., Poikkeus, A., Kiuru, N., Siekkinen, M., Rasku-Puttonen, H., & Nurmi, J. (2010).
A validation of the Classroom Assessment Scoring System in Finnish kindergartens. Early Education & Development,
21, 95–124. doi:10.1080/10409280902858764
Pianta, R. C. (1999). Enhancing relationships between children and teachers. Washington, DC: American Psychological
Association. doi:10.1037/10314-000
Pianta, R. C., La Paro, K. M., & Hamre, B. K. (2008). The Classroom Assessment Scoring System manual, K-3.
Baltimore, MD: Brookes.
Pianta, R. C., La Paro, K. M., Payne, C., Cox, M. J., & Bradley, R. (2002). The relationship of kindergarten classroom
environment to teacher, family, and school characteristics and child outcomes. Elementary School Journal, 102,
225–238. doi:10.1086/499701
Ponitz, C. C., Rimm-Kaufman, S. E., Brock, L. L., & Nathanson, L. (2009). Early adjustment, gender differences,
and classroom organizational climate in first grade. The Elementary School Journal, 110, 142–162.
Rimm-Kaufman, S. E., Curby, T. W., Grimm, K. J., Nathanson, L., & Brock, L. L. (2009). The contribution of children’s
self-regulation and classroom quality to children’s adaptive behaviors in the kindergarten classroom. Developmental
Psychology, 45, 958–972. doi:10.1037/a0015861
Rimm-Kaufman, S. E., Pianta, R. C., & Cox, M. J. (2000). Teachers’ judgments of problems in the transition to
kindergarten. Early Childhood Research Quarterly, 12, 363–385. doi:10.1016/S0885-2006(00)00049-1
Salvia, J., & Ysseldyke, J. E. (2007). Assessment in special and inclusive education (10th ed.). New York, NY:
Houghton Mifflin.
Sattler, J. M. (2001). Assessment of children: Cognitive applications (4th ed.). La Mesa, CA: Jerome Sattler.
Schermelleh-Engel, K., Moosbrugger, H., & Muller, H. (2003). Evaluating the fit of structural equation models: Tests of
significance and descriptive goodness-of-fit measures. Methods of Psychological Research Online, 8, 23–74.
Sugai, G., & Horner, R. R. (2006). A promising approach for expanding and sustaining school-wide positive behavior
support. School Psychology Review, 35, 245–259.
Tabachnick, B. G., & Fidell, L. S. (2007). Using multivariate statistics (5th ed.). Needham Heights, MA: Allyn & Bacon.
Tanaka, J. S. (1993). Multifaceted conceptions of fit in structural equation models. In K. A. Bollen & J. S. Long (Eds.),
Testing structural equation models (pp. 10–39). Newbury Park, CA: Sage.
Vernon-Feagans, L., Cox, M., & The Family Life Project Key Investigators. (2011). The Family Life Project:
An epidemiological and developmental study of young children living in poor rural communities [Monograph].
Retrieved from http://www.fpg.unc.edu/~flp/exec/!Final%20Monograph%20submitted%20to%20SRCD.pdf
Yates, G. C., & Yates, S. M. (1990). Teacher effectiveness research: Towards describing user-friendly classroom
instruction. Educational Psychology, 10, 225–238. doi:10.1080/0144341900100304
Zaslow, M., Martinez-Beck, I., Tout, K., & Halle, T. (2011). Quality measurement in early childhood settings.
Baltimore, MD: Brookes.