The Development of the Student Instructional Report II
by
John A. Centra Chair and Professor
Program in Higher Education School of Education Syracuse University
Higher Education Assessment
EDUCATIONAL TESTING SERVICE, ETS, the ETS logo, STUDENT INSTRUCTIONAL REPORT II, the SIR II logo, and HIGHER EDUCATION ASSESSMENT are trademarks of Educational Testing Service. The modernized logo is a trademark of Educational Testing Service.
Copyright © 1998 by Educational Testing Service. All rights reserved.
Table of Contents
Overview
The Development of SIR II
    Research Background
    Construct Validity
Instrument Design
    SIR II Teaching Dimensions
    Questionnaire Format
    Response Format
Pretesting SIR II
    PreTest Item and Scale Statistics
    Factor Analysis of Forms A and B
    Rasch Analysis of Forms A and B
SIR II: Pilot Testing
    Factor Analysis of SIR II
    Reliability of SIR II
    Note on Validity
References
List of Tables
    Table 1. Course Instructional Report, Form A, Item and Scale Analysis
    Table 2. Course Instructional Report, Form B, Item and Scale Analysis
    Table 3. PreTest Course Instructional Report, Form A, Factor Analysis (Varimax Rotation)
    Table 4. PreTest Course Instructional Report, Form B, Factor Analysis (Varimax Rotation)
    Table 5. SIR II, Factor Analysis, Factor Loadings on Six Scales, Equamax Rotation
    Table 6. SIR II, Coefficient Alpha Reliability Analysis
    Table 7. SIR II, Item Reliability Coefficients for Classes with 10, 15, 20, and 25 Respondents
    Table 8. SIR II, Item and Scale Test/Retest Reliability Coefficients
List of Appendices
    A. Guidelines for the Use of Results of the Student Instructional Report (SIR/SIR II)
    B. Course Instructional Report (CIR), Form A
    C. Course Instructional Report (CIR), Form B
    D. Student Instructional Report II (SIR II)
    E. Student Instructional Report II (SIR II), Student Comments Section
THE DEVELOPMENT OF SIR II OVERVIEW
The original Student Instructional Report (SIR), published in 1972, was based on what was
then known about effective college teaching and how students might contribute to its evaluation.
Since that time much has been learned about both effective teaching and its evaluation. The
development of SIR II, described in this report, was based on this knowledge.
Two new forms were developed and pretested in spring 1994. These forms included five of
the scales from the original SIR with questions (items) added or deleted. Three new scales or
dimensions were added: Course Outcomes, Student Effort and Involvement, and Methods of
Instruction (originally Supplementary Instructional Methods). These new scales reflected recent
emphases on measuring learning outcomes, promoting students' time on task and effort in their
learning, and encouraging active learning in the classroom. Each of the two pretested forms
included a different response format to the same set of items and scales. By having random halves
of students in 50 classes respond to the two forms, it was possible to determine which response
format was better.
The pretesting was conducted at 10 two- and four-year colleges. Traditional item and scale
analyses of the two forms included computing means, standard deviations, coefficient alphas, item-
to-scale correlations, and factor analyses. A Rasch analysis also compared the response categories
for the two forms to determine which provided better variation in student responses. Based on the
above analyses, one of the two forms was selected and items were further honed and altered.
The 40-item SIR II that evolved from the pretesting included eight categories of items and
an overall evaluation item. The items were grouped within each category or dimension. Students
answered most of the items regarding a course based on a five-point scale of “effectiveness,” or as
“compared to other courses”; both formats differ from those of the original SIR. A set of
open-ended questions that paralleled the eight SIR II categories was also developed to enable students to
add their comments for the instructor.
The pilot testing occurred at a variety of colleges from spring 1995 through spring 1996.
Course means and standard deviations were computed for each item and scale. A sample of the
data from the pilot testing was used to determine the reliability and construct validity of SIR II. The
three kinds of reliability computed established the internal consistency of the items within the scales
(coefficient alpha), the number of students needed for consistency of course results (intraclass
correlations), and the stability of responses over brief periods of time (test-retest). The factor
analysis indicated that the resulting factors matched perfectly with the expected or a priori scales for
SIR II.
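The relationship between class size and the consistency of course means can be illustrated with the Spearman-Brown formula, which projects the reliability of a mean of n ratings from a single-rater intraclass correlation. The sketch below is illustrative only: the function name and the single-rater value of .20 are assumptions, not figures from the SIR II pilot data, and the report does not state that this particular projection was used. The class sizes are those named in the title of Table 7.

```python
def reliability_of_mean(single_rater_icc: float, n_students: int) -> float:
    """Spearman-Brown projection: reliability of the mean of n ratings."""
    if n_students < 1:
        raise ValueError("n_students must be at least 1")
    r = single_rater_icc
    return n_students * r / (1 + (n_students - 1) * r)


# Hypothetical single-rater intraclass correlation, chosen only for illustration.
ASSUMED_ICC = 0.20

for n in (10, 15, 20, 25):
    print(n, round(reliability_of_mean(ASSUMED_ICC, n), 2))
```

Under this projection, reliability rises with the number of respondents but with diminishing returns, which is consistent with the general advice to interpret results from very small classes cautiously.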
THE DEVELOPMENT OF SIR II
Research Background
Research on student evaluations of teaching has mushroomed in the past 25 years, with
ERIC now containing well over 1,500 references. The vast majority of the findings from these
studies have supported the use of student evaluations for both teaching improvement and personnel
decisions, especially if users observe proper guidelines. A general discussion of these guidelines
appears in Reflective Faculty Evaluation (Centra, 1993) and, as they apply to both Student
Instructional Reports, in Appendix A.
The original SIR, published in 1972, was based on studies available at that time. In the
decade or so that followed, a series of studies on the reliability, validity, and utility of student
evaluations with the SIR found that:
• The reliability or consistency (i.e., an internal consistency measure) of mean
student ratings was good, particularly when based on more than 15 students
in a class. The ratings were also reasonably stable over short periods of time,
as measured by test-retest reliability. (SIR Reports #3 & 4)
• Validity studies indicated that student ratings generally evaluated some
aspect of teaching effectiveness. For example, in multiple section courses
instructors who received higher ratings tended to be teaching classes in which
students learned more (as measured by a final exam). Class size, subject area
of the course, and course type (major requirements, elective, etc.) were the only
characteristics that affected ratings, and those effects were relatively small.
(SIR Report #4)
• The usefulness of student ratings for instructional improvement was
demonstrated in a SIR study conducted at five colleges. Additional studies
have verified that teachers who want to, can use ratings and comments from
students to make positive changes in instruction. In personnel decisions,
student evaluations have increasingly been added to other sources of
information to judge teaching effectiveness more comprehensively. (SIR
Report #2)
• A more recent study indicated that SIR results were related to teaching
portfolio evaluations by peers, and that together these two sources can
provide a comprehensive review of teaching performance. (SIR Report #6)
For the original SIR, three criteria were used to select items:
(1) items that experts believed were most important to teaching and that had been
included in previous research,
(2) items that reflected areas of instruction that students were capable of
observing and judging, and
(3) items that faculty members believed would be most useful for instructional
improvement.
A factor analysis of the items included in the original instrument resulted in six item
clusters, or factors: Student/Teacher Relationship, Course Objectives and Organization, Lectures
(Communication), Course Difficulty and Workload, Course Examinations, and Reading
Assignments (SIR Report #3). These six factors, with some modifications in titles and item
wording, have been used to summarize SIR responses for users over the past 25 years.
Construct Validity
Research supports the view that teaching is indeed multidimensional, and that it is a
complex activity in which teachers may be effective or ineffective on different aspects or
dimensions. After devising a system for categorizing items, Feldman (1976) came up with a list of
21 categories that had been included in studies in which students identified characteristics of
superior college teachers. In addition to the six identified by the original SIR analysis, student self-
ratings of learning, teacher enthusiasm, and teacher personality characteristics were identified by
Feldman (other dimensions were simply more specific, such as “clarity” and “elocutionary skills”
instead of “communication”). The dimensions identified by Feldman were reviewed for items that
might be included for the revision of SIR. Also reviewed were characteristics of effective teaching
identified by faculty members, administrators, and alumni in various studies (Centra, Froh, Gray, &
Lambert, 1987). These qualities, not surprisingly, overlapped considerably those identified by
students. The qualities include:
• Good organization of subject matter and course
• Effective communication
• Knowledge of and enthusiasm for the subject matter and teaching
• Positive attitude toward students
• Fairness in examinations and grading
• Flexibility in approaches to teaching
• Appropriate student learning outcomes (Centra et al.)
The above qualities were discussed in a monograph written and published with a committee
at Syracuse University (Centra et al. 1987). Designed to help deans, department chairs, and faculty
members evaluate teaching performance, the first six qualities emphasize appropriate teaching
procedures and the seventh points out the importance of appropriate and purposeful student
learning.
These seven categories, as the following discussion makes clear, were critical in the development of
SIR II.
INSTRUMENT DESIGN
SIR II Teaching Dimensions
The extensive research on the dimensions of effective teaching and the large number of
factor analyses of student ratings that duplicated several of these dimensions were the foundation
for the development of SIR II. Five of these dimensions, which were included in the original SIR, are
also used in SIR II: Course Organization and Planning; Communication; Faculty/Student
Interaction; Assignments, Exams, and Grading; and Course Difficulty, Workload, and Pace. While
the dimensions were the same, new items were added to reflect a broader or more current
interpretation of each scale. For example, the Communication dimension now includes the
instructor’s command of spoken English and his/her enthusiasm for the course material (as opposed
to a general personal enthusiasm which some rating forms include). The instructor's respect for
students became part of Faculty/Student Interaction, and an item on whether students were told how
they would be graded became part of Assignments, Exams, and Grading. Altogether, eight new
items were added to four of the five dimensions, while Course Difficulty, Workload, and Pace
remained the same.
The major change in the first draft of the SIR II was the addition of three new areas:
Methods of Instruction (originally called Supplementary Instructional Methods), Student Effort and
Involvement, and Course Outcomes. Each of these dimensions addressed recent emphases in
college instruction and learning. Under Methods of Instruction, for example, a number of active
learning practices were listed. Active learning, as opposed to passive lecturing, has long been
known to facilitate student learning. For this reason, college teachers have been urged to use active
instructional methods in their courses (see, for example, the 1984 Involvement in Learning report
[Study Group, 1984]; also Bonwell & Eison, 1991).
Similarly, an educators’ conference at the Wingspread Conference Center in Racine,
Wisconsin, in 1986 resulted in the “Inventories of Good Practice in Undergraduate Education,”
which emphasized the need for active learning, as well as student time on task and other practices
(Chickering & Gamson, 1987). Time on task means that students need to spend the time and make
the commitment necessary to prepare for class and assignments. To underscore this need, the new
SIR included three items asking about student effort and involvement in the course.
The third area added, Course Outcomes, reflects still another emphasis in evaluation during
the past decade or so. Numerous higher education reports, including the two mentioned above,
discussed the need to focus on student learning. The various accrediting agencies have also called
on institutions to measure student learning and other outcomes of instruction in their self studies.
Items added to the SIR II assessed students’ ratings of their learning, their independent thinking,
and other broad outcomes that students attributed to the particular course they were evaluating.
Questionnaire Format
Because of the numerous factor analytic studies that previously established the various
dimensions of instruction, the items for each dimension were grouped together in the new SIR
rather than being scattered throughout the questionnaire. Doing this made the form easier and
quicker to complete. Factor analysis and other internal consistency analyses were later used to
validate the dimensions.
Response Format
The original SIR primarily used a four-point agree-disagree scale (except for a smaller set
of items, which used a five-point scale with ratings of poor to excellent). Data clearly indicated that
the four-point scale provided less discrimination between instructors than might be useful. That is,
even if students preferred to make finer distinctions in their ratings of instruction, they did not have
the opportunity because of the limited scale choices (strongly agree, agree, disagree, strongly
disagree). Therefore, two forms with different five-point response formats were designed and
pretested for the new SIR. Because users had become
accustomed to the agree-disagree format, this response option was repeated for one form, adding a
midpoint to make it a five-point scale. The midpoint allowed students a “neither” option, meaning
that they neither agreed nor disagreed with the statement.
A second form included the same items related to the course, but students were asked to
respond with an effectiveness rating, in particular the effectiveness in contributing “to your
learning in this course.” Specifically, the following options were presented:
In (1) Ineffective
SIn (2) Somewhat ineffective
SE (3) Somewhat effective
E (4) Effective
VE (5) Very effective
NA (6) Not applicable, not used, or don’t know
This second form also included different scale responses for the “course outcomes” and
“student effort and involvement” areas. In an effort to distribute responses, the scale asked students
to compare the course being rated to other courses using the following options:
ML (1) Much less than most courses
L (2) Less than most courses
S (3) About the same as others
MT (4) More than most courses
MM (5) Much more than most courses
NA (6) Not applicable, not used, or don’t know
PRETESTING SIR II
The two forms of the new SIR that appear in Appendices B and C were called the Course
Instructional Report (CIR), a name briefly considered as an alternative to SIR II. Form A of CIR
consisted of the five-point agree/disagree scale, while Form B included the effectiveness and other
scales. These forms were pretested in 50 classes at 10 two- and four-year colleges during spring
1994. Approximately 1,200 students participated in the pretest. Faculty volunteers at the
institutions administered the two forms to random halves of their classes, which allowed for
comparison of items and scale characteristics while at the same time controlling for the effects from
students, instructors, or courses. Thus, 50 paired comparisons with approximately the same number
of students responding to each form in each class could be analyzed to determine which was the
better form. About 600 students responded to Form A and a similar number responded to Form B.
Each instructor received a numerical summary of the students’ responses and was asked to provide
reactions to each of the forms. The comments, which came from about a third of the teachers, were
split between those who liked the agree/disagree format because they were accustomed to it, and
those who liked the rationale and information received from the effectiveness response format.
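The random-halves design described above can be sketched as follows. This is a minimal illustration of one way to split a class roster at random, not the procedure the faculty volunteers actually followed; the function name, the roster, and the seed are hypothetical.

```python
import random


def split_class_into_random_halves(student_ids, seed=None):
    """Randomly assign each student in one class to Form A or Form B."""
    rng = random.Random(seed)
    shuffled = list(student_ids)
    rng.shuffle(shuffled)
    # With an odd class size, Form A receives the extra student.
    midpoint = (len(shuffled) + 1) // 2
    return {"A": shuffled[:midpoint], "B": shuffled[midpoint:]}


# Hypothetical roster of 24 students in one class.
roster = [f"student_{i:02d}" for i in range(1, 25)]
assignment = split_class_into_random_halves(roster, seed=1994)
print(len(assignment["A"]), len(assignment["B"]))
```

Because both halves share the same instructor and course, differences between the forms within a class can be attributed to the response format rather than to differences among students, instructors, or courses.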
PreTest Item and Scale Statistics
Tables 1 and 2 include traditional item and scale analysis data for Forms A and B of the
“Course Instructional Report.” For Form A (agree/disagree format), the scale mean for Course
Organization and Planning was 4.40 and the Coefficient Alpha (the extent to which the items are
consistent, or intercorrelated, for the scale) was .79. By comparison, the same scale on Form B had
a mean of 4.29 and a Coefficient Alpha of .82. On the five-point scale for both forms, the lower
mean of 4.29 is preferable because it reflects a less positive bias and therefore a better spread of
scores above the mean. The slightly higher Coefficient Alpha on Form B indicates better
consistency among the item responses than on Form A. On both forms the usefulness of the course
syllabus or outline had the lowest correlation with total scale score and the highest percentage of
not applicable (NA) responses, suggesting that course syllabi were either not available for the
course or that some students did not know what a syllabus was. Because another item, “the
instructor’s explanation of course requirements,” overlapped the syllabus item and was apparently
clearer to students, the syllabus item was later dropped. Added to the Course Organization and
Planning scale for SIR II was “the instructor’s way of summarizing important points in class,” an
item that the factor analysis for Form B placed in that scale.
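For readers unfamiliar with Coefficient Alpha, the sketch below computes it from a students-by-items matrix using the standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores). The ratings shown are invented for illustration and are not pretest data; the function name is an assumption.

```python
from statistics import variance


def coefficient_alpha(responses):
    """Coefficient alpha for rows of item scores (one row per student)."""
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across students.
    item_variances = [variance([row[j] for row in responses]) for j in range(k)]
    # Sample variance of each student's total scale score.
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Invented five-point ratings from six students on a four-item scale.
ratings = [
    [5, 4, 5, 4],
    [4, 4, 4, 3],
    [3, 3, 4, 3],
    [5, 5, 5, 5],
    [2, 3, 3, 2],
    [4, 5, 4, 4],
]
print(round(coefficient_alpha(ratings), 2))
```

The more strongly the items covary, the larger the variance of the total score relative to the sum of the item variances, and the closer alpha comes to 1.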
The Communication scale on the pretest included seven items on Form A and the same
seven items plus an item on “the instructor’s willingness to listen to student questions and
opinions” on Form B. The scale means were identical (4.38), but the Coefficient Alpha was
slightly higher on Form B (.89 vs .85 on Form A). On Form B, the two items with the highest item
to scale correlations were “the instructor’s ability to make clear and understandable presentations”
(.73), and “the instructor’s use of relevant examples or illustrations to clarify course material” (.74).
Both were highly intercorrelated with other items and would seem to be the crux of the
Communication scale. While the pace of the course item was consistent with other items in this
scale, it could logically be placed with the difficulty and workload items, as it ultimately was for
SIR II. By also changing the wording, instructors could learn whether the pace was fast, slow, or
“just about right,” thereby making the item more useful for instructional improvement.
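An item-to-scale correlation of the kind reported above can be sketched as below. The report does not say whether each item was removed from the scale total before correlating, so the sketch supports both variants; the "corrected" default, the function names, and the sample data are assumptions for illustration only.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; assumes neither variable is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def item_to_scale_correlations(responses, corrected=True):
    """Correlate each item with the scale total (minus the item if corrected)."""
    k = len(responses[0])
    results = []
    for j in range(k):
        item = [row[j] for row in responses]
        if corrected:
            scale = [sum(row) - row[j] for row in responses]
        else:
            scale = [sum(row) for row in responses]
        results.append(pearson(item, scale))
    return results


# Invented ratings from five students on a three-item scale.
sample = [[5, 4, 5], [4, 4, 3], [3, 3, 4], [5, 5, 5], [2, 3, 2]]
print([round(r, 2) for r in item_to_scale_correlations(sample)])
```

Items with low values on this statistic, such as the syllabus item discussed earlier, are candidates for rewording, relocation to another scale, or removal.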
The Faculty/Student Interaction scale included four identical items in Form A and Form B
plus a fifth item dealing with the students' perception of the instructor's receptivity to their questions
or opinions that was on Form A. After some rewording, this item was included in SIR II. The
mean scale score on Form B was lower than on Form A (4.28 vs 4.34), as were the means on three
of the four identical items. The Coefficient Alphas were identical at .81.
The fourth scale, Assignments, Exams, and Grading, included seven identical items, of
which six were retained for SIR II. Omitted was the instructor’s overall fairness in grading
students, which had one of the highest mean scores and which, relative to the other items, was
much more general in its wording. Form B again had the lower mean scale score (4.09 vs 4.16) and
a higher Coefficient Alpha (.88 vs .84). The items most central to this scale, judging from the item
intercorrelations, were the exams’ coverage of important aspects of the course, and the helpfulness
of comments on assignments and exams.
Methods of Instruction (Scale E on Form B and Scale G on Form A) included a variety of
practices that an instructor may have used in the course, such as assigned group projects, case
studies, and laboratory exercises. It is not surprising that these items received the highest
percentage of “not applicable” (not used) responses. For example, 77.5 percent of the students did
not think computers had been used as aids to instruction. The item with the most use, the active
involvement of students in what they were learning, received only a 9.7 percent “not applicable”
response on Form A (slightly more on Form B). This item was later moved to the Course
Outcomes scale on SIR II, a change supported by the factor analysis. Active involvement of
students in their learning can occur across many instructional methods and, moreover, can be
viewed as an ongoing and positive outcome of instruction. Therefore it also makes sense to include
this item on the Outcomes scale. Although scale means and Coefficient Alphas are included in
Tables 1 and 2, Methods of Instruction does not make sense as a scale score. Rather, it is the
individual responses to the methods actually used in the course that will be most useful to the
instructor. Although students may not totally agree on the methods used, the information can be
worthwhile for the faculty member. The fact that standard deviations for items were similar to
other items on the two forms would suggest sufficient agreement among students in their responses.
The Course Outcomes scale included the same seven items on both forms, and once again
Form B had the lower mean (3.75 vs 4.22). The Coefficient Alphas for the two forms were
identical, .89. Three of the seven items were eventually either omitted or moved to other scales.
Moved to the Student Effort and Involvement Scale was the item concerning the extent to which
students thought they were challenged by the course (the low correlation with the total score, .34 on
Form B, supported this move). Omitted, in order to shorten the instrument, were the value of the
course to the student (too general), and the understanding of concepts and principles in the subject
area. Although the latter item might be useful in many courses, and thus instructors may want to
include it as an optional item, students would likely have trouble applying it to all courses. Both the
Course Outcomes and Student Effort and Involvement scales for Form B used a “compared to other
courses” response because it seemed more appropriate than an “effectiveness” response.
The Student Effort and Involvement scale included three items on both Forms A and B, one
of which was eliminated for SIR II and replaced by the “challenge” item. Eliminated was the extent
to which students were interested in learning the content when they enrolled. This item had a low
correlation with the total scale (.28 on Form B), and might better stand alone as a separate student
background item. In fact, when students are asked whether a course is a major requirement,
elective, or a college requirement, they are in part also reflecting their interest in taking the course.
No doubt, student interest in a course at the outset is an important influence in how they later view
instruction and their own efforts.
The final two items, Course Difficulty and Workload, were not scored on a linear scale
since the middle or “3” response on the five-point scale was most favorable. These items were
therefore omitted from the analysis. For SIR II the item on the pace of the course was added to
Course Difficulty and Workload for a three-item set; these same three items and response options
were also part of the original SIR.
The single overall evaluation item used a five-point poor to excellent scale on Form A (as
with the original SIR), and a five-point effectiveness rating scale on Form B, which had also been
used for most of the other items in the form. Having students use the same response scale for the
overall evaluation item as with other items on the form allowed them to apply the same standard.
Both overall evaluation items correlated highly with scale means on their respective forms (in the
.78 to .89 range), except for the Student Effort and Involvement scale. For this scale, the low
correlations (.56 and .48) indicated that students generally saw the quality of instruction as only
moderately related to their own effort in the course.
The traditional analyses included in Tables 1 and 2 were useful for selecting and refining
items. The generally lower mean scale scores on Form B reflect a better discrimination among
instructors than those on Form A. The Coefficient Alphas for Form B were generally higher than
for Form A. The Rasch analysis, reported later, also supports the use of the Form B response
formats.
Factor Analysis of Forms A and B
Tables 3 and 4 include the results of the factor analyses of Forms A and B. For both
analyses a varimax rotation was used to help clarify factors. The items for each factor are listed in
order of their loading on the factor, and items that loaded on more than one factor are given on the
same line along with the factor number in parentheses. For Form A, five of the seven factors reflect
the categories of items included in the questionnaire: Course Organization and Planning (Factor 1);
Faculty/Student Interaction (Factor 2); Course Outcomes (Factor 3); Course Workload, Difficulty,
and Pace (Factor 4); and Student Effort and Involvement (Factor 6). Two of the factors tap
different categories: three items appear to reflect Individualized Learning, such as labs and
computers (Factor 5), and two items deal with Course Challenge (Factor 7). For Form B (Table
4), the first factor seemed to include many items related to learning and grading, and was termed
Student Learning. The second factor, Preparation/Responsiveness, included 10 items dealing with
the instructors’ preparation and presentation of subject matter as well as their responsiveness or
interaction with students. Three of the factors in Form B were similar to factors in Form A:
Student Effort (Factor 3); Individualized Learning (Factor 5); and Course Difficulty and Workload
(Factor 7). The fourth factor, Collaborative Learning, included those items from the Methods of
Instruction set related to group learning. Factor 6, Clarity, included five items that largely reflected
ratings of the instructor’s ability to clarify course material.
While there is some similarity in the factor structure for the two forms, there are also
significant differences caused by the different response formats. For example, the Student Effort,
Individualized Learning, and Course Difficulty and Workload factors were part of both factor
structures, while the other four factors differed. Both factor analyses were also used in eliminating
or changing items from the pretest set.
Rasch Analysis of Forms A and B
Rasch item parameter estimation was a useful supplement to the traditional item and scale
analyses presented thus far with Forms A and B. In particular, the Rasch analysis (1) supported the
choice of Form B (effectiveness response scale) over Form A (agree-disagree response scale), (2)
indicated a change in the effectiveness response scale, and (3) supported the selection or
elimination of items suggested by the traditional analyses.
The Rasch analysis compared the response categories of Form A and Form B to determine
which provided better variation in student responses (variance sources include items and students).
Rasch step calibrations indicate how much additional information is provided by moving from one
response category to the next along the continuum. If there is very little difference in the
calibrations, then very little additional information is being provided by the categories. This
became evident in Form B when the analysis indicated that the distinction between “Somewhat
Ineffective” and “Somewhat Effective” was so slight that very little additional information was
provided by the two responses. In other words, students appeared to be confusing the two
responses, probably because of the use of the word “somewhat” for both categories. For SIR II the
responses were changed to “Moderately Effective” and “Somewhat Ineffective” in order to clarify
the distinctions and thus obtain better variations in student responses.
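The role of step calibrations can be illustrated with a Rasch rating scale model, in which the probability of each response category depends on the person measure, the item difficulty, and a set of step calibrations. The sketch below is an assumption-laden illustration: the report does not specify which polytomous Rasch formulation was used, and the step values (in logits) are hypothetical, not estimates from Forms A or B.

```python
from math import exp


def category_probabilities(theta, item_difficulty, step_calibrations):
    """Rating scale model probabilities for categories 0..m."""
    # Cumulative sums of (theta - difficulty - step); category 0 is fixed at 0.
    cumulative = [0.0]
    for tau in step_calibrations:
        cumulative.append(cumulative[-1] + (theta - item_difficulty - tau))
    peak = max(cumulative)  # subtract the maximum for numerical stability
    weights = [exp(c - peak) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical step calibrations for a five-category response scale.
well_spaced = [-2.0, -0.7, 0.7, 2.0]
crowded = [-2.0, -0.1, 0.1, 2.0]  # the two middle steps nearly coincide

for label, steps in (("well spaced", well_spaced), ("crowded", crowded)):
    probs = category_probabilities(0.0, 0.0, steps)
    print(label, [round(p, 2) for p in probs])
```

When two adjacent step calibrations nearly coincide, the category between them is the most probable response over only a narrow range of the underlying measure, so it adds little information beyond its neighbors.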
Item response analysis of Forms A and B used a one-parameter logistic model. Based upon
the results of the analysis, the following conclusions were drawn for items in each scale:
(1) Organization and Planning
• Eliminate syllabus item on both forms.
• Change wording on “extent that course requirements and goals
were made clear by the instructor.”
(2) Communication
• The pace item (#10 on B and #9 on A) should be moved to
another scale or eliminated.
(3) Faculty/Student Interaction
• The first item (instructor helpfulness) is somewhat redundant with
the “respect for students” item. (“Respect for students” item
was changed.)
(4) Assignments, Exams, Grading
• Quality of texts item (#23) does not relate to other items.
• Exam effectiveness item (#22) seems redundant (wording was
changed on both items, to “The exams’ coverage of important
aspects of the course,” and “The overall quality of the
textbooks”).
(5) Course Outcomes Scale (E on Form A and F on Form B)
• The first item (on “challenge”) doesn’t perform like others. (It
was moved to the effort scale.)
• Items 2 and 6, “amount learned” and “gaining understanding of
concepts,” seem redundant (item 6 was dropped).
(6) Student Effort and Involvement Scale
• The third question (student interest) doesn't fit with others.
Coefficient Alpha analysis shows the same. (This item was
moved to the Course Outcomes scale.)
19
SIR II: PILOT TESTING
The revised Student Instructional Report (SIR II) is included as Appendix C. The 40-item
form includes eight categories of items and an Overall Evaluation item. Four student background
questions, which instructors can use to interpret results and which will be used in future research
with SIR II, are also listed. These are (1) course curriculum status (required in major, elective,
etc.); (2) student class level; (3) instructor's English ability; and (4) student gender. As in the
original SIR, SIR II also provides space for ten supplemental questions, which the instructor or the
college may add and which are machine scored. Finally, students are invited to provide their own additional
comments about the course or instruction and to submit these to the teacher. Because of the
importance of these comments for the improvement of instruction, a separate open-ended form with
broad questions tied to each SIR II scale was developed (Appendix D).
SIR II was pilot tested at a variety of colleges from the Spring 1995 semester through Spring
1996. While one purpose of the pilot testing was to build a comparison data pool to help interpret
SIR II responses, an equally important purpose was to conduct further reliability and validity
studies. A sample of classes and students was selected from the pilot data to conduct these studies,
which are reported next.
Factor Analysis of SIR II
Table 5 includes the results of the factor analysis of SIR II. Approximately 1,200 classes
were used for this analysis, with the unit of analysis being each class. The major purpose was to
determine the extent to which the factors duplicated the a priori scales in SIR II. As Table 5
indicates, the duplication was perfect; all of the predetermined scale items group together on the
expected factors. For example, the first factor, Faculty/Student Interaction, contains the same five
items that comprise the SIR II scale. All six factors matched the scales in SIR II with only a slight
reordering of items based on the factor loadings. Two sets of items, Instructional Methods and
Course Difficulty, Workload, and Pace, were not factor analyzed because the items were not
intended to be interpreted as a single scale score; each item in these categories should be interpreted
by itself.
The first six factors of the principal axis factor analysis accounted for 88 percent of the
variance. The equamax rotation equalizes, to some extent, the variance among the factors selected.
The scree plot of eigenvalues indicated that six factors provided the best solution; the variance
percentages reported for these six factors in Table 5 ranged from 4.70 for the first factor to 3.88 for the sixth.
The factor loadings for the overall evaluation item (#40) on each scale ranged between .31
and .49. The highest loadings were on Course Organization and Planning (.49) and Course
Outcomes (.42), suggesting that the items in these scales were most highly related to the students'
ratings of the general effectiveness of the course in promoting learning. Not surprisingly, the
lowest loading for the overall evaluation item was on the Student Effort and Involvement scale
(.31), suggesting that students perceived their own effort as being somewhat more separate from
their rating of the course as a whole.
Reliability of SIR II
Three kinds of reliability analyses were conducted with the items and scales in SIR II. The
first was a Coefficient Alpha analysis of the scales to determine the extent to which the items in
each scale intercorrelate, or hang together. As the values in Table 6 indicate, the Coefficient Alphas
were uniformly high, ranging from .89 to .98. Thus the items within each of the scales are
consistently measuring a single dimension. Two of the items had lower, but still acceptable,
correlations with the total scale score. Item 7, “The Instructor’s command of spoken English,”
correlated .67 with the Communication Scale Score, and “The overall quality of textbooks”
correlated .53 with the Assignments, Exams, and Grading Scale.
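As a rough illustration of the Coefficient Alpha statistic described above, the sketch below computes alpha and "alpha with item deleted" from a small table of scores. The function names and the ratings are hypothetical; they are not the SIR II data, and this is not necessarily the exact procedure used to produce Table 6.

```python
from statistics import variance


def coefficient_alpha(rows):
    """Cronbach's coefficient alpha.

    rows: one list of item scores per unit of analysis (for example, the
    class means on each item of a scale).
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across units, and variance of the summed scale score.
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def alpha_with_item_deleted(rows, index):
    """Alpha recomputed after dropping the item at position `index`."""
    reduced = [[v for i, v in enumerate(row) if i != index] for row in rows]
    return coefficient_alpha(reduced)


# Hypothetical class-mean ratings on a three-item scale (illustration only).
ratings = [
    [4.5, 4.4, 4.6],
    [3.8, 3.9, 3.7],
    [4.1, 4.0, 4.3],
    [3.2, 3.4, 3.1],
    [4.8, 4.7, 4.9],
]
print(round(coefficient_alpha(ratings), 3))
print([round(alpha_with_item_deleted(ratings, i), 3) for i in range(3)])
```

An item whose removal raises alpha noticeably is a candidate for the kind of scrutiny given to the textbook item above.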
The second type of reliability was at the item level, an intraclass correlation which estimates
the extent of agreement among students on each item. Separate analyses were run for class sizes of
10, 15, 20, and 25. The coefficients, presented in Table 7, tend to increase with class size.
For item #1 in the Course Organization and Planning Scale, for example, the correlation coefficient
was .59 for a class size of 10 (N = 58 classes analyzed), .78 for a class size of 15 (N = 51), .89 for a
class size of 20 (N = 34), and .92 for a class size of 25 (N = 24). Intraclass correlations at or above
.90 are generally considered very good, and above .80 are considered adequate. For class size of 10
or fewer, the item reliabilities reported in Table 7 are relatively low; thus item means at these class
sizes will typically have a relatively low agreement level among students. For class size of at least
15, 26 of the 30 items analyzed were close to or above .80. The overall evaluation item had a
reliability of .85 for a class size of 15 and .90 for a class size of 25. The least reliable items were in
the Student Effort and Involvement scale, which would be expected to contain more individual
variation in responses. The reliability for “I was prepared for each class” (item #35), for example,
was under .70. On the other hand, in this same scale the extent to which students were challenged
by the course (item #36) had good reliability because it relates more to the course than to the
individual student.
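The report does not say which intraclass formula was used. As a sketch under that caveat, one common choice is the reliability of class means from a one-way analysis of variance, (MSB - MSW) / MSB, where MSB is the mean square between classes and MSW the mean square within classes. The function name and the ratings below are hypothetical.

```python
def intraclass_reliability(classes):
    """Reliability of class means from a one-way ANOVA: (MSB - MSW) / MSB.

    classes: list of classes, each a list of the n student ratings of one
    item. All classes must have the same n, mirroring the fixed class sizes
    (10, 15, 20, 25) used in the analysis.
    """
    n = len(classes[0])
    k = len(classes)
    if k < 2 or n < 2:
        raise ValueError("need at least two classes of at least two students")
    if any(len(c) != n for c in classes):
        raise ValueError("all classes must have the same number of ratings")
    class_means = [sum(c) / n for c in classes]
    grand_mean = sum(class_means) / k
    # Between-class mean square: how much class means differ from each other.
    ms_between = n * sum((m - grand_mean) ** 2 for m in class_means) / (k - 1)
    # Within-class mean square: how much students in a class disagree.
    ms_within = sum(
        (x - m) ** 2 for c, m in zip(classes, class_means) for x in c
    ) / (k * (n - 1))
    return (ms_between - ms_within) / ms_between


# Hypothetical ratings from three classes of three students (illustration only).
sample = [[5, 5, 4], [2, 2, 1], [3, 3, 4]]
print(round(intraclass_reliability(sample), 2))
```

Under this formula the coefficient rises when students within a class agree with one another and when classes differ from each other, which is consistent with the lower values reported for items that depend on the individual student.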
The third kind of reliability, referred to as test-retest, measures the extent to which
responses for each class are stable over short periods of time. Slight fluctuations (i.e. high
correlations) would indicate that students’ responses are not subject to daily occurrences or moods,
and that they likely do represent students’ ratings of the items and broader dimensions of the course,
such as Course Organization and Planning. For this analysis, 42 classes with a total of 724 students
at a small liberal arts college were studied. The two administrations of SIR II occurred
approximately two weeks apart. Table 8 includes the Pearson product-moment correlations for
each of 30 items and six scale means in SIR II. The item correlations are generally above .80, with
nine at or above .90 and only three just below .80. Scale correlations are even higher: five of the
six are above .90, with one at .88. These uniformly high correlations indicate that the mean
student responses for these classes were stable over the two weeks. In other words, the relative
rankings of the course ratings did not change appreciably because of inconsistency or fluctuations in students’
responses over the time period.
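The test-retest coefficients are Pearson product-moment correlations between class means from the two administrations. A minimal sketch follows; the function name and the class means are hypothetical and are not the Table 8 data.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least two values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical class means on one item at the first and second administration.
first = [4.1, 3.8, 4.5, 3.2, 4.0]
second = [4.0, 3.9, 4.4, 3.3, 4.1]
print(round(pearson_r(first, second), 2))
```

Note that the correlation reflects the stability of the classes' relative standing: adding the same constant to every class mean at the second administration would leave r unchanged.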
NOTE ON VALIDITY
A common definition of validity is the degree to which a test or instrument measures what it
is supposed to measure. The Standards for Educational and Psychological Testing (American
Educational Research Association, American Psychological Association, and National Council on
Measurement in Education, 1985, p. 9) further defines it as the “appropriateness, meaningfulness,
and usefulness of the specific inferences made from test scores” (or similar scores from an
instrument such as SIR II). A test or instrument is valid for a particular purpose or a particular
group. More specifically, it is the inferences made from an instrument that need to be validated.
For SIR II and other student rating instruments, validity refers to the inferences drawn about
instructional effectiveness based on the students’ responses.
Several validity studies were completed with the original SIR which would have relevance
for SIR II, particularly for the scales that overlapped both instruments. For example, in SIR Report
Number 2 the usefulness of the SIR ratings for improving teaching was demonstrated (see the
“Research Background” section of this report). Other studies completed are also relevant to a
discussion of SIR II validity. In addition, during the development of SIR II validity was addressed
several times, although more studies are needed. Perhaps this can best be put in context by
considering the different types of validity.
There are three types: content, criterion, and construct validity. Content validity is a
subjective estimate of the extent that the content of an instrument relates to whatever it is designed
to measure. In selecting items for SIR II, previous studies in which various constituents (teachers,
students, administrators) had identified characteristics of effective teaching were consulted. These
characteristics were then used as the basis for selecting items for SIR II that defined the various
dimensions of effective teaching (e.g. course organization, interaction with students, flexibility in
approaches to teaching, appropriate learning outcomes). Conferences and publications during
recent years had also emphasized the importance of some of these criteria for effective teaching, in
particular emphasizing active learning and learning outcomes. Thus the items and scales of SIR II
were designed to reflect the content of what many sources define as effective teaching.
Criterion validity is the extent that the scores from a test or instrument are related to one or
more outcome criteria. One such outcome criterion is student learning in a course. Instructors who
receive higher ratings from students should also be more successful in achieving learning outcomes.
With the original SIR, a study of several multiple section courses did demonstrate that learning
gains were related to the overall evaluation of the instructor as well as to some of the scale scores
(e.g. course organization, faculty-student interaction). That study, published as SIR Report #4,
would most likely apply to SIR II as well, particularly for the scales that are in common. Because
new scales have been added to SIR II, it remains for future studies to investigate the relationship
between those scales and appropriate criteria.
Construct validity is a more complicated concept: it evaluates the degree to which the
scores from an instrument correspond to other measures of the underlying theoretical trait. A
common practice has been to use factor analysis as one approach to studying construct validity, as
was done with both versions of SIR. After a set of scales and items had been designed to reflect effective
teaching, factor analysis was used to determine whether the resulting structure corresponded to
the a priori scales. As this report demonstrated, the factors produced closely duplicated the scales
designed. However, the scales did correlate significantly with each other, as has been typical of
other student rating forms, and this may reflect a response set by students. That is, students have a
tendency to rate good instructors as effective on all items and scales rather than differentiating their
performances. Nevertheless, students do differ enough in their responses to make the resulting item
and scale scores useful for formative and summative purposes.
Another type of evidence used for construct validity is the correlation of scores with other
external variables. Comparing these correlations with what should or should not be found
demonstrates a lack of bias and helps establish the existence of the theoretical construct (i.e.
effective teaching). With the original SIR, class size, subject area of the course, course type,
student gender, expected student grade, class level, and other variables were studied for their
possible relationship to student ratings (see SIR Report #4). The correlations were relatively small
except for those relationships that would be expected to be strong. For example, students who
expected to receive a higher grade in a course rated instruction higher, but this likely reflects a
desirable relationship: students who learn more and thus receive higher grades also rated
instruction as effective for them.
Future studies with SIR II will continue to examine the relationship between scale scores
and other variables. One study planned will examine student/teacher gender interaction within
separate disciplines. It will, for example, investigate such questions as whether male students rate
female teachers lower than males in such male dominated fields as engineering or the physical
sciences; or whether female students rate female teachers higher than males in the nursing field.
Another study will investigate the learning outcomes scale for SIR II. How are other scales
in SIR II correlated with what students say they have learned? How do external variables modify
this relationship?
While future validity studies are planned with SIR II, it is nonetheless true that the studies
completed with the original SIR and with SIR II thus far support its construction and current use in
evaluating effective teaching.
References
American Educational Research Association, American Psychological Association, and National Council on Measurement in Education. The Standards for Educational and Psychological Testing. Washington, DC: American Psychological Association, 1985.
Bonwell, C. C., and Eison, J. A. Active Learning: Creating Excitement in the Classroom. ASHE-ERIC Higher Education Report, no. 1. Washington, DC: School of Education and Human Development, George Washington University, 1991.
Centra, J. A. The Student Instructional Report: Its Development and Uses. Student Instructional Report, no. 1. Princeton, NJ: Educational Testing Service, 1972.
Centra, J. A. Two Studies on the Utility of Student Ratings for Instructional Improvement. Student Instructional Report, no. 2. Princeton, NJ: Educational Testing Service, 1972b.
Centra, J. A. Item Reliabilities, the Factor Structure, Comparison with Alumni Ratings. Student Instructional Report, no. 3. Princeton, NJ: Educational Testing Service, 1973.
Centra, J. A. Two Studies on the Validity of the Student Instructional Report. Student Instructional Report, no. 4. Princeton, NJ: Educational Testing Service, 1976.
Centra, J. A. The Use of the Teaching Portfolio and the Student Instructional Report (SIR) for Summative Evaluation. Student Instructional Report, no. 6. Princeton, NJ: Educational Testing Service, 1992.
Centra, J. A. Reflective Faculty Evaluation: Enhancing Teaching and Determining Faculty Effectiveness. San Francisco: Jossey-Bass Publishers, 1993.
Centra, J. A., Froh, R. C., Gray, P. J., and Lambert, L. M. A Guide to Evaluating Teaching for Promotion and Tenure. Acton, Mass.: Copley Publishing Group, 1987.
Chickering, A. W., and Gamson, Z. "Seven Principles for Good Practice in Undergraduate Education." American Association for Higher Education Bulletin, 1987, 39, 3-7.
Feldman, K. A. "The Superior College Teacher from the Student's View." Research in Higher Education, 1976, 5, 243-288.
Feldman, K. A. "Class Size and College Students' Evaluations of Teachers and Courses: A Closer Look." Research in Higher Education, 1976, 21, 45-115.
Study Group on the Conditions of Excellence in American Higher Education. Involvement in Learning: Realizing the Potential of American Higher Education. Washington, DC: National Institute of Education/U.S. Dept. of Education, 1984.
TABLE 1
Course Instructional Report
Form A
Item and Scale Analysis
(Items renumbered consecutively. Corr = Corr with Total; Alpha = Alpha (Item delt'd).)

Scale and Items                               %NA   Mean   S.D.   Corr  Alpha
A. Course Organization/Planning                     4.40    .56           .79
   1. Syllabus/Outline                        1.8   4.19    .85    .40    .81
   2. Course requirements                      .2   4.38    .70    .68    .72
   3. Instr. preparation                      0.0   4.49    .71    .63    .73
   4. Command of subject                       .3   4.56    .67    .58    .75
   5. Use of class time                        .3   4.36    .82    .61    .74
   Highest Intercorrelations: 4 with 3 = .61; 5 with 3 = .58; 2 with 1 = .55; 2 with 3 = .50
   Subscale with Overall Item = .79
B. Communication                                    4.38    .56           .85
   6. Clear Lectures                           .3   4.34    .80    .67    .81
   7. Command of English                      1.5   4.61    .60    .56    .83
   8. Relevant examples                       1.5   4.54    .66    .66    .82
   9. Pace of course                           .2   4.07    .96    .53    .84
   10. Challenging questions                  1.2   4.31    .77    .56    .83
   11. Enthusiasm                              .2   4.49    .73    .61    .82
   12. Summarized pts                          .8   4.38    .77    .68    .81
   Highest Intercorrelations: 6 with 12 = .59; 7 with 8 = .56; 6 with 8 = .54; 8 with 12 = .54
   Subscale with Overall Item = .84
C. Faculty/Student Interaction                      4.34    .60           .81
   13. Active helpful                          .2   4.46    .68    .69    .75
   14. Respectful to students                  .2   4.29    .90    .52    .81
   15. Concern with progress                   .3   4.29    .80    .64    .76
   16. Available for help                     2.5   4.31    .74    .58    .78
   17. Feel free to question                   .5   4.36    .80    .60    .78
   Highest Intercorrelations: 13 with 15 = .58; 13 with 17 = .52; 13 with 16 = .52
   Subscale with Overall Item = .78
D. Assignments, Exams, Grading                      4.16    .63           .84
   18. Fair grader                            1.0   4.29    .76    .63    .81
   19. Exams clear                            4.1   3.88   1.04    .67    .80
   20. Inform students graded                  .7   4.39    .70    .61    .81
   21. Helpful comments                       3.8   4.02   1.01    .60    .81
   22. Important concepts, exams              3.8   4.19    .81    .67    .80
   23. Texts, readings                        6.4   4.08    .94    .44    .84
   24. Helpful assignments                    8.3   4.24    .81    .57    .82
   Highest Intercorrelations: 19 with 22 = .60; 19 with 18 = .56; 18 with 20 = .55; 18 with 22 = .53; 21 with 22 = .50
   Subscale with Overall Item = .86
E. Course Outcomes                                  4.22    .64           .89
   25. I was challenged                        .8   4.15    .85    .45    .90
   26. I learned a great deal                  .3   4.28    .82    .75    .87
   27. Inst. accom objectives                  .5   4.37    .68    .67    .88
   28. Course stimulated my interest          1.2   4.09    .96    .71    .87
   29. Course think critically                 .5   4.13    .83    .70    .87
   30. Gained understanding of concepts        .7   4.29    .72    .79    .87
   31. Value of course to me                   .7   4.24    .89    .77    .86
   Highest Intercorrelations: 30 with 31 = .73; 30 with 28 = .64; 30 with 26 = .68; 30 with 29 = .66;
      30 with 27 = .63; 31 with 28 = .71; 31 with 26 = .62; 31 with 29 = .58; 31 with 27 = .56;
      28 with 29 = .62; 26 with 27 = .61; 26 with 28 = .58
   Subscale with Overall Item = .85
F. Student Effort/Involvement                       4.03    .72           .67
   32. I studied and put effort                .7   4.06    .93    .52    .52
   33. I was prepared                          .7   3.97    .86    .55    .49
   34. I was interested at outset             1.7   4.06    .99    .38    .71
   Highest Intercorrelations: 32 with 33 = .55
   Subscale with Overall Item = .56
G. Methods of Instruction                           4.12    .71           .87
   35. Problems for small groups             39.1   4.18    .76    .66    .86
   36. Projects for students                 52.1   4.03    .83    .65    .86
   37. Course actively inv. students          9.7   4.19    .82    .65    .86
   38. Case studies, simul., etc.            49.0   4.18    .84    .63    .86
   39. Lab exercises                         66.8   4.08    .91    .70    .85
   40. Term papers/projects                  36.1   4.11    .86    .58    .86
   41. Computers as aids                     69.1   3.91   1.00    .65    .86
   42. Course journals/logs                  68.8   3.79   1.09    .59    .87
   Highest Intercorrelations: 39 with 41 = .66; 37 with 38 = .64; 35 with 38 = .55; 35 with 37 = .54
   Subscale with Overall Item = .71
TABLE 2
Course Instructional Report
Form B
Item and Scale Analysis
(Items renumbered consecutively. Corr = Corr with Total; Alpha = Alpha (Item delt'd).)

Scale and Items                               %NA   Mean   S.D.   Corr  Alpha
A. Course Organ./Planning                           4.29    .65           .82
   1. Syllabus/Outline                        3.7   3.96    .96    .45    .84
   2. Course requirements                      .8   4.19    .84    .66    .77
   3. Instr. preparation                       .2   4.40    .79    .70    .76
   4. Command of subject                       .3   4.56    .74    .58    .79
   5. Use of class time                        .8   4.31    .89    .69    .76
   Highest Intercorrelations: 3 with 5 = .71; 3 with 4 = .62; 4 with 5 = .57; 1 with 2 = .56
   Subscale with Overall Item = .85
B. Communication                                    4.38    .61           .89
   6. Clear lectures                          1.0   4.32    .86    .73    .87
   7. Command of English                      1.7   4.63    .64    .57    .89
   8. Willingness to listen                    .8   4.53    .76    .64    .88
   9. Relevant examples                        .7   4.45    .77    .74    .87
   10. Pace of course                          .7   4.11    .87    .67    .88
   11. Use of challenging questions           2.7   4.23    .86    .69    .88
   12. Enthusiasm                             1.0   4.50    .78    .67    .88
   13. Summarized points                       .7   4.25    .89    .69    .88
   Highest Intercorrelations: 8 with 9 = .63; 6 with 13 = .61; 6 with 9 = .60; 6 with 10 = .57
   Subscale with Overall Item = .89
C. Faculty/Student Interaction                      4.28    .68           .81
   14. Instructor helpfulness                  .3   4.34    .76    .69    .73
   15. Respectful to students                  .8   4.29    .95    .58    .79
   16. Concern with progress                   .7   4.23    .86    .69    .73
   17. Available for help                     7.9   4.23    .81    .56    .79
   Highest Intercorrelations: 14 with 16 = .59; 14 with 15 = .58; 16 with 17 = .57
   Subscale with Overall Item = .79
D. Assignments, Exams, Grading                      4.09    .72           .88
   18. Fair grader                            1.2   4.25    .86    .68    .86
   19. Exams clear                            4.9   3.89   1.04    .72    .86
   20. Inform student graded                  1.7   4.26    .84    .65    .87
   21. Helpful comments                       8.8   3.98   1.03    .74    .85
   22. Imp't concepts, exams                  5.2   4.08    .97    .77    .85
   23. Texts, readings                        7.3   4.07    .93    .51    .88
   24. Helpful assignments                    8.1   4.17    .88    .61    .87
   Highest Intercorrelations: 19 with 22 = .71; 21 with 22 = .66; 19 with 21 = .60; 20 with 21 = .59;
      18 with 21 = .59; 18 with 22 = .59
   Subscale with Overall Item = .85
E. Methods of Instruction                           4.09    .75           .89
   25. Problems for small groups             31.4   4.16    .77    .56    .89
   26. Projects for students                 61.0   3.97    .87    .64    .88
   27. Case studies, simul., etc.            57.1   4.19    .85    .76    .87
   28. Course journals/logs                  77.5   3.84   1.08    .95    .84
   29. Lab exercises                         68.8   4.09   1.03    .72    .87
   30. Term papers/projects                  36.3   4.13    .88    .66    .88
   31. Computers as aids                     77.5   3.95   1.11    .59    .89
   Highest Intercorrelations: 27 with 28 = .71; 28 with 29 = .78
   Subscale with Overall Item = .71
F. Course Outcomes                                  3.75    .75           .89
   32. I was challenged                       3.4   3.59    .95    .34    .92
   33. I learned great deal                   1.5   3.75    .97    .79    .87
   34. Instr. accomplished objectives         2.2   3.65    .85    .67    .88
   35. Course stimulated my interest          2.5   3.77   1.06    .76    .87
   36. Course think critically                3.2   3.84    .92    .76    .87
   37. Gained understanding of concepts       1.7   3.81    .92    .78    .87
   38. Value of course to me                  1.4   3.89   1.02    .79    .87
   Highest Intercorrelations: 35 with 38 = .76; 33 with 38 = .72; 37 with 38 = .72; 35 with 37 = .69;
      36 with 37 = .68; 33 with 37 = .67; 33 with 35 = .67; 35 with 36 = .67; 36 with 38 = .66
   Subscale with Overall Item = .80
G. Student Effort/Involvement                       3.53    .73           .66
   39. I studied and put effort               1.2   3.62    .95    .58    .43
   40. I was prepared                         1.4   3.39    .87    .62    .38
   41. I was interested at outset             1.4   3.57    .99    .28    .83
   Highest Intercorrelations: 39 with 40 = .71
   Subscale with Overall Item = .48
TABLE 3 Pre-Test Course Instructional Report Form A Factor Analysis (Varimax Rotation) Factor 1: Course Organization & Planning Factor Item # and Item (items renumbered consecutively) Loading 4. 3. 7. 2. 5. 8. 10. 18.
Instructor's command of subject matter Instructor was well prepared for each class Instructor's command of spoken English Instructor made course requirements and goals clear Instructor used class time well Instructor used relevant examples and illustrations Instructor raised challenging questions or problems Instructor was fair in evaluating and grading
46(4) 44(4)
83 77 76 71 71 71 56 50
Factor 2: Faculty/Student Interaction 15. 24. 13. 21. 14. 17. 23. 19. 22. 27. 16.
Instructor concerned with student progress Assignments helpful in understanding material Instructor actively helpful and responsive to students Instructor made helpful comments on exams/assignments Instructor was respectful to students I felt free to ask questions or express opinions The texts and supplementary readings were useful Exam questions were clear Exam emphasized important concepts Instructor accomplished course objectives Instructor readily available for help
55(3) 46(3) 52(1)
82 79 73 67 65 64 63 55 58 55 55
Factor 3: Course Outcomes 28. 31. 30. 37. 29. 38. 26. 22. 6.
The course stimulated my interest in subject The course was valuable to me I gained understanding of concepts/ principles Course actively involved students in learning Course helped me to think critically/ independently Case studies, simulations, etc. effective I learned a great deal Exams emphasized important concepts Instructor presented clear lectures
46(4) 44(5) 45(1) 58(2)
85 84 79 66 64 53 48 55 47
Factor 4: Course Workload, Difficulty, Pace 43. 44. 36. 40. 9. 20. 12.
Workload about right Level of difficulty about right Assigned projects effective I learned from Term papers/Projects Pace was about right Instructor informed student of grading Instructor emphasized or summarized
43(1) 48(1),43(2) 46(1)
84 62 58 59 56 55 51
Factor 5: Individualized Learning 39. 41. 42.
Lab exercises helpful Computers as aids to instruction Course journals or logs effective
90 85 75
Factor 6: Student Effort and Involvement 33. 34. 32.
I was prepared for each class I was interested in learning content I studied and put effort into course
74 71 69
Factor 7: Course Challenge 25. 35.
I was challenged by this course Thoughtful problems or questions used
41(3)
81 50
TABLE 4
Pre-Test
Course Instructional Report
Form B
Factor Analysis (Varimax Rotation) Factor 1: Student Learning Factor Item # and Item (items renumbered consecutively) Loading 33. 22. 18. 23. 1. 35. 6. 24. 19. 16. 20.
Rating of how much I learned Effectiveness of exams for important concepts Fairness in evaluation students Overall quality of texts/readings Usefulness or syllabus/outline Extent to which my interest increased Instructor's ability to present clear lecture Helpfulness of assignments Clarity of exam questions Instructor's concern for student progress Information given to students on grading
50(2) 42(5) 46(2) 59(2) 66(6)
75 75 72 70 69 66 63 61 58 59 53
Factor 2: Preparation/Responsiveness 3. 4. 5. 13. 15. 12. 14. 8. 16. 17.
Instructor's preparation for class Instructor's command of subject Instructor's use of class time Instructor emphasized/summarized Instructor's respect for students Instructor's enthusiasm Instructor's helpfulness/responsiveness Instructor's willingness to listen to students Instructor's concern for student progress Instructor's availability for extra help
46(1) 50(4) 46(4) 59(1)
82 81 81 79 69 67 65 60 59 56
Factor 3: Student Effort 40. 39. 36. 37. 38. 32. 34. 41.
Student preparation for class Student effort and studying Helped student think critically/independently Student gained understanding of concepts Overall value of course to student, Student challenged by course Course objectives accomplished Increase in interest in learning
56(1) 48(1) -57(7) 49(1) 40(5)
86 85 74 70 65 64 62 58
Factor 4: Collaborative Learning 25. 26. 11. 27.
Problems or questions for small groups Projects in which students work together Instructor's use of challenging question/problems Case studies, simulations, etc.
46(1) 53(4)
77 77 72 55
Factor 5: Individualized Learning 31. 28. 29. 30. 27. 21.
Computers as aids Course journals, logs Lab exercises Term papers, projects Case studies, simulations Instructor's comments on exams, etc.
53(4) 45(1),46(2)
79 71 68 55 55 53
Factor 6: Clarity: 9. 10. 20. 7. 2.
Instructor's use of examples/illustrations Pace of course material Information given to students on grading Instructor's command of English Instructor made course requirements clear
53(1) 44(2) 45(5)
78 66 66 64 48
*Factor 7: Course Difficulty and Workload 43. 42.
Workload Level of difficulty *Mid-point is most favorable response
81 80
TABLE 5
SIR II FACTOR ANALYSIS
FACTOR LOADINGS ON SIX SCALES
EQUAMAX ROTATION
N = 1200 classes

Item # and Item                                               Loadings

Factor 1, Scale C: Faculty/Student Interaction (4.70%)¹
   12. Respect for students                                     .75
   15. Willingness to listen to student questions/opinions      .73
   11. Helpfulness and responsiveness to students               .73
   13. Concern for student progress                             .71
   14. Availability for extra help                              .69
   (40) Overall evaluation                                     (.37)

Factor 2, Scale A: Organization and Planning (4.30%)
   2. Instructor preparation for each class                     .74
   4. Use of class time                                         .70
   5. Way of summarizing or emphasizing important points        .56
   1. Explanation of course requirements                        .53
   3. Command of the subject matter                             .52
      (Also .54 on Scale B, Communication)
   (40) Overall evaluation                                     (.49)

Factor 3, Scale D: Assignments, Exams and Grading (4.13%)
   17. Clarity of exam questions                                .71
   18. Exams' coverage of important aspects of course           .68
   19. Instructor's comments on assignments and exams           .61
   21. Helpfulness of assignments in understanding material     .57
   16. Information given to students on how graded              .51
   20. Overall quality of textbooks                             .39
   (40) Overall evaluation                                     (.33)

Factor 4, Scale F: Course Outcomes (4.11%)
   31. My interest in subject has increased                     .75
   32. Course helped me to think independently                  .68
   33. Course actively involved me in learning                  .65
   30. I made progress toward course objectives                 .64
   29. My learning increased                                    .61
   (40) Overall evaluation                                     (.42)

Factor 5, Scale G: Student Effort and Involvement (4.01%)
   34. I studied and put effort into the course                 .95
   36. I was challenged by this course                          .85
   35. I was prepared for each class                            .70
   (40) Overall evaluation                                     (.31)

Factor 6, Scale B: Communication (3.88%)
   7. Instructor's command of spoken English                    .64
   6. Ability to make clear presentations                       .61
   8. Use of examples or illustrations                          .55
   10. Enthusiasm for course material                           .51
   9. Use of challenging questions or problems                  .47
   (40) Overall evaluation                                     (.38)

¹ Variance explained by each factor of the total variance accounted for by the equamax rotation. The first six factors accounted for 88 per cent of the principal axis factor analysis.
TABLE 6
SIR II COEFFICIENT ALPHA RELIABILITY ANALYSIS
N = 1200 classes

Scale and Item                                    Corr. with   Alpha (with
                                                  total        item deleted)

A. Course Organization and Planning (Coefficient Alpha = .96)
   1. Explanation of course requirements             .89           .95
   2. Instr. preparation for class                   .92           .94
   3. Command of subject matter                      .86           .95
   4. Use of class time                              .87           .95
   5. Way of summarizing/emphasizing                 .91           .95
B. Communication (Coefficient Alpha = .94)
   6. Ability to make clear explanations             .90           .91
   7. Instr. command of spoken English               .67           .95
   8. Use of examples or illustrations               .91           .91
   9. Use of challenging questions/prob.             .86           .91
   10. Enthusiasm for course material                .83           .92
C. Faculty/Student Interaction (Coefficient Alpha = .98)
   11. Helpfulness/responsiveness to students        .96           .97
   12. Respect for students                          .94           .97
   13. Concern for student progress                  .94           .97
   14. Availability for extra help                   .91           .98
   15. Willingness to listen to students             .94           .97
D. Assignments, Exams and Grading (Coefficient Alpha = .93)
   16. Information given to students on grading      .78           .92
   17. Clarity of exam questions                     .83           .91
   18. Exams' coverage of important aspects          .90           .90
   19. Instr. comments on assign/exams               .89           .90
   20. Overall quality of textbooks                  .53           .95
   21. Helpfulness of assignments                    .87           .91
F. Course Outcomes (Coefficient Alpha = .97)
   29. My learning increased                         .93           .96
   30. I made progress toward course objectives      .94           .96
   31. My interest in subject increased              .92           .97
   32. Course helped me think independently          .94           .96
   33. Course actively involved me in learning       .89           .97
G. Student Effort and Involvement (Coefficient Alpha = .89)
   34. I studied and put effort in course            .91           .76
   35. I was prepared for each class                 .73           .92
   36. I was challenged by this course               .81           .86
TABLE 7
SIR-II ITEM RELIABILITY COEFFICIENTS FOR CLASSES WITH 10, 15, 20, AND 25 RESPONDENTS¹

                      n = 10        n = 15        n = 20        n = 25
Item No.                r     #       r     #       r     #       r     #

Scale name: Course Organization and Planning
A1                    0.59   58     0.78   51     0.89   34     0.92   24
A2                    0.69   55     0.83   54     0.85   31     0.91   24
A3                    0.68   58     0.85   52     0.82   34     0.86   24
A4                    0.49   57     0.76   47     0.91   36     0.91   25
A5                    0.72   53     0.76   52     0.91   35     0.92   24
Scale name: Communication
B6                    0.73   54     0.83   53     0.91   33     0.91   21
B7                    0.68   56     0.92   51     0.90   32     0.75   27
B8                    0.69   56     0.78   43     0.90   32     0.90   25
B9                    0.65   54     0.84   50     0.84   40     0.87   26
B10                   0.78   52     0.84   54     0.92   32     0.93   20
Scale name: Faculty/Student Interaction
C11                   0.76   55     0.78   50     0.91   32     0.92   21
C12                   0.70   57     0.75   51     0.92   34     0.91   22
C13                   0.67   57     0.84   53     0.91   34     0.91   28
C14                   0.72   56     0.75   60     0.87   34     0.81   17
C15                   0.72   57     0.81   54     0.92   35     0.89   25
Scale name: Assignments, Exams, and Grading
D16                   0.63   55     0.78   54     0.90   35     0.89   21
D17                   0.65   54     0.77   42     0.89   27     0.86   20
D18                   0.61   54     0.83   43     0.89   27     0.85   21
D19                   0.71   58     0.81   58     0.91   34     0.88   23
D20                   0.68   57     0.73   56     0.80   31     0.86   22
D21                   0.70   64     0.83   52     0.85   35     0.83   24
Scale name: Course Outcomes
F29                   0.74   58     0.81   55     0.74   35     0.90   24
F30                   0.70   56     0.79   57     0.74   36     0.87   27
F31                   0.77   60     0.83   54     0.78   38     0.90   27
F32                   0.71   60     0.77   53     0.77   32     0.86   26
F33                   0.79   59     0.83   55     0.81   35     0.91   29
Scale name: Student Effort and Involvement
G34                   0.58   60     0.78   59     0.75   33     0.73   24
G35                   0.45   58     0.68   61     0.61   37     0.67   26
G36                   0.69   60     0.85   59     0.83   33     0.87   26
Overall Evaluation    0.73   60     0.85   53     0.84   40     0.90   25

¹ r = intraclass correlation coefficient; # = number of classes with n students responding to the item.
TABLE 8
SIR-II ITEM AND SCALE TEST/RETEST RELIABILITY COEFFICIENTS
N = 42 CLASSES

Item No.      r

Scale A: Course Organization and Planning (Scale r = .92)
A1           .76
A2           .92
A3           .81
A4           .92
A5           .88

Scale B: Communication (Scale r = .94)
B6           .85
B7           .86
B8           .90
B9           .82
B10          .82

Scale C: Faculty/Student Interaction (Scale r = .94)
C11          .92
C12          .95
C13          .86
C14          .90
C15          .87

Scale D: Assignments, Exams, and Grading (Scale r = .94)
D16          .80
D17          .91
D18          .92
D19          .85
D20          .89
D21          .78

Scale F: Course Outcomes (Scale r = .91)
F29          .87
F30          .78
F31          .87
F32          .87
F33          .93

Scale G: Student Effort and Involvement (Scale r = .88)
G34          .87
G35          .80
G36          .88

Overall Evaluation (Item r = .90)
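A test/retest coefficient of the kind reported in Table 8 can be sketched as the correlation, across classes, between ratings from two administrations. The sketch below assumes that class means are correlated and that a scale score is the mean of its item means. The report does not confirm either detail, and the function names and values are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def retest_reliability(first, second):
    """Item-level and scale-level test/retest correlations.

    first, second: dicts mapping an item label to a list of class means,
    with classes in the same order in both administrations.
    """
    item_r = {item: pearson(first[item], second[item]) for item in first}
    n_classes = len(next(iter(first.values())))
    # Assumed scale score: the mean of the item means within each class.
    scale_first = [mean(first[i][c] for i in first) for c in range(n_classes)]
    scale_second = [mean(second[i][c] for i in first) for c in range(n_classes)]
    return item_r, pearson(scale_first, scale_second)


# Hypothetical class means for two items in four classes (not data from the report).
first = {"G34": [3.0, 3.5, 4.0, 4.5], "G35": [2.0, 3.0, 3.0, 4.0]}
second = {"G34": [3.2, 3.4, 4.1, 4.4], "G35": [2.0, 3.0, 3.0, 4.0]}
item_r, scale_r = retest_reliability(first, second)
print({item: round(r, 3) for item, r in item_r.items()}, round(scale_r, 3))
```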
STUDENT INSTRUCTIONAL REPORT II (SIR II)
Student Comments Section
This section gives you the opportunity to add comments about the course and the way it was taught. You may want to look at your responses to the statements within each of the categories in the questionnaire. You can expand on those responses or add additional information below.

A. Course Organization and Planning (Course requirements, use of class time)
B. Communication (Class presentations, instructor enthusiasm and ability to communicate)
C. Faculty/Student Interaction (Instructor availability and responsiveness to students)
D. Assignments, Exams, and Grading (Fairness and quality of exams and assignments; grading)
E. Supplementary Instructional Methods (Your reactions to any particular practices used in the course, such as small group discussions, group projects, labs, case studies)
F. Course Outcomes (What you did or did not get out of the course)
G. Student Effort and Involvement (Did you put forth sufficient time and effort for the course?)
H. Course Difficulty, Work Load, and Pace
I. Overall Evaluation
   (1) What did you like most or least?
   (2) How can the course or the way it was taught be improved?