The Development of the Student Instructional Report II

by

John A. Centra, Chair and Professor

Program in Higher Education, School of Education, Syracuse University

Higher Education Assessment

EDUCATIONAL TESTING SERVICE, ETS, the ETS logo, STUDENT INSTRUCTIONAL REPORT II, SIR II logo, and HIGHER EDUCATION ASSESSMENT are trademarks of Educational Testing Service. The modernized logo is a trademark of Educational Testing Service.

Copyright © 1998 by Educational Testing Service. All rights reserved.

Table of Contents

Overview
The Development of SIR II
    Research Background
    Construct Validity
Instrument Design
    SIR II Teaching Dimensions
    Questionnaire Format
    Response Format
Pretesting SIR II
    PreTest Item and Scale Statistics
    Factor Analysis of Forms A and B
    Rasch Analysis of Forms A and B
SIR II: Pilot Testing
    Factor Analysis of SIR II
    Reliability of SIR II
    Note on Validity
References

List of Tables
    Table 1. Course Instructional Report, Form A, Item and Scale Analysis
    Table 2. Course Instructional Report, Form B, Item and Scale Analysis
    Table 3. PreTest Course Instructional Report, Form A, Factor Analysis (Varimax Rotation)
    Table 4. PreTest Course Instructional Report, Form B, Factor Analysis (Varimax Rotation)
    Table 5. SIR II, Factor Analysis, Factor Loadings on Six Scales, Equamax Rotation
    Table 6. SIR II, Coefficient Alpha Reliability Analysis
    Table 7. SIR II, Item Reliability Coefficients for Classes with 10, 15, 20, and 25 Respondents
    Table 8. SIR II, Item and Scale Test/Retest Reliability Coefficients

List of Appendices
    A. Guidelines for the Use of Results of the Student Instructional Report (SIR/SIR II)
    B. Course Instructional Report (CIR), Form A
    C. Course Instructional Report (CIR), Form B
    D. Student Instructional Report II (SIR II)
    E. Student Instructional Report II (SIR II), Student Comments Section

OVERVIEW

The original Student Instructional Report (SIR), published in 1972, was based on what was

then known about effective college teaching and how students might contribute to its evaluation.

Since that time much has been learned about both effective teaching and its evaluation. The

development of SIR II, described in this report, was based on this knowledge.

Two new forms were developed and pretested in spring 1994. These forms included five of

the scales from the original SIR with questions (items) added or deleted. Three new scales or

dimensions were added: Course Outcomes, Student Effort and Involvement, and Methods of

Instruction (originally Supplementary Instructional Methods). These new scales reflected recent

emphases on measuring learning outcomes, promoting students' time on task and effort in their

learning, and encouraging active learning in the classroom. Each of the two pretested forms

included a different response format to the same set of items and scales. By having random halves

of students in 50 classes respond to the two forms, it was possible to determine which response

format was better.

The pretesting was conducted at 10 two- and four-year colleges. Traditional item and scale

analyses of the two forms included computing means, standard deviations, coefficient alphas, item-

to-scale correlations, and factor analyses. A Rasch analysis also compared the response categories

for the two forms to determine which provided better variation in student responses. Based on the

above analyses, one of the two forms was selected and items were further honed and altered.

The 40-item SIR II that evolved from the pretesting included eight categories of items and

an overall evaluation item. The items were grouped within each category or dimension. Students

answered most of the items about a course on a five-point “effectiveness” scale or a five-point

“compared to other courses” scale; both formats differ from those of the original SIR. A set of open-

ended questions that paralleled the eight SIR II categories was also developed to enable students to

add their comments for the instructor.

The pilot testing occurred at a variety of colleges from spring 1995 through spring 1996.

Course means and standard deviations were computed for each item and scale. A sample of the

data from the pilot testing was used to determine the reliability and construct validity of SIR II. The

three kinds of reliability computed established the internal consistency of the items within the scales

(coefficient alpha), the number of students needed for consistency of course results (intraclass

correlations), and the stability of responses over brief periods of time (test-retest). The factor

analysis indicated that the resulting factors matched perfectly with the expected or a priori scales for

SIR II.

THE DEVELOPMENT OF SIR II

Research Background

Research on student evaluations of teaching has mushroomed in the past 25 years, with

ERIC now containing well over 1,500 references. The vast majority of the findings from these

studies have supported the use of student evaluations for both teaching improvement and personnel

decisions, especially if users observe proper guidelines. A general discussion of these guidelines

appears in Reflective Faculty Evaluation (Centra, 1993) and, as they apply to both Student

Instructional Reports, in Appendix A.

The original SIR, published in 1972, was based on studies available at that time. In the

decade or so that followed, a series of studies on the reliability, validity, and utility of student

evaluations with the SIR found that:

• The reliability or consistency (i.e., an internal consistency measure) of mean

student ratings was good, particularly when based on more than 15 students

in a class. The ratings were also reasonably stable over short periods of time,

as measured by test-retest reliability. (SIR Reports #3 & 4)

• Validity studies indicated that student ratings generally evaluated some

aspect of teaching effectiveness. For example, in multiple section courses

instructors who received higher ratings tended to be teaching classes in which

students learned more (as measured by a final exam). Class size, subject area

of the course, and course type (major requirements, elective, etc.) were the only

characteristics that affected ratings, and those effects were relatively small.

(SIR Report #4)

• The usefulness of student ratings for instructional improvement was

demonstrated in a SIR study conducted at five colleges. Additional studies

have verified that teachers who wish to do so can use ratings and comments from

students to make positive changes in instruction. In personnel decisions,

student evaluations have increasingly been added to other sources of

information to judge teaching effectiveness more comprehensively. (SIR

Report #2)

• A more recent study indicated that SIR results were related to teaching

portfolio evaluations by peers, and that together these two sources can

provide a comprehensive review of teaching performance. (SIR Report #6)

For the original SIR, three criteria were used to select items:

(1) items that experts believed were most important to teaching and that had been

included in previous research,

(2) items that reflected areas of instruction that students were capable of

observing and judging, and

(3) items that faculty members believed would be most useful for instructional

improvement.

A factor analysis of the items included in the original instrument resulted in six item

clusters, or factors: Student/Teacher Relationship, Course Objectives and Organization, Lectures

(Communication), Course Difficulty and Workload, Course Examinations, and Reading

Assignments (SIR Report #3). These six factors, with some modifications in titles and item

wording, have been used to summarize SIR responses for users over the past 25 years.

Construct Validity

Research supports the view that teaching is indeed multidimensional, and that it is a

complex activity in which teachers may be effective or ineffective on different aspects or

dimensions. After devising a system for categorizing items, Feldman (1976) came up with a list of

21 categories that had been included in studies in which students identified characteristics of

superior college teachers. In addition to the six identified by the original SIR analysis, student self-

ratings of learning, teacher enthusiasm, and teacher personality characteristics were identified by

Feldman (other dimensions were simply more specific, such as “clarity” and “elocutionary skills”

instead of “communication”). The dimensions identified by Feldman were reviewed for items that

might be included for the revision of SIR. Also reviewed were characteristics of effective teaching

identified by faculty members, administrators, and alumni in various studies (Centra, Froh, Gray, &

Lambert, 1987). These qualities, not surprisingly, overlapped considerably with those identified by

students. The qualities include:

• Good organization of subject matter and course

• Effective communication

• Knowledge of and enthusiasm for the subject matter and teaching

• Positive attitude toward students

• Fairness in examinations and grading

• Flexibility in approaches to teaching

• Appropriate student learning outcomes (Centra et al., 1987)

The above qualities were discussed in a monograph written and published with a committee

at Syracuse University (Centra et al., 1987). The monograph was designed to help deans, department

chairs, and faculty members evaluate teaching performance. The first six qualities emphasize

appropriate teaching procedures, and the seventh points out the importance of appropriate and

purposeful student learning.

These seven categories, as the following discussion makes clear, were critical in the development of

SIR II.

INSTRUMENT DESIGN

SIR II Teaching Dimensions

The extensive research on the dimensions of effective teaching and the large number of

factor analyses of student ratings that duplicated several of these dimensions were the foundation

for the development of SIR II. Five of these dimensions, which were included in the original SIR, are

also used in SIR II: Course Organization and Planning; Communication; Faculty/Student

Interaction; Assignments, Exams, and Grading; and Course Difficulty, Workload, and Pace. While

the dimensions were the same, new items were added to reflect a broader or more current

interpretation of each scale. For example, the Communication dimension now includes the

instructor’s command of spoken English and his/her enthusiasm for the course material (as opposed

to a general personal enthusiasm which some rating forms include). The instructor's respect for

students became part of Faculty/Student Interaction, and an item on whether students were told how

they would be graded became part of Assignments, Exams, and Grading. Altogether, eight new

items were added to four of the five dimensions, while Course Difficulty, Workload and Pace

remained the same.

The major change in the first draft of the SIR II was the addition of three new areas:

Methods of Instruction (originally called Supplementary Instructional Methods), Student Effort and

Involvement, and Course Outcomes. Each of these dimensions addressed recent emphases in

college instruction and learning. Under Methods of Instruction, for example, a number of active

learning practices were listed. Active learning, as opposed to passive lecturing, has long been

known to facilitate student learning. For this reason, college teachers have been urged to use active

instructional methods in their courses (see, for example, the 1984 Involvement in Learning report

[Study Group, 1984]; also Bonwell & Eison, 1991).

Similarly, an educators’ conference at the Wingspread Conference Center in Racine,

Wisconsin, in 1986 resulted in the “Inventories of Good Practice in Undergraduate Education,”

which emphasized the need for active learning, as well as student time on task and other practices

(Chickering & Gamson, 1987). Time on task means that students need to spend the time and make

the commitment necessary to prepare for class and assignments. To underscore this need, the new

SIR included three items asking about student effort and involvement in the course.

The third area added, Course Outcomes, reflects still another emphasis in evaluation during

the past decade or so. Numerous higher education reports, including the two mentioned above,

discussed the need to focus on student learning. The various accrediting agencies have also called

on institutions to measure student learning and other outcomes of instruction in their self studies.

Items added to the SIR II assessed students’ ratings of their learning, their independent thinking,

and other broad outcomes that students attributed to the particular course they were evaluating.

Questionnaire Format

Because of the numerous factor analytic studies that previously established the various

dimensions of instruction, the items for each dimension were grouped together in the new SIR

rather than being scattered throughout the questionnaire. Doing this made the form easier and

quicker to complete. Factor analysis and other internal consistency analyses were later used to

validate the dimensions.

Response Format

The original SIR primarily used a four-point agree-disagree scale (except for a smaller set

of items, which used a five-point scale with ratings of poor to excellent). Data clearly indicated that

the four-point scale provided less discrimination between instructors than might be useful. That is,

even if students preferred to make finer distinctions in their ratings of instruction, they did not have

the opportunity because of the limited scale choices (strongly agree, agree, disagree, strongly

disagree). Therefore, two five-point scales, each with a different response format, were designed

and pretested for the new SIR at the experimental stage. Because users had become

accustomed to the agree-disagree format, this response option was repeated for one form, adding a

midpoint to make it a five-point scale. The midpoint allowed students a “neither” option, meaning

that they neither agreed nor disagreed with the statement.

A second form included the same items related to the course, but students were asked to

respond with an effectiveness rating, in particular the effectiveness in contributing “to your

learning in this course.” Specifically, the following options were presented:

In (1) Ineffective

SIn (2) Somewhat ineffective

SE (3) Somewhat effective

E (4) Effective

VE (5) Very effective

NA (6) Not applicable, not used, or don’t know

This second form also included different scale responses for the “course outcomes” and

“student effort and involvement” areas. In an effort to distribute responses, the scale asked students

to compare the course being rated to other courses using the following options:

ML (1) Much less than most courses

L (2) Less than most courses

S (3) About the same as others

MT (4) More than most courses

MM (5) Much more than most courses

NA (6) Not applicable, not used, or don’t know

PRETESTING SIR II

The two forms of the new SIR, which appear in Appendices B and C, were called the Course

Instructional Report (CIR), a name briefly considered as an alternative to SIR II. Form A of CIR

consisted of the five-point agree/disagree scale, while Form B included the effectiveness and other

scales. These forms were pretested in 50 classes at 10 two- and four-year colleges during spring

1994. Approximately 1,200 students participated in the pretest. Faculty volunteers at the

institutions administered the two forms to random halves of their classes, which allowed for

comparison of items and scale characteristics while at the same time controlling for the effects from

students, instructors, or courses. Thus, 50 paired comparisons with approximately the same number

of students responding to each form in each class could be analyzed to determine which was the

better form. About 600 students responded to Form A and a similar number responded to Form B.

Each instructor received a numerical summary of the students’ responses and was asked to provide

reactions to each of the forms. The comments, which came from about a third of the teachers, were

split between those who liked the agree/disagree format because they were accustomed to it, and

those who liked the rationale and information received from the effectiveness response format.
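
The random-halves design described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the procedure the faculty volunteers actually followed: the function name, the use of numeric student identifiers, and the fixed seed are all hypothetical.

```python
import random


def split_class_into_forms(student_ids, seed=None):
    """Randomly assign the students in one class to Form A or Form B.

    Returns a dict with two lists whose sizes differ by at most one.
    The seed is optional and only makes the illustration repeatable.
    """
    rng = random.Random(seed)
    shuffled = list(student_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"A": shuffled[:half], "B": shuffled[half:]}


# Hypothetical class of 25 students identified by the numbers 0 to 24.
groups = split_class_into_forms(range(25), seed=1)
print(len(groups["A"]), len(groups["B"]))
```

Because both halves come from the same class, differences between the forms cannot be attributed to differences in instructor or course.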

PreTest Item and Scale Statistics

Tables 1 and 2 include traditional item and scale analysis data for Forms A and B of the

“Course Instructional Report.” For Form A (agree/disagree format), the scale mean for Course

Organization and Planning was 4.40 and the Coefficient Alpha (the extent to which the items are

consistent, or intercorrelated, for the scale) was .79. By comparison, the same scale on Form B had

a mean of 4.29 and a Coefficient Alpha of .82. On the five-point scale for both forms, the lower

mean of 4.29 is preferable because it reflects a less positive bias and therefore a better spread of

scores above the mean. The slightly higher Coefficient Alpha on Form B indicates better

consistency among the item responses than on Form A. On both forms the usefulness of the course

syllabus or outline had the lowest correlation with total scale score and the highest percentage of

not applicable (NA) responses, suggesting that course syllabi were either not available for the

course or that some students did not know what a syllabus was. Because another item, “the

instructor’s explanation of course requirements,” overlapped the syllabus item and was apparently

clearer to students, the syllabus item was later dropped. Added to the Course Organization and

Planning scale for SIR II was “the instructor’s way of summarizing important points in class,” an

item that the factor analysis for Form B placed in that scale.
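
As an illustration of the Coefficient Alpha statistic discussed above, the following minimal Python sketch computes it from a students-by-items matrix of ratings using the standard formula, alpha = (k / (k - 1)) x (1 - sum of item variances / variance of total scores), where k is the number of items in the scale. The function name and the sample ratings are hypothetical and are not taken from the pretest data. The sketch also assumes that “not applicable” responses have already been excluded, which may differ from how the original analyses handled them.

```python
from statistics import variance


def coefficient_alpha(responses):
    """Return coefficient alpha for a list of student response rows.

    Each row holds one student's ratings on the k items of a single scale.
    """
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two students")
    # Sample variance of each item across students.
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of each student's total scale score.
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical five-point ratings from six students on a four-item scale.
ratings = [
    [5, 4, 5, 4],
    [4, 4, 4, 5],
    [3, 2, 3, 3],
    [5, 5, 4, 5],
    [2, 3, 2, 2],
    [4, 3, 4, 4],
]
print(round(coefficient_alpha(ratings), 2))
```

Values closer to 1 indicate that the items in a scale vary together across students, which is the sense in which the text calls the items consistent or intercorrelated.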

The Communication scale on the pretest included seven items on Form A and the same

seven items plus an item on “the instructor’s willingness to listen to student questions and

opinions” on Form B. The scale means were identical (4.38), but the Coefficient Alpha was

slightly higher on Form B (.89 vs .85 on Form A). On Form B, the two items with the highest item-

to-scale correlations were “the instructor’s ability to make clear and understandable presentations”

(.73), and “the instructor’s use of relevant examples or illustrations to clarify course material” (.74).

Both were highly intercorrelated with other items and would seem to be the crux of the

Communication scale. While the item on the pace of the course was consistent with other items in this

scale, it could logically be placed with the difficulty and workload items, as it ultimately was for

SIR II. By also changing the wording, instructors could learn whether the pace was fast, slow, or

“just about right,” thereby making the item more useful for instructional improvement.

The Faculty/Student Interaction scale included four identical items in Form A and Form B

plus a fifth item, on Form A only, dealing with students' perception of the instructor's receptivity to

their questions or opinions. After some rewording, this item was included in SIR II. The

mean scale score on Form B was lower than on Form A (4.28 vs 4.34), as were the means on three

of the four identical items. The Coefficient Alphas were identical at .81.

The fourth scale, Assignments, Exams, and Grading, included seven identical items, of

which six were retained for SIR II. Omitted was the instructor’s overall fairness in grading

students, which had one of the highest mean scores and which, relative to the other items, was

much more general in its wording. Form B again had the lower mean scale score (4.09 vs 4.16) and

a higher Coefficient Alpha (.88 vs .84). The items most central to this scale, judging from the item

intercorrelations, were the exams’ coverage of important aspects of the course, and the helpfulness

of comments on assignments and exams.

Methods of Instruction (Scale E on Form B and Scale G on Form A) included a variety of

practices that an instructor may have used in the course, such as assigned group projects, case

studies, and laboratory exercises. It is not surprising that these items received the highest

percentage of “not applicable” (not used) responses. For example, 77.5 percent of the students did

not think computers had been used as aids to instruction. The item with the most use, the active

involvement of students in what they were learning, received only a 9.7 percent “not applicable”

response on Form A (slightly more on Form B). This item was later moved to the Course

Outcomes scale on SIR II, a change supported by the factor analysis. Active involvement of

students in their learning can occur across many instructional methods and, moreover, can be

viewed as an ongoing and positive outcome of instruction. Therefore it also makes sense to include

this item on the Outcomes scale. Although scale means and Coefficient Alphas are included in

Tables 1 and 2, Methods of Instruction does not make sense as a scale score. Rather, it is the

individual responses to the methods actually used in the course that will be most useful to the

instructor. Although students may not totally agree on the methods used, the information can be

worthwhile for the faculty member. The fact that the standard deviations for these items were similar to

those for other items on the two forms suggests sufficient agreement among students in their responses.

The Course Outcomes scale included the same seven items on both forms, and once again

Form B had the lower mean (3.75 vs 4.22). The Coefficient Alphas for the two forms were

identical, .89. Three of the seven items were eventually either omitted or moved to other scales.

Moved to the Student Effort and Involvement Scale was the item concerning the extent to which

students thought they were challenged by the course (the low correlation with the total score, .34 on

Form B, supported this move). Omitted, in order to shorten the instrument, were the value of the

course to the student (too general), and the understanding of concepts and principles in the subject

area. Although the latter item might be useful in many courses, and thus instructors may want to

include it as an optional item, students would likely have trouble applying it to all courses. Both the

Course Outcomes and Student Effort and Involvement scales for Form B used a “compared to other

courses” response because it seemed more appropriate than an “effectiveness” response.

The Student Effort and Involvement scale included three items on both Forms A and B, one

of which was eliminated for SIR II and replaced by the “challenge” item. Eliminated was the extent

to which students were interested in learning the content when they enrolled. This item had a low

correlation with the total scale (.28 on Form B), and might better stand alone as a separate student

background item. In fact, when students are asked whether a course is a major requirement,

elective, or a college requirement, they are in part also reflecting their interest in taking the course.

No doubt, student interest in a course at the outset is an important influence in how they later view

instruction and their own efforts.

The final two items, Course Difficulty and Workload, were not scored on a linear scale

since the middle or “3” response on the five-point scale was most favorable. These items were

therefore omitted from the analysis. For SIR II the item on the pace of the course was added to

Course Difficulty and Workload for a three-item set; these same three items and response options

were also part of the original SIR.

The single overall evaluation item used a five-point poor to excellent scale on Form A (as

with the original SIR), and a five-point effectiveness rating scale on Form B, which had also been

used for most of the other items in the form. Having students use the same response scale for the

overall evaluation item as with other items on the form allowed them to apply the same standard.

Both overall evaluation items correlated highly with scale means on their respective forms (in the

.78 to .89 range), except for the Student Effort and Involvement scale. For this scale, the lower

correlations (.56 and .48) indicated that students generally saw the quality of instruction as only

moderately related to their own effort in the course.

The traditional analyses included in Tables 1 and 2 were useful for selecting and refining

items. The generally lower mean scale scores on Form B reflect a better discrimination among

instructors than those on Form A. The Coefficient Alphas for Form B were generally higher than

for Form A. The Rasch analysis, reported later, also supports the use of the Form B response

formats.

Factor Analysis of Forms A and B

Tables 3 and 4 include the results of the factor analyses of Forms A and B. For both

analyses a varimax rotation was used to help clarify factors. The items for each factor are listed in

order of their loading on the factor, and items that loaded on more than one factor are given on the

same line along with the factor number in parentheses. For Form A, five of the seven factors reflect

the categories of items included in the questionnaire: Course Organization and Planning (Factor 1);

Faculty/Student Interaction (Factor 2); Course Outcomes (Factor 3); Course Workload, Difficulty,

and Pace (Factor 4); and Student Effort and Involvement (Factor 6). Two of the factors tap

different categories: three items appear to reflect Individualized Learning, such as labs and

computers (Factor 5), and two items deal with Course Challenge (Factor 7). For Form B (Table

4), the first factor seemed to include many items related to learning and grading, and was termed

Student Learning. The second factor, Preparation/Responsiveness, included 10 items dealing with

the instructors’ preparation and presentation of subject matter as well as their responsiveness or

interaction with students. Three of the factors in Form B were similar to factors in Form A:

Student Effort (Factor 3); Individualized Learning (Factor 5); and Course Difficulty and Workload

(Factor 7). The fourth factor, Collaborative Learning, included those items from the Methods of

Instruction set related to group learning. Factor 6, Clarity, included five items that largely reflected

ratings of the instructor’s ability to clarify course material.

While there is some similarity in the factor structure for the two forms, there are also

significant differences caused by the different response formats. For example, the Student Effort,

Individualized Learning, and Course Difficulty and Workload factors were part of both factor

structures, while the other four factors differed. Both factor analyses were also used in eliminating

or changing items from the pretest set.

Rasch Analysis of Forms A and B

Rasch item parameter estimation was a useful supplement to the traditional item and scale

analyses presented thus far with Forms A and B. In particular, the Rasch analysis (1) supported the

choice of Form B (effectiveness response scale) over Form A (agree-disagree response scale), (2)

indicated a change in the effectiveness response scale, and (3) supported the selection or

elimination of items suggested by the traditional analyses.

The Rasch analysis compared the response categories of Form A and Form B to determine

which provided better variation in student responses (variance sources include items and students).

Rasch step calibrations indicate how much additional information is provided by moving from one

response category to the next along the continuum. If there is very little difference in the

calibrations, then very little additional information is being provided by the categories. This

became evident in Form B when the analysis indicated that the distinction between “Somewhat

Ineffective” and “Somewhat Effective” was so slight that very little additional information was

provided by the two responses. In other words, students appeared to be confusing the two

responses, probably because of the use of the word “somewhat” for both categories. For SIR II the

“Somewhat Effective” response was changed to “Moderately Effective,” while “Somewhat Ineffective”

was retained, in order to clarify the distinction and thus obtain better variation in student responses.
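
The role of step calibrations can be illustrated with a short Python sketch of category probabilities under a Rasch rating scale model, in which the probability of responding in category k is proportional to the exponential of the sum, over the first k steps, of (person measure - item location - step calibration). This is a sketch under stated assumptions only: the report does not specify the software or the exact parameterization used, and the function names and step values below are hypothetical rather than estimates from Forms A or B. It shows that when two adjacent step calibrations nearly coincide, the category lying between them is the most likely response over only a narrow range of person measures, so it adds little information.

```python
import math


def category_probabilities(theta, item_location, steps):
    """Rating scale model probabilities for categories 0..len(steps).

    theta is the person measure; steps are the step calibrations (in logits).
    """
    logits = [0.0]
    for tau in steps:
        # Each step adds (theta - item_location - tau) to the running sum.
        logits.append(logits[-1] + (theta - item_location - tau))
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def modal_category(theta, item_location, steps):
    """Index of the most probable response category."""
    probs = category_probabilities(theta, item_location, steps)
    return probs.index(max(probs))


# Hypothetical step calibrations for a five-category item (four steps).
well_spaced = [-2.0, -0.7, 0.7, 2.0]
crowded = [-2.0, -0.05, 0.05, 2.0]  # the second and third steps nearly coincide

# Count how many person measures on a grid make the middle category the mode.
grid = [x / 10 for x in range(-40, 41)]
for name, steps in (("well spaced", well_spaced), ("crowded", crowded)):
    count = sum(1 for theta in grid if modal_category(theta, 0.0, steps) == 2)
    print(name, count)
```

With the crowded steps, the middle category is the most likely response for far fewer grid points than with the well-spaced steps, which corresponds to the situation in which adjacent categories are hard for respondents to tell apart.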

Item response analysis of Forms A and B used a one-parameter logistic model. Based upon

the results of the analysis, the following conclusions were drawn for items in each scale:

(1) Organization and Planning

• Eliminate syllabus item on both forms.

• Change wording on “extent that course requirements and goals were made clear by the instructor.”

(2) Communication

• The pace item (#10 on B and #9 on A) should be moved to another scale or eliminated.

(3) Faculty/Student Interaction

• The first item (instructor helpfulness) is somewhat redundant with the “respect for students” item. (“Respect for students” item was changed.)

(4) Assignments, Exams, Grading

• Quality of texts item (#23) does not relate to other items.

• Exam effectiveness item (#22) seems redundant (wording was changed on both items, to “The exams’ coverage of important aspects of the course,” and “The overall quality of the textbooks”).

(5) Course Outcomes Scale (E on Form A and F on Form B)

• The first item (on “challenge”) doesn’t perform like others. (It was moved to the effort scale.)

• Items 2 and 6, “amount learned” and “gaining understanding of concepts,” seem redundant (item 6 was dropped).

(6) Student Effort and Involvement Scale

• The third question (student interest) doesn't fit with others. Coefficient Alpha analysis shows the same. (This item was moved to the Course Outcomes scale.)


SIR II: PILOT TESTING

The revised Student Instructional Report (SIR II) is included as Appendix C. The 40-item

form includes eight categories of items and an Overall Evaluation item. Four student background

questions, which instructors can use to interpret results, and which will be used in future research

with SIR II, are also listed. These are (1) course curriculum status (required in major, elective,

etc.); (2) student class level; (3) instructor's English ability; (4) student gender. Ten supplemental

questions that may be added by the instructor or the college to be machine scored are also part of

SIR II, as they were in the original SIR. Finally, students are invited to provide their own additional

comments about the course or instruction and to submit these to the teacher. Because of the

importance of these comments for the improvement of instruction, a separate open-ended form with

broad questions tied to each SIR II scale was developed (Appendix D).

SIR II was pilot tested at a variety of colleges during Spring semester 1995 through Spring

1996. While one purpose of the pilot testing was to build a comparison data pool to help interpret

SIR II responses, an equally important purpose was to conduct further reliability and validity

studies. A sample of classes and students was selected from the pilot data to conduct these studies,

reported next.

Factor Analysis of SIR II

Table 5 includes the results of the factor analysis of SIR II. Approximately 1,200 classes

were used for this analysis, with the unit of analysis being each class. The major purpose was to

determine the extent to which the factors duplicated the a priori scales in SIR II. As Table 5

indicates, the duplication was perfect; all of the predetermined scale items group together on the


expected factors. For example, the first factor, Faculty/Student Interaction, contains the same five

items that comprise the SIR II scale. All six factors matched the scales in SIR II with only a slight

reordering of items based on the factor loadings. Two sets of items, Instructional Methods and

Course Difficulty, Workload, and Pace, were not factor analyzed because the items were not

intended to be interpreted as a single scale score; each item in these categories should be interpreted

by itself.

The first six factors of the principal axis factor analysis accounted for 88 percent of the

variance. The equamax rotation equalizes, to some extent, the variance among the factors selected.

The scree plot of eigenvalues indicated that six factors provided the best solution, and each of these

six factors accounted for between 3.88 and 4.70 percent of the variance.

The factor loadings for the overall evaluation item (#40) on each scale ranged between .31

and .49. The highest loadings were on Course Organization and Planning (.49) and Course

Outcomes (.42), suggesting that the items in these scales were most highly related to the students'

ratings of the general effectiveness of the course in promoting learning. Not surprisingly, the

lowest loading for the overall evaluation item was on the Student Effort and Involvement scale

(.31), suggesting that students perceived their own effort as being somewhat more separate from

their rating of the course as a whole.

Reliability of SIR II

Three kinds of reliability analyses were conducted with the items and scales in SIR II. The

first was a Coefficient Alpha analysis of the scales to determine the extent to which the items in

each scale intercorrelate, or hang together. As the values in Table 6 indicate, the Coefficient Alphas

were uniformly high, ranging from .89 to .98. Thus the items within each of the scales are


consistently measuring a single dimension. Two of the items had lower, but still acceptable,

correlations with the total scale score. Item 7, “The Instructor’s command of spoken English,”

correlated .67 with the Communication Scale Score, and “The overall quality of textbooks”

correlated .53 with the Assignments, Exams, and Grading Scale.
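As a minimal sketch of the statistic, Coefficient Alpha for a scale of k items is k/(k − 1) times one minus the ratio of the summed item variances to the variance of the total score. The function name and the class-mean values below are hypothetical and are not SIR II data; the sketch only illustrates the standard formula with classes as the unit of analysis.

```python
from statistics import pvariance

def coefficient_alpha(item_scores):
    """Coefficient Alpha for one scale.

    item_scores: one list per item, each holding that item's score for
    every unit of analysis (here, class means) in the same order.
    """
    k = len(item_scores)
    # Total scale score for each class: sum across the items.
    totals = [sum(values) for values in zip(*item_scores)]
    item_variance_sum = sum(pvariance(scores) for scores in item_scores)
    return (k / (k - 1)) * (1 - item_variance_sum / pvariance(totals))

# Hypothetical class means on a three-item scale for five classes.
scale_items = [
    [4.1, 3.6, 4.5, 3.9, 4.3],
    [4.0, 3.5, 4.6, 3.8, 4.4],
    [4.2, 3.7, 4.4, 4.0, 4.2],
]
print(round(coefficient_alpha(scale_items), 3))
```

Items that rise and fall together across classes make the variance of the total large relative to the summed item variances, which pushes alpha toward 1.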

The second type of reliability was at the item level, an intraclass correlation which estimates

the extent of agreement among students on each item. Separate analyses were run for class sizes of

10, 15, 20, and 25. The coefficients, presented in Table 7, typically tend to increase with class size.

For item #1 in the Course Organization and Planning Scale, for example, the correlation coefficient

was .59 for a class size of 10 (N = 58 classes analyzed), .78 for a class size of 15 (N = 51), .89 for a

class size of 20 (N = 34), and .92 for a class size of 25 (N = 24). Intraclass correlations at or above

.90 are generally considered very good, and above .80 are considered adequate. For class size of 10

or fewer, the item reliabilities reported in Table 7 are relatively low; thus item means at these class

sizes will typically have a relatively low agreement level among students. For class size of at least

15, 26 of the 30 items analyzed were close to or above .80. The overall evaluation item had a

reliability of .85 for a class size of 15 and .90 for a class size of 25. The least reliable items were in

the Student Effort and Involvement scale, which would be expected to contain more individual

variation in responses. The reliability for “I was prepared for each class” (item #35), for example,

was under .70. On the other hand, in this same scale the extent to which students were challenged

by the course (item #36) had good reliability because it was related more to the course than to the

individual student.
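The tendency for item reliability to rise with class size can be sketched with the Spearman-Brown relationship between the reliability of a single rating and the reliability of a mean of n ratings. The report does not state which intraclass formula was used, so this is an illustrative assumption, and the single-rating value of 0.20 is hypothetical rather than an SIR II estimate.

```python
def reliability_of_class_mean(single_rating_icc, n_students):
    """Spearman-Brown step-up: reliability of the mean of n_students ratings."""
    r = single_rating_icc
    return n_students * r / (1 + (n_students - 1) * r)

# Hypothetical single-rating agreement of 0.20, stepped up to the
# class sizes analyzed in Table 7.
for n in (10, 15, 20, 25):
    print(n, round(reliability_of_class_mean(0.20, n), 2))
```

Under this assumption the reliability of an item mean grows quickly over small class sizes and then levels off, which is consistent with the low coefficients reported for classes of 10.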

The third kind of reliability, referred to as test-retest, measures the extent to which

responses for each class are stable over short periods of time. Slight fluctuations (i.e. high


correlations) would indicate that students’ responses are not subject to daily occurrences or moods,

and that they likely do represent students’ ratings of the items and broader dimensions of the course,

such as Course Organization and Planning. For this analysis, 42 classes with a total of 724 students

at a small liberal arts college were studied. The two administrations of SIR II occurred

approximately two weeks apart. Table 8 includes the Pearson product-moment correlations for

each of 30 items and six scale means in SIR II. The item correlations are generally above .80, with

nine at or above .90 and only three just below .80. Scale correlations are even higher: five of the

six are above .90, with one at .88. These uniformly high correlations indicate that the mean

student responses for these courses were stable over the two weeks. In other words, the relative

rankings of the course ratings did not change because of inconsistency or fluctuations in students’

responses over the time period.
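A test-retest coefficient of this kind is a Pearson product-moment correlation between the class means from the two administrations. The sketch below uses hypothetical class means for a single item; the function name and data are illustrative and are not the values behind Table 8.

```python
import math

def pearson_r(first, second):
    """Pearson product-moment correlation between two paired lists."""
    n = len(first)
    mean_x = sum(first) / n
    mean_y = sum(second) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(first, second))
    sxx = sum((x - mean_x) ** 2 for x in first)
    syy = sum((y - mean_y) ** 2 for y in second)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical class means for one item, two administrations two weeks apart.
first_administration = [4.2, 3.8, 4.6, 3.5, 4.0, 4.4]
second_administration = [4.1, 3.9, 4.5, 3.6, 4.1, 4.3]
print(round(pearson_r(first_administration, second_administration), 2))
```

Because the correlation depends on how classes are ordered relative to one another, a high value speaks to the stability of the relative rankings of courses, which is the interpretation given above.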


NOTE ON VALIDITY

A common definition of validity is the degree to which a test or instrument measures what it

is supposed to measure. The Standards for Educational and Psychological Testing (American

Educational Research Association, American Psychological Association, and National Council on

Measurement in Education, 1985, p. 9) further defines it as the “appropriateness, meaningfulness,

and usefulness of the specific inferences made from test scores” (or similar scores from an

instrument such as SIR II). A test or instrument is valid for a particular purpose or a particular

group. More specifically, it is the inferences made from an instrument that need to be validated.

For SIR II and other student rating instruments, validity refers to the inferences drawn about

instructional effectiveness based on the students’ responses.

Several validity studies were completed with the original SIR which would have relevance

for SIR II, particularly for the scales that overlapped both instruments. For example, in SIR Report

Number 2 the usefulness of the SIR ratings for improving teaching was demonstrated (see the

“Research Background” section of this report). Other studies completed are also relevant to a

discussion of SIR II validity. In addition, during the development of SIR II validity was addressed

several times, although more studies are needed. Perhaps this can best be put in context by

considering the different types of validity.

There are three types: content, criterion, and construct validity. Content validity is a

subjective estimate of the extent that the content of an instrument relates to whatever it is designed

to measure. In selecting items for SIR II, previous studies in which various constituents (teachers,

students, administrators) had identified characteristics of effective teaching were consulted. These


characteristics were then used as the basis for selecting items for SIR II that defined the various

dimensions of effective teaching (e.g. course organization, interaction with students, flexibility in

approaches to teaching, appropriate learning outcomes). Conferences and publications during

recent years had also emphasized the importance of some of these criteria for effective teaching, in

particular emphasizing active learning and learning outcomes. Thus the items and scales of SIR II

were designed to reflect the content of what many sources define as effective teaching.

Criterion validity is the extent that the scores from a test or instrument are related to one or

more outcome criteria. One such outcome criterion is student learning in a course. Instructors who

receive higher ratings from students should also be more successful in achieving learning outcomes.

With the original SIR, a study of several multiple section courses did demonstrate that learning

gains were related to the overall evaluation of the instructor as well as to some of the scale scores

(e.g. course organization, faculty-student interaction). That study, published as SIR Report #4,

would most likely apply to SIR II as well, particularly for the scales that are in common. Because

new scales have been added to SIR II, it remains for future studies to investigate the relationship

between those scales and appropriate criteria.

Construct validity is a more complicated concept: it evaluates the degree to which the

scores from an instrument correspond to other measures of the underlying theoretical trait. A

common practice has been to use factor analysis as one approach to studying construct validity, as

was done with both versions of SIR. After a set of scales and items was designed to reflect effective

teaching, factor analysis was used to determine whether the subsequent structure corresponded to

the a priori scales. As this report demonstrated, the factors produced closely duplicated the scales

designed. However, the scales did correlate significantly with each other, as has been typical of


other student rating forms, and this may reflect a response set by students. That is, students have a

tendency to rate good instructors as effective on all items and scales rather than differentiating their

performances. Nevertheless, students do differ enough in their responses to make the resulting item

and scale scores useful for formative and summative purposes.

Another type of evidence used for construct validity is the correlation of scores with other

external variables. Comparing these correlations with what should or should not be found

demonstrates a lack of bias and helps establish the existence of the theoretical construct (i.e.

effective teaching). With the original SIR, class size, subject area of the course, course type,

student gender, expected student grade, class level, and other variables were studied for their

possible relationship to student ratings (see SIR Report #4). The correlations were relatively small

except for those relationships that would be expected to be strong. For example, students who

expected to receive a higher grade in a course rated instruction higher, but this likely reflects a

desirable relationship: students who learn more and thus receive higher grades also rated

instruction as effective for them.

Future studies with SIR II will continue to examine the relationship between scale scores

and other variables. One study planned will examine student/teacher gender interaction within

separate disciplines. It will, for example, investigate such questions as whether male students rate

female teachers lower than males in such male-dominated fields as engineering or the physical

sciences; or whether female students rate female teachers higher than males in the nursing field.

Another study will investigate the learning outcomes scale for SIR II. How are other scales

in SIR II correlated with what students say they have learned? How do external variables modify

this relationship?


While future validity studies are planned with SIR II, it is nonetheless true that the studies

completed with the original SIR and with SIR II thus far support its construction and current use in

evaluating effective teaching.


References

American Educational Research Association, American Psychological Association, and National Council on Measurement in Education. The Standards for Educational and Psychological Testing. Washington, DC: American Psychological Association, 1985.

Bonwell, C. C., and Eison, J. A. Active Learning: Creating Excitement in the Classroom. ASHE-ERIC Higher Education Report, no. 1. Washington, DC: School of Education and Human Development, George Washington University, 1991.

Centra, J. A. The Student Instructional Report: Its Development and Uses. Student Instructional Report, no. 1. Princeton, NJ: Educational Testing Service, 1972.

Centra, J. A. Two Studies on the Utility of Student Ratings for Instructional Improvement. Student Instructional Report, no. 2. Princeton, NJ: Educational Testing Service, 1972b.

Centra, J. A. Item Reliabilities, the Factor Structure, Comparison with Alumni Ratings. Student Instructional Report, no. 3. Princeton, NJ: Educational Testing Service, 1973.

Centra, J. A. Two Studies on the Validity of the Student Instructional Report. Student Instructional Report, no. 4. Princeton, NJ: Educational Testing Service, 1976.

Centra, J. A. The Use of the Teaching Portfolio and the Student Instructional Report (SIR) for Summative Evaluation. Student Instructional Report, no. 6. Princeton, NJ: Educational Testing Service, 1992.

Centra, J. A. Reflective Faculty Evaluation: Enhancing Teaching and Determining Faculty Effectiveness. San Francisco: Jossey-Bass Publishers, 1993.

Centra, J. A., Froh, R. C., Gray, P. J., and Lambert, L. M. A Guide to Evaluating Teaching for Promotion and Tenure. Acton, Mass.: Copley Publishing Group, 1987.

Chickering, A. W., and Gamson, Z. "Seven Principles for Good Practice in Undergraduate Education." American Association for Higher Education Bulletin, 1987, 39, 3-7.

Feldman, K. A. "The Superior College Teacher from the Student's View." Research in Higher Education, 1976, 5, 243-288.

Feldman, K. A. "Class Size and College Students' Evaluations of Teachers and Courses: A Closer Look." Research in Higher Education, 1976, 21, 45-115.

Study Group on the Conditions of Excellence in American Higher Education. Involvement in Learning: Realizing the Potential of American Higher Education. Washington, DC: National Institute of Education/U.S. Dept. of Education, 1984.

TABLE 1

Course Instructional Report Form A: Item and Scale Analysis
(Items renumbered consecutively)

                                                             Corr with   Alpha
Scale and Items                          %NA   Mean   S.D.   Total       (Item delt'd)

A. Course Organization/Planning                4.40    .56                .79
    1. Syllabus/Outline                  1.8   4.19    .85    .40         .81
    2. Course requirements                .2   4.38    .70    .68         .72
    3. Instr. preparation                0.0   4.49    .71    .63         .73
    4. Command of subject                 .3   4.56    .67    .58         .75
    5. Use of class time                  .3   4.36    .82    .61         .74
   Highest Intercorrelations: 4 with 3 = .61; 5 with 3 = .58; 2 with 1 = .55; 2 with 3 = .50
   Subscale with Overall Item = .79

B. Communication                               4.38    .56                .85
    6. Clear Lectures                     .3   4.34    .80    .67         .81
    7. Command of English                1.5   4.61    .60    .56         .83
    8. Relevant examples                 1.5   4.54    .66    .66         .82
    9. Pace of course                     .2   4.07    .96    .53         .84
   10. Challenging questions             1.2   4.31    .77    .56         .83
   11. Enthusiasm                         .2   4.49    .73    .61         .82
   12. Summarized pts                     .8   4.38    .77    .68         .81
   Highest Intercorrelations: 6 with 12 = .59; 7 with 8 = .56; 6 with 8 = .54; 8 with 12 = .54
   Subscale with Overall Item = .84

C. Faculty/Student Interaction                 4.34    .60                .81
   13. Active helpful                     .2   4.46    .68    .69         .75
   14. Respectful to students             .2   4.29    .90    .52         .81
   15. Concern with progress              .3   4.29    .80    .64         .76
   16. Available for help                2.5   4.31    .74    .58         .78
   17. Feel free to question              .5   4.36    .80    .60         .78
   Highest Intercorrelations: 13 with 15 = .58; 13 with 17 = .52; 13 with 16 = .52
   Subscale with Overall Item = .78

D. Assignments, Exams, Grading                 4.16    .63                .84
   18. Fair grader                       1.0   4.29    .76    .63         .81
   19. Exams clear                       4.1   3.88   1.04    .67         .80
   20. Inform students graded             .7   4.39    .70    .61         .81
   21. Helpful comments                  3.8   4.02   1.01    .60         .81
   22. Important concepts, exams         3.8   4.19    .81    .67         .80
   23. Texts, readings                   6.4   4.08    .94    .44         .84
   24. Helpful assignments               8.3   4.24    .81    .57         .82
   Highest Intercorrelations: 19 with 22 = .60; 19 with 18 = .56; 18 with 20 = .55; 18 with 22 = .53; 21 with 22 = .50
   Subscale with Overall Item = .86

E. Course Outcomes                             4.22    .64                .89
   25. I was challenged                   .8   4.15    .85    .45         .90
   26. I learned a great deal             .3   4.28    .82    .75         .87
   27. Inst. accom objectives             .5   4.37    .68    .67         .88
   28. Course stimulated my interest     1.2   4.09    .96    .71         .87
   29. Course think critically            .5   4.13    .83    .70         .87
   30. Gained understanding of concepts   .7   4.29    .72    .79         .87
   31. Value of course to me              .7   4.24    .89    .77         .86
   Highest Intercorrelations: 30 with 31 = .73; 30 with 28 = .64; 30 with 26 = .68; 30 with 29 = .66; 30 with 27 = .63; 31 with 28 = .71; 31 with 26 = .62; 31 with 29 = .58; 31 with 27 = .56; 28 with 29 = .62; 26 with 27 = .61; 26 with 28 = .58
   Subscale with Overall Item = .85

F. Student Effort/Involvement                  4.03    .72                .67
   32. I studied and put effort           .7   4.06    .93    .52         .52
   33. I was prepared                     .7   3.97    .86    .55         .49
   34. I was interested at outset        1.7   4.06    .99    .38         .71
   Highest Intercorrelations: 32 with 33 = .55
   Subscale with Overall Item = .56

G. Methods of Instruction                      4.12    .71                .87
   35. Problems for small groups        39.1   4.18    .76    .66         .86
   36. Projects for students            52.1   4.03    .83    .65         .86
   37. Course actively inv. students     9.7   4.19    .82    .65         .86
   38. Case studies, simul., etc.       49.0   4.18    .84    .63         .86
   39. Lab exercises                    66.8   4.08    .91    .70         .85
   40. Term papers/projects             36.1   4.11    .86    .58         .86
   41. Computers as aids                69.1   3.91   1.00    .65         .86
   42. Course journals/logs             68.8   3.79   1.09    .59         .87
   Highest Intercorrelations: 39 with 41 = .66; 37 with 38 = .64; 35 with 38 = .55; 35 with 37 = .54
   Subscale with Overall Item = .71

TABLE 2

Course Instructional Report Form B: Item and Scale Analysis
(Items renumbered consecutively)

                                                             Corr with   Alpha
Scale and Items                          %NA   Mean   S.D.   Total       (Item delt'd)

A. Course Organ./Planning                      4.29    .65                .82
    1. Syllabus/Outline                  3.7   3.96    .96    .45         .84
    2. Course requirements                .8   4.19    .84    .66         .77
    3. Instr. preparation                 .2   4.40    .79    .70         .76
    4. Command of subject                 .3   4.56    .74    .58         .79
    5. Use of class time                  .8   4.31    .89    .69         .76
   Highest Intercorrelations: 3 with 5 = .71; 3 with 4 = .62; 4 with 5 = .57; 1 with 2 = .56
   Subscale with Overall Item = .85

B. Communication                               4.38    .61                .89
    6. Clear lectures                    1.0   4.32    .86    .73         .87
    7. Command of English                1.7   4.63    .64    .57         .89
    8. Willingness to listen              .8   4.53    .76    .64         .88
    9. Relevant examples                  .7   4.45    .77    .74         .87
   10. Pace of course                     .7   4.11    .87    .67         .88
   11. Use of challenging questions      2.7   4.23    .86    .69         .88
   12. Enthusiasm                        1.0   4.50    .78    .67         .88
   13. Summarized points                  .7   4.25    .89    .69         .88
   Highest Intercorrelations: 8 with 9 = .63; 6 with 13 = .61; 6 with 9 = .60; 6 with 10 = .57
   Subscale with Overall Item = .89

C. Faculty/Student Interaction                 4.28    .68                .81
   14. Instructor helpfulness             .3   4.34    .76    .69         .73
   15. Respectful to students             .8   4.29    .95    .58         .79
   16. Concern with progress              .7   4.23    .86    .69         .73
   17. Available for help                7.9   4.23    .81    .56         .79
   Highest Intercorrelations: 14 with 16 = .59; 14 with 15 = .58; 16 with 17 = .57
   Subscale with Overall Item = .79

D. Assignments, Exams, Grading                 4.09    .72                .88
   18. Fair grader                       1.2   4.25    .86    .68         .86
   19. Exams clear                       4.9   3.89   1.04    .72         .86
   20. Inform student graded             1.7   4.26    .84    .65         .87
   21. Helpful comments                  8.8   3.98   1.03    .74         .85
   22. Imp't concepts, exams             5.2   4.08    .97    .77         .85
   23. Texts, readings                   7.3   4.07    .93    .51         .88
   24. Helpful assignments               8.1   4.17    .88    .61         .87
   Highest Intercorrelations: 19 with 22 = .71; 21 with 22 = .66; 19 with 21 = .60; 20 with 21 = .59; 18 with 21 = .59; 18 with 22 = .59
   Subscale with Overall Item = .85

E. Methods of Instruction                      4.09    .75                .89
   25. Problems for small groups        31.4   4.16    .77    .56         .89
   26. Projects for students            61.0   3.97    .87    .64         .88
   27. Case studies, simul., etc.       57.1   4.19    .85    .76         .87
   28. Course journals/logs             77.5   3.84   1.08    .95         .84
   29. Lab exercises                    68.8   4.09   1.03    .72         .87
   30. Term papers/projects             36.3   4.13    .88    .66         .88
   31. Computers as aids                77.5   3.95   1.11    .59         .89
   Highest Intercorrelations: 27 with 28 = .71; 28 with 29 = .78
   Subscale with Overall Item = .71

F. Course Outcomes                             3.75    .75                .89
   32. I was challenged                  3.4   3.59    .95    .34         .92
   33. I learned great deal              1.5   3.75    .97    .79         .87
   34. Instr. accomplished objectives    2.2   3.65    .85    .67         .88
   35. Course stimulated my interest     2.5   3.77   1.06    .76         .87
   36. Course think critically           3.2   3.84    .92    .76         .87
   37. Gained understanding of concepts  1.7   3.81    .92    .78         .87
   38. Value of course to me             1.4   3.89   1.02    .79         .87
   Highest Intercorrelations: 35 with 38 = .76; 33 with 38 = .72; 37 with 38 = .72; 35 with 37 = .69; 36 with 37 = .68; 33 with 37 = .67; 33 with 35 = .67; 35 with 36 = .67; 36 with 38 = .66
   Subscale with Overall Item = .80

G. Student Effort/Involvement                  3.53    .73                .66
   39. I studied and put effort          1.2   3.62    .95    .58         .43
   40. I was prepared                    1.4   3.39    .87    .62         .38
   41. I was interested at outset        1.4   3.57    .99    .28         .83
   Highest Intercorrelations: 39 with 40 = .71
   Subscale with Overall Item = .48

TABLE 3

Pre-Test Course Instructional Report Form A: Factor Analysis (Varimax Rotation)
(Items renumbered consecutively)

                                                                    Factor
Item # and Item                                                     Loading

Factor 1: Course Organization & Planning
    4. Instructor's command of subject matter                         83
    3. Instructor was well prepared for each class                    77
    7. Instructor's command of spoken English                         76
    2. Instructor made course requirements and goals clear            71
    5. Instructor used class time well                                71
    8. Instructor used relevant examples and illustrations            71
   10. Instructor raised challenging questions or problems            56
   18. Instructor was fair in evaluating and grading                  50
   Loadings on other factors (factor number in parentheses): 46(4) 44(4)

Factor 2: Faculty/Student Interaction
   15. Instructor concerned with student progress                     82
   24. Assignments helpful in understanding material                  79
   13. Instructor actively helpful and responsive to students         73
   21. Instructor made helpful comments on exams/assignments          67
   14. Instructor was respectful to students                          65
   17. I felt free to ask questions or express opinions               64
   23. The texts and supplementary readings were useful               63
   19. Exam questions were clear                                      55
   22. Exam emphasized important concepts                             58
   27. Instructor accomplished course objectives                      55
   16. Instructor readily available for help                          55
   Loadings on other factors (factor number in parentheses): 55(3) 46(3) 52(1)

Factor 3: Course Outcomes
   28. The course stimulated my interest in subject                   85
   31. The course was valuable to me                                  84
   30. I gained understanding of concepts/principles                  79
   37. Course actively involved students in learning                  66
   29. Course helped me to think critically/independently             64
   38. Case studies, simulations, etc. effective                      53
   26. I learned a great deal                                         48
   22. Exams emphasized important concepts                            55
    6. Instructor presented clear lectures                            47
   Loadings on other factors (factor number in parentheses): 46(4) 44(5) 45(1) 58(2)

Factor 4: Course Workload, Difficulty, Pace
   43. Workload about right                                           84
   44. Level of difficulty about right                                62
   36. Assigned projects effective                                    58
   40. I learned from Term papers/Projects                            59
    9. Pace was about right                                           56
   20. Instructor informed student of grading                         55
   12. Instructor emphasized or summarized                            51
   Loadings on other factors (factor number in parentheses): 43(1) 48(1),43(2) 46(1)

Factor 5: Individualized Learning
   39. Lab exercises helpful                                          90
   41. Computers as aids to instruction                               85
   42. Course journals or logs effective                              75

Factor 6: Student Effort and Involvement
   33. I was prepared for each class                                  74
   34. I was interested in learning content                           71
   32. I studied and put effort into course                           69

Factor 7: Course Challenge
   25. I was challenged by this course                                81
   35. Thoughtful problems or questions used                          50
   Loadings on other factors (factor number in parentheses): 41(3)

TABLE 4

Pre-Test Form B: Factor Analysis (Varimax Rotation)
(Items renumbered consecutively)

                                                                    Factor
Item # and Item                                                     Loading

Factor 1: Student Learning
   33. Rating of how much I learned                                   75
   22. Effectiveness of exams for important concepts                  75
   18. Fairness in evaluating students                                72
   23. Overall quality of texts/readings                              70
    1. Usefulness of syllabus/outline                                 69
   35. Extent to which my interest increased                          66
    6. Instructor's ability to present clear lecture                  63
   24. Helpfulness of assignments                                     61
   19. Clarity of exam questions                                      58
   16. Instructor's concern for student progress                      59
   20. Information given to students on grading                       53
   Loadings on other factors (factor number in parentheses): 50(2) 42(5) 46(2) 59(2) 66(6)

Factor 2: Preparation/Responsiveness
    3. Instructor's preparation for class                             82
    4. Instructor's command of subject                                81
    5. Instructor's use of class time                                 81
   13. Instructor emphasized/summarized                               79
   15. Instructor's respect for students                              69
   12. Instructor's enthusiasm                                        67
   14. Instructor's helpfulness/responsiveness                        65
    8. Instructor's willingness to listen to students                 60
   16. Instructor's concern for student progress                      59
   17. Instructor's availability for extra help                       56
   Loadings on other factors (factor number in parentheses): 46(1) 50(4) 46(4) 59(1)

Factor 3: Student Effort
   40. Student preparation for class                                  86
   39. Student effort and studying                                    85
   36. Helped student think critically/independently                  74
   37. Student gained understanding of concepts                       70
   38. Overall value of course to student                             65
   32. Student challenged by course                                   64
   34. Course objectives accomplished                                 62
   41. Increase in interest in learning                               58
   Loadings on other factors (factor number in parentheses): 56(1) 48(1) -57(7) 49(1) 40(5)

Factor 4: Collaborative Learning
   25. Problems or questions for small groups                         77
   26. Projects in which students work together                       77
   11. Instructor's use of challenging question/problems              72
   27. Case studies, simulations, etc.                                55
   Loadings on other factors (factor number in parentheses): 46(1) 53(4)

Factor 5: Individualized Learning
   31. Computers as aids                                              79
   28. Course journals, logs                                          71
   29. Lab exercises                                                  68
   30. Term papers, projects                                          55
   27. Case studies, simulations                                      55
   21. Instructor's comments on exams, etc.                           53
   Loadings on other factors (factor number in parentheses): 53(4) 45(1),46(2)

Factor 6: Clarity
    9. Instructor's use of examples/illustrations                     78
   10. Pace of course material                                        66
   20. Information given to students on grading                       66
    7. Instructor's command of English                                64
    2. Instructor made course requirements clear                      48
   Loadings on other factors (factor number in parentheses): 53(1) 44(2) 45(5)

*Factor 7: Course Difficulty and Workload
   43. Workload                                                       81
   42. Level of difficulty                                            80

*Mid-point is most favorable response

TABLE 5

SIR II FACTOR ANALYSIS
FACTOR LOADINGS ON SIX SCALES
EQUAMAX ROTATION
N = 1200 classes

Item #  Item                                                        Loadings

Factor 1, Scale C: Faculty/Student Interaction (4.70%)¹
   12.  Respect for students                                          .75
   15.  Willingness to listen to student questions/opinions           .73
   11.  Helpfulness and responsiveness to students                    .73
   13.  Concern for student progress                                  .71
   14.  Availability for extra help                                   .69
  (40)  Overall evaluation                                           (.37)

Factor 2, Scale A: Organization and Planning (4.30%)
    2.  Instructor preparation for each class                         .74
    4.  Use of class time                                             .70
    5.  Way of summarizing or emphasizing important points            .56
    1.  Explanation of course requirements                            .53
    3.  Command of the subject matter                                 .52
        (Also .54 on Scale B, Communication)
  (40)  Overall evaluation                                           (.49)

Factor 3, Scale D: Assignments, Exams and Grading (4.13%)
   17.  Clarity of exam questions                                     .71
   18.  Exams' coverage of important aspects of course                .68
   19.  Instructor's comments on assignments and exams                .61
   21.  Helpfulness of assignments in understanding material          .57
   16.  Information given to students on how graded                   .51
   20.  Overall quality of textbooks                                  .39
  (40)  Overall evaluation                                           (.33)

Factor 4, Scale F: Course Outcomes (4.11%)
   31.  My interest in subject has increased                          .75
   32.  Course helped me to think independently                       .68
   33.  Course actively involved me in learning                       .65
   30.  I made progress toward course objectives                      .64
   29.  My learning increased                                         .61
  (40)  Overall evaluation                                           (.42)

Factor 5, Scale G: Student Effort and Involvement (4.01%)
   34.  I studied and put effort into the course                      .95
   36.  I was challenged by this course                               .85
   35.  I was prepared for each class                                 .70
  (40)  Overall evaluation                                           (.31)

Factor 6, Scale B: Communication (3.88%)
    7.  Instructor's command of spoken English                        .64
    6.  Ability to make clear presentations                           .61
    8.  Use of examples or illustrations                              .55
   10.  Enthusiasm for course material                                .51
    9.  Use of challenging questions or problems                      .47
  (40)  Overall evaluation                                           (.38)

¹ Variance explained by each factor of the total variance accounted for by the equamax rotation. The first six factors accounted for 88 per cent of the principal axis factor analysis.

TABLE 6
SIR II COEFFICIENT ALPHA RELIABILITY ANALYSIS
N = 1200 classes

Scale and Item                                      Corr. with total   Alpha (with item deleted)

A.  Course Organization and Planning (Coefficient Alpha = .96)
 1. Explanation of course requirements                    .89                  .95
 2. Instr. preparation for class                          .92                  .94
 3. Command of subject matter                             .86                  .95
 4. Use of class time                                     .87                  .95
 5. Way of summarizing/emphasizing                        .91                  .95

B.  Communication (Coefficient Alpha = .94)
 6. Ability to make clear explanations                    .90                  .91
 7. Instr. command of spoken English                      .67                  .95
 8. Use of examples or illustrations                      .91                  .91
 9. Use of challenging questions/prob.                    .86                  .91
10. Enthusiasm for course material                        .83                  .92

C.  Faculty/Student Interaction (Coefficient Alpha = .98)
11. Helpfulness/responsiveness to students                .96                  .97
12. Respect for students                                  .94                  .97
13. Concern for student progress                          .94                  .97
14. Availability for extra help                           .91                  .98
15. Willingness to listen to students                     .94                  .97

D.  Assignments, Exams and Grading (Coefficient Alpha = .93)
16. Information given to students on grading              .78                  .92
17. Clarity of exam questions                             .83                  .91
18. Exams' coverage of important aspects                  .90                  .90
19. Instr. comments on assign/exams                       .89                  .90
20. Overall quality of textbooks                          .53                  .95
21. Helpfulness of assignments                            .87                  .91

F.  Course Outcomes (Coefficient Alpha = .97)
29. My learning increased                                 .93                  .96
30. I made progress toward course objectives              .94                  .96
31. My interest in subject increased                      .92                  .97
32. Course helped me think independently                  .94                  .96
33. Course actively involved me in learning               .89                  .97

G.  Student Effort and Involvement (Coefficient Alpha = .89)
34. I studied and put effort in course                    .91                  .76
35. I was prepared for each class                         .73                  .92
36. I was challenged by this course                       .81                  .86
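The three statistics reported in Table 6 can be illustrated with a short sketch. It assumes each row of data holds one class's mean ratings on the items of a single scale (the table reports N = 1200 classes), and it uses the standard coefficient alpha formula. The function names and sample data are hypothetical, and the report does not say whether "Corr. with total" excludes the item from the total; this sketch correlates each item with the full scale total.

```python
from statistics import mean, variance


def cronbach_alpha(rows):
    """Coefficient alpha for rows of item scores (one row per class)."""
    k = len(rows[0])
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def alpha_if_deleted(rows, j):
    """Coefficient alpha recomputed after dropping item j."""
    reduced = [[x for i, x in enumerate(r) if i != j] for r in rows]
    return cronbach_alpha(reduced)


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def item_total_corr(rows, j):
    """Correlation of item j with the scale total (item included; an assumption)."""
    return pearson([r[j] for r in rows], [sum(r) for r in rows])


if __name__ == "__main__":
    # Hypothetical class means on a three-item scale, for illustration only.
    sample = [[4.1, 3.9, 4.0], [3.2, 3.5, 3.1], [4.6, 4.4, 4.7], [2.8, 3.0, 2.9]]
    print("alpha:", round(cronbach_alpha(sample), 3))
    for j in range(3):
        print(j, round(item_total_corr(sample, j), 3),
              round(alpha_if_deleted(sample, j), 3))
```

An item whose deletion raises alpha above the scale value, as with items 7 and 20 in Table 6, is less consistent with the rest of its scale than the other items.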

TABLE 7
SIR-II ITEM RELIABILITY COEFFICIENTS FOR CLASSES WITH 10, 15, 20, AND 25 RESPONDENTS¹

                        n = 10       n = 15       n = 20       n = 25
Item No.                r     #      r     #      r     #      r     #

Scale name: Course Organization and Planning
A1                     0.59   58    0.78   51    0.89   34    0.92   24
A2                     0.69   55    0.83   54    0.85   31    0.91   24
A3                     0.68   58    0.85   52    0.82   34    0.86   24
A4                     0.49   57    0.76   47    0.91   36    0.91   25
A5                     0.72   53    0.76   52    0.91   35    0.92   24

Scale name: Communication
B6                     0.73   54    0.83   53    0.91   33    0.91   21
B7                     0.68   56    0.92   51    0.90   32    0.75   27
B8                     0.69   56    0.78   43    0.90   32    0.90   25
B9                     0.65   54    0.84   50    0.84   40    0.87   26
B10                    0.78   52    0.84   54    0.92   32    0.93   20

Scale name: Faculty/Student Interaction
C11                    0.76   55    0.78   50    0.91   32    0.92   21
C12                    0.70   57    0.75   51    0.92   34    0.91   22
C13                    0.67   57    0.84   53    0.91   34    0.91   28
C14                    0.72   56    0.75   60    0.87   34    0.81   17
C15                    0.72   57    0.81   54    0.92   35    0.89   25

Scale name: Assignments, Exams, and Grading
D16                    0.63   55    0.78   54    0.90   35    0.89   21
D17                    0.65   54    0.77   42    0.89   27    0.86   20
D18                    0.61   54    0.83   43    0.89   27    0.85   21
D19                    0.71   58    0.81   58    0.91   34    0.88   23
D20                    0.68   57    0.73   56    0.80   31    0.86   22
D21                    0.70   64    0.83   52    0.85   35    0.83   24

Scale name: Course Outcomes
F29                    0.74   58    0.81   55    0.74   35    0.90   24
F30                    0.70   56    0.79   57    0.74   36    0.87   27
F31                    0.77   60    0.83   54    0.78   38    0.90   27
F32                    0.71   60    0.77   53    0.77   32    0.86   26
F33                    0.79   59    0.83   55    0.81   35    0.91   29

Scale name: Student Effort and Involvement
G34                    0.58   60    0.78   59    0.75   33    0.73   24
G35                    0.45   58    0.68   61    0.61   37    0.67   26
G36                    0.69   60    0.85   59    0.83   33    0.87   26

Overall Evaluation     0.73   60    0.85   53    0.84   40    0.90   25

¹ r = intraclass correlation coefficient; # = number of classes with n students responding to the item.
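Table 7 reports intraclass correlation coefficients but does not state which form of the coefficient was computed. As a hedged illustration only, the sketch below derives two common forms from a one-way analysis of variance: the reliability of a single student's rating and the reliability of the class mean based on n students. The function name and the sample ratings are hypothetical, and the sketch assumes every class has the same number of respondents.

```python
from statistics import mean


def one_way_icc(classes):
    """Return (single-rating ICC, class-mean ICC) for equal-sized classes.

    classes: list of lists, each inner list holding one class's ratings
    on a single item. Equal class size is an assumption of this sketch.
    """
    n = len(classes[0])
    if any(len(c) != n for c in classes):
        raise ValueError("this sketch assumes equal numbers of respondents")
    g = len(classes)
    grand = mean(x for c in classes for x in c)
    class_means = [mean(c) for c in classes]
    # Mean square between classes and mean square within classes.
    ms_between = n * sum((m - grand) ** 2 for m in class_means) / (g - 1)
    ms_within = sum(
        (x - m) ** 2 for c, m in zip(classes, class_means) for x in c
    ) / (g * (n - 1))
    icc_single = (ms_between - ms_within) / (ms_between + (n - 1) * ms_within)
    icc_mean = (ms_between - ms_within) / ms_between
    return icc_single, icc_mean


if __name__ == "__main__":
    # Hypothetical ratings from three classes of three students each.
    ratings = [[4, 5, 4], [2, 3, 2], [5, 5, 4]]
    single, class_mean = one_way_icc(ratings)
    print(round(single, 3), round(class_mean, 3))
```

The class-mean form increases with the number of respondents, which is consistent with the generally higher coefficients in the n = 20 and n = 25 columns than in the n = 10 column.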

TABLE 8
SIR-II ITEM AND SCALE TEST/RETEST RELIABILITY COEFFICIENTS
N = 42 CLASSES

Scale A: Course Organization and Planning (Scale r = .92)
  A1    .76
  A2    .92
  A3    .81
  A4    .92
  A5    .88

Scale B: Communication (Scale r = .94)
  B6    .85
  B7    .86
  B8    .90
  B9    .82
  B10   .82

Scale C: Faculty/Student Interaction (Scale r = .94)
  C11   .92
  C12   .95
  C13   .86
  C14   .90
  C15   .87

Scale D: Assignments, Exams, and Grading (Scale r = .94)
  D16   .80
  D17   .91
  D18   .92
  D19   .85
  D20   .89
  D21   .78

Scale F: Course Outcomes (Scale r = .91)
  F29   .87
  F30   .78
  F31   .87
  F32   .87
  F33   .93

Scale G: Student Effort and Involvement (Scale r = .88)
  G34   .87
  G35   .80
  G36   .88

Overall Evaluation (Item r = .90)

STUDENT INSTRUCTIONAL REPORT II (SIR II)

Student Comments Section

This section gives you the opportunity to add comments about the course and the way it was taught. You may want to look at your responses to the statements within each of the categories in the questionnaire. You can expand on those responses or add additional information below.

A. Course Organization and Planning (Course requirements, use of class time)
B. Communication (Class presentations, instructor enthusiasm and ability to communicate)
C. Faculty/Student Interaction (Instructor availability and responsiveness to students)
D. Assignments, Exams, and Grading (Fairness and quality of exams and assignments; grading)
E. Supplementary Instructional Methods (Your reactions to any particular practices used in the course, such as small group discussions, group projects, labs, case studies)
F. Course Outcomes (What you did or did not get out of the course)
G. Student Effort and Involvement (Did you put forth sufficient time and effort for the course?)
H. Course Difficulty, Work Load, and Pace
I. Overall Evaluation (1) What did you like most or least? (2) How can the course or the way it was taught be improved?
