Methods for Studying Leadership





John Antonakis

Chester A. Schriesheim

Jacqueline A. Donovan

Kishore Gopalakrishna-Pillai

Ekin K. Pellegrini

Jeanne L. Rossomme

Understanding and conducting leadership research requires knowledge of research methodology. In this chapter, we provide an overview of important elements relating to the use of research methods and procedures for studying and making inferences about leadership. We anticipate that our chapter will provide readers with useful pointers for conducting and for assessing the quality of leadership research. Apart from a basic review, we also cover some advanced topics (e.g., use of levels of analysis, structural-equation modeling) that should be interesting to researchers and graduate students. Where possible, we have attempted to write the chapter using mostly nontechnical and nonstatistical terminology so that its contents are accessible to a comparatively large audience.


CHAPTER 3



In the sections that follow, first we define science and differentiate it from intuition. Then we discuss the nature of theory and how it is tested and developed. Next, we discuss methods of research, differentiating between quantitative and qualitative modes of inquiry. We also discuss the major categories of research, including experimental and non-experimental methods. Finally, we discuss potential boundary conditions of theories and how boundaries can be detected.

Our Knowledge of the World: Science Versus Intuition

As readers have ascertained from the first two chapters of this book, leadership is a complex concept; however, its complexity does not render it immune from scientific study. To appreciate or conduct research in the field of leadership, one must first understand some fundamentals about science, scientific theories, and the way in which theories are tested.

Science differs from common knowledge in that it is systematic and controlled (Kerlinger & Lee, 2000). Scientists first build theories, then conduct research in a systematic fashion that subjects their theories to controlled empirical testing. Lay or nonscientific persons tend to use theory loosely and to accept theories or explanations that have not been subjected to rigorous testing. Scientists also use controls in an effort to isolate relationships between variables so as to uncover causality and ensure that other presumed causes, as well as chance, have been ruled out. In contrast, lay people tend to accept causal explanations that may be based merely on associations—whether putative, spurious, or actual—between variables. Scientists also discard untestable explanations of phenomena, whereas nonscientists may accept such explanations (Kerlinger & Lee, 2000).
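The gap between association and causation can be made concrete with a short simulation (Python; the variables and numbers are hypothetical, not drawn from the chapter): two variables that never influence each other still correlate strongly when they share a common cause.

```python
import random
import statistics

def pearson_r(x, y):
    """Plain Pearson correlation, with no external libraries."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

random.seed(42)
n = 5000
# Common cause (hypothetical), e.g., organizational resources
z = [random.gauss(0, 1) for _ in range(n)]
# Neither x nor y affects the other; both depend on z plus noise
x = [zi + random.gauss(0, 1) for zi in z]   # e.g., perceived leader support
y = [zi + random.gauss(0, 1) for zi in z]   # e.g., team performance

r_spurious = pearson_r(x, y)   # substantial, despite no causal link
```

An intuitive psychologist observing only x and y would "see" a causal relation; the control a scientist would apply here is to measure and hold constant the common cause z.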

In our day-to-day lives, we are all “commonsense” or “intuitive” psychologists in the sense that, by using data from personal observations or secondary sources, we try to figure out the world and people around us (L. Ross, 1977). In our interactions with the world, we try to make sense of events or causes of outcomes and often use explanations that are intuitively appealing. We all have “theories” about how the world works and “test” these theories daily through our observations. Sometimes our theories are confirmed, either because our theories are correct, because we have created a self-fulfilling prophecy, or perhaps even as a result of chance; other times, we process information consistent with our theories (see S. T. Fiske, 1995; Merton, 1948b). As intuitive psychologists, however, we often are wrong (L. Ross, 1977; Tversky & Kahneman, 1974). We probably are more likely than others to fall prey to our false beliefs in trying to understand the phenomenon of leadership because many of us, as followers or leaders, have experienced leadership first hand. Thus, it is important that we establish our knowledge of the world using the scientific method.

Readers of this book probably would not trust a “witch doctor”—offering a concoction of crushed rhinoceros nose, bat claws, and frog eyes, mixed in python blood—to cure a deadly disease, no matter how much experience or how many satisfied clients the witch doctor may have or claims to have. Readers would rightly




expect that a medical treatment has been subjected to rigorous scientific tests to determine whether the treatment is safe and efficacious in fighting a particular illness.

Similarly, one would expect that readers, as users of leadership theory or as consumers of leadership training programs, self-help guides, books, and the like, would select models and methods whose validity has been established. Unfortunately, consumers of leadership products cannot turn to an authority—akin, for example, to the Food and Drug Administration—to determine whether a particular product is “safe for consumption” (i.e., valid). Whether intentionally or unintentionally, producers of leadership products may cut a few corners here or there knowing that there is no administrative body that keeps a watchful eye on what they peddle to unsuspecting consumers. Thus, we believe that the need to understand the methods and procedures used in leadership research is imperative not only for those who study leadership, but for those who consume its products as well.

The Scientific Method

Slife and Williams (1995) stated that the goal of science is to “establish with some authority the causes of events, and provide an understanding of phenomena that is objective and uncontaminated by traditions and subjective speculations” (p. 173). The scientific approach involves four steps (Kerlinger & Lee, 2000). The first is the expression of a problem, obstacle, or idea (e.g., why do some leaders motivate followers better than do other leaders?). In this step, the idea is not refined; it may be vague or based on unscientific “hunches.” The second step is the development of conjectural statements about the relationship between two or more phenomena (e.g., behavioral leadership style is associated with motivation for reason X). The next step is the reasoning or deduction step. During this step, the consequences of the conjectural statements are developed. Specifically, the types and results of hypothesized relationships between variables are speculated (e.g., democratic leadership is positively associated with motivation, whereas autocratic leadership is negatively associated with motivation, under conditions Y). The last step includes observation, testing, and experimentation to put the scientist’s reasoning to empirical test to determine if the hypothesized relation (or difference) between the variables is statistically probable and not merely attributable to chance (i.e., the scientist measures the leadership styles of the leader and the corresponding degree of motivation in followers to determine if there is a statistically significant relation between leader style and motivation).
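The final, empirical step can be sketched in code. The sketch below (Python; the motivation scores are simulated purely for illustration and imply nothing about real effect sizes) compares mean follower motivation under two leadership styles and computes a p-value for the difference:

```python
import random
import statistics
from statistics import NormalDist

random.seed(1)

# Step 4 in action: simulated motivation scores (1-10 scale) for followers
# of democratic versus autocratic leaders; the numbers are invented
democratic = [random.gauss(8.0, 1.5) for _ in range(60)]
autocratic = [random.gauss(6.0, 1.5) for _ in range(60)]

diff = statistics.fmean(democratic) - statistics.fmean(autocratic)

# Standard error of the difference between two independent means
se = (statistics.variance(democratic) / 60
      + statistics.variance(autocratic) / 60) ** 0.5

z = diff / se
# Two-sided p-value: how probable is a difference this large by chance alone?
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
```

A small p-value tells the scientist that the observed relation between leader style and motivation is unlikely to be attributable to chance, which is exactly the question the fourth step poses.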

Rigorous scientific research adheres to a specific set of standards and guidelines at every step in the process (Filley, House, & Kerr, 1976). The process must be rigorous in order for the inferences drawn to be valid. Rigor is defined as “methodology and argument that is systematic, sound, and relatively error free” (Daft, 1984, p. 10). Rigor deals with the process and not the outcome, whereas quality deals with the outcome of the process, which includes the overall contribution of the work and the degree to which findings mirror the real world (Kerlinger & Lee, 2000). There may be a trade-off between the rigor of the process, on one hand, and the real-world

50——THE COMPLEXITY, SCIENCE, AND ASSESSMENT OF LEADERSHIP



contribution and generalizability of the study, on the other. This trade-off occurs because there are many interactions among phenomena in real-world settings, and studying specific, isolated relationships in scientific research may yield findings that do not mirror the relationships that occur in their natural settings. Because adherence to the scientific process creates these types of generalizability issues, this trade-off must be considered when any type of scientific research is planned or evaluated. We revisit this issue later.

What Is Theory?

The creation of theoretical frameworks that can explain a practical phenomenon is the ultimate aim of science (Kerlinger, 1986). Unfortunately, nonscientists may not necessarily have a positive perspective of theory. One often hears definitions of theory that include such phrases as “something that is not testable,” “something unproven,” “something hypothetical,” and “an ideal.” These definitions of theory are incorrect. Lewin (1945) once stated, “Nothing is as practical as a good theory” (p. 129). A theory must, therefore, reflect reality and be applicable to practice. If it does not reflect reality, it cannot be applicable to practice; hence, it is not a good theory.

Theories advance our knowledge of social life by “proposing particular concepts (or constructs) that classify and describe [a] phenomenon: then they offer a set of interrelated statements using these concepts” (Pettigrew, 1996, p. 21). Theories are not absolute laws; they are always tentative and are based on probabilities of events occurring (Pettigrew, 1996; see also Popper, 1965). According to J. P. Campbell (1990), the functions of theory are to (a) differentiate between important and less important facts, (b) give old data new understanding, (c) identify new and innovative directions, (d) provide for ways to interpret new data, (e) lend an understanding to applied problems, (f) provide a resource for evaluating solutions to applied problems, and (g) provide solutions to previously unsolvable problems.

A theory is a collection of assertions that identify which elements are important in understanding a phenomenon, for what reasons they are important, how they are interrelated, and under what conditions (i.e., boundary conditions) the elements should or should not be related (Dubin, 1976). Specifically, Kerlinger (1986) defined theory as being “a set of interrelated constructs (concepts), definitions, and propositions that present a systematic view of phenomena by specifying relations among the variables, with the purpose of explaining and predicting the phenomena” (p. 9). At this point, we should clarify the distinction made between the terms construct and variable (Bacharach, 1989). A construct cannot be observed directly, whereas a variable can be observed and measured. Based on this distinction, relationships among constructs are set forth by propositions, and relationships among variables are described by hypotheses. A theory thus includes propositions regarding constructs and their hypothesized relations. The consequence of the hypothesized relation, in the form of measured variables and their interrelations (or differences), is what researchers directly test (Kerlinger, 1986). The types of statistical methods to test for relations and differences are briefly introduced later.




Following Filley et al. (1976), we make two distinctions concerning theory: (a) the method of theory development and its implications for testing and (b) the purpose of the theory. Theories may be inductive or deductive (Dubin, 1976). Inductive theories flow from data to abstraction. Here, the emphasis is on theory building, whereby a researcher uses observations, data, or the conclusions and results of other studies to derive testable theoretical propositions. The flow of development is from specific observations to more general theory. Deductive theory focuses on theory testing by beginning with a strong theory base. Deductive theory building flows from abstraction to specific observation; it places the emphasis on testing hypotheses derived from propositions and thus attempts to confirm hypotheses derived from theory.

The second distinction is the purpose of the theory, which may be descriptive or prescriptive. Descriptive theories concern actual states—they describe what “is” or “was.” For example, the full-range leadership theory is purported to include nine leadership dimensions, ranging from the passive-avoidant leader to the inspiring and idealized leader, and each of the dimensions is hypothesized to predict certain leader outcomes (see Avolio, 1999; Bass, 1998). In contrast, prescriptive theories are normative and describe what “should occur.” For example, Vroom and associates (Vroom & Jago, 1988; Vroom & Yetton, 1973) have developed a normative decision-making model of leadership that suggests which leadership style should be used in certain environmental contingencies. Of course, a descriptive theory also can be prescriptive, and prescriptive theory must be built on a description of sorts. For example, based on empirical studies and theoretical reasoning, Bass (1998) and Avolio (1999) stated that passive-avoidant leadership should be used least often, because it is associated with negative follower outcomes, whereas active and idealized leadership should be used most often.

Filley et al. (1976) developed five evaluative criteria for judging the acceptability of theory. First, a good theory should have internal consistency. That is, it should be free of contradictions. Second, it should be externally consistent. That is, it should be consistent with observations and measures found in the real world (i.e., data). Third, the theory must be testable. The theory must permit evaluation of its components and major predictions. Fourth, the theory must have generality. That is, it must be applicable to a range of similar situations and not just a narrow set of unique circumstances. Finally, a good theory must be parsimonious. A simple explanation of a phenomenon is preferred over a more complex one.

Hypotheses

Once a theory and its propositions have been developed, they need to be tested for external consistency. To do this, constructs need to be associated with measured variables (i.e., empirical indicators), and propositions must be converted into testable hypotheses. A hypothesis is a conjectural statement about one or more variables that can be tested empirically. Variables are defined as observable elements that are capable of assuming two or more values (Schwab, 1980). Whereas variables,




as manifest indicators, are observable, constructs are not directly observable (Maruyama, 1998). Hypotheses are the working instruments of theory because they help the scientist disconfirm (or fail to disconfirm) a theory by not supporting (or supporting) predictions that are drawn from it.

Good hypotheses must contain variables that are measurable (i.e., that can assume certain values on a categorical or continuous scale) and must be testable (Kerlinger & Lee, 2000). Hypotheses should not be in the form of a question. For example, “Does leader supportiveness have an effect on group performance?” is not testable and therefore is not a hypothesis but instead a research question, which may be valuable for guiding research per se. An acceptable hypothesis would be that “Leader supportiveness is positively associated with group performance.” Furthermore, hypotheses should not contain value statements such as “should,” “must,” “ought to,” and “better than.” Finally, hypotheses should not be too general or too specific. If they are too vague and general, they cannot be tested. If they are too specific, research findings may not be generalizable and therefore may make less important substantive contributions to knowledge.
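What makes the supportiveness hypothesis testable is that both variables are measurable and the predicted direction can be checked against data. A minimal sketch (Python; the scales and data are invented for illustration) evaluates the directional prediction with a one-sided permutation test on the correlation:

```python
import random

random.seed(7)

def pearson_r(x, y):
    """Pearson correlation between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

# Simulated measures on hypothetical 1-7 rating scales
n = 80
supportiveness = [random.uniform(1, 7) for _ in range(n)]
performance = [0.5 * s + random.gauss(0, 1) for s in supportiveness]

r_obs = pearson_r(supportiveness, performance)

# One-sided permutation test for the directional hypothesis:
# how often does shuffling performance produce a correlation this positive?
perf_shuffled = performance[:]
extreme = 0
trials = 2000
for _ in range(trials):
    random.shuffle(perf_shuffled)
    if pearson_r(supportiveness, perf_shuffled) >= r_obs:
        extreme += 1
p_one_sided = extreme / trials
```

The research question “Does leader supportiveness have an effect?” gives the test nothing to reject; the directional hypothesis does.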

Theory Development

The process of developing a good theory involves four steps (Dubin, 1976). The first step is the selection of elements whose relationships are of interest (e.g., the relation between leader style and follower motivation). All relevant elements should be included in the theory, but those that may be viewed as extraneous should be left out.

Once the above elements are chosen, the second step is to specify how the elements are related. In particular, what impact do the elements have on one another? In this step, specific relationships must be articulated (e.g., democratic leadership is positively associated with motivation).

The third step specifies the assumptions of the theory. These include justifications for the theory’s elements and the relationships among them. The step also involves the specification of boundary conditions within which the elements interact and are constrained (Dubin, 1969). The boundary conditions set forth the circumstances to which the theory may be generalized and are necessary because no theory is without boundaries. Bacharach (1989) noted that theories are bounded by the implicit values of the theorist, as well as by space and time boundaries. Boundaries refer to the limits of the theory; that is, the conditions under which the predictions of the theory hold (Dubin, 1976). The more general a theory is, the less bounded it is (Bacharach, 1989).

To revisit our democratic-autocratic example, a potential boundary could be national culture or risk conditions. That is, in some cultural contexts, autocratic leadership may be more prevalent because of large power differentials between those who have power (e.g., leaders) and those who do not (e.g., followers). Thus, in that context (i.e., boundary), the nature of the relation between a leader’s style and follower motivation changes, and we may find that autocratic leadership is positively related to motivation. Thus, cultural context can be referred to as a




moderator of leadership effectiveness but also as a contextual factor affecting the type of leadership that may emerge, as we will discuss later.
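This kind of boundary condition can be mimicked in a toy simulation (Python; the slopes, scales, and context labels are assumptions for illustration only): the same style variable correlates negatively with motivation in one cultural context and positively in the other, which is precisely what a moderator does.

```python
import random

random.seed(3)

def pearson_r(x, y):
    """Pearson correlation between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def style_motivation_r(slope, n=400):
    """Correlation between autocratic style and follower motivation,
    given the slope that a cultural context (the moderator) imposes."""
    style = [random.uniform(0, 10) for _ in range(n)]
    motivation = [slope * s + random.gauss(0, 2) for s in style]
    return pearson_r(style, motivation)

# Hypothetical boundary condition: the sign of the relation flips by context
r_low_power_distance = style_motivation_r(slope=-0.6)   # style hurts motivation
r_high_power_distance = style_motivation_r(slope=+0.6)  # style helps motivation
```

A theory that ignored this boundary and pooled both contexts would find little relation at all, even though a strong one exists within each context.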

In the fourth step, there must be specification of the system states in which the theory operates. A system state is “a condition of the system being modeled in which the units of that system take on characteristic values that have a persistence through time, regardless of the length of time interval” (Dubin, 1976, p. 24). In other words, the system states refer to the values that are exhibited by units constituting the theoretical system and the implications associated with those values.

Methods of Research

Qualitative and Quantitative Distinctions

According to F. Williams (1992), there are two major methodological research traditions: qualitative and quantitative methods. Quantitative methods should be utilized when the phenomenon under study needs to be measured, when hypotheses need to be tested, when generalizations are required to be made of the measures, and when generalizations need to be made that are beyond chance occurrences. As noted by Williams, “if measures are not apparent or if researchers cannot develop them with confidence, then quantitative methods are not appropriate to the problem” (p. 6). Thus, the choice of which approach to use will depend on a number of factors.

Leadership researchers typically have used quantitative approaches; however, to better understand complex, embedded phenomena, qualitative approaches to studying leadership are also necessary (see Alvesson, 1996; Bryman, Stephens, & Campo, 1996; Conger, 1998). Yet “qualitative studies remain relatively rare . . . [and should be] the methodology of choice for topics as contextually rich as leadership” (Conger, 1998, p. 107). Given the contextual and complex nature of leadership, it is important that qualitative methods—as a theory-generating approach—complement quantitative methods, whose strengths are in theory testing. The reasons for using qualitative methods to study contextually rich and holistically embedded phenomena are discussed below.

Quantitative approaches to scientific inquiry generally rely on testing theoretical propositions, as previously discussed in regard to the scientific method. In contrast, qualitative approaches focus on “building a complex, holistic picture, formed with words, reporting detailed views of informants, and conducted in a natural setting” (Creswell, 1994, p. 2). A qualitative study focuses on meanings as they relate in context. Lincoln and Guba (1985) referred to the qualitative approach as a postpositivist (i.e., postquantitative) naturalistic method of inquiry. Unlike the positivist, the naturalist “imposes no a priori units on the outcome” (Lincoln & Guba, 1985, p. 8).

Lincoln and Guba (1985) also argued that positivism constrains the manner in which science is conceptualized, is limited in terms of theory building, relies too much on operationalizing, ignores meanings and contexts, and attempts to reduce




phenomena to universal principles. Lincoln and Guba stated that the weaknesses of the positivist approach include the assumption that phenomena can be broken down and studied independently, while ignoring the whole. Moreover, the distancing of the researcher from what is being researched, the assumption that sampling observations in different temporal and spatial dimensions can be invariant, and the assumption that the method is value free further limit this approach. Lastly, Lincoln and Guba stated that instead of internal and external validity, reliability, and objectivity, naturalists strive for credibility, transferability, dependability, and confirmability of results.

Qualitative research has often been criticized for being biased. Because qualitative analysis is constructive in nature, the data can be used to construct the reality that the researcher wishes to see—thus creating a type of self-fulfilling prophecy stemming from expectancy-based information processing. The result, therefore, may be in “the eye of the beholder,” in a manner of speaking. That is, the observer may see evidence when he or she is looking for it, even though contrary evidence also is present (see S. T. Fiske, 1995). Thus, special controls—or triangulation—must be used to ensure that results converge from different types or sources of evidence (Maxwell, 1996; Stake, 1995; Yin, 1994).

As can be seen from the above discussion, qualitative and quantitative paradigms of research differ on a number of assumptions, and the selection of qualitative over quantitative approaches will depend entirely on the purpose and nature of the inquiry (Creswell, 1994). Similar to quantitative methods, qualitative methods employ a variety of techniques to acquire knowledge and may include (Creswell, 1994): (a) ethnographies, in which a cultural group is studied over a period of time; (b) grounded theory, where data are continually used, categorized, and refined to inform a theory; (c) case studies, in which a single bounded case or multiple cases are explored using a variety of data-gathering techniques; and (d) phenomenological studies, which examine human experiences and their meanings. Qualitative research also can include the active participation of the researcher in an intervention, a practice normally labeled as participatory action research (Greenwood & Levin, 1998; W. F. Whyte, 1991).

For some specific examples relating to qualitative and quantitative research-gathering methods for leadership research, refer to Kroeck, Lowe, and Brown (Chapter 4, this volume). Because the vast majority of research that is conducted in the leadership domain is quantitative in nature and because theory can be tested appropriately only with quantitative methods, we will focus the rest of the chapter on the quantitative paradigm and its associated methods.

Categories of Research

Kerlinger (1986) stated that although a large part of research using the scientific method was generally experimental in nature, much research nowadays in the behavioral sciences involves non-experimental research. Because there are many possible causes for behavior, Kerlinger argued that the most dangerous fallacy in




science is the “post hoc, ergo propter hoc [fallacy]: after this, therefore caused by this” (p. 347). This fallacy is most likely to occur in non-experimental conditions; that is, conditions in which the scientist does not have control over the independent variables “because their manifestations have already occurred or because they are inherently not manipulable. Inferences about relations among variables are made, without direct intervention, from concomitant variation of independent and dependent variables” (p. 348). Based on the above distinction between experimental and non-experimental research, Kerlinger (1986) classified research into four major categories or types, namely (a) laboratory experiments, (b) field experiments, (c) field studies, and (d) survey research.

Laboratory Experiments

According to Kerlinger (1986), laboratory experiments are useful for manipulating independent measures in an isolated situation, thus yielding precise and accurate measures. Variables are studied in pure and uncontaminated conditions, because participants are assigned randomly to groups to ensure that the resultant variance in dependent outcomes is accounted for by the independent measures. Laboratory experiments are useful for testing predictions and for refining theories and hypotheses. Kerlinger, however, noted that some disadvantages exist in laboratory experiments; namely, that the effect of the independent measures may be weak because they do not exist in real-life conditions or have been artificially created. Thus, the results of a laboratory experiment may not be generalizable to real-life situations and may lack external validity even though they have high internal validity.
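Why random assignment licenses causal inference can be seen in a small sketch (Python; the baseline trait, its scale, and the group sizes are invented): after shuffling, the two groups are nearly identical on a pre-existing characteristic, so any later difference on the dependent variable can be credited to the manipulation rather than to selection.

```python
import random
import statistics

random.seed(11)

# Participants with a pre-existing trait (e.g., baseline motivation, 0-100)
participants = [random.gauss(50, 10) for _ in range(1000)]

# Random assignment to treatment (leader style A) or control (leader style B)
random.shuffle(participants)
treatment = participants[:500]
control = participants[500:]

# Group means on the pre-existing trait are nearly equal after randomization,
# so the trait cannot masquerade as an effect of the manipulation
gap = abs(statistics.fmean(treatment) - statistics.fmean(control))
```

With randomization, presumed alternative causes are balanced across groups in expectation, which is exactly the control function Kerlinger describes.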

Binning, Zaba, and Whattam (1986), for example, experimentally manipulated group performance cues (i.e., indicators of effectiveness) to determine whether they influenced ratings of the group’s leader. Binning et al. exposed subjects to a video of a problem-solving group and then obtained ratings of the leader’s behavior and effectiveness under either good or bad group performance cues. They found that leader behavior and global effectiveness ratings were susceptible to performance cues, especially leader behavior that was not richly described (i.e., for which raters did not have sufficient information to construct an accurate schema). Thus, important here is the identification of the conditions (i.e., low information conditions) under which ratings are susceptible to performance-cue effects.

A second example is a study by J. M. Howell and Frost (1989), who experimentally manipulated three leadership styles using trained confederate leaders (i.e., individuals working for the researcher) and who exposed participants to either a high- or low-productivity group, also including confederates. Their results showed that participants working under a charismatic leader generally had better work outcomes and were more satisfied than were participants under a considerate or structuring leader. Furthermore, followers under charismatic leaders were equally influenced irrespective of group productivity standards. An important element of this study is that leader charisma was manipulated in a laboratory setting.

Laboratory studies thus can provide valuable insights about leader processes and perceptions and are particularly useful for studying leadership from many




perspectives (e.g., information processing, trait, behavioral, contingency). Indeed, laboratory studies of leadership have been conducted as far back as the 1930s (e.g., Lewin & Lippitt, 1938) to assess the impact of leadership behavioral style (e.g., democratic, autocratic). However, according to D. J. Brown and Lord (1999), leadership researchers, especially from the “new leadership” paradigm, have not yet made full use of experimental methods. They argued that nonconscious information-processing functions—integral to how leaders are legitimized and how followers perceive leaders—can be studied only in controlled laboratory settings. Furthermore, Brown and Lord stated that laboratory research is useful for studying unique interactions of independent variables and for untangling the unique effects of highly correlated leader dimensions that are not easily distinguishable in the field because of their co-occurrence.

Field Experiments

Kerlinger (1986) defined a field experiment as a study that occurs in a real setting in which both the independent variables and group assignment are under the control of the researcher. The degree of control that the researcher has in comparison to laboratory conditions is less, suggesting that other independent variables that are not under the control of the researcher may have a bearing on the dependent measures. Like laboratory experiments, field experiments are useful for testing theory and are applicable to studying leadership from many points of view; however, because field experiments occur in real settings, they are also useful for finding answers to practical problems. Similar to laboratory studies, field experiments unfortunately are not very common in the leadership domain. We discuss two examples below, focusing on the effects of leadership training interventions.

In a field experiment, Barling, Weber, and Kelloway (1996) demonstrated that training interventions can change managers’ leadership styles (including their charisma), resulting in increased organizational commitment of subordinates (i.e., a subjective dependent measure) and improved organizational financial outcomes (i.e., an objective dependent measure). In another example, using a military sample, Dvir, Eden, Avolio, and Shamir (2002) tested the efficacy of a leadership training intervention on leaders’ direct and indirect followers. They found that transformational leadership training had a more positive effect on direct follower development and on indirect follower performance than did eclectic leadership training. A major implication of these studies is that leadership can be taught through interventions by making leaders aware of their leadership style as they and others perceive it and by facilitating a learning and goal-setting process conducive to the development of a more effective leadership style.

Field Studies

According to Kerlinger (1986), field studies are non-experimental, occur in real-life situations, and attempt to discover important variables and their interrelationship by studying “the relations among the attitudes, values, perceptions,

Methods for Studying Leadership——57

03-Antonakis.qxd 11/26/03 6:04 PM Page 57

Page 11: Methods for Studying Leadership

and behaviors of individuals and groups in the situation” (p. 372). The majorweakness of field studies is that the researcher has no control over the independentvariables and, hence, the variance that is exhibited in the outcome measures. Aswith laboratory and field experiments, many types of research questions can beanswered using this type of research.

For example, Atwater, Dionne, Avolio, Camobreco, and Lau (1999) linked individual characteristics of military cadets to their leadership effectiveness/emergence both cross-sectionally (i.e., at one point in time) and longitudinally (i.e., where subjects were tracked over a period of time, another rarity in leadership research). Regarding the longitudinal results, they found that measures of physical fitness and prior leadership experience (measured at Year 1) were significantly associated with leader effectiveness and emergence (measured at Year 4). These results suggest that the potential to lead can be predicted using individual-difference measures.

Bass, Avolio, Jung, and Berson (2003) studied the relations between the leadership style (i.e., transformational or transactional leadership, or nonleadership) of military commanders and the simulated performance of their units. The leadership style of platoon lieutenants and sergeants was measured several weeks before their units participated in a training simulation. Bass et al. found that the leadership styles of the lieutenants and the sergeants were significantly associated with platoon performance in the field and that the impact of the leadership styles was partially mediated by the potency and cohesion of the platoon.

Survey Research

Researchers use survey research to determine the characteristics of a population so that inferences about populations can be made. Surveys generally focus on "the vital facts of people, and their beliefs, opinions, attitudes, motivations, and behavior" (Kerlinger, 1986, p. 378) and generally are cross-sectional in nature. Data for surveys can be gathered in a number of ways, including in-person interviews, mail questionnaires, telephone surveys, and the Internet. Survey research is practical to use because it is cheap and relatively easy to conduct. Although this type of research is limited by the questionnaire or interview format used, it can be quite useful if properly designed. Survey methods have been used to answer many types of research questions emanating from all leadership perspectives. Indeed, the leadership field is replete with survey research; thus, in the interest of saving space, we do not list any studies here (refer to Kroeck et al., Chapter 4, this volume, for examples of survey instruments).

In conclusion, the type of research used will depend on the type of research questions to be answered. For example, if a researcher wishes to determine whether there is cross-situational consistency in individuals' leadership styles and whether certain traits are predictive of leadership in different situations, then that researcher would be advised to use rotation designs, similar to those used by Kenny and Zaccaro (1983) and Zaccaro, Foti, and Kenny (1991), in which leaders' situational conditions (e.g., group composition and task demands) vary (refer to Zaccaro, Kemp, & Bader, Chapter 5, this volume, for further discussion of these studies). The remaining chapters in this book, especially those in Part III, provide interesting examples of the different types of research methods used.

Research Design and Validity Issues

The design of a study is integrally linked to the type of research used. There are three important facets of design (see Kerlinger, 1986). First, the design should ensure that the research question is answered. Second, the design should control for extraneous variables, which are independent variables not intended to be measured by the study that may have an effect on the dependent variables. This is referred to as internal validity. Third, the study should be able to generalize its results to a certain extent to other subjects in other conditions; that is, it should have some bearing on theory. This may also be referred to as external validity. Thus, Kerlinger stated that congruence must exist between the research problem and the design, what is to be measured, the analysis, and the inferences to be drawn from the results (refer also to Cook, Campbell, & Peracchio, 1990, for issues relating to validity). By adequately taking the above points into account in the planning stages of research, the researcher increases the likelihood of drawing accurate inferences and conclusions about the nature of the problem investigated. This is because the design dictates what is to be observed, how it is to be observed, and how the data must be analyzed.

As suggested previously, external validity may be an issue with experiments; however, according to C. A. Anderson, Lindsay, and Bushman (1999), the external validity threats of experiments are exaggerated, given that findings in laboratory and field settings generally converge. To avoid problems associated with external validity, researchers should draw from a sampling frame that is representative of the population and control as well as possible the level of artificiality, recognizing that some artificiality may be inherent in laboratory experiments. Furthermore, the researcher must ensure that the experimental manipulation has had the required effect (i.e., the manipulation was perceived as intended). A manipulation check usually is used to address this issue.

As for non-experimental research, causation is an important caveat. One of the dicta of scientific research is that correlation does not imply causation. Furthermore, typical of survey methods is the assessment of two variables that are intended to represent different constructs, with the data gathered using the same method from the same source (e.g., a self-report questionnaire) (see Avolio, Yammarino, & Bass, 1991). The observed relationship between the variables may be attributable to a true relationship between the constructs or to bias resulting from the use of a common source and a common data collection method. This issue has long been of concern to psychologists (D. T. Campbell & Fiske, 1959; D. W. Fiske, 1982), particularly with respect to self-report measures (Sackett & Larson, 1990).

Finally, there are other approaches to research that cannot be described by the four categories discussed above but that can shed interesting light on the study of leadership. As an example, the analysis of historical data, which is not traditionally used in applied psychological settings and which is usually associated with the qualitative research paradigm (e.g., case studies) (see Yin, 1994), cannot be placed in the above classification scheme. Refer to Simonton (2003) for a discussion on the quantitative and qualitative analyses of historical data, and to House, Spangler, and Woycke (1991) for an example in leadership research.

Methods of Statistical Analysis

Analytical techniques (e.g., regression analysis) are nothing more than tools by which the researcher is able to test and refine research hypotheses and, subsequently, theories. The hypotheses and characteristics of the data should drive the types of analysis conducted, not vice versa. Readers who have not had a thorough introduction to statistics and psychometrics should consult one of the many available books (see, e.g., Nunnally & Bernstein, 1994; Sharma, 1996; Tabachnik & Fidell, 2001). Briefly, there are three major classes of methods: (a) dependence methods of analysis, which are used in order to test the presence and strength of an effect; (b) interdependence methods, which are used to determine relationships between variables and the information they contain; and (c) structural-equation models, which are the newest generation of analytical techniques useful for testing complex models of both presumed cause and effect that include latent/unobserved variables and the effects of measurement error. Structural-equation modeling (SEM) is based on, and simultaneously uses principles of, path analysis, regression analysis, and factor analysis. It is thus a very powerful and flexible statistical methodology for testing theoretical frameworks (see Bollen, 1989; Kline, 1998; Maruyama, 1998). Applications of SEM in leadership are discussed below.

Context and the Boundary Conditions of Leadership Theories

The context in which leadership is enacted has not received much attention, leading House and Aditya (1997) to state that "it is almost as though leadership scholars . . . have believed that leader-follower relationships exist in a vacuum" (p. 445). Further calls have been made to integrate context into the study of leadership (Lowe & Gardner, 2000) and organizational behavior (Johns, 2001; Rousseau & Fried, 2001). Context constrains the variability that potentially can be measured (Rousseau & Fried, 2001), is linked to the type of leadership that is considered as being prototypically effective (Lord, Brown, Harvey, & Hall, 2001), and determines the dispositional antecedents of effective leadership (Zaccaro, 2001).

Zaccaro and Klimoski (2001) argued that "unlike the situation as a moderator [as in contingency theories of leadership], we view situation or context as boundary conditions for theory building and model specification" (p. 13). According to R. M. Baron and Kenny (1986), a moderator is a categorical or continuous "variable that affects the direction and/or strength of the relation between an independent or predictor variable and a dependent or criterion variable" (p. 1174). Thus, in addition to finding moderators that alter the strength or the direction of a relationship between an independent (e.g., leader behavior) and dependent (e.g., leader outcomes) variable, situations also could be conceived as range restrictors of the types of independent variables that emerge. In other words, context should be considered in attempts to understand how phenomena like leadership emerge, and not only the extent to which or how context may affect the strength of relations between independent (e.g., leadership) and dependent (e.g., organizational effectiveness) variables (see Shamir & Howell, 1999).

In striving to generalize phenomena, leadership researchers often overlook context and draw incorrect conclusions about how a phenomenon is modeled (Antonakis, Avolio, & Sivasubramaniam, 2003). For example, a leadership model (e.g., the transformational, transactional, and laissez-faire models) may be universal in the sense that the constructs of which it is composed can be measured across different contexts; however, if the phenomenon is by nature contextually sensitive (i.e., leader prototypes and expected behaviors vary by context), the phenomenon may "work" in a similar manner only when sampling units are homogeneous and not when sampling units are heterogeneous (Antonakis, Avolio, et al., 2003). In other words, the emergence and enactment of a behavior may vary by context, which includes the following, among others:

1. National culture—some leader behaviors and their enactment may be universal or may vary systematically as a function of national culture (e.g., refer to Den Hartog & Dickson, Chapter 11, this volume; see also Brodbeck et al., 2000; Hofstede, 1980; Koopman et al., 1999; Meade, 1967).

2. Hierarchical leader level—leadership differs qualitatively depending on whether leadership is exercised at a high (e.g., strategic) or low (e.g., supervisory) hierarchical level (e.g., refer to Sashkin, Chapter 8, this volume; see also Antonakis & Atwater, 2002; D. J. Brown & Lord, 2001; J. G. Hunt, 1991; House & Aditya, 1997; Lord, Brown, Harvey, et al., 2001; Lowe, Kroeck, & Sivasubramaniam, 1996; Sashkin, 1988a; Waldman & Yammarino, 1999; Zaccaro, 2001; Zaccaro & Klimoski, 2001).

3. Organizational characteristics—organizational stability, organizational structure (e.g., bureaucratic vs. organic environments), and so forth affect the type of leadership that may be necessary (refer to Brown, Scott, & Lewis, Chapter 6, this volume; see also Antonakis & House, 2002; Avolio, 1999; Bass, 1998; Lord & Emrich, 2000).

4. Leader and/or follower gender—leader behaviors may vary systematically as a function of leader gender or follower gender because of gender-role expectations and other factors (refer to Eagly & Carli, Chapter 12, this volume; see also Eagly & Johnson, 1990; Lord, Brown, Harvey, et al., 2001).

5. Leadership mediated by electronic means (i.e., e-leadership; see Avolio, Kahai, & Dodge, 2001; Avolio, Kahai, Dumdum, & Sivasubramaniam, 2001; see also Antonakis & Atwater, 2002; Shamir, 1999; Shamir & Ben-Ari, 1999)—the nature of the influence process may vary in conditions where leaders and followers are not face-to-face.

Types of leader behaviors that are enacted may vary as a function of context; however, so may the dispositional (i.e., leader characteristics) antecedents linked to those behaviors. For example, cognitive requirements differ as a function of leader hierarchical level (Zaccaro, 2001; see also Zaccaro et al., Chapter 5, this volume, and Sashkin, Chapter 8, this volume). Finally, the level of analysis at which a leadership phenomenon may hold will also vary by context (Antonakis & Atwater, 2002; Waldman & Yammarino, 1999). That is, the level at which leadership operates may vary from the individual to the group or organizational level, depending on the context (e.g., CEO-level vs. supervisory-level leadership) in which it is enacted.

It thus becomes evident that researchers need to consider contextual factors and levels of analysis in theory development and measurement (see Antonakis, Avolio, et al., 2003; Dansereau, Alutto, & Yammarino, 1984; Schriesheim, Castro, Zhou, & Yammarino, 2001; Zaccaro & Klimoski, 2001) and then use the appropriate methods to gauge these factors and levels (i.e., boundaries). This perspective should complement traditional notions of context, in which context is seen as a moderator of the relation between leader characteristics (e.g., traits, behaviors) and leader outcomes.

The Detection of Boundary Conditions of Theories

How are boundary conditions of theories detected? In this section, we discuss four quantitative approaches that are useful in detecting moderators or contextual factors, which can be considered as boundary conditions of theories (see James, Mulaik, & Brett, 1982). For each of the four methods discussed below (i.e., levels of analysis, structural-equation modeling, moderated regression, and meta-analysis), we provide relevant examples from the leadership literature to which readers may refer. We do not cover interaction effects (as in traditional factorial ANOVA designs, which are useful for the detection of moderation, especially in experimental designs; see R. M. Baron & Kenny, 1986) because we assume that most readers are familiar with the technique. In our discussions below, we focus extensively on levels-of-analysis issues, because many, if not most, researchers in psychology and management generally ignore levels-of-analysis concerns. We also extensively discuss structural-equation modeling, a very powerful but underutilized statistical method in leadership research.

Levels of Analysis

In recent years, level of analysis has emerged as an important issue in organizational research (e.g., Klein, Dansereau, & Hall, 1994; Rousseau, 1985). Organizations are inherently multilevel, because individuals work in dyads and groups. Furthermore, several groups or departments make up an organization, organizations make up an industry, and so on. Few, if any, constructs are inherently restricted to one level of analysis. Typically, constructs may operate at one or more levels, such as individuals, dyads, groups, departments, organizations, and industries (Rousseau, 1985).

Thus, it is now increasingly recognized that good theory should also specify the level of analysis at which its phenomena operate (see Dansereau, Alutto, et al., 1984), because level of analysis can be conceived as a boundary condition of a theory (Dubin, 1976). Levels of analysis might even be considered as a sixth criterion that should be assessed when one considers a theory's adequacy (in addition to Filley et al.'s, 1976, original five criteria discussed previously).

Levels of analysis make studying leadership in organizations very complex because leadership phenomena may operate on any or all levels. Consider, for example, performance. This variable can be examined on an individual, dyadic, group, departmental, organizational, or industry level. The specification and testing of the levels at which constructs of a theory operate is therefore important because research conclusions may differ as a function of the level of analysis that is employed (Dansereau, Alutto, et al., 1984; Klein et al., 1994).

The correlates or causes of individual performance may be very different from the correlates or causes of group or organizational performance. Because obtained results may vary depending on the level at which the data are analyzed, erroneous conclusions can be drawn if the level tested is incongruent with the level specified by a theory (Dansereau, Alutto, et al., 1984; W. H. Glick, 1985). These erroneous conclusions include, among others, ecological fallacies (i.e., using aggregated data to make inferences about individuals) or individualistic fallacies (i.e., using individual-level data to make inferences about groups; see Pedhazur, 1997, for examples) and other types of misspecifications (Rousseau, 1985).
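The point is easy to demonstrate numerically. In the sketch below (a toy illustration with fabricated numbers, not data from any study cited here), the same two variables correlate positively at the total and between-group levels yet perfectly negatively within groups, so the level chosen for analysis dictates the conclusion drawn:

```python
from statistics import mean
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation, standard library only."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Fabricated scores for three groups of three followers.  Within every
# group, x and y move in opposite directions, yet group means rise together.
groups = [
    ([1, 2, 3], [3, 2, 1]),
    ([4, 5, 6], [6, 5, 4]),
    ([7, 8, 9], [9, 8, 7]),
]

# Individual (total) level: pool all raw scores.
x_all = [v for xs, _ in groups for v in xs]
y_all = [v for _, ys in groups for v in ys]
r_total = pearson(x_all, y_all)                       # +0.80

# Between-group level: correlate the group means.
r_between = pearson([mean(xs) for xs, _ in groups],
                    [mean(ys) for _, ys in groups])   # +1.00

# Within-group level: correlate deviations from each group's own mean.
x_dev = [v - mean(xs) for xs, _ in groups for v in xs]
y_dev = [v - mean(ys) for _, ys in groups for v in ys]
r_within = pearson(x_dev, y_dev)                      # -1.00

print(round(r_total, 2), round(r_between, 2), round(r_within, 2))
```

Inferring an individual-level relation from the group means here (or a group-level relation from the pooled data) would reverse the sign of the conclusion, which is precisely the ecological/individualistic fallacy described above.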

A recent special issue of Leadership Quarterly (2002, Vol. 13, No. 1) highlighted problems of failing to take levels into account. For example, a data set purposefully analyzed at the individual level of analysis using regression techniques failed to detect a moderation effect of leadership climate on task significance of followers, because the moderation effect did not operate at the individual level of analysis (Bliese, Halverson, & Schriesheim, 2002). However, when various methods designed to test for multilevel effects were used, evidence of a moderator (i.e., leadership climate) emerged (see Bliese & Halverson, 2002; Gavin & Hofmann, 2002; Markham & Halverson, 2002), and leadership climate exhibited group-level effects that were ignored by the other methods.

Thus, if theory, analysis, and measurement level are not correctly specified and aligned, "we may wind up erecting theoretical skyscrapers on foundations of empirical jello" (Schriesheim, Castro, Zhou, et al., 2001, p. 516). For example, if a leader's behavior, as perceived by followers, is not homogeneously viewed by followers, then the leader's behavior operates at the individual level of analysis. Therefore, any inferences that are made should be based on the individual and use individual-level data, because individual responses are independent. However, if the leader's behavior is viewed homogeneously, then it is justifiable to aggregate the individual data to the group level and make inferences at the group level of analysis, because individual responses are dependent on group membership.

There are several examples of leadership research incorporating levels-of-analysis perspectives. However, in leadership research only a small group of researchers have used methods developed to test levels-of-analysis effects. We cite three examples below of levels-of-analysis perspectives regarding leadership theories and show that consideration of the boundary condition of levels leads to better theories that are more applicable to practice. Although there is now a broader range of researchers testing theories using multilevel methods, we stress that there is still insufficient attention paid to levels-of-analysis issues by mainstream leadership, organizational behavior, and management researchers.

In the first example, Yammarino (1990) demonstrated that correlations between leader behavior and outcome variables may exhibit differential levels-of-analysis effects. For instance, the correlation between a leader's group-directed initiating-structure behavior and role ambiguity perceptions of followers may be valid at the individual level of analysis; however, the correlation between a leader's group-directed consideration behavior and role ambiguity perceptions of followers may be valid at the group level of analysis. The practical implications of such results suggest that leaders enact their initiating-structure behaviors in an individualized manner with respect to follower role ambiguity (because individual perceptions of these behaviors are not homogeneously held by followers). In contrast, when follower perceptions relating to an outcome (e.g., role ambiguity) are homogeneous, the leader can enact group-wide behaviors. Yammarino (1990) also demonstrated that raw (i.e., total) correlations may be ambiguous because correlations of similar magnitudes may have different levels-of-analysis effects. Finally, Yammarino's study is notable for showing that wording questions in a way that uses the group as a referent does not ensure that the group will be the resulting level of analysis. This finding has implications for question or item design.

In the second example, Schriesheim, Castro, Zhou, et al. (2001) argued that although leader-member exchange (LMX) theory was originally conceived of as being a dyadic theory (i.e., that the level of analysis is the leader-follower dyad), current conceptualizations of the theory basically ignore the theory's dyadic roots and generally analyze LMX data either from the leaders' or followers' perspectives. Unfortunately, the theory is still not well understood, because level-of-analysis effects associated with the theory have hardly been tested (Schriesheim, Castro, & Cogliser, 1999; Schriesheim, Castro, Zhou, et al., 2001). After testing the levels-of-analysis effects of the theory, Schriesheim, Castro, Zhou, et al. (2001) found relatively strong support for the dyadic nature of the theory, a finding with important implications for data gathering and analysis practices. That is, matching data (i.e., using both followers and corresponding leaders) should be collected, and levels-of-analysis effects should be examined on the dyadic relation (while ruling out competing levels). Until we have more research that tests the level of analysis at which the theory is valid, "all the extant [LMX] research is fundamentally uninformative" (Schriesheim, Castro, Zhou, et al., 2001, p. 525).

In the third example, Yammarino and Dubinsky (1994) showed that even though the literature explicitly or implicitly asserted that transformational leadership would be evident at a higher level of analysis than the individual level, empirical data suggest that most of the effects are evident at the individual level of analysis. That is, "what one individual perceives differs from what others perceive" (Yammarino & Dubinsky, 1994, p. 805), even though these individuals may be perceiving the same social object (i.e., their leader). These results have been demonstrated repeatedly (e.g., Yammarino & Bass, 1990; Yammarino, Spangler, & Dubinsky, 1998), suggesting that transformational (and transactional) leaders have differential effects on followers and that they do not cause similar outcomes in groups of followers (refer to Antonakis & Atwater, 2002, for theoretical arguments concerning when level of analysis may be expected to change).

Multilevel Analysis Methods. There are many ways in which one can test for levels effects, from whether it is appropriate to aggregate data to a higher level of analysis (e.g., aggregating a group of followers' ratings of a leader) to partitioning variance in outcome measures (i.e., determining the extent to which outcomes are accounted for by various independent variables). We briefly discuss four statistical procedures that are useful for conducting research at multiple levels of analysis. For more technical discussions regarding the strengths and weaknesses of the procedures, refer to Castro (2002), George and James (1993), Schriesheim (1995), Yammarino (1998), and Yammarino and Markham (1992).

The rwg coefficient (James, Demaree, & Wolf, 1984) was developed primarily as an index of interrater agreement within a single group (Castro, 2002). As such, it is useful to determine whether responses of group members can be aggregated to the group level; however, it is less useful for making comparisons between groups (i.e., to determine whether variability exists between groups) because it was not designed to perform such a function (see Castro, 2002; Yammarino & Markham, 1992). Thus, it may be more lenient than the other aggregation-testing methods discussed below.
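For a single item rated on an A-point scale, rwg compares the observed variance of judges' ratings with the variance expected under a uniform (random-response) null distribution, (A² − 1)/12. A minimal standard-library sketch follows; the ratings are invented, and note that implementations differ on details such as using the sample versus population variance and truncating negative values to zero:

```python
from statistics import variance

def rwg(ratings, scale_points):
    """Single-item within-group agreement index (James, Demaree, & Wolf, 1984).

    Compares the observed variance of judges' ratings with the variance
    expected if judges responded at random on an A-point scale, i.e.
    (A**2 - 1) / 12.  A value of 1.0 indicates perfect agreement.
    """
    expected = (scale_points ** 2 - 1) / 12
    observed = variance(ratings)       # sample variance across judges
    return 1 - observed / expected

# Five followers rate their leader's consideration on a 5-point scale.
print(round(rwg([4, 4, 5, 4, 4], scale_points=5), 2))   # 0.9
```

Values above the conventional .70 cutoff are typically taken to justify aggregating the followers' ratings to the group level.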

Intraclass correlation coefficients (ICCs) are useful for determining the extent to which variance of individual responses is attributed to group membership and whether means of groups are reliably differentiated (see Bliese, Halverson, et al., 2002; Castro, 2002). Thus, by design, ICCs may be more generally useful than is rwg, especially if between-group variability is theoretically important and requires detailed examination.
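The most common variant, ICC(1), can be obtained from a one-way random-effects ANOVA as (MSB − MSW)/(MSB + (k − 1)MSW), where k is group size. A sketch under the simplifying assumption of equal group sizes, with data invented for illustration:

```python
from statistics import mean

def icc1(groups):
    """ICC(1) from a one-way random-effects ANOVA (equal group sizes assumed).

    Estimates the proportion of variance in individual responses that is
    attributable to group membership: (MSB - MSW) / (MSB + (k - 1) * MSW).
    """
    k = len(groups[0])                       # members per group
    grand = mean(v for g in groups for v in g)
    msb = k * sum((mean(g) - grand) ** 2 for g in groups) / (len(groups) - 1)
    msw = sum((v - mean(g)) ** 2 for g in groups for v in g) / (
        len(groups) * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# Three platoons, three respondents each: most variance lies between groups.
print(round(icc1([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), 2))   # 0.9
```

A large ICC(1) indicates that group membership strongly shapes individual responses, supporting group-level analysis.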

Within and between analysis (WABA) assesses the extent to which variables, and their covariation, exhibit variation within or between groups by testing the within- and between-group variation for statistical and practical significance (Dansereau, Alutto, et al., 1984). WABA should be used when the assumption for aggregating to a higher level requires testing, especially if between-group variation is theoretically important (see Yammarino, Dubinsky, Comer, & Jolson, 1997). WABA also can be used to test certain types of cross-level effects (Castro, 2002; Yammarino, 1998).

WABA has been criticized for being too conservative, especially in situations where range restrictions may exist between groups. For example, in many of the studies cited previously that used WABA (e.g., Yammarino & Bass, 1990; Yammarino & Dubinsky, 1994; Yammarino, Spangler, et al., 1998) and found individual-level effects for leadership, sampling units were restricted to the same or similar contexts. Thus, it is possible that because of similar between-group contextual conditions and the resulting range restriction, effects at higher levels of analysis were attenuated and thus not detected by WABA (see George & James, 1993; Schriesheim, 1995). Therefore, one must very carefully choose heterogeneous sampling units if one is looking for between-group differences (Yammarino & Markham, 1992). Practically speaking, though, as an omnibus test, WABA is particularly useful for determining the extent of within-group homogeneity, whether higher-level relations are evident, and whether higher-level entities are useful as predictors of outcome variables (Schriesheim, 1995; Yammarino, 1998).
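At its core, WABA's first step is a variance partition: each variable is split into group means (between cells) and deviations from those means (within cells), and the corresponding eta coefficients are compared. A bare-bones illustration with invented ratings; the full procedure adds significance tests and a decomposition of covariation between variables, both omitted here:

```python
from statistics import mean
from math import sqrt

def waba_etas(groups):
    """Within-and-between variance partition for one variable, in the spirit
    of WABA's first step (Dansereau, Alutto, & Yammarino, 1984).

    Returns (eta_between, eta_within), whose squares sum to 1.  A large
    eta-between relative to eta-within suggests a group-level (between)
    effect; the reverse suggests an individual-level (within) effect.
    """
    grand = mean(v for g in groups for v in g)
    ssb = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ssw = sum((v - mean(g)) ** 2 for g in groups for v in g)
    sst = ssb + ssw
    return sqrt(ssb / sst), sqrt(ssw / sst)

# Followers' ratings of three leaders: agreement within, differences between.
eta_b, eta_w = waba_etas([[4, 4, 5], [2, 2, 1], [5, 4, 5]])
print(round(eta_b, 2), round(eta_w, 2))   # 0.94 0.33
```

Here most of the variance lies between leaders, the pattern that would justify treating the ratings as a group-level variable.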

Hierarchical linear modeling (HLM) is a regression-based procedure useful for testing relationships that exist at different levels of analysis (i.e., cross-level hierarchical relationships) (Hofmann, 1997). Specifically, HLM can estimate different sources of variance in dependent variables attributable to independent variables at the same or a higher level of analysis and can be used for detecting cross-level moderators (Castro, 2002; Gavin & Hofmann, 2002). HLM does not test whether it is appropriate to aggregate data at a higher level of analysis (Castro, 2002; Hofmann, 1997; Yammarino, Dubinsky, et al., 1997). Thus, HLM should be complemented by other procedures designed to test whether aggregation is appropriate (Castro, 2002). For extensions of HLM-type procedures using structural-equation modeling (i.e., MLM, or multilevel modeling), refer to Heck and Thomas (2000).

Structural-Equation Modeling

Multiple-groups structural-equation modeling (SEM) can be useful for the detection of moderators (R. M. Baron & Kenny, 1986; James, Mulaik, et al., 1982; Kline, 1998). Although this is a potentially powerful and useful procedure, unfortunately, it is "not used all that frequently . . . in social science literature" (Maruyama, 1998, p. 257), and in leadership research in particular. Indeed, we found it difficult to locate a good example (see Dorfman, Howell, Hibino, Lee, Tate, & Bautista, 1997) of the procedure in the leadership literature.

Essentially, the question asked in this type of analysis is "Does group membership moderate the relations specified in the model[?]" (Kline, 1998, p. 181). In this type of analysis, one is thus interested in testing whether structural parameters are equivalent across groups. For example, one can test whether the relation between an independent variable (e.g., leader behavior) and an outcome factor (e.g., follower motivation) is the same across groups. In this case, the grouping variable is the moderator. If the constraint of equivalence in the parameter does not result in a significant decrement in model fit, then one can conclude that group membership does not moderate the relation between the independent and dependent variable (Kline, 1998). If, however, the fit of the constrained model is significantly worse, then one can conclude that the grouping variable moderates the particular structural relation (Kline, 1998).
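The decision rule reduces to a chi-squared difference (likelihood ratio) test between the freely estimated and the constrained model. The sketch below uses invented fit statistics, not values from any study cited here, and hard-codes the familiar .05 critical values rather than computing a p value:

```python
# Hypothetical fit statistics from a two-group SEM: the "free" model lets the
# leader-behavior -> follower-motivation path differ across groups; the
# "constrained" model forces the path to be equal in both groups.
chi2_free, df_free = 48.3, 24
chi2_constrained, df_constrained = 54.9, 25

delta_chi2 = chi2_constrained - chi2_free      # ~6.6
delta_df = df_constrained - df_free            # 1

CRITICAL_05 = {1: 3.84, 2: 5.99, 3: 7.81}      # chi-squared .05 cutoffs

if delta_chi2 > CRITICAL_05[delta_df]:
    verdict = "fit significantly worse: group membership moderates the path"
else:
    verdict = "no significant decrement: the path is equivalent across groups"
print(verdict)
```

Because the constrained model fits significantly worse here, the grouping variable would be interpreted as a moderator of that structural relation.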

Using the procedures discussed above, Dorfman, Howell, Hibino, et al. (1997) found that the relations between leader behaviors and leader outcomes varied when using several national cultures as moderator groups (for further discussion of their results, refer to Den Hartog & Dickson, Chapter 11, this volume). Note that in this example, the moderator variable was a grouping factor (i.e., a categorical variable). However, interaction effects using continuous moderator variables also can be specified in SEM (see Kenny & Judd, 1984).

On a more basic level, SEM also is useful for determining whether constructs are equivalent across theoretically distinct groups (e.g., based on national culture). There are various degrees of equivalence or invariance. For example, one can determine whether the same indicators for a factor are associated in the same way across groups (i.e., factor pattern loadings are the same). This condition is referred to as configural invariance (Vandenberg & Lance, 2000). Establishing configural invariance is absolutely critical so that further tests between groups can be conducted, because if the factors are not associated with the same indicators, then the "respondent groups were [not] employing the same conceptual frame of reference" with respect to how the factor is perceived (Vandenberg & Lance, 2000, p. 37). Once configural invariance is established, further intergroup restrictions can be placed to determine if the model fit becomes significantly worse based on the chi-squared difference test (i.e., a likelihood ratio test between the configural condition and the more restrictive condition). An example of a study in which a configural invariance condition was tested is that of Gillespie (2002), who found that a 360-degree leadership feedback survey did not satisfy the criteria for configural invariance across four cultures. This result suggests that the raters in the various countries conceptualized the factors in different ways, indicating that further cross-country comparisons on the factors would be unwarranted.

A more stringent condition of equivalence is metric invariance, which, if found, suggests that a unit change in the factor can affect the indicators in the same way across different groups (Vandenberg & Lance, 2000). If metric invariance is found, one is confident that the loadings of the respective indicators of the factor are equivalent between groups (e.g., see Dorfman, Howell, Hibino, et al., 1997). Finally, scalar invariance tests whether the intercepts of the indicators are equivalent across groups, a result that has implications regarding systematic bias in responses (Steenkamp & Baumgartner, 1998; Vandenberg & Lance, 2000). If a model has configural, metric, and scalar invariance, or at least partial configural, metric, and scalar invariance, then further substantive tests (e.g., latent mean difference tests, see Byrne, Shavelson, & Muthén, 1989; Steenkamp & Baumgartner, 1998) can be performed (e.g., Antonakis, Avolio, et al., 2003, conducted a latent mean difference test on a range of leadership dimensions using leader-follower gender as a grouping factor).

Unfortunately, however, Cheung and Rensvold (2000) noted that oftentimes researchers take what they called a naive approach, especially when conducting cross-cultural research, by simply comparing scales (i.e., item composites) across groups using a t test or an ANOVA-type procedure, without having verified whether what is being compared is equivalent (see X. Zhou, Schriesheim, & Beck, 2001, for an illustration of how invariance may be tested). Thus, the "observed difference, if there is one, may be due to a [grouping] difference; however, it may also be due to one or more invariance failures in the measurement model" (p. 188). As an example, Kakar, Kets de Vries, Kakar, and Vrignaud (2002) compared Indian and U.S. managers on a range of leadership factors and found significant mean differences between the groups using a t test. Thus, because they did not establish model equivalence before they made comparisons, their results should be viewed as highly suspect.

Moderated Regression

As noted by Schriesheim, Cogliser, and Neider (1995), testing for moderation using regression was (and often still is) conducted by using a median split in the moderator variable and then running separate regressions for the groups to determine whether parameter estimates differed. A more powerful procedure, moderated multiple regression (see Cohen & Cohen, 1983), treats the moderator variable as a continuous variable, thus retaining more information on the interaction effect (Schriesheim, Cogliser, et al., 1995). Moderated multiple regression uses a hierarchical regression procedure in which the independent variable is added first to the regression model, followed by the moderator variable. In the final step, the interaction (i.e., the cross-product) is added and assessed to determine the unique variance that is predicted by the interaction term (see also Schriesheim, 1995).
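The three hierarchical steps can be illustrated with simulated data. The variable names (leader_behavior, role_clarity, performance) and effect sizes below are invented for the sketch; any continuous predictor, moderator, and outcome would do.

```python
import numpy as np

# Simulate an outcome that contains a true interaction effect.
rng = np.random.default_rng(0)
n = 200
leader_behavior = rng.normal(size=n)   # independent variable
role_clarity = rng.normal(size=n)      # continuous moderator
performance = (0.4 * leader_behavior + 0.3 * role_clarity
               + 0.5 * leader_behavior * role_clarity   # true interaction
               + rng.normal(size=n))

def r_squared(predictors, y):
    """R-squared from an OLS fit (an intercept column is added)."""
    X = np.column_stack([np.ones(len(y)), predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

# Step 1: independent variable only; Step 2: add the moderator;
# Step 3: add the cross-product term.
r2_step1 = r_squared(leader_behavior, performance)
r2_step2 = r_squared(np.column_stack([leader_behavior, role_clarity]),
                     performance)
r2_step3 = r_squared(np.column_stack([leader_behavior, role_clarity,
                                      leader_behavior * role_clarity]),
                     performance)
delta_r2 = r2_step3 - r2_step2   # unique variance due to the interaction
```

Because the moderator is kept continuous, the increment in R-squared at Step 3 captures the interaction with far less information loss than a median split of role_clarity would allow.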

As demonstrated by Schriesheim, Cogliser, et al. (1995), who reanalyzed the data of a study conducted by Schriesheim and Murphy (1976), major discrepancies can arise between the split-group and moderated regression procedures. For example, although Schriesheim and Murphy originally found that role clarity did not moderate the relation between leader behaviors and outcomes (e.g., follower performance and satisfaction), Schriesheim, Cogliser, et al. (1995) actually found moderation effects. Specifically, initiating structure was positively associated with performance under high role clarity conditions but negatively associated with performance under low role clarity conditions. However, initiating structure (and consideration) was negatively associated with satisfaction under high role clarity conditions but positively associated with satisfaction under low role clarity conditions. The results relating to performance were opposite to what was hypothesized, as based on the prevailing literature. Given that previous studies testing for moderation have used the weaker procedure, Schriesheim, Cogliser, et al. (1995) concluded, "we do not believe that sound conclusions can be drawn from much of the existing leadership literature—at least not without making heroic (and quite probably unfounded) assumptions about the equivalency of analytic procedures and, of course, the results they produce" (p. 135). For further information about moderated regression analysis and how to improve the likelihood of finding moderators in leadership research, refer to Villa, Howell, Dorfman, and Daniel (2003).

Meta-Analysis

Glass (1976) differentiated three levels of analysis in research: primary analysis, in which researchers analyze the results of data they have gathered; secondary analysis, which refers to the reanalysis of primary data to better answer the original research question or to answer new questions; and meta-analysis, which is the analysis and synthesis of analyses of independent studies. This technique is useful where a domain needs to be synthesized, by integrating the results of various studies and reconciling their diverse or conflicting findings (for various meta-analytic techniques, refer to Hedges & Olkin, 1985; Hunter & Schmidt, 1990; Rosenthal, 1991). Essentially, with meta-analysis, the researcher can determine the population correlation coefficient (or other indicators of effect) between independent and dependent measures by controlling for measurement and statistical artifacts (i.e., errors). Meta-analysis also is useful for detecting moderator effects (see Sagie & Koslowsky, 1993).
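A bare-bones sketch of the core computation, in the spirit of Hunter and Schmidt, might look as follows: a sample-size-weighted mean correlation, plus a simple correction for attenuation due to unreliability. All study values below are invented, and real meta-analyses apply further corrections (e.g., for range restriction) omitted here.

```python
import math

studies = [
    # (sample size, observed r, predictor reliability, criterion reliability)
    (120, 0.30, 0.85, 0.80),
    (250, 0.22, 0.90, 0.75),
    (80,  0.35, 0.80, 0.85),
]

total_n = sum(n for n, _, _, _ in studies)

# Sample-size-weighted mean of the observed correlations.
r_bar = sum(n * r for n, r, _, _ in studies) / total_n

# Correct each r for attenuation (r / sqrt(rxx * ryy)), then re-average;
# the corrected mean estimates the population correlation more closely.
rho_bar = sum(n * (r / math.sqrt(rxx * ryy))
              for n, r, rxx, ryy in studies) / total_n
```

Because unreliability attenuates observed correlations, the corrected mean (rho_bar) is larger than the uncorrected weighted mean (r_bar), which is why meta-analytic estimates often exceed the typical correlation reported in individual studies.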

Lord, De Vader, and Alliger (1986) reanalyzed Mann's (1959) data and found that intelligence showed a mean corrected correlation of .52 with leadership emergence (whereas Mann had declared that intelligence was not strongly associated with leadership). As mentioned by Antonakis, Cianciolo, and Sternberg (Chapter 1, this volume) and Zaccaro et al. (Chapter 5, this volume), the Lord et al. study was instrumental in refocusing efforts on individual-difference predictors of leadership, and the main reason why Lord et al. were able to show strong links between intelligence and leadership was that they used a more sophisticated and applicable method to synthesize research findings.

An example of a meta-analysis in which moderators were detected is the study of Lowe, Kroeck, et al. (1996), who analyzed the results of studies using the Multifactor Leadership Questionnaire (MLQ). They found that the type of organization (i.e., public or private) in which leadership measures were gathered and the type of criterion used (i.e., subjective or objective) moderated the relations between leadership measures and outcome variables. For example, Lowe et al. reported that the mean corrected correlation between charisma and effectiveness was .74 and .59 in public and private organizations, respectively, and that the difference between the correlations was significant. Many other interesting examples of meta-analytic studies have been conducted in the leadership field (e.g., DeGroot, Kiker, & Cross, 2000; Eagly & Johnson, 1990; Judge, Bono, Ilies, & Gerhardt, 2002; Schriesheim, Tepper, & Tetrault, 1994), to which readers are encouraged to refer.
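A significance test for the difference between two such independent subgroup correlations can be sketched with the Fisher r-to-z transformation. The sample sizes below are invented for illustration; Lowe et al. used a different (corrected) procedure, so this is only the textbook version of the comparison.

```python
import math

def fisher_z_test(r1, n1, r2, n2):
    """Two-tailed z test for the difference between two independent rs."""
    z1, z2 = math.atanh(r1), math.atanh(r2)          # r-to-z transform
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))  # SE of z1 - z2
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2))             # two-tailed normal p
    return z, p

# Correlations of .74 and .59 (as in the Lowe et al. example), with a
# hypothetical 200 respondents per subgroup:
z, p = fisher_z_test(0.74, 200, 0.59, 200)
```

With these hypothetical sample sizes the difference is significant (p < .01); with much smaller subgroups, the same pair of correlations might not differ reliably.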

Conclusion

In conclusion, we have presented a summary of key research-method subdomains and other important research-related issues that are likely to have value for persons interested in reading about or conducting leadership research. Because competency in research methods requires extensive knowledge, we could only scratch the surface in many of the areas that we have discussed. We therefore would like to encourage readers to pursue further knowledge in the different areas by consulting and reading the reference citations that appear throughout this chapter and in the other chapters that have discussed research findings. By doing so, they are likely to improve the quality of future leadership research, as consumers demand higher-quality research and producers take the necessary precautions to ensure the validity of their findings.

Prior to concluding this chapter, we would like to address one final issue: that of judging the quality of the "concluding" or "discussion" section of a paper, article, or chapter. The discussion section of any of these addresses the implications of the results for future theory and, perhaps, leadership practice. Discussion sections sometimes get speculative, with implications and conclusions being drawn that go well beyond what the data actually support. Readers therefore need to evaluate such discussions carefully and in the light of the actual study findings or the findings of other research in the same domain. Discussion sections oftentimes talk about needed new directions for future research and comment on limitations of the reported research, such as a study's use of convenience sampling, which can be helpful in highlighting concerns that others can address in future research. The practice of acknowledging study shortcomings is also congruent with the philosophy of science position that all research is flawed in one way or another and that knowledge can be built only through multiple investigations of the same phenomena, using different samples, methods, analytic practices, and so forth. The analogy that we sometimes use is that doing research is like fishing with a net that has one or more holes (shortcomings). Using multiple nets, each with holes in different places, enables us to catch and hold that which we seek (knowledge), whereas the use of a single net is much less likely to do so.

To conclude, we will ask a question that probably crossed the minds of many readers: How is it possible that we need to make exhortations about raising the quality of research? Do journals not ensure that the research they publish is of a high quality? Indeed, most research published in top-quality journals is well done; however, from time to time, even top journals have holes in their review nets. Furthermore, there are journals and then there are journals! In other words, journal quality is variable, and not all journals publish high-quality scientific work. Thus, research that is not published in high-quality journals or that does not meet the necessary standards of scientific research must be looked at skeptically, especially if this research is published in popular books that address fashionable topics (e.g., traits that have not been tested scientifically and that have not demonstrated incremental predictive validity over and above established psychological traits) (see Antonakis, in press; Zeidner, Matthews, & Roberts, in press).

We trust that we have made the point that generating scientific knowledge that is useful for society is not an easy task and requires extensive methodological expertise. If one chooses to drink the elixirs of witch doctors, one does so at one's own peril!
