Writing and marking examinations for paediatrics

Archives of Disease in Childhood 1996; 74: 469-473

MEDICAL EDUCATION

Graham Clayden

This is the tenth in a series on medical education.

Department of Paediatrics, St Thomas's Hospital, Lambeth Palace Road, London SE1 7EH
Correspondence to: Dr Clayden.

Arch Dis Child: first published as 10.1136/adc.74.5.469 on 1 May 1996. Downloaded from http://adc.bmj.com/ on December 9, 2021 by guest. Protected by copyright.

Examiner's responsibility
There are some general principles in assessment that should always be considered irrespective of the assessment tool and the subject matter. The elements being assessed should obviously be those that should have been learned. This presupposes that the candidate has been guided by a list of clear learning objectives, has had teaching and clinical experience based on these objectives, and has the knowledge that the examination is based on these. This sounds easy but is very difficult to achieve. Aims and objectives of courses are very straightforward to write even though an onerous and tedious task at times. However, the depth of acquisition of knowledge or the level of skill required at a particular stage is much more difficult to define or about which to achieve a consensus. This is the challenge for the question designer. The healthy recognition of the need to provide a statement of core knowledge and skill at the different stages of undergraduate medical education and later postgraduate training is helping to focus many minds on this problem.

The bare minimum in paediatrics for undergraduates is that we should insist that every student should have reached a level where they are able to recognise a potentially ill child, know the most appropriate immediate action, be aware of the changing susceptibility and vulnerability to disease, trauma and treatment with age, and respect the child's relationship with the family and with the society within which the child lives. This acquisition must then be so indelible as to last the professional life of the doctor assuming he or she will be one of those who might have no further training in paediatrics or child health. We need to worry less about those who eventually pursue a career involving further training in our subject, as they will be taught and tested again. However, undergraduate course design should have the recruitment of future children's doctors in mind.

Blocking the further progress of an undergraduate because of failing to reach the minimal level in paediatrics is a serious step. We must be confident for the sake of the candidate that we are testing vital areas. On this reasoning it should be expected that anyone completing their undergraduate course in paediatrics, and who is taught effectively and who works reasonably diligently, should pass the core examination easily. If they fail it should be because they have become mentally ill and/or have had insufficient motivation to attend the course provided they were appropriately selected for medical school initially. It must not be because they have been subjected to tests that miss the points for which they have prepared. So questions must be prepared with care and be rigorously reviewed or discarded according to changes in knowledge and importance.

The demands for examining postgraduate trainees are no less challenging. Careers may be blighted by an unfair failure at a critical stage in a candidate's life. Equally an unjustified pass allowing a candidate to be promoted into a position of responsibility for which he or she is ill equipped may have even longer devastating effects. The risks to the child patients are obvious but the lasting damage to the trainee's confidence, as well as the potential legal consequences, make this a very real factor for examiners to balance against their natural sympathy for the anxious candidate with whom every examiner can identify as a result of their own experience.

Choosing the type of paediatric examination questions
It is important to decide on the area of knowledge/ability and the depth to which it must be tested. Table 1 covers the range of competencies that most doctors need to use in practice. Each area is also subdivided into increasing depth. For every stage of medical education and paediatric training the appropriate level for these should be agreed.

Table 1 Paediatric competency and levels of depth

Competencies     Levels (increasing depth, left to right)
Knowledge:       Recall; Interpretation; Judgment
Recognition:     Naming; Linking; Weighing evidence
Investigation:   Using; Interpreting
Questioning:     Recall; Listening; Understanding; Ensuring feedback; Sensitivity/rapport
Examining:       Technique; Completeness; Relevance; Style; Interpretation
Planning:        Recall; Synthesis; Audit; Adaptation
Explaining:      Recall; Clarity; Accuracy; Sensitivity
Empathising:     Respect; Motivation; Sympathy

Ideally, the examination should be constructed of a mixture of methods and questions. These should test as wide a range of these competencies within the practical constraints of the examination while recognising the limitations of the mental and physical stamina of the candidate and examiner. The balance between the subjects is usually a compromise between the availability of relevant clinical material and the time available to examine a particular number of candidates.

A key task for the examiner is to choose the assessment tool to match the content. Table 2 shows how the commonly used assessment tools fit in with the competencies and their levels in table 1 above.

Written papers¹
The problem of the traditional essay question is the subjective nature of the marking. This may be improved by the provision of detailed marking schemes, especially if essential key points are listed. Failure to mention an agreed number of these points in an answer would lead to a low mark. Double marking by examiners should also help, especially if blind to each other's marks.

This is a cumbersome method of testing factual recall but does provide a medium to test written communication skills. The ability to explore complicated subjects and to use clear arguments can be tested by essay questions such as writing a letter to a parent, teacher, or general practitioner about an ethical problem or explaining complex physiology in simple language. Marking essays is very time consuming for the examiners. Short essay questions are reasonably reliable,² and structured short answer questions are also preferable to the long essay.³

Modified essay questions (MEQ) allow the examiner to present sections of information such as parts of a case history and to ask questions requiring short statement answers at these stages. It is easier to achieve a consensus by examiners for the answers to the section questions than to find an agreed model answer for essay questions. All forms of modified essay questions require a great deal of time in their preparation and their value is still under debate.⁴

Essentials of the multiple choice question
The multiple choice question (MCQ) is often the brunt of criticism by examiners and nearly overt hatred by candidates. The examiners may feel that they are constrained by the narrow structure and scope of the MCQ, although few doubt the need to assess the basic knowledge that should be resident in the memory. No one can read even this page without the knowledge of words being used. The candidates' antipathy would evaporate if they were to compare the objectivity of the MCQ with any other knowledge test. A great deal of effort and analysis has shown that MCQs have high reliability if constructed properly.⁵ ⁶

Table 2 Paediatric competency, depth, and assessment method

Competencies     Levels (increasing depth, left to right) and tests
Knowledge:       Recall; Interpretation; Judgment
  Tests:         MCQ; Case commentary; Best of five choice; Short note questions; Oral examinations
Recognition:     Naming; Linking; Weighing evidence
  Tests:         Photo/video; Case commentary; Best of five choice; OSCE and quizzes
Investigation:   Using; Interpreting
  Tests:         MCQ; Data interpretation questions, essays and MEQ
Questioning:     Recall; Listening; Understanding; Ensuring feedback; Sensitivity/rapport
  Tests:         Long clinical case; History OSCE station; Observed history
Examining:       Technique; Completeness; Relevance; Style; Interpretation
  Tests:         Examination OSCE station; Short clinical cases; Video OSCE
Planning:        Recall; Synthesis; Audit; Adaptation
  Tests:         MCQ; Essays; Oral exams and best of fives
Explaining:      Recall; Clarity; Accuracy; Sensitivity
  Tests:         Long clinical case; Surrogate OSCE station; Observed explanation
Empathising:     Respect; Motivation; Sympathy
  Tests:         Observed history and actual practice

MEQ=modified essay questions

There are essentials for constructing MCQs such as:
* Using a succinct stem where each item creates a complete sentence
* Avoiding cueing the answer with items or other questions
* Avoiding mutually exclusive answers
* Avoiding testing more than one fact per item
* Avoiding vague words like usually, maybe, sometimes
* Avoiding unwitting conventions like 'characteristic' always being used for 'true' answers
* Avoiding 'always' or 'never' which must be false in medicine
* Avoiding repetitive words in items where it is possible to refine the stem.

For example that list could have been written as follows.

MCQs can be improved by avoiding:
* Grammatical mismatch of the stem with each item which should form a complete sentence
* Cueing answer with items or other questions
* Mutually exclusive answers
* Testing more than one fact per item
* Vague words like usually, maybe, sometimes
* Unwitting conventions like 'characteristic' always being used for 'true' answers
* 'Always' or 'never' which must be false in medicine
* Repetitive words in items where it is possible to refine the stem.

Probably the commonest error in setting MCQs is the overestimation of the abilities of the candidate; there is seldom a too easy question. If examiners tried to write questions that they would be shocked if a student at that stage were to get wrong if asked orally on an informal ward round, the level of difficulty would be about right. For example an undergraduate attending a ward round might be expected to know the following about a child who has recently been diagnosed as having diabetes mellitus:

Polyuria presenting as secondary enuresis in this child is not unusual. Infection may have been a precipitator of the onset of diabetic ketoacidosis at this stage. Diabetes was diagnosed by the general practitioner discovering glycosuria and ketonuria and the hospital confirmed the hyperglycaemia and the dehydration. The child was tachypnoeic as a consequence of the acidosis. Careful rehydration with respect for the potassium concentration was carried out intravenously in view of the child's vomiting. Insulin is given parenterally and care is taken to avoid hypoglycaemia. Abdominal pain is a frequent finding in children with diabetic ketoacidosis but the differential diagnosis must always include appendicitis and urinary tract infection. Children with diabetes are prone to urinary tract infections. The prognosis is good for survival but the diabetes will be permanent. Control during childhood is likely to be relatively straightforward but problems of compliance may arise in adolescence. The long term complications of diabetes are unlikely to occur in childhood but good control may delay their onset.

If this set of information bites is converted into MCQs a series of questions of appropriate difficulty can be constructed.

MCQ 1 - Recognised presenting features of diabetes mellitus in childhood include:
A. Secondary enuresis
B. Excessive weight gain
C. Abdominal pain
D. Reluctance to drink
E. Tachypnoea
Key: ACE are true or often expressed as TFTFT.

MCQ 2 - Diabetes mellitus in childhood:
A. Is often temporary
B. Is effectively treated with insulin given orally
C. Leads to renal failure in adolescents in the majority of cases
D. Predisposes to urinary tract infections
E. Becomes easier to treat during adolescence
Key: D is true or often expressed as FFFTF.

MCQ 3 - In a 6 year old boy who presents with increasingly severe abdominal pain the following would suggest a diagnosis of diabetes mellitus rather than acute appendicitis:
A. A three week history of nocturnal enuresis
B. Increasing frequency of vomiting over two days
C. A trace of ketones on urine testing
D. Polyuria
E. Periumbilical abdominal pain
Key: AD are true or often expressed as TFFTF.

MCQ 3 is testing at a deeper level where judgment as well as factual recall is necessary. Knowledge of diabetes and appendicitis is needed but each item tests a single element in this problem.

The examples given here are the multiple true/false type of MCQ, which is the type used mainly in the UK. They have the advantage that each MCQ tests five elements of knowledge whereas the 'best of five' MCQ tests one. However, best of five questions may be very useful in testing decision making and the interpretation of information. For example, keeping to our 6 year old boy presenting with vomiting:

Question: A 6 year old boy who presents with increasingly severe abdominal pain over the preceding two to three days is noted to have a trace of ketones and 4+ glycosuria on urine testing. He has been vomiting large volumes of bile stained liquid for the last 24 hours. He is now drowsy and both pulse rate and respiratory rate have steadily risen over the last two hours. Initial results: plasma glucose concentration 14 mmol/l, urea 17 mmol/l, sodium 129 mmol/l, and potassium 4.9 mmol/l.

In addition to intravenous rehydration, choose from this list the single most valuable next action:
A. Give intravenous insulin
B. Perform an erect and supine abdominal x ray
C. Give broad spectrum antibiotics intravenously
D. Perform a lumbar puncture
E. Proceed to laparotomy

Answer key: A 0, B +4, C +2, D 0, E +1.

This tests an awareness of intestinal obstruction (volvulus), that a trace of ketonuria and a plasma glucose of 14 mmol/l may occur in any ill child, and the risk of performing lumbar puncture on a drowsy child. However the single choice gives no indication on the knowledge of these other areas. If a wide range of knowledge is to be tested then the traditional multiple true/false MCQs are preferred. The best of five can be greatly improved by extending them to 10 or 15 choices - the extended choice questions. These can be used in combination with more detailed case histories or with illustrative material such as the actual x ray in the case above showing fluid levels and a collection of gas in the splenic area only. These can be part of a written paper, although printing clinical photographs and radiographic images is expensive, or, alternatively, as part of an objectively structured clinical examination (OSCE).

There has been a lively debate whether negative marking should be used. Traditionally the multiple true/false MCQ has been marked awarding +1 for a correct statement of true or false and -1 for an incorrect response with zero for a blank or don't know response. Those who support negative marking claim that doctors are penalised if they guess rather than admit their ignorance and seek more expert help in clinical practice. The opposition claim that negative marking tests both knowledge and the strategy used for uncertainty in candidates thus damaging the effectiveness of the test. Bright but anxious candidates may do less well than more ignorant but confident (or cavalier) students.⁷ ⁸ Students can be helped with the certainty variable by being advised to put themselves in the position of the examiner who must construct the items. True items are relatively easy to set but false ones must be found by using opposites (for example MCQs 1B and 1D, 2A and 2E), false modifications of a true statement (for example MCQ 2B and 2C), or false dimensions of time or severity. It is much easier to find false options when comparing two diagnoses (for example MCQ 3). The candidate can then support their conclusion about the truth of the item beyond reasonable doubt from the basis of their factual knowledge, which should offset their anxiety to a degree. However, this is a personality trait that may put some candidates at a permanent disadvantage. The removal of the penalty mark may help them and encourage them to attempt more questions where their first instinctive response is likely to be correct.
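The two marking conventions discussed in this section can be made concrete with a short sketch. It is illustrative only: the function names are hypothetical, the true/false scorer simply applies the +1 / -1 / 0 rule quoted above (with the penalty adjustable so that the effect of removing it can be seen), and the best of five weights are those given in the answer key for the 6 year old boy.

```python
from typing import Dict, Optional, Sequence


def key_to_pattern(true_items: str, n_items: int = 5) -> str:
    """Turn a key such as 'ACE' into the equivalent 'TFTFT' pattern."""
    letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"[:n_items]
    return "".join("T" if letter in true_items else "F" for letter in letters)


def score_true_false(key: str,
                     responses: Sequence[Optional[str]],
                     penalty: int = 1) -> int:
    """Score one multiple true/false MCQ.

    +1 for each correct 'T'/'F' response, -penalty for each incorrect one,
    and 0 for a blank or "don't know" (represented here as None).
    """
    total = 0
    for correct, given in zip(key, responses):
        if given is None:
            continue  # blank: neither rewarded nor penalised
        total += 1 if given == correct else -penalty
    return total


def score_best_of_five(weights: Dict[str, int], choice: str) -> int:
    """Score a best of five question from a weighted answer key."""
    return weights.get(choice, 0)


# MCQ 1 from the text: A, C and E are true.
mcq1_key = key_to_pattern("ACE")

# A hypothetical candidate: three correct, one blank, one wrong.
candidate = ["T", "F", "T", None, "F"]
print(mcq1_key)                                         # TFTFT
print(score_true_false(mcq1_key, candidate))            # 2 with negative marking
print(score_true_false(mcq1_key, candidate, penalty=0)) # 3 without it

# Weighted key quoted in the text for the best of five question.
obstruction_key = {"A": 0, "B": 4, "C": 2, "D": 0, "E": 1}
print(score_best_of_five(obstruction_key, "B"))         # 4
```

Setting `penalty=0` shows the point made above about anxious candidates: the same set of responses scores higher once the penalty mark is removed, so attempting an item on instinct no longer carries a cost.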

All examinations are subject to sampling error when trying to estimate a candidate's overall knowledge or skill. The MCQ does allow a very wide range of knowledge testing. Some examinations are composed of 60 MCQs each with five parts but candidates appear to cope with 100 MCQs each with five items without too much fatigue.

The effort for the examiner is in the design and syntax of the question. Marking can be performed by optically scanning candidates' answer sheets directly into the computer database. This allows detailed statistical analysis of each question as well as providing the results for the candidates. Most computer programmes will provide the mean scores for each MCQ as well as reporting the percentage who responded correctly, incorrectly, and who did not attempt each item. The discriminating power of the question is usually assessed by calculating the correlation coefficient for the MCQ by comparing the candidates' rank order for that question with their rank order for the whole MCQ test. The discriminating power of each item can also be checked by comparing the overall MCQ exam performance of those correctly answering the item with the overall performance of those who answered the item incorrectly. The mean score for each MCQ and the percentage of candidates who answered each item correctly will indicate how difficult the question was for the candidates. The correlation coefficients indicate how discriminating the test has been. This also facilitates measures of internal reliability for the MCQ test.⁹ These data help reviewing committees of examiners to modify the questions, thus building up banks of used but finely honed questions. By using a number of identical questions in subsequent examinations, comparisons can be made of a particular candidate cohort with previous cohorts. This can be one way of using a norm referenced examination fairly provided the results are modified if significant differences are found between cohorts that cannot be expected by growth in knowledge or familiarity over a period of time (for example knowledge of retroviruses during the 1980s).

The debate over norm referencing (competitive examination with a predetermined percentage passing) or criterion referencing (non-competitive with predetermined pass mark) will probably be resolved when the core curriculum for each stage in training is established. Criterion referencing will then allow examinations based on the essential facts with some leniency for the odd gap in knowledge or technical mistake in making a response. Currently most MCQ examinations are norm referenced where there is a real risk of unfairness if a candidate is part of a very bright cohort or conversely a risk of passing the unprepared. To avoid this examinations should be taken by reasonable numbers of candidates, should include marker questions from previous examinations, and examiners should be prepared to modify their pass rate in the light of previous results. Another defence against the vagaries of the norm referenced exam is to make this just one part of a number of tests. Multipart or serial examination will allow any cohort effect to be reduced, although it does raise the question of how much summation of results should occur. Is it logical for one test to compensate for another? Before considering this it is important to review the methods available to test clinical skills as it is results in these parts of the examination that are often added to knowledge tests to come up with an overall pass mark. Most examinations in paediatrics put a greater weight on the clinical examination than on the knowledge test and many have veto marks available when examiners fear that clinical competence falls below levels that are safe for future patients.
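The item statistics described earlier in this section (the percentage responding correctly, incorrectly, or not at all; the comparison of overall performance between those who got an item right and those who got it wrong; and a measure of internal reliability) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the analysis performed by any particular examination software: the function names and the small data set are hypothetical, total scores simply count correct items with no negative marking, and the reliability figure uses the Kuder-Richardson formula 20 with blanks counted as incorrect.

```python
from statistics import mean, pvariance
from typing import Dict, List, Optional

# One row per candidate, one column per item:
# 1 = correct, 0 = incorrect, None = not attempted.
Responses = List[List[Optional[int]]]


def totals(responses: Responses) -> List[int]:
    """Number of items each candidate answered correctly."""
    return [sum(1 for r in row if r == 1) for row in responses]


def item_summary(responses: Responses, item: int) -> Dict[str, float]:
    """Percentage correct, incorrect and not attempted for one item."""
    column = [row[item] for row in responses]
    n = len(column)
    return {
        "correct": 100 * column.count(1) / n,
        "incorrect": 100 * column.count(0) / n,
        "blank": 100 * column.count(None) / n,
    }


def group_discrimination(responses: Responses, item: int) -> float:
    """Mean total of those answering the item correctly minus the mean
    total of those answering it incorrectly (blanks are left out).
    Raises StatisticsError if either group is empty."""
    t = totals(responses)
    right = [t[i] for i, row in enumerate(responses) if row[item] == 1]
    wrong = [t[i] for i, row in enumerate(responses) if row[item] == 0]
    return mean(right) - mean(wrong)


def kr20(responses: Responses) -> float:
    """Kuder-Richardson formula 20, treating blanks as incorrect."""
    n = len(responses)
    k = len(responses[0])
    pq = 0.0
    for item in range(k):
        p = sum(1 for row in responses if row[item] == 1) / n
        pq += p * (1 - p)
    return (k / (k - 1)) * (1 - pq / pvariance(totals(responses)))


# Hypothetical results for five candidates on four items.
example: Responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, None],
    [0, 0, 0, 0],
]

print(item_summary(example, 3))          # 20% correct, 60% incorrect, 20% blank
print(group_discrimination(example, 1))  # 2.5
print(round(kr20(example), 3))           # 0.8
```

A positive difference from `group_discrimination` means that candidates who answered the item correctly did better on the test as a whole, which is the pattern a reviewing committee would hope to see; the rank order correlation mentioned in the text is an alternative way of expressing the same idea and is not shown here.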

Tests of clinical competence
The traditional long and short case examinations are increasingly being replaced by the OSCE. These examinations have been rigorously studied since their first description.¹⁰ They have been shown to be both reliable and valid¹¹ provided a reasonable number of 'stations' are used. They compare favourably with other tests of clinical competence.¹² ¹³

OSCEs are usually cycles of 15-25 stations of tasks lasting between four and 10 minutes. They may be patient and examiner based stations or non-patient stations. The candidate is asked to examine one bodily system and the examiner uses an agreed checklist to reward each section of that clinical examination that is successfully carried out as well as having the opportunity of awarding marks for attitude to the patient, recognition of abnormal signs, making diagnoses, and mentioning a plan for further management when time allows. Non-patient based stations may test communication or history taking skills, the ability to understand charts or pieces of equipment, observing photographs, radiographs or videos, and answering either extended choice questions or with free text single word or brief statements.

The educational advantages of using OSCEs include:
* A potential for a much wider number of different types of clinical skills to be tested¹⁴
* The same selection of clinical skills being tested for every candidate
* A series of tests that are objectively marked using standardised checklists thus reducing observer bias
* A method for testing a large number of candidates in a relatively short period.

Some of the problems of OSCEs include:
* Strict time constraint on candidates and examiner
* Potential boredom from repetitive tasks for examiners
* High demand for large number of examiners, although for shorter number of days.

The traditional long clinical case is where a candidate is asked to take a history from a parent and child, perform a detailed examination, and prepare to present the findings to the examiners. They usually have between 40 minutes and one hour to complete their history and examination and about 15 to 25 minutes to report back to a pair of examiners who explore the candidate's understanding of the problems and his/her plans for further management.

Some ways of improving traditional long case clinical exams include:
* Observing the candidate taking history¹⁴
* Structured checklist for examiners¹⁵
* Providing clear instructions for patients and parents about how much of the diagnosis to tell the candidates
* Double marking by two examiners blind to each other's marks with no negotiation.

The traditional short clinical case is where the candidate is taken to a series of three to five children with a variety of physical signs. The candidate is usually tested in three to four body systems and reports the findings as they are discovered or is invited to make a diagnosis and support this with evidence from the examination.

Some ways of improving traditional short cases:
* Select cases on educational grounds, ie, testing specific skills rather than conveniently available cases
* Group a series of short cases for examiners to use in a planned way
* Introduce the case to the candidate with a brief but realistic history
* Use standard checklists for clinical skills
* Provide positive feedback to candidates to reduce anxiety.

The oral examination usually lasts about 20 minutes with the candidate being asked questions on a wide range of topics by two examiners. There has been a great deal of debate on its value as a measure of professional competence.¹⁶ Many candidates feel that their anxiety inhibits their performance in this confrontation although anxiety does not seem to affect results.¹⁷

Some ways of improving traditional oral examination (viva):
* Use a structured series of questions
* Focus on communication and explanation skills
* Use a wider range of subjects such as ethics, management and others which are difficult to test in MCQ format.
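The agreed checklists used at OSCE stations, and the blind double marking suggested for long cases, lend themselves to a simple representation. The sketch below is only one possible arrangement: the class and function names are hypothetical, the checklist entries and their marks are invented for illustration, and averaging the two examiners' totals is an assumption about how marks given without negotiation might be combined.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class ChecklistItem:
    description: str
    marks: int


def station_score(checklist: List[ChecklistItem], ticked: Set[str]) -> int:
    """Sum the marks for every checklist item the examiner ticked."""
    return sum(item.marks for item in checklist if item.description in ticked)


def double_marked_score(checklist: List[ChecklistItem],
                        ticks_a: Set[str],
                        ticks_b: Set[str]) -> float:
    """Combine two examiners' independent marks by taking their mean."""
    return (station_score(checklist, ticks_a)
            + station_score(checklist, ticks_b)) / 2


# Invented checklist for a station examining one bodily system.
checklist = [
    ChecklistItem("Appropriate attitude to the patient", 2),
    ChecklistItem("Systematic technique", 1),
    ChecklistItem("Recognises the abnormal sign", 3),
    ChecklistItem("Mentions a plan for further management", 2),
]

examiner_a = {"Appropriate attitude to the patient",
              "Systematic technique",
              "Recognises the abnormal sign"}
examiner_b = {"Appropriate attitude to the patient",
              "Recognises the abnormal sign",
              "Mentions a plan for further management"}

print(station_score(checklist, examiner_a))                    # 6
print(station_score(checklist, examiner_b))                    # 7
print(double_marked_score(checklist, examiner_a, examiner_b))  # 6.5
```

Because every candidate is marked against the same list, the two examiners' totals can be compared directly, and a large gap between them is itself a useful signal for those reviewing the station.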

Marking schemes
The marking of these rather confrontational examinations is subjective but the use of marking schemes which use words rather than scores may help. For example grading: Outstanding ... Excellent ... Very good ... Good ... Satisfactory ... Barely acceptable ... Poor ... Very poor ... Abysmal rather than scores of 10 to 1 may help examiners to decide on whether compensation between one part of an examination should be permitted to allow the candidate to reach a pass mark. Many examinations merely add the scores for each section of the examination and allow compensation, others have veto marks or allow the candidate to fall below 'satisfactory' in only one part of the examination. Compensation does reduce the risk of one very strict or generous scorer leading to a pass or fail by his/her decision alone.
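One of the rules mentioned above, allowing a candidate to fall below 'satisfactory' in only one part of the examination, can be written down explicitly together with a veto. The sketch is illustrative: the function name and section names are hypothetical, and treating any below-satisfactory grade in a designated section as a veto is an assumption about how veto marks might operate, not a description of any particular examination.

```python
from typing import Dict, Iterable

# The nine word grades from the text, ordered from worst to best.
GRADES = [
    "Abysmal", "Very poor", "Poor", "Barely acceptable", "Satisfactory",
    "Good", "Very good", "Excellent", "Outstanding",
]


def passes(section_grades: Dict[str, str],
           max_below_satisfactory: int = 1,
           veto_sections: Iterable[str] = ()) -> bool:
    """Decide pass or fail from word grades for each section.

    A candidate may fall below 'Satisfactory' in at most
    `max_below_satisfactory` sections, and never in a veto section.
    """
    threshold = GRADES.index("Satisfactory")
    below = [name for name, grade in section_grades.items()
             if GRADES.index(grade) < threshold]
    if any(name in veto_sections for name in below):
        return False  # veto: no compensation allowed for this section
    return len(below) <= max_below_satisfactory


candidate = {
    "MCQ": "Barely acceptable",
    "Long case": "Good",
    "Short cases": "Satisfactory",
    "Oral": "Very good",
}

print(passes(candidate))                         # True: one weak section
print(passes(candidate, veto_sections=["MCQ"]))  # False: weak section is vetoed
```

Keeping the grades as words until the final decision mirrors the suggestion above that verbal grades help examiners to judge whether compensation between sections is reasonable.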

Conclusion
Examinations are an essential element of medical education, which generates vehement debate but unfortunately a relative lack of rigorous critical analysis. There appears to be a background anxiety that research findings that might suggest an examination has been less than fair will lead to endless arguments with candidates who have failed that examination. It is a major responsibility of all those involved in examining to seek evidence of the fairness, reliability, and validity of the methods¹⁸ and the organisation of the tests. Computers have made analysis of results much easier. Access to shared banks of all types of questions and answer sheets should allow examiners to select the subject first and the assessment tool second but from a range of tested and continually modified questions which allow comparison of candidates' performance both in time and between institutions. The creation of examination materials and their evaluation must be considered as valuable an activity as research in academic life. There is little point in child health research if the advances in knowledge and skills that this generates cannot be shown to have been acquired eventually by present and future paediatricians.

1 Neufeld VR, Norman GR, eds. Written examinations. Assessing clinical competence. New York: Springer, 1985: 94-118.
2 Wakefield RE, Roberts S. A pilot experiment on the inter-examiner reliability of short essay questions. Med Educ 1979; 13: 342-4.
3 Webber RH. Structured short-answer questions: an alternative examination method. Med Educ 1992; 26: 58-62.
4 Feletti GI, Smith EKM. Modified essay questions: are they worth the effort? Med Educ 1986; 20: 126-32.
5 Anderson J. The multiple choice question in medicine. 2nd Ed. London: Pitman, 1982.
6 Harden RM. Constructing multiple choice questions of the multiple true/false type. ASME medical education booklet No 10. Association for the Study of Medical Education. Med Educ 1979; 13: 305-12.
7 Harden RM, Brown RA, Biran LA, Dallas Ross WP, Wakefield RE. Multiple choice questions: to guess or not to guess. Med Educ 1976; 10: 27-32.
8 Fleming PR. The profitability of 'guessing' in multiple choice question papers. Med Educ 1988; 22: 509-13.
9 Kuder GF, Richardson MW. The theory of the estimation of reliability. Psychometrika 1937; 2: 151-60.
10 Harden RM, Stevenson M, Downie WW, Wilson GM. Assessment of clinical competence using objective structured examination. BMJ 1975; i: 447-51.
11 Matsell DG, Wolfish NM, Hsu E. Reliability and validity of the objective structured clinical examination in paediatrics. Med Educ 1991; 25: 293-9.
12 Smith LJ, Price DA, Houston IB. Objective structured clinical examination compared with other forms of student assessment. Arch Dis Child 1984; 59: 1173-6.
13 Newble DI, Swanson DB. Psychometric characteristics of the objective structured clinical examination. Med Educ 1988; 22: 325-34.
14 Newble DI. The observed long-case in clinical assessment. Med Educ 1991; 25: 369-73.
15 Van Thiel J, Kraan HF, Van Der Vleuten CPM. Reliability and feasibility of measuring medical interviewing skills: the revised Maastricht history-taking and advice checklist. Med Educ 1991; 25: 224-9.
16 McGuire CH. The oral examination as a measure of professional competence. Journal of Med Educ 1966; 41: 267-74.
17 Arndt CB, Guly UMV, McManus IC. Preclinical anxiety: the stress associated with a viva voce examination. Med Educ 1986; 20: 274-80.
18 Hodgkin K, Knox JDE. Problem centred learning. Edinburgh: Churchill Livingstone, 1975.
