StatAnalysis Howarth 2010

    Statistical analysis and data display at the Geochemical Prospecting Research Centre and Applied Geochemistry Research Group, Imperial College, London

    Richard J. Howarth1,* & Robert G. Garrett2

    1 Dept. of Earth Sciences, University College London, Gower Street, London WC1E 6BT, United Kingdom
    2 Emeritus Scientist, Geological Survey of Canada, 601 Booth St., Ottawa, Ontario, K1A 0E8, Canada

    *Corresponding author (e-mail: r.howarth@ucl.ac.uk)

    ABSTRACT: The Imperial College of Science and Technology, a constituent college of the University of London in the 1960s, had the good fortune to be one of the first colleges in the United Kingdom to have access to digital computing facilities. This review traces the history of the application of computing in the Geochemical Prospecting Research Centre and its successor, the Applied Geochemistry Research Group, as computing moved from being a frontier research area to becoming a commonplace tool. The three principal areas in which it was involved comprised: the quality control, and thereby assurance, of analytical data; the production of pioneering atlases of regional geochemical variation in Northern Ireland (1973) and England and Wales (1978); and the application of methods introduced by workers in pattern-recognition and statistics to the interpretation of land-based and marine regional geochemical data.

    KEYWORDS: computers, computing, applied geochemistry, history of geochemistry, history of statistics, history of cartography, regional mapping, spatial filters, geochemical atlas, SC4020, LGP2703, multi-element maps, data transformation, factor analysis, cluster analysis, discriminant analysis, ridge regression, Kleiner-Hartigan trees, robust statistics, quality assurance

    The Geochemical Prospecting Research Centre (GPRC) was established in 1954, under the direction of Professor John Stuart Webb (1920–2007), in the Mining Geology section of the Royal School of Mines (RSM), Imperial College of Science and Technology (ICST), London. Initial studies were concerned with mineral prospecting using soil and drainage sampling in Northern Rhodesia (Zambia), Uganda, Sierra Leone, Bechuanaland (Botswana), Tanganyika (Tanzania), British North Borneo (Sabah, East Malaysia), Burma (Myanmar) and the Federation of Malaya (West Malaysia), and extended in the 1960s to Southern Rhodesia (Zimbabwe), the Philippine Republic, Borneo (now divided between Malaysia and Indonesia), Fiji, East Africa, Australia, and the United Kingdom. By 1960, its studies had broadened into regional geochemistry, based on the analysis of stream sediments. In 1963, Webb initiated the first of a series of investigations concerning the relationship between regional geochemistry and agricultural problems in livestock in Eire (Webb 1964; Webb & Atkinson 1965). The application of geochemistry to marine mineral exploration began in 1964 (Tooms 1967). Consequently, by 1963, the Centre's name was changed to the Applied Geochemistry Research Group (AGRG) to reflect the increasing breadth of its applications.

    The work of the GPRC and AGRG was underpinned by developments in two complementary spheres: methods and instrumentation for chemical analysis (discussed in the paper by Michael Thompson (2010)) and computing (Fig. 1). The latter facilitated: (i) statistical quality-assurance in the analytical laboratory; (ii) the display of large, multi-element, data sets in map form; and (iii) the interpretation of such multi-element data sets.

    First steps

    Many of the early studies undertaken by research students in the GPRC included simple, manually-based, statistical analyses.

    Fig. 1. Annual numbers of GPRC/AGRG publications (total n = 76) and theses (n = 24) with a substantial computing and/or statistical content over the years 1954–88.

    Geochemistry: Exploration, Environment, Analysis, Vol. 10, 2010, pp. 289–315. 1467-7873/10/$15.00 © 2010 AAG/Geological Society of London. DOI 10.1144/1467-7873/09-238


    The situation in the early 1960s is summarized in Hawkes & Webb (1962). The use of histograms to display the frequency distributions of element concentrations was commonplace, while probability plots of cumulative frequency distributions were less frequently prepared. In both cases, the data for the requisite plots were compiled by hand through the preparation of tally tables. At that time, analytical quality assurance was based on the use of 'statistical series' samples. These were a series of synthetic samples (each of which was composed of known proportions of two natural end-members, one having a low concentration of the element of interest, the other a high concentration) which were included in analytical batches following the procedure developed by the ex-RSM geologist and chemical engineer, Charles Alex Urton Craven (1918–93), with advice from Professor George Alfred Barnard (1915–2002) of the Mathematics Department, ICST (Craven 1954), in order to estimate analytical accuracy and precision. For the large amounts of photographic-plate spectrographic data generated at the GPRC, bins for data concentration-ranges were selected (because of a tendency of operators to unconsciously interpolate values which were biased towards those of the analytical standards used), using a logarithmic concentration scale, and bin boundaries were placed mid-way between the known concentrations of the geochemical standards. A tick (the tally) was placed in the appropriate bin for each analysis falling in that range, every fifth count being drawn as a horizontal line through the previous four ticks. This facilitated counting the total numbers of analytical results falling into each bin. A book widely used by students at the time was Moroney's (1960) Facts from Figures, which gave formulae for the calculation of means and standard deviations from such grouped data, as accumulated in the tally tables.
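The tally-table arithmetic described above can be illustrated with a short Python sketch. It places bin boundaries mid-way (on a log10 scale) between successive standards, tallies results into those bins, and computes a grouped-data mean and standard deviation. The function names, the example standards and results, and the use of the n − 1 divisor are assumptions made for illustration; they are not taken from Moroney (1960) or from GPRC worksheets.

```python
import math

def bin_edges_from_standards(standards_ppm):
    """Log10 bin boundaries placed mid-way between successive standards."""
    logs = [math.log10(s) for s in sorted(standards_ppm)]
    return [(a + b) / 2 for a, b in zip(logs, logs[1:])]

def tally(values_ppm, edges_log10):
    """Count how many results fall into each bin (one bin per standard)."""
    counts = [0] * (len(edges_log10) + 1)
    for v in values_ppm:
        lv = math.log10(v)
        # Bin index = number of boundaries at or below the value.
        counts[sum(lv >= e for e in edges_log10)] += 1
    return counts

def grouped_mean_sd(midpoints, counts):
    """Mean and standard deviation from grouped data (n - 1 divisor assumed)."""
    n = sum(counts)
    mean = sum(f * m for f, m in zip(counts, midpoints)) / n
    var = sum(f * (m - mean) ** 2 for f, m in zip(counts, midpoints)) / (n - 1)
    return mean, math.sqrt(var)

# Hypothetical standards and analytical results, in ppm.
standards = [1, 10, 100]
results = [2, 3, 8, 12, 40, 90]

edges = bin_edges_from_standards(standards)
counts = tally(results, edges)
# Each bin is represented by the log10 concentration of its standard.
midpoints = [math.log10(s) for s in sorted(standards)]
log_mean, log_sd = grouped_mean_sd(midpoints, counts)
print(counts, log_mean, log_sd, 10 ** log_mean)
```

Because each bin is centred on a standard, the grouped statistics are computed on the logarithmic scale; raising 10 to the power of the log mean gives a geometric mean in ppm.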

    For those more interested in statistical analysis, Dixon & Massey (1957) was the text of choice. However, in the early- and mid-1960s, textbooks written by geologist and statistician co-authors started to be published on the topics of statistical data analysis and modelling, e.g. Miller & Kahn (1962) and Krumbein & Graybill (1965), and these, together with a growing number of research papers, did much to expose students to the possibilities of the application of mathematics and statistics to applied geochemical problems. In the early 1960s such computations were carried out by means of tables of logarithms and a six-inch (15 cm) slide-rule, with which students were as adept as today's are with pocket calculators.

    To assist in the calculations (based on a linear regression model) required by Craven's (1954) method of estimation of analytical accuracy and precision, preprinted work-sheets were used; one simply followed the steps and the results were arrived at, very much a 'black box' approach. In order to meet the requirements of normality of residual errors in the regression modelling, and homogeneity of variance when the concentration levels in the statistical series samples spanned more than an order of magnitude, it was desirable to carry out these calculations following a logarithmic transformation. This was the subject of an MSc thesis by Stern (1959), but his method was computationally complex and essentially impractical for routine application, even using the Monroe electro-mechanical calculator available in the GPRC. Stern's supervisor in the Department of Mathematics, Dr. G. M. Jenkins (1933–82), who later became an expert in time-series analysis and systems engineering, appears to have begun work on improving the deficiencies he recognised in Stern's approach, in an unpublished manuscript 'A statistical problem in geochemical prospecting' (1959?, recently found in old AGRG files). In 1970, an ex-member of the GPRC staff, Clifford (Cliff) Henry James (1931–2003), published a version of Craven's method still adapted to hand-calculation, on the grounds that 'one of the difficulties of the method as originally described is that the calculations involved require a computer or an electronic calculator with a memory unit . . . many laboratories do not possess these facilities' (James 1970, B88).
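The general idea of a regression-based check on a statistical series, carried out after a logarithmic transformation, can be sketched in Python as follows. This is a generic ordinary least-squares illustration under stated assumptions, not a reconstruction of Craven's (1954) work-sheet or of Stern's (1959) method: the function names, the end-member concentrations, the mixing fractions and the simulated 10% bias are all hypothetical.

```python
import math

def expected_concentrations(low_ppm, high_ppm, fractions_high):
    """Expected concentration of each mixture of two end-members."""
    return [f * high_ppm + (1 - f) * low_ppm for f in fractions_high]

def log_regression(expected_ppm, measured_ppm):
    """Least-squares fit of log10(measured) on log10(expected).

    Returns (slope, intercept, residual standard deviation).
    Slope and intercept describe systematic bias (accuracy);
    the residual standard deviation describes scatter (precision).
    """
    x = [math.log10(v) for v in expected_ppm]
    y = [math.log10(v) for v in measured_ppm]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    resid_sd = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    return slope, intercept, resid_sd

# Hypothetical series: end-members of 10 ppm and 1000 ppm, five mixtures.
expected = expected_concentrations(10.0, 1000.0, [0.0, 0.25, 0.5, 0.75, 1.0])
# Simulated results that read a constant 10% high, with no random error.
measured = [1.1 * e for e in expected]

slope, intercept, resid_sd = log_regression(expected, measured)
print(slope, 10 ** intercept, resid_sd)
```

On the logarithmic scale a constant proportional bias appears as a non-zero intercept with a slope of one, which is one reason the transformation suits data spanning more than an order of magnitude.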

    REGIONAL MAPPING

    Following extensive fieldwork over several thousand square miles of Africa in the mid-1950s by Webb, Tooms and their students, it became apparent that there was considerable scope for regional geochemical surveys based on drainage reconnaissance surveys. By 1960, this hypothesis was confirmed through further studies in what was then Northern Rhodesia, elsewhere in Africa, and in S.E. Asia. In 1960, a suite of drainage samples collected for a base metal drainage reconnaissance survey over 3000 mi² (7770 km²) of the Livingstone–Namwala Concession area, Zambia, was made available to the GPRC by Namwala Concessions Ltd. These were analysed spectrographically and chemically for 17 elements in 1961–62. Following a study of the association between trace element concentrations in the drainage materials and the geology (Harden 1962), it was apparent that the


    Fig. 2. Portion of a point-symbol map of the concentration of cold-extractable copper (ppm) in the

    ... 0 and $\sum_{i=1}^{k} x_i = 100$) to a new set of variables $y_1, \ldots, y_{k-1}$, where $y_j = \log_e(x_j / x_k)$; $j = 1, \ldots, k-1$. In recent years, this transform has been widely promoted for use in the earth sciences (most recently by Buccianti et al. 2006). Howarth tried on several occasions to apply the logratio transform as a precursor to multivariate analysis of various

    AGRG geochemical data sets but found that, in practice, the results were often geochemically uninterpretable, and that whenever xi 0, the