Autonomous Pipelines

David Brett, Leicester University

E-Science talk, Edinburgh




Talk Map

Project: Why work on an autonomous classification program?

WASP: Wide Angle Search for Planets telescope

Leicester, St Andrews, Cambridge, Queen's University Belfast and the Open University.

Variable Identification: Period searching

Classification System: Artificial Neural Networks

Methods and Results.

Methods and the Future.




Why do any of this?

Tera-scale computing age:

• Volume of collected data

• Repetitive nature of the data reduction

• “Brute force” approach

• Creates one more layer of abstraction




WASP (Wide-Angle Search for Planets)

[Figure: field of view, 9.5° × 9.5°]

Four 2048² CCD chips (recently funded for five)

For comparison: the INT “wide-field camera” images roughly the size of the full moon

1% photometry down to 13th magnitude and detections down to 17th (30 s exposure)

5 TB per year (raw)
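To put the raw data rate in context, the short calculation below turns the figures above into a number of exposures per year. It is only a rough sketch: the 16-bit pixel depth and the decimal definition of a terabyte are assumptions, not figures from the talk.

```python
# Rough data-volume arithmetic for four 2048 x 2048 CCDs and 5 TB/year of raw data.
CCD_SIDE = 2048
N_CCDS = 4
BYTES_PER_PIXEL = 2           # ASSUMPTION: 16-bit raw pixels
RAW_BYTES_PER_YEAR = 5e12     # 5 TB per year, taking 1 TB = 10**12 bytes (assumption)

bytes_per_frame = CCD_SIDE * CCD_SIDE * BYTES_PER_PIXEL
bytes_per_exposure_set = N_CCDS * bytes_per_frame      # one frame from each CCD
exposure_sets_per_year = RAW_BYTES_PER_YEAR / bytes_per_exposure_set

print(f"{bytes_per_frame / 1e6:.1f} MB per frame")
print(f"{exposure_sets_per_year:,.0f} four-CCD exposure sets per year")
```

Under these assumptions the yearly total corresponds to well over a hundred thousand exposure sets, which is the scale the reduction pipeline has to handle without human intervention.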

But what do we do with all those bits?




Source Extraction and Data Reduction

Stages:

• Home grown programs for “cleaning” the raw data.

• Use of conventional packages such as SExtractor for source extraction

• Variability checking programs

• Periodic variability locating programs

• Phased lightcurve recognition software

• Results database




Periodic Variables

Two Main Methods

Phase-folding:

• Fast to execute

• Easy to implement

• Simple to understand

• e.g. χ² or the L-Statistic

Frequency Analysis:

• Slower to execute

• Trickier to code

• More reliable

• e.g. Lomb-Scargle or Schwarzenberg-Czerny




Periodic Variables

Phase-Folding: χ²

• Maximum deviation from a constant line.

• Binned data, uses bin mean.

• Intra-bin deviation not taken into account

• Very quick to implement and compute.

• Remember: we are looking for a maximum, not a minimum.
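The bullets above can be sketched in a few lines of Python. This is a minimal illustration rather than the pipeline's own code: the function names, the default of 10 phase bins and the normalisation by the overall sample variance are all assumptions.

```python
def phase_bins(times, values, period, n_bins=10):
    """Fold the time series on `period` and group the values by phase bin."""
    bins = [[] for _ in range(n_bins)]
    for t, v in zip(times, values):
        phase = (t / period) % 1.0
        index = min(int(phase * n_bins), n_bins - 1)  # guard against rounding up to n_bins
        bins[index].append(v)
    return [b for b in bins if b]  # drop empty bins


def chi2_statistic(times, values, period, n_bins=10):
    """Deviation of the bin means from a constant line at the overall mean.

    Only the bin means are used; the scatter inside each bin is ignored.
    """
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    total = 0.0
    for b in phase_bins(times, values, period, n_bins):
        bin_mean = sum(b) / len(b)
        total += len(b) * (bin_mean - mean) ** 2
    return total / variance


def best_period(times, values, trial_periods, n_bins=10):
    """Pick the trial period with the LARGEST statistic (a maximum, not a minimum)."""
    return max(trial_periods, key=lambda p: chi2_statistic(times, values, p, n_bins))
```

At a wrong trial period the folded points are scrambled in phase, so every bin mean sits close to the overall mean and the statistic stays small. At the right period the bin means trace the lightcurve shape and the statistic peaks.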




Periodic Variables

Phase-Folding: L-Statistic

• Also uses binned data.

• Additionally considers intra-bin deviation from bin-mean.

• Divide the χ² value by the intra-bin dispersion, enhancing trial periods with low intra-bin deviation.

• Quick and accurate with medium to low-noise data.

• Created by S. Davies, 1990.
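A hedged sketch of the idea: the between-bin deviation used by the χ² method is divided by the scatter inside the bins. The ratio below is an analysis-of-variance style statistic written from the description on this slide; the exact normalisation of the published L-Statistic may differ, and the function name and default bin count are assumptions.

```python
def l_like_statistic(times, values, period, n_bins=10):
    """Between-bin deviation divided by intra-bin dispersion (larger is better)."""
    n = len(values)
    mean = sum(values) / n

    # Fold on the trial period and group the values by phase bin.
    bins = [[] for _ in range(n_bins)]
    for t, v in zip(times, values):
        phase = (t / period) % 1.0
        bins[min(int(phase * n_bins), n_bins - 1)].append(v)
    used = [b for b in bins if b]
    if len(used) < 2 or n <= len(used):
        return 0.0  # not enough bins or points to form the ratio

    bin_means = [sum(b) / len(b) for b in used]
    # Deviation of the bin means from the overall mean (the chi-squared part).
    between = sum(len(b) * (m - mean) ** 2 for b, m in zip(used, bin_means))
    # Scatter of the points about their own bin mean (the intra-bin part).
    within = sum((v - m) ** 2 for b, m in zip(used, bin_means) for v in b)
    if within == 0.0:
        return float("inf")
    return (between / (len(used) - 1)) / (within / (n - len(used)))
```

Because a correct fold both raises the numerator and lowers the denominator, the peak at the true period stands out more sharply than with the bin means alone.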




Periodic Variables

Frequency Analysis:

Lomb-Scargle

• Uses the whole unbinned data time series (DTS).

• Created by Lomb (1976), refined by Scargle (1982). Code adapted from Numerical Recipes in C.
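For illustration, the standard Lomb-Scargle normalised power at a single trial period can be written directly from its textbook definition. This is not the adapted Numerical Recipes routine used in the talk, only a slow but transparent pure-Python sketch; the function name is an assumption.

```python
import math


def lomb_scargle_power(times, values, period):
    """Normalised Lomb-Scargle power of an unevenly sampled series at one period."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    omega = 2.0 * math.pi / period

    # The time offset tau makes the sine and cosine terms orthogonal.
    sin2 = sum(math.sin(2.0 * omega * t) for t in times)
    cos2 = sum(math.cos(2.0 * omega * t) for t in times)
    tau = math.atan2(sin2, cos2) / (2.0 * omega)

    yc = ys = cc = ss = 0.0
    for t, v in zip(times, values):
        c = math.cos(omega * (t - tau))
        s = math.sin(omega * (t - tau))
        yc += (v - mean) * c
        ys += (v - mean) * s
        cc += c * c
        ss += s * s
    # Note: cc or ss can vanish for degenerate sampling; not handled in this sketch.
    return (yc * yc / cc + ys * ys / ss) / (2.0 * variance)
```

Every data point contributes at its own time stamp, so no binning is needed and uneven sampling is handled naturally, at the cost of evaluating trigonometric functions for every point at every trial period.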

[Figure: Lomb-Scargle periodogram, statistic against period (days)]




Periodic Variables

Frequency Analysis:

Schwarzenberg-Czerny

• Uses the whole unbinned data time series (DTS).

• Created by A. Schwarzenberg-Czerny (1996). Code adapted from Schwarzenberg-Czerny's own code.

[Figure: Schwarzenberg-Czerny periodogram, statistic against period (days)]




Periodic Variables

Choice of Trial Periods:

• Linear difference in period, dP.

• Linear difference in phase, dφ.

• Too small a dP and we may search too fine a parameter space and waste CPU time.

• Too large a dP and we will not search finely enough.
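The two gridding options can be compared with a small sketch. The relation used for the second grid is an assumption of this example: over a data span T, changing the period from P to P + dP shifts the last data point by roughly T·dP/P² in phase, so a constant phase step dφ corresponds to a period step dP = dφ·P²/T. Function and parameter names are hypothetical.

```python
def linear_period_grid(p_min, p_max, dp):
    """Trial periods separated by a constant step dp in period."""
    n_steps = int((p_max - p_min) / dp)
    return [p_min + i * dp for i in range(n_steps + 1)]


def linear_phase_grid(p_min, p_max, baseline, dphi):
    """Trial periods separated by a constant phase shift dphi at the end of the data.

    `baseline` is the time span of the observations, in the same units as
    the periods. The period step grows as P**2, so short periods are
    sampled finely and long periods coarsely.
    """
    periods = []
    p = p_min
    while p <= p_max:
        periods.append(p)
        p += dphi * p * p / baseline
    return periods


# Example: periods of 1 to 10 days, 100 days of data, 0.1 cycles of phase per step.
phase_grid = linear_phase_grid(1.0, 10.0, 100.0, 0.1)
# A constant-dP grid that is as fine as the phase grid at its shortest period.
period_grid = linear_period_grid(1.0, 10.0, phase_grid[1] - phase_grid[0])
print(len(phase_grid), "trial periods versus", len(period_grid))
```

A constant dP that is fine enough at the shortest period oversamples the longest periods, which is the wasted CPU time mentioned above; a constant dφ avoids this by letting the step widen with period.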

[Figure: effect of the trial-period step on the folded lightcurve; panel labels "OK" and "7%"]








Periodic Variables

Conclusions:

• Phase-folding methods are swiftest

• Frequency Analysis methods are generally more reliable

• Autonomous pipelines require reliability over speed

• Schwarzenberg-Czerny would be the method of choice

• In which case a better method for choosing trial periods is needed




Autonomous Classification

2 Main Stages:

• Memory Pattern Matching

• Modification of the Artificial Neural Network (ANN)

[Figure: state of the network, panels INITIAL and FINAL]





Memory Pattern Matching:

Why?






• It allows us to begin grouping similar shapes together

• This grouping encourages self-organisation

• To pattern-match is the underlying goal!

• Finding a sensible position on the network for a pattern allows us to change the network

How?






[Figure: the lightcurve pattern compared with the WEIGHTS of the node 0 pattern and the node 1 pattern]

Node 0 has the smallest weight difference vector, so node 0 wins.
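The matching step can be sketched as follows. The talk does not say how the size of the weight difference vector is measured, so the Euclidean length used here is an assumption, as are the function names and the example values.

```python
import math


def weight_difference(pattern, weights):
    """Length of the difference between a lightcurve pattern and a node's weights."""
    return math.sqrt(sum((p - w) ** 2 for p, w in zip(pattern, weights)))


def winning_node(pattern, node_weights):
    """Index of the node whose weights are closest to the pattern."""
    differences = [weight_difference(pattern, w) for w in node_weights]
    return min(range(len(differences)), key=differences.__getitem__)


# Hypothetical four-point phased lightcurve and a two-node network.
lightcurve = [0.1, 0.5, 0.9, 0.5]
nodes = [
    [0.2, 0.5, 0.8, 0.5],   # node 0: similar shape
    [0.9, 0.5, 0.1, 0.5],   # node 1: inverted shape
]
print("winner: node", winning_node(lightcurve, nodes))
```

The winner marks the position on the network where the new pattern belongs, and it is the centre of the region that is modified in the next stage.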





Modification of the ANN:

• Modification affects an area

• Lessens as geometrical distance from the winning node increases

• Area mixing encourages grouping

• The network can self-organise

• Hotspots occur
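One common way to make the modification fall off with distance is a Gaussian neighbour function on the node grid. The talk does not specify the functional form, so the Gaussian, the `radius` parameter and the function name below are assumptions made for illustration.

```python
import math


def neighbour_power(node, winner, radius):
    """Strength of the modification at `node`, given the winning node.

    `node` and `winner` are (row, column) positions on a 2D network.
    The power is 1 at the winner and decays smoothly with grid distance.
    """
    dist2 = (node[0] - winner[0]) ** 2 + (node[1] - winner[1]) ** 2
    return math.exp(-dist2 / (2.0 * radius ** 2))


# Power along one row of the network for a winner at (2, 2).
for column in range(5):
    print(column, round(neighbour_power((2, column), (2, 2), radius=1.5), 3))
```

Because neighbouring nodes are pulled towards the same pattern, similar lightcurve shapes end up close together on the network, which is what lets it self-organise.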





Modification of the ANN:

• Adjust the weights on the network nodes so that they better represent the lightcurve.

• α is the learning parameter. It decreases on each learning iteration of the network.

• P is the power (from the neighbour function) of the current node.

$$dw_i = \alpha P \,(L_i - w_i)$$
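A minimal sketch of the weight adjustment, assuming it has the usual self-organising-map form in which each weight w_i moves towards the corresponding lightcurve point L_i by a fraction set by the learning parameter and the neighbour power P. The geometric decay of the learning parameter and all function names are assumptions, not details from the talk.

```python
def update_weights(weights, lightcurve, alpha, power):
    """Move a node's weights towards the lightcurve: w_i += alpha * P * (L_i - w_i)."""
    return [w + alpha * power * (l - w) for w, l in zip(weights, lightcurve)]


def learning_rate(alpha0, iteration, decay=0.9):
    """ASSUMED schedule: the learning parameter shrinks geometrically per iteration."""
    return alpha0 * decay ** iteration


# The winning node (power 1) moves halfway to the pattern when alpha is 0.5;
# a distant node (power 0) is left untouched.
print(update_weights([0.0, 1.0], [1.0, 1.0], alpha=0.5, power=1.0))
print(update_weights([0.0, 1.0], [1.0, 1.0], alpha=0.5, power=0.0))
```

With alpha·P between 0 and 1 the weights never overshoot the pattern, and as the learning parameter decreases the network settles instead of being rewritten by every new lightcurve.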




The Future

• Enhanced clustering mechanism.

• More precise shape-similarity evaluating methods.

• More dynamically adaptive choice of trial periods for period searching.

• Refinement of these ideas and trying other methods.

• Investigate whether networks with more than two dimensions are worthwhile in the current format.




Questions?