Monitoring Wafer Geometric Quality using Additive Gaussian Process

Linmiao Zhang¹, Kaibo Wang², Nan Chen¹

¹ Department of Industrial and Systems Engineering, National University of Singapore

² Department of Industrial Engineering, Tsinghua University

May 23, 2013



  • Outline

    1 Introduction

    2 Statistical Quantification using AGP Model

    3 Statistical Monitoring of Geometric Quality

    4 Case Studies

    5 Conclusion and Future Directions

  • Introduction AGP Model Statistical Testing Case Studies Conclusion References

    Motivation

    Integrated Circuits


  • Motivation

    Semiconductor Fabrication Process

    Ingot → Slicing → Lapping → Polishing → Cleaning → Wafer → Inspection

    Inspection: Reject → Disposal; Accept → Front End → Back End → Chips

  • Motivation

    Challenges

    Transistor size: 32 nm → 28 nm → 22 nm → 16 nm → 14 nm → ⋯

    Wafer size: 130 mm → 150 mm → 200 mm → 300 mm → 450 mm

  • Motivation

    Wafer Preparation Process

    [Figure: wafer preparation diagram linking wafer quality (good/bad), larger diameter, higher integration, IC companies and wafer fabs]

  • Motivation

    Wafer’s Geometric Quality

    Contact method: touching probes;

    Non-contact method: wavelength scanning interferometer;

    [Figure: measured wafer thickness profile over coordinates x1, x2 ∈ [−60, 60], with measurement locations; color scale from thinner to thicker]

    Engineers’ problem: how to check whether the surface is desirable?


  • Motivation

    Testing Problem


  • Problem formulation

    Framework

    Surface as the Response Variable

    Modeling: without covariate; regression with covariates; design optimization

    Monitoring: change detection; design optimization

    Process control: run-to-run control; fault diagnostics

  • Problem formulation

    Difficulties

    Complete measurement of the wafer is slow

    Geometric profile is too complex to be modeled by parametric functions

    Measurements on different surfaces might not be aligned well

    Deviations (errors) are spatially correlated

  • Problem formulation

    State of the art

    One-sample models: Gaussian process (Jin, Chang, and Shi 2012), PDE-constrained Gaussian process (Zhao, Jin, Wu, and Shi 2011)

    Only applicable to a single surface

    Primitive testing: summary indicators of the whole profile

    Total Thickness Variation (TTV), Bow, Warp, Site TIR (Doering and Nishi 2007)

    Need to fill in the gap

  • Review of GP

    Gaussian Process

    $Y(x) = \mu + Z(x)$, where $Z(x)$ has a positive definite (PD) covariance function $k(x_i, x_j)$

    Suitable for spatially correlated data (Cressie 1993);

    Able to approximate complex functions (Sacks et al. 1989);

    Able to evaluate prediction error (Santner et al. 2003).

    [Figure: GP prediction with samples and the true function on x ∈ [0, 0.5] (left); MSE of prediction (right)]
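A minimal sketch of GP prediction (simple kriging) makes the properties above concrete. It assumes a known constant mean µ and a Gaussian (squared-exponential) covariance with a hypothetical scale `theta`; the slides do not specify the covariance family, and the sampled function is made up for illustration. The returned variance corresponds to the prediction MSE shown in the right-hand panel.

```python
import numpy as np

def sq_exp_kernel(a, b, variance=1.0, theta=50.0):
    """Assumed covariance k(x, x') = variance * exp(-theta * (x - x')^2) for 1-D inputs."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-theta * d**2)

def gp_predict(x_train, y_train, x_new, mu, variance=1.0, theta=50.0, jitter=1e-10):
    """Simple kriging for Y(x) = mu + Z(x) with known mu; returns predicted mean and MSE."""
    K = sq_exp_kernel(x_train, x_train, variance, theta) + jitter * np.eye(len(x_train))
    k_star = sq_exp_kernel(x_new, x_train, variance, theta)   # cov(Y(x_new), Y(x_train))
    mean = mu + k_star @ np.linalg.solve(K, y_train - mu)
    v = np.linalg.solve(K, k_star.T)
    mse = variance - np.sum(k_star * v.T, axis=1)            # k(x, x) - k*' K^-1 k*
    return mean, np.maximum(mse, 0.0)                        # clip tiny negative round-off

x = np.linspace(0.0, 0.5, 8)
y = 1.0 + np.sin(12 * x)          # hypothetical true function, sampled without noise
x_grid = np.linspace(0.0, 0.5, 101)
mean, mse = gp_predict(x, y, x_grid, mu=1.0)
```

At the training sites the predicted mean reproduces the samples and the MSE drops to (numerically) zero, which is the interpolation property of noise-free kriging.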

  • Review of GP

    Gaussian Process with Errors

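    Errors are present in physical processes or stochastic simulations: $Y(x) = \mu + Z(x) + \epsilon(x)$

    $\epsilon(x_i)$ i.i.d. normally distributed: covariance $\Sigma + \sigma^2 I$

    $\epsilon(x_i)$ independent and normally distributed, but $\operatorname{var}(\epsilon(x_i)) = \sigma^2(x_i)$: covariance $\Sigma + \Lambda$ with $\Lambda = \operatorname{diag}(\sigma^2(x_1), \ldots, \sigma^2(x_n))$ (Ankenman et al. 2010)

    $\epsilon(x_i)$ correlated: then what?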

    [Figure: predicted mean and samples on x ∈ [0, 0.5] (legend: Predicted Mean, Samples, Standard)]

    [Figure: excerpt from a journal article on cycle time estimation, showing Fig. 5, "G/G/1 quantile regression curve with empirical quantile estimates" (50th and 85th quantile regression curves of cycle time vs. throughput x), and Fig. 6, "Illustration of the serial production line" (workstations W1–W4, buffers B1–B4, receiving and shipping)]
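The error structures listed above differ only in the covariance added to Σ before prediction. The sketch below keeps an assumed Gaussian covariance for Z and uses hypothetical noise levels; it predicts the latent µ + Z(x) once with Σ + σ²I and once with Σ + Λ.

```python
import numpy as np

def sq_exp(a, b, tau2=1.0, theta=50.0):
    """Assumed Gaussian covariance of Z: tau2 * exp(-theta * (a - b)^2)."""
    d = a[:, None] - b[None, :]
    return tau2 * np.exp(-theta * d**2)

def predict_latent(x, y, x_new, mu, noise_cov, tau2=1.0, theta=50.0):
    """Predict mu + Z(x_new) from Y = mu + Z(x) + eps(x) with cov(eps) = noise_cov."""
    C = sq_exp(x, x, tau2, theta) + noise_cov            # Sigma + noise covariance
    k = sq_exp(x_new, x, tau2, theta)
    mean = mu + k @ np.linalg.solve(C, y - mu)
    var = tau2 - np.einsum("ij,ji->i", k, np.linalg.solve(C, k.T))
    return mean, var

rng = np.random.default_rng(0)
x = np.linspace(0.0, 0.5, 10)
sigma2_x = 0.01 + 0.2 * x                                # hypothetical sigma^2(x_i)
y = 1.0 + np.sin(12 * x) + rng.normal(0.0, np.sqrt(sigma2_x))

m_iid, v_iid = predict_latent(x, y, x, 1.0, 0.05 * np.eye(len(x)))   # Sigma + sigma^2 I
m_het, v_het = predict_latent(x, y, x, 1.0, np.diag(sigma2_x))       # Sigma + Lambda
```

With noise, the predicted mean no longer interpolates the samples and the variance stays positive at observed sites. Passing a full, non-diagonal `noise_cov` is the correlated case that the slide leaves open.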

  • AGP Model

    Data Characteristics

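    [Figure: profile value vs. location; standard profile $f(x)$ and two observed profiles $f(x) + \epsilon_1(x)$ and $f(x) + \epsilon_2(x)$, with measurements $(x_{11}, y_{11})$, $(x_{12}, y_{12})$, $(x_{21}, y_{21})$, $(x_{22}, y_{22})$]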

  • AGP Model

    AGP Model

    $Y_i(x) = f(x) + \epsilon_i(x)$, where $f(x)$ is the standard surface and $\epsilon_i(x)$ is the deviation surface

    Assumptions

    $f(x)$ is a realization of $GP(\mu, s(\cdot))$; $\epsilon_i(x)$ is a realization of $GP(0, v(\cdot))$; $f(x)$ and $\epsilon_i(x)$ are independent

    $\epsilon_i(x)$ and $\epsilon_j(x)$ are independent for $i \neq j$

  • AGP Model

    Distributional view

    A Gaussian process can be used as a prior probability distribution over functions in Bayesian inference (Rasmussen and Williams 2006).

    [Figure: two realizations of a GP on x ∈ [0, 1] (generated value vs. x)]

    Linear model: $Y(x) = f(x) + \epsilon$, $\epsilon \overset{i.i.d.}{\sim} F_\epsilon$

    AGP model: $Y(x) = f(x) + \epsilon(x)$, $\epsilon(x) \overset{i.i.d.}{\sim} GP(0, v(\cdot))$
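The contrast between the two error models can be seen by comparing errors at neighboring sites across many draws. In this sketch the GP correlation v(·) is assumed to be Gaussian, and the variance 0.05 and scale 5 are illustrative values rather than estimates from the slides.

```python
import numpy as np

rng = np.random.default_rng(42)
x = np.linspace(0.0, 1.0, 50)
n_draws, var, theta = 500, 0.05, 5.0                  # hypothetical settings
d = x[:, None] - x[None, :]
cov_gp = var * np.exp(-theta * d**2) + 1e-10 * np.eye(len(x))   # assumed Gaussian v(.)

eps_iid = rng.normal(0.0, np.sqrt(var), size=(n_draws, len(x)))            # linear model
eps_gp = rng.multivariate_normal(np.zeros(len(x)), cov_gp, size=n_draws)   # AGP model

def neighbour_corr(e, j=10):
    """Sample correlation, across draws, between the errors at sites j and j + 1."""
    return np.corrcoef(e[:, j], e[:, j + 1])[0, 1]

print(neighbour_corr(eps_iid), neighbour_corr(eps_gp))
```

The i.i.d. errors are essentially uncorrelated between adjacent sites, while the GP errors at adjacent sites move together, which is the spatial dependence the AGP model is meant to capture.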

  • AGP Model

    Model Estimation

    Estimate the model parameters $\beta \equiv [\mu, \sigma_1^2, \theta_1, \sigma_2^2, \theta_2]$ from observations

    [Figure: profile value vs. location; standard profile $f(x)$ and observed profiles $f(x) + \epsilon_1(x)$, $f(x) + \epsilon_2(x)$ with measurements $(x_{ij}, y_{ij})$]

  • AGP Model

    Structure of Σ0

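    $$\operatorname{cov}(y_{ij}, y_{i'k}) = \begin{cases} s(x_{ij}, x_{i'k}) + v(x_{ij}, x_{i'k}), & i = i' \\ s(x_{ij}, x_{i'k}), & i \neq i' \end{cases} \qquad i, i' = 1, 2, \ldots, N_0$$

    $$\underbrace{\Sigma_0}_{M_0 \times M_0} = s(X_{IC}, X_{IC} \mid \theta_1) + \begin{pmatrix} v(X_1, X_1 \mid \theta_2) & & 0 \\ & \ddots & \\ 0 & & v(X_{N_0}, X_{N_0} \mid \theta_2) \end{pmatrix}$$

    where $X_{IC} = (X_1, X_2, \ldots, X_{N_0})$ stacks the in-control measurement locations and the $i$-th diagonal block has size $n_i \times n_i$.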
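A sketch of how Σ0 can be assembled for N0 profiles with different numbers of sites n_i. Here s and v are written as σ1² and σ2² times Gaussian correlations (an assumed family), matching the σ1²S + σ2²V form used in the MLE; the locations and parameter values are hypothetical.

```python
import numpy as np
from scipy.linalg import block_diag

def gauss_corr(a, b, theta):
    """Assumed Gaussian correlation exp(-theta * (a - b)^2) for 1-D locations."""
    d = a[:, None] - b[None, :]
    return np.exp(-theta * d**2)

def agp_sigma0(profiles_x, sigma1_sq, theta1, sigma2_sq, theta2):
    """Sigma_0 = sigma1^2 S(X_IC, X_IC) + sigma2^2 blockdiag(V(X_1, X_1), ..., V(X_N0, X_N0))."""
    x_ic = np.concatenate(profiles_x)                  # stacked X_IC, length M0 = sum(n_i)
    S = gauss_corr(x_ic, x_ic, theta1)                 # f(x) is shared: couples every pair
    V = block_diag(*[gauss_corr(xi, xi, theta2) for xi in profiles_x])  # eps_i within profile i
    return sigma1_sq * S + sigma2_sq * V

profiles_x = [np.array([0.1, 0.5, 0.9]), np.array([0.2, 0.4, 0.6, 0.8])]  # N0 = 2, n = (3, 4)
Sigma0 = agp_sigma0(profiles_x, 0.2, 3.0, 0.05, 10.0)
```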

  • AGP Model

    MLE

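    Given the data from all surface profiles $(X_{IC}, Y_{IC})$, we can estimate $\beta$ as

    $$\hat{\beta} = \arg\max_{\beta} \left\{ -\frac{1}{2} \log\det(\sigma_1^2 S + \sigma_2^2 V) - \frac{1}{2} (Y_{IC} - \mu 1_{M_0})^T (\sigma_1^2 S + \sigma_2^2 V)^{-1} (Y_{IC} - \mu 1_{M_0}) \right\}.$$

    Maximizing the profile likelihood: given $\theta_1, \theta_2$, the correlation matrices $S, V$ are fixed. For a given ratio $\rho = \sigma_2^2 / \sigma_1^2$, $\mu$ and $\sigma_1^2$ then have closed forms (and $\sigma_2^2 = \rho \sigma_1^2$):

    $$\hat{\mu} = \frac{1_{M_0}^T (S + \rho V)^{-1} Y_{IC}}{1_{M_0}^T (S + \rho V)^{-1} 1_{M_0}}, \qquad \hat{\sigma}_1^2 = \frac{(Y_{IC} - \hat{\mu} 1_{M_0})^T (S + \rho V)^{-1} (Y_{IC} - \hat{\mu} 1_{M_0})}{M_0}$$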
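The profile-likelihood step can be sketched as follows: for fixed θ1, θ2 and ρ, µ̂ and σ̂1² are computed in closed form and substituted back, leaving a function of (θ1, θ2, ρ) only. The Gaussian correlation, the Cholesky-based solves and the small jitter are implementation assumptions, not details from the slides.

```python
import numpy as np
from scipy.linalg import block_diag, cho_factor, cho_solve

def gauss_corr(a, b, theta):
    """Assumed Gaussian correlation exp(-theta * (a - b)^2)."""
    d = a[:, None] - b[None, :]
    return np.exp(-theta * d**2)

def profile_loglik(theta1, theta2, rho, profiles_x, y_ic, jitter=1e-8):
    """Log-likelihood (up to a constant) with mu and sigma1^2 replaced by their closed forms."""
    x_ic = np.concatenate(profiles_x)
    m0 = len(x_ic)
    S = gauss_corr(x_ic, x_ic, theta1)
    V = block_diag(*[gauss_corr(x, x, theta2) for x in profiles_x])
    R = S + rho * V + jitter * np.eye(m0)                       # S + rho V
    c = cho_factor(R)
    ones = np.ones(m0)
    mu = ones @ cho_solve(c, y_ic) / (ones @ cho_solve(c, ones))  # GLS estimate of mu
    r = y_ic - mu
    sigma1_sq = r @ cho_solve(c, r) / m0
    logdet_R = 2.0 * np.sum(np.log(np.diag(c[0])))
    # Substituting sigma1^2 back: -1/2 [M0 log sigma1^2 + log det R + M0]
    ll = -0.5 * (m0 * np.log(sigma1_sq) + logdet_R + m0)
    return ll, mu, sigma1_sq
```

An outer optimizer, for example `scipy.optimize.minimize` over (log θ1, log θ2, log ρ), could then maximize `ll`; σ̂2² = ρσ̂1² follows from the optimum.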

  • AGP Model

    Prediction

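    For a new unmeasured site $(X_l, Y_l)$:

    $$\begin{pmatrix} Y_l \\ Y_{IC} \end{pmatrix} \sim N\left[ \begin{pmatrix} \mu 1_{n_l} \\ \mu 1_{M_0} \end{pmatrix}, \begin{pmatrix} \Sigma_l & \Sigma_{l,0} \\ \Sigma_{l,0}^T & \Sigma_0 \end{pmatrix} \right]$$

    $Y_l \mid Y_{IC} \sim N(\tilde{\mu}_l, \tilde{\Sigma}_l)$, where

    $$\tilde{\mu}_l = \mu 1_{n_l} + \Sigma_{l,0} \Sigma_0^{-1} (Y_{IC} - \mu 1_{M_0}), \qquad \tilde{\Sigma}_l = \Sigma_l - \Sigma_{l,0} \Sigma_0^{-1} \Sigma_{l,0}^T$$

    $\Sigma_{l,0}$ may take a different form depending on whether $Y_l$ is taken from an existing profile or a new one: for a new profile only $s(\cdot)$ links $Y_l$ to $Y_{IC}$, while for sites on existing profile $i$ the block for that profile also includes $v(\cdot)$.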
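The conditional distribution above can be sketched for both forms of Σ_{l,0}. The function name `agp_conditional`, the Gaussian correlations and the plug-in parameter values are assumptions for illustration; in practice the maximum likelihood estimates would be plugged in.

```python
import numpy as np
from scipy.linalg import block_diag

def gauss_corr(a, b, theta):
    """Assumed Gaussian correlation exp(-theta * (a - b)^2)."""
    d = a[:, None] - b[None, :]
    return np.exp(-theta * d**2)

def agp_conditional(x_new, profiles_x, y_ic, mu, s1, th1, s2, th2, profile_index=None):
    """Return (mu_tilde, Sigma_tilde) for Y_l at x_new given Y_IC.

    profile_index=None: x_new lies on a new profile; otherwise on existing profile i.
    """
    x_ic = np.concatenate(profiles_x)
    Sigma0 = s1 * gauss_corr(x_ic, x_ic, th1) + s2 * block_diag(
        *[gauss_corr(x, x, th2) for x in profiles_x])
    Sigma_l = s1 * gauss_corr(x_new, x_new, th1) + s2 * gauss_corr(x_new, x_new, th2)
    Sigma_l0 = s1 * gauss_corr(x_new, x_ic, th1)        # new profile: only f(x) is shared
    if profile_index is not None:                        # same profile: eps_i is shared too
        start = sum(len(x) for x in profiles_x[:profile_index])
        xi = profiles_x[profile_index]
        Sigma_l0[:, start:start + len(xi)] += s2 * gauss_corr(x_new, xi, th2)
    A = np.linalg.solve(Sigma0, Sigma_l0.T)             # Sigma_0^{-1} Sigma_{l,0}^T
    mu_tilde = mu + A.T @ (y_ic - mu)
    Sigma_tilde = Sigma_l - Sigma_l0 @ A
    return mu_tilde, Sigma_tilde
```

For sites on an existing profile the prediction reproduces that profile's data with zero variance (the model has no nugget), while for a new profile the predictive variance never drops below σ2², because the new deviation ε_l is unobserved.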

  • AGP Model

    Prediction Demonstration

    [Figure: predicted mean with samples (left) and predicted variance (right) on x ∈ [0, 0.5]]

  • T² Test

    Statistical Testing

    [Figure: profile value vs. location, with sampled points on a new profile]

    Does the new profile deviate from $f(x)$ within an acceptable region?

    Statistical testing based on the samples (where to sample?)

  • T² Test

    T² Test

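    If the new surface conforms to the model, $Y_l \sim N(\tilde{\mu}_l, \tilde{\Sigma}_l)$, which reduces surface comparison to a comparison of multivariate normal data

    $H_0: Y_l \sim N(\tilde{\mu}_l, \tilde{\Sigma}_l)$ vs. $H_1: Y_l \not\sim N(\tilde{\mu}_l, \tilde{\Sigma}_l)$

    Test statistic:

    $$T_l^2 = (Y_l - \tilde{\mu}_l)^T \tilde{\Sigma}_l^{-1} (Y_l - \tilde{\mu}_l)$$

    Under $H_0$, $T_l^2 \sim \chi^2_{n_l}$. Reject $H_0$ when $T_l^2 > H_T$.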
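Given µ̃_l and Σ̃_l, the T² test takes only a few lines. This sketch sets H_T from the χ²_{n_l} reference above for a chosen α and treats µ̃_l and Σ̃_l as known; the toy input is hypothetical.

```python
import numpy as np
from scipy.stats import chi2

def t2_test(y_l, mu_tilde, Sigma_tilde, alpha=0.05):
    """T^2 test of H0: Y_l ~ N(mu_tilde, Sigma_tilde) using the chi^2_{n_l} reference."""
    r = y_l - mu_tilde
    t2 = float(r @ np.linalg.solve(Sigma_tilde, r))
    n_l = len(y_l)
    h_t = chi2.ppf(1.0 - alpha, df=n_l)          # control limit H_T
    p_value = chi2.sf(t2, df=n_l)
    return t2, p_value, t2 > h_t

t2, p, reject = t2_test(np.array([3.0, 0.0, 0.0, 0.0]), np.zeros(4), np.eye(4))
```

Here T² = 9 with four degrees of freedom, which is just below the 5% limit, so H0 is not rejected.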

  • Generalized likelihood ratio test

    GLR Test

    Focuses only on a certain class of alternative models

    An additional deviation source forms the alternative model:

    $Y_l(x) = f(x) + \epsilon_l(x) + \xi(x)$, where $\xi(x)$ is a realization of another $GP(\delta, w(\cdot))$; suitable for modeling global change effects

    Testing hypothesis

    $H_0: Y_l(x) = f(x) + \epsilon_l(x)$

    $H_1: Y_l(x) = f(x) + \epsilon_l(x) + \xi(x)$

  • Generalized likelihood ratio test

    GLR Test

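    With a finite number of observations

    Testing hypothesis:

    $H_0: Y_l \sim N(\tilde{\mu}_l, \tilde{\Sigma}_l)$

    $H_1: Y_l \sim N(\tilde{\mu}_l + \delta 1_{n_l}, \tilde{\Sigma}_l + \Sigma_w)$ for some nonzero $\delta, \gamma^2, \theta_l$, where $\Sigma_w$ is the covariance of $\xi$ at $X_l$

    GLR statistic:

    $$R_l = 2 \ln \frac{\sup_{\delta, \gamma^2, \theta_l} \det(\tilde{\Sigma}_l + \Sigma_w)^{-1/2} \exp\left[ -(Y_l - \tilde{\mu}_l - \delta 1_{n_l})^T (\tilde{\Sigma}_l + \Sigma_w)^{-1} (Y_l - \tilde{\mu}_l - \delta 1_{n_l}) / 2 \right]}{\det(\tilde{\Sigma}_l)^{-1/2} \exp\left[ -(Y_l - \tilde{\mu}_l)^T \tilde{\Sigma}_l^{-1} (Y_l - \tilde{\mu}_l) / 2 \right]}$$

    Asymptotically under $H_0$, $R_l$ follows an equal mixture of $\chi^2_1$ and $\chi^2_2$. Reject $H_0$ when $R_l > H_R$.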
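A numerical sketch of the GLR statistic. The slides do not give the form of w(·); here Σ_w is assumed to be γ² times a Gaussian correlation with scale θ_l, the supremum is searched with Nelder-Mead from a heuristic starting point, and the p-value uses the equal χ²_1/χ²_2 mixture stated above. A local optimizer may miss the global supremum, so the R_l computed here can understate the exact statistic.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2, multivariate_normal

def glr_test(y_l, x_l, mu_tilde, Sigma_tilde, alpha=0.05):
    """GLR test against an extra GP(delta, w) deviation; w assumed Gaussian with (gamma^2, theta)."""
    ll0 = multivariate_normal(mu_tilde, Sigma_tilde).logpdf(y_l)
    d2 = (x_l[:, None] - x_l[None, :]) ** 2

    def neg_ll1(p):
        delta, log_gamma2, log_theta = p
        Sigma_w = np.exp(log_gamma2) * np.exp(-np.exp(log_theta) * d2)
        return -multivariate_normal(mu_tilde + delta, Sigma_tilde + Sigma_w).logpdf(y_l)

    start = np.array([np.mean(y_l - mu_tilde), np.log(1e-3), 0.0])
    res = minimize(neg_ll1, start, method="Nelder-Mead")
    # Clip at 0: H0 is the boundary case delta = 0, gamma^2 -> 0
    r_l = max(0.0, 2.0 * (-res.fun - ll0))
    # Equal mixture of chi^2_1 and chi^2_2, as stated on the slide
    p_value = 0.5 * chi2.sf(r_l, df=1) + 0.5 * chi2.sf(r_l, df=2)
    return r_l, p_value, p_value < alpha
```

Rejecting when the p-value falls below α is equivalent to comparing R_l with the threshold H_R taken from the same mixture.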

  • Generalized likelihood ratio test

    Summary

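    [Diagram: $N_0$ in-control (IC) units with $n_i$ measurements on unit $i$ → AGP model → $(\tilde{\mu}_l, \tilde{\Sigma}_l)$ at locations $X_l$; new unit measured at $X_l$ gives $Y_l$ → T² test / GLR test → Accept (continue) or Reject (disposal)]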

  • Approximation and Estimation Performance

    Approximation Performance

    Standard profile (Shpak 1995):

    $f(x) = \sin(x) + \sin(10x/3) + \log(x) - 0.84x + 3$

    Spatially correlated error: $\epsilon(x) \sim GP(0, 0.05 \times v(\cdot \mid 5))$

    [Figure: on x ∈ [3, 7], predicted mean (left, with $f(x)$ and measurements) and predicted variance (right) for the AGP and OGP models]

    OGP model: $Y_i(x) = \mu + \epsilon_i(x)$
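The simulation setup on this slide can be reproduced in outline. The standard profile is the Shpak function above; the Gaussian form of v(·|5), the uniformly random sampling sites and the function name `simulate_ic_profiles` are assumptions, since the slides do not state how the sites were chosen.

```python
import numpy as np

def shpak(x):
    """Standard profile used in the simulation (Shpak 1995 test function)."""
    return np.sin(x) + np.sin(10.0 * x / 3.0) + np.log(x) - 0.84 * x + 3.0

def simulate_ic_profiles(n0_profiles, n_per_profile, var=0.05, theta=5.0, lo=3.0, hi=7.0, seed=0):
    """Draw Y_i(x_ij) = f(x_ij) + eps_i(x_ij), with independent eps_i ~ GP(0, var * v(.|theta)).

    v(.|theta) is assumed here to be the Gaussian correlation exp(-theta * d^2).
    """
    rng = np.random.default_rng(seed)
    profiles_x, profiles_y = [], []
    for _ in range(n0_profiles):
        x = np.sort(rng.uniform(lo, hi, n_per_profile))   # hypothetical random sites
        d = x[:, None] - x[None, :]
        cov = var * np.exp(-theta * d**2) + 1e-10 * np.eye(n_per_profile)
        eps = rng.multivariate_normal(np.zeros(n_per_profile), cov)
        profiles_x.append(x)
        profiles_y.append(shpak(x) + eps)
    return profiles_x, profiles_y

xs, ys = simulate_ic_profiles(10, 10)    # (N0, n0) = (10, 10), one of the MLE study settings
```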

  • Approximation and Estimation Performance

    Bias and RMSE of MLE

    Accuracy of the MLE with different sample sizes (true parameter values in the header):

    (N0, n0)             µ = 1  σ² = 0.2    θ1 = 3 τ² = 0.05   θ2 = 10
    (10,10)  Bias      -0.0043   -0.0189    0.4375   -0.0002    0.7089
             RMSE       0.1824    0.1001    1.6348    0.0080    4.3834
    (10,20)  Bias      -0.0013   -0.0189    0.1756    0.0001    0.0011
             RMSE       0.1831    0.0975    0.9608    0.0066    0.9204
    (20,10)  Bias       0.0106   -0.0103    0.2528    0.0000    0.4140
             RMSE       0.1903    0.1038    1.1990    0.0056    3.1826
    (20,20)  Bias       0.0015   -0.0169    0.1317    0.0002    0.0001
             RMSE       0.1850    0.0920    0.7562    0.0045    0.5976

  • Monitoring Performance

    Three Change Scenarios

    $Y(x) = f(x) + \epsilon(x)$

    [Figure: standard vs. shifted profiles on x ∈ [2.5, 7.5] under three change scenarios: mean ($\mu$), variance ($\sigma_2^2$) and correlation ($\theta_2$)]

  • Monitoring Performance

    Performance of Different Tests

    Three tests are compared: the Max-Min test, the GLR test and the T² test

    [Figure: β error vs. shift magnitude for mean, variance and correlation shifts, comparing the Max-Min, GLR and T² tests]

  • Monitoring Performance

    Effect of Testing Sample Size ($n_l$)

  • Monitoring Performance

    Effect of In-Control Sample Size ($N_0$, $n_0$)

    [Figure: β error vs. shift magnitude for mean, variance and correlation shifts, for the GLR and T² tests, with $(N_0, n_0)$ = (10,10), (20,10), (10,20), (20,20)]

  • Real Application

    Monitoring Wafer Thickness Profile

    Data are collected from a real production plant;

    8 in-control wafers to construct the AGP model, 30 wafers to be tested;

    120 measurements from each in-control wafer to construct the AGP model;

    480 measurements from each testing wafer to conduct the tests.

  • Real Application

    Demos of Thickness Profile

    [Figure: in-control wafer #2, in-control wafer #7, and the approximated standard profile]

  • Real Application

    p-Values of the Tests

    [Figure: p-values of the T² and GLR tests for the 30 wafer surfaces, with the significance level]

  • Real Application

    Rejected Wafers (p-values)

    #12 (T²: 0.9250; GLR: 1.3051×10⁻⁴)

    #23 (T²: 0.0018; GLR: 2.2178×10⁻¹¹)

    #24 (T²: 3.7191×10⁻⁴; GLR: 3.4084×10⁻¹⁴)

    #26 (T²: 2.5678×10⁻⁴; GLR: 9.5180×10⁻⁹)

    #28 (T²: 1.1102×10⁻¹⁶; GLR: 0)

    #30 (T²: 7.2819×10⁻⁴; GLR: 2.5700×10⁻¹¹)

  • Open issues

    Optimal Design for AGP

    Nonparametric model: the Fisher information matrix is not enough

    [Figure: measurement layout over the wafer, coordinates from −60 to 60]

    Ordinary space-filling designs for deterministic experiments do not consider geometric features or the error process

  • Open issues

    Optimality Criteria

    Prediction accuracy: minimize (integrated) RMSE

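    Determine $N_0$, $n_0$ and $x_{ij}$: approximation accuracy of $f(x)$ and estimation of the error process ($\sigma_2^2$, $\theta_2$); sequential allocation strategy (Ankenman et al. 2010)

    Detection power: minimize the β (Type II) error

    T² test: when only $\mu$ changes by $\delta$, the Mahalanobis distance $\delta^T \tilde{\Sigma}_l^{-1} \delta$ determines the power, where

    $$\tilde{\Sigma}_l = \Sigma_l - \Sigma_{l,0} \Sigma_0^{-1} \Sigma_{l,0}^T$$

    Constant mean shift: $\max_{X_l} 1^T \tilde{\Sigma}_l^{-1} 1$

    D-optimal: $\max_{X_l} \det \tilde{\Sigma}_l^{-1} \Longrightarrow \min_{X_l} \det \tilde{\Sigma}_l$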
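The constant-mean-shift criterion can be evaluated directly from Σ̃. Under a shift δ = c·1, T²_l follows a noncentral χ²_{n_l} with noncentrality c²·1ᵀΣ̃_l⁻¹1, which gives the power computed below. The greedy site selection is a hypothetical heuristic, not a method from the slides; it relies on the fact that Σ̃_l for any subset of candidate sites is the corresponding submatrix of the conditional covariance over all candidates.

```python
import numpy as np
from scipy.stats import chi2, ncx2

def t2_power_constant_shift(Sigma_tilde, shift, alpha=0.05):
    """Power of the T^2 test when the mean shifts by delta = shift * 1."""
    n_l = Sigma_tilde.shape[0]
    ones = np.ones(n_l)
    nc = shift**2 * ones @ np.linalg.solve(Sigma_tilde, ones)   # delta' Sigma^-1 delta
    h_t = chi2.ppf(1.0 - alpha, df=n_l)
    return ncx2.sf(h_t, df=n_l, nc=nc)

def greedy_constant_shift_design(Sigma_cand, k):
    """Hypothetical greedy heuristic: add the candidate site that maximizes 1' Sigma_tilde^-1 1."""
    chosen = []
    for _ in range(k):
        best, best_val = None, -np.inf
        for j in range(Sigma_cand.shape[0]):
            if j in chosen:
                continue
            idx = chosen + [j]
            sub = Sigma_cand[np.ix_(idx, idx)]       # Sigma_tilde for the trial X_l
            ones = np.ones(len(idx))
            val = ones @ np.linalg.solve(sub, ones)
            if val > best_val:
                best, best_val = j, val
        chosen.append(best)
    return chosen
```

Greedy selection is cheap but not guaranteed to find the best subset; the D-optimal criterion could be plugged into the same loop by scoring candidates with det Σ̃_l instead.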

  • Open issues

    GP with “Covariates”

    Surface profile depends on other factors: speed, force, materials, etc.

    Related model classes: GP model; GP with independent errors (Ankenman et al. 2010); GP with dependent errors; multivariate output/response (co-kriging; Zhou et al. 2011; Qian et al. 2008; different distance metrics); surface response

  • Open issues

    Conclusion

    The AGP model is suitable for approximating surface profiles and quantifying dependent deviations;

    A simple and flexible framework for process monitoring

    Design issues need further consideration, and the model should be extended to the case with covariates

  • Reference I

    Ankenman, B., Nelson, B., and Staum, J. (2010), “Stochastic kriging for simulation metamodeling,” Operations Research, 58, 371–382.

    Cressie, N. (1993), Statistics for Spatial Data, revised edition, vol. 928, Wiley, New York.

    Doering, R. and Nishi, Y. (2007), Handbook of Semiconductor Manufacturing Technology, CRC Press, Boca Raton, FL.

    Jin, R., Chang, C., and Shi, J. (2012), “Sequential measurement strategy for wafer geometric profile estimation,” IIE Transactions, 44, 1–12.

    Qian, P. Z. G., Wu, H., and Wu, C. F. J. (2008), “Gaussian Process Models for Computer Experiments with Qualitative and Quantitative Factors,” Technometrics, 50, 383–396.

    Rasmussen, C. E. and Williams, C. K. I. (2006), Gaussian Processes for Machine Learning, MIT Press, Boston.

    Sacks, J., Welch, W., Mitchell, T., and Wynn, H. (1989), “Design and analysis of computer experiments,” Statistical Science, 4, 409–423.

    Santner, T., Williams, B., and Notz, W. (2003), The Design and Analysis of Computer Experiments, Springer, New York.

    Shpak, A. (1995), “Global optimization in one-dimensional case using analytically defined derivatives of objective function,” Computer Science Journal of Moldova, 3, 168–184.

    Zhao, H., Jin, R., Wu, S., and Shi, J. (2011), “PDE-constrained Gaussian process model on material removal rate of wire saw slicing process,” Journal of Manufacturing Science and Engineering, 133, 21012.1–21012.9.

    Zhou, Q., Qian, P. Z. G., and Zhou, S. (2011), “A Simple Approach to Emulation for Computer Models with Qualitative and Quantitative Factors,” Technometrics, 53, 266–273.

  • Thanks and questions
