Monitoring Wafer Geometric Quality using Additive Gaussian Process

Linmiao Zhang 1 Kaibo Wang 2 Nan Chen 1

1 Department of Industrial and Systems Engineering, National University of Singapore

2 Department of Industrial Engineering, Tsinghua University

May 23, 2013

Outline

1 Introduction

2 Statistical Quantification using AGP Model

3 Statistical Monitoring of Geometric Quality

4 Case Studies

5 Conclusion and Future Directions

Motivation

Integrated Circuits



Motivation

Semiconductor Fabrication Process

Ingot → Slicing → Lapping → Polishing → Cleaning → Wafer → Inspection

Inspection outcomes: Reject → Disposal; Accept → Front End → Back End → Chips


Motivation

Challenges

Transistor size: 32nm → 28nm → 22nm → 16nm → 14nm → · · ·

Wafer size: 130mm → 150mm → 200mm → 300mm → 450mm


Motivation

Wafer Preparation Process

[Diagram labels: wafer quality (good, bad); require; cause; larger diameter; higher integration; IC companies; wafer fabs]


Motivation

Wafer’s Geometric Quality

Contact method: touching probes;

Non-contact method: wavelength scanning interferometer;

[Figure: measurement sites on the wafer surface, axes x1 and x2 each spanning −60 to 60; color scale from thinner to thicker.]

Engineers’ problem: how to check whether the surface is desirable?


Motivation

Testing Problem

[Figure: illustration of the testing problem.]


Problem formulation

Framework

Surface as the Response Variable

Modeling
• Without covariate
• Regression with covariates
• Design optimization

Monitoring
• Change detection
• Design optimization

Process Control
• Run-to-Run control
• Fault diagnostics


Problem formulation

Difficulties

Complete measurement of the wafer is slow

Geometric profile is too complex to be modeled by parametric functions

Measurements on different surfaces might not be aligned well

Deviations (errors) are spatially correlated


Problem formulation

State of the art

One sample model: Gaussian process (Jin, Chang, and Shi 2012), PDE-constrained Gaussian process (Zhao, Jin, Wu, and Shi 2011)

Only applicable for a single surface

Primitive testing: summary indicators of the whole profile

Total Thickness Variation (TTV), Bow, Warp, Site TIR (Doering and Nishi 2007);

Need to fill in the gap
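As an illustration of how coarse such summary indicators are, TTV is commonly taken as the difference between the largest and smallest thickness measured on a wafer. The sketch below assumes that definition; the function name and the readings are hypothetical and not from the talk.

```python
def total_thickness_variation(thickness):
    """Return max minus min of the measured thickness values."""
    values = list(thickness)
    if not values:
        raise ValueError("need at least one thickness measurement")
    return max(values) - min(values)


# Hypothetical thickness readings (micrometres) at five sites on one wafer.
readings = [725.2, 724.8, 725.9, 725.1, 724.6]
ttv = total_thickness_variation(readings)
print(round(ttv, 1))
```

Two wafers with very different surface shapes can share the same TTV, which is the gap the profile-based tests later in the talk aim to close.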



Review of GP

Gaussian Process

$Y(x) = \mu + Z(x)$ with positive-definite (PD) covariance function $k(x_i, x_j)$

Suitable for spatially correlated data (Cressie 1993);

Able to approximate complex function (Sacks et al. 1989);

Able to evaluate prediction error (Santner et al. 2003).

[Figure: left panel, GP prediction with samples and the true function; right panel, MSE of prediction.]


Review of GP

Gaussian Process with Errors

Errors are present in physical processes or stochastic simulations: $Y(x) = \mu + Z(x) + \varepsilon(x)$

$\varepsilon(x_i)$ are i.i.d. normally distributed: covariance $\Sigma + \sigma^2 I$

$\varepsilon(x_i)$ are independently and normally distributed, but $\mathrm{var}(\varepsilon(x_i)) = \sigma^2(x_i)$: covariance $\Sigma + \Lambda$ (Ankenman et al. 2010)

$\varepsilon(x_i)$ are correlated, then?

[Figure: predicted mean, samples, and standard profile.]

[Figure: page excerpt from a paper on cycle time estimation, showing the 50th and 85th quantile regression curves of cycle time against throughput x (Fig. 5, G/G/1 quantile regression curve with empirical quantile estimates) and the illustration of the serial production line (Fig. 6).]


AGP Model

Data Characteristics

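[Figure: profile value against location, showing the standard profile $f(x)$, two deviated profiles $f(x) + \varepsilon_1(x)$ and $f(x) + \varepsilon_2(x)$, and sampled points such as $(x_{11}, y_{11})$ and $(x_{12}, y_{12})$ on them.]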


AGP Model

AGP Model

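$Y_i(x) = f(x) + \varepsilon_i(x)$, where $f(x)$ is the standard surface and $\varepsilon_i(x)$ is the deviation surface

Assumption

$f(x)$ is a realization of $GP(\mu, s(\cdot))$

$\varepsilon_i(x)$ is a realization of $GP(0, v(\cdot))$

$f(x)$ and $\varepsilon_i(x)$ are independent

$\varepsilon_i(x)$ and $\varepsilon_j(x)$ are independent for $i \neq j$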
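The assumptions above can be sketched as a small simulation: one shared draw for the standard surface, plus an independent draw of the deviation surface for each profile. This is only an illustration. The Gaussian correlation form, the diagonal jitter, the sites, and every function name are assumptions, since the talk does not fix the form of $s(\cdot)$ or $v(\cdot)$; the parameter values echo those used in the later MLE accuracy table.

```python
import math
import random


def gauss_corr(x1, x2, theta):
    """Assumed Gaussian correlation exp(-theta * (x1 - x2)^2)."""
    return math.exp(-theta * (x1 - x2) ** 2)


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric PD matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def sample_gp(xs, mean, var, theta, rng, jitter=1e-6):
    """Draw one GP realization at sites xs (jitter keeps the factorization stable)."""
    n = len(xs)
    cov = [[var * gauss_corr(xs[i], xs[j], theta) + (jitter if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    low = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [mean + sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(n)]


def simulate_agp(xs, n_profiles, mu, var_f, theta_f, var_e, theta_e, seed=0):
    """Return the shared standard surface f and n_profiles observed profiles."""
    rng = random.Random(seed)
    f = sample_gp(xs, mu, var_f, theta_f, rng)         # one draw of f(x)
    profiles = []
    for _ in range(n_profiles):
        eps = sample_gp(xs, 0.0, var_e, theta_e, rng)  # independent eps_i(x)
        profiles.append([fv + ev for fv, ev in zip(f, eps)])
    return f, profiles


xs = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
f, profiles = simulate_agp(xs, n_profiles=3, mu=1.0, var_f=0.2, theta_f=3.0,
                           var_e=0.05, theta_e=10.0)
```

Every profile shares the same `f`, so differences between profiles come only from the correlated deviation draws.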



AGP Model

Distributional view

A Gaussian process can be used as a prior probability distribution over functions in Bayesian inference (Rasmussen and Williams 2006).

[Figure: two generated GP realizations, generated value against x on [0, 1].]

Linear model: $Y(x) = f(x) + \varepsilon$, with $\varepsilon$ i.i.d. $\sim F_\varepsilon$

AGP model: $Y(x) = f(x) + \varepsilon(x)$, with $\varepsilon(x)$ i.i.d. $\sim GP(0, v(\cdot))$


AGP Model

Model Estimation

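Estimate the model parameters $\beta \equiv [\mu, \sigma_1^2, \theta_1, \sigma_2^2, \theta_2]$ from observations

[Figure: profile value against location, showing $f(x)$, the deviated profiles $f(x) + \varepsilon_1(x)$ and $f(x) + \varepsilon_2(x)$, and the observed sample points marked on them.]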


AGP Model

Structure of Σ0

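$$\mathrm{cov}(y_{ij}, y_{i'k}) = \begin{cases} s(x_{ij}, x_{i'k}) + v(x_{ij}, x_{i'k}), & \forall\, i = i' \\ s(x_{ij}, x_{i'k}), & \forall\, i \neq i' \end{cases} \qquad i, i' = 1, 2, \cdots, N_0$$

[Diagram: $\Sigma_0$ is the sum of two $M_0 \times M_0$ matrices indexed by $X_{IC} = (X_1, X_2, \ldots, X_{N_0})$: a full matrix with entries $s(x_{ij}, x_{i'k} \mid \theta_1)$, plus a block-diagonal matrix with blocks of size $n_1 \times n_1$, $n_2 \times n_2$, ..., $n_{N_0} \times n_{N_0}$ holding $v(x_{ij}, x_{i'k} \mid \theta_2)$ and zeros elsewhere.]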
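The block structure can be made concrete with a short sketch that assembles $\Sigma_0$ entry by entry: every pair of sites contributes the $s$ term, and only pairs on the same profile add the $v$ term. The Gaussian correlation form and all names are assumptions for illustration, not the authors' implementation.

```python
import math


def gauss_corr(x1, x2, theta):
    """Assumed Gaussian correlation exp(-theta * (x1 - x2)^2)."""
    return math.exp(-theta * (x1 - x2) ** 2)


def build_sigma0(sites, var_f, theta_f, var_e, theta_e):
    """Assemble the M0 x M0 covariance of all in-control observations.

    sites[i] lists the measurement locations on profile i.
    """
    flat = [(i, x) for i, xs in enumerate(sites) for x in xs]
    m0 = len(flat)
    sigma0 = [[0.0] * m0 for _ in range(m0)]
    for a, (i, xa) in enumerate(flat):
        for b, (k, xb) in enumerate(flat):
            c = var_f * gauss_corr(xa, xb, theta_f)       # s term: all pairs
            if i == k:
                c += var_e * gauss_corr(xa, xb, theta_e)  # v term: same profile
            sigma0[a][b] = c
    return sigma0


# Two hypothetical profiles with n1 = 2 and n2 = 3 sites, so M0 = 5.
sites = [[0.0, 0.5], [0.0, 1.0, 2.0]]
sigma0 = build_sigma0(sites, var_f=0.2, theta_f=3.0, var_e=0.05, theta_e=10.0)
```

Because profiles need not share sites, nothing here requires the measurements on different wafers to be aligned.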



AGP Model

MLE

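Given the data from all surface profiles $X_{IC}, Y_{IC}$, we can estimate $\beta$ as

$$\hat{\beta} = \arg\max_{\beta} \left\{ -\frac{1}{2} \log[\det(\sigma_1^2 S + \sigma_2^2 V)] - \frac{1}{2} (Y_{IC} - \mu 1_{M_0})^T (\sigma_1^2 S + \sigma_2^2 V)^{-1} (Y_{IC} - \mu 1_{M_0}) \right\}.$$

Maximizing profile likelihood: given $\theta_1, \theta_2$, the correlation matrices $S, V$ are fixed. Then $\mu, \sigma_1^2, \sigma_2^2$ can be obtained easily.

$$\mu = \frac{1_{M_0}^T (S + \rho V)^{-1} Y_{IC}}{1_{M_0}^T (S + \rho V)^{-1} 1_{M_0}}, \qquad \rho = \sigma_2^2 / \sigma_1^2$$

$$\sigma_1^2 = \frac{(Y_{IC} - \mu 1_{M_0})^T (S + \rho V)^{-1} (Y_{IC} - \mu 1_{M_0})}{M_0}$$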
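The two closed-form expressions can be sketched as follows for fixed $S$, $V$, and $\rho$. The outer numerical search over $\theta_1$, $\theta_2$, and $\rho$ is not shown, and the helper names and the toy inputs are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def profile_mu_sigma(S, V, rho, y):
    """Closed-form mu and sigma_1^2 for fixed correlation matrices and rho."""
    n = len(y)
    c = [[S[i][j] + rho * V[i][j] for j in range(n)] for i in range(n)]
    c_inv_y = solve(c, y)
    c_inv_1 = solve(c, [1.0] * n)
    mu = sum(c_inv_y) / sum(c_inv_1)      # 1'C^{-1}y / 1'C^{-1}1
    resid = [yi - mu for yi in y]
    c_inv_r = solve(c, resid)
    sigma1_sq = sum(r * w for r, w in zip(resid, c_inv_r)) / n
    return mu, sigma1_sq


# Toy check with identity correlation matrices (hypothetical data).
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
mu_hat, sigma1_sq_hat = profile_mu_sigma(identity, identity, rho=1.0,
                                         y=[1.0, 2.0, 3.0])
```

With these estimates plugged back in, the likelihood depends only on $\theta_1$, $\theta_2$, and $\rho$, which shrinks the search space for the optimizer.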



AGP Model

Prediction

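For new unmeasured site $(X_l, Y_l)$:

$$\begin{pmatrix} Y_l \\ Y_{IC} \end{pmatrix} \sim N\left[ \begin{pmatrix} \mu 1_{n_l} \\ \mu 1_{M_0} \end{pmatrix}, \begin{pmatrix} \Sigma_l & \Sigma_{l,0} \\ \Sigma_{l,0}^T & \Sigma_0 \end{pmatrix} \right]$$

$Y_l \mid Y_{IC} \sim N(\tilde{\mu}_l, \tilde{\Sigma}_l)$, where

$$\tilde{\mu}_l = \mu 1_{n_l} + \Sigma_{l,0} \Sigma_0^{-1} (Y_{IC} - \mu 1_{M_0})$$

$$\tilde{\Sigma}_l = \Sigma_l - \Sigma_{l,0} \Sigma_0^{-1} \Sigma_{l,0}^T$$

$\Sigma_{l,0}$ may have a different form depending on whether $Y_l$ are taken from existing profiles or new ones.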
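The conditional mean and covariance are the standard multivariate normal conditioning formulas, and a minimal sketch is given below. It takes the covariance blocks as inputs and leaves their construction to the caller; function names and the toy numbers are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def conditional_normal(mu, sigma_l, sigma_l0, sigma0, y_ic):
    """Return (mu_tilde, Sigma_tilde) of Y_l given the in-control data Y_IC."""
    m0 = len(y_ic)
    n_l = len(sigma_l0)
    w = solve(sigma0, [y - mu for y in y_ic])          # Sigma0^{-1}(Y_IC - mu 1)
    mean = [mu + sum(sigma_l0[a][j] * w[j] for j in range(m0)) for a in range(n_l)]
    # cols[b] holds Sigma0^{-1} times row b of Sigma_l0, i.e. a column of
    # Sigma0^{-1} Sigma_l0^T.
    cols = [solve(sigma0, sigma_l0[b]) for b in range(n_l)]
    cov = [[sigma_l[a][b] - sum(sigma_l0[a][j] * cols[b][j] for j in range(m0))
            for b in range(n_l)] for a in range(n_l)]
    return mean, cov


# Toy blocks: two in-control observations, one new site (hypothetical values).
mean_new, cov_new = conditional_normal(
    mu=0.0,
    sigma_l=[[1.0]],
    sigma_l0=[[0.5, 0.5]],
    sigma0=[[1.0, 0.5], [0.5, 1.0]],
    y_ic=[1.0, 1.0],
)
```

For a site on a new wafer, the entries of $\Sigma_{l,0}$ would contain only the $s$ term; for a site on an already measured wafer they would also include the $v$ term for observations on that wafer.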



AGP Model

Prediction Demonstration

[Figure: left panel, predicted mean with samples and the standard profile; right panel, predicted variance.]


$T^2$ Test

Statistical Testing

[Figure: profile value against location, with sampled points on a new profile.]

Whether the new profile deviates from $f(x)$ within acceptable region

Statistical testing based on the samples (where to sample?)


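$T^2$ Test

$T^2$ Test

If the new surface conforms with the model, $Y_l \sim N(\tilde{\mu}_l, \tilde{\Sigma}_l)$

Reducing surface comparison to multivariate normal data comparison

$$H_0: Y_l \sim N(\tilde{\mu}_l, \tilde{\Sigma}_l) \qquad H_1: Y_l \not\sim N(\tilde{\mu}_l, \tilde{\Sigma}_l).$$

Testing statistic:

$$T_l^2 = (Y_l - \tilde{\mu}_l)^T \tilde{\Sigma}_l^{-1} (Y_l - \tilde{\mu}_l),$$

Under $H_0$, $T_l^2 \sim \chi^2_{n_l}$. Reject $H_0$ when $T_l^2 > H_T$.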
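A minimal sketch of the statistic follows. To stay within the standard library, the threshold uses the closed form $H_T = -2 \ln \alpha$, which holds only for $n_l = 2$ because the $\chi^2_2$ survival function is $e^{-x/2}$; other sample sizes need a chi-square quantile from a statistics library. All names and numbers are hypothetical.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def t2_statistic(y, mean, cov):
    """Quadratic form (y - mean)' cov^{-1} (y - mean)."""
    resid = [yi - mi for yi, mi in zip(y, mean)]
    w = solve(cov, resid)
    return sum(r * wi for r, wi in zip(resid, w))


# Hypothetical predictive distribution at n_l = 2 new sites.
mean_tilde = [0.0, 0.0]
cov_tilde = [[1.0, 0.0], [0.0, 4.0]]
alpha = 0.05
threshold = -2.0 * math.log(alpha)   # chi-square(2) upper quantile, n_l = 2 only

t2_conforming = t2_statistic([1.0, 2.0], mean_tilde, cov_tilde)
t2_shifted = t2_statistic([3.0, 4.0], mean_tilde, cov_tilde)
reject_conforming = t2_conforming > threshold
reject_shifted = t2_shifted > threshold
```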



Generalized likelihood ratio test

GLR Test

Only focus on a certain class of alternative models

Another deviation source is considered as the alternative models

$Y_l(x) = f(x) + \varepsilon_l(x) + \xi(x)$

$\xi(x)$ is a realization of another $GP(\delta, w(\cdot))$.

Suitable to model the global change effects

Testing hypothesis

$$H_0: Y_l(x) = f(x) + \varepsilon_l(x)$$

$$H_1: Y_l(x) = f(x) + \varepsilon_l(x) + \xi(x)$$



GLR Test

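With finite number of observations

Testing hypothesis:

$$H_0: Y_l \sim N(\tilde{\mu}_l, \tilde{\Sigma}_l)$$

$$H_1: Y_l \sim N(\tilde{\mu}_l + \delta 1_{n_l}, \tilde{\Sigma}_l + \Sigma_w) \text{ for some nonzero } \delta, \gamma^2, \theta_l$$

GLR statistic:

$$R_l = 2 \ln \sup_{\delta, \gamma^2, \theta_l} \frac{\det(\tilde{\Sigma}_l + \Sigma_w)^{-1/2} \exp\left[-(Y_l - \tilde{\mu}_l - \delta 1_{n_l})^T (\tilde{\Sigma}_l + \Sigma_w)^{-1} (Y_l - \tilde{\mu}_l - \delta 1_{n_l})/2\right]}{\det(\tilde{\Sigma}_l)^{-1/2} \exp\left[-(Y_l - \tilde{\mu}_l)^T \tilde{\Sigma}_l^{-1} (Y_l - \tilde{\mu}_l)/2\right]}$$

$R_l \sim$ equal mixture of $\chi^2_1$ and $\chi^2_2$ asymptotically under $H_0$. Reject $H_0$ when $R_l > H_R$.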
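Once $R_l$ has been computed, a p-value can be read off the asymptotic null distribution. The sketch below assumes that "equal mixture" means a 50:50 mixture of $\chi^2_1$ and $\chi^2_2$, and uses the exact survival functions of those two distributions. The maximization that produces $R_l$ is not shown, and the function name is hypothetical.

```python
import math


def glr_p_value(r):
    """Upper-tail probability of r under 0.5*chi2(1) + 0.5*chi2(2)."""
    if r < 0:
        raise ValueError("the GLR statistic is non-negative")
    sf_chi2_1 = math.erfc(math.sqrt(r / 2.0))   # P(chi2_1 > r)
    sf_chi2_2 = math.exp(-r / 2.0)              # P(chi2_2 > r)
    return 0.5 * sf_chi2_1 + 0.5 * sf_chi2_2


# Hypothetical GLR statistics for two wafers.
p_small_shift = glr_p_value(3.84)
p_large_shift = glr_p_value(25.0)
```

Rejecting when the p-value falls below the chosen significance level is equivalent to comparing $R_l$ against the threshold $H_R$.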




Summary

[Flowchart: $N_0$ IC units with $n_i$ measurements on unit $i$ → AGP model → $(\tilde{\mu}_l, \tilde{\Sigma}_l)$; new unit $(X_l, Y_l)$ → $T^2$ test / GLR test → Accept (continue) or Reject (disposal).]


Approximation and Estimation Performance

Approximation Performance

Standard profile (Shpak 1995):

f (x) = sin(x) + sin(10x/3) + log(x)− 0.84x + 3

Spatially correlated error: $\varepsilon(x) \sim GP(0, 0.05 \times v(\cdot \mid 5))$
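For reference, the standard profile above can be evaluated directly. The sketch assumes the natural logarithm and restricts $x$ to positive values; the function name is hypothetical.

```python
import math


def standard_profile(x):
    """f(x) = sin(x) + sin(10x/3) + log(x) - 0.84x + 3, defined for x > 0."""
    if x <= 0:
        raise ValueError("log(x) requires x > 0")
    return math.sin(x) + math.sin(10.0 * x / 3.0) + math.log(x) - 0.84 * x + 3.0


# Evaluate on an evenly spaced grid over [3, 7].
grid = [3.0 + 0.5 * k for k in range(9)]
values = [standard_profile(x) for x in grid]
```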

[Figure: left panel, predicted mean against x on [3, 7] for f(x), measurements, AGP, and OGP; right panel, predicted variance against x for AGP and OGP.]

OGP Model: $Y_i(x) = \mu + \varepsilon_i(x)$


Approximation and Estimation Performance

Bias and RMSE of MLE

Accuracy of the MLE with different sample size:

(N0, n0)          µ = 1     σ² = 0.2   θ1 = 3    τ² = 0.05   θ2 = 10
(10,10)   Bias    -0.0043   -0.0189    0.4375    -0.0002     0.7089
          RMSE     0.1824    0.1001    1.6348     0.0080     4.3834
(10,20)   Bias    -0.0013   -0.0189    0.1756     0.0001     0.0011
          RMSE     0.1831    0.0975    0.9608     0.0066     0.9204
(20,10)   Bias     0.0106   -0.0103    0.2528     0.0000     0.4140
          RMSE     0.1903    0.1038    1.1990     0.0056     3.1826
(20,20)   Bias     0.0015   -0.0169    0.1317     0.0002     0.0001
          RMSE     0.1850    0.0920    0.7562     0.0045     0.5976


Monitoring Performance

Three Change Scenarios

$Y(x) = f(x) + \varepsilon(x)$

[Figure: three panels comparing standard and shifted profiles under a shift in mean ($\mu$), variance ($\sigma_2^2$), and correlation ($\theta_2$).]



Performance of Different Tests

Three tests to compare: Max-Min Test, GLR Test, $T^2$ Test

[Figure: beta error against shift magnitude under mean, variance, and correlation shifts, for the MaxMin, GLR, and $T^2$ tests.]



Effect of Testing Sample Size ($n_l$)



Effect of In Control Sample Size (N0, n0)

[Figure: beta error against shift magnitude for the GLR and $T^2$ tests under mean, variance, and correlation shifts, with in-control sample sizes (N0, n0) = (10,10), (20,10), (10,20), (20,20).]


Real Application

Monitoring Wafer Thickness Profile

Data are collected from a real production plant;

8 in-control wafers to construct AGP model, 30 wafers to be tested;

120 measurements from each in-control wafer to construct AGP model;

480 measurements from each testing wafer to conduct tests.


Real Application

Demos of Thickness Profile

[Figure panels: in-control wafer #2; in-control wafer #7; approximated standard profile.]


Real Application

p-Values of the Tests

[Figure: p-values of the $T^2$ and GLR tests for the 30 wafer surfaces, with the significance level marked.]


Real Application

Rejected Wafers (p-values)

Wafer   T² p-value       GLR p-value
#12     0.9250           1.3051×10⁻⁴
#23     0.0018           2.2178×10⁻¹¹
#24     3.7191×10⁻⁴      3.4084×10⁻¹⁴
#26     2.5678×10⁻⁴      9.5180×10⁻⁹
#28     1.1102×10⁻¹⁶     0
#30     7.2819×10⁻⁴      2.5700×10⁻¹¹


Open issues

Optimal Design for AGP

Nonparametric model, Fisher information matrix is not enough

[Figure: measurement sites on the wafer, both axes spanning −60 to 60.]

Ordinary space filling design for deterministic experiments

does not consider geometric feature

does not consider the error process


Open issues

Optimality Criteria

Prediction accuracy: minimize (integrated) RMSE

Determine $N_0$, $n_0$ and $x_{ij}$

Approximation accuracy of $f(x)$ and error process estimation $\sigma_2^2, \theta_2$

Sequential allocation strategy (Ankenman et al. 2010)

Detection power: minimize $\beta$ error

$T^2$ test: when only $\mu$ changes, the Mahalanobis distance $\delta' \tilde{\Sigma}_l^{-1} \delta$ determines the power, where

$$\tilde{\Sigma}_l = \Sigma_l - \Sigma_{l,0} \Sigma_0^{-1} \Sigma_{l,0}^T$$

Constant mean shift: $\max_{X_l} 1' \tilde{\Sigma}_l^{-1} 1$

D-optimal: $\max_{X_l} \det \tilde{\Sigma}_l^{-1} \Longrightarrow \min_{X_l} \det \tilde{\Sigma}_l$


Open issues

GP with “Covariates”

Surface profile depends on other factors: speed, force, materials, etc.

[Diagram: GP model; GP with independent errors (Ankenman et al. 2010); GP with dependent errors; multivariate output/response (co-kriging; Zhou et al. 2011; Qian et al. 2008; different distance metrics); surface response.]


Open issues

Conclusion

AGP model is suitable to approximate surface profile and quantify dependent deviations;

A simple and flexible framework for process monitoring

Need to further consider design issues and extend the model to the case with covariates

Reference I

Ankenman, B., Nelson, B., and Staum, J. (2010), “Stochastic kriging for simulation metamodeling,” Operations Research, 58, 371–382.

Cressie, N. (1993), Statistics for Spatial Data, revised edition, vol. 928, Wiley, New York.

Doering, R. and Nishi, Y. (2007), Handbook of semiconductor manufacturing technology, CRC Press, Boca Raton, FL.

Jin, R., Chang, C., and Shi, J. (2012), “Sequential measurement strategy for wafer geometric profile estimation,” IIE Transactions, 44, 1–12.

Qian, P. Z. G., Wu, H., and Wu, C. F. J. (2008), “Gaussian Process Models for Computer Experiments with Qualitative and Quantitative Factors,” Technometrics, 50, 383–396.

Rasmussen, C. E. and Williams, C. K. I. (2006), Gaussian Processes for Machine Learning, MIT Press, Boston.

Sacks, J., Welch, W., Mitchell, T., and Wynn, H. (1989), “Design and analysis of computer experiments,” Statistical Science, 4, 409–423.

Santner, T., Williams, B., and Notz, W. (2003), The design and analysis of computer experiments, Springer, New York.

Shpak, A. (1995), “Global optimization in one-dimensional case using analytically defined derivatives of objective function,” Computer Science Journal of Moldova, 3, 168–184.

Zhao, H., Jin, R., Wu, S., and Shi, J. (2011), “PDE-constrained Gaussian process model on material removal rate of wire saw slicing process,” Journal of Manufacturing Science and Engineering, 133, 21012.1–21012.9.

Zhou, Q., Qian, P. Z. G., and Zhou, S. (2011), “A Simple Approach to Emulation for Computer Models with Qualitative and Quantitative Factors,” Technometrics, 53, 266–273.

Thanks and questions
