A Guide to Using Stata



Using Stata to analyze data from the Vietnam Household Living Standards Survey (VLSS)

Contents

Chapter I: General introduction to Stata
    1. Data storage in Stata (Dataset in Stata)
    2. Starting and exiting Stata (Open and exit)
    3. The Stata 7 interface (Stata interface)
    4. Session logs (Log file)
    5. Loading, entering, and saving data (Use, input and save)
Chapter II: Exploring data
    1. Command syntax in Stata (Stata command syntax)
    2. Operators and functions
    3. Describing data (Data reporting)
    4. Editing and modifying data (Data manipulation)
    5. Weights in the VHLSS (Weight)
Chapter III: Hypothesis testing and regression analysis
    1. Estimation and hypothesis testing
    2. Correlation and regression analysis (Correlation and regression)
Chapter IV: Graphs
    1. Drawing graphs (Graph)
    2. Some commonly used graph types
    3. Saving and displaying graphs (Saving and graph using)
Chapter V: Programming in Stata
    1. Introduction to do-file programs
    2. Local and global macros
    3. Scalars and matrices (Scalar and matrix)
    4. Conditional statements and loops
    5. Introduction to ado-files
References
Appendix

Chapter I: General introduction to Stata

1. Data storage in Stata (Dataset in Stata)
Stata is a statistical package for managing and analyzing data and for drawing graphs. Stata stores information about the characteristics of the units under study. Data stored in Stata can be displayed as a table, as in the following example:

    hhcode   headname        hhsize   incomepc
    101      Nguyen Van A         6       2100
    102      Le Thi B             5       3210
    103      Tran Van C          10       1200

Observations (records)
Each row of the data table is called an observation, or a record. An observation stores the data on one unit of study. The example above has 3 observations. They hold the household code (hhcode), the name of the household head (headname), the household size (hhsize), and the per-capita income (incomepc) of 3 households.

Variables (fields, attributes)
Information about the units of study is collected and stored according to their characteristics. These characteristics are called variables, or fields, and they form the columns of the data table. The example above has 4 variables: hhcode, headname, hhsize, and incomepc.

A variable name is 1 to 32 characters long and must begin with a letter or an underscore (_). Names may contain only letters, digits, and underscores. No other special characters can be used in a variable name.

Identifying variables
Usually some variables serve to identify the observations; these are called identifying variables. They make it possible to tell observations apart, because each observation has its own value of these variables. In the example above, the identifying variable is hhcode, which takes a different value for each observation.

Characteristics of variables
Variables can be given labels (descriptions). For example, the variable hhcode could be labeled "Household code". A variable is either numeric or string, and each kind comes in several storage types. Numeric variables can be stored as byte, int, long, float, or double. String variables are stored as str1 to str80, depending on their length.

Numeric storage types

    Type     Size (bytes)   Minimum           Maximum          Kind
    byte     1              -127              126              integer
    int      2              -32,767           32,766           integer
    long     4              -2,147,483,647    2,147,483,646    integer
    float    4              -10^36            10^36            real
    double   8              -10^308           10^308           real

Numeric variables can be discrete or continuous. Variables such as household size, sex of the household head, geographic region, and education level are discrete variables, also called categorical variables. They can be stored as byte, int, or long. Continuous variables, such as household income and expenditure, are stored as float or double.

String variables store text. For example, headname is a string variable that holds the name of the household head.

String storage types

    Type     Bytes   Maximum length
    str1     1       1
    str2     2       2
    ...      ...     ...
    str80    80      80
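As a sketch, the three-household table from section 1 can be typed in with input, declaring a storage type for each variable as described above. The values come from the example table. The particular types chosen (int, str15, byte, float) are assumptions made for illustration, and the code assumes a recent Stata release. The compress command then switches each variable to the smallest type that holds its values without loss, and describe shows the types before and after.

```stata
* Hypothetical re-creation of the three-household table from section 1.
clear
input int hhcode str15 headname byte hhsize float incomepc
101 "Nguyen Van A"  6 2100
102 "Le Thi B"      5 3210
103 "Tran Van C"   10 1200
end

* Storage type of each variable as declared above.
describe

* compress picks the smallest lossless type for each variable:
* incomepc holds whole numbers only, so float can be demoted to an
* integer type, and str15 shrinks to the length of the longest name.
compress
describe
list
```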

2. Starting and exiting Stata (Open and exit)
Stata is started like other application programs. Either double-click the icon of the file wstata.exe in Windows Explorer, or choose Start -> Programs -> Stata. To exit, type the exit command in the Stata Command window, or choose Exit from the File menu.

3. The Stata 7 interface (Stata interface)[1]
Once Stata has started, its interface appears. It consists of the menu bar at the top, the toolbar below it, and the windows.

[1] The Stata 8 interface is similar to that of Stata 7. The main difference is that Stata 8 adds a Statistics item to the menu bar. This item lets you run some statistical commands through dialog boxes instead of typing them in the Command window.

The Stata windows
The Stata windows are opened from the Windows menu on the menu bar. They are:

    Results          Displays commands and their results
    Graph            Displays graphs
    Viewer           Displays help and the contents of text files
    Command          Where commands are typed
    Review           Lists the commands that have been executed
    Variables        Lists the variables in the dataset
    Data editor      Displays and edits the data as a table
    Do-file editor   Editor for writing programs (do-files)

Menu bar

Clicking the menu bar and its items makes Stata carry out various commands. The menu bar contains the following groups of commands:

    File
      Open            Open a data file
      View            View Stata files in the Viewer window
      Save            Save the data file
      Save as         Save the data file under a new name
      File name       Choose a file name and insert it into the Command window
      Log             Open, close, or view a log file
      Save graph      Save a graph to a file
      Print graph     Print a graph
      Print results   Print the results
      Exit            Exit Stata

    Edit
      Copy text       Copy text
      Copy table      Copy tables
      Paste           Paste copied content
      Table options   Options for copying data tables
      Graph options   Options for copying graphs (not available in Stata 7)

    Prefs             Options for colors, fonts, and sizes

    Windows
      Results         Open the Results window
      Graph           Open the Graph window
      Log             Open the log window
      Viewer          Open the help (Viewer) window and view file contents
      Command         Open the Command window
      Review          Open the window listing the executed commands
      Variables       Open the window listing the variables in the dataset
      Help/Search     Open the help window
      Data editor     Open the window showing the data as a table
      Do-file editor  Open the window for writing programs

    Help              Help on using Stata

Toolbar (tool bar)

The toolbar buttons run frequently used Stata commands. Moving the pointer over a button displays a short description. The buttons are:

    Open (use)                       Open a Stata data file
    Save                             Save the data file to disk
    Print results                    Print the contents of the Results window
    Begin log                        Open, close, and view the contents of a log file
    Start viewer                     Open the help (Viewer) window
    Bring Dialog Window to front     Bring the dialog window to the front
    Bring Results Window to front    Bring the Results window to the front
    Bring Graph Window to front      Bring the Graph window to the front
    Do-file editor                   Open the program editor
    Data editor                      Open the window for editing data
    Data browser                     Open the window for viewing data
    Clear -more- condition           Dismiss the -more- prompt
    Break                            Stop the command or program being executed

4. Session logs (log file)
When working with Stata, users usually want to keep a record of the session, including the commands, the messages, and the results of the analysis. Stata records the session with the log using command.

Syntax: log using (path\filename) [, append replace [ text | smcl ] ]

Options:
    append    Append the session record to an existing file
    replace   Overwrite an existing file with the session record
    text      Create the log as plain text (extension .log)
    smcl      Create the log in SMCL format (extension .smcl); this is the default

Examples:
    log using baitap1      Create the file baitap1 in the current directory to record the session; the default extension is .smcl

    . log using baitap1
    -------------------------------------------------------------------------
           log:  C:\baitap1.smcl
      log type:  smcl
     opened on:  17 Feb 2004, 15:32:03

    log using baitap1, replace           Create baitap1, overwriting the existing file baitap1
    log using d:\baitap2, text           Create baitap2 on drive D as plain text (extension .log)
    log using d:\baitap2, append text    Continue recording the session in the existing text file baitap2 on drive D

Files with the .smcl extension can be converted into text files with the translate command. For example:

    translate baitap1.smcl exercise1.log

    log off     Temporarily stops recording the session in the open log/smcl file
    log on      Resumes recording in the open log file; it is used after log off
    log close   Closes and saves the open log file

Note: Stata can also record only what the user types in the Command window. This makes it easier to write programs later, based on the session record.

Syntax: cmdlog using (path\filename) [, append replace]
        cmdlog {off | on | close}

To view log/smcl files, choose File > Log > View on the menu bar, or type view (filename) in the Command window. The files can also be opened with other text editors such as MS Word or Notepad.
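The log commands above can be combined into a short do-file sketch. The file name baitap1 follows the text's example, and everything is written to the current working directory. The replace options are an assumption added so the sketch can be rerun; they overwrite any existing files with those names.

```stata
* Close any log left open by an earlier run; capture ignores the
* error raised when no log is open.
capture log close

* Start an SMCL log: creates baitap1.smcl.
log using baitap1, replace smcl
display "This line is written to the log."

* Suspend recording; output still appears in the Results window.
log off
display "This line is not written to the log."

* Resume recording into the same file.
log on
display "Recording has resumed."

* Close and save the log.
log close

* Convert the SMCL log into a plain-text copy.
translate baitap1.smcl baitap1.log, replace
```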

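5. Loading, entering, and saving data (Use, input and save)
Opening an existing data file
Syntax: use (path\filename)
This command opens the Stata data file (extension .dta) given by filename. Examples:

    use ho1.dta                             Open ho1.dta in the current directory
    use "D:\VHLSS 2004\ho1.dta", clear      Open ho1.dta in the folder VHLSS 2004 on drive D

The clear option discards the data currently in memory before the file is loaded. A path that contains spaces must be enclosed in double quotes.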

A Stata data file can also be opened with Open on the File menu, or with the Open (use) button on the toolbar. If the data file is large, first set the amount of memory Stata may use with the command:

    set memory #[k|m]

Examples:

    set mem 32m
    set mem 32000k

Entering data
There are several ways to enter data from the keyboard into Stata's memory:
- Use the Data Editor window. Type edit in the Command window, then enter the data in the spreadsheet-style window.
- Use the command input [variable list, with storage types if needed]. Then type the values for each variable, one observation at a time. Values are separated by a space, and string values that contain spaces are enclosed in double quotes. Finish data entry with end.

Example:

    . input hhcode str15 name income

            hhcode             name     income
      1. 101 "Nguyen Van A" 1200
      2. 102 "Nguyen Van B" 1350
      3. 103 "Tran Thi C" 2310
      4. end

Stata can also read data from other database files. First, save these files as text (for example, from Excel), with one observation per line and the values separated by commas or tabs. Then read the data into Stata with the insheet command.

Syntax: insheet [varlist] using (text filename) [, [no]names comma tab clear]

This command reads the observations of the text file into Stata's memory. The optional varlist gives the names of the variables to be created.

Options:
    [no]names   names: the variable names are on the first line of the text file; nonames: the first line contains data
    comma       The values in the text file are separated by commas
    tab         The values in the text file are separated by tabs
    clear       The data read in replace the data currently in memory

Examples:

    . insheet using c:\income.txt
    (3 vars, 4 obs)

    . insheet maho hoten thunhap using c:\income.txt
    (note: variable names in file ignored)
    (3 vars, 4 obs)

Saving data
Syntax: save (path\filename) [,replace]
This command saves the data currently in Stata's memory to the file given by filename. With the replace option, the new file overwrites an existing file of the same name. Data can also be saved with Save and Save as on the menu bar, or with the Save button on the toolbar.
Note: see also the infile and outfile commands.
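The following sketch strings this section's commands together. It enters the three households from the input example, saves them, and reloads them with use. It then reads the same values back from a comma-separated text file with insheet. Temporary names from tempfile stand in for real paths such as c:\income.txt. The text file is written with Stata's file command only so that the example needs no external data. In Stata 13 and later, import delimited is the recommended replacement for insheet.

```stata
* Enter the data from the keyboard example.
clear
input hhcode str15 name income
101 "Nguyen Van A" 1200
102 "Nguyen Van B" 1350
103 "Tran Thi C"   2310
end

* Save to a temporary .dta file, clear memory, then load it back.
tempfile ho1
save "`ho1'", replace
clear
use "`ho1'", clear
list

* Write the same data as a comma-separated text file, names on line 1.
tempfile raw
local csv "`raw'.csv"
file open fh using "`csv'", write replace
file write fh "hhcode,name,income" _n
file write fh "101,Nguyen Van A,1200" _n
file write fh "102,Nguyen Van B,1350" _n
file write fh "103,Tran Thi C,2310" _n
file close fh

* Read it with insheet; clear replaces the data currently in memory.
insheet using "`csv'", comma names clear
describe
list
```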

Chapter II: Exploring data

1. Command syntax in Stata (Stata command syntax)
The basic structure of a Stata command is:

    [by variable list:] command [variable list] [expression] [condition] [range] [weight] [, options]

In Stata's help, the same syntax is written in English as:

    [by varlist:] command [varlist] [=exp] [if exp] [in range] [weight] [, options]

Square brackets mark optional elements. Notes:
- Stata commands are written in lowercase. Stata also distinguishes lowercase from uppercase in variable names. For example, in the same dataset, Ho_ten and ho_ten are two different variables.
- Elements shown in square brackets [ ] are optional; they may or may not appear in a command. Required arguments, such as variable names, are shown in angle brackets < >. The command cannot run unless they are given.
- Some Stata commands can be abbreviated. For example, summarize can be abbreviated to sum. In this document, the underlined part of a command's syntax is its abbreviation.
- The examples in this document use data from the 1998 Vietnam Living Standards Survey, conducted by the General Statistics Office. The aggregate expenditure file Hhexp98n.dta is used most often.

by varlist: (by variable list)
Stata runs the command separately for each group of observations defined by the values of the variables in varlist. The data must be sorted by these variables before the command is run. Examples:

    . sort sex
    . by sex: sum

    -> sex = 1
        Variable |     Obs        Mean    Std. Dev.       Min        Max
    -------------+-----------------------------------------------------
         rlpcex1 |    4375    2980.906    2430.648    357.318   45801.71

    -> sex = 2
        Variable |     Obs        Mean    Std. Dev.       Min        Max
    -------------+-----------------------------------------------------
         rlpcex1 |    1624    3748.368    3231.241   376.9805   30624.77

    . sort sex urban98
    . by sex urban98: sum rlpcex1

    -> sex = 1, urban98 = Rural
        Variable |     Obs        Mean    Std. Dev.       Min        Max
    -------------+-----------------------------------------------------
         rlpcex1 |    3344    2308.134    1345.671    357.318   24386.43

    -> sex = 1, urban98 = Urban
        Variable |     Obs        Mean    Std. Dev.       Min        Max
    -------------+-----------------------------------------------------
         rlpcex1 |    1031     5163.01    3602.245   682.9575   45801.71

    -> sex = 2, urban98 = Rural
        Variable |     Obs        Mean    Std. Dev.       Min        Max
    -------------+-----------------------------------------------------
         rlpcex1 |     925    2553.448    1776.178   376.9805   25527.95

    -> sex = 2, urban98 = Urban
        Variable |     Obs        Mean    Std. Dev.       Min        Max
    -------------+-----------------------------------------------------
         rlpcex1 |     699    5329.628    3962.946   1057.797   30624.77
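The sketch below runs by on hypothetical data: eight households whose values of rlpcex1 are invented for illustration. It shows the sort-then-by pattern from the text. It also shows the bysort prefix, which sorts the data and runs the command in one step.

```stata
clear
input byte sex byte urban98 float rlpcex1
1 0 2000
1 0 2400
1 1 5000
1 1 5400
2 0 2500
2 0 2700
2 1 5200
2 1 5600
end

* by needs the data sorted on the by-variables
* (otherwise Stata reports "not sorted").
sort sex
by sex: summarize rlpcex1

* bysort = sort + by in one command: one summary per sex x urban98 cell.
bysort sex urban98: summarize rlpcex1
```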

Variable list (varlist)
Specifies the variables the command acts on. If no variable is given, the command acts on all variables. Examples:

    . sum hhsize sex reg7

        Variable |     Obs        Mean    Std. Dev.       Min        Max
    -------------+-----------------------------------------------------
          hhsize |    5999    4.752292    1.954292          1         19
             sex |    5999    1.270712    .4443645          1          2
            reg7 |    5999     4.01917    2.145305          1          7

    . sum

        Variable |     Obs        Mean    Std. Dev.       Min        Max
    -------------+-----------------------------------------------------
        househol |    5999    19617.86    11201.92        101      38820
            year |    5999    97.94666    .2247337         97         98
           month |    5999    6.340723    3.011082          1         12
    --Break--
    r(1);

This sum command displays basic statistics for every variable in the dataset. The listing above was interrupted with Break.

Condition (if exp)
Stata runs the command only for the observations for which the expression is true. Example:

    . sum poor if reg7==1

        Variable |     Obs        Mean    Std. Dev.       Min        Max
    -------------+-----------------------------------------------------
            poor |     859    .4982538    .5002882          0          1

This command applies only to the observations where reg7 equals 1.

Range (in range)
Specifies the range of observations the command acts on. A range can take the following forms:

    sum poor in 10        Mean of poor for observation 10 (that is, the value of poor in observation 10)
    sum poor in 10/100    Mean of poor for observations 10 through 100
    sum poor in f/100     Mean of poor from the first observation through observation 100
    sum poor in 100/l     Mean of poor from observation 100 through the last observation
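The if and in qualifiers can be tried on hypothetical data. The data have 200 observations, a region code that cycles through 1 to 7, and a 0/1 indicator that is 1 for every third observation. The counts in the comments follow from that construction.

```stata
clear
set obs 200
generate byte reg7 = mod(_n - 1, 7) + 1   // 1, 2, ..., 7, 1, 2, ...
generate byte poor = mod(_n, 3) == 0      // 1 for observations 3, 6, 9, ...

summarize poor if reg7 == 1    // 29 observations: 1, 8, 15, ..., 197
summarize poor in 10           // observation 10 only
summarize poor in 10/100       // observations 10 through 100 (91 obs)
summarize poor in f/100        // first observation through 100
summarize poor in 100/l        // observation 100 through the last (101 obs)
summarize poor if reg7 == 1 in 1/100   // if and in can be combined
```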

Weights (weight)
Allows calculations with weights. Weights are covered in detail in section 5 of this chapter.

Options
Many Stata commands have their own options, which are given after a comma. For example, sum has the detail option. It computes additional statistics beyond the mean and standard deviation:

    . sum rlpcex1, detail

                   comp.M&Reg price adj.pc tot exp
    -------------------------------------------------------------
          Percentiles      Smallest
     1%     682.9575        357.318
     5%     1012.433       366.2792
    10%     1238.088       376.9805       Obs                5999
    25%     1671.054       381.3502       Sum of Wgt.        5999

    50%     2397.042                      Mean           3188.667
                            Largest       Std. Dev.      2692.567
    75%     3711.917       26944.64
    90%     5940.803       30624.77       Variance        7249918
    95%      8045.32        31066.5       Skewness       3.791027
    99%     14163.04       45801.71       Kurtosis       29.21398
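Here is a small check of summarize, detail on five made-up values. Besides printing the table, summarize stores its results in r(), and return list displays them. The median (300) and the variance (25000) can be verified by hand.

```stata
clear
set obs 5
generate x = _n * 100          // 100, 200, 300, 400, 500

summarize x, detail

* The printed statistics are also saved as r() results.
return list
display "median = " r(p50) ", variance = " r(Var)
```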

Note: Stata allows commands and options to be abbreviated. In this document, the underlined part of a command shows its abbreviation, that is, the characters that must be typed. For example, use with the u underlined means the command can be abbreviated to u. The command syntax in this document is written in English so that readers can compare it with Stata's own help.

2. Operators and functions
Operators
Stata's operators are written as follows:

    Symbol   Meaning
    Arithmetic
      +      Addition
      -      Subtraction
      *      Multiplication
      /      Division
      ^      Power
    Relational
      >
      <
      >=

[...]

    ...   Displays the numeric values of the variable, rather than [...]

    -> tabulation of urban98

    1:urban 98; |
     0:rural 98 |      Freq.     Percent        Cum.
    ------------+-----------------------------------
          Rural |       4269       71.16       71.16
          Urban |       1730       28.84      100.00
    ------------+-----------------------------------
          Total |       5999      100.00

    -> tabulation of reg7

      Code by 7 |
        regions |      Freq.     Percent        Cum.
    ------------+-----------------------------------
        region1 |        859       14.32       14.32
        region2 |       1175       19.59       33.91
        region3 |        708       11.80       45.71
        region4 |        754       12.57       58.28
        region5 |        368        6.13       64.41
        region6 |       1023       17.05       81.46
        region7 |       1112       18.54      100.00
    ------------+-----------------------------------
          Total |       5999      100.00
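The one-way tables above appear to come from tab1, which tabulates each variable of a varlist separately. The minimal sketch below uses hypothetical data: four households, three rural and one urban. It shows tabulate with and without value labels. The value-label name urb is an assumption made for illustration.

```stata
clear
input byte urban98
0
0
0
1
end
label define urb 0 "Rural" 1 "Urban"
label values urban98 urb

tabulate urban98             // rows labeled Rural / Urban
tabulate urban98, nolabel    // rows show the codes 0 / 1
tab1 urban98                 // tab1 takes a varlist: one table per variable
```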

Two-way frequency tables
Syntax:

    tabulate <varname1> <varname2> [weight] [if exp] [in range] [, chi2 missing nofreq cell column row]
    tab2 <varlist> [weight] [if exp] [in range] [, chi2 missing nofreq cell column row]

The tabulate command computes and displays the two-way frequency table of the two variables given. tab2 creates a two-way table for every pair of variables in varlist. Example:

    . tab urban98 farm

    1:urban 98; | Type of HH (1:farm;
     0:rural 98 |      0:nonfarm)
                |  non farm       farm |     Total
    ------------+----------------------+----------
          Rural |      1021       3248 |      4269
          Urban |      1540        190 |      1730
    ------------+----------------------+----------
          Total |      2561       3438 |      5999

Options:
    chi2      Test the hypothesis that the two variables are independent
    missing   Treat missing values as a category of their own
    nofreq    Do not display the frequencies
    cell      Display the relative frequency (%) of each cell in the whole table
    column    Display the relative frequency (%) of each cell within its column
    row       Display the relative frequency (%) of each cell within its row

Examples:

    . tab reg7 urban98, cell nof

                | 1:urban 98; 0:rural
      Code by 7 |          98
        regions |     Rural      Urban |     Total
    ------------+----------------------+----------
        region1 |     11.20       3.12 |     14.32
        region2 |     13.05       6.53 |     19.59
        region3 |     10.00       1.80 |     11.80
        region4 |      8.37       4.20 |     12.57
        region5 |      6.13       0.00 |      6.13
        region6 |      8.57       8.48 |     17.05
        region7 |     13.84       4.70 |     18.54
    ------------+----------------------+----------
          Total |     71.16      28.84 |    100.00

    . tab farm urban98, column row

     Type of HH | 1:urban 98; 0:rural
       (1:farm; |          98
     0:nonfarm) |     Rural      Urban |     Total
    ------------+----------------------+----------
       non farm |      1021       1540 |      2561
                |     39.87      60.13 |    100.00
                |     23.92      89.02 |     42.69
    ------------+----------------------+----------
           farm |      3248        190 |      3438
                |     94.47       5.53 |    100.00
                |     76.08      10.98 |     57.31
    ------------+----------------------+----------
          Total |      4269       1730 |      5999
                |     71.16      28.84 |    100.00
                |    100.00     100.00 |    100.00
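The first two-way table above can be rebuilt from its four cell counts, which is a handy way to try out the options. Each row of this small dataset stands for one cell. The frequency weight n tells tabulate how many households the row represents; weights are covered in section 5.

```stata
* One row per cell of the urban98 x farm table shown above.
clear
input byte urban98 byte farm int n
0 0 1021
0 1 3248
1 0 1540
1 1  190
end
label define urb 0 "Rural" 1 "Urban"
label values urban98 urb

* Counts, then row and column percentages without the counts.
tabulate urban98 farm [fweight = n]
tabulate urban98 farm [fweight = n], row column nofreq

* Pearson chi-squared test of independence of the two variables.
tabulate urban98 farm [fweight = n], chi2
```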

3.11. Summary tables with tabulate, summarize()
Syntax: tabulate <varname1> [<varname2>] [weight] [if exp] [in range], summarize(<varname3>) [means standard freq missing]
This command creates a one-way table defined by variable 1, or a two-way table defined by variables 1 and 2. Each cell shows the mean, standard deviation, and frequency of variable 3.

Example:

    . tab farm urban98, sum(poor)

               Means, Standard Deviations and Frequencies of poor

     Type of HH | 1:urban 98; 0:rural
       (1:farm; |           98
     0:nonfarm) |      Rural       Urban |      Total
    ------------+------------------------+-----------
       non farm |   .2791381   .06168831 |  .14837954
                |  .44879538   .24066673 |  .35554523
                |       1021        1540 |       2561
    ------------+------------------------+-----------
           farm |  .42302956   .12105263 |   .4063409
                |   .4941161   .32705022 |  .49122109
                |       3248         190 |       3438
    ------------+------------------------+-----------
          Total |   .3886156   .06820809 |  .29621604
                |  .48749275   .25217555 |  .45662551
                |       4269        1730 |       5999

Options:
    means      Display only the means
    standard   Display only the standard deviations
    freq       Display only the frequencies
    missing    Treat missing values as a category of their own

Example:

    . replace poor=poor*100
    (1777 real changes made)
    . format poor %4.2f
    . tab reg7 urban98, sum(poor) means

                           Means of poor

                | 1:urban 98; 0:rural
      Code by 7 |          98
        regions |     Rural      Urban |     Total
    ------------+----------------------+----------
        region1 |     61.46       8.02 |     49.83
        region2 |     32.57       5.87 |     23.66
        region3 |     44.83      10.19 |     39.55
        region4 |     37.25      11.51 |     28.65
        region5 |     47.28          . |     47.28
        region6 |     12.45       2.16 |      7.33
        region7 |     35.78      10.28 |     29.32
    ------------+----------------------+----------
          Total |     38.86       6.82 |     29.62
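The cell statistics of tabulate, summarize() are easy to check by hand on a small hypothetical dataset of eight households. In the farm = 1, urban98 = 0 cell, for instance, both households are poor, so that cell's mean is 1.

```stata
clear
input byte farm byte urban98 float poor
0 0 1
0 0 0
0 1 0
0 1 0
1 0 1
1 0 1
1 1 0
1 1 1
end

* Mean, standard deviation, and frequency of poor in every farm x urban98 cell.
tabulate farm urban98, summarize(poor)

* Only the means.
tabulate farm urban98, summarize(poor) means
```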

3.12. Summary tables with tabstat
Syntax: tabstat <varlist> [weight] [if exp] [in range] [, statistics(statname [...]) by(<varname>) missing format[(%fmt)]]
This command computes statistics of the variables in varlist for each value of the categorical variable given in by(varname). Example:

    . tabstat rlfood rlhhex1, stats(mean median) by(reg7)

    Summary statistics: mean, p50
      by categories of: reg7 (Code by 7 regions)

        reg7 |    rlfood   rlhhex1
    ---------+--------------------
     region1 |  5595.556  9560.349
             |  5350.916  8536.373
    ---------+--------------------
     region2 |  6419.427  12951.14
             |  5664.145  9997.146
    ---------+--------------------
     region3 |  5692.201  10885.38
             |  5369.411  9022.334
    ---------+--------------------
     region4 |  6512.576  13525.41
             |  5790.046  11077.51
    ---------+--------------------
     region5 |  5894.983  11217.05
             |  5380.505  9421.447
    ---------+--------------------
     region6 |  9746.158  23515.01
             |  8428.743  18514.39
    ---------+--------------------
     region7 |  6556.616  13068.11
             |  6066.128  11043.99
    ---------+--------------------
       Total |  6787.898  14010.74
             |  5951.567  10733.19
    ------------------------------

Options:
    statistics(statname [...])   The statistics to compute for the variables in varlist
    by(varname)                  The categorical variable
    missing                      Missing values of the by() variable are treated as a category
    format[(%fmt)]               The display format of the statistics

The statistics that can be requested with statistics(statname [...]) are:

    statname   Meaning
    mean       Mean
    count      Number of nonmissing observations
    n          Same as count
    sum        Sum
    max        Maximum
    min        Minimum
    range      Range = maximum - minimum
    sd         Standard deviation
    semean     Standard error of the mean = sd / (number of observations)^0.5
    skewness   Skewness
    kurtosis   Kurtosis
    median     Median (same as p50)
    p1         1st percentile
    p5         5th percentile
    p10        10th percentile
    p25        25th percentile
    p50        50th percentile (median)
    p75        75th percentile
    p90        90th percentile
    p95        95th percentile
    p99        99th percentile
    iqr        Interquartile range = p75 - p25
    q          Equivalent to "p25 p50 p75"

Example:

    . tabstat rlpcex1, stats(mean sd q) by(reg7) format(%5.1f)

    Summary for variables: rlpcex1
         by categories of: reg7 (Code by 7 regions)

        reg7 |    mean      sd     p25     p50     p75
    ---------+----------------------------------------
     region1 |  2174.8  1265.1  1328.0  1792.1  2710.8
     region2 |  3294.0  2511.9  1816.7  2532.5  3822.0
     region3 |  2503.3  1918.0  1489.7  2001.2  2808.1
     region4 |  2933.7  2260.5  1697.9  2362.2  3471.4
     region5 |  2087.3  1285.4  1217.3  1850.8  2700.5
     region6 |  5257.5  4005.7  2676.7  4154.1  6431.8
     region7 |  2931.1  2137.2  1680.1  2321.9  3414.7
    ---------+----------------------------------------
       Total |  3188.7  2692.6  1671.1  2397.0  3711.9
    --------------------------------------------------
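Here is a sketch of tabstat on hypothetical data: 21 observations split into two groups, with values 100, 200, ..., 2100. The save option is an addition not covered in the text. It stores each group's statistics as a matrix in r(), so they can be reused in later calculations.

```stata
clear
set obs 21
generate byte reg7 = cond(_n <= 10, 1, 2)  // group 1: obs 1-10; group 2: obs 11-21
generate rlpcex1 = _n * 100                // 100, 200, ..., 2100

* Mean, standard deviation, and quartiles by group, one decimal place.
tabstat rlpcex1, statistics(mean sd q) by(reg7) format(%7.1f)

* save: results in r(Stat1), r(Stat2), ... and r(StatTotal).
tabstat rlpcex1, statistics(mean median) by(reg7) save
matrix list r(Stat1)
matrix list r(StatTotal)
```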

3.13. Summary tables with table
Syntax: table <rowvar> [<colvar> [<supercolvar>]] [if exp] [in range] [weight] [, contents(clist) row col format(%fmt) missing]
This command computes the statistics of the variables listed in contents() and displays them as a table. The rows are defined by the row variable, and the columns by the column variable (and the supercolumn variable). The row and column variables are categorical. Example:

    . table reg7 urban98 farm, contents(mean poor)

    ----------------------------------------------------
              | Type of HH (1:farm; 0:nonfarm) and
              |      1:urban 98; 0:rural 98
    Code by 7 | ---- non farm ----   ------ farm ------
      regions |    Rural     Urban     Rural     Urban
    ----------+-----------------------------------------
      region1 | 19.35484  6.015038   65.7377  12.96296
      region2 | 26.66667  4.624278  33.96524  15.21739
      region3 | 40.98361  10.11236   45.8159  10.52632
      region4 |     21.6  11.63793  42.44032        10
      region5 | 30.76923            49.24012
      region6 | 15.04065  2.195609  10.07463         0
      region7 | 38.62816  10.04184  34.35805  11.62791
    ----------------------------------------------------

Options:
    contents(clist)   The list of statistics and variables; the statistic names are the same as for tabstat
    row               Add a row with the statistics for all rows combined (Total)
    col               Add a column with the statistics for all columns combined (Total)
    format(%fmt)      The display format of the statistics
    missing           Missing values of the categorical variables are treated as a category

Examples:

    . table reg7 urban98 farm, contents(mean poor) row col format(%4.2f)

    ------------------------------------------------------
              | Type of HH (1:farm; 0:nonfarm) and 1:urban
              |            98; 0:rural 98
    Code by 7 | ----- non farm -----   ------- farm ------
      regions | Rural  Urban  Total    Rural  Urban  Total
    ----------+-------------------------------------------
      region1 | 19.35   6.02  10.26    65.74  12.96  61.45
      region2 | 26.67   4.62  11.29    33.97  15.22  32.70
      region3 | 40.98  10.11  27.96    45.82  10.53  44.47
      region4 | 21.60  11.64  15.13    42.44  10.00  40.81
      region5 | 30.77         30.77    49.24         49.24
      region6 | 15.04   2.20   6.43    10.07   0.00   9.78
      region7 | 38.63  10.04  25.39    34.36  11.63  32.72
              |
        Total | 27.91   6.17  14.84    42.30  12.11  40.63
    ------------------------------------------------------

    . table urban98 farm, contents(mean poor sd poor) row col format(%4.2f)

    ----------------------------------------
    1:urban   |
    98;       | Type of HH (1:farm;
    0:rural   |      0:nonfarm)
    98        | non farm    farm    Total
    ----------+-----------------------------
        Rural |    27.91   42.30    38.86
              |    44.88   49.41    48.75
              |
        Urban |     6.17   12.11     6.82
              |    24.07   32.71    25.22
              |
        Total |    14.84   40.63    29.62
              |    35.55   49.12    45.66
    ----------------------------------------

    . table urban98 farm, contents(mean rlpcex1 mean rlhhex1) row col format(%4.2f)

    ----------------------------------------
    1:urban   |
    98;       | Type of HH (1:farm;
    0:rural   |      0:nonfarm)
    98        | non farm      farm     Total
    ----------+-----------------------------
        Rural |  2835.83   2212.12   2361.29
              | 13242.03  10120.89  10867.36
              |
        Urban |  5476.86   3232.17   5230.33
              | 22984.44  11903.19  21767.43
              |
        Total |  4423.95   2268.49   3188.67
              | 19100.41  10219.39  14010.74
    ----------------------------------------

4. Editing and modifying data (Data manipulation)
4.1. Creating new variables
Creating variables with generate
Syntax: generate <newvar> = exp [if exp] [in range]
This command creates a new variable whose values equal the given expression. Observations that do not satisfy the if condition get a missing value (.). Examples:

    . gen poor = 1 if rlpcex1 < 1790
    (4222 missing values generated)
    . gen nonpoor=1 if rlpcex1 >= 1790
    (1777 missing values generated)

Creating dummy variables with tabulate, generate()
Syntax: tabulate <varname>, generate(<newvar>)
Combined with tab, the generate option creates dummy variables from a categorical variable. The new variables are named newvar1, newvar2, newvar3, and so on. Example:

    . tab reg7, gen(region)

      Code by 7 |
        regions |      Freq.     Percent        Cum.
    ------------+-----------------------------------
        region1 |        859       14.32       14.32
        region2 |       1175       19.59       33.91
        region3 |        708       11.80       45.71
        region4 |        754       12.57       58.28
        region5 |        368        6.13       64.41
        region6 |       1023       17.05       81.46
        region7 |       1112       18.54      100.00
    ------------+-----------------------------------
          Total |       5999      100.00

    . tab1 region1 region2

    -> tabulation of region1

    reg7==region1 |      Freq.     Percent        Cum.
    --------------+-----------------------------------
                0 |       5140       85.68       85.68
                1 |        859       14.32      100.00
    --------------+-----------------------------------
            Total |       5999      100.00

    -> tabulation of region2

    reg7==region2 |      Freq.     Percent        Cum.
    --------------+-----------------------------------
                0 |       4824       80.41       80.41
                1 |       1175       19.59      100.00
    --------------+-----------------------------------
            Total |       5999      100.00

Here reg7 takes 7 values, 1 to 7, so 7 dummy variables, region1 to region7, are created. region1 equals 1 if reg7 equals 1 and 0 otherwise. Likewise, region7 equals 1 if reg7 equals 7. In the example above, tabulate, generate() is equivalent to the following 7 commands:

    gen region1=(reg7==1)
    gen region2=(reg7==2)
    ...
    gen region7=(reg7==7)

Creating variables with egen
Syntax: egen <newvar> = fcn(arguments) [if exp] [in range] [, options]
This command creates a new variable from the function fcn. Without a by prefix, the new variable takes the same value on every observation. fcn can be, for example:

    count(exp)     Number of nonmissing values of the expression
    mean(exp)      Mean of the expression
    median(exp)    Median of the expression
    sd(exp)        Standard deviation of the expression

For the other functions, see help egen. Examples:

    . egen sumexp=sum(rlpcex1)
    . sum sumexp

        Variable |     Obs        Mean    Std. Dev.       Min        Max
    -------------+-----------------------------------------------------
          sumexp |    5999    1.91e+07           0   1.91e+07   1.91e+07

    . egen g=median(food+nonfood1)
    . sum g

        Variable |     Obs        Mean    Std. Dev.       Min        Max
    -------------+-----------------------------------------------------
               g |    5999     11063.6           0    11063.6    11063.6
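The commands of section 4.1 can be tried together on a small hypothetical dataset: 14 households, two in each of seven regions, with made-up expenditures of 200, 400, ..., 2800. The sketch does three things:
- It contrasts generate ... if, which leaves missing values, with a 0/1 indicator built from a relational expression.
- It checks that tabulate, generate() gives the same dummies as the explicit comparisons shown above.
- It computes a constant egen statistic and a by-group one.

```stata
clear
set obs 14
generate byte reg7 = mod(_n - 1, 7) + 1    // each region 1..7 appears twice
generate rlpcex1 = _n * 200                // 200, 400, ..., 2800

* generate ... if: 1 where the condition holds, missing (.) elsewhere.
generate poor_a = 1 if rlpcex1 < 1790
* A relational expression gives 1 where true and 0 where false.
generate poor_b = (rlpcex1 < 1790)

* Dummy variables region1 ... region7 from the categorical reg7.
tabulate reg7, generate(region)
forvalues k = 1/7 {
    assert region`k' == (reg7 == `k')      // same as gen regionk = (reg7==k)
}

* egen: one value repeated on every observation, or one value per group.
egen meanexp = mean(rlpcex1)               // 1500 everywhere
egen medexp  = median(rlpcex1)             // 1500 everywhere
bysort reg7: egen regmean = mean(rlpcex1)  // mean of the two households in each region
list reg7 rlpcex1 poor_a poor_b meanexp regmean in 1/7
```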

Replacing the values of a variable
Syntax: replace <varname> = exp [if exp] [in range]
This command replaces the values of an existing variable with the new values given by the expression exp. Examples:

    replace poor=poor*100
    replace pcexp = hhexp/hhsize

Creating a categorical variable with encode
Syntax: encode <varname> [if exp] [in range], generate(<newvar>)
This command creates a new numeric categorical variable whose values correspond to the values of the string variable varname, numbered in alphabetical order. In the example, the string variable mucsong takes the values "Rat ngheo" (very poor), "Ngheo" (poor), and "Khong ngheo" (not poor). Example:

    . gen str15(mucsong) = "Kha"
    . drop mucsong
    . gen mucsong="Rat ngheo"
    type mismatch
    r(109);
    . gen str15(mucsong)="Rat ngheo"
    . replace mucsong="Ngheo" if rlpcex1>=1290 & rlpcex1<1790
    (1087 real changes made)
    . replace mucsong="Khong ngheo" if rlpcex1>=1790
    (4222 real changes made)
    . tab mucsong

        mucsong |      Freq.     Percent        Cum.
    ------------+-----------------------------------
    Khong ngheo |       4222       70.38       70.38
          Ngheo |       1087       18.12       88.50
      Rat ngheo |        690       11.50      100.00
    ------------+-----------------------------------
          Total |       5999      100.00

    . sum mucsong

        Variable |     Obs        Mean    Std. Dev.       Min        Max
    -------------+-----------------------------------------------------
         mucsong |       0

    . encode mucsong, gen(ma_ms)
    . tab ma_ms

          ma_ms |      Freq.     Percent        Cum.
    ------------+-----------------------------------
    Khong ngheo |       4222       70.38       70.38
          Ngheo |       1087       18.12       88.50
      Rat ngheo |        690       11.50      100.00
    ------------+-----------------------------------
          Total |       5999      100.00

    . sum ma_ms

        Variable |     Obs        Mean    Std. Dev.       Min        Max
    -------------+-----------------------------------------------------
           ma_ms |    5999    1.411235    .6871957          1          3

In the Stata version used here, generate needs an explicit string type such as str15 to create a string variable. Without it, the command stops with a type mismatch error, as shown. summarize reports 0 observations for the string variable mucsong. The encoded variable ma_ms, by contrast, is numeric, with codes 1 to 3.
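The encode steps can be reproduced on four hypothetical households. The cut-offs 1290 and 1790 follow the example above, but the expenditure values are invented. The sketch also shows decode, the reverse of encode, which turns the labeled numeric variable back into a string.

```stata
clear
input float rlpcex1
1000
1500
2500
3000
end

* String categories; the str15 type is declared explicitly.
generate str15 mucsong = "Rat ngheo"                            // very poor
replace mucsong = "Ngheo" if rlpcex1 >= 1290 & rlpcex1 < 1790  // poor
replace mucsong = "Khong ngheo" if rlpcex1 >= 1790              // not poor

* encode: numeric codes 1, 2, 3 in alphabetical order of the strings,
* with the strings attached as value labels.
encode mucsong, generate(ma_ms)
tabulate ma_ms
tabulate ma_ms, nolabel

* decode: back from the labeled numeric variable to a string.
decode ma_ms, generate(mucsong2)
```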

Creating variables with xtile
Syntax: xtile <newvar> = exp [weight] [if exp] [in range] [, nquantiles(#)]
This command creates a variable that groups the values of exp into quantiles. nquantiles(#) gives the number of quantile groups. Example: create expenditure quintiles.

    . xtile quinexp= rlpcex1, nq(5)
    . tab quinexp

      5 quantiles |
       of rlpcex1 |      Freq.     Percent        Cum.
    --------------+-----------------------------------
                1 |       1200       20.00       20.00
                2 |       1200       20.00       40.01
                3 |       1200       20.00       60.01
                4 |       1200       20.00       80.01
                5 |       1199       19.99      100.00
    --------------+-----------------------------------
            Total |       5999      100.00

    . tab quinexp, sum(rlpcex1)

                  |  Summary of comp.M&Reg price adj.pc
      5 quantiles |              tot exp
       of rlpcex1 |        Mean   Std. Dev.       Freq.
    --------------+------------------------------------
                1 |   1184.3975   261.20537        1200
                2 |   1803.6331   151.66604        1200
                3 |   2408.4867    211.5407        1200
                4 |   3390.1065   403.08913        1200
                5 |    7160.021   3690.3672        1199
    --------------+------------------------------------
            Total |   3188.6671   2692.5673        5999
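Here is a sketch of xtile on ten made-up values, 100 to 1000. With nquantiles(5), each quintile group should receive two observations. tabulate ..., summarize() then gives the mean within each group, as in the example above.

```stata
clear
set obs 10
generate rlpcex1 = _n * 100                 // 100, 200, ..., 1000

* Quintile groups 1 (lowest) to 5 (highest).
xtile quinexp = rlpcex1, nquantiles(5)
tabulate quinexp
tabulate quinexp, summarize(rlpcex1)
```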

4.2. Renaming variables
Syntax: rename oldname newname
This command changes the name of a variable from its old name to a new one.
Examples:
rename poor nguoingheo
rename rpcexp1 chitieu

4.3. Dropping variables and observations
Syntax:
drop varlist              Drops the variables listed in varlist
drop if exp               Drops the observations that satisfy the condition exp
drop in range [if exp]    Drops the observations in the given range (optionally only those that also satisfy exp)
keep varlist              Keeps the variables listed in varlist; all other variables are dropped
keep if exp               Keeps the observations that satisfy exp; all other observations are dropped
keep in range [if exp]    Keeps the observations in the given range (optionally only those that also satisfy exp); all other observations are dropped
Examples:
drop poor urban98         Drops the two variables poor and urban98
drop if sex==1            Drops the observations where sex equals 1
drop in 1/20              Drops observations 1 through 20
keep househol             Keeps only the variable househol; all other variables are dropped

keep in f/50              Keeps observations from the first through the 50th; all other observations are dropped

4.4. Recoding a categorical variable
Syntax: recode varname rule [rule ...] [if exp] [in range]
where each rule has the form old value = new value; the old value may also be a range such as 1/5, or * for all other values.
This command changes the values of a categorical variable according to the rules listed after the variable name.
Examples:
. recode sex 0=1
(0 changes made)

. recode sex . = 0
(0 changes made)

. recode hhsize 1/5=1 6/10 = 2 * = 3
(5785 changes made)

. tab hhsize

  Household |
       size |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |       4164       69.41       69.41
          2 |       1786       29.77       99.18
          3 |         49        0.82      100.00
------------+-----------------------------------
      Total |       5999      100.00

. tab urban98

 1:urban 98; |
  0:rural 98 |      Freq.     Percent        Cum.
-------------+-----------------------------------
       Rural |       4269       71.16       71.16
       Urban |       1730       28.84      100.00
-------------+-----------------------------------
       Total |       5999      100.00

. recode urban98 0=1 1=0
(5999 changes made)

. tab urban98

 1:urban 98; |
  0:rural 98 |      Freq.     Percent        Cum.
-------------+-----------------------------------
       Rural |       1730       28.84       28.84
       Urban |       4269       71.16      100.00
-------------+-----------------------------------
       Total |       5999      100.00

Note that recode changes only the stored values. The value labels stay attached to the numbers, so after the swap the label Rural is shown for the 1,730 formerly urban observations.
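How recode applies its rules can be sketched in Python, under the simplifying assumptions that each rule is an inclusive numeric range, that the first matching rule wins, and that rules are always matched against the original value (which is why 0=1 1=0 swaps the values instead of turning everything into 0). The helper `recode` and its rule format are hypothetical.

```python
def recode(values, rules, otherwise=None):
    """Apply Stata-like recode rules to a list of numbers.

    rules: list of ((low, high), new) pairs, matched against the ORIGINAL value.
    otherwise: value for anything no rule matches (like Stata's `* = #`),
    or None to leave such values unchanged.
    """
    out = []
    for v in values:
        new = v if otherwise is None else otherwise
        for (low, high), target in rules:
            if low <= v <= high:
                new = target
                break                      # first matching rule wins
        out.append(new)
    return out

# recode hhsize 1/5=1 6/10=2 *=3
print(recode([1, 5, 6, 10, 11, 19], [((1, 5), 1), ((6, 10), 2)], otherwise=3))
# [1, 1, 2, 2, 3, 3]

# recode urban98 0=1 1=0 : a swap, because rules look at the original values
print(recode([0, 1, 1, 0], [((0, 0), 1), ((1, 1), 0)]))
# [1, 0, 0, 1]
```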

4.5. Labelling variables
Attaching a label to a variable
Syntax: label variable varname "label"
This command attaches a label, which is a character string, to a variable. Example (the label below means "person with income below the poverty line"):
. gen ngheo=poor

. des ngheo

              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
ngheo           float  %9.0g

. tab ngheo

      ngheo |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |       4222       70.38       70.38
          1 |       1777       29.62      100.00
------------+-----------------------------------
      Total |       5999      100.00

. label var ngheo "Nguoi co thu nhap duoi chuan ngheo"

. tab ngheo

   Nguoi co |
   thu nhap |
 duoi chuan |
      ngheo |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |       4222       70.38       70.38
          1 |       1777       29.62      100.00
------------+-----------------------------------
      Total |       5999      100.00

. des ngheo

              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
ngheo           float  %9.0g                  Nguoi co thu nhap duoi chuan ngheo

Attaching value labels to a categorical variable
Syntax:
label define lblname # "label" [# "label" ...] [, add modify]
label dir
label list [lblname [lblname ...]]
label drop {lblname [lblname ...] | _all}
label values varname [lblname]
The label define command attaches labels to a set of numeric values. The name of the label set follows the keyword define; # is a numeric value and "label" is the character string corresponding to that value. There are two options: add adds values and their labels to an existing label set, and modify allows the values and labels of an existing label set to be changed. The label dir command lists the existing label sets, label list displays the contents of the specified label set, and label drop deletes existing label sets.
Example: create a label set named nngheo in which 1 means poor and 0 means non-poor (Ngheo means poor and Khong ngheo means non-poor).
. label define nngheo 0 "Khong ngheo" 1 "Ngheo"

. label dir
nngheo
region
loaiho
diploma
urban
agegroup

. label list nngheo
nngheo:
           0 Khong ngheo
           1 Ngheo

. label drop _all

. label dir

The label values command attaches the labels of a label set to the numeric values of a categorical variable. Example:
. tab ngheo

      ngheo |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |       4222       70.38       70.38
          1 |       1777       29.62      100.00
------------+-----------------------------------
      Total |       5999      100.00

. list ngheo in 1/5

        ngheo
  1.        1
  2.        0
  3.        1
  4.        1
  5.        0

. label values ngheo nngheo

. tab ngheo

      ngheo |      Freq.     Percent        Cum.
------------+-----------------------------------
Khong ngheo |       4222       70.38       70.38
      Ngheo |       1777       29.62      100.00
------------+-----------------------------------
      Total |       5999      100.00

. list ngheo in 1/5

             ngheo
  1.         Ngheo
  2.   Khong ngheo
  3.         Ngheo
  4.         Ngheo
  5.   Khong ngheo
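Value labels change only how numbers are displayed, not the numbers themselves. A minimal Python analogy, with a dictionary standing in for the label set nngheo as the example intends it (1 = poor, 0 = non-poor); the helper `show` is hypothetical:

```python
nngheo = {0: "Khong ngheo", 1: "Ngheo"}   # the label set: value -> label
ngheo = [1, 0, 1, 1, 0]                   # values of ngheo in observations 1-5

def show(values, label_set):
    """Display each value through its label, falling back to the number itself."""
    return [label_set.get(v, str(v)) for v in values]

print(show(ngheo, nngheo))
# ['Ngheo', 'Khong ngheo', 'Ngheo', 'Ngheo', 'Khong ngheo']
print(sum(ngheo))   # 3: arithmetic still uses the underlying 0/1 codes
```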

4.6. Sorting data
Syntax:
sort varlist [in range]
gsort [+|-]varname [[+|-]varname [...]]
The sort command sorts the observations in ascending order of the values of the variables in varlist. The gsort command sorts the observations in ascending order of a variable when its name is preceded by + (the default), or in descending order when it is preceded by -.
Examples:
sort reg7 hhsize
This command sorts the observations in ascending order of the region variable reg7; within each region, the observations are sorted in ascending order of household size hhsize.
gsort reg7 -hhsize
This command sorts the observations in ascending order of the region variable reg7, but within each region the observations are sorted in descending order of household size hhsize.

4.7. Combining data
Summarizing data: the collapse command
Syntax: collapse clist [weight] [if exp] [in range] [, by(varlist)]
where clist is a list of statistics and their corresponding variables. The statistics are denoted as in section 3.12 of this chapter. The collapse command replaces the data in memory with a new dataset containing the listed variables, whose values are computed with the corresponding statistics. The observations of the original dataset are grouped by the distinct values of the variables given in by(varlist).
Example: we have a data file on the income (thunhap) and expenditure (chitieu) of the members (ma_tv) of each household (ma_ho):

ma_tv  ma_ho  thunhap  chitieu
    1    101      200      500
    2    101     1200      400
    3    101        0      200
    4    101        0      200
    1    102     3200      500
    2    102     1200      320
    3    102      200      200
    1    103      300      500
    2    103     2100      250
    3    103        0      300
    4    103        0      300
    1    104     4300      800
    2    104     3500      500
    3    104      300      500
    4    104        0      300
    5    104        0      200
    6    104        0      200

We use collapse to create a file with the average income and expenditure per member of each household, and add a variable for household size:
. gen quimo=1
. collapse (mean) thunhap (mean) chitieu (sum) quimo, by(ma_ho)

The new dataset looks like this:

ma_ho  thunhap  chitieu  quimo
  101      350      325      4
  102  1533.33      340      3
  103      600    337.5      4
  104     1350  416.667      6
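A short Python sketch (plain lists and dictionaries, no Stata) that reproduces the collapse above from the member-level table; it only assumes the mean/sum semantics and that the result is ordered by the by() variable.

```python
from collections import defaultdict

# (ma_tv, ma_ho, thunhap, chitieu) for each household member, as in the table above
members = [
    (1, 101, 200, 500), (2, 101, 1200, 400), (3, 101, 0, 200), (4, 101, 0, 200),
    (1, 102, 3200, 500), (2, 102, 1200, 320), (3, 102, 200, 200),
    (1, 103, 300, 500), (2, 103, 2100, 250), (3, 103, 0, 300), (4, 103, 0, 300),
    (1, 104, 4300, 800), (2, 104, 3500, 500), (3, 104, 300, 500),
    (4, 104, 0, 300), (5, 104, 0, 200), (6, 104, 0, 200),
]

def collapse_by_household(rows):
    """Mimic: gen quimo=1; collapse (mean) thunhap (mean) chitieu (sum) quimo, by(ma_ho)."""
    groups = defaultdict(list)
    for _, ma_ho, thunhap, chitieu in rows:
        groups[ma_ho].append((thunhap, chitieu))
    result = {}
    for ma_ho in sorted(groups):               # one output row per household, in key order
        g = groups[ma_ho]
        quimo = len(g)                         # (sum) of a variable equal to 1 = member count
        result[ma_ho] = (
            sum(t for t, _ in g) / quimo,      # (mean) thunhap
            sum(c for _, c in g) / quimo,      # (mean) chitieu
            quimo,
        )
    return result

for ma_ho, (thunhap, chitieu, quimo) in collapse_by_household(members).items():
    print(ma_ho, round(thunhap, 2), round(chitieu, 3), quimo)
# 101 350.0 325.0 4
# 102 1533.33 340.0 3
# 103 600.0 337.5 4
# 104 1350.0 416.667 6
```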

Combining datasets: the merge command
Syntax: merge [varlist] using filename [, update replace]
The merge command joins the observations of the dataset currently open in Stata (the master dataset) with the corresponding observations of another dataset named after the keyword using (the using dataset), producing a single dataset. The variables listed in varlist are called identifying variables, and both datasets must be sorted by them with sort before merge is run.
Example: we have the following two datasets. In dialy.dta, thanhthi is the urban indicator and vung the region.

thunhap.dta
ma_ho  thunhap  chitieu  quimo
  101      350      325      4
  102  1533.33      340      3
  103      600    337.5      4
  104     1350  416.667      6

dialy.dta
ma_ho  thanhthi  vung
  204         0     1
  102         1     4
  103         0     3
  104         0     6

The merge is carried out as follows:
. use "C:\dialy.dta", clear
. sort ma_ho
. save "C:\dialy.dta", replace
file C:\dialy.dta saved
. use "C:\thunhap.dta", clear
. sort ma_ho
. merge ma_ho using "C:\dialy.dta"
ma_ho was byte now int
. edit

The resulting dataset looks like this:

ma_ho  thunhap  chitieu  quimo  thanhthi  vung  _merge
  101      350      325      4         .     .       1
  102  1533.33      340      3         1     4       3
  103      600    337.5      4         0     3       3
  104     1350  416.667      6         0     6       3
  204        .        .      .         0     1       2
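The matching logic of this one-to-one merge can be sketched in Python with dictionaries keyed by ma_ho. This is an illustration under stated assumptions (a single key, no duplicate keys, None standing in for Stata's missing value "."); the function `merge_one_to_one` is hypothetical.

```python
master = {  # thunhap.dta, keyed by ma_ho
    101: {"thunhap": 350, "chitieu": 325, "quimo": 4},
    102: {"thunhap": 1533.33, "chitieu": 340, "quimo": 3},
    103: {"thunhap": 600, "chitieu": 337.5, "quimo": 4},
    104: {"thunhap": 1350, "chitieu": 416.667, "quimo": 6},
}
using = {  # dialy.dta, keyed by ma_ho
    204: {"thanhthi": 0, "vung": 1},
    102: {"thanhthi": 1, "vung": 4},
    103: {"thanhthi": 0, "vung": 3},
    104: {"thanhthi": 0, "vung": 6},
}

def merge_one_to_one(master, using):
    """Match observations on the key and record where each one came from.

    _merge is 1 (master only), 2 (using only) or 3 (both); variables absent
    on one side are None. Master values win for shared variables (no update/replace).
    """
    all_vars = sorted({v for row in master.values() for v in row} |
                      {v for row in using.values() for v in row})
    merged = {}
    for key in sorted(set(master) | set(using)):
        row = {v: None for v in all_vars}
        row.update(using.get(key, {}))
        row.update(master.get(key, {}))
        row["_merge"] = 1 * (key in master) + 2 * (key in using)
        merged[key] = row
    return merged

for key, row in merge_one_to_one(master, using).items():
    print(key, row["thunhap"], row["thanhthi"], row["vung"], row["_merge"])
```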

The resulting dataset contains an extra variable named _merge, which takes the following values:
_merge==1   the observation came from the master dataset only
_merge==2   the observation came from the using dataset only
_merge==3   the observation came from both the master and the using datasets
Options: when the two datasets share variables other than the identifying variables, the following options determine how their values are handled:
update      Missing values of a shared variable in the master dataset are filled with the values of that variable from the using dataset.
replace     Values of a shared variable in the master dataset are replaced by the nonmissing values of that variable from the using dataset; replace is used together with update.
If neither option is specified, the values of the master dataset are left unchanged by default.

Appending data: the append command
Syntax: append using filename
This command appends the dataset named after using to the dataset currently in memory, matching variables that have the same name. The number of observations in the new dataset equals the sum of the numbers of observations in the two datasets. Example: suppose the file thunhap2.dta contains:

ma_ho  thunhap  chitieu  gioitinh
  105     1350      425         1
  106     1500      370         0
  107      800      556         0
  108     1500      417         0
  109     2500      540         1

The two files are appended as follows:
. use "C:\thunhap.dta", clear
. append using "C:\thunhap2.dta"
. edit

The resulting dataset looks like this:

ma_ho  thunhap  chitieu  quimo  gioitinh
  101      350      325      4         .
  102  1533.33      340      3         .
  103      600    337.5      4         .
  104     1350  416.667      6         .
  105     1350      425      .         1
  106     1500      370      .         0
  107      800      556      .         0
  108     1500      417      .         0
  109     2500      540      .         1
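Appending stacks the observations and takes the union of the variables. A Python sketch with records as dictionaries, assuming variables are matched by name only and None stands in for "."; the function `append` is hypothetical.

```python
def append(master_rows, using_rows):
    """Stack two lists of records; variables absent from a file become None."""
    all_vars = []
    for row in master_rows + using_rows:
        for v in row:
            if v not in all_vars:              # keep first-seen variable order
                all_vars.append(v)
    return [{v: row.get(v) for v in all_vars} for row in master_rows + using_rows]

thunhap = [
    {"ma_ho": 101, "thunhap": 350, "chitieu": 325, "quimo": 4},
    {"ma_ho": 102, "thunhap": 1533.33, "chitieu": 340, "quimo": 3},
    {"ma_ho": 103, "thunhap": 600, "chitieu": 337.5, "quimo": 4},
    {"ma_ho": 104, "thunhap": 1350, "chitieu": 416.667, "quimo": 6},
]
thunhap2 = [
    {"ma_ho": 105, "thunhap": 1350, "chitieu": 425, "gioitinh": 1},
    {"ma_ho": 106, "thunhap": 1500, "chitieu": 370, "gioitinh": 0},
    {"ma_ho": 107, "thunhap": 800, "chitieu": 556, "gioitinh": 0},
    {"ma_ho": 108, "thunhap": 1500, "chitieu": 417, "gioitinh": 0},
    {"ma_ho": 109, "thunhap": 2500, "chitieu": 540, "gioitinh": 1},
]
combined = append(thunhap, thunhap2)
print(len(combined))   # 9
print(combined[4])     # {'ma_ho': 105, 'thunhap': 1350, 'chitieu': 425, 'quimo': None, 'gioitinh': 1}
```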

Note: see also the expand command, which creates duplicate observations.

4.8. Reshaping data

Syntax:
reshape wide stubnames, i(varlist) [j(varname)]
reshape long stubnames, i(varlist) [j(varname)]
reshape long converts a wide dataset (one observation per value of i, with one variable per value of j, such as thunhap95 thunhap96 thunhap97) into a long dataset (one observation per combination of i and j); reshape wide does the reverse.

... this dataset is converted to the long form as follows:

maho  quimo
 101      5
 101      5
 101      5
 102      4
 102      4
 102      4
 103      6
 103      6
 103      6

The reshape command is written as follows:
. reshape long thunhap, i(maho) j(nam)
(note: j = 95 96 97)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                        3   ->       9
Number of variables                   5   ->       4
j variable (3 values)                     ->   nam
xij variables:
             thunhap95 thunhap96 thunhap97   ->   thunhap
-----------------------------------------------------------------------------

* And convert back from long form to wide form as follows
. reshape wide thunhap, i(maho) j(nam)
(note: j = 95 96 97)

Data                               long   ->   wide
-----------------------------------------------------------------------------
Number of obs.                        9   ->       3
Number of variables                   4   ->       5
j variable (3 values)               nam   ->   (dropped)
xij variables:
                                thunhap   ->   thunhap95 thunhap96 thunhap97
-----------------------------------------------------------------------------

Example 2: we have data in the following wide table (sotien is the loan amount, nguon the lender, and Ngan hang means "bank"):

maho  sotien1  nguon1       sotien2  nguon2
 101     1200  Ngan hang A     2000  Ngan hang A
 102     1300  Ngan hang B        .
 103     2500  Ngan hang A     1000  Ngan hang C
 104     3000  Ngan hang A     2000  Ngan hang B

This table is converted to long form as follows:
. reshape long sotien nguon, i(maho) j(lanvay)
(note: j = 1 2)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                        4   ->       8
Number of variables                   5   ->       4
j variable (2 values)                     ->   lanvay
xij variables:
                        sotien1 sotien2   ->   sotien
                          nguon1 nguon2   ->   nguon
-----------------------------------------------------------------------------

The long table looks like this:

maho  lanvay  sotien  nguon
 101       1    1200  Ngan hang A
 101       2    2000  Ngan hang A
 102       1    1300  Ngan hang B
 102       2       .
 103       1    2500  Ngan hang A
 103       2    1000  Ngan hang C
 104       1    3000  Ngan hang A
 104       2    2000  Ngan hang B
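The wide-to-long step of Example 2 can be sketched in Python, assuming the j values are known in advance and each wide variable is named stub followed by j (None and "" stand in for Stata's numeric and string missing values). The function `reshape_long` is hypothetical, not a Stata or pandas API.

```python
def reshape_long(rows, i, stubs, j_values, j_name):
    """Wide -> long: one output row per (i, j), reading the stub+str(j) columns."""
    out = []
    for row in rows:
        for j in j_values:
            new = {i: row[i], j_name: j}
            for stub in stubs:
                new[stub] = row.get(f"{stub}{j}")
            out.append(new)
    return out

wide = [
    {"maho": 101, "sotien1": 1200, "nguon1": "Ngan hang A", "sotien2": 2000, "nguon2": "Ngan hang A"},
    {"maho": 102, "sotien1": 1300, "nguon1": "Ngan hang B", "sotien2": None, "nguon2": ""},
    {"maho": 103, "sotien1": 2500, "nguon1": "Ngan hang A", "sotien2": 1000, "nguon2": "Ngan hang C"},
    {"maho": 104, "sotien1": 3000, "nguon1": "Ngan hang A", "sotien2": 2000, "nguon2": "Ngan hang B"},
]
long = reshape_long(wide, "maho", ["sotien", "nguon"], [1, 2], "lanvay")
print(len(long))   # 8, as in "Number of obs. 4 -> 8"
print(long[3])     # {'maho': 102, 'lanvay': 2, 'sotien': None, 'nguon': ''}
```

Like reshape long, the sketch keeps the row for household 102's second loan even though its values are missing.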

5. Weights in VHLSS (Weight)
5.1. Weights in sample surveys
In a sample survey the observations are selected at random, but different observations usually have different probabilities of selection. The weight of an observation equals the inverse of its probability of being selected into the sample. If observation i has weight w_i, observation i in the sample can be said to represent w_i elements of the population. Estimates inferred for the population must take the sampling weights into account; otherwise the results will be biased.
Example: suppose the Red River Delta consists of two provinces, Hanoi and Nam Dinh, with populations of 4.5 million and 500 thousand people respectively. We want to draw a random sample of 500 observations to study the income of the Red River Delta as a whole and of each of the two provinces. If the sample were allocated in proportion to population, we would get 450 observations in Hanoi and 50 in Nam Dinh. However, because the sample is drawn at random over the whole region, we may end up with no observations from Nam Dinh, or only very few. To make the sample representative of each province, we should select 400 observations in Hanoi and 100 in Nam Dinh. If average income is 900 thousand dong per month in Hanoi and 300 thousand dong per month in Nam Dinh, the average income of the Red River Delta cannot be computed as (900 + 300)/2, because the observations were not selected in proportion to the provinces' populations. Each observation in Hanoi represents 11,250 people in the region (4,500,000/400); this is the weight of the observation, the inverse of its probability of selection. Each observation in Nam Dinh represents 5,000 people (500,000/100). The income of the Red River Delta is therefore computed as:

$$\text{Income} = \frac{900 \times 400 \times 11250 + 300 \times 100 \times 5000}{400 \times 11250 + 100 \times 5000} = 840$$
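The same weighted average in Python, building the 500 sample observations explicitly (a sketch; `weighted_mean` is a hypothetical helper, not a Stata command):

```python
def weighted_mean(values, weights):
    """Sum of w_i * y_i divided by the sum of w_i."""
    return sum(w * y for y, w in zip(values, weights)) / sum(weights)

# 400 sampled observations in Hanoi (income 900, weight 11,250 each)
# and 100 in Nam Dinh (income 300, weight 5,000 each)
income = [900] * 400 + [300] * 100
weight = [4_500_000 / 400] * 400 + [500_000 / 100] * 100

print(weighted_mean(income, weight))   # 840.0
print(sum(income) / len(income))       # 780.0, the unweighted sample mean
```

Ignoring the weights gives 780 rather than 840, because Nam Dinh is over-represented in the sample relative to its population.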

VLSS 1998 contains two weights. The first is the household weight, variable wt, which is the number of Vietnamese households that each sampled household represents. The second is the weight for household members, hhsizewt, which is the number of Vietnamese people represented by each sampled household through its members. The household-member weight equals the household weight multiplied by household size (hhsizewt = hhsize*wt).
Example: weights in VLSS 1998
. tab reg7, sum(wt)

  Code by 7 |   Summary of sample weight
    regions |        Mean   Std. Dev.       Freq.
------------+------------------------------------
    region1 |   3218.4296   850.74246         859
    region2 |   3133.7277   849.12325        1175
    region3 |   3185.1794   801.74266         708
    region4 |     2199.37   492.37202         754
    region5 |   1336.3098   269.14747         368
    region6 |   1963.8964   528.69328        1023
    region7 |   2938.2122   547.72125        1112
------------+------------------------------------
      Total |   2688.5003   900.01379        5999

. tab reg7, sum(hhsizewt)

  Code by 7 |     Summary of =hhsize*wt
    regions |        Mean   Std. Dev.       Freq.
------------+------------------------------------
    region1 |   15790.857   7555.7552         859
    region2 |   12656.003   5970.9089        1175
    region3 |   14814.504   7236.7592         708
    region4 |   10794.537    5235.562         754
    region5 |    7564.731   3185.9336         368
    region6 |   9447.7077   4535.0816        1023
    region7 |   14653.702   6639.8297        1112
------------+------------------------------------
      Total |   12636.546   6597.6574        5999

. di 2688.5003*5999
16128313

. di 12636.546*5999
75806639

Multiplying each mean weight by the 5,999 observations gives the sum of the weights: about 16.1 million households and 75.8 million people, the estimated totals for Vietnam.

5.2. Weight options
Stata allows four types of weights:
fweights   frequency weights: Stata interprets the weight as the number of times each observation is repeated in the calculations.
pweights   sampling weights: Stata interprets the weight as the inverse of the probability of being selected into the sample, that is, the number of population elements that each sample observation represents.
aweights   analytic weights: Stata interprets the weight as inversely proportional to the variance of the observation.
iweights   importance weights: the weight indicates the importance of each observation.

For living standards surveys, commands are used with pweights and fweights. Examples:
. sum poor

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
        poor |    5999     29.6216   45.66255          0        100

. sum poor [fw=hhsize]

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
        poor |   28509    34.17517   47.43051          0        100

With [fw=hhsize] each household is counted once per member, so Obs becomes the number of people in the sample (28,509) and the mean becomes the share of people, rather than households, below the poverty line.

. tab reg7 urban98

           | 1:urban 98; 0:rural
 Code by 7 |          98
   regions |     Rural      Urban |     Total
-----------+----------------------+----------
   region1 |       672        187 |       859
   region2 |       783        392 |      1175
   region3 |       600        108 |       708
   region4 |       502        252 |       754
   region5 |       368          0 |       368
   region6 |       514        509 |      1023
   region7 |       830        282 |      1112
-----------+----------------------+----------
     Total |      4269       1730 |      5999

. tab reg7 urban98 [fw= hhsizewt]

           | 1:urban 98; 0:rural
 Code by 7 |          98
   regions |     Rural      Urban |     Total
-----------+----------------------+----------
   region1 |  11993763    1570583 |  13564346
   region2 |  11057932    3812871 |  14870803
   region3 |   9582621     906048 |  10488669
   region4 |   5618709    2520372 |   8139081
   region5 |   2783821          0 |   2783821
   region6 |   4545303    5119702 |   9665005
   region7 |  13220727    3074190 |  16294917
-----------+----------------------+----------
     Total |  58802876   17003766 |  75806642

. tab reg7 urban98 , sum(hhsize) means

              Means of Household size

           | 1:urban 98; 0:rural
 Code by 7 |          98
   regions |     Rural      Urban |     Total
-----------+----------------------+----------
   region1 | 5.1205357  3.7326203 | 4.8183935
   region2 |  4.045977  4.0459184 | 4.0459574
   region3 | 4.6666667  4.6759259 | 4.6680791
   region4 | 4.8027888  5.1190476 | 4.9084881
   region5 | 5.7065217          . | 5.7065217
   region6 | 5.0719844  4.7131631 | 4.8934506
   region7 | 5.1373494  4.3971631 | 4.9496403
-----------+----------------------+----------
     Total | 4.8702272  4.4612717 |  4.752292

. tab reg7 urban98 [fw=wt], sum(hhsize) means

     Means and Number of Observations of Household size

           | 1:urban 98; 0:rural
 Code by 7 |          98
   regions |     Rural      Urban |     Total
-----------+----------------------+----------
   region1 | 5.1328749  3.6698008 | 4.9063857
           |   2336656     427975 |   2764631
-----------+----------------------+----------
   region2 | 4.0564115   3.987975 | 4.0386415
           |   2726038     956092 |   3682130
-----------+----------------------+----------
   region3 | 4.6508908  4.6530097 | 4.6510738
           |   2060384     194723 |   2255107
-----------+----------------------+----------
   region4 | 4.8136253   5.132367 | 4.9080132
           |   1167251     491074 |   1658325
-----------+----------------------+----------
   region5 | 5.6609112          . | 5.6609112
           |    491762          0 |    491762
-----------+----------------------+----------
   region6 | 5.0486426  4.6174858 | 4.8106956
           |    900302    1108764 |   2009066
-----------+----------------------+----------
   region7 | 5.1494132  4.3925283 | 4.9872852
           |   2567424     699868 |   3267292
-----------+----------------------+----------
     Total | 4.8003065  4.3841133 | 4.7002214
           |  12249817    3878496 |  16128313


. table reg7 urban98 , c(mean poor) col row format(%4.1f)

------------------------------
          | 1:urban 98; 0:rural
Code by 7 |         98
regions   |  Rural  Urban  Total
----------+-------------------
  region1 |   61.5    8.0   49.8
  region2 |   32.6    5.9   23.7
  region3 |   44.8   10.2   39.5
  region4 |   37.3   11.5   28.6
  region5 |   47.3          47.3
  region6 |   12.5    2.2    7.3
  region7 |   35.8   10.3   29.3
          |
    Total |   38.9    6.8   29.6
------------------------------

. table reg7 urban98 [pw=hhsizewt], c(mean poor) col row format(%4.1f)

------------------------------
          | 1:urban 98; 0:rural
Code by 7 |         98
regions   |  Rural  Urban  Total
----------+-------------------
  region1 |   65.2    8.3   58.6
  region2 |   36.1    7.0   28.7
  region3 |   51.3   14.3   48.1
  region4 |   43.6   16.6   35.2
  region5 |   52.4          52.4
  region6 |   13.0    2.9    7.6
  region7 |   42.0   15.3   36.9
          |
    Total |   45.5    9.2   37.4
------------------------------

Comparing the two tables shows how far the unweighted sample means are from the population estimates obtained with the weight hhsizewt (for example 29.6 versus 37.4 percent poor overall).
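Why [fw=hhsize] turns a household mean into a per-person mean can be checked with a tiny Python sketch on hypothetical data: the frequency-weighted mean equals the plain mean of the data expanded to one row per person. With pweights the point estimate uses the same weighted-mean formula; what differs is how standard errors are computed.

```python
def fw_mean(values, fweights):
    """Mean with integer frequency weights, like summarize ... [fw=w]."""
    return sum(f * y for y, f in zip(values, fweights)) / sum(fweights)

poor = [100, 0, 100, 0]      # hypothetical households: poverty indicator (0/100 scale)
hhsize = [5, 2, 3, 4]        # hypothetical household sizes

# one row per person, as if each household were repeated hhsize times
expanded = [y for y, f in zip(poor, hhsize) for _ in range(f)]

print(fw_mean(poor, hhsize))              # share of people who are poor
print(sum(expanded) / len(expanded))      # the same number from the expanded data
print(sum(poor) / len(poor))              # 50.0, share of households who are poor
```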

Chapter III: Hypothesis testing and regression analysis

1. Estimation and hypothesis testing
1.1. Estimating a mean with a confidence interval
Syntax: ci [varlist] [weight] [if exp] [in range] [, level(#) binomial poisson exposure(varname) total]
This command computes standard errors and confidence intervals for sample means under the normal, binomial and Poisson distributions.
Options:
level(#)            sets the confidence level of the confidence interval; # takes values from 10 to 99, and the default is 95.
binomial            applies the binomial distribution.
poisson             applies the Poisson distribution.
exposure(varname)   applies the Poisson distribution; varname is the exposure variable (usually time or area) over which the events counted by varlist occur.
total               used with the by prefix; additionally requests a confidence interval for the whole sample.

Example:
. ci poor

    Variable |     Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+-------------------------------------------------------------
        poor |    5999     29.6216    .5895501        28.46587    30.77733

. sort reg7

. by reg7: ci poor, total

_______________________________________________________________________________
-> reg7 = region1
    Variable |     Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+-------------------------------------------------------------
        poor |     859    49.82538    1.706961        46.47507    53.17569
_______________________________________________________________________________
-> reg7 = region2
    Variable |     Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+-------------------------------------------------------------
        poor |    1175    23.65957    1.240357        21.22601    26.09314
_______________________________________________________________________________
-> reg7 = region3
    Variable |     Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+-------------------------------------------------------------
        poor |     708    39.54802    1.838899        35.93767    43.15838
_______________________________________________________________________________
-> reg7 = region4
    Variable |     Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+-------------------------------------------------------------
        poor |     754    28.64721     1.64759         25.4128    31.88163
_______________________________________________________________________________
-> reg7 = region5
    Variable |     Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+-------------------------------------------------------------
        poor |     368    47.28261    2.606121         42.1578    52.40741
_______________________________________________________________________________
-> reg7 = region6
    Variable |     Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+-------------------------------------------------------------
        poor |    1023    7.331378    .8153306        5.731465    8.931292
_______________________________________________________________________________
-> reg7 = region7
    Variable |     Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+-------------------------------------------------------------
        poor |    1112    29.31655    1.365709        26.63689    31.99621
_______________________________________________________________________________
-> Total
    Variable |     Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+-------------------------------------------------------------
        poor |    5999     29.6216    .5895501        28.46587    30.77733
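The numbers in the ci output can be reproduced approximately from summary statistics. This Python sketch uses the normal quantile instead of the t quantile that Stata's ci uses, which is an approximation that matters little with n in the thousands; `mean_ci` is a hypothetical helper.

```python
import math
from statistics import NormalDist

def mean_ci(mean, sd, n, level=95):
    """Standard error and normal-approximation confidence interval for a mean."""
    se = sd / math.sqrt(n)
    z = NormalDist().inv_cdf(0.5 + level / 200)   # 1.96 for level=95
    return se, mean - z * se, mean + z * se

# poor (0/100 scale): n = 5999, mean = 29.6216, sd = 45.66255 (from summarize)
se, lo, hi = mean_ci(29.6216, 45.66255, 5999)
print(round(se, 4), round(lo, 2), round(hi, 2))   # 0.5896 28.47 30.78
```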

Note: the estimation commands can also be used when only the sample parameters are known. These are called immediate commands (commands using immediate arguments), and they are very useful when we do not have the original data for the variable.
cii #obs #mean #sd [, level(#)]               (normal distribution)
cii #obs #succ [, level(#)]                   (binomial distribution)
where #obs is the number of observations and #succ is the number of times the variable takes the value corresponding to a success (usually the value 1).
cii #exposure #events , poisson [level(#)]    (Poisson distribution)
Examples:

. cii 5999 1777, level (90)

                                                         -- Binomial Exact --
    Variable |        Obs        Mean    Std. Err.       [90% Conf. Interval]
-------------+---------------------------------------------------------------
             |       5999     .296216     .005895        .2865107    .3060676

. cii 12 27, poisson

                                                         -- Poisson  Exact --
    Variable |   Exposure        Mean    Std. Err.       [95% Conf. Interval]
-------------+---------------------------------------------------------------
             |         12        2.25    .4330127        1.483144    3.273587
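The Mean and Std. Err. columns of these two immediate commands follow from simple formulas, sketched here in Python. The exact confidence limits need beta and gamma quantiles, which the standard library does not provide, so they are not reproduced; the helper names are hypothetical.

```python
import math

def binomial_mean_se(n_obs, n_succ):
    """Proportion of successes and its standard error sqrt(p(1-p)/n)."""
    p = n_succ / n_obs
    return p, math.sqrt(p * (1 - p) / n_obs)

def poisson_rate_se(exposure, events):
    """Events per unit of exposure and its standard error sqrt(events)/exposure."""
    return events / exposure, math.sqrt(events) / exposure

print(binomial_mean_se(5999, 1777))   # about (.296216, .005895)
print(poisson_rate_se(12, 27))        # about (2.25, .4330127)
```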

1.2. Statistical hypothesis testing
1.2.1. Testing the mean of a sample
Zero-one (Bernoulli) distribution
Syntax: prtest varname = #p [if exp] [in range] [, level(#)]
This command tests a hypothesis about the proportion of a variable that follows the zero-one (Bernoulli) distribution (Ho: p = p0). Example:
. prtest poor=0.44 if reg7==1

One-sample test of proportion                  poor: Number of obs =      859
------------------------------------------------------------------------------
Variable |       Mean   Std. Err.       z     P>|z|      [95% Conf. Interval]
---------+--------------------------------------------------------------------
    poor |   .4982538   .0170597   29.2065   0.0000      .4648174    .5316901
------------------------------------------------------------------------------
                       Ho: proportion(poor) = .44

    Ha: poor < .44             Ha: poor ~= .44             Ha: poor > .44
     z =   3.440                z =   3.440                z =   3.440
   P < z = 0.9997          P > |z| = 0.0006            P > z = 0.0003
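The z statistic under "Ho: proportion(poor) = .44" uses the standard error implied by the null value p0. A Python sketch, where 428 poor households out of 859 is the count recovered from the reported proportion (the bitest output below shows the same k = 428); `prop_ztest` is a hypothetical helper.

```python
import math
from statistics import NormalDist

def prop_ztest(p_hat, n, p0):
    """One-sample z test of Ho: p = p0, using the null standard error sqrt(p0(1-p0)/n)."""
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided

z, p = prop_ztest(428 / 859, 859, 0.44)
print(round(z, 3), round(p, 4))   # 3.44 0.0006
```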

prtest varname1 = varname2 [if exp] [in range] [, level(#)]
This command tests the hypothesis that the proportions of the two variables varname1 and varname2 are equal (Ho: pX = pY). Example: test whether the poverty rate differs between region 2 and region 4:
. gen poor2=poor if reg7==2
(4824 missing values generated)

. gen poor4=poor if reg7==4
(5245 missing values generated)

. prtest poor2 = poor4

Two-sample test of proportion                 poor2: Number of obs =     1175
                                              poor4: Number of obs =      754
------------------------------------------------------------------------------
Variable |       Mean   Std. Err.       z     P>|z|      [95% Conf. Interval]
---------+--------------------------------------------------------------------
   poor2 |   .2365957   .0123983   19.0829   0.0000      .2122955    .2608959
   poor4 |   .2864721    .016465   17.3989   0.0000      .2542014    .3187429
---------+--------------------------------------------------------------------
    diff |  -.0498764    .020611                         -.0902732   -.0094796
         |  under Ho:   .0203666  -2.44893   0.0143
------------------------------------------------------------------------------
           Ho: proportion(poor2) - proportion(poor4) = diff = 0

     Ha: diff < 0               Ha: diff ~= 0               Ha: diff > 0
     z =  -2.449                z =  -2.449                z =  -2.449
   P < z = 0.0072          P > |z| = 0.0143            P > z = 0.9928

prtest varname [if exp] [in range], by(groupvar) [level(#)]
This command tests the hypothesis that the proportions of the two groups defined by the grouping variable groupvar are equal (Ho: pX1 = pX2). Example:
. prtest poor, by(sex)

Two-sample test of proportion                     1: Number of obs =     4375
                                                  2: Number of obs =     1624
------------------------------------------------------------------------------
Variable |       Mean   Std. Err.       z     P>|z|      [95% Conf. Interval]
---------+--------------------------------------------------------------------
       1 |      .3248     .00708   45.8755   0.0000      .3109234    .3386766
       2 |   .2192118   .0102661    21.353   0.0000      .1990906     .239333
---------+--------------------------------------------------------------------
    diff |   .1055882   .0124708                          .0811459    .1300304
         |  under Ho:   .0132673   7.95855   0.0000
------------------------------------------------------------------------------
                 Ho: proportion(1) - proportion(2) = diff = 0

     Ha: diff < 0               Ha: diff ~= 0               Ha: diff > 0
     z =   7.959                z =   7.959                z =   7.959
   P < z = 1.0000          P > |z| = 0.0000            P > z = 0.0000
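Both two-sample tests above compute z from the pooled proportion (the "under Ho" standard error). A Python sketch; the counts of poor households (278 of 1175, 216 of 754, 1421 of 4375, 356 of 1624) are recovered by multiplying the reported proportions by the numbers of observations, and `two_prop_ztest` is a hypothetical helper.

```python
import math
from statistics import NormalDist

def two_prop_ztest(x1, n1, x2, n2):
    """z test of Ho: p1 = p2, with the standard error based on the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se0 = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se0
    return p1 - p2, z, 2 * (1 - NormalDist().cdf(abs(z)))

print(two_prop_ztest(278, 1175, 216, 754))    # region 2 vs region 4: z about -2.449
print(two_prop_ztest(1421, 4375, 356, 1624))  # sex 1 vs sex 2: z about 7.959
```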

Binomial distribution
Syntax: bitest varname = #p [weight] [if exp] [in range]
This command tests a hypothesis about the parameter p of the binomial distribution (the probability of success in a trial) of the variable varname (Ho: p = p0). Example:
. bitest poor=0.44 if reg7==1

    Variable |        N   Observed k   Expected k   Assumed p   Observed p
-------------+------------------------------------------------------------
        poor |      859        428       377.96      0.44000      0.49825

  Pr(k >= 428)  = 0.000344
  Pr(k … = 428) = 0.000344
  Pr(k … |t| = 0.7444
  Ha: mean > 3200    t = -0.3260    P > t = 0.6278
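The one-sided probability Pr(k >= 428) is an exact binomial tail. A Python sketch that sums it with exact integer arithmetic, writing p = 0.44 as 11/25 so no precision is lost; `binom_upper_tail` is a hypothetical helper.

```python
from math import comb

def binom_upper_tail(n, k, num, den):
    """Exact Pr(K >= k) for K ~ Binomial(n, num/den), using integer arithmetic."""
    total = sum(comb(n, i) * num**i * (den - num)**(n - i) for i in range(k, n + 1))
    return total / den**n          # int / int gives a correctly rounded float

# poor in region 1: n = 859 households, k = 428 poor, Ho: p = 0.44 = 11/25
print(binom_upper_tail(859, 428, 11, 25))   # about 0.000344
```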

Syntax: ttest varname1 = varname2 [if exp] [in range] [, unpaired unequal level(#)]
This command tests the hypothesis that two variables have equal means (Ho: μX = μY).
Options:
unpaired   the data of the two variables are not paired
unequal    the variances of the two variables are not equal
Example:
. ttest poor2=poor4, unpaired unequal

Two-sample t test with unequal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
   poor2 |    1175    .2365957    .0124036     .425173    .2122601    .2609314
   poor4 |     754    .2864721    .0164759    .4524128     .254128    .3188163
---------+--------------------------------------------------------------------
combined |    1929    .2560912    .0099404     .436586    .2365962    .2755863
---------+--------------------------------------------------------------------
    diff |           -.0498764    .0206229               -.0903285   -.0094243
------------------------------------------------------------------------------
Satterthwaite's degrees of freedom:  1532.64

                 Ho: mean(poor2) - mean(poor4) = diff = 0

     Ha: diff < 0               Ha: diff ~= 0              Ha: diff > 0
       t =  -2.4185                t =  -2.4185              t =  -2.4185
   P < t =   0.0079          P > |t| =   0.0157          P > t =   0.9921
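With the unequal option the t statistic uses separate variances and Satterthwaite's approximate degrees of freedom. A Python sketch that starts from the means, standard deviations and numbers of observations in the output above; `welch_t` is a hypothetical helper.

```python
import math

def welch_t(m1, s1, n1, m2, s2, n2):
    """Two-sample t statistic with unequal variances and Satterthwaite's df."""
    a, b = s1**2 / n1, s2**2 / n2          # squared standard errors of each mean
    t = (m1 - m2) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))
    return t, df

t, df = welch_t(.2365957, .425173, 1175, .2864721, .4524128, 754)
print(round(t, 4), round(df, 2))   # about -2.4185 and 1532.6
```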

ttest varname [if exp] [in range], by(groupvar) [unequal level(#)]
This command tests the hypothesis that the means of the two groups defined by the grouping variable groupvar are equal (Ho: μ1 = μ2). Example:
. ttest rlpcex1, by(sex)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       1 |    4375    2980.906    36.74795    2430.648    2908.862    3052.951
       2 |    1624    3748.368    80.18189    3231.241    3591.097    3905.638
---------+--------------------------------------------------------------------
combined |    5999    3188.667    34.76379    2692.567    3120.518    3256.817
---------+--------------------------------------------------------------------
    diff |           -767.4613     77.6155               -919.6156   -615.3071
------------------------------------------------------------------------------
Degrees of freedom: 5997

                   Ho: mean(1) - mean(2) = diff = 0

     Ha: diff < 0               Ha: diff ~= 0              Ha: diff > 0
       t =  -9.8880                t =  -9.8880              t =  -9.8880
   P < t =   0.0000          P > |t| =   0.0000          P > t =   1.0000
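Without the unequal option, ttest pools the two group variances. A Python sketch of that pooled-variance statistic from the group summaries above; `pooled_t` is a hypothetical helper.

```python
import math

def pooled_t(m1, s1, n1, m2, s2, n2):
    """Two-sample t statistic assuming equal variances (pooled variance)."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, se, n1 + n2 - 2

t, se, df = pooled_t(2980.906, 2430.648, 4375, 3748.368, 3231.241, 1624)
print(round(t, 3), round(se, 2), df)   # about -9.888, 77.62 and 5997
```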

1.2.2. Testing the standard deviation
Syntax:
sdtest varname = # [if exp] [in range] [, level(#)]
sdtest varname1 = varname2 [if exp] [in range] [, level(#)]
sdtest varname [if exp] [in range], by(groupvar) [level(#)]
This command tests the standard deviation of a normally distributed random variable given by varname. Its syntax is similar to that of ttest. Example:
. sum rlpcex1

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
     rlpcex1 |    5999    3188.667   2692.567    357.318   45801.71

. sdtest rlpcex1=2700

One-sample test of variance
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
 rlpcex1 |    5999    3188.667    34.76379    2692.567    3120.518    3256.817
------------------------------------------------------------------------------
                Ho: sd(rlpcex1) = 2700
                                              chi2(5998) =  5965.022
  Ha: sd(rlpcex1) < 2700     Ha: sd(rlpcex1) ~= 2700     Ha: sd(rlpcex1) > 2700
     P < chi2 = 0.3838       2*(P < chi2) = 0.7676          P > chi2 = 0.6162
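The one-sample variance test compares (n-1)s²/σ0² with a chi-square distribution on n-1 degrees of freedom. A Python sketch of the statistic only (the chi-square p-values need a distribution function that the standard library lacks); `sd_chi2` is a hypothetical helper.

```python
def sd_chi2(sd, n, sd0):
    """Chi-square statistic (n-1) s^2 / sd0^2 for Ho: sd = sd0, with n-1 df."""
    return (n - 1) * sd**2 / sd0**2, n - 1

chi2, df = sd_chi2(2692.567, 5999, 2700)
print(round(chi2, 2), df)   # about 5965.02 with 5998 df
```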

2. Correlation and regression analysis

2.1. Correlation analysis
Syntax:

correlate [varlist] [weight] [if exp] [in range] [, means covariance _coef wrap]
This command computes the matrix of correlation coefficients, or of covariances, for the variables listed in varlist. Only observations with nonmissing values on all listed variables are used, so the number of observations can be no larger than that of the variable with the fewest observations.
Options:
means        also displays summary statistics such as the mean, standard deviation, minimum and maximum
covariance   displays the covariance matrix instead of the correlation coefficients
_coef        displays the correlation matrix of the coefficients of the most recent estimation
wrap         displays each row of the matrix in one piece, even when many variables are listed
Examples:
. corr hhsize poor rlpcex1 sex
(obs=5999)

             |   hhsize     poor  rlpcex1      sex
-------------+------------------------------------
      hhsize |   1.0000
        poor |   0.2425   1.0000
     rlpcex1 |  -0.2172  -0.4452   1.0000
         sex |  -0.2570  -0.1028   0.1267   1.0000

. corr hhsize poor rlpcex1 sex, means cov
(obs=5999)

    Variable |      Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------
      hhsize |  4.752292   1.954292          1         19
        poor |   .296216   .4566255          0          1
     rlpcex1 |  3188.667   2692.567    357.318   45801.71
         sex |  1.270712   .4443645          1          2

             |   hhsize     poor  rlpcex1      sex
-------------+------------------------------------
      hhsize |  3.81926
        poor |  .216435  .208507
     rlpcex1 | -1142.93 -547.335  7.2e+06
         sex | -.223195 -.020849  151.543   .19746
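The two matrices are linked by r = cov(x, y) / (sd_x · sd_y). A Python check using the hhsize and poor entries of the covariance matrix; `corr_from_cov` is a hypothetical helper.

```python
import math

def corr_from_cov(cov_xy, var_x, var_y):
    """Correlation coefficient r = cov(x, y) / sqrt(var(x) * var(y))."""
    return cov_xy / math.sqrt(var_x * var_y)

# covariance of hhsize and poor, and their variances (diagonal entries)
r = corr_from_cov(.216435, 3.81926, .208507)
print(round(r, 4))   # 0.2425, the same value as in the correlation matrix
```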


pwcorr [varlist] [weight] [if exp] [in range] [, obs sig print(#) star(#)]
This command computes the correlation coefficient for each pair of variables in varlist.
Options:
obs        displays the number of observations used to compute each correlation coefficient
sig        displays the significance level of each correlation coefficient
print(#)   displays only the correlation coefficients that are significant at level # or better
star(#)    marks with a star the correlation coefficients that are significant at level # or better
Example:
. pwcorr hhsize poor rlpcex1 sex, obs sig star(5)

             |   hhsize     poor  rlpcex1      sex
-------------+------------------------------------
      hhsize |   1.0000
             |
             |     5999
             |
        poor |  0.2425*  1.0000
             |   0.0000
             |     5999     5999
             |
     rlpcex1 | -0.2172* -0.4452*  1.0000
             |   0.0000   0.0000
             |     5999     5999     5999
             |
         sex | -0.2570* -0.1028*  0.1267*  1.0000
             |   0.0000   0.0000   0.0000
             |     5999     5999     5999     5999
             |

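pcorr varname varlist [weight] [if exp] [in range]
This command computes the partial correlation coefficients between the variable varname and each variable in varlist, holding the other variables in varlist constant. Example (the log below was produced with pwcorr, so it shows ordinary pairwise correlations, with poor listed first):
. pwcorr poor hhsize rlpcex1 sex

             |     poor   hhsize  rlpcex1      sex
-------------+------------------------------------
        poor |   1.0000
      hhsize |   0.2425   1.0000
     rlpcex1 |  -0.4452  -0.2172   1.0000
         sex |  -0.1028  -0.2570   0.1267   1.0000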
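The idea of a partial correlation can be illustrated with the first-order formula, which controls for a single variable z. This Python sketch is only an illustration: pcorr itself controls for all the other variables in varlist at once, so its results for poor hhsize rlpcex1 sex would differ. `partial_corr` is a hypothetical helper.

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for one variable z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# x = poor, y = rlpcex1, z = hhsize, using the pairwise correlations above
r = partial_corr(-0.4452, 0.2425, -0.2172)
print(round(r, 4))   # about -0.41: slightly weaker once household size is held fixed
```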

2.2. Regression analysis
Ordinary least squares (OLS)
Syntax: regress depvar [varlist] [weight] [if exp] [in range] [, options]
This command estimates, by ordinary least squares, the coefficients of a function relating the dependent variable depvar to the independent variables in varlist. Example:
. reg rlpcex1 reg7 sex hhsize

      Source |       SS       df       MS              Number of obs =    5999
-------------+------------------------------           F(  3,  5995) =  194.88
       Model |  3.8639e+09     3  1.2880e+09           Prob > F      =  0.0000
    Residual |  3.9621e+10  5995  6609032.15           R-squared     =  0.0889
-------------+------------------------------           Adj R-squared =  0.0884
       Total |  4.3485e+10  5998  7249918.40           Root MSE      =  2570.8

------------------------------------------------------------------------------
     rlpcex1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        reg7 |   240.9633    15.5905    15.46   0.000     210.4003    271.5263
         sex |   403.2984   77.38324     5.21   0.000     251.5994    554.9974
      hhsize |  -305.6382   17.70692   -17.26   0.000    -340.3501   -270.9263
       _cons |   3160.201   155.6576    20.30   0.000     2855.056    3465.346
------------------------------------------------------------------------------

Options:
level(#)     sets the confidence level for the confidence intervals of the coefficients
noconstant   no constant term (intercept) in the regression function
noheader     displays only the table of coefficients
beta         displays standardized coefficients, used to compare the relative effects of the independent variables
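For a single regressor, least squares has a closed form that shows what regress computes. This Python sketch uses hypothetical data (not VLSS values); regress solves the matrix version of the same problem when there are several regressors, as in the example above.

```python
def simple_ols(x, y):
    """Intercept and slope of y = b0 + b1*x by least squares, plus R-squared."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    ss_res = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))   # Residual SS
    ss_tot = sum((yi - my) ** 2 for yi in y)                          # Total SS
    return b0, b1, 1 - ss_res / ss_tot

# hypothetical data: household size and per-capita expenditure
hhsize = [2, 3, 4, 5, 6]
pcexp = [4200, 3900, 3300, 3100, 2500]
print(simple_ols(hhsize, pcexp))   # (5080.0, -420.0, 0.98)
```

R-squared here is 1 - Residual SS / Total SS, the same definition that gives 0.0889 in the regress output.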

Ph-ng php kh nng ln nht (Maximum-Likelihood) C php: probit [danh sch bin] [quyn s] [iu kin] [phm vi] [, tu chn]57

This command regresses the dependent variable on the variables in the variable list by maximum likelihood. The dependent variable is usually a dummy variable taking the values 0 and 1. Example:

. probit poor reg7 sex hhsize

Iteration 0:   log likelihood = -3645.1363
Iteration 1:   log likelihood = -3367.2185
Iteration 2:   log likelihood = -3364.8032
Iteration 3:   log likelihood = -3364.8025

Probit estimates                                  Number of obs   =       5999
                                                  LR chi2(3)      =     560.67
                                                  Prob > chi2     =     0.0000
Log likelihood = -3364.8025                       Pseudo R2       =     0.0769

------------------------------------------------------------------------------
        poor |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        reg7 |   -.116342   .0084551   -13.76   0.000    -.1329136   -.0997703
         sex |  -.1284525   .0422247    -3.04   0.002    -.2112113   -.0456937
      hhsize |   .1808115   .0095806    18.87   0.000     .1620338    .1995892
       _cons |  -.8088731   .0824798    -9.81   0.000    -.9705306   -.6472157
------------------------------------------------------------------------------
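The LR chi2 and Pseudo R2 in the probit header are simple functions of two log likelihoods: that of the fitted model and that of a model with only a constant. A sketch, assuming Stata 8 or later and the auto data (where foreign is a 0/1 variable like poor), that recomputes both from the stored results and then repeats the arithmetic with the figures printed above:

```stata
* Sketch only: auto data stand in for the VLSS file (sysuse needs Stata 8+);
* foreign is a 0/1 variable there, like poor
sysuse auto, clear
probit foreign mpg weight

* e(ll) = log likelihood of the model, e(ll_0) = constant-only model
display "LR chi2   = " 2*(e(ll) - e(ll_0))      // matches e(chi2)
display "Pseudo R2 = " 1 - e(ll)/e(ll_0)        // matches e(r2_p)

* The same arithmetic with the log likelihoods printed above
display 2*(-3364.8025 - (-3645.1363))   // about 560.67
display 1 - (-3364.8025)/(-3645.1363)   // about 0.0769

* Predicted probability that foreign = 1 (the default after probit)
predict phat, pr
summarize phat
```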

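Predicting the dependent variable and the residuals
Syntax: predict newvar [if] [in] [, xb stdp resid]
This command is run after regress (or probit) and creates a new variable whose values are computed according to the option given (after probit the default is the predicted probability, option pr). Options:
xb      the linear prediction (fitted value) of the dependent variable from the estimated regression: $\hat{Y}_i = \hat\beta_0 + \hat\beta_1 X_i$
stdp    the standard error of the prediction: $SE_i^2 = \widehat{\mathrm{Var}}(\hat\beta_0) + X_i^2\,\widehat{\mathrm{Var}}(\hat\beta_1) + 2X_i\,\widehat{\mathrm{Cov}}(\hat\beta_0, \hat\beta_1)$
resid   the residual (after regress): $e_i = Y_i - \hat{Y}_i$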
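To check the three formulas above, the sketch below (assuming Stata 8 or later and the auto data with a single regressor) lets predict create the fitted values, their standard errors and the residuals, then rebuilds each one from _b[] and the covariance matrix e(V):

```stata
* Sketch only: auto data stand in for the VLSS file (sysuse needs Stata 8+)
sysuse auto, clear
regress price weight            // one regressor: Y = b0 + b1*X

predict yhat, xb                // fitted values
predict se_yhat, stdp           // standard error of the fitted values
predict ehat, resid             // residuals

* e(V) rows/columns are (weight, _cons): [1,1] = Var(b1), [2,2] = Var(b0),
* [1,2] = Cov(b1, b0)
matrix V = e(V)
generate double yhat_chk = _b[_cons] + _b[weight]*weight
generate double se_chk = sqrt(el(V,2,2) + weight^2*el(V,1,1) + 2*weight*el(V,1,2))
generate double e_chk = price - yhat_chk

summarize yhat yhat_chk se_yhat se_chk ehat e_chk
```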

Examples:
predict exphat, xb      creates a new variable exphat holding the fitted values of the dependent variable, computed from the estimated regression coefficients.

predict expres, resid   creates a variable expres holding the residuals.

Testing the regression coefficients
Syntax:
test exp = exp
test varlist
testparm varlist [, equal]
The test command tests hypotheses about the coefficients of the regression that has just been estimated. Examples:
test urban98 = 2000                   tests the hypothesis that the coefficient of urban98 equals 2000
test region1 = region2                tests that the coefficient of region1 equals the coefficient of region2
test region1 = (region2+region3)/2    tests a relationship among the coefficients of region1, region2 and region3 (the coefficient of region1 equals the average of the other two)
test region1 region2 region3          tests that the coefficients of region1, region2 and region3 are all 0
testparm region*                      tests that the coefficients of region1 through region7 are all 0
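For a single restriction, the F statistic that test reports is the square of a t statistic. A minimal sketch, assuming Stata 8 or later and the auto data, that checks this and then recomputes the urban98 = 2000 result shown in the output further below from its printed coefficient and standard error:

```stata
* Sketch only: auto data stand in for the VLSS file (sysuse needs Stata 8+)
sysuse auto, clear
regress price mpg weight

* H0: coefficient of weight = 0; with one restriction, F equals t squared
test weight = 0
display "F = " r(F) "   t^2 = " (_b[weight]/_se[weight])^2

* H0: coefficient of weight = 2, a non-zero value as in test urban98 = 2000
test weight = 2
display "F = " r(F) "   by hand = " ((_b[weight] - 2)/_se[weight])^2

* F for urban98 = 2000, from the coefficient and std. error printed in the example
display ((1995.163 - 2000)/66.46943)^2    // about 0.0053, printed as 0.01
```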

The region dummies used in these tests are created from reg7 with the gen() option of tabulate, and then the regression is run:

. tab reg7, gen(region)

  Code by 7 |
    regions |      Freq.     Percent        Cum.
------------+-----------------------------------
    region1 |        859       14.32       14.32
    region2 |       1175       19.59       33.91
    region3 |        708       11.80       45.71
    region4 |        754       12.57       58.28
    region5 |        368        6.13       64.41
    region6 |       1023       17.05       81.46
    region7 |       1112       18.54      100.00
------------+-----------------------------------
      Total |       5999      100.00

. reg rlpcex1 urban98 region* sex educyr98 hhsize

      Source |       SS       df       MS              Number of obs =    5999
-------------+------------------------------           F( 10,  5988) =  382.87
       Model |  1.6960e+10    10  1.6960e+09           Prob > F      =  0.0000
    Residual |  2.6525e+10  5988  4429712.49           R-squared     =  0.3900
-------------+------------------------------           Adj R-squared =  0.3890
       Total |  4.3485e+10  5998  7249918.40           Root MSE      =  2104.7

------------------------------------------------------------------------------
     rlpcex1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     urban98 |   1995.163   66.46943    30.02   0.000     1864.859    2125.467
     region1 |  -923.7066   132.8334    -6.95   0.000    -1184.108   -663.3052
     region2 |  -362.6047   130.2254    -2.78   0.005    -617.8934    -107.316
     region3 |  -558.0354   137.1551    -4.07   0.000    -826.9089   -289.1619
     region4 |  -100.7586   135.8372    -0.74   0.458    -367.0486    165.5313
     region5 |  (dropped)
     region6 |   1742.688   131.9928    13.20   0.000     1483.934    2001.441
     region7 |   151.9854   128.0272     1.19   0.235    -98.99396    402.9648
         sex |   270.9142   66.61031     4.07   0.000     140.3339    401.4944
    educyr98 |   153.3281   6.836934    22.43   0.000     139.9253     166.731
      hhsize |   -257.691   14.73741   -17.49   0.000    -286.5816   -228.8004
       _cons |   2362.355   178.3197    13.25   0.000     2012.784    2711.926
------------------------------------------------------------------------------

region5 is dropped because the seven region dummies sum to 1 for every household and are therefore perfectly collinear with the constant; the other region coefficients measure differences from region5.

. test urban98 = 2000

 ( 1)  urban98 = 2000.0

       F(  1,  5988) =    0.01
            Prob > F =    0.9420

. test region1 = region2

 ( 1)  region1 - region2 = 0.0

       F(  1,  5988) =   34.57
            Prob > F =    0.0000

. test region1 = (region2+region3)/2

 ( 1)  region1 - .5 region2 - .5 region3 = 0.0

       F(  1,  5988) =   27.80
            Prob > F =    0.0000

. test region1 region2 region3

 ( 1)  region1 = 0.0
 ( 2)  region2 = 0.0
 ( 3)  region3 = 0.0

       F(  3,  5988) =   20.22
            Prob > F =    0.0000

. testparm region*

 ( 1)  region1 = 0.0
 ( 2)  region2 = 0.0
 ( 3)  region3 = 0.0
 ( 4)  region4 = 0.0
 ( 5)  region5 = 0.0
 ( 6)  region6 = 0.0
 ( 7)  region7 = 0.0
       Constraint 5 dropped

       F(  6,  5988) =  148.55
            Prob > F =    0.0000
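The regression above reports region5 as dropped, and testparm drops the matching constraint. A minimal sketch of why, using a hypothetical three-group variable grp made up on the auto data purely for illustration (Stata 8 or later assumed): dummies for every group add up to 1, so together they repeat the constant and one of them has to go.

```stata
* Sketch only: grp is a hypothetical 3-group variable made up on the auto data
sysuse auto, clear
generate byte grp = mod(_n, 3) + 1
tabulate grp, generate(grpd)            // dummies grpd1 grpd2 grpd3

* Every row has exactly one dummy equal to 1, so the dummies sum to 1,
* which is the same column as the constant
egen byte dsum = rowtotal(grpd*)
assert dsum == 1

* Stata keeps the constant and drops one dummy (shown as "dropped" in
* Stata 7 and as "omitted" in recent versions)
regress price mpg grpd*

* Joint test of all dummies: the constraint on the dropped one is removed
testparm grpd*
```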

Chapter IV: Graphs

1. Drawing graphs (graph)
Syntax: graph [varlist] [weight] [if] [in] [, graph_type specific_options common_options]
where:
graph_type         specifies the type of graph to draw
specific_options   options that apply to a particular graph type
common_options     options that can be used with all graph types, such as titles and axis labels
This is the Stata 7 graph syntax; Stata 8 and later keep it available as graph7.

Stata can draw 8 types of graphs (graph_type):
(1) Two-way scatterplots
. graph rlpcex1 age
[Figure: scatterplot of comp.M&Reg price adj.pc tot exp (rlpcex1) against Age of household head]

(2) Two-way scatterplot matrices
. gr rlpcex1 age educyr98 hhsize, matrix
[Figure: scatterplot matrix of comp.M&Reg price adj.pc tot exp, Age of household head, schooling year of HH.head and Household size]

(3) Histograms
. gr rlpcex1, bin(50) normal
[Figure: histogram of comp.M&Reg price adj.pc tot exp (vertical axis: Fraction) with a normal curve]

(4) One-way scatterplots
. gr rlpcex1, oneway
[Figure: one-way scatterplot of comp.M&Reg price adj.pc tot exp]

(5) Box-and-whisker plots
[Figure: box-and-whisker plot of comp.M&Reg price adj.pc tot exp]

(6) Bar charts
. sort reg7
. gr poor, bar means by(reg7)
[Figure: bar chart of the mean of poor for each value of reg7]

(7) Pie charts
. for num 1/7: gen poorX=poor if reg7==X

-> gen poor1=poor if reg7==1
(5140 missing values generated)
-> gen poor2=poor if reg7==2
(4824 missing values generated)
-> gen poor3=poor if reg7==3
(5291 missing values generated)
-> gen poor4=poor if reg7==4
(5245 missing values generated)
-> gen poor5=poor if reg7==5
(5631 missing values generated)
-> gen poor6=poor if reg7==6
(4976 missing values generated)
-> gen poor7=poor if reg7==7
(4887 missing values generated)

. graph poor1-poor7, pie
[Figure: pie chart: poor1 24%, poor2 16%, poor3 16%, poor4 12%, poor5 10%, poor6 4%, poor7 18%]
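The for command used above was superseded by forvalues and foreach in Stata 8. A minimal sketch of the same loop with forvalues, run on small hypothetical data (700 observations, 100 per region, with a made-up poverty flag) so that it works without the VLSS file:

```stata
* Sketch only: small hypothetical data in place of the VLSS file
clear
set obs 700
generate byte reg7 = ceil(_n/100)              // regions 1..7, 100 obs each
generate byte poor = (mod(_n, 100) < 10*reg7)  // hypothetical 0/1 poverty flag

* One variable per region: poor for that region, missing elsewhere
forvalues k = 1/7 {
    generate poor`k' = poor if reg7 == `k'
}
summarize poor1-poor7
```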

(8) Star charts: the graph type is star
[Figure: star charts for cars in Stata's auto data set (Audi 5000 to Volvo 260), with rays for Price, Mileage (mpg), Repair Record 1978, Headroom (in.), Trunk space (cu. ft.), Weight (lbs.), Length (in.), Turn Circle (ft.) and Displacement (cu. in.)]

Common options (common_options)
* Creating the data set
. tabulate hhsize, sum(rlpcex1)

            |  Summary of comp.M&Reg price adj.pc
  Household |               tot exp
       size |        Mean   Std. Dev.       Freq.
------------+------------------------------------
          1 |   4696.0254   4619.5012         214
          2 |   4131.4892   3677.2297         497
          3 |   3834.8615   2913.8177         731
          4 |   3428.8011   2599.7301        1404
          5 |   2930.5486   2168.0644        1318
          6 |   2626.6848   2277.1893         867
          7 |   2501.0912   2186.1605         480
          8 |   2329.7009   1803.7873         255
          9 |   2207.0166   1380.5607         126
         10 |   2252.3772   1423.7576          58
         11 |   2370.7034   1404.7148          29
         12 |   1747.3691   924.72977           9
         13 |   2114.1337   2109.0077           4
         14 |     1579.78   990.81152           4
         16 |   2994.5771   2061.6804           2
         19 |    4833.936           0           1
------------+------------------------------------
      Total |   3188.6671   2692.5673        5999

. tab hhsize, sum(educyr98)

            |  Summary of schooling year of
  Household |           HH.head
       size |        Mean   Std. Dev.       Freq.
------------+------------------------------------
          1 |   3.7897196   4.3956537         214
          2 |   5.7545272   4.7225549         497
          3 |   7.3023256   4.6396425         731
          4 |   8.2578348   4.2659841        1404
          5 |   7.7243298   4.2998488        1318
          6 |   6.8788927   4.0778062         867
          7 |   6.3348958   4.1241759         480
          8 |   5.7333333   3.9623557         255
          9 |   5.7936508   3.4878474         126
         10 |   6.1724138   3.1851516          58
         11 |   4.7931034   3.1665586          29
         12 |   4.4444444   3.6438685           9
         13 |           5   5.0990195           4
         14 |           3   2.1602469           4
         16 |           4   1.4142136           2
         19 |           2           0           1
------------+------------------------------------
      Total |   7.0944185   4.4160917        5999

The values in these two tables (one row per household size, 16 rows) are held in new variables var71, var72 and var73, which are then renamed, rescaled from thousand to million dong, and labelled (the labels mean "average expenditure", "years of schooling" and "household size"):
. rename var71 ahhsize
. rename var72 meanexp
. rename var73 meanedu
. replace meanexp= meanexp/1000
(16 real changes made)
. label var meanexp Chi tieu binh quan
. label var meanedu So nam hoc
. label var ahhsize Quy mo ho
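Rather than copying the tabulate output into new variables, the means can be turned into a one-row-per-group data set directly with collapse. A minimal sketch, assuming Stata 8 or later and the auto data; for the VLSS file the analogous call would be something like collapse (mean) meanexp=rlpcex1 meanedu=educyr98, by(hhsize), which is untested here:

```stata
* Sketch only: auto data stand in for the VLSS file (sysuse needs Stata 8+).
* For the VLSS data the analogous, untested call would be
*   collapse (mean) meanexp=rlpcex1 meanedu=educyr98, by(hhsize)
sysuse auto, clear
tabulate foreign, summarize(mpg)         // the table of means, as above

* Replace the data in memory by one row per group holding the means
collapse (mean) meanmpg=mpg meanprice=price (count) n=mpg, by(foreign)
replace meanprice = meanprice/1000       // rescale, like meanexp/1000 above
label variable meanmpg "Mean mileage (mpg)"
label variable meanprice "Mean price (thousand USD)"
list
```

Note that collapse replaces the data in memory; wrap it in preserve and restore if the original observations are still needed.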

* Options for titles and axes
Take a two-way graph as the example: the vertical axis shows average expenditure and the average years of schooling of the household head, and the horizontal axis shows household size.
. gr meanexp meanedu ahhsize
[Figure: meanexp and meanedu plotted against ahhsize]

* Title options:
title("string") t1title("string") t2title("string") b1title("string") b2title("string") l1title("string") l2title("string") r1title("string") r2title("string")
These options write titles at the top (t), bottom (b), left (l) and right (r) of the graph. Example:
gr meanexp meanedu ahhsize, title(Do thi chi tieu va hoc van chu ho) l1title(Chi tieu binh quan (tr dong)) l2title(So nam hoc cua chu ho) b2title(Quy mo ho gia dinh)
[Figure: the same graph with the title "Do thi chi tieu va hoc van chu ho" (graph of expenditure and education of the household head), left titles "Chi tieu binh quan (tr dong)" (average expenditure, million dong) and "So nam hoc cua chu ho" (years of schooling of the household head), and bottom title "Quy mo ho gia dinh" (household size)]

* Axis value labels
xlabel[(numbers)] ylabel[(numbers)] rlabel[(numbers)] tlabel[(numbers)]
Example:
gr meanexp meanedu ahhsize, title(Do thi chi tieu va hoc van chu ho) l1title(Chi tieu binh quan (tr dong)) l2title(So nam hoc cua chu ho) b2title(Quy mo ho gia dinh) xlabel ylabel
[Figure: the same graph with value labels 0, 5, 10, 15, 20 on the horizontal axis and 2, 4, 6, 8 on the vertical axis]

Note: the other options are described in the help file, shown with the command: help graxes
* Line options
xline[(numbers)] yline[(numbers)] rline[(numbers)] tline[(numbers)] connect(c[[p]] ... c[[p]])
Example:
. gr meanexp meanedu ahhsize, title(Do thi chi tieu va hoc van chu ho) l1title(Chi tieu binh quan (tr dong)) l2title(So nam hoc cua chu ho) b2title(Quy mo ho gia dinh) xlabel ylabel xline(5 10 to 20) yline(2 4 to 8) connect(ll)
[Figure: the same graph with reference lines at x = 5, 10, 15, 20 and y = 2, 4, 6, 8, and both series connected by lines]

2. Commonly used graph types
2.1. Two-way graphs
Syntax: graph [varlist] [weight] [if] [in], twoway [common_options rescale]
The rescale option displays two vertical axes with different scales.
. gen meanexp1=meanexp*1000
. label var meanexp1 "Chi tieu binh quan"

. gr meanexp1 meanedu ahhsize, title(Do thi chi tieu va hoc van chu ho) l1title(Chi tieu binh quan (nghin dong)) b2title(Quy mo ho gia dinh) xlabel ylabel rlabel(2 4 to 8) connect(ll) rescale
[Figure: meanexp1 on the left axis (1000 to 5000, "Chi tieu binh quan (nghin dong)", average expenditure in thousand dong) and meanedu ("So nam hoc") on the right axis (2 to 8), against household size, titled "Do thi chi tieu va hoc van chu ho"]
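In Stata 8 and later, the titled graph with two vertical scales is drawn with twoway. A minimal sketch, assuming the auto data collapsed to group means in place of meanexp and meanedu; the title texts and line positions are made up for illustration:

```stata
* Sketch only: auto data stand in for the VLSS file (Stata 8+ syntax);
* titles and line positions are illustrative
sysuse auto, clear
drop if missing(rep78)
collapse (mean) meanprice=price meanmpg=mpg, by(rep78)

* Second series on its own right-hand axis, like the rescale option
twoway (connected meanprice rep78)                     ///
       (connected meanmpg rep78, yaxis(2)),            ///
       title("Mean price and mileage by repair record") ///
       ytitle("Mean price (USD)")                       ///
       ytitle("Mean mileage (mpg)", axis(2))            ///
       xtitle("Repair record 1978")                     ///
       xlabel(1(1)5) xline(3) yline(6000)
```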

2.2. Histograms
Syntax: graph varname [weight] [if] [in], histogram [common_options bin(#) freq normal[(#,#)] density(#)]
Options:
bin(#)           sets the number of bins (intervals) in the histogram; the default is bin(5)
freq             shows frequencies (counts) on the vertical axis instead of fractions
normal[(#,#)]    overlays a normal density curve
density(#)       used with normal; sets the number of points at which the normal density is evaluated
Example: histogram of per capita expenditure
. gr rlpcex1, hist bin(20) normal
[Figure: histogram of comp.M&Reg price adj.pc tot exp in 20 bins (vertical axis: Fraction) with a normal curve]

. gr rlpcex1, hist bin(50) normal freq
[Figure: the same histogram with 50 bins and Frequency on the vertical axis]

. gr rlpcex1, hist bin(50) normal freq by(reg7)
[Figure: "Histograms by Code by 7 regions": one frequency histogram of comp.M&Reg price adj.pc tot exp for each of region1 to region7]
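A sketch of what bin(#) does: the range from the minimum to the maximum is cut into # intervals of equal width, and the "Fraction" on the vertical axis is the share of observations in each interval. This assumes Stata 8 or later and the auto data; the last two lines use the Stata 8+ histogram command, whose binning is assumed here to follow this equal-width rule:

```stata
* Sketch only: auto data stand in for the VLSS file (Stata 8+ syntax)
sysuse auto, clear
quietly summarize price
local lo = r(min)
local w  = (r(max) - r(min))/20            // width of each of the 20 bins

* Bin index 0..19; the maximum value is put in the last bin
generate byte bin = min(floor((price - `lo')/`w'), 19)
tabulate bin                                // count and percent per bin

* Stata 8+ equivalents of the Stata 7 commands above
histogram price, bin(20) fraction normal
histogram price, bin(20) frequency normal by(foreign)
```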

2.3. Bar charts
Syntax: graph [varlist] [weight] [if] [in], bar [common_options [no]alt means stack]
Example: bar chart of the mean years of schooling of the household head and the mean household size in each of the 7 regions
. gr educyr98 hhsize, bar means by(reg7)
[Figure: bars for schooling year of HH.head and Household size for reg7 = 1 to 7]

To show region names instead of codes on the horizontal axis, a value label is attached to reg7:
. label define region 1 "region1" 2 "region2" 3 "region3" 4 "region4" 5 "region5" 6 "region6" 7 "region7"
. label values reg7 region
. tab reg7

  Code by 7 |
    regions |      Freq.     Percent        Cum.
------------+-----------------------------------
    region1 |        859       14.32       14.32
    region2 |       1175       19.59       33.91
    region3 |        708       11.80       45.71
    region4 |        754       12.57       58.28
    region5 |        368        6.13       64.41
    region6 |       1023       17.05       81.46
    region7 |       1112       18.54      100.00
------------+-----------------------------------
      Total |       5999      100.00

. gr educyr98 hhsize, bar means by(reg7) ylabel(2 4 to 10) alt
[Figure: the same bar chart with region1 to region7 on the horizontal axis and vertical-axis labels 2 to 10]
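In Stata 8 and later, a bar chart of group means is drawn with graph bar. A minimal sketch, assuming the auto data in place of the VLSS file, showing that each bar height is simply the group mean that summarize reports:

```stata
* Sketch only: auto data stand in for the VLSS file (Stata 8+ syntax)
sysuse auto, clear

* One group of bars per value of foreign, one bar per variable
graph bar (mean) mpg trunk, over(foreign)

* Each bar height is the group mean reported by summarize
summarize mpg trunk if foreign == 0
summarize mpg trunk if foreign == 1
```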

The stack option
. gen persons=1
. gr persons urban98, bar ylabel by(reg7) stack alt
[Figure: stacked bars of persons and urban98 (1:urban 98; 0:rural 98) for region1 to region7, vertical axis 0 to 1500]

Exercise: try to draw the following graph:
[Figure: stacked bar chart of foodpoor and poor for region1 to region7, vertical axis 0 to 600]

2.4. Pie charts
Syntax: graph [varlist] [weight] [if] [in], pie [common_options]
This command draws a pie chart. Each variable is one slice, and the size of the slice is proportional to the sum of that variable's values over all observations. Example: a pie chart of each region's share of the country's total number of poor.
. gr poor1-poor7, pie
[Figure: pie chart: poor1 24%, poor2 16%, poor3 16%, poor4 12%, poor5 10%, poor6 4%, poor7 18%]

. gen nonfpood=poor- foodpoor
. label var nonfpood "poor but still above food poverty line"
. gen nonpoor=( rlpcex1>=1790)
(1790 is the poverty line for rlpcex1.)
. gr foodpoor nonfpood nonpoor, pie
. set textsize 90
[Figure: pie chart: foodpoor 12%, nonfpood ("poor but still above food poverty line") 18%, nonpoor 70%]
. set textsize 100
. gr foodpoor nonfpood nonpoor, pie by(reg7) total
[Figure: the same pie chart for each of region1 to region7 and for the Total]
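A sketch of how each slice's share is computed: the sum of one variable divided by the sum over all slice variables. It uses small hypothetical data in which region k has 10k poor observations (an assumption made only so the block runs on its own), and finishes with the Stata 8+ graph pie command:

```stata
* Sketch only: hypothetical data in place of the VLSS file
clear
set obs 700
generate byte reg7 = ceil(_n/100)
generate byte poor = (mod(_n, 100) < 10*reg7)   // region k has 10*k poor
forvalues k = 1/7 {
    generate poor`k' = poor if reg7 == `k'
}

* A slice of "pie poor1-poor7" is the sum of poor`k' over the sum of all slices
quietly summarize poor
scalar total_poor = r(sum)
forvalues k = 1/7 {
    quietly summarize poor`k'
    display "poor`k': " %5.1f 100*r(sum)/total_poor "%"
}

* The Percent column of this table gives the same shares
tabulate reg7 if poor == 1

* Stata 8+ pie chart with one slice per variable
graph pie poor1-poor7
```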

3. Saving and displaying graphs (Saving and graph using)
To save the graph in the Graph window, open the File menu, choose Save Graph, then choose the folder and file name for the graph; the default extension is .gph. A graph can also be saved with the option saving(filename [, replace]) written after the graph command. Example:
. gr educyr98 hhsize, bar means by(reg7) ylabel(2 4 to 10) alt saving("c:\do thi 1")
. gr persons urban98, saving("c:\do thi 2") bar ylabel by(reg7) stack alt

To stop graphs from being displayed, graph display can be switched off with:
set graphics { on | off }
. set graphics off
. gr poor1-poor7, pie saving("c:\do thi 3", replace)
(note: file c:\do thi 3.gph not found)

Stata displays saved graphs with:
graph using filename1 [filename2 ...] [, margin(#)]

margin(#) sets the blank space around each graph as a percentage of the graph area; the default is 0. Example:
. set graphics on
. graph using "c:\do thi 1" "c:\do thi 2" "c:\do thi 3", margin(10) title("Mot so dac diem cua ho gia dinh")
[Figure: the saved graphs shown together in one image titled "Mot so dac diem cua ho gia dinh" (some characteristics of households)]

Note: saving can be combined with using to store the combined display as a new graph. Example:
. graph using "c:\do thi 1" "c:\do thi 2" "c:\do thi 3", margin(10) title("Mot so dac diem cua ho gia dinh") saving("c:\do thi tong hop")
. graph using "c:\do thi tong hop"
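In Stata 8 and later, the same workflow uses named graphs, graph combine, graph save and graph use. A minimal sketch, assuming the auto data and a hypothetical file name combined_demo.gph written to the current working directory:

```stata
* Sketch only: auto data and the file name combined_demo.gph are assumptions;
* the file is written to the current working directory (Stata 8+ syntax)
sysuse auto, clear
histogram price, name(g1, replace)          // first graph kept in memory as g1
scatter price weight, name(g2, replace)     // second graph kept as g2

* Put both graphs in one image and save it, like graph using ..., saving()
graph combine g1 g2, title("Two graphs combined")
graph save combined_demo.gph, replace

* Display the saved graph again later
graph use combined_demo.gph
```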

Chapter V: Programming in Stata

1. Introduction to do-files
1.1. Opening and saving do-files
Stata lets you write files, called do-files, that contain Stata commands. Instead of typing the commands one at a time in the Command window, a do-file runs them one after another. Do-files are written in the Do-file Editor window, which is opened from the Window menu by choosing Do-file Editor, or by typing doedit in the Command window.
Example: a program could be written in the Do-file Editor as follows:
----------------
clear
set mem 32m
use "C:\VLSS98\Hhexp98n.dta", clear
tab urban98
sum hhsize
gen new=hhsizet
gen new=hhsize
----------------
The line gen new=hhsizet contains a deliberate error (hhsizet instead of hhsize), which is used below to show how Stata handles errors in do-files.

After editing, save the do-file with Save As in the File menu of the Do-file Editor. The name of the do-file can also be given directly with doedit:
doedit [filename]
Do-files have the extension .do. In the example above we could save the program under the name "chuong trinh 1" in the folder Vlss98 on drive C.

1.2. Running do-files
To run a do-file, type one of the following two commands in the Command window:
do filename [, nostop]

run filename [, nostop]
The run command executes the commands in the do-file without showing their output on the screen. If a command in the do-file contains an error, Stata reports the error and stops executing the remaining commands. If the nostop option is specified, Stata instead skips the failing command and continues with the commands after it. Example:

. do "c:\vlss98\chuong trinh 1"

. clear

. set mem 32m
(32768k)

. use "C:\VLSS98\Hhexp98n.dta", clear

. tab urban98

 1:urban 98; |
  0:rural 98 |      Freq.     Percent        Cum.
-------------+-----------------------------------
       Rural |       4269       71.16       71.16
       Urban |       1730       28.84      100.00
-------------+-----------------------------------
       Total |       5999      100.00

. sum hhsize

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
      hhsize |    5999    4.752292    1.954292          1         19

. gen new=hhsizet
hhsizet not found
r(111);

end of do-file
r(111);

With the nostop option:

. do "c:\vlss98\chuong trinh 1", nostop

. clear

. set mem 32m
(32768k)

. use "C:\VLSS98\Hhexp98n.dta", clear

. tab urban98

 1:urban 98; |
  0:rural 98 |      Freq.     Percent        Cum.
-------------+-----------------------------------
       Rural |       4269       71.16       71.16
       Urban |       1730       28.84      100.00
-------------+-----------------------------------
       Total |       5999      100.00

. sum hhsize

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
      hhsize |    5999    4.752292    1.954292          1         19

. gen new=hhsizet
hhsizet not found
r(111);

. gen new=hhsize

end of do-file

Running with run:
. run "c:\vlss98\chuong trinh 1", nostop
hhsizet not found

Do-files can also be run with Do in the File menu, or directly from the Do-file Editor with Do or Run in its Tools menu.

1.3. Points to note when writing do-files
version #
It is good practice to put this command at the start of a do-file to record which Stata version it was written for. For example, if the do-file was written with Stata 7.0, it would begin:
version 7.0
clear
use Hhexp98n.dta
tab reg7
...
Different versions of Stata may differ in the syntax or meaning of some commands. The version command lets a later version of Stata interpret a do-file written for an earlier version correctly.

set memory #[k|m]
If a data file needs more memory than Stata is currently using, Stata must be given more memory with this command. Do not set it larger than the computer's RAM. Example:
. use "C:\Hhexp98n.dta", clear
no room to add more observations
r(901);
. set mem 32m
(32768k)
. use "C:\Hhexp98n.dta", clear

set more off/on

By default, when the output of a command is longer than the Results window, the display pauses and we must press a key (for example Enter or the Space bar) to see the rest. set more off makes the output scroll continuously until the command or do-file has finished; set more on restores the default.

The characters * and /* */
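A sketch of the top of a do-file in current Stata, assuming Stata 10 or later. set memory is left out because Stata 12 and later manage memory automatically. The last lines show capture noisily, which handles a single failing line in a way similar in spirit to the nostop option: the error is reported and the do-file carries on.

```stata
* Sketch of the top of a do-file (Stata 10 or later assumed)
version 10          // run the commands as Stata 10 would
set more off        // do not pause when the Results window fills

sysuse auto, clear
summarize price

* Error handling for one line, similar in spirit to do ..., nostop:
* capture traps the error, noisily still shows the message
capture noisily generate newvar = pricex    // pricex does not exist
display "return code: " _rc                 // 111 means "not found"
generate newvar = price                     // the corrected line still runs
```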

Stata does not execute lines that begin with * or text placed between /* and */. These characters are used to write comments in do-files. Example:
--------------------
version 7.0
set mem 32m
use "C:\Hhexp98n.dta", clear
* Create the household expenditure variable
/* This variable equals per capita expenditure times household size */
gen hhexp = rlpcex1 * hhsize
--------------------
#delimit ;
When a command in a do-file is too long, this command declares that each command ends with the character ;. By default a command ends at the end of the line, when Enter is pressed. To restore the default, use #delimit cr. Example: the graph command from the previous chapter
graph meanexp meanedu ahhsize, title(Do thi chi tieu va hoc van chu ho) l1title(Chi tieu binh quan (tr dong)) l2title(So nam hoc cua chu ho) b2title(Quy mo ho gia dinh) xlabel ylabel xline(5 10 to 20) yline(2 4 to 8) connect(ll)
is equivalent to:
#delimit ;
graph meanexp meanedu ahhsize, title(Do thi chi tieu va hoc van chu ho)
    l1title(Chi tieu binh quan (tr dong)) l2title(So nam hoc cua chu ho)
    b2title(Quy mo ho gia dinh) xlabel ylabel xline(5 10 to 20)

    yline(2 4 to 8) connect(ll) ;
gen hhexp = rlpcex1 * hhsize ;
...
Afterwards, if the following commands each fit on one line, the default should be restored with:
#delimit cr
Note: /* */ can also be used to continue a long command over several lines:
graph meanexp meanedu ahhsize, title(Do thi chi tieu va hoc van chu ho) /*
*/ l1title(Chi tieu binh quan (tr dong)) l2title(So nam hoc cua chu ho) /*
*/ b2title(Quy mo ho gia dinh) xlabel ylabel xline(5 10 to 20) yline(2 4 to 8) connect(ll)
The #delimit command and this /* */ way of writing long commands only work in do-files, not in the Command window.
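A sketch of the comment styles and line-continuation methods in a do-file, assuming Stata 8 or later: that version added // end-of-line comments and /// continuation alongside the older * and /* */ styles. It also shows a #delimit ; block, switched back with #delimit cr, and uses the auto data:

```stata
* Sketch only: auto data stand in for the VLSS file (Stata 8+ syntax)
sysuse auto, clear

* A whole-line comment
/* A block comment,
   which may span several lines */
generate double ppw = price/weight   // end-of-line comment

summarize price mpg ///  three slashes: the command continues below
    weight length

#delimit ;
summarize price mpg
    weight length ;
#delimit cr
```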

2. Local and global macros
Macros are variables used in Stata programs. A macro is a string, called the macro name (macroname), that stands for another string, called the macro contents. There are two kinds of macros: local macros and global macros.
2.1. Local macros
If we type:
. local hogd "age hhsize rlpcex1"
(the double quotes can be omitted, i.e. we can type local hogd age hhsize rlpcex1), then `hogd' is understood as age hhsize rlpcex1. Here hogd is the name of the macro and age hhsize rlpcex1 its contents. To use the contents of a local macro, type its name between a left quote (`), the key at the upper left of the keyboard, and a right quote ('), the apostrophe key on the right-hand side of the keyboard.

So if we type:
. summarize `hogd'
it is the same as typing:
. summarize age hhsize rlpcex1

If we also type:
. local tb summarize
then the command summarize age hhsize rlpcex1 can be run by typing:
. `tb' `hogd'

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
         age |    5999    48.01284     13.7702         16         95
      hhsize |    5999    4.752292    1.954292          1         19
     rlpcex1 |    5999    3188.667    2692.567    357.318   45801.71

To display the contents of a local macro, use macro list _macroname. Example:
. macro list _hogd
_hogd:          age hhsize rlpcex1
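A minimal sketch of the local-macro steps described above, assuming Stata 8 or later and auto variable names in place of age hhsize rlpcex1. It also uses the extended macro function word count, which is not covered in the text, and shows that a dropped local expands to nothing:

```stata
* Sketch only: auto variable names stand in for age hhsize rlpcex1 (Stata 8+)
sysuse auto, clear

* Macro name hogd, contents: price mpg weight
local hogd price mpg weight
local tb summarize
`tb' `hogd'

* Extended macro function: count the words in the contents
local n : word count `hogd'
display "hogd holds `n' variable names"

macro list _hogd
macro drop _hogd
display "after macro drop: [`hogd']"   // an undefined local is empty
```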

To delete a local macro, use macro drop _macroname. Example:
. macro drop _hogd
. macro list _hogd
local macro `hogd' not found
r(111);

2.2. Global macros
If we type:
. global diaban "reg7 province commune"

(or, omitting the double quotes: global diaban reg7 province commune), then $diaban stands for reg7 province commune. Here diaban is the name of the macro and reg7 province commune its contents. To use the contents of a global macro, type the character $ immediately before the macro name. So typing:
. describe $diaban
is the same as typing:
. describe reg7 province commune

. describe $diaban

              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
reg7            int    %8.0g                  Code by 7 regions
province        float  %9.0g                  Province code
commune         float  %9.0g                  commune code PSU-SVY commands

. global mota "describe"
. $mota $diaban

              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
reg7            int    %8.0g                  Code by 7 regions
province        float  %9.0g                  Province code
commune         float  %9.0g                  commune code PSU-SVY commands

To display the contents of a global macro, use macro list macroname. Example:
. global diaban "reg7 province commune"
. macro list diaban
diaban:         reg7 province commune

To delete a global macro, use macro drop macroname. Example:
. macro drop diaban
. macro list diaban
global macro $diaban not found
r(111);

2.3. Differences between local and global macros
A local macro exists only inside the program (or do-file) in which it is defined; other programs cannot see it. A global macro, once declared, is visible to all programs and stays in Stata's memory for the rest of the session. Example: a do-file defines the local macro a; once the do-file has finished, the command that lists this macro finds nothing, because the macro does not exist outside that program.
. do "C:\WINDOWS\TEMP\STD010000.tmp"
. local a "chuong trinh thong ke Stata"
.
end of do-file
. macro list _a
local macro `a' not found
r(111);
By contrast, for a global macro:
. do "C:\WINDOWS\TEMP\STD010000.tmp"
. global b "chuong trinh thong ke Stata"
.
end of do-file
. macro list b
b:              chuong trinh thong ke Stata
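The same scoping rule holds for programs defined with program define. A minimal sketch, assuming Stata 8 or later, in which the program name showscope is hypothetical: the program reports what it can see of a local a and a global b, both defined by the caller, and returns those values in r() so they can be checked afterwards.

```stata
* Sketch only: the program name showscope is hypothetical
capture program drop showscope
program define showscope, rclass
    display "inside the program, local a  = [`a']"
    display "inside the program, global b = [$b]"
    return local a_seen "`a'"
    return local b_seen "$b"
end

local  a "chuong trinh thong ke Stata"
global b "chuong trinh thong ke Stata"
macro list b
showscope      // prints an empty local a and the full global b
```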

3. Scalars and matrices