Using Stata to Analyze Data

Using Stata to Analyze Data from the Household Living Standards Survey (VLSS)

Contents

Chapter I: General introduction to Stata
  1. How data are stored in Stata (Dataset in Stata)
  2. Starting and exiting Stata (Open and exit)
  3. The Stata 7 interface (Stata interface)
  4. Log files (log file)
  5. Opening, entering, and saving data (Use, input and save)
Chapter II: Working with data
  1. Stata command syntax
  2. Operators and functions
  3. Data reporting
  4. Data manipulation
  5. Weights in the VHLSS (Weight)
Chapter III: Hypothesis testing and regression analysis
  1. Estimation and hypothesis testing
  2. Correlation and regression
Chapter IV: Graphs
  1. Drawing graphs (graph)
  2. Some commonly used graph types
  3. Saving and displaying graphs (Saving and graph using)
Chapter V: Programming in Stata
  1. General introduction to do-files
  2. Local and global macros
  3. Scalars and matrices (scalar and matrix)
  4. Conditional statements and loops
  5. Introduction to ado-files
References
Appendix


Chapter I: General introduction to Stata

1. How data are stored in Stata (Dataset in Stata)

Stata is statistical software used to manage and analyze data and to draw graphs. Stata stores information about the characteristics of the subjects under study. Data stored in Stata can be displayed as a table, as in the following example:

    hhcode   headname        hhsize   incomepc
    101      Nguyen Van A    6        2100
    102      Le Thi B        5        3210
    103      Tran Van C      10       1200

Observations (records)
Each row of the data table is called an observation, or a record, and stores the data for one subject. The example above has 3 observations holding the household code (hhcode), the name of the household head (headname), the household size (hhsize), and the per capita income (incomepc) of 3 households.

Variables (fields; attributes)
Information about the subjects is collected and stored by characteristic. These characteristics are called variables, or fields. Variables are the columns of the data table. The example above has 4 variables, named hhcode, headname, hhsize, and incomepc. A variable name is 1 to 32 characters long and begins with a letter or an underscore (_). Variable names may contain only letters, digits, and underscores; other special characters cannot be used.

Identifying variables
A dataset usually contains variables that identify observations, called identifying variables. They make it possible to tell observations apart: each observation has its own value of these variables. In the example above the identifying variable is hhcode, which takes one value for each observation.

Characteristics of variables
Variables can be given labels (annotations). For example, hhcode can be labeled "Household code".

A variable is formatted as either numeric or string, and each kind has several storage types. Numeric variables can be stored as byte, int, long, float, or double. String variables can be stored as str1 through str80, depending on their length.

    Storage type   Bytes   Minimum          Maximum         Kind of number
    byte           1       -127             126             Integer
    int            2       -32,767          32,766          Integer
    long           4       -2,147,483,647   2,147,483,646   Integer
    float          4       -10^36           10^36           Real
    double         8       -10^308          10^308          Real

Numeric variables can be discrete or continuous. Variables such as household size, sex of the household head, geographic region, and education level are discrete variables (also called categorical variables). They can be stored as byte, int, or long. Continuous variables such as household income and expenditure are stored as float or double.

String variables store text. For example, headname is a string variable that stores the name of the household head.

    String storage type   Bytes   Maximum length
    str1                  1       1
    str2                  2       2
    ...
    str80                 80      80
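The storage types above can be requested explicitly when a variable is created. The following is a minimal sketch, not taken from the original guide: it rebuilds the three-household example by hand and assumes a reasonably recent Stata (later versions also allow strings longer than str80).

```stata
* Sketch: choosing storage types explicitly (values from the example table)
clear
set obs 3
generate int hhcode = 100 + _n              // 101, 102, 103 fit in an int
generate str15 headname = "Nguyen Van A"    // string of up to 15 characters
replace headname = "Le Thi B" in 2
replace headname = "Tran Van C" in 3
generate byte hhsize = 6                    // small integers fit in a byte
replace hhsize = 5 in 2
replace hhsize = 10 in 3
generate float incomepc = 2100              // continuous values: float or double
describe                                    // "storage type" column shows each type
compress                                    // lets Stata shrink types where it is safe
```

The describe output lists the storage type of every variable, and compress is a convenient way to let Stata pick the smallest type that holds the data without loss.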

2. Starting and exiting Stata (Open and exit)

Stata is started like other application programs, by clicking the icon of the file wstata.exe in Windows Explorer, or by choosing Start -> Program -> Stata. To quit, type exit in the Stata Command window, or choose Exit from the File menu.

3. The Stata 7 interface (Stata interface) [1]

Once Stata has started, its interface appears. It consists of the menu bar at the top, the toolbar below it, and the windows.

[1] Stata 8 has an interface similar to Stata 7. The biggest difference is that Stata 8 adds a Statistics item to the menu bar, which lets you run a number of statistical commands through dialog boxes instead of typing them in the Command window.

Stata windows
Stata's windows are opened from the Windows item on the menu bar. The windows are:

    Results          Displays commands and results
    Graph            Displays graphs
    Viewer           Displays help and the contents of text files
    Command          For typing commands
    Review           Displays the commands already executed
    Variables        Lists the variables in the dataset
    Data editor      Displays and edits the data in table form
    Do-file editor   Opens the program editor

Menu bar
Clicking the menu bar and the items in it makes Stata carry out different commands. The menu bar contains the following groups of commands:

    File
      Open                 Open a data file
      View                 View Stata files in the Viewer window
      Save                 Save the data file
      Save as              Save the data file under a new name
      File name            Choose a file name to insert into the Command window
      Log                  Close, open, or review a log file
      Save graph           Save a graph file
      Print graph          Print a graph
      Print results        Print results
      Exit                 Exit Stata
    Edit
      Copy text            Copy text
      Copy tables          Copy tables
      Paste                Paste
      Table copy options   Options for copying tables
      Graph copy options   Options for copying graphs (not available in Stata 7)
    Prefs                  Options for colors, fonts, and sizes
    Windows
      Results              Open the Results window
      Graph                Open the Graph window
      Log                  Open the log file window
      Viewer               Open the help window and view file contents
      Command              Open the Command window
      Review               Open the window of executed commands
      Variables            Open the list of variables in the dataset
      Help/Search          Open the help window
      Data editor          Open the window showing the stored data in table form
      Do-file editor       Open the program-writing window
    Help                   Help on using Stata

Toolbar (tool bar)

The buttons on the toolbar carry out Stata's most common commands. Moving the pointer over a button displays a short hint. The buttons are:

    Open (use)                      Open a Stata data file
    Save                            Save the data file to disk
    Print results                   Print the contents of the Results window
    Begin log                       Open, close, and view the log file
    Start viewer                    Open the help window
    Bring Dialog Window to front    Bring the dialog window to the front
    Bring Result Window to front    Bring the Results window to the front
    Bring Graph Window to front     Bring the Graph window to the front
    Do-file editor                  Open the program editor
    Data editor                     Open the window for editing data
    Data browser                    Open the window for viewing data
    Clear -more- condition          Clear the more condition
    Break                           Stop the command or program that is running

4. Log files (log file)

When working with Stata, users usually want to keep a log of the session, including the commands, messages, and results obtained. Stata records such a log with the command log using.

Syntax:
    log using (path\filename) [, append replace [ text | smcl ] ]

Options:
    append    Append the log to an existing file
    replace   Overwrite an existing file
    text      Write the log as plain text (extension .log)
    smcl      Write the log in SMCL format (extension .smcl); this is the default

Examples:
    log using baitap1
        Creates the file baitap1 in the current directory to record the session; the default extension is smcl.

    . log using baitap1
    -------------------------------------------------------------------------------
           log:  C:\baitap1.smcl
      log type:  smcl
     opened on:  17 Feb 2004, 15:32:03

    log using baitap1, replace
        Creates baitap1, overwriting the existing file baitap1.
    log using d:\baitap2, text
        Creates baitap2 on drive D in text format (extension .log).
    log using d:\baitap2, append
        Continues the log in the file baitap2 on drive D.

Files with the extension smcl can be converted to text files with the command translate.
Example:
    translate baitap1.smcl exercise1.log

log off
    Temporarily stops writing to the open log/smcl file.
log on
    Resumes writing to the open log file. It is used after log using or log off.
log close
    Closes and saves the open log file.

Note: Stata can also record only what the user types in the Command window, which helps when writing programs later from the session record.
Syntax:
    cmdlog using (path\filename) [, append replace]
    cmdlog {off | on | close}

To view log/smcl files, choose File/Log/View from the menu bar (or type view (filename) in the Command window). They can also be opened with other text editors such as MS Word or Notepad.
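The logging commands above fit together as in the following sketch. The file name session_demo is hypothetical, and the leading capture log close is a common idiom (an addition here, not from the guide) that avoids an error when a log is already open.

```stata
* Sketch of a logged session; "session_demo" is a hypothetical file name
capture log close                 // close any open log; ignore the error if none
log using session_demo, text replace
display "work to be recorded goes here"
log off                           // pause logging
display "this line is not written to the log"
log on                            // resume logging
log close                         // close and save session_demo.log
```

With the text option the result is a plain file session_demo.log that any editor can open; without it the file would be session_demo.smcl.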

5. Opening, entering, and saving data (Use, input and save)

Opening an existing dataset
Syntax:
    use (path\filename)
This command opens the Stata file, with extension .dta, named by filename.
Examples:
    use ho1.dta
        Opens ho1.dta in the current directory.
    use "D:\VHLSS 2004\ho1.dta", clear
        Opens ho1.dta in the folder VHLSS 2004 on drive D.

A Stata dataset can also be opened with Open on the File menu, or with the Open (use) button on the toolbar. If the data file is large, first set the amount of memory available to Stata with the command:
    set memory #[k|m]
Examples:
    set mem 32m
    set mem 32000k

Entering data
There are several ways to enter data from the keyboard into Stata's memory.
- Use the Stata editor window, or type edit in the Command window, and then enter the data in the table shown there.
- Use the command:
      input [variable list, with formats where needed]
  Then type the values of the variables for each observation in turn, separated by a space. End data entry with end.
  Example:
      . input hhcode str15 name income
              hhcode             name     income
        1. 101 "Nguyen Van A" 1200
        2. 102 "Nguyen Van B" 1350
        3. 103 "Tran Thi C" 2310
        4. end

Stata can also read data from other database files. The data must first be saved as text (for example with Excel), with one observation per line and values separated by commas or tabs. Then use the command insheet to read the data into Stata.
Syntax:
    insheet [variable list] using (text file name) [, [no]names comma tab clear]
This command reads the observations of the text file into Stata's memory; the variable list gives the names of the variables to be created.
Options:
    [no]names   Read the variable names given on the first line of the text file
    comma       The values in the text file are separated by commas
    tab         The values in the text file are separated by tabs
    clear       The data read in replace the data currently in Stata's memory
Examples:
    . insheet using c:\income.txt
    (3 vars, 4 obs)
    . insheet maho hoten thunhap using c:\income.txt
    (note: variable names in file ignored)
    (3 vars, 4 obs)

Saving data
Syntax:
    save (path\filename) [, replace]
This command saves the data in Stata's memory to the named file. With the option replace, the data overwrite an existing file of the same name.

Data can also be saved with Save and Save as on the menu bar, or with the Save button on the toolbar.
Note: see also the commands infile and outfile.
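The enter, save, and reopen cycle described above can be tried end to end. This is a small sketch under stated assumptions: the file name households_demo is hypothetical, and the data are the three households from the table in Chapter I.

```stata
* Sketch: type in a small dataset, save it, and open it again
clear
input hhcode str15 headname hhsize incomepc
101 "Nguyen Van A" 6 2100
102 "Le Thi B" 5 3210
103 "Tran Van C" 10 1200
end
save households_demo, replace     // writes households_demo.dta in the current directory
use households_demo, clear        // clear discards whatever is in memory first
list
```

The replace option on save and the clear option on use both protect against accidental loss: without them Stata refuses to overwrite an existing file or to discard unsaved data.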

Chapter II: Working with data

1. Stata command syntax

The basic structure of a Stata command is:
    [by variable list:] command [variable list] [expression] [condition] [range] [weight] [, options]
Stata's Help presents the same syntax as:
    [by varlist:] command [varlist] [=exp] [if exp] [in range] [weight] [, options]
Square brackets mark optional elements.

Notes:
- Stata commands are written in lowercase. Variable names are case sensitive: in the same dataset, Ho_ten and ho_ten are 2 different variables.
- Optional elements are shown in square brackets [ ]; they may or may not appear in a command. Required parameters (variable names) are shown in angle brackets < >. A command will not run if its required parameters are not supplied.
- Some Stata commands can be abbreviated. For example, summarize can be abbreviated to sum. In this guide, the underlined part of a command's syntax is its abbreviation.
- The examples in this guide use data from the 1998 Living Standards Survey conducted by the General Statistics Office. The aggregate expenditure file Hhexp98n.dta is used most often.

by variable list (by varlist): Stata runs the command separately for each value of the variables in the list. The data must be sorted by those variables before the command is run.
Example:
    . sort sex
    . by sex: sum rlpcex1

    -> sex = 1
        Variable |     Obs        Mean   Std. Dev.       Min        Max
    -------------+-----------------------------------------------------
         rlpcex1 |    4375    2980.906   2430.648    357.318   45801.71

    -> sex = 2
        Variable |     Obs        Mean   Std. Dev.       Min        Max
    -------------+-----------------------------------------------------
         rlpcex1 |    1624    3748.368   3231.241   376.9805   30624.77

    . sort sex urban98
    . by sex urban98: sum rlpcex1

    -> sex = 1, urban98 = Rural
        Variable |     Obs        Mean   Std. Dev.       Min        Max
    -------------+-----------------------------------------------------
         rlpcex1 |    3344    2308.134   1345.671    357.318   24386.43

    -> sex = 1, urban98 = Urban
        Variable |     Obs        Mean   Std. Dev.       Min        Max
    -------------+-----------------------------------------------------
         rlpcex1 |    1031     5163.01   3602.245   682.9575   45801.71

    -> sex = 2, urban98 = Rural
        Variable |     Obs        Mean   Std. Dev.       Min        Max
    -------------+-----------------------------------------------------
         rlpcex1 |     925    2553.448   1776.178   376.9805   25527.95

    -> sex = 2, urban98 = Urban
        Variable |     Obs        Mean   Std. Dev.       Min        Max
    -------------+-----------------------------------------------------
         rlpcex1 |     699    5329.628   3962.946   1057.797   30624.77
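The by prefix can be tried on any dataset. The sketch below uses auto.dta, the example dataset shipped with Stata, as a stand-in for the VLSS file (an assumption made only so the example runs anywhere). It also shows bysort, which combines the sort and the by step in one line.

```stata
* Sketch: by-group summaries on Stata's bundled auto dataset
sysuse auto, clear
sort foreign
by foreign: summarize price           // one summary table per value of foreign
bysort foreign: summarize price mpg   // sorts and runs by group in one step
```

Running by without a prior sort produces a "not sorted" error, which is why the guide sorts first.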

Variable list (varlist)
Names the variables the command acts on. If no variable is given, the command acts on all variables.
Example:
    . sum hhsize sex reg7
        Variable |     Obs        Mean   Std. Dev.       Min        Max
    -------------+-----------------------------------------------------
          hhsize |    5999    4.752292   1.954292          1         19
             sex |    5999    1.270712   .4443645          1          2
            reg7 |    5999     4.01917   2.145305          1          7

    . sum
        Variable |     Obs        Mean   Std. Dev.       Min        Max
    -------------+-----------------------------------------------------
        househol |    5999    19617.86   11201.92        101      38820
            year |    5999    97.94666   .2247337         97         98
           month |    5999    6.340723   3.011082          1         12
    --Break--
    r(1);

This sum command displays basic statistics for all variables in the dataset (here the listing was interrupted with Break).

Condition (if exp)
Stata runs the command only for observations for which the expression is true.
Example:
    . sum poor if reg7==1
        Variable |     Obs        Mean   Std. Dev.       Min        Max
    -------------+-----------------------------------------------------
            poor |     859    .4982538   .5002882          0          1

This command acts only on observations for which reg7 equals 1.

Range (in range)
Specifies the range of observations the command acts on. The range can take the following forms:

    sum poor in 10       Mean of poor for observation 10 (simply the value of poor in the 10th observation)
    sum poor in 10/100   Mean of poor for observations 10 through 100
    sum poor in f/100    Mean of poor from the first observation through observation 100
    sum poor in 100/l    Mean of poor from observation 100 through the last observation

Weight (weight)
Allows calculations that use weights. The weight option is described in detail in section 5 of this chapter.

Options
Many Stata commands have their own options, which are given after a comma.
Example: the command sum has the option detail, which computes several statistics in addition to the mean and standard deviation.

    . sum rlpcex1, detail

                 comp.M&Reg price adj.pc tot exp
    -------------------------------------------------------------
          Percentiles      Smallest
     1%     682.9575        357.318
     5%     1012.433       366.2792
    10%     1238.088       376.9805       Obs                5999
    25%     1671.054       381.3502       Sum of Wgt.        5999

    50%     2397.042                      Mean           3188.667
                            Largest       Std. Dev.      2692.567
    75%     3711.917       26944.64
    90%     5940.803       30624.77       Variance        7249918
    95%      8045.32        31066.5       Skewness       3.791027
    99%     14163.04       45801.71       Kurtosis       29.21398

Note: Stata allows commands and options to be abbreviated. In this guide, an underlined part of a command means the command can be abbreviated to the underlined characters. For example, use can be abbreviated to u. The syntax of commands in this guide is written in English so that readers can compare it with the help in Stata.
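The if and in qualifiers and the options described in this section can be combined in one command. The following sketch again uses the bundled auto.dta rather than the VLSS data (an assumption for portability); the negative index in the fourth command counts from the end of the dataset.

```stata
* Sketch: restricting a command with if, in, and an option
sysuse auto, clear
summarize price if foreign == 1          // only observations where the condition holds
summarize price in 1/10                  // observations 1 through 10
summarize price in f/10                  // first through 10th (same range)
summarize price in -5/l                  // the last five observations
summarize price if foreign == 1 in 1/60, detail   // condition, range, and option together
count in -5/l
```

Note the double equals sign in the condition: == tests equality, while a single = is used only for assignment.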


2. Operators and functions

Operators
Operators in Stata are written as follows:

    Symbol   Meaning
    Arithmetic
      +      Addition
      -      Subtraction
      *      Multiplication
      /      Division
      ^      Power
    Relational
      >  <  >=  ...

[...]

    -> tabulation of urban98

    1:urban 98; |
     0:rural 98 |      Freq.     Percent        Cum.
    ------------+-----------------------------------
          Rural |       4269       71.16       71.16
          Urban |       1730       28.84      100.00
    ------------+-----------------------------------
          Total |       5999      100.00

    -> tabulation of reg7

      Code by 7 |
        regions |      Freq.     Percent        Cum.
    ------------+-----------------------------------
        region1 |        859       14.32       14.32
        region2 |       1175       19.59       33.91
        region3 |        708       11.80       45.71
        region4 |        754       12.57       58.28
        region5 |        368        6.13       64.41
        region6 |       1023       17.05       81.46
        region7 |       1112       18.54      100.00
    ------------+-----------------------------------
          Total |       5999      100.00

Two-way frequency tables
Syntax:
    tabulate <var1> <var2> [weight] [if exp] [in range] [, chi2 missing nofreq cell column row]
    tab2 <varlist> [weight] [if exp] [in range] [, chi2 missing nofreq cell column row]
The command tabulate computes and displays the two-way frequency table of the 2 variables given. The command tab2 produces a two-way frequency table for every pair of variables in the variable list.
Example:
    . tab urban98 farm

       1:urban |  Type of HH (1:farm;
           98; |      0:nonfarm)
    0:rural 98 |  non farm       farm |     Total
    -----------+----------------------+----------
         Rural |      1021       3248 |      4269
         Urban |      1540        190 |      1730
    -----------+----------------------+----------
         Total |      2561       3438 |      5999

Options:
    chi2      Tests the hypothesis that the two variables are independent
    missing   Treats observations with missing values as a category of their own
    nofreq    Does not display frequencies
    cell      Displays the relative frequency (%) of each cell
    column    Displays the relative frequency (%) of each cell within its column
    row       Displays the relative frequency (%) of each cell within its row

Examples:
    . tab reg7 urban98, cell nof

               |  1:urban 98; 0:rural
     Code by 7 |          98
       regions |     Rural      Urban |     Total
    -----------+----------------------+----------
       region1 |     11.20       3.12 |     14.32
       region2 |     13.05       6.53 |     19.59
       region3 |     10.00       1.80 |     11.80
       region4 |      8.37       4.20 |     12.57
       region5 |      6.13       0.00 |      6.13
       region6 |      8.57       8.48 |     17.05
       region7 |     13.84       4.70 |     18.54
    -----------+----------------------+----------
         Total |     71.16      28.84 |    100.00

    . tab farm urban98, column row

    Type of HH |  1:urban 98; 0:rural
      (1:farm; |          98
    0:nonfarm) |     Rural      Urban |     Total
    -----------+----------------------+----------
      non farm |      1021       1540 |      2561
               |     39.87      60.13 |    100.00
               |     23.92      89.02 |     42.69
    -----------+----------------------+----------
          farm |      3248        190 |      3438
               |     94.47       5.53 |    100.00
               |     76.08      10.98 |     57.31
    -----------+----------------------+----------
         Total |      4269       1730 |      5999
               |     71.16      28.84 |    100.00
               |    100.00     100.00 |    100.00
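The table options above can be explored on the bundled auto.dta (used here as an assumption in place of the VLSS file). In that dataset rep78 has a few missing values, so the missing option visibly changes the table.

```stata
* Sketch: two-way tables with percentage and test options
sysuse auto, clear
tabulate foreign rep78                  // counts only; missing rep78 dropped
tabulate foreign rep78, row nofreq      // row percentages without counts
tabulate foreign rep78, chi2 missing    // independence test; missing kept as a category
quietly tabulate foreign rep78          // rerun silently to leave r(N) for inspection
display r(N)
```

After tabulate, r(N) holds the number of observations that entered the table, which is a quick way to see how many were dropped for missing values.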

3.11. Summary tables with tabulate, summarize
Syntax:
    tabulate <var1> [<var2>] [weight] [if exp] [in range], summarize(var3) [means standard freq missing]
This command creates a one-way or two-way table defined by var1 and, optionally, var2. Each cell gives the mean, standard deviation, and frequency of var3.
Example:
    . tab farm urban98, sum(poor)

          Means, Standard Deviations and Frequencies of poor

    Type of HH |  1:urban 98; 0:rural
      (1:farm; |          98
    0:nonfarm) |     Rural      Urban |     Total
    -----------+----------------------+----------
      non farm |  .2791381  .06168831 | .14837954
               | .44879538  .24066673 | .35554523
               |      1021       1540 |      2561
    -----------+----------------------+----------
          farm | .42302956  .12105263 |  .4063409
               |  .4941161  .32705022 | .49122109
               |      3248        190 |      3438
    -----------+----------------------+----------
         Total |  .3886156  .06820809 | .29621604
               | .48749275  .25217555 | .45662551
               |      4269       1730 |      5999

Options:
    means      Displays only the means
    standard   Displays only the standard deviations
    freq       Displays only the frequencies
    missing    Treats observations with missing values as a category of their own

Example:
    . replace poor=poor*100
    (1777 real changes made)
    . format poor %4.2f
    . tab reg7 urban98, sum(poor) means

                     Means of poor

               |  1:urban 98; 0:rural
     Code by 7 |          98
       regions |     Rural      Urban |     Total
    -----------+----------------------+----------
       region1 |     61.46       8.02 |     49.83
       region2 |     32.57       5.87 |     23.66
       region3 |     44.83      10.19 |     39.55
       region4 |     37.25      11.51 |     28.65
       region5 |     47.28          . |     47.28
       region6 |     12.45       2.16 |      7.33
       region7 |     35.78      10.28 |     29.32
    -----------+----------------------+----------
         Total |     38.86       6.82 |     29.62

3.12. Summary tables with tabstat
Syntax:
    tabstat <varlist> [weight] [if exp] [in range] [, statistics(statname [...]) by(varname) missing format[(%fmt)]]
This command computes statistics of the variables in the variable list for each value of the categorical variable given in by(varname).
Example:
    . tabstat rlfood rlhhex1, stats(mean median) by(reg7)

    Summary statistics: mean, p50
      by categories of: reg7 (Code by 7 regions)

       reg7 |    rlfood   rlhhex1
    --------+--------------------
    region1 |  5595.556  9560.349
            |  5350.916  8536.373
    --------+--------------------
    region2 |  6419.427  12951.14
            |  5664.145  9997.146
    --------+--------------------
    region3 |  5692.201  10885.38
            |  5369.411  9022.334
    --------+--------------------
    region4 |  6512.576  13525.41
            |  5790.046  11077.51
    --------+--------------------
    region5 |  5894.983  11217.05
            |  5380.505  9421.447
    --------+--------------------
    region6 |  9746.158  23515.01
            |  8428.743  18514.39
    --------+--------------------
    region7 |  6556.616  13068.11
            |  6066.128  11043.99
    --------+--------------------
      Total |  6787.898  14010.74
            |  5951.567  10733.19
    -----------------------------

Options:
    statistics(statname [...])   The statistics to compute for the variable list
    by(varname)                  The categorical variable
    missing                      Treats missing values of the categorical variable as a category
    format[(%fmt)]               The display format of the results

Stata accepts the following statistics in statistics(statname [...]):

    Statistic   Meaning
    mean        Mean
    count       Number of observations
    n           Same as count (number of observations)
    sum         Total
    max         Maximum
    min         Minimum
    range       Range = maximum - minimum
    sd          Standard deviation
    sdmean      Standard deviation of the mean = standard deviation / (number of observations)^0.5
    skewness    Skewness of the distribution
    kurtosis    Kurtosis of the distribution
    median      Median (same as p50)
    p1          1st percentile
    p5          5th percentile
    p10         10th percentile
    p25         25th percentile
    p50         50th percentile (median)
    p75         75th percentile
    p90         90th percentile
    p95         95th percentile
    p99         99th percentile
    iqr         p75 - p25
    q           Equivalent to "p25 p50 p75"

Example:
    . tabstat rlpcex1, stats(mean sd q) by(reg7) format(%5.1f)

    Summary for variables: rlpcex1
         by categories of: reg7 (Code by 7 regions)

       reg7 |      mean        sd       p25       p50       p75
    --------+--------------------------------------------------
    region1 |    2174.8    1265.1    1328.0    1792.1    2710.8
    region2 |    3294.0    2511.9    1816.7    2532.5    3822.0
    region3 |    2503.3    1918.0    1489.7    2001.2    2808.1
    region4 |    2933.7    2260.5    1697.9    2362.2    3471.4
    region5 |    2087.3    1285.4    1217.3    1850.8    2700.5
    region6 |    5257.5    4005.7    2676.7    4154.1    6431.8
    region7 |    2931.1    2137.2    1680.1    2321.9    3414.7
    --------+--------------------------------------------------
      Total |    3188.7    2692.6    1671.1    2397.0    3711.9
    -----------------------------------------------------------
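A comparable tabstat call on the bundled auto.dta is sketched below (the dataset and variable names are assumptions standing in for the VLSS variables). The second command uses columns(statistics), which places the statistics across the top when a single variable is summarized.

```stata
* Sketch: tabstat by group on Stata's bundled auto dataset
sysuse auto, clear
tabstat price mpg, statistics(mean median sd) by(foreign) format(%9.1f)
tabstat price, statistics(n mean q iqr) by(foreign) columns(statistics)
```

Compared with summarize, tabstat lets you pick exactly which statistics appear and lays the groups out in one compact table.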

3.13. Summary tables with table
Syntax:
    table <row variable> [column variable [supercolumn variable]] [if exp] [in range] [weight] [, contents(clist) row col format(%fmt) missing]
This command computes the statistics of the variables listed in contents and displays them as a table in which the rows are defined by the row variable and the columns by the column variable (and the supercolumn variable). The row and column variables are categorical variables.
Example:
    . table reg7 urban98 farm, contents(mean poor)

    ----------------------------------------------------
              | Type of HH (1:farm; 0:nonfarm) and
              |       1:urban 98; 0:rural 98
    Code by 7 | ---- non farm ----    ------ farm ------
    regions   |    Rural     Urban       Rural     Urban
    ----------+-----------------------------------------
      region1 | 19.35484  6.015038     65.7377  12.96296
      region2 | 26.66667  4.624278    33.96524  15.21739
      region3 | 40.98361  10.11236     45.8159  10.52632
      region4 |     21.6  11.63793    42.44032        10
      region5 | 30.76923              49.24012
      region6 | 15.04065  2.195609    10.07463         0
      region7 | 38.62816  10.04184    34.35805  11.62791
    ----------------------------------------------------

Options:
    contents(clist)   Lists the variables and statistics. The statistic names are similar to those of tabstat
    row               Displays a total for the rows
    col               Displays a total for the columns
    format(%fmt)      The display format of the results
    missing           Treats missing values of the categorical variables as a category

Examples:
    . table reg7 urban98 farm, contents(mean poor) row col format(%4.2f)

    ------------------------------------------------------
              | Type of HH (1:farm; 0:nonfarm) and 1:urban
              |             98; 0:rural 98
    Code by 7 | ----- non farm -----   ------- farm ------
    regions   | Rural   Urban   Total   Rural  Urban  Total
    ----------+-------------------------------------------
      region1 | 19.35    6.02   10.26   65.74  12.96  61.45
      region2 | 26.67    4.62   11.29   33.97  15.22  32.70
      region3 | 40.98   10.11   27.96   45.82  10.53  44.47
      region4 | 21.60   11.64   15.13   42.44  10.00  40.81
      region5 | 30.77           30.77   49.24         49.24
      region6 | 15.04    2.20    6.43   10.07   0.00   9.78
      region7 | 38.63   10.04   25.39   34.36  11.63  32.72
              |
        Total | 27.91    6.17   14.84   42.30  12.11  40.63
    ------------------------------------------------------

    . table urban98 farm, contents(mean poor sd poor) row col format(%4.2f)

    ----------------------------------------
    1:urban   |
    98;       |    Type of HH (1:farm;
    0:rural   |        0:nonfarm)
    98        | non farm      farm     Total
    ----------+-----------------------------
        Rural |    27.91     42.30     38.86
              |    44.88     49.41     48.75
              |
        Urban |     6.17     12.11      6.82
              |    24.07     32.71     25.22
              |
        Total |    14.84     40.63     29.62
              |    35.55     49.12     45.66
    ----------------------------------------

    . table urban98 farm, contents(mean rlpcex1 mean rlhhex1) row col format(%4.2f)

    ----------------------------------------
    1:urban   |
    98;       |    Type of HH (1:farm;
    0:rural   |        0:nonfarm)
    98        | non farm      farm     Total
    ----------+-----------------------------
        Rural |  2835.83   2212.12   2361.29
              | 13242.03  10120.89  10867.36
              |
        Urban |  5476.86   3232.17   5230.33
              | 22984.44  11903.19  21767.43
              |
        Total |  4423.95   2268.49   3188.67
              | 19100.41  10219.39  14010.74
    ----------------------------------------

4. Data manipulation

4.1. Creating new variables

Creating a variable with generate
Syntax:
    generate <newvar> = expression [if exp] [in range]
This command creates a new variable whose value equals the value of the expression given.
Example:
    . gen poor = 1 if rlpcex1 < 1790
    (4222 missing values generated)
    . gen nonpoor=1 if rlpcex1 >= 1790
    (1777 missing values generated)

Creating dummy variables with tabulate, generate
Syntax:
    tabulate <varname>, generate(newvar)
The option generate can be combined with tab to create dummy variables. The new variables are named newvar1, newvar2, newvar3, and so on. They are the dummy variables built from the categorical variable.
Example:
    . tab reg7, gen(region)

      Code by 7 |
        regions |      Freq.     Percent        Cum.
    ------------+-----------------------------------
        region1 |        859       14.32       14.32
        region2 |       1175       19.59       33.91
        region3 |        708       11.80       45.71
        region4 |        754       12.57       58.28
        region5 |        368        6.13       64.41
        region6 |       1023       17.05       81.46
        region7 |       1112       18.54      100.00
    ------------+-----------------------------------
          Total |       5999      100.00

    . tab1 region1 region2

    -> tabulation of region1

    reg7==regio |
             n1 |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              0 |       5140       85.68       85.68
              1 |        859       14.32      100.00
    ------------+-----------------------------------
          Total |       5999      100.00

    -> tabulation of region2

    reg7==regio |
             n2 |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              0 |       4824       80.41       80.41
              1 |       1175       19.59      100.00
    ------------+-----------------------------------
          Total |       5999      100.00

Here the variable reg7 has 7 values, from 1 to 7, so 7 dummy variables, region1 to region7, are created. The variable region1 equals 1 if reg7 equals 1, and 0 otherwise. Likewise, region7 equals 1 if reg7 equals 7. In the example above, tabulate, generate is equivalent to the following 7 commands:
    gen region1=(reg7==1)
    gen region2=(reg7==2)
    ...
    gen region7=(reg7==7)

Creating variables with egen
Syntax:
    egen <newvar> = fcn(arguments) [if exp] [in range] [, by(varlist)]
This command creates a new variable from the value of the function given by fcn. The new variable takes the same value for every observation (or for every observation in a group when by() is given). The function can be:
    count(exp)    The number of observations of the expression
    mean(exp)     The mean of the expression
    median(exp)   The median of the expression
    sd(exp)       The standard deviation of the expression
Examples:
    . egen sumexp=sum(rlpcex1)
    . sum sumexp
        Variable |     Obs        Mean   Std. Dev.       Min        Max
    -------------+-----------------------------------------------------
          sumexp |    5999    1.91e+07          0   1.91e+07   1.91e+07

    . egen g=median( food+ nonfood1)
    . sum g
        Variable |     Obs        Mean   Std. Dev.       Min        Max
    -------------+-----------------------------------------------------
               g |    5999     11063.6          0    11063.6    11063.6

For other functions, see help egen.
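The by() option is where egen becomes most useful, because the statistic is then computed within each group and attached to every row of that group. The sketch below uses the bundled auto.dta as an assumed stand-in for the survey data. It uses total(), which current Stata documents in place of the older sum() function used in the example above.

```stata
* Sketch: group-level statistics with egen ... , by()
sysuse auto, clear
egen mean_price = mean(price), by(foreign)   // group mean repeated on every row
egen n_group = count(price), by(foreign)     // nonmissing observations per group
egen total_price = total(price)              // overall total, constant on all rows
generate price_dev = price - mean_price      // deviation from the group mean
list foreign price mean_price price_dev in 1/5
```

Because the group mean is stored on each observation, later commands such as generate can use it directly, which a summary table alone would not allow.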

Replacing the values of a variable
Syntax:
    replace <varname> = expression [if exp] [in range]
This command replaces the values of an existing variable with the new values given by the expression exp.
Examples:
    replace poor=poor*100
    replace pcexp = hhexp/hhsize

Creating a categorical variable with encode
Syntax:
    encode <varname> [if exp] [in range], generate(newvar)
This command creates a new numeric categorical variable whose values correspond to the values of the string variable named by varname (ordered alphabetically).
Example:
    . gen str15(mucsong) = "Kha"
    . drop mucsong
    . gen mucsong="Rat ngheo"
    type mismatch
    r(109);
    . gen str15(mucsong)="Rat ngheo"
    . replace mucsong="Ngheo" if rlpcex1<1790 & rlpcex1>1290
    (1087 real changes made)
    . replace mucsong="Khong ngheo" if rlpcex1>=1790
    (4222 real changes made)
    . tab mucsong

            mucsong |      Freq.     Percent        Cum.
    ----------------+-----------------------------------
        Khong ngheo |       4222       70.38       70.38
              Ngheo |       1087       18.12       88.50
          Rat ngheo |        690       11.50      100.00
    ----------------+-----------------------------------
              Total |       5999      100.00

    . sum mucsong
        Variable |     Obs        Mean   Std. Dev.       Min        Max
    -------------+-----------------------------------------------------
         mucsong |       0

    . encode mucsong, gen(ma_ms)
    . tab ma_ms

          ma_ms |      Freq.     Percent        Cum.
    ------------+-----------------------------------
    Khong ngheo |       4222       70.38       70.38
          Ngheo |       1087       18.12       88.50
      Rat ngheo |        690       11.50      100.00
    ------------+-----------------------------------
          Total |       5999      100.00

    . sum ma_ms
        Variable |     Obs        Mean   Std. Dev.       Min        Max
    -------------+-----------------------------------------------------
           ma_ms |    5999    1.411235   .6871957          1          3
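The same encode workflow can be reproduced on the bundled auto.dta. In this sketch the variable pricegroup and the cutoff 6000 are made up for illustration. The nolabel option reveals the numeric codes that encode assigned in alphabetical order.

```stata
* Sketch: from a string variable to a labeled numeric variable
sysuse auto, clear
generate str12 pricegroup = "Low"
replace pricegroup = "High" if price >= 6000
summarize pricegroup                 // string variable: reports 0 observations
encode pricegroup, generate(pricegroup_code)
tabulate pricegroup_code             // shows the labels "High" and "Low"
tabulate pricegroup_code, nolabel    // shows the underlying codes 1 and 2
summarize pricegroup_code            // numeric, so statistics are now available
```

Since the codes follow alphabetical order, "High" becomes 1 and "Low" becomes 2 here; when a particular ordering matters, it is worth checking the codes with nolabel before using them.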

Creating variables with xtile
Syntax:
    xtile <newvar> = expression [weight] [if exp] [in range] [, nquantiles(#)]
This command creates a variable that groups the values of the expression by quantile. The option nquantiles(#) gives the number of quantiles.
Example: create expenditure quintiles.
    . xtile quinexp= rlpcex1, nq(5)
    . tab quinexp

    5 quantiles |
     of rlpcex1 |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              1 |       1200       20.00       20.00
              2 |       1200       20.00       40.01
              3 |       1200       20.00       60.01
              4 |       1200       20.00       80.01
              5 |       1199       19.99      100.00
    ------------+-----------------------------------
          Total |       5999      100.00

    . tab quinexp, sum( rlpcex1)

                |  Summary of comp.M&Reg price adj.pc
    5 quantiles |              tot exp
     of rlpcex1 |        Mean   Std. Dev.       Freq.
    ------------+------------------------------------
              1 |   1184.3975   261.20537        1200
              2 |   1803.6331   151.66604        1200
              3 |   2408.4867    211.5407        1200
              4 |   3390.1065   403.08913        1200
              5 |    7160.021   3690.3672        1199
    ------------+------------------------------------
          Total |   3188.6671   2692.5673        5999
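A quick way to get a feel for xtile is to build quartiles on the bundled auto.dta (an assumed stand-in for the expenditure data) and then summarize the source variable within each group.

```stata
* Sketch: quartiles of price and a summary within each quartile
sysuse auto, clear
xtile price_q = price, nquantiles(4)    // 1 = lowest quarter, 4 = highest
tabulate price_q, summarize(price)      // mean, sd, and count per quartile
```

The group means rise from one quantile to the next, and the group sizes are as equal as ties in the data allow.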

4.2. Renaming variables
Syntax:
    rename <old name> <new name>
This command changes the old name of a variable to a new name.
Examples:
    rename poor nguoingheo
    rename rpcexp1 chitieu

4.3. Dropping variables and observations
Syntax:
    drop <varlist>
        Drops the variables in the variable list.
    drop <if exp>
        Drops the observations that satisfy the condition.
    drop <in range> [if exp]
        Drops the observations in the range (which may also have to satisfy the condition).
    keep <varlist>
        Keeps the variables in the variable list; variables not listed are dropped.
    keep <if exp>
        Keeps the observations that satisfy the condition; all other observations are dropped.
    keep <in range> [if exp]
        Keeps the observations in the range (which may also have to satisfy the condition); all other observations are dropped.
Examples:
    drop poor urban98    Drops the 2 variables poor and urban98
    drop if sex==1       Drops the observations in which sex equals 1
    drop in 1/20         Drops observations 1 through 20
    keep househol        Keeps only the variable househol; the other variables are dropped
    keep in f/50         Keeps the first 50 observations; the other observations are dropped

4.4. Changing the values of a categorical variable
Syntax:
    recode <varname> old value = new value [if exp] [in range]
This command changes the values of a categorical variable according to the rules given after the variable name.
Examples:
    . recode sex 0=1
    (0 changes made)
    . recode sex . = 0
    (0 changes made)
    . recode hhsize 1/5=1 6/10 = 2 * = 3
    (5785 changes made)
    . tab hhsize

      Household |
           size |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              1 |       4164       69.41       69.41
              2 |       1786       29.77       99.18
              3 |         49        0.82      100.00
    ------------+-----------------------------------
          Total |       5999      100.00

    . tab urban98

    1:urban 98; |
     0:rural 98 |      Freq.     Percent        Cum.
    ------------+-----------------------------------
          Rural |       4269       71.16       71.16
          Urban |       1730       28.84      100.00
    ------------+-----------------------------------
          Total |       5999      100.00

    . recode urban98 0=1 1=0
    (5999 changes made)
    . tab urban98

    1:urban 98; |
     0:rural 98 |      Freq.     Percent        Cum.
    ------------+-----------------------------------
          Rural |       1730       28.84       28.84
          Urban |       4269       71.16      100.00
    ------------+-----------------------------------
          Total |       5999      100.00
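In the last table the value labels stayed attached to the numeric codes, so after the swap the label "Rural" marks the households that were originally urban. One way to avoid overwriting the original variable is the generate() option of recode, sketched here on the bundled auto.dta (the variable rep3 and the grouping are illustrative assumptions, and the parenthesized rule syntax is the form documented in current Stata).

```stata
* Sketch: recode into a new variable and keep the original untouched
sysuse auto, clear
recode rep78 (1/2 = 1) (3 = 2) (4/5 = 3), generate(rep3)
tabulate rep78 rep3, missing      // cross-check old values against new ones
```

Cross-tabulating the old and new variables is a simple check that every rule did what was intended, including how missing values were treated.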

4.5. Labeling variables

Labeling a variable

Syntax:
    label variable varname "variable label"

This command attaches a label, given as a string of characters, to a variable.

Example:

. gen ngheo=poor
. des ngheo

              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
ngheo           float  %9.0g

. tab ngheo

      ngheo |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |       4222       70.38       70.38
          1 |       1777       29.62      100.00
------------+-----------------------------------
      Total |       5999      100.00

. label var ngheo "Nguoi co thu nhap duoi chuan ngheo"
. tab ngheo

   Nguoi co |
   thu nhap |
 duoi chuan |
      ngheo |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |       4222       70.38       70.38
          1 |       1777       29.62      100.00
------------+-----------------------------------
      Total |       5999      100.00

. des ngheo

              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
ngheo           float  %9.0g                  Nguoi co thu nhap duoi chuan ngheo


Labeling the values of a categorical variable

Syntax:
    label define lblname # "label" [# "label" ...] [, add modify]
    label dir
    label list [lblname ...]
    label drop {lblname [lblname ...] | _all}
    label values varname [lblname]

The label define command creates a set of labels for numeric values. The name of the label set follows the keyword define, # is a numeric value, and "label" is the text that corresponds to that value. Two options are available: add appends new values and their labels to an existing label set, and modify allows the values and labels of an existing set to be changed. The label dir command lists the label sets currently defined, and label list displays the contents of the named label set. The label drop command deletes existing label sets.

Example: create a label set named nngheo in which 1 means poor (Ngheo) and 0 means not poor (Khong ngheo).

. label define nngheo 0 "Khong ngheo" 1 "Ngheo"
. label dir
nngheo
region
loaiho
diploma
urban
agegroup
. label list nngheo
nngheo:
           0 Khong ngheo
           1 Ngheo
. label drop _all
. label dir

The label values command attaches the labels of a label set to the numeric values of a categorical variable. A set that has been deleted with label drop must be defined again before its labels can be displayed.

Example:

. tab ngheo

      ngheo |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |       4222       70.38       70.38
          1 |       1777       29.62      100.00
------------+-----------------------------------
      Total |       5999      100.00

. list ngheo in 1/5

         ngheo
  1.         1
  2.         0
  3.         1
  4.         1
  5.         0

. label values ngheo nngheo
. tab ngheo

      ngheo |      Freq.     Percent        Cum.
------------+-----------------------------------
Khong ngheo |       4222       70.38       70.38
      Ngheo |       1777       29.62      100.00
------------+-----------------------------------
      Total |       5999      100.00

. list ngheo in 1/5

            ngheo
  1.        Ngheo
  2.  Khong ngheo
  3.        Ngheo
  4.        Ngheo
  5.  Khong ngheo
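The labeling commands of this section can be combined in one short sequence. The sketch below uses hypothetical toy data; the label-set name poorlbl and the extra code 9 are assumptions made for the illustration only.

```stata
* Hypothetical toy data: poverty indicator for 5 households
clear
input poor
1
0
1
1
0
end

* Variable label: describes what the variable measures
label variable poor "Household below the poverty line"

* Value labels: define the set, then attach it to the variable
label define poorlbl 0 "Not poor" 1 "Poor"
label values poor poorlbl

* The add option extends an existing set with a new code
label define poorlbl 9 "Unknown", add

label list poorlbl
tabulate poor
```

Defining the set and attaching it are separate steps, so one label set can be attached to several variables that share the same coding.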

4.6. Sorting data

Syntax:
    sort varlist [in range]
    gsort [+|-]varname [[+|-]varname [...]]

The sort command arranges the observations in ascending order of the values of the variables in the variable list. The gsort command sorts by each listed variable in ascending order if it is preceded by + (the default) or in descending order if it is preceded by -.

Example:
    sort reg7 hhsize
        Sorts the observations in ascending order of the region variable reg7; within each region the observations are sorted in ascending order of household size hhsize.
    gsort reg7 -hhsize
        Sorts the observations in ascending order of the region variable reg7, but within each region the observations are sorted in descending order of household size hhsize.
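The difference between ascending and descending keys is easiest to see on a few rows. This is a minimal sketch with hypothetical values; region and hhsize here are toy variables, not the survey variables.

```stata
* Hypothetical toy data: 5 households in 2 regions
clear
input region hhsize
2 4
1 3
1 7
2 9
1 5
end

* Ascending by region, descending by household size within region
gsort region -hhsize

* The largest household of each region now comes first in its block
list
```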

4.7. Combining data

Collapsing data: the collapse command

Syntax:
    collapse stat_expression [weight] [if condition] [in range] [, by(varlist)]

where stat_expression is a list of statistics and the variables they apply to. The statistics are denoted as in Section 3.12 of this chapter. The collapse command creates a new dataset that contains the listed variables, with values computed according to the corresponding statistic. The observations of the original dataset are grouped by the values of the variables named in by(varlist).

Example: we have a data file on the income and expenditure of household members:

    ma_tv   ma_ho   thunhap   chitieu
        1     101       200       500
        2     101      1200       400
        3     101         0       200
        4     101         0       200
        1     102      3200       500
        2     102      1200       320
        3     102       200       200
        1     103       300       500
        2     103      2100       250
        3     103         0       300
        4     103         0       300
        1     104      4300       800
        2     104      3500       500
        3     104       300       500
        4     104         0       300
        5     104         0       200
        6     104         0       200

We use collapse to create a file of mean income and mean expenditure per household, and add a variable for household size.

. gen quimo=1
. collapse (mean) thunhap (mean) chitieu (sum) quimo, by(ma_ho)

The new dataset looks like this:

    ma_ho   thunhap   chitieu   quimo
      101       350       325       4
      102   1533.33       340       3
      103       600     337.5       4
      104      1350   416.667       6

Merging data: the merge command

Syntax:
    merge [varlist] using filename [, update replace]

The merge command joins the observations of the dataset currently open in Stata (the master dataset) with the corresponding observations of another dataset named after the keyword using (the using dataset) to form one new dataset. The variables named in the variable list are called identifying variables, and both datasets must be sorted on them with sort (or gsort) before merge is run.

Example: we have the following two datasets.

thunhap.dta

    ma_ho   thunhap   chitieu   quimo
      101       350       325       4
      102   1533.33       340       3
      103       600     337.5       4
      104      1350   416.667       6

dialy.dta

    ma_ho   thanhthi   vung
      204          0      1
      102          1      4
      103          0      3
      104          0      6

The merge is carried out as follows:

. use "C:\dialy.dta", clear
. sort ma_ho
. save "C:\dialy.dta", replace
file C:\dialy.dta saved
. use "C:\thunhap.dta", clear
. sort ma_ho
. merge ma_ho using "C:\dialy.dta"
ma_ho was byte now int
. edit

The resulting dataset looks like this:

    ma_ho   thunhap   chitieu   quimo   thanhthi   vung   _merge
      101       350       325       4          .      .        1
      102   1533.33       340       3          1      4        3
      103       600     337.5       4          0      3        3
      104      1350   416.667       6          0      6        3
      204         .         .       .          0      1        2

The result contains an additional variable named _merge, which takes the following values:

    _merge==1   the observation comes from the master dataset only
    _merge==2   the observation comes from the using dataset only
    _merge==3   the observation comes from both the master and the using dataset

Options: when the two datasets contain variables with the same name, the following options control how the data are handled:
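The collapse and merge steps of this section can be run end to end without files on disk. The sketch below re-enters the example data by hand and stores the geography file in a temporary file. It is an illustration under two assumptions: it is run as a do-file (tempfile names are local macros), and it uses the `merge 1:1` form of later Stata releases, which does not require sorting, instead of the `merge ma_ho using` form shown in this text.

```stata
* Geography file, values copied from the dialy.dta example
clear
input ma_ho thanhthi vung
204 0 1
102 1 4
103 0 3
104 0 6
end
tempfile dialy
save `dialy'

* Member-level income and expenditure records
clear
input ma_tv ma_ho thunhap chitieu
1 101  200 500
2 101 1200 400
3 101    0 200
4 101    0 200
1 102 3200 500
2 102 1200 320
3 102  200 200
1 103  300 500
2 103 2100 250
3 103    0 300
4 103    0 300
1 104 4300 800
2 104 3500 500
3 104  300 500
4 104    0 300
5 104    0 200
6 104    0 200
end

* One row per household: mean income, mean expenditure, member count
generate quimo = 1
collapse (mean) thunhap chitieu (sum) quimo, by(ma_ho)

* Household file is the master, geography file is the using dataset
merge 1:1 ma_ho using `dialy'
sort ma_ho
list
```

Tabulating _merge after the merge is a quick way to see how many households matched and how many appear in only one of the two files.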


    update     If a shared variable has a missing value in the master dataset, the missing value is filled in with the value of the same variable from the using dataset.
    replace    The value of a shared variable in the master dataset is replaced by the value of the same variable from the using dataset (specified together with update).

If neither option is specified, the values of the master dataset's variables are left unchanged by default.

Appending data: the append command

Syntax:
    append using filename

This command appends the dataset named after using to the dataset currently open, matching variables that have the same name and format. The number of observations in the new dataset equals the sum of the observations of the two datasets.

Example: the dataset thunhap2.dta is as follows:

    ma_ho   thunhap   chitieu   gioitinh
      105      1350       425          1
      106      1500       370          0
      107       800       556          0
      108      1500       417          0
      109      2500       540          1

The two datasets are joined with append as follows:

. use "C:\thunhap.dta", clear
. append using "C:\thunhap2.dta"
. edit

The resulting dataset looks like this:

    ma_ho   thunhap   chitieu   quimo   gioitinh
      101       350       325       4          .
      102   1533.33       340       3          .
      103       600     337.5       4          .
      104      1350   416.667       6          .
      105      1350       425       .          1
      106      1500       370       .          0
      107       800       556       .          0
      108      1500       417       .          0
      109      2500       540       .          1

Note: see also the expand command, which creates duplicate observations.

4.8. Reshaping data

Syntax:
    reshape wide stubnames, i(varlist) [j(varname [values]) ...]
    reshape long stubnames, i(varlist) [j(varname [values]) ...]
    reshape wide
    reshape long

This command converts data from wide form to long form (reshape long) and from long form to wide form (reshape wide). i(varlist) names the identifying variables that distinguish the observations in wide form (the level-1 observations). j(varname) names the variable that distinguishes the level-2 observations in long form.

Example 1: data in wide form can be pictured as a matrix:

    -i-            ------------- x_ij -------------
    maho   quimo   thunhap95   thunhap96   thunhap97
     101       5        4500        4400        5400
     102       4        3400        3300        3700
     103       6        5000        5400        5500

These data are converted to long form as follows:

    -i-            -j-   -x_ij-
    maho   quimo   nam   thunhap
     101       5    95      4500
     101       5    96      4400
     101       5    97      5400
     102       4    95      3400
     102       4    96      3300
     102       4    97      3700
     103       6    95      5000
     103       6    96      5400
     103       6    97      5500

The reshape command is written as follows:

. reshape long thunhap, i(maho) j(nam)
(note: j = 95 96 97)

Data                               wide   ->   long
---------------------------------------------------------------------
Number of obs.                        3   ->       9
Number of variables                   5   ->       4
j variable (3 values)                     ->   nam
xij variables:
      thunhap95 thunhap96 thunhap97       ->   thunhap
---------------------------------------------------------------------

* And convert back from long form to wide form as follows
. reshape wide thunhap, i(maho) j(nam)
(note: j = 95 96 97)

Data                               long   ->   wide
---------------------------------------------------------------------
Number of obs.                        9   ->       3
Number of variables                   4   ->       5
j variable (3 values)               nam   ->   (dropped)
xij variables:
                            thunhap       ->   thunhap95 thunhap96 thunhap97
---------------------------------------------------------------------

Example 2: we have the following table:

    maho   sotien1   nguon1        sotien2   nguon2
     101      1200   Ngan hang A      2000   Ngan hang A
     102      1300   Ngan hang B         .   .
     103      2500   Ngan hang A      1000   Ngan hang C
     104      3000   Ngan hang A      2000   Ngan hang B

This table is converted to long form as follows:

. reshape long sotien nguon, i(maho) j(lanvay)
(note: j = 1 2)

Data                               wide   ->   long
---------------------------------------------------------------------
Number of obs.                        4   ->       8
Number of variables                   5   ->       4
j variable (2 values)                     ->   lanvay
xij variables:
                    sotien1 sotien2       ->   sotien
                      nguon1 nguon2       ->   nguon
---------------------------------------------------------------------

The long-form table looks like this:

    maho   lanvay   sotien   nguon
     101        1     1200   Ngan hang A
     101        2     2000   Ngan hang A
     102        1     1300   Ngan hang B
     102        2        .   .
     103        1     2500   Ngan hang A
     103        2     1000   Ngan hang C
     104        1     3000   Ngan hang A
     104        2     2000   Ngan hang B
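Long form is convenient when a calculation runs across years within a household. The sketch below re-enters the data of Example 1, reshapes them to long form, computes the year-on-year change in income, and then returns to wide form. The variable name thaydoi is an assumption introduced for the illustration.

```stata
* Wide-form data from Example 1
clear
input maho quimo thunhap95 thunhap96 thunhap97
101 5 4500 4400 5400
102 4 3400 3300 3700
103 6 5000 5400 5500
end

* Wide -> long: one row per household and year
reshape long thunhap, i(maho) j(nam)

* Change in income from the previous year within each household
bysort maho (nam): generate thaydoi = thunhap - thunhap[_n-1]
list

* Drop the helper variable, then go back: long -> wide
drop thaydoi
reshape wide thunhap, i(maho) j(nam)
list
```

The helper variable is dropped before reshaping back because a variable that changes within maho and is not listed as a stub would stop reshape wide.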

5. Weights in the VHLSS (Weight)

5.1. Weights in sample surveys

In a sample survey the observations are selected at random, but they usually have different probabilities of selection. The weight of an observation is the inverse of its probability of being selected into the sample. If observation i has weight w_i, then observation i in the sample represents w_i units of the population. Estimates and inference about the population must take the sampling weights into account; otherwise the results will be biased.

Example: suppose the Red River Delta consists of two provinces, Ha Noi and Nam Dinh, with populations of 4.5 million and 500 thousand respectively. We want to draw a random sample of 500 observations to study income in the Red River Delta as a whole and in each of the two provinces. If the sample followed the population shares of the two provinces, we would obtain 450 observations in Ha Noi and 50 in Nam Dinh. However, because the sample is drawn at random over the whole region, it is possible to obtain a sample with no observations from Nam Dinh at all, or with very few. So that the sample is representative of each province, it is better to select 400 observations in Ha Noi and 100 in Nam Dinh.

If mean income is 900 thousand/month in Ha Noi and 300 thousand/month in Nam Dinh, the mean income of the Red River Delta cannot be computed as (900 + 300)/2, because the observations in the sample were not selected in proportion to the size of the provinces. Each observation in Ha Noi represents 11250 units in the region (4500000/400). This is the weight of the observation, equal to the inverse of its probability of being selected into the sample. Each observation in Nam Dinh represents 5000 units in the region (500000/100). Mean income in the Red River Delta is then computed as:

\[
\text{Income} = \frac{900 \times 400 \times 11250 + 300 \times 100 \times 5000}{400 \times 11250 + 100 \times 5000} = 840
\]
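The arithmetic of the two-province example can be checked in Stata. The sketch below enters one row per province with the figures from the example; the variable names (tinh, thunhap, mau, danso, quyenso, tongqs) are assumptions made for the illustration. It uses analytic weights only as a convenient way to obtain a weighted mean of two province means.

```stata
* One row per province: mean income, sample size, population size
clear
input str8 tinh thunhap mau danso
"Ha Noi"   900 400 4500000
"Nam Dinh" 300 100  500000
end

* Weight of one sampled observation = population / sample size
generate quyenso = danso / mau

* Population represented by all sampled observations of the province
generate tongqs = mau * quyenso

* Unweighted sample mean: every sampled observation counts once
summarize thunhap [fweight = mau]
display "Unweighted sample mean: " r(mean)

* Weighted mean: every sampled observation counts quyenso times
summarize thunhap [aweight = tongqs]
display "Weighted mean: " r(mean)
```

The unweighted sample mean is pulled toward Nam Dinh because that province is over-represented in the sample, while the weighted mean restores the population shares.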

VLSS 1998 has two weights. The first is the household weight, the variable wt, which is the number of households in Vietnam that each sample household represents. The second is the household-member weight, hhsizewt, which is the number of people in Vietnam that each sample household represents. The household-member weight equals the household weight multiplied by household size.

Example: weights in VLSS 1998.

. tab reg7, sum(wt)

  Code by 7 |      Summary of sample weight
    regions |        Mean   Std. Dev.       Freq.
------------+------------------------------------
    region1 |   3218.4296   850.74246         859
    region2 |   3133.7277   849.12325        1175
    region3 |   3185.1794   801.74266         708
    region4 |     2199.37   492.37202         754
    region5 |   1336.3098   269.14747         368
    region6 |   1963.8964   528.69328        1023
    region7 |   2938.2122   547.72125        1112
------------+------------------------------------
      Total |   2688.5003   900.01379        5999

. tab reg7, sum(hhsizewt)

  Code by 7 |        Summary of =hhsize*wt
    regions |        Mean   Std. Dev.       Freq.
------------+------------------------------------
    region1 |   15790.857   7555.7552         859
    region2 |   12656.003   5970.9089        1175
    region3 |   14814.504   7236.7592         708
    region4 |   10794.537    5235.562         754
    region5 |    7564.731   3185.9336         368
    region6 |   9447.7077   4535.0816        1023
    region7 |   14653.702   6639.8297        1112
------------+------------------------------------
      Total |   12636.546   6597.6574        5999

. di 2688.5003*5999
16128313
. di 12636.546*5999
75806639

Multiplying the mean weight by the number of sample households gives the totals the sample represents: about 16.1 million households and about 75.8 million people.


5.2. Weight options

Stata allows the following four types of weights:

    fweights   Frequency weights. Stata treats the weight as the number of times each observation is repeated in the calculation.
    pweights   Sampling weights. Stata treats the weight as the inverse of the probability of selection into the sample, that is, the number of population units that each sample observation represents.
    aweights   Analytic weights. Stata treats the weight as inversely proportional to the variance of the observation.
    iweights   Importance weights. These weights indicate the importance of each observation.

For the living standards survey, commands are used with pweights and fweights.

Example:

. sum poor

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
        poor |    5999     29.6216   45.66255          0        100

. sum poor [fw=hhsize]

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
        poor |   28509    34.17517   47.43051          0        100

. tab reg7 urban98

           |  1:urban 98; 0:rural
 Code by 7 |          98
   regions |     Rural      Urban |     Total
-----------+----------------------+----------
   region1 |       672        187 |       859
   region2 |       783        392 |      1175
   region3 |       600        108 |       708
   region4 |       502        252 |       754
   region5 |       368          0 |       368
   region6 |       514        509 |      1023
   region7 |       830        282 |      1112
-----------+----------------------+----------
     Total |      4269       1730 |      5999

. tab reg7 urban98 [fw=hhsizewt]

           |  1:urban 98; 0:rural
 Code by 7 |          98
   regions |     Rural      Urban |     Total
-----------+----------------------+----------
   region1 |  11993763    1570583 |  13564346
   region2 |  11057932    3812871 |  14870803
   region3 |   9582621     906048 |  10488669
   region4 |   5618709    2520372 |   8139081
   region5 |   2783821          0 |   2783821
   region6 |   4545303    5119702 |   9665005
   region7 |  13220727    3074190 |  16294917
-----------+----------------------+----------
     Total |  58802876   17003766 |  75806642

. tab reg7 urban98, sum(hhsize) means

                  Means of Household size

           |  1:urban 98; 0:rural
 Code by 7 |          98
   regions |     Rural      Urban |     Total
-----------+----------------------+----------
   region1 | 5.1205357  3.7326203 | 4.8183935
   region2 |  4.045977  4.0459184 | 4.0459574
   region3 | 4.6666667  4.6759259 | 4.6680791
   region4 | 4.8027888  5.1190476 | 4.9084881
   region5 | 5.7065217          . | 5.7065217
   region6 | 5.0719844  4.7131631 | 4.8934506
   region7 | 5.1373494  4.3971631 | 4.9496403
-----------+----------------------+----------
     Total | 4.8702272  4.4612717 |  4.752292

. tab reg7 urban98 [fw=wt], sum(hhsize) means

      Means and Number of Observations of Household size

           |  1:urban 98; 0:rural
 Code by 7 |          98
   regions |     Rural      Urban |     Total
-----------+----------------------+----------
   region1 | 5.1328749  3.6698008 | 4.9063857
           |   2336656     427975 |   2764631
-----------+----------------------+----------
   region2 | 4.0564115   3.987975 | 4.0386415
           |   2726038     956092 |   3682130
-----------+----------------------+----------
   region3 | 4.6508908  4.6530097 | 4.6510738
           |   2060384     194723 |   2255107
-----------+----------------------+----------
   region4 | 4.8136253   5.132367 | 4.9080132
           |   1167251     491074 |   1658325
-----------+----------------------+----------
   region5 | 5.6609112          . | 5.6609112
           |    491762          0 |    491762
-----------+----------------------+----------
   region6 | 5.0486426  4.6174858 | 4.8106956
           |    900302    1108764 |   2009066
-----------+----------------------+----------
   region7 | 5.1494132  4.3925283 | 4.9872852
           |   2567424     699868 |   3267292
-----------+----------------------+----------
     Total | 4.8003065  4.3841133 | 4.7002214
           |  12249817    3878496 |  16128313

. table reg7 urban98, c(mean poor) col row format(%4.1f)

------------------------------
          | 1:urban 98; 0:rural
Code by 7 |         98
regions   | Rural  Urban  Total
----------+-------------------
  region1 |  61.5    8.0   49.8
  region2 |  32.6    5.9   23.7
  region3 |  44.8   10.2   39.5
  region4 |  37.3   11.5   28.6
  region5 |  47.3          47.3
  region6 |  12.5    2.2    7.3
  region7 |  35.8   10.3   29.3
          |
    Total |  38.9    6.8   29.6
------------------------------

. table reg7 urban98 [pw=hhsizewt], c(mean poor) col row format(%4.1f)

------------------------------
          | 1:urban 98; 0:rural
Code by 7 |         98
regions   | Rural  Urban  Total
----------+-------------------
  region1 |  65.2    8.3   58.6
  region2 |  36.1    7.0   28.7
  region3 |  51.3   14.3   48.1
  region4 |  43.6   16.6   35.2
  region5 |  52.4          52.4
  region6 |  13.0    2.9    7.6
  region7 |  42.0   15.3   36.9
          |
    Total |  45.5    9.2   37.4
------------------------------
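The definition of a frequency weight, the number of times an observation is repeated in the calculation, can be verified directly: a summary with fweight should match the same summary on data in which each row has been physically duplicated. The sketch below uses hypothetical toy values and an invented scalar name m_fw.

```stata
* Hypothetical toy data: poverty indicator and household size
clear
input poor hhsize
1 5
0 3
1 6
0 2
end

* Person-level poverty rate using household size as a frequency weight
summarize poor [fweight = hhsize]
scalar m_fw = r(mean)
display "Mean with fweight: " m_fw

* Same calculation after duplicating each household hhsize times
expand hhsize
summarize poor
display "Mean after expand: " r(mean)
```

Both summaries rest on 16 person-level records, which is why frequency weights must be whole numbers.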

Chapter III: Hypothesis testing and regression analysis

1. Estimation and hypothesis testing (Estimation and hypothesis testing)

1.1. Estimating the mean with a confidence interval

Syntax:
    ci [varlist] [weight] [if condition] [in range] [, level(#) binomial poisson exposure(varname) total]


This command computes the standard error and the confidence interval for the sample mean under the normal, binomial, and Poisson distributions.

Options:
    level(#)            Sets the confidence level for the interval estimate. # takes values from 10 to 99; the default is 95.
    binomial            Applies the binomial distribution.
    poisson             Applies the Poisson distribution.
    exposure(varname)   Applies to the Poisson distribution; varname is the exposure variable (usually time or area) over which the events counted by the variables in the variable list occurred.
    total               Used with the by prefix; requests a confidence interval for all groups combined.

Example:

. ci poor

    Variable |     Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+-------------------------------------------------------------
        poor |    5999     29.6216    .5895501        28.46587    30.77733

. sort reg7
. by reg7: ci poor, total

-> reg7 = region1
    Variable |     Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+-------------------------------------------------------------
        poor |     859    49.82538    1.706961        46.47507    53.17569

-> reg7 = region2
    Variable |     Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+-------------------------------------------------------------
        poor |    1175    23.65957    1.240357        21.22601    26.09314

-> reg7 = region3
    Variable |     Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+-------------------------------------------------------------
        poor |     708    39.54802    1.838899        35.93767    43.15838

-> reg7 = region4
    Variable |     Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+-------------------------------------------------------------
        poor |     754    28.64721     1.64759         25.4128    31.88163

-> reg7 = region5
    Variable |     Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+-------------------------------------------------------------
        poor |     368    47.28261    2.606121         42.1578    52.40741

-> reg7 = region6
    Variable |     Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+-------------------------------------------------------------
        poor |    1023    7.331378    .8153306        5.731465    8.931292

-> reg7 = region7
    Variable |     Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+-------------------------------------------------------------
        poor |    1112    29.31655    1.365709        26.63689    31.99621

-> Total
    Variable |     Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+-------------------------------------------------------------
        poor |    5999     29.6216    .5895501        28.46587    30.77733
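The interval for the whole sample can be retraced by hand. The worked equation below assumes the usual t-based interval for a mean and takes the standard deviation 45.66255 from the sum poor output shown in the section on weights; the critical value 1.9604 is backed out from the reported bounds, not quoted from the text.

```latex
\[
SE(\bar{x}) = \frac{s}{\sqrt{n}} = \frac{45.66255}{\sqrt{5999}} \approx 0.58955,
\qquad
\bar{x} \pm t_{n-1,\,0.975}\, SE(\bar{x})
  = 29.6216 \pm 1.9604 \times 0.58955
  \approx [\,28.466,\ 30.777\,]
\]
```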

Note: the estimation commands can also be used when only the summary statistics of the sample are known. These are called commands using immediate arguments (Commands using immediate arguments). They are very useful when the original data on the variable are not available.

    cii #obs #mean #sd [, level(#)]               (normal distribution)
    cii #obs #succ [, level(#)]                    (binomial distribution)
    cii #exposure #events, poisson [level(#)]      (Poisson distribution)

#obs is the number of observations, and #succ is the number of times the variable takes the value that corresponds to a successful trial (usually the value 1).

Example:

. cii 5999 1777, level(90)

                                                         -- Binomial Exact --
    Variable |     Obs        Mean    Std. Err.       [90% Conf. Interval]
-------------+-------------------------------------------------------------
             |    5999     .296216     .005895        .2865107    .3060676

. cii 12 27, poisson

                                                         -- Poisson  Exact --
    Variable | Exposure       Mean    Std. Err.       [95% Conf. Interval]
-------------+-------------------------------------------------------------
             |      12        2.25    .4330127        1.483144    3.273587

1.2. Statistical hypothesis testing

1.2.1. Tests on the sample mean

Zero-one distribution

Syntax:
    prtest varname = # [if condition] [in range] [, level(#)]

This command tests a hypothesis about the proportion for a variable that follows the zero-one distribution (Ho: p = p0).

Example:

. prtest poor=0.44 if reg7==1

One-sample test of proportion                   poor: Number of obs =      859
------------------------------------------------------------------------------
Variable |       Mean   Std. Err.        z     P>|z|     [95% Conf. Interval]
---------+--------------------------------------------------------------------
    poor |   .4982538   .0170597   29.2065    0.0000     .4648174    .5316901
------------------------------------------------------------------------------
                      Ho: proportion(poor) = .44

     Ha: poor < .44           Ha: poor ~= .44            Ha: poor > .44
       z =  3.440                z =  3.440                 z =  3.440
   P < z = 0.9997          P > |z| = 0.0006            P > z = 0.0003

    prtest varname1 = varname2 [if condition] [in range] [, level(#)]

This command tests the hypothesis that the proportions of the two named variables are equal (Ho: pX = pY).

Example: test whether the poverty rate differs between region 2 and region 4.

. gen poor2=poor if reg7==2
(4824 missing values generated)
. gen poor4=poor if reg7==4
(5245 missing values generated)
. prtest poor2 = poor4

Two-sample test of proportion                  poor2: Number of obs =     1175
                                               poor4: Number of obs =      754
------------------------------------------------------------------------------
Variable |       Mean   Std. Err.        z     P>|z|     [95% Conf. Interval]
---------+--------------------------------------------------------------------
   poor2 |   .2365957   .0123983   19.0829    0.0000     .2122955    .2608959
   poor4 |   .2864721    .016465   17.3989    0.0000     .2542014    .3187429
---------+--------------------------------------------------------------------
    diff |  -.0498764    .020611                        -.0902732   -.0094796
         |  under Ho:   .0203666  -2.44893    0.0143
------------------------------------------------------------------------------
             Ho: proportion(poor2) - proportion(poor4) = diff = 0

     Ha: diff < 0              Ha: diff ~= 0              Ha: diff > 0
       z = -2.449                z = -2.449                 z = -2.449
   P < z = 0.0072          P > |z| = 0.0143            P > z = 0.9928
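The z statistic in the "under Ho" row can be reproduced from the two sample proportions. The worked equation below assumes the standard pooled-proportion form of the two-sample test; the symbols are introduced here for the illustration and do not appear in the text.

```latex
\[
\hat{p} = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2}
        = \frac{1175 \times 0.2365957 + 754 \times 0.2864721}{1929}
        \approx 0.25609
\]
\[
z = \frac{\hat{p}_1 - \hat{p}_2}
         {\sqrt{\hat{p}\,(1-\hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}
  = \frac{-0.0498764}{0.0203666}
  \approx -2.449
\]
```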

    prtest varname [if condition] [in range], by(groupvar) [level(#)]

This command tests the hypothesis that the proportions of the two groups defined by the grouping variable are equal (Ho: pX1 = pX2).

Example:

. prtest poor, by(sex)

Two-sample test of proportion                      1: Number of obs =     4375
                                                   2: Number of obs =     1624
------------------------------------------------------------------------------
Variable |       Mean   Std. Err.        z     P>|z|     [95% Conf. Interval]
---------+--------------------------------------------------------------------
       1 |      .3248     .00708   45.8755    0.0000     .3109234    .3386766
       2 |   .2192118   .0102661    21.353    0.0000     .1990906     .239333
---------+--------------------------------------------------------------------
    diff |   .1055882   .0124708                         .0811459    .1300304
         |  under Ho:   .0132673   7.95855    0.0000
------------------------------------------------------------------------------
                 Ho: proportion(1) - proportion(2) = diff = 0

     Ha: diff < 0              Ha: diff ~= 0              Ha: diff > 0
       z =  7.959                z =  7.959                 z =  7.959
   P < z = 1.0000          P > |z| = 0.0000            P > z = 0.0000

Binomial distribution

Syntax:
    bitest varname = #p [weight] [if condition] [in range]

This command tests a hypothesis about the parameter p of the binomial distribution (the probability of success in a trial) for the named variable (Ho: p = p0).

Example:

. bitest poor=0.44 if reg7==1

    Variable |        N   Observed k   Expected k   Assumed p   Observed p
-------------+------------------------------------------------------------
        poor |      859        428       377.96       0.44000      0.49825

  Pr(k >= 428) = 0.000344  (one-sided test)


  Pr(k = 428)  = 0.000344
  Pr(k ...

  ...                                      P > |t| = 0.7444
  Ha: mean > 3200         t =  -0.3260     P > t = 0.6278

    ttest varname1 = varname2 [if condition] [in range] [, unpaired unequal level(#)]

This command tests the hypothesis that two variables have equal means (Ho: mean(X) = mean(Y)).

Options:
    unpaired   The data of the two variables are not paired.
    unequal    The variances of the two variables are not equal.

Example:

. ttest poor2=poor4, unpaired unequal

Two-sample t test with unequal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
   poor2 |    1175    .2365957    .0124036     .425173    .2122601    .2609314
   poor4 |     754    .2864721    .0164759    .4524128     .254128    .3188163
---------+--------------------------------------------------------------------
combined |    1929    .2560912    .0099404     .436586    .2365962    .2755863
---------+--------------------------------------------------------------------
    diff |           -.0498764    .0206229               -.0903285   -.0094243
------------------------------------------------------------------------------
Satterthwaite's degrees of freedom:  1532.64

                 Ho: mean(poor2) - mean(poor4) = diff = 0

     Ha: diff < 0              Ha: diff ~= 0              Ha: diff > 0
       t = -2.4185               t = -2.4185                t = -2.4185
   P < t = 0.0079          P > |t| = 0.0157            P > t = 0.9921

    ttest varname [if condition] [in range], by(groupvar) [unequal level(#)]

This command tests the hypothesis that the means of the two groups defined by the grouping variable are equal (Ho: mean(X1) = mean(X2)).

Example:

. ttest rlpcex1, by(sex)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       1 |    4375    2980.906    36.74795    2430.648    2908.862    3052.951
       2 |    1624    3748.368    80.18189    3231.241    3591.097    3905.638
---------+--------------------------------------------------------------------
combined |    5999    3188.667    34.76379    2692.567    3120.518    3256.817
---------+--------------------------------------------------------------------
    diff |           -767.4613     77.6155               -919.6156   -615.3071
------------------------------------------------------------------------------
Degrees of freedom: 5997

                     Ho: mean(1) - mean(2) = diff = 0

     Ha: diff < 0              Ha: diff ~= 0              Ha: diff > 0
       t = -9.8880               t = -9.8880                t = -9.8880
   P < t = 0.0000          P > |t| = 0.0000            P > t = 1.0000

1.2.2. Tests on the standard deviation

Syntax:
    sdtest varname = # [if condition] [in range] [, level(#)]
    sdtest varname1 = varname2 [if condition] [in range] [, level(#)]
    sdtest varname [if condition] [in range], by(groupvar) [level(#)]

This command tests the standard deviation of a normally distributed random variable, given by the variable name. Its syntax parallels that of the ttest command.

Example:

. sum rlpcex1

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
     rlpcex1 |    5999    3188.667   2692.567    357.318   45801.71

. sdtest rlpcex1=2700

One-sample test of variance
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
 rlpcex1 |    5999    3188.667    34.76379    2692.567    3120.518    3256.817
------------------------------------------------------------------------------
                         Ho: sd(rlpcex1) = 2700

                        chi2(5998) = 5965.022

   Ha: sd(rlpcex1) < 2700     Ha: sd(rlpcex1) ~= 2700     Ha: sd(rlpcex1) > 2700
     P < chi2 = 0.3838        2*(P < chi2) = 0.7676         P > chi2 = 0.6162
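The chi-squared statistic reported by sdtest follows from the sample standard deviation and the hypothesized value. The worked equation below assumes the standard one-sample variance test for a normal variable; the symbols are introduced here for the illustration.

```latex
\[
\chi^2_{n-1} = \frac{(n-1)\,s^2}{\sigma_0^2}
             = \frac{5998 \times 2692.567^2}{2700^2}
             \approx 5965.0
\]
```

Because the sample standard deviation is very close to 2700, the statistic is close to its degrees of freedom and the hypothesis is not rejected.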

2. Correlation and regression analysis (Correlation and regression)

2.1. Correlation analysis

Syntax:
    correlate [varlist] [weight] [if condition] [in range] [, means covariance _coef wrap]

This command computes the matrix of correlation coefficients (correlation coefficient), or of covariances (covariance), for the variables listed in the variable list. Only observations that are nonmissing on all of the listed variables are used.

Options:
    means        Displays additional statistics: mean, standard deviation, minimum, and maximum.
    covariance   Reports the covariance matrix instead of the correlation coefficients.
    _coef        Computes the correlation matrix of the coefficients of the most recent estimation.
    wrap         Displays the rows of the matrix without breaking it apart when many variables are listed.

Example:

. corr hhsize poor rlpcex1 sex
(obs=5999)

             |   hhsize     poor  rlpcex1      sex
-------------+------------------------------------
      hhsize |   1.0000
        poor |   0.2425   1.0000
     rlpcex1 |  -0.2172  -0.4452   1.0000
         sex |  -0.2570  -0.1028   0.1267   1.0000

. corr hhsize poor rlpcex1 sex, means cov
(obs=5999)

    Variable |        Mean   Std. Dev.        Min        Max
-------------+----------------------------------------------
      hhsize |    4.752292    1.954292          1         19
        poor |     .296216    .4566255          0          1
     rlpcex1 |    3188.667    2692.567    357.318   45801.71
         sex |    1.270712    .4443645          1          2

             |   hhsize     poor  rlpcex1      sex
-------------+------------------------------------
      hhsize |  3.81926
        poor |  .216435  .208507
     rlpcex1 | -1142.93 -547.335  7.2e+06
         sex | -.223195 -.020849  151.543   .19746

    pwcorr [varlist] [weight] [if condition] [in range] [, obs sig print(#) star(#)]

This command computes the correlation coefficient for each pair of variables in the variable list.

Options:
    obs        Displays the number of observations used to compute each correlation coefficient.
    sig        Displays the significance level of each correlation coefficient.
    print(#)   Sets a significance level; only correlation coefficients with a significance level below it are displayed.
    star(#)    Marks with a star the correlation coefficients whose significance level is below the level given in star.

Example:

. pwcorr hhsize poor rlpcex1 sex, obs sig star(5)

             |   hhsize     poor  rlpcex1      sex
-------------+------------------------------------
      hhsize |   1.0000
             |
             |     5999
             |
        poor |   0.2425*  1.0000
             |   0.0000
             |     5999     5999
             |
     rlpcex1 |  -0.2172* -0.4452*  1.0000
             |   0.0000   0.0000
             |     5999     5999     5999
             |
         sex |  -0.2570* -0.1028*  0.1267*  1.0000
             |   0.0000   0.0000   0.0000
             |     5999     5999     5999     5999

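    pcorr varname1 varlist [weight] [if condition] [in range]

This command computes the partial correlation coefficient of the variable named first with each variable in the variable list, holding the other listed variables constant.

Example: the output below shows the simple pairwise correlations of the same variables, obtained with pwcorr.

. pwcorr poor hhsize rlpcex1 sex

             |     poor   hhsize  rlpcex1      sex
-------------+------------------------------------
        poor |   1.0000
      hhsize |   0.2425   1.0000
     rlpcex1 |  -0.4452  -0.2172   1.0000
         sex |  -0.1028  -0.2570   0.1267   1.0000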

2.2. Regression analysis

Ordinary least squares (Ordinary-Least Square)

Syntax:
    regress depvar [varlist] [weight] [if condition] [in range] [, options]

This command estimates the coefficients of the regression of the dependent variable (dependent variable) on the independent variables (the variable list) by ordinary least squares.

Example:

. reg rlpcex1 reg7 sex hhsize

      Source |       SS       df       MS              Number of obs =    5999
-------------+------------------------------           F(  3,  5995) =  194.88
       Model |  3.8639e+09     3  1.2880e+09           Prob > F      =  0.0000
    Residual |  3.9621e+10  5995  6609032.15           R-squared     =  0.0889
-------------+------------------------------           Adj R-squared =  0.0884
       Total |  4.3485e+10  5998  7249918.40           Root MSE      =  2570.8

------------------------------------------------------------------------------
     rlpcex1 |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        reg7 |   240.9633     15.5905    15.46   0.000     210.4003    271.5263
         sex |   403.2984    77.38324     5.21   0.000     251.5994    554.9974
      hhsize |  -305.6382    17.70692   -17.26   0.000    -340.3501   -270.9263
       _cons |   3160.201    155.6576    20.30   0.000     2855.056    3465.346
------------------------------------------------------------------------------

Options:
    level(#)     Sets the confidence level for the confidence intervals of the coefficients.
    noconstant   Omits the constant term (intercept) from the regression.
    noheader     Displays only the table of coefficients.
    beta         Displays standardized coefficients, used to compare the relative influence of the variables.

Maximum likelihood (Maximum-Likelihood)

Syntax:
    probit depvar [varlist] [weight] [if condition] [in range] [, options]

This command regresses the dependent variable on the variables in the variable list by maximum likelihood. The dependent variable is usually a dummy variable with the two values 0 and 1.

Example:

. probit poor reg7 sex hhsize

Iteration 0:   log likelihood = -3645.1363
Iteration 1:   log likelihood = -3367.2185
Iteration 2:   log likelihood = -3364.8032
Iteration 3:   log likelihood = -3364.8025

Probit estimates                                  Number of obs   =       5999
                                                  LR chi2(3)      =     560.67
                                                  Prob > chi2     =     0.0000
Log likelihood = -3364.8025                       Pseudo R2       =     0.0769

------------------------------------------------------------------------------
        poor |      Coef.   Std. Err.       z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        reg7 |   -.116342    .0084551   -13.76   0.000    -.1329136   -.0997703
         sex |  -.1284525    .0422247    -3.04   0.002    -.2112113   -.0456937
      hhsize |   .1808115    .0095806    18.87   0.000     .1620338    .1995892
       _cons |  -.8088731    .0824798    -9.81   0.000    -.9705306   -.6472157
------------------------------------------------------------------------------

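Predicting the dependent variable and the residuals

Syntax:
    predict newvar [if condition] [in range] [, xb stdp resid]

This command is run after regress (or probit) and creates a new variable whose values are computed according to the option given.

Options:
    xb      Computes the fitted value of the dependent variable from the regression function:
            \[ \hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i \]
    stdp    Computes the standard error of the fitted value:
            \[ SE_i = \sqrt{\mathrm{Var}(\hat{\beta}_0) + X_i^2\,\mathrm{Var}(\hat{\beta}_1) + 2 X_i\,\mathrm{Cov}(\hat{\beta}_0, \hat{\beta}_1)} \]
    resid   Computes the residual:
            \[ e_i = Y_i - \hat{Y}_i \]

Example:
    predict exphat, xb
        Creates the new variable exphat, which holds the fitted value of the dependent variable (fitted value) computed from the estimated regression coefficients.
    predict expres, resid
        Creates the variable expres, which holds the residuals.

Testing hypotheses about regression coefficients

Syntax:
    test expression = expression
    test varlist
    testparm varlist [, equal]

The test command tests hypotheses about the coefficients of the regression that was estimated most recently.

Example:
    test urban98 = 2000
        Tests the hypothesis that the coefficient of urban98 equals 2000.
    test region1 = region2
        Tests the hypothesis that the coefficient of region1 equals the coefficient of region2.
    test region1 = (region2+region3)/2
        Tests a hypothesis about the relationship among the coefficients of region1, region2, and region3.
    test region1 region2 region3
        Tests the hypothesis that the coefficients of region1, region2, and region3 are all equal to 0.
    testparm region*
        Tests the hypothesis that the coefficients of region1 through region7 are all equal to 0.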

. tab reg7, gen(region)

  Code by 7 |
    regions |      Freq.     Percent        Cum.
------------+-----------------------------------
    region1 |        859       14.32       14.32
    region2 |       1175       19.59       33.91
    region3 |        708       11.80       45.71
    region4 |        754       12.57       58.28
    region5 |        368        6.13       64.41
    region6 |       1023       17.05       81.46
    region7 |       1112       18.54      100.00
------------+-----------------------------------
      Total |       5999      100.00

. reg rlpcex1 urban98 region* sex educyr98 hhsize

      Source |       SS       df       MS              Number of obs =    5999
-------------+------------------------------           F( 10,  5988) =  382.87
       Model |  1.6960e+10    10  1.6960e+09           Prob > F      =  0.0000
    Residual |  2.6525e+10  5988  4429712.49           R-squared     =  0.3900
-------------+------------------------------           Adj R-squared =  0.3890
       Total |  4.3485e+10  5998  7249918.40           Root MSE      =  2104.7

------------------------------------------------------------------------------
     rlpcex1 |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     urban98 |   1995.163   66.46943    30.02   0.000     1864.859    2125.467
     region1 |  -923.7066   132.8334    -6.95   0.000    -1184.108   -663.3052
     region2 |  -362.6047   130.2254    -2.78   0.005    -617.8934    -107.316
     region3 |  -558.0354   137.1551    -4.07   0.000    -826.9089   -289.1619
     region4 |  -100.7586   135.8372    -0.74   0.458    -367.0486    165.5313
     region5 |  (dropped)
     region6 |   1742.688   131.9928    13.20   0.000     1483.934    2001.441
     region7 |   151.9854   128.0272     1.19   0.235    -98.99396    402.9648
         sex |   270.9142   66.61031     4.07   0.000     140.3339    401.4944
    educyr98 |   153.3281   6.836934    22.43   0.000     139.9253     166.731
      hhsize |   -257.691   14.73741   -17.49   0.000    -286.5816   -228.8004
       _cons |   2362.355   178.3197    13.25   0.000     2012.784    2711.926
------------------------------------------------------------------------------

. test urban98 = 2000

 ( 1)  urban98 = 2000.0

       F(  1,  5988) =    0.01
            Prob > F =    0.9420

. test region1 = region2

 ( 1)  region1 - region2 = 0.0

       F(  1,  5988) =   34.57
            Prob > F =    0.0000

. test region1 = (region2+region3)/2

 ( 1)  region1 - .5 region2 - .5 region3 = 0.0

       F(  1,  5988) =   27.80
            Prob > F =    0.0000

. test region1 region2 region3

 ( 1)  region1 = 0.0
 ( 2)  region2 = 0.0
 ( 3)  region3 = 0.0

       F(  3,  5988) =   20.22
            Prob > F =    0.0000

. testparm region*

 ( 1)  region1 = 0.0
 ( 2)  region2 = 0.0
 ( 3)  region3 = 0.0
 ( 4)  region4 = 0.0
 ( 5)  region5 = 0.0
 ( 6)  region6 = 0.0
 ( 7)  region7 = 0.0
       Constraint 5 dropped

       F(  6,  5988) =  148.55
            Prob > F =    0.0000
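The predict options described earlier in this section can be cross-checked against one another. The following is a minimal sketch, assuming the household data set used in this chapter is in memory; the variable names yhat, ehat and check are hypothetical and were chosen only to avoid clashing with exphat and expres.

```stata
* Fit a regression, then recover fitted values and residuals
regress rlpcex1 urban98 sex educyr98 hhsize
predict yhat, xb
predict ehat, resid

* By definition e_i = Y_i - Yhat_i, so this difference should be
* zero (up to rounding error) for every observation
gen check = rlpcex1 - yhat - ehat
summarize check

* Joint test that the coefficients of sex and hhsize are both 0
test sex hhsize
```

A mean and standard deviation of check that are both essentially zero confirm that resid is the observed value minus the xb prediction.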

Chapter IV: Drawing graphs

1. Drawing graphs (graph)

Syntax:
graph [variable list] [weight] [if condition] [in range] [, graph_type specific_options common_options]

where:
graph_type          specifies the type of graph to draw
specific_options    options that apply to one particular type of graph
common_options      options that can be used with all types of graph, such as the options for labelling the axes of the graph

Stata can draw the following 8 types of graph (graph_type):

(1) Two-way scatterplots
. graph rlpcex1 age

[Figure: scatterplot of comp.M&Reg price adj.pc tot exp (357.318 to 45801.7) against Age of household head (16 to 95)]

(2) Two-way scatterplot matrices
. gr rlpcex1 age educyr98 hhsize, matrix

[Figure: scatterplot matrix of comp.M&Reg price adj.pc tot exp, Age of household head, schooling year of HH.head and Household size]

(3) Histograms
. gr rlpcex1, bin(50) normal

[Figure: histogram of comp.M&Reg price adj.pc tot exp (357.318 to 45801.7); vertical axis: Fraction, 0 to .329888]

(4) One-way scatterplots
. gr rlpcex1, oneway

[Figure: one-way scatterplot of comp.M&Reg price adj.pc tot exp (357.318 to 45801.7)]

(5) Box-and-whisker plots

[Figure: box-and-whisker plot of comp.M&Reg price adj.pc tot exp (357.318 to 45801.7)]

(6) Bar charts
. sort reg7
. gr poor, bar means by(reg7)

[Figure: bar chart of the mean of poor for reg7 = 1 to 7; vertical axis from 0 to .498254]

(7) Pie charts
. for num 1/7: gen poorX=poor if reg7==X

->  gen poor1=poor if reg7==1
(5140 missing values generated)
->  gen poor2=poor if reg7==2
(4824 missing values generated)
->  gen poor3=poor if reg7==3
(5291 missing values generated)
->  gen poor4=poor if reg7==4
(5245 missing values generated)
->  gen poor5=poor if reg7==5
(5631 missing values generated)
->  gen poor6=poor if reg7==6
(4976 missing values generated)
->  gen poor7=poor if reg7==7
(4887 missing values generated)

. graph poor1-poor7, pie

[Figure: pie chart with one slice for each of poor1 to poor7]

(8) Star charts
The graph type is star.

[Figure: star charts for a set of car models, from Audi 5000 to Volvo 260; the rays show Price, Mileage (mpg), Repair Record 1978, Headroom (in.), Trunk space (cu. ft.), Weight (lbs.), Length (in.), Turn Circle (ft.) and Displacement (cu. in.)]
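The graph syntax in this chapter is that of Stata 7. As an aside that is not part of the original text: from Stata 8 onward the graph command was redesigned, and the old syntax is reached through graph7. The sketch below lists rough modern counterparts of the examples above, assuming Stata 8 or later and the same variables in memory; the resulting figures will not look identical to the Stata 7 ones.

```stata
* Stata 7 syntax from this chapter, run through graph7
graph7 rlpcex1 age

* Rough counterparts in the redesigned graph command
scatter rlpcex1 age
graph matrix rlpcex1 age educyr98 hhsize
histogram rlpcex1, bin(50) normal fraction
graph box rlpcex1
graph bar (mean) poor, over(reg7)
graph pie poor1-poor7
```

Readers working in Stata 7 itself can ignore this block and use the commands exactly as printed in the chapter.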

Common options (common_options)

* Creating the data set

. tabulate hhsize, sum(rlpcex1)

            | Summary of comp.M&Reg price adj.pc
  Household |              tot exp
       size |        Mean   Std. Dev.       Freq.
------------+------------------------------------
          1 |   4696.0254   4619.5012         214
          2 |   4131.4892   3677.2297         497
          3 |   3834.8615   2913.8177         731
          4 |   3428.8011   2599.7301        1404
          5 |   2930.5486   2168.0644        1318
          6 |   2626.6848   2277.1893         867
          7 |   2501.0912   2186.1605         480
          8 |   2329.7009   1803.7873         255
          9 |   2207.0166   1380.5607         126
         10 |   2252.3772   1423.7576          58
         11 |   2370.7034   1404.7148          29
         12 |   1747.3691   924.72977           9
         13 |   2114.1337   2109.0077           4
         14 |     1579.78   990.81152           4
         16 |   2994.5771   2061.6804           2
         19 |    4833.936           0           1
------------+------------------------------------
      Total |   3188.6671   2692.5673        5999

. tab hhsize, sum(educyr98)

            |   Summary of schooling year of
  Household |             HH.head
       size |        Mean   Std. Dev.       Freq.
------------+------------------------------------
          1 |   3.7897196   4.3956537         214
          2 |   5.7545272   4.7225549         497
          3 |   7.3023256   4.6396425         731
          4 |   8.2578348   4.2659841        1404
          5 |   7.7243298   4.2998488        1318
          6 |   6.8788927   4.0778062         867
          7 |   6.3348958   4.1241759         480
          8 |   5.7333333   3.9623557         255
          9 |   5.7936508   3.4878474         126
         10 |   6.1724138   3.1851516          58
         11 |   4.7931034   3.1665586          29
         12 |   4.4444444   3.6438685           9
         13 |           5   5.0990195           4
         14 |           3   2.1602469           4
         16 |           4   1.4142136           2
         19 |           2           0           1
------------+------------------------------------
      Total |   7.0944185   4.4160917        5999

The group means above make up a new data set of 16 observations, whose variables are renamed, rescaled and labelled as follows:

. rename var71 ahhsize
. rename var72 meanexp
. rename var73 meanedu
. replace meanexp= meanexp/1000
(16 real changes made)
. label var meanexp "Chi tieu binh quan"
. label var meanedu "So nam hoc"
. label var ahhsize "Quy mo ho"

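* Options for titles and axes

Take a two-way graph as the example: the vertical axis shows mean expenditure and the mean years of schooling of the household head, and the horizontal axis shows household size.

. gr meanexp meanedu ahhsize

[Figure: meanexp and meanedu (1.57978 to 8.25783) plotted against ahhsize (1 to 19)]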

* Title options:
title("string") t1title("string") t2title("string") b1title("string") b2title("string") l1title("string") l2title("string") r1title("string") r2title("string")

These options write titles at the top, at the bottom, on the left and on the right of the graph.

Example:
gr meanexp meanedu ahhsize, title(Do thi chi tieu va hoc van chu ho) l1title(Chi tieu binh quan (tr dong)) l2title(So nam hoc cua chu ho) b2title(Quy mo ho gia dinh)

[Figure: Chi tieu binh quan and So nam hoc (1.57978 to 8.25783) against household size (1 to 19); left titles "Chi tieu binh quan (tr dong)" and "So nam hoc cua chu ho", bottom title "Quy mo ho gia dinh", main title "Do thi chi tieu va hoc van chu ho"]

* Displaying values on the axes
xlabel[(values)] ylabel[(values)] rlabel[(values)] tlabel[(values)]

Example:
gr meanexp meanedu ahhsize, title(Do thi chi tieu va hoc van chu ho) l1title(Chi tieu binh quan (tr dong)) l2title(So nam hoc cua chu ho) b2title(Quy mo ho gia dinh) xlabel ylabel

[Figure: the same graph, titled "Do thi chi tieu va hoc van chu ho", with labelled axes: vertical axis 2 to 8, horizontal axis (Quy mo ho gia dinh) 0 to 20]

Note: the other options are described in the help file, which is displayed with the command: help graxes

* Options for grid lines and connecting lines
xline[(values)] yline[(values)] rline[(values)] tline[(values)] connect(c[[p]] ... c[[p]])

Example:
. gr meanexp meanedu ahhsize, title(Do thi chi tieu va hoc van chu ho) l1title(Chi tieu binh quan (tr dong)) l2title(So nam hoc cua chu ho) b2title(Quy mo ho gia dinh) xlabel ylabel xline(5 10 to 20) yline(2 4 to 8) connect(ll)

[Figure: the same graph, titled "Do thi chi tieu va hoc van chu ho", with both series drawn as connected lines and grid lines at the labelled axis values]

2. Some commonly used types of graph

2.1. Two-way graphs

Syntax:
graph [variable list] [weight] [if condition] [in range], twoway [common_options rescale]

The rescale option displays two vertical axes with different scales.

. gen meanexp1=meanexp*1000
. label var meanexp1 "Chi tieu binh quan"
. gr meanexp1 meanedu ahhsize, title(Do thi chi tieu va hoc van chu ho) l1title(Chi tieu binh quan (nghin dong)) b2title(Quy mo ho gia dinh) xlabel ylabel rlabel(2 4 to 8) connect(ll) rescale

[Figure: Chi tieu binh quan on the left axis (1000 to 5000, "Chi tieu binh quan (nghin dong)") and So nam hoc on the right axis (2 to 8), both against Quy mo ho gia dinh (0 to 20); title "Do thi chi tieu va hoc van chu ho"]

2.2. Histograms

Syntax:
graph [variable] [weight] [if condition] [in range], histogram [common_options bin(#) freq normal[(#,#)] density(#)]

Options:
bin(#)           specifies the number of bins for the graph; the default is bin(5)
freq             shows frequencies on the vertical axis
normal[(#,#)]    overlays a normal density curve
density(#)       used together with normal; specifies the number of points at which the normal density is evaluated

Example: a histogram of per capita expenditure
. gr rlpcex1, hist bin(20) normal

[Figure: histogram of comp.M&Reg price adj.pc tot exp (357.318 to 45801.7); vertical axis: Fraction, 0 to .56026]

. gr rlpcex1, hist bin(50) normal freq

[Figure: histogram of comp.M&Reg price adj.pc tot exp (357.318 to 45801.7); vertical axis: Frequency, 0 to 1979]

. gr rlpcex1, hist bin(50) normal freq by(reg7)

[Figure: "Histograms by Code by 7 regions": one panel for each of region1 to region7; horizontal axis 357.318 to 45801.7; vertical axis: Frequency, 0 to 415]

2.3. Bar charts

Syntax:
graph [variable list] [weight] [if condition] [in range], bar [common_options [no]alt means stack]

Example: a graph of the mean schooling of the household head and the mean household size in the 7 regions
. gr educyr98 hhsize, bar means by(reg7)

[Figure: bar chart of the means of schooling year of HH.head and Household size for reg7 = 1 to 7; vertical axis from 0 to 8.64426]

. label define region 1 "region1" 2 "region2" 3 "region3" 4 "region4" 5 "region5" 6 "region6" 7 "region7"
. label values reg7 region
. tab reg7

  Code by 7 |
    regions |      Freq.     Percent        Cum.
------------+-----------------------------------
    region1 |        859       14.32       14.32
    region2 |       1175       19.59       33.91
    region3 |        708       11.80       45.71
    region4 |        754       12.57       58.28
    region5 |        368        6.13       64.41
    region6 |       1023       17.05       81.46
    region7 |       1112       18.54      100.00
------------+-----------------------------------
      Total |       5999      100.00

. gr educyr98 hhsize, bar means by(reg7) ylabel(2 4 to 10) alt

[Figure: bar chart of the means of schooling year of HH.head and Household size for region1 to region7; vertical axis labelled 2 to 10]

The stack option
. gen persons=1
. gr persons urban98, bar ylabel by(reg7) stack alt

[Figure: stacked bar chart of persons and 1:urban 98; 0:rural 98 for region1 to region7; vertical axis 0 to 1500]

Exercise: draw the following graph:

[Figure: bar chart of foodpoor and poor for region1 to region7; vertical axis 0 to 600]

2.4. Pie charts

Syntax:
graph [variable list] [weight] [if condition] [in range], pie [common_options]

This command draws a pie chart. Each variable takes up one slice of the pie, and the size of the slice is determined by the sum of that variable's values over the observations.

Example: draw a graph of each region's share, in percent, of the total number of poor people in the whole country.
. gr poor1-poor7, pie

[Figure: pie chart with one slice for each of poor1 to poor7]

. gen nonfpood=poor- foodpoor
. label var nonfpood "poor but still above food poverty line"
. gen nonpoor=( rlpcex1>=1790)
. gr foodpoor nonfpood nonpoor, pie
. set textsize 90

[Figure: pie chart: 12% foodpoor, 18% poor but still above food poverty line, 70% nonpoor]

. set textsize 100
. gr foodpoor nonfpood nonpoor, pie by(reg7) total

[Figure: pie charts of foodpoor, nonfpood and nonpoor, one panel for each of region1 to region7 plus a Total panel]

3. Saving and displaying graphs (Saving and graph using)

To save a graph, go to the File menu of the graph window and choose Save graph, then choose the path and file name for the graph. The default extension is gph. A graph can also be saved with the option saving(filename [, replace]) written after the graph command.

Example:
. gr educyr98 hhsize, bar means by(reg7) ylabel(2 4 to 10) alt saving("c:\do thi 1")
. gr persons urban98, bar ylabel by(reg7) stack alt saving("c:\do thi 2")

If a graph should not be displayed, graph display can be switched off with the command set graphics { on | off }

. set graphics off
. gr poor1-poor7, pie saving("c:\do thi 3", replace)
(note: file c:\do thi 3.gph not found)

Stata can display saved graphs with the command:
graph using [file1 file2 ...] [, margin(#)]

margin(#) specifies the margin around each graph as a percentage of the graph's area. The default is 0.

Example:
. set graphics on
. graph using "c:\do thi 1" "c:\do thi 2" "c:\do thi 3", margin(10) title("Mot so dac diem cua ho gia dinh")

[Figure: the three saved graphs displayed together under the title "Mot so dac diem cua ho gia dinh"]

Note: saving can be combined with using to store the result as a new graph.

Example:
. graph using "c:\do thi 1" "c:\do thi 2" "c:\do thi 3", margin(10) title("Mot so dac diem cua ho gia dinh") saving("c:\do thi tong hop")
. graph using "c:\do thi tong hop"

Chapter V: Programming in Stata

1. Introduction to do-files

1.1. Opening and saving do-files

Stata lets you write files called do-files, which contain Stata commands. Instead of running the commands one at a time from the command window, a do-file runs them in sequence. Stata programs are written in the do-file editor window. This window is opened by clicking the Window menu and choosing do-file editor. Another way to open it is to type doedit in the command window.

Example: a program written in the do-file editor might look like this:
----------------
clear
set mem 32m
use "C:\VLSS98\Hhexp98n.dta", clear
tab urban98
sum hhsize
gen new=hhsizet
gen new=hhsize
----------------

After editing, the do-file is saved with Save as in the File menu of the do-file editor window. The name of the do-file can also be given directly in the doedit command:
doedit (do-file name)

Do-files have the extension do. In the example above, we can save the program under the name "chuong trinh 1" in the folder Vlss98 on drive C.

1.2. Running do-files

To run a do-file, type one of the following two commands in the command window:
do filename [, nostop]
run filename [, nostop]

The run command executes the commands in the do-file but does not display the results on the screen. If a command fails while the do-file is running, Stata reports the error and stops executing the commands that follow. If the nostop option is specified, however, Stata skips the failed command and continues with the commands after it.

Example:
. do "c:\vlss98\chuong trinh 1"

. clear
. set mem 32m
(32768k)
. use "C:\VLSS98\Hhexp98n.dta", clear
. tab urban98

  1:urban 98; |
   0:rural 98 |      Freq.     Percent        Cum.
--------------+-----------------------------------
        Rural |       4269       71.16       71.16
        Urban |       1730       28.84      100.00
--------------+-----------------------------------
        Total |       5999      100.00

. sum hhsize

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
      hhsize |    5999    4.752292   1.954292          1         19

. gen new=hhsizet
hhsizet not found
r(111);

end of do-file
r(111);

With the nostop option:
. do "c:\vlss98\chuong trinh 1", nostop

. clear
. set mem 32m
(32768k)
. use "C:\VLSS98\Hhexp98n.dta", clear
. tab urban98

  1:urban 98; |
   0:rural 98 |      Freq.     Percent        Cum.
--------------+-----------------------------------
        Rural |       4269       71.16       71.16
        Urban |       1730       28.84      100.00
--------------+-----------------------------------
        Total |       5999      100.00

. sum hhsize

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
      hhsize |    5999    4.752292   1.954292          1         19

. gen new=hhsizet
hhsizet not found
r(111);
. gen new=hhsize
.
end of do-file

Running with the run command:
. run "c:\vlss98\chuong trinh 1", nostop
hhsizet not found

Do-files can also be run with the Do option in the File menu, or directly from the Do-file editor window with the Do or Run option in the Tools menu.

1.3. Some notes on writing do-files

version #
When writing a do-file, we should put this command at the top of the program to state which version of Stata was used to write it. For example, if we write the do-file in Stata 7.0, the command goes at the top of the program as follows:

version 7.0
clear
use Hhexp98n.dta
tab reg7
...

Different versions of Stata may differ in the syntax or the meaning of commands. The version command lets the running copy of Stata interpret correctly a do-file that was written for another version.

set memory #[k|m]
If the data file needs more memory than Stata is currently using, we have to give Stata more memory with the command above. Note that the memory should not be set larger than the computer's RAM.

Example:
. use "C:\Hhexp98n.dta", clear
no room to add more observations
r(901);
. set mem 32m
(32768k)
. use "C:\Hhexp98n.dta", clear

set more off/on
By default, when the output of a command is longer than the results window (Stata Results), the screen pauses and we have to press a key (for example Enter or the Space bar) for the output to continue. The command set more off lets the output scroll without pausing until the command or do-file has finished. The command set more on restores the default.

The characters * and /* */
Stata does not execute lines that begin with the character * or text placed between /* and */. These characters are used to write comments in a do-file.

Example:
-------------------
version 7.0
set mem 32m
use "C:\Hhexp98n.dta", clear
* Create the household expenditure variable
/* This variable equals per capita expenditure times household size */
gen hhexp = rlpcex1 * hhsize
-------------------

#delimit ;
When a command in the do-file editor is too long, we can use this command to declare that every command ends with the character (;). By default, a command ends at the line break made by pressing Enter. To restore the default, use the command #delimit cr

Example: the graph command from the previous chapter:

graph meanexp meanedu ahhsize, title(Do thi chi tieu va hoc van chu ho) l1title(Chi tieu binh quan (tr dong)) l2title(So nam hoc cua chu ho) b2title(Quy mo ho gia dinh) xlabel ylabel xline(5 10 to 20) yline(2 4 to 8) connect(ll)

is equivalent to:

#delimit ;
graph meanexp meanedu ahhsize, title(Do thi chi tieu va hoc van chu ho)
l1title(Chi tieu binh quan (tr dong)) l2title(So nam hoc cua chu ho)
b2title(Quy mo ho gia dinh) xlabel ylabel xline(5 10 to 20)
yline(2 4 to 8) connect(ll) ;
gen hhexp = rlpcex1 * hhsize ;
...

After that, we should restore the default, if the commands that follow fit on one line, with the command:
#delimit cr

Note: we can also use the characters /* */ to write a long command, as follows:

graph meanexp meanedu ahhsize, title(Do thi chi tieu va hoc van chu ho) /*
*/ l1title(Chi tieu binh quan (tr dong)) l2title(So nam hoc cua chu ho) /*
*/ b2title(Quy mo ho gia dinh) xlabel ylabel xline(5 10 to 20) yline(2 4 to 8) connect(ll)

The #delimit commands and the /* */ way of writing long commands work only in do-files, not in the command window.
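The notes in section 1.3 can be combined into a single do-file. The following is a minimal sketch, not a file from the original text: it assumes the data set path used in the examples above and reuses the variables rlpcex1, hhsize and urban98; the summarize command is only a placeholder for a command long enough to be split over two lines.

```stata
* Sketch of a do-file that follows the conventions of section 1.3
version 7.0
clear
set more off
set mem 32m
use "C:\VLSS98\Hhexp98n.dta", clear

/* Household expenditure equals per capita expenditure
   times household size */
gen hhexp = rlpcex1 * hhsize

* Switch to ; as the command delimiter for a command split over lines
#delimit ;
summarize hhexp rlpcex1 hhsize
    if urban98 == 1 ;
#delimit cr

set more on
```

Saved as a do-file and started with do, this runs from top to bottom. No comments are placed between #delimit ; and #delimit cr, because in that region a comment line would also need a closing semicolon.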

2. Local and global macros

Macros are variables used in Stata programs. A macro can be thought of as a string, called the macro name, that stands for another string, called the macro contents. There are two kinds of macros: local macros and global macros.

2.1. Local macros

If we type:
. local hogd "age hhsize rlpcex1"
(The double quotes may be omitted, that is, we can type: local hogd age hhsize rlpcex1)

then `hogd' is understood as equivalent to: age hhsize rlpcex1. hogd is called the macro name, and "age hhsize rlpcex1" is the macro contents. To use the contents of the macro, we type the macro name between a left quote (`), found at the top left of the keyboard, and a right quote ('), found at the lower right of the keyboard. So if we type:
. summarize `hogd'
this is equivalent to typing:
. summarize age hhsize rlpcex1

If we type:
. local tb "summarize"
then we can run the command summarize age hhsize rlpcex1 by typing:
. `tb' `hogd'

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
         age |    5999    48.01284    13.7702         16         95
      hhsize |    5999    4.752292   1.954292          1         19
     rlpcex1 |    5999    3188.667   2692.567    357.318   45801.71

To display the contents of a local macro, we type the command macro list _(local macro name)

Example:
. macro list _hogd
_hogd:          age hhsize rlpcex1

To delete a local macro, we can use the command macro drop _(local macro name)

Example:
. macro drop _hogd
. macro list _hogd
local macro `hogd' not found
r(111);

2.2. Global macros

If we type:
. global diaban "reg7 province commune"
(or, leaving out the double quotes: global diaban reg7 province commune)

then $diaban is equivalent to: reg7 province commune. diaban is called the macro name, and "reg7 province commune" is the macro contents. To use the contents of a global macro, we type the symbol $ immediately before the macro name. So if we type:
. describe $diaban
this is equivalent to typing:
. describe reg7 province commune

. describe $diaban

              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
reg7            int    %8.0g                  Code by 7 regions
province        float  %9.0g                  Province code
commune         float  %9.0g                  commune code PSU-SVY commands

. global mota "describe"
. $mota $diaban

              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
reg7            int    %8.0g                  Code by 7 regions
province        float  %9.0g                  Province code
commune         float  %9.0g                  commune code PSU-SVY commands
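A common use of the macros just described is to keep one list of variables and reuse it in several commands. The following is a minimal sketch, assuming the household data set of the earlier chapters is in memory; the macro names controls and outcome are hypothetical and introduced only for illustration.

```stata
* Store a list of explanatory variables once, as a local macro
local controls "sex educyr98 hhsize"

* Store the dependent variable as a global macro
global outcome "rlpcex1"

* Reuse both in several commands without retyping the lists
summarize $outcome `controls'
regress $outcome urban98 `controls'
probit poor `controls'
```

Changing the contents of controls in one place then changes every command that refers to it. The block should be run as a whole, because a local macro defined in one do-file run is not visible to commands run separately afterwards.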

To display the contents of a global macro, we type the command macro list (global macro name)

Example:
. global diaban "reg7 province commune"
. macro list diaban
diaban:         reg7 province commune

To delete a global macro, we can use the command macro drop (global macro name)

Example:
. macro drop diaban
. macro list diaban
global macro $diaban not found
r(111);

2.3. The difference between local macros and global macros

A local macro exists only within one program. A program cannot see the local macros used in other programs. A global macro, by contrast, once declared, is understood by all programs and stays in Stata's memory for the whole session.

Example: run a short program that declares the local macro a, then run the command that displays the contents of this local macro. The macro does not exist in any other program or in Stata's memory.

. do "C:\WINDOWS\TEMP\STD010000.tmp"
. local a "chuong trinh thong ke Stata"
.
end of do-file
. macro list _a
local macro `a' not found
r(111);

For a global macro, in contrast:

. do "C:\WINDOWS\TEMP\STD010000.tmp"
. global b "chuong trinh thong ke Stata"
.
end of do-file
. macro list b
b:              chuong trinh thong ke Stata

3. Scalars and matrices (scalar and matrix)

3.1. Matrices (matrix)

Stata defines a matrix A[r, c] as a rectangular array made up of r rows (row) and c columns (column).

Example: if the matrix A has been created, we can look at its contents as follows:
. matrix list A

A[3,3]
    c1  c2  c3
r1   1   2   4
r2   3   4   7
r3  10  11  14

Here the matrix A consists of 9 elements (element): 1, 2, 4, 3, 4, 7, 10, 11, 14. The columns are named c1, c2 and c3, and the rows r1, r2 and r3. The element at the intersection of row 1 and column 2 is written A[1, 2]. In this example A[1, 2] contains the value 2.

3.2. Scalars (scalar)

A scalar holds a single numeric element. A scalar is defined with the following command:
scalar scalar_name = expression

Example:
. scalar a = 10
. scalar list a
         a =         10
. scalar b = a* 2
. scalar list b
         b =         20

To some extent, a scalar can be regarded as a special case of a matrix that has only one element (one row and one column).

3.3. Some commands for working with matrices

Setting the matrix size
By default, a matrix can have at most 40 rows and 40 columns. We can change this maximum size with the command:
. set matsize 500
This command allows matrices of up to 500 rows and 500 columns to be created.

Creating matrices
Matrices can be created directly with commands.

Example:
matrix mymat = (1,2\3,4)      Elements are separated by commas, rows by backslashes
matrix myvec = (1,5,3,1,3)    Creates a row vector
matrix mycol = (1\5\3\1\3)    Creates a column vector
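The direct definitions above can be tried out as follows. This is a minimal sketch that needs no data set; the scalar name m12 is hypothetical, and rowsof() and colsof() are Stata's matrix functions that return the number of rows and columns.

```stata
* Commas separate elements within a row, backslashes separate rows
matrix mymat = (1,2\3,4)
matrix myvec = (1,5,3,1,3)
matrix mycol = (1\5\3\1\3)

* Show the 2 x 2 matrix
matrix list mymat

* Copy the element in row 1, column 2 into a scalar
scalar m12 = mymat[1,2]
scalar list m12

* Check the dimensions of the two vectors
display rowsof(myvec) " x " colsof(myvec)
display rowsof(mycol) " x " colsof(mycol)
```

Here m12 holds the value 2, myvec is reported as 1 x 5 and mycol as 5 x 1, which is the difference between a row vector and a column vector.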

Matrices can also be created from data with the command:
mkmat variable list [if condition] [in range] [, matrix(matrix name)]

Example:
. input maho quymo thunhap

          maho      quymo    thunhap
  1. 101 6 1200
  2. 103 5 1400
  3. 105 5 3200
  4. 107 9 1000
  5. 109 4 2500
  6. end
. mkmat maho quymo thunhap, matrix(A)
. matrix list A

A[5,3]
        maho    quymo  thunhap
r1       101        6     1200
r2       103        5     1400
r3       105        5     3200
r4       107        9     1000
r5       109        4     2500

Matrix calculations
matrix D = B            Creates the matrix D equal to the matrix B
matrix C = (C+C')/2     Recomputes the matrix C from its own values
matrix D = A*A'         Creates the matrix D as the product of the matrix A and the transpose of A

Deleting matrices
Matrices and scalars can be removed from memory with the commands:
matrix drop
scalar drop

Example:
. matrix drop A
. scalar drop b

4. Conditional commands and loops

4.1. The if...else command

Syntax:
if (logical condition) {
    Command group 1
}
else Command

Stata checks the logical condition (expression). If the condition is true, the commands in Command group 1 are executed; if the condition is false, the command that follows else is executed, in the case