***********************************
************************************
***SOCI 600: INTRODUCTION TO SOCIOLOGICAL DATA ANALYSIS
***ORDINARY LEAST SQUARES REGRESSION
************************************
************************************

************************************
***CLEAR MEMORY
************************************

clear all

************************************
***CREATE SHORTCUTS AND LOG FILE
************************************

***Shortcut for folders
global codes = "H:\course\codes"
global data = "H:\course\data"
global output = "H:\course\output"

***Start saving results window
log using "$codes\Stata07.log", replace text

************************************
***OPENING COMMANDS
************************************

***Tell Stata to not pause for "more" messages
set more off

***Open 2021 ACS (only Texas)
use "$data\ACS2021.dta", clear

***Complex survey design
svyset cluster [pweight=perwt], strata(strata) singleunit(scaled)

************************************
***GENERATE VARIABLES
************************************

***Sex
gen female=.
replace female=0 if sex==1 // Male
replace female=1 if sex==2 // Female
label define female 0 "Male" 1 "Female"
label values female female

***Race/ethnicity
gen raceth=.
replace raceth=1 if race==1 & hispan==0 // White
replace raceth=2 if race==2 & hispan==0 // Black
replace raceth=3 if hispan>=1 & hispan<=4 // Hispanic
replace raceth=4 if (race==4 | race==5 | race==6) & hispan==0 // Asian
replace raceth=5 if race==3 & hispan==0 // Native American
replace raceth=6 if (race==7 | race==8 | race==9) & hispan==0 // Other
label define raceth 1 "White" 2 "African American" 3 "Hispanic" ///
    4 "Asian" 5 "Native American" 6 "Other races"
label values raceth raceth

***Age
***The value label is named agegr so that "label values ... agegr" finds it
egen agegr = cut(age), at(0,16,20,25,35,45,55,65,100)
label define agegr 0 "0-15" 16 "16-19" 20 "20-24" 25 "25-34" ///
    35 "35-44" 45 "45-54" 55 "55-64" 65 "65-100"
label values agegr agegr

***Educational attainment
gen educgr=.
replace educgr=1 if educ>=0 & educ<=5 // Less than high school
replace educgr=2 if educ==6 // High school
replace educgr=3 if educ==7 | educ==8 // Some college
replace educgr=4 if educ==10 // College
replace educgr=5 if educ==11 // 5+ years of college, graduate school
label define educgr 1 "Less than high school" 2 "High school" ///
    3 "Some college" 4 "College" 5 "Graduate school"
label values educgr educgr

***Marital status
gen marital=.
replace marital=1 if marst==1 | marst==2 // Married
replace marital=2 if marst>=3 & marst<=5 // Separated, divorced, widowed
replace marital=3 if marst==6 // Never married, single
label define marital 1 "Married" 2 "Separated, divorced, widowed" 3 "Never married"
label values marital marital

***Migration status
gen migrant=.
replace migrant=1 if migrate1d==10 | migrate1d==23 // same house or within PUMA
replace migrant=2 if migrate1d>=24 & migrate1d<=32 // internal migrant
replace migrant=3 if migrate1d==40 // international migrant
label define migrant 1 "Non-migrant" 2 "Internal migrant" 3 "International migrant"
label values migrant migrant

***Internal migration status (domestic migration)
gen dommig=.
replace dommig=0 if migrant==1 // non-migrant
replace dommig=1 if migrant==2 // internal migrant
label define dommig 0 "Non-migrant" 1 "Internal migrant"
label values dommig dommig
tab migrant dommig, m

***International migration status
gen intmig=.
replace intmig=0 if migrant==1 // non-migrant
replace intmig=1 if migrant==3 // international migrant
label define intmig 0 "Non-migrant" 1 "International migrant"
label values intmig intmig
tab migrant intmig, m

***Wage and salary income
gen income=.
replace income=incwage if incwage!=999999

************************************
***ORDINARY LEAST SQUARES (OLS) REGRESSION
************************************

***Sample size
count

***Keep only observations with non-missing values and nonzero income
keep if female!=. & raceth!=. & age!=. & agegr!=. & ///
    educgr!=. & marital!=. & income!=. & income!=0 & migrant!=.
count

***Drop observations with missing values or zero income
***Same as above (shown as an alternative; no additional observations are dropped)
drop if female==. | raceth==. | age==. | agegr==. | ///
    educgr==. | marital==. | income==. | income==0 | migrant==.
count

************************************
***OLS WITH INCOME, AGE, AND EDUCATION
************************************

***Use complex survey design
svy: reg income age educgr

***Standardized regression coefficients
***(i.e., standardized partial slopes, beta-weights)
***It does not allow the use of complex survey design
***Use pweight to maintain sample size and estimate robust standard errors
reg income age educgr [pweight=perwt], beta

***Use aweight to estimate adjusted R-squared
***pweight and complex survey design omit sum of squares and adjusted R-squared
reg income age educgr [aweight=perwt]

************************************
***DETERMINING NORMALITY
************************************

***Histogram of wage and salary income
hist income if income!=0 [fweight=perwt], percent normal ylabel(0(2)12) xtitle(Wage and salary income)

***Boxplot of wage and salary income
graph hbox income if income!=0 [fweight=perwt], ytitle(Wage and salary income)

***Quantile-normal plot of wage and salary income
qnorm income if income!=0, ytitle(Wage and salary income)

***Skewness and kurtosis
sum income if income!=0 [fweight=perwt], d
sum income if income!=0 [aweight=perwt], d

***Power transformation
***q<1 (reduce positive skew)
***log(y): q=0
gen lnincome = ln(income)

***Histogram of log of wage and salary income
hist lnincome [fweight=perwt], percent normal xtitle(Natural logarithm of wage and salary income)

***Boxplot of log of wage and salary income
graph hbox lnincome [fweight=perwt], ytitle(Natural logarithm of wage and salary income)

***Quantile-normal plot of log of wage and salary income
qnorm lnincome, ytitle(Natural logarithm of wage and salary income)

***Skewness and kurtosis
sum lnincome [fweight=perwt], d
sum lnincome [aweight=perwt], d
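
************************************
***Illustrative sketch (an addition, not part of the original course code)
***Compare skewness and kurtosis before and after the log transformation
************************************

***"sum, d" stores its results in r(), including r(skewness) and r(kurtosis)
***Assumes income and lnincome exist as generated above
quietly sum income [aweight=perwt], d
di "Income: skewness = " %7.3f r(skewness) ", kurtosis = " %7.3f r(kurtosis)

quietly sum lnincome [aweight=perwt], d
di "Log of income: skewness = " %7.3f r(skewness) ", kurtosis = " %7.3f r(kurtosis)

***A normal distribution has skewness of 0 and kurtosis of 3,
***so values closer to these suggest a distribution closer to normal
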
************************************
***OLS WITH NATURAL LOGARITHM OF INCOME, AGE, AND EDUCATION
************************************

***Use complex survey design
svy: reg lnincome age educgr

***Automatically see exponential of coefficients
svy: reg lnincome age educgr, eform(Exp. Coef.)

***Standardized regression coefficients
***(i.e., standardized partial slopes, beta-weights)
***It does not allow the use of complex survey design
***Use pweight to maintain sample size and estimate robust standard errors
reg lnincome age educgr [pweight=perwt], beta

***Use aweight to estimate adjusted R-squared
***pweight and complex survey design omit sum of squares and adjusted R-squared
reg lnincome age educgr [aweight=perwt]

************************************
***Interpret coefficients with log of income
************************************

***When x increases by 1,
***y changes by 100*[exp(coefficient)-1] percent,
***controlling for the effects of all other independent variables
svy: reg lnincome age educgr

***Example of coefficient for age
di exp(0.0217796)

***Percentage interpretation
di 100*(exp(0.0217796)-1)

***When coefficient has a small magnitude,
***we can use 100*coefficient
di 100*(0.0217796)

***Example of coefficient for education group (educgr, coded 1 to 5)
di exp(0.3401378)
di 100*(exp(0.3401378)-1)
di 100*(0.3401378)

************************************
***PREDICTED VALUES AND RESIDUAL ANALYSIS
************************************

************************************
***OLS with income, age, and education
************************************

svy: reg income age educgr

***Predicted income of someone with 45 years of age and college education
di -31473.52 + (790.4565)*(45) + (17278.29)*(4)

***Save predicted values as a new variable after the estimation of a regression model
predict predincome
label variable predincome ""

***Scatterplot of predicted income by age
twoway (scatter predincome age)

***Scatterplot of predicted income by education
twoway (scatter predincome educgr) (lfit predincome educgr)

***Save residual values as a new variable after the estimation of a regression model
predict resincome, res
label variable resincome ""

***Scatterplot of residuals by predicted income
scatter resincome predincome, yline(0)

************************************
***OLS with natural logarithm of income, age, and education
************************************

svy: reg lnincome age educgr

***Predicted log of income of someone with 45 years of age and college education
di 8.408235 + (0.0217796)*(45) + (0.3401378)*(4)

***Exponential of predicted log of income
di exp(10.748868)

***Save predicted values as a new variable after the estimation of a regression model
predict predlnincome
label variable predlnincome ""

***Generate variable with exponential of predicted log of income
gen exppredlnincome = exp(predlnincome)

***Scatterplot of predicted log of income by age
twoway (scatter predlnincome age)

***Scatterplot of exponential of predicted log of income by age
twoway (scatter exppredlnincome age)

***Scatterplot of predicted log of income by education
twoway (scatter predlnincome educgr) (lfit predlnincome educgr)

***Scatterplot of exponential of predicted log of income by education
twoway (scatter exppredlnincome educgr)

***Save residual values as a new variable after the estimation of a regression model
predict reslnincome, res
label variable reslnincome ""

***Scatterplot of residuals by predicted log of income
scatter reslnincome predlnincome, yline(0)

***Browse data
browse age educgr income predincome resincome lnincome predlnincome reslnincome exppredlnincome, nolabel

************************************
***OLS WITH SQUARED INDEPENDENT VARIABLE (AGE + AGE SQUARED)
************************************

***Generate variable with mean income by age
bysort age: egen mincage=mean(income) if income!=0
sum mincage, d

***Line graph of mean income by age
twoway line mincage age [fweight=perwt], ///
    ytitle("Mean wage and salary income") ///
    ylabel(0(20000)80000)

***Generate age squared variable
gen agesq = age * age

***OLS with natural logarithm of income, age, and age squared
svy: reg lnincome age agesq

***Save predicted values as a new variable after the estimation of a regression model
predict predlnincome2
label variable predlnincome2 ""

***Line graph of predicted log of income by age
line predlnincome2 age, sort

***Generate variable with exponential of predicted log of income
gen exppredlnincome2 = exp(predlnincome2)

***Line graph of exponential of predicted log of income by age
line exppredlnincome2 age, sort

***Save residual values as a new variable after the estimation of a regression model
predict reslnincome2, res
label variable reslnincome2 ""

***Scatterplot of residuals by predicted log of income
scatter reslnincome2 predlnincome2, yline(0)

***Scatterplot of residuals by exponential of predicted log of income
scatter reslnincome2 exppredlnincome2, yline(0)

************************************
***DUMMY VARIABLES
************************************

************************************
***Age
************************************

***Age does not have a normal distribution
hist age [fweight=perwt], percent normal

***Utilize age group variable
***16-19; 20-24; 25-34; 35-44; 45-54; 55-64; 65+
tabstat age, by(agegr) stat(min max count)

***Generate dummy variables for age (manually)
gen agegr16=0
replace agegr16=1 if agegr==16
tab agegr agegr16, m

gen agegr20=0
replace agegr20=1 if agegr==20
tab agegr agegr20, m

gen agegr25=0
replace agegr25=1 if agegr==25
tab agegr agegr25, m

gen agegr35=0
replace agegr35=1 if agegr==35
tab agegr agegr35, m

gen agegr45=0
replace agegr45=1 if agegr==45
tab agegr agegr45, m

gen agegr55=0
replace agegr55=1 if agegr==55
tab agegr agegr55, m

gen agegr65=0
replace agegr65=1 if agegr==65
tab agegr agegr65, m

***Generate dummy variables for age (automatically)
tab agegr, gen(agegr)
tab agegr agegr1, m
tab agegr agegr2, m
tab agegr agegr3, m
tab agegr agegr4, m
tab agegr agegr5, m
tab agegr agegr6, m
tab agegr agegr7, m

***Browse data
browse age agegr agegr16-agegr65 agegr1-agegr7

***Choose reference category for age
***Use the category with the largest sample size as the reference (25–34)
tab agegr, m

***Or age category with large sample and meaningful interpretation for your problem
***Age group with the highest average income (45–54)
tabstat income, by(agegr) stat(mean count)

************************************
***Education
************************************

***Education does not have a normal distribution
hist educ [fweight=perwt], percent normal

***Utilize education group variable
***Less than high school; high school; some college; college; graduate school
tab educgr, m

***Generate dummy variables for education (automatically)
tab educgr, gen(educgr)
tab educgr educgr1, m
tab educgr educgr2, m
tab educgr educgr3, m
tab educgr educgr4, m
tab educgr educgr5, m

***Browse data
browse educ educgr educgr1-educgr5

***Choose reference category for education
***Use the category with the largest sample size as the reference (high school)
tab educgr, m

***Education category with highest average income (graduate school) does not have large sample
tabstat income, by(educgr) stat(mean count)

************************************
***OLS with natural logarithm of income and dummy independent variables
************************************

***45-54 as reference group (agegr5): combination of large sample size and meaningful interpretation
tab agegr
tabstat income, by(agegr) stat(mean count)

***High school as reference group (educgr2): largest sample size
tab educgr

***Regression using dummies previously generated
svy: reg lnincome agegr1 agegr2 agegr3 agegr4 agegr6 agegr7 educgr1 educgr3 educgr4 educgr5

***Regression with dummies and reference indicated within "reg" command
***"i." marks a variable as categorical (Stata creates the dummies)
***"ib#." sets category # as the reference (base) category
svy: reg lnincome ib45.agegr ib2.educgr

***Automatically see exponential of coefficients
svy: reg lnincome ib45.agegr ib2.educgr, eform(Exp. Coef.)

************************************
***Interpret coefficients with log of income
************************************

***For a dummy variable (x goes from 0 to 1, i.e., compared to the reference category),
***y changes by 100*[exp(coefficient)-1] percent,
***controlling for the effects of all other independent variables
svy: reg lnincome ib45.agegr ib2.educgr

***Example of coefficient for 16-19 age group
***compared to 45-54 age group
di exp(-2.127542)

***Percentage interpretation
di 100*(exp(-2.127542)-1)

***Since the coefficient for 16-19 age group has a large magnitude,
***we cannot use the approximation of 100*coefficient
di 100*(-2.127542)

***Example of coefficient for educgr4 (college)
***compared to educgr2 (high school)
di exp(0.5492793)

***Percentage interpretation
di 100*(exp(0.5492793)-1)

***Since the coefficient for college has a large magnitude,
***we cannot use the approximation of 100*coefficient
di 100*(0.5492793)

************************************
***Standardized regression coefficients, sum of squares, and adjusted R-squared
************************************

***Standardized regression coefficients
***(i.e., standardized partial slopes, beta-weights)
***It does not allow the use of complex survey design
***Use pweight to maintain sample size
reg lnincome ib45.agegr ib2.educgr [pweight=perwt], beta

***Use aweight to estimate adjusted R-squared
***pweight and complex survey design omit sum of squares and adjusted R-squared
reg lnincome ib45.agegr ib2.educgr [aweight=perwt]

************************************
***Predicted values and residual analysis
************************************

svy: reg lnincome ib45.agegr ib2.educgr

***Save predicted values as a new variable after the estimation of a regression model
predict predlnincome3
label variable predlnincome3 ""

***Generate variable with exponential of predicted log of income
gen exppredlnincome3 = exp(predlnincome3)

***Save residual values as a new variable after the estimation of a regression model
predict reslnincome3, res
label variable reslnincome3 ""

***Scatterplot of residuals by predicted log of income
scatter reslnincome3 predlnincome3, yline(0)

***Scatterplot of residuals by exponential of predicted log of income
scatter reslnincome3 exppredlnincome3, yline(0)

************************************
***FULL OLS MODEL
************************************

***Reference: sex (men = 0)
tab female
tab female, nolabel

***Reference: race/ethnicity (white = 1)
tab raceth
tab raceth, nolabel

***Reference: age group (45-54 = 45)
tab agegr
tabstat income, by(agegr) stat(mean count)

***Reference: education group (high school = 2)
tab educgr
tab educgr, nolabel

***Reference: marital status (married = 1)
tab marital
tab marital, nolabel

***Reference: migration status (non-migrant = 1)
tab migrant
tab migrant, nolabel

***OLS regression
svy: reg lnincome i.female ib45.agegr ib2.educgr i.raceth i.marital i.migrant

***Automatically see exponential of coefficients
svy: reg lnincome i.female ib45.agegr ib2.educgr i.raceth i.marital i.migrant, eform(Exp. Coef.)

***Standardized coefficients
reg lnincome i.female ib45.agegr ib2.educgr i.raceth i.marital i.migrant [pweight=perwt], beta

***Sum of squares and adjusted R-squared
reg lnincome i.female ib45.agegr ib2.educgr i.raceth i.marital i.migrant [aweight=perwt]

***Line graphs for predicted values don't look good with all these categorical variables
svy: reg lnincome i.female ib45.agegr ib2.educgr i.raceth i.marital i.migrant

***Predicted log of income
predict predlnincome4
line predlnincome4 agegr, sort

***Predicted income in dollars
gen exppredlnincome4 = exp(predlnincome4)
line exppredlnincome4 age, sort

***Let's explore the SPost13 commands

************************************
***PREDICTED VALUES WITH SPOST13 COMMANDS
***From Long and Freese (2014)
************************************

***If your Stata doesn't have the SPost13 commands,
***type "search spost13_ado" and follow instructions to install it.
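
***Illustrative check (an addition, not part of the original course code):
***"which" returns a nonzero return code when a command is not installed,
***so this prints a reminder if mgen (from SPost13) is missing
capture which mgen
if _rc != 0 {
    display as error "mgen not found: install SPost13 before running the sections below"
}
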
*search spost13_ado

***Full OLS model
svy: reg lnincome i.female ib45.agegr ib2.educgr i.raceth i.marital i.migrant

************************************
***Predicted values by age group - ONLY WOMEN
************************************

***References: Female (female=1),
***White (raceth=1), High school (educgr=2), Married (marital=1), Non-migrant (migrant=1)
mgen, stub(F) at(agegr=(16 20 25 35 45 55 65) female=1 ///
    raceth=1 educgr=2 marital=1 migrant=1) allstats

***Browse data
browse Fxb-Fagegr

***Predicted earnings in dollars
gen Fpredincome = exp(Fxb)
label variable Fpredincome "Women"

***Approximate standard error in dollars
***(delta method: exp(xb)*se; exp(se) alone is not a standard error in dollars)
gen Fsedollar = exp(Fxb)*Fse

***Label for age group
label values Fagegr agegr
label variable Fagegr "Age group"

***Bar graph: Log of earnings
graph bar Fxb, over(Fagegr) ytitle("Predicted log of earnings")

***Bar graph: Earnings in dollars
graph bar Fpredincome, over(Fagegr) ylabel(0(10000)50000) ytitle("Predicted earnings")

***Line graph: Log of earnings
twoway (line Fxb Fagegr, lcolor(maroon)), ///
    ytitle("Predicted log of earnings") xtitle(Age group)

***Line graph: Earnings in dollars
twoway (line Fpredincome Fagegr, lcolor(maroon)), ///
    ylabel(0(10000)50000) ytitle("Predicted earnings") xtitle(Age group)

************************************
***Predicted values by age group - ONLY MEN
************************************

***References: Male (female=0),
***White (raceth=1), High school (educgr=2), Married (marital=1), Non-migrant (migrant=1)
mgen, stub(M) at(agegr=(16 20 25 35 45 55 65) female=0 ///
    raceth=1 educgr=2 marital=1 migrant=1) allstats

***Browse data
browse Mxb-Magegr

***Predicted earnings in dollars
gen Mpredincome = exp(Mxb)
label variable Mpredincome "Men"

***Approximate standard error in dollars (delta method: exp(xb)*se)
gen Msedollar = exp(Mxb)*Mse

***Label for age group
label values Magegr agegr
label variable Magegr "Age group"

***Bar graph: Log of earnings
graph bar Mxb, over(Magegr) ytitle("Predicted log of earnings")

***Bar graph: Earnings in dollars
graph bar Mpredincome, over(Magegr) ylabel(0(10000)50000) ytitle("Predicted earnings")

***Line graph: Log of earnings
twoway (line Mxb Magegr, lcolor(navy)), ///
    ytitle("Predicted log of earnings") xtitle(Age group)

***Line graph: Earnings in dollars
twoway (line Mpredincome Magegr, lcolor(navy)), ///
    ylabel(0(10000)50000) ytitle("Predicted earnings") xtitle(Age group)

************************************
***Predicted values by age group - WOMEN AND MEN
************************************

twoway (line Fpredincome Fagegr, lcolor(maroon)) (line Mpredincome Magegr, lcolor(navy)), ///
    ylabel(0(10000)50000) ytitle("Predicted earnings") xtitle(Age group)

***Save graph
graph export "$output\predicted_earnings_age_sex.png", replace

************************************
***Predicted values by age group and sex
************************************

***References:
***White (raceth=1), High school (educgr=2), Married (marital=1), Non-migrant (migrant=1)
mgen, stub(A) at(agegr=(16 20 25 35 45 55 65) female=(0 1) ///
    raceth=1 educgr=2 marital=1 migrant=1) allstats

***Browse data
browse Axb-Aagegr

***Predicted income in dollars
gen Apredincome = exp(Axb)

***Approximate standard error in dollars (delta method: exp(xb)*se)
gen Asedollar = exp(Axb)*Ase

***Create interaction between age group and sex
gen agesex=.
replace agesex=1 if Aagegr==16 & Afemale==1 // 16-19 female
replace agesex=2 if Aagegr==16 & Afemale==0 // 16-19 male
replace agesex=3 if Aagegr==20 & Afemale==1 // 20-24 female
replace agesex=4 if Aagegr==20 & Afemale==0 // 20-24 male
replace agesex=5 if Aagegr==25 & Afemale==1 // 25-34 female
replace agesex=6 if Aagegr==25 & Afemale==0 // 25-34 male
replace agesex=7 if Aagegr==35 & Afemale==1 // 35-44 female
replace agesex=8 if Aagegr==35 & Afemale==0 // 35-44 male
replace agesex=9 if Aagegr==45 & Afemale==1 // 45-54 female
replace agesex=10 if Aagegr==45 & Afemale==0 // 45-54 male
replace agesex=11 if Aagegr==55 & Afemale==1 // 55-64 female
replace agesex=12 if Aagegr==55 & Afemale==0 // 55-64 male
replace agesex=13 if Aagegr==65 & Afemale==1 // 65+ female
replace agesex=14 if Aagegr==65 & Afemale==0 // 65+ male
tab agesex Aagegr, m
tab agesex Afemale, m

***Label for age group and sex variable
label define agesex 1 "Female, 16-19" 2 "Male, 16-19" 3 "Female, 20-24" 4 "Male, 20-24" ///
    5 "Female, 25-34" 6 "Male, 25-34" 7 "Female, 35-44" 8 "Male, 35-44" ///
    9 "Female, 45-54" 10 "Male, 45-54" 11 "Female, 55-64" 12 "Male, 55-64" ///
    13 "Female, 65+" 14 "Male, 65+"
label values agesex agesex

***Bar graph: Earnings in dollars by age and sex
graph bar Apredincome, over(agesex, label(angle(45))) ///
    ylabel(0(10000)50000) ytitle("Predicted earnings")

***Line graph: Earnings in dollars
twoway (line Apredincome Aagegr if Afemale==1, lcolor(maroon)) ///
    (line Apredincome Aagegr if Afemale==0, lcolor(navy)), ///
    ylabel(0(10000)50000) ytitle("Predicted earnings") xtitle(Age group)

************************************
***Suggestion: export these predicted values to Excel
***Then, make better-looking graphs with dots for point estimates with confidence intervals
************************************

sort agesex
browse Axb-agesex

************************************
***Residual analysis
************************************

svy: reg lnincome i.female ib45.agegr ib2.educgr i.raceth i.marital i.migrant

***Save predicted values as a new variable after the estimation of a regression model
predict predlnincome5
label variable predlnincome5 ""

***Generate variable with exponential of predicted log of income
gen exppredlnincome5 = exp(predlnincome5)

***Save residual values as a new variable after the estimation of a regression model
predict reslnincome5, res
label variable reslnincome5 ""

***Scatterplot of residuals by predicted log of income
scatter reslnincome5 predlnincome5, yline(0)

***Scatterplot of residuals by exponential of predicted log of income
scatter reslnincome5 exppredlnincome5, yline(0)

************************************
***TEST OF COLLINEARITY WITH VARIANCE INFLATION FACTOR (VIF)
************************************

***This is a factor to estimate the increase in variance
***due to issues of multicollinearity in the linear regression.
***Collinearity increases standard errors,
***i.e., it generates smaller test statistics (smaller t-statistics)
***As a rule of thumb, VIF > 5 indicates multicollinearity
***and VIF > 10 indicates severe multicollinearity

***OLS model with pweight, because VIF doesn't allow complex survey design
reg lnincome i.female ib45.agegr ib2.educgr i.raceth i.marital i.migrant [pweight=perwt]

***Calculate variance inflation factors (VIFs) for the independent variables
***specified in the previous linear regression model
vif

***Variance equals the standard error squared.
***A VIF of 1.50 for the 16-19 age group means that
***the standard error of this variable is 1.22 times higher (square root of VIF)
***than what it would have been if this variable were not correlated
***with any of the other independent variables in the model.

***Estimate the square root of VIF
di sqrt(1.50)

***Example with age squared
reg lnincome i.female age agesq ib2.educgr i.raceth i.marital i.migrant [pweight=perwt]

***Estimate VIF from previous model
vif

***VIF (and its square root) is high for age and age squared (VIF > 5).
***In this case, this is not a problem because
***we intentionally included these variables to estimate
***the quadratic association of experience in the labor market (age as a proxy) with earnings
di sqrt(39.54)
di sqrt(36.43)

************************************
***EXPORT RESULTS TO WORD/EXCEL WITH OUTREG2 COMMAND
************************************

***If your Stata doesn't have the outreg2 command,
***type "ssc install outreg2" to install it.
*ssc install outreg2

************************************
***Model 1: Sex, age group, education group
************************************

svy: reg lnincome i.female ib45.agegr ib2.educgr

***Export to Excel
outreg2 using "$output\OLS.xls", replace excel dec(3) ctitle(Model 1) nodepvar

***Estimate adjusted R-squared
reg lnincome i.female ib45.agegr ib2.educgr [aweight=perwt]

************************************
***Model 2: Sex, age group, education group, race/ethnicity
************************************

svy: reg lnincome i.female ib45.agegr ib2.educgr i.raceth

***Export to Excel
outreg2 using "$output\OLS.xls", append excel dec(3) ctitle(Model 2) nodepvar

***Estimate adjusted R-squared
reg lnincome i.female ib45.agegr ib2.educgr i.raceth [aweight=perwt]

************************************
***Model 3: Sex, age group, education group, race/ethnicity, marital status
************************************

svy: reg lnincome i.female ib45.agegr ib2.educgr i.raceth i.marital

***Export to Excel
outreg2 using "$output\OLS.xls", append excel dec(3) ctitle(Model 3) nodepvar

***Estimate adjusted R-squared
reg lnincome i.female ib45.agegr ib2.educgr i.raceth i.marital [aweight=perwt]

************************************
***Model 4: Sex, age group, education group, race/ethnicity, marital status, migration status
************************************

svy: reg lnincome i.female ib45.agegr ib2.educgr i.raceth i.marital i.migrant

***Export to Excel
outreg2 using "$output\OLS.xls", append excel dec(3) ctitle(Model 4) nodepvar

***Estimate adjusted R-squared
reg lnincome i.female ib45.agegr ib2.educgr i.raceth i.marital i.migrant [aweight=perwt]

************************************
***Model 4: Standardized coefficients
************************************

***Outreg2 doesn't allow pweight to estimate standardized coefficients
reg lnincome i.female ib45.agegr ib2.educgr i.raceth i.marital i.migrant [aweight=perwt], beta

***Export to Excel (including adjusted R-squared)
outreg2 using "$output\OLS.xls", append excel dec(3) ctitle(Model 4) nodepvar adjr2 e(r2) stat(beta)

************************************
***CLOSING COMMANDS
************************************

***Save data
save "$data\Stata07.dta", replace

***Save log
log close