************************************
************************************
***SOCI 600: INTRODUCTION TO SOCIOLOGICAL DATA ANALYSIS
***NORMAL CURVE
************************************
************************************

************************************
***CLEAR MEMORY
************************************
clear all

************************************
***CREATE SHORTCUTS AND LOG FILE
************************************

***Shortcut for folders
global codes = "H:\course\codes"
global data = "H:\course\data"
global output = "H:\course\output"

***Start saving results window
log using "$codes\Stata04.log", replace text

************************************
***OPENING COMMANDS
************************************

***Tell Stata to not pause for "more" messages
set more off

************************************
***GRAPH COMMAND TO GENERATE NORMAL DISTRIBUTION
************************************

***Plot two normal distributions
***IQ scores for females and males
graph twoway (function y=normalden(x,100,10), range(40 160) lcolor(maroon) lw(medthick)) ///
	(function y=normalden(x,100,20), range(40 160) lcolor(navy) lw(medthick)), ///
	title("Normal density of IQ scores for females and males", color(black)) ///
	xtitle("IQ Units", size(medlarge)) ytitle("") xlabel(40(10)160) ///
	xscale(lw(medthick)) yscale(lw(medthick)) ///
	legend(order(1 "Females" 2 "Males")) graphregion(fcolor(white))

************************************
***AREA UNDER THE NORMAL CURVE
***"normal" shows area below Z
************************************

***Survey in a community collected information about age of individuals
***Average age = 35.5
***Standard deviation of age = 10

************************************
***What's the probability of finding someone
***who is younger than 44 years of age?
***Estimate Z = (x - mean) / standard deviation
display (44-35.5)/10
di (44-35.5)/10

***Area below Z=0.85
di normal(0.85)

************************************
***What's the probability of finding someone
***who is older than 40 years of age?

***Estimate Z = (x - mean) / standard deviation
di (40-35.5)/10

***Area above Z=0.45
di 1-normal(0.45)

************************************
***What's the probability of finding someone
***who is younger than 22 years of age?

***Estimate Z = (x - mean) / standard deviation
di (22-35.5)/10

***Area below Z=-1.35
di normal(-1.35)

************************************
***What's the probability of finding someone
***who is between 32 and 42 years of age?

***Estimate Z = (x - mean) / standard deviation
di (32-35.5)/10
di (42-35.5)/10

***Area between Z=-0.35 and Z=0.65
di normal(0.65)-normal(-0.35)

************************************
***What's the probability of finding someone
***who is between 42 and 46 years of age?

***Estimate Z = (x - mean) / standard deviation
di (42-35.5)/10
di (46-35.5)/10

***Area between Z=0.65 and Z=1.05
di normal(1.05)-normal(0.65)

************************************
***What's the probability of finding someone
***who is above 50 years of age?

***Estimate Z = (x - mean) / standard deviation
di (50-35.5)/10

***Area above Z=1.45
di 1-normal(1.45)

************************************
***DISTRIBUTION OF INCOME
***AMERICAN COMMUNITY SURVEY
************************************

***Open 2021 ACS (only Texas)
use "$data\ACS2021.dta", clear

***Complex survey design
svyset cluster [pweight=perwt], strata(strata) singleunit(scaled)

***Wage and salary income
gen income=.
replace income=incwage if incwage!=0 & incwage!=999999

***Histogram of income
hist income, norm percent

***Boxplot of income
graph hbox income

***Quantile-normal plot of income
qnorm income

***Power transformation
***q<1 (reduce positive skew)
***log(y): q=0
gen lnincome = ln(income)

***Histogram of log of income
hist lnincome, norm percent

***Boxplot of log of income
graph hbox lnincome

***Quantile-normal plot of log income
qnorm lnincome

************************************
***What's the probability of finding someone
***who makes more than $50,000 per year?

***Original income variable (income)
***This variable does not have a normal distribution
sum income

***Log of income (lnincome)
***This variable has a distribution closer to normal
svy: mean lnincome
estat sd
***Mean = 10.27
***Standard deviation = 1.23

***$50,000 in log scale
di ln(50000)

***Estimate Z = (x - mean) / standard deviation
di (10.82-10.27)/1.23

***Area above Z=0.45
di 1-normal(0.45)

************************************
***CLOSING COMMANDS
************************************

***Save data
save "$data\Stata04.dta", replace

***Save log
log close
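
************************************
***OPTIONAL CHECK: AREAS WITHOUT ROUNDING Z
************************************

***Sketch only (an addition, not part of the original exercise)
***The steps above round Z to two decimals before calling "normal"
***Nesting the Z formula inside "normal" avoids that rounding,
***so results may differ slightly in the later decimals
***Note: if the log has already been closed, run these lines before
***"log close" so that the results appear in the log

***Younger than 44 years of age (mean = 35.5, standard deviation = 10)
di normal((44-35.5)/10)

***Between 32 and 42 years of age
di normal((42-35.5)/10) - normal((32-35.5)/10)

***More than $50,000 per year, using the mean (10.27) and
***standard deviation (1.23) of log income reported above
di 1 - normal((ln(50000)-10.27)/1.23)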