/***********************************
************************************
***SOCI 600: INTRODUCTION TO SOCIOLOGICAL DATA ANALYSIS
***INTRODUCTION
************************************
************************************

************************************
***AMERICAN COMMUNITY SURVEY (ACS) DATA
************************************

Download the ACS material for this course from this link:
http://www.ernestoamaral.com/docs/soci600-24fall/course.zip

Save and uncompress the ACS material in the Home Drive (H:\) in VOAL.
This creates a folder called "course" with four sub-folders:
"codes": we will use this folder to save Stata DO files and LOG files
"data": American Community Survey microdata
"documents": codebooks, information on income variable, questionnaires
"output": we will use this folder to save tables and figures
*/

************************************
***CLEAR MEMORY
************************************

clear all

************************************
***CREATE SHORTCUTS AND LOG FILE
************************************

***Windows uses backslash "\". Throughout the course, we use backslash,
***because the VOAL is based on Windows.
***Macintosh uses forward slash "/".
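
***A minimal sketch for Macintosh users, assuming the "course" folder was
***saved in a hypothetical location such as /Users/yourname/Documents
***(adjust to your own folder). These lines are commented out, so they
***do not affect the Windows shortcuts used in this course:
***global codes = "/Users/yourname/Documents/course/codes"
***global data = "/Users/yourname/Documents/course/data"
***global output = "/Users/yourname/Documents/course/output"
***File paths built from these shortcuts would then also use forward slash,
***for example: "$codes/Stata01.log"
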
***Create shortcuts for folders, according to their location
***on your computer or VOAL, which might be different from the ones below
global codes = "H:\course\codes"
global data = "H:\course\data"
global output = "H:\course\output"

***Start saving results window
log using "$codes\Stata01.log", replace text

************************************
***OPENING COMMANDS
************************************

***Tell Stata to not pause for "more" messages
set more off

***Open 2021 ACS (only Texas)
use "$data\ACS2021.dta", clear

************************************
***SAMPLE SIZE
************************************

count

***Year
tabulate year
tab year, missing
tab year, m

/*
************************************
***RELATIONAL OPERATORS
************************************

The relational operators are:
> (greater than)
< (less than)
>= (greater than or equal)
<= (less than or equal)
== (equal)
!= (not equal)

Observe that the relational operator for equality is a pair of equal signs.
This convention distinguishes relational equality from the single equal sign
used to assign values when generating a variable.
See example below...
*/

************************************
***SEX
************************************

tab sex
tab sex, m // show missing cases
tab sex, nolabel // hide label

***List names of value labels
label dir

***List names and contents of sex label
label list sex_lbl

***Generate dummy variable for female
generate female=.
replace female=0 if sex==1 // Male
replace female=1 if sex==2 // Female

***Verify new variable
tab sex female, m

***Create label for variable
label variable female "Sex"

***Create labels for categories
label define female 0 "Male" 1 "Female"

***Assign labels for categories
label values female female

***Verify new variable with labels
tab female
tab sex female, m

************************************
***RACE/ETHNICITY
************************************

***Race
tab race, m
tab race, nolabel

***Ethnicity
tab hispan, m
tab hispan, nolabel

***Cross tabulation of race and ethnicity
tab race hispan, m
tab race hispan, nolabel

***List names and contents of race and ethnicity labels
label list race_lbl hispan_lbl

***Generate race/ethnicity variable
gen raceth=.
replace raceth=1 if race==1 & hispan==0 // White
replace raceth=2 if race==2 & hispan==0 // Black
replace raceth=3 if hispan>=1 & hispan<=4 // Hispanic
replace raceth=4 if (race==4 | race==5 | race==6) & hispan==0 // Asian
replace raceth=5 if race==3 & hispan==0 // Native American
replace raceth=6 if (race==7 | race==8 | race==9) & hispan==0 // Other

***Create label for variable
label variable raceth "Race/ethnicity"

***Create labels for categories
label define raceth 1 "White" 2 "African American" 3 "Hispanic" ///
    4 "Asian" 5 "Native American" 6 "Other races"

***Assign labels for categories
label values raceth raceth

***Verify new variable
tab raceth, m
tab race raceth, m
tab hispan raceth, m

************************************
***AGE
************************************

sum age, d

***Generate age group variable - manually
gen agegr1=.
replace agegr1=0 if age>=0 & age<=15
replace agegr1=16 if age>=16 & age<=19
replace agegr1=20 if age>=20 & age<=24
replace agegr1=25 if age>=25 & age<=34
replace agegr1=35 if age>=35 & age<=44
replace agegr1=45 if age>=45 & age<=54
replace agegr1=55 if age>=55 & age<=64
replace agegr1=65 if age>=65 & age<=100

***Verify new variable
tab agegr1, m
table agegr1, statistic(min age) statistic(max age) statistic(count age)

***Generate age group variable - automatically
***Note: cut() uses intervals closed on the left and open on the right,
***so the last group is 65-99 and an age of 100 or more would be missing
egen agegr2 = cut(age), at(0,16,20,25,35,45,55,65,100)

***Verify new variable
tab agegr2, m
table agegr2, statistic(min age) statistic(max age) statistic(count age)

***Create label for variables
label variable agegr1 "Age group"
label variable agegr2 "Age group"

***Create labels for categories
label define agecode 0 "0-15" 16 "16-19" 20 "20-24" 25 "25-34" ///
    35 "35-44" 45 "45-54" 55 "55-64" 65 "65-100"

***Assign labels for categories
label values agegr1 agegr2 agecode

***Verify new variables
tab agegr1, m
tab age agegr1, m
tab agegr2, m
tab age agegr2, m

************************************
***EDUCATIONAL ATTAINMENT
************************************

tab educ, m
tab educ, nolabel

***List names and contents of education label
label list educ_lbl

***Generate new educational attainment variable
gen educgr=.
replace educgr=1 if educ>=0 & educ<=5 // Less than high school
replace educgr=2 if educ==6 // High school
replace educgr=3 if educ==7 | educ==8 // Some college
replace educgr=4 if educ==10 // College
replace educgr=5 if educ==11 // 5+ years of college, graduate school

***Create label for variable
label variable educgr "Educational attainment"

***Create labels for categories
label define educgr 1 "Less than high school" 2 "High school" ///
    3 "Some college" 4 "College" 5 "Graduate school"

***Assign labels for categories
label values educgr educgr

***Verify new variable
tab educgr, m
tab educ educgr, m

**********************
***MARITAL STATUS
**********************

tab marst, m
tab marst, nolabel

***List names and contents of marital status label
label list marst_lbl

***Generate new marital status variable
gen marital=.
replace marital=1 if marst==1 | marst==2 // Married
replace marital=2 if marst>=3 & marst<=5 // Separated, divorced, widowed
replace marital=3 if marst==6 // Never married, single

***Create label for variable
label variable marital "Marital status"

***Create labels for categories
label define marital 1 "Married" 2 "Separated, divorced, widowed" 3 "Never married"

***Assign labels for categories
label values marital marital

***Verify new variable
tab marital, m
tab marst marital, m

************************************
***MIGRATION STATUS (detailed version, 7 categories)
************************************

tab migrate1d, m
tab migrate1d, nolabel

***List names and contents of migration status label
label list migrate1d_lbl

***Who are not applicable (N/A)?
tab age if migrate1d==0, m

***Generate new migration status variable
gen migrant=.
replace migrant=1 if migrate1d==10 | migrate1d==23 // same house or within PUMA
replace migrant=2 if migrate1d>=24 & migrate1d<=32 // internal migrant
replace migrant=3 if migrate1d==40 // international migrant

***Create label for variable
label variable migrant "Migration status"

***Create labels for categories
label define migrant 1 "Non-migrant" 2 "Internal migrant" 3 "International migrant"

***Assign labels for categories
label values migrant migrant

***Verify new variable
tab migrant, m
tab migrate1d migrant, m

************************************
***WAGE AND SALARY INCOME
************************************

sum incwage, d

***Generate new income variable
gen income=.
replace income=incwage if incwage!=999999

***Create label for variable
label variable income "Wage and salary income"

***Verify number of missing cases
codebook income
count if income==.

***Verify new variable
sum income, d
hist income, percent

************************************
***CLOSING COMMANDS
************************************

***Save data
save "$data\Stata01.dta", replace

***Save log
log close
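
************************************
***OPTIONAL CHECK (illustrative sketch, not part of the original steps)
************************************

***Assuming the save above succeeded, reopen the saved file and confirm
***that the variables created in this DO file are present.
***The log is already closed, so this output is not saved in the log file.
use "$data\Stata01.dta", clear
describe female raceth agegr1 agegr2 educgr marital migrant income
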