Case-Study 27: Evidence of well-being, health behavior, and environmental factors using multiple data sources

Summer 2018

Name: Zhanjia Zhang

Well-being is generally defined as a global judgment of one’s life satisfaction and feelings, and it has been shown to relate to longevity and a plethora of health outcomes. In recent decades, well-being has received increasing public attention. For example, many countries, such as France and Canada, now include well-being as an index of national development.

Goal: Identify the factors that are potentially associated with well-being. Contemporary “big biomedical data” enables the investigation of outcomes of interest across large populations, including community-, county-, and state-level characteristics of well-being. The specific aims are to:
1) identify publicly available datasets that contain a measure of well-being;
2) examine the geographical pattern of well-being using these datasets;
3) investigate the relationship of well-being with individual-level and county-level factors;
4) determine whether machine learning algorithms significantly improve the prediction of well-being.

Data: Datasets about “well-being”, “happiness”, and “life satisfaction” within the US, drawn from ICPSR, DATA.GOV, and HealthData.Gov.
1) The 2010 Behavioral Risk Factor Surveillance System (BRFSS) survey dataset includes individual-level variables on well-being and other health-related outcomes, along with county-level geographical information (i.e., county FIPS code), which makes it feasible to merge with county-level variables from other data sources. URL: https://www.cdc.gov/brfss/annual_data/annual_2010.htm.
2) County-level variables come from three sources: CDC WONDER, a system for disseminating public health data and information; the US Census Bureau, which provides many county-level demographic and socioeconomic status variables; and the County Health Rankings. URL: https://wonder.cdc.gov/datasets.html.
3) Using the BRFSS and the three county-level data sources, a total of 32 non-redundant variables (20 individual-level and 12 county-level) were selected, covering demographics, SES, health behaviors, health status, built environment, and social and economic environment, among others. The individual-level variables included gender, age, well-being, social support, race, income level, marital status, perceived level of general health, physical health, mental health, asthma, cardiovascular diseases, diabetes, health care access, activity limitation due to health problems, employment status, body mass index (BMI), alcohol consumption, tobacco use, sleep quality, and physical activity. The county-level variables included population density, air quality, premature mortality, median household income, percent of adults with some post-secondary education, high school graduation rate, unemployment rate, violent crime rate, drinking water safety, access to parks, access to recreational facilities, and fast food restaurant rate.
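
The county FIPS code is what links the individual-level BRFSS records to the county-level sources, so a minimal sketch of that merge step may be useful. It is an illustration only, not the procedure used in this case study: the field names (`state_code`, `county_code`, `life_satisfaction`, `unemployment_rate`) and all values are hypothetical placeholders, and the real files would need their own variable names substituted. The one firm fact it relies on is that a county FIPS code is five characters, a 2-digit state code followed by a 3-digit county code.

```python
# Toy stand-ins for the real files; every field name and value is hypothetical.
individuals = [
    {"state_code": 26, "county_code": 161, "life_satisfaction": 1},
    {"state_code": 26, "county_code": 163, "life_satisfaction": 2},
    {"state_code": 6, "county_code": 37, "life_satisfaction": 1},
]
counties = [
    {"fips": "26161", "unemployment_rate": 7.5},
    {"fips": "06037", "unemployment_rate": 12.0},
]


def county_fips(state_code, county_code):
    """Return the 5-character county FIPS: 2-digit state + 3-digit county."""
    return f"{int(state_code):02d}{int(county_code):03d}"


def merge_county(ind_rows, county_rows):
    """Left-join county-level fields onto individual-level records by FIPS."""
    by_fips = {row["fips"]: row for row in county_rows}
    merged = []
    for row in ind_rows:
        fips = county_fips(row["state_code"], row["county_code"])
        county = by_fips.get(fips)
        # Keep every individual record; flag those without a county match.
        out = {**row, "fips": fips, "county_matched": county is not None}
        if county is not None:
            out.update(county)
        merged.append(out)
    return merged


merged = merge_county(individuals, counties)
unmatched = sum(not row["county_matched"] for row in merged)
```

Keeping the FIPS code as a zero-padded string rather than an integer avoids silently dropping the leading zero for states with single-digit codes. The left join plus the `county_matched` flag keeps respondents whose county is missing from the county-level source, so they can be counted and handled explicitly rather than disappearing from the analysis.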