Analysis of Income vs. Air Quality
Overview
The goal of the project was to determine whether income, measured as the median income of a specific location (a city or county), is correlated with air quality, measured by PM2.5 (particulate matter with a diameter of 2.5 micrometers or less).
Data acquisition
Median income data for cities across the United States in 2023 was requested from the United States Census Bureau website. Because the second part of the project required county-level locations identified by FIPS codes, county income data for California was also collected from NIH HD Pulse.
For the part of the project that analyzed cities across the United States, AQI data from the same time period was collected via the Air Quality Open Data Platform. AQI for California counties was also acquired from NIH HD Pulse.
Data cleaning
The raw data was first loaded into dataframes. Excess columns were removed from the AQI and income data, leaving only the city/county, AQI, and income columns. Income values were converted to integers and AQI values to floats. Dataframes for income and AQI were then created. Because the AQI data contained significantly fewer observations than the income data, observations in the income data whose city/county was not recorded in the AQI data were deleted.
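The cleaning steps above can be sketched with pandas. This is a minimal illustration, not the project's actual code: the column names (`location`, `median_income`, `aqi`), the extra columns, and all values are hypothetical, and the final merge into one table is an assumption about how the two dataframes were later combined.

```python
import pandas as pd

# Hypothetical raw tables; names and values are illustrative only.
income_raw = pd.DataFrame({
    "location": ["Alpha", "Beta", "Gamma", "Delta"],
    "median_income": ["61,000", "48,500", "72,250", "55,000"],
    "margin_of_error": [1200, 900, 1500, 1100],  # excess column
})
aqi_raw = pd.DataFrame({
    "location": ["Alpha", "Gamma"],
    "aqi": ["41.5", "58.0"],
    "station_count": [3, 5],  # excess column
})

# Keep only the location and the measurement of interest.
income = income_raw[["location", "median_income"]].copy()
aqi = aqi_raw[["location", "aqi"]].copy()

# Convert income to integers (after stripping thousands separators)
# and AQI to floats.
income["median_income"] = (
    income["median_income"].str.replace(",", "", regex=False).astype(int)
)
aqi["aqi"] = aqi["aqi"].astype(float)

# Delete income rows whose location has no AQI record.
income = income[income["location"].isin(aqi["location"])].reset_index(drop=True)

# Assumption: combine the two tables on location for later analysis.
merged = income.merge(aqi, on="location", how="inner")
print(merged)
```

Filtering with `isin` before merging mirrors the order described above; an inner merge alone would drop the unmatched rows as well.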
Exploratory Data Analysis
Income and AQI were compared using linear correlation. The correlation was very weak and slightly negative, with a correlation coefficient of -0.09.
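For reference, a linear correlation coefficient of this kind can be computed as below. The sketch assumes the reported value is Pearson's r, which the text does not state explicitly. The function name `pearson_r` and the sample values are hypothetical and are not the project's data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length, at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-deviations and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical sample values, for illustration only.
incomes = [48000, 55000, 61000, 72000, 90000]
aqis = [52.0, 47.5, 55.0, 44.0, 50.0]
r = pearson_r(incomes, aqis)
print(f"r = {r:.2f}")
```

Values of r near 0, such as the -0.09 reported here, indicate almost no linear association. Values near -1 or 1 indicate a strong one.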
Modeling
Income and AQI were plotted against each other on a scatterplot.
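A scatterplot of this kind could be produced with matplotlib roughly as follows. The data values, the output file name, and the choice to put income on the horizontal axis are assumptions made for illustration. The original plot may have been laid out differently.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Hypothetical sample values, for illustration only.
incomes = [48000, 55000, 61000, 72000, 90000]
aqis = [52.0, 47.5, 55.0, 44.0, 50.0]

fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(incomes, aqis, alpha=0.7)  # one point per city/county
ax.set_xlabel("Median income (USD)")
ax.set_ylabel("AQI")
ax.set_title("Income vs. AQI")
fig.tight_layout()
fig.savefig("income_vs_aqi.png", dpi=150)  # hypothetical output path
```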
For the portion of the project that used California counties, a heatmap was made that showed the range of income and air quality on a map of the United States.