Interpreting a Causal Effect for COVID-19 Mortality on Per-County Housing Prices

Posted by     Sam Lee on Tuesday, December 12, 2023

Introduction

In this analysis I seek to estimate the causal effect that 2020 COVID-19 mortality rates had on per-county housing prices in the United States in 2021. According to elementary economic theory, all else equal, when consumers of a good leave a market, demand decreases. This implies that prices decrease at every quantity demanded (Law of Demand). This theory may be grimly applied when we analyze COVID-19 mortality deaths from 2020. According to the Law of Demand, all else equal, we would assume that on average, housing prices would decrease given the decrease in the number of consumers in the housing market per county. Not only can this be empirically tested using econometric methods, but additionally, this could provide compelling evidence for why providing early care for COVID-related illnesses may have real economic benefits and impacts in external markets.

Data

To estimate the effect that 2020 COVID-19 mortality rates had on per-county housing prices in 2021, I merged per-county COVID-19 mortality rates with the per-county housing prices. Per-county COVID-19 mortality rates was obtained from NYT’s public database ( https://github.com/nytimes/covid-19-data). The us-counties-2020.csv data set contains the fips code for each county and the corresponding death count. This was merged with data from the U.S. Census Bureau (https://www.census.gov/data/tables/time-series/demo/popest/2020s-counties-total.html) to obtain the 2020 population totals for each county (see co-est2022-pop.xlsx). Mortality rate was then calculated as the death count for each county divided by the population for each county multiplied by 1000 (deaths per 1,000). This data set also contains the state for each county record, which will later be used to control for.

NOTE: Due to the jurisdiction of large cities such as Kansas City and New York City, and other cities that cross county boundaries and count COVID deaths differently than other counties, city-specific COVID deaths for some cities are often denoted by the city-jurisdiction as opposed to being disaggregated by the counties within. Notwithstanding the housing prices that are certain to vary within these jurisdictions, this made it difficult to match COVID-19 mortality for these jurisdictions and per-county housing. As a result, these records were dropped.

I then merged per-county housing prices to with the mortality data. To do this, I used zip-level housing price data (see HPI_AT_BDL_ZIP3.xlsx) maintained by the Federal Finance Housing Agency (FHFA) (https://www.fhfa.gov/DataTools/Downloads/Pages/House-Price-Index-Datasets.aspx) and merged it with data that mapped zip-codes in 2021 to the corresponding counties. I used HUD’s ZIP-County Crosswalk API to obtain the zip codes that correspond to each county. The Python script, zip-county.py, uses this API to compile zip-county.csv. After combining these two data sets, the housing prices was condensed to create an average HPI (housing price index) for each county consisting of the HPIs from each zip code in each county. For the purposes of this analysis, to control for inflation, the HPI that’s been adjusted for inflation using 2000 year as a base year was used instead of raw HPI. From here on out in this analysis, HPI will be used to refer to as this HPI that’s been adjusted for inflation using 2000 as the base year.

Additionally, $6$ lag variables on the 2021 HPI were constructed using The housing prices data. (See A.1 for the selection process for the number of lag variables).

Merging the housing prices data set with the mortality data set yields the final data set for the econometric analysis: The response variable $\text{HPI}_{2021}$ will be measured on the independent metric of interest, $\text{Mortality Rate}$ for each county, and this will be controlled for using a series of state dummy variables and the $6$ lag variables on $\text{HPI}_{2021}$. This data set yielded a sample size = 3113 counties with $n = 50$ states.

Exploratory Data Analysis

Summary Statistics of Key Numeric Variables:

Variable Mean Median SD Min Max
HPI 2015 140.88 136.14 24.76 85.82 348.52
HPI 2016 145.87 140.07 25.17 90.43 322.69
HPI 2017 151.83 144.78 26.31 97.95 319.03
HPI 2018 158.83 151.28 28.06 104.95 334.27
HPI 2019 165.68 157.21 30.07 110.39 357.13
HPI 2020 171.50 162.37 31.38 115.14 367.03
HPI 2021 189.67 178.99 36.41 127.46 374.60
Mortality Rate 102.08 70.60 103.19 0.00 1062.03

How COVID-19 Mortality Rates Compare Against $\text{HPI}_{2021}$

Initial trends indicate that there’s a general negative trend between mortality rate and $\text{HPI}_{2021}$.

This demonstrates the variability between states’ $\text{HPI}_{2021}$ and COVID-19 mortality rates. The fact that the proportion of $\text{HPI}_{2021}$ and mortality rate varies drastically depending on state indicates that both mortality and $\text{HPI}_{2021}$ may depend on the state and state-specific policies. In econometric terms, we need to include state-specific fixed effects in our econometric model in order to further satisfy $E[\epsilon|X]=0$

Econometric Model

It is hypothesized that there are unobserved effects in the data that are dependent upon each state ($s$) such that for each county ($c$) that we wish to estimate the COVID-19 mortality effect on per-county housing prices in 2021 ($\text{HPI}_{2021}$), the error term ($\epsilon$) can thus be represented by $\epsilon_{sc}=\alpha_s+\eta_{st}$. We will remedy this issue by using a fixed effects model by including $n-1$ dummy variables for $n$ states.

Furthermore, due to the lagged effects in the model, the theoretical model incorporates some time effect $t$, where $t$ is the specified year. Including these lagged effects in the econometric model will make the estimate for the coefficient on COVID-19 mortality less biased since it will be ridden from the time-trends in HPI that occur due to the nature of time-series dependent data. Excluding the lags from model will likely lead to a biased coefficient on the estimated effect for mortality, as it would have encapsulated the effects of the lagged variables in $\eta_{sc}$ which would have otherwise been controlled for in the model Since $\text{HPI}_{2021}$ is significantly dependent upon $\text{HPI}_{2021-1},…,\text{HPI}_{2021-P}$ (as shown in A.1), where $P$ is the number of lags in the model (in this analysis we let $P=6$), then since $\delta_p > 0 \quad \forall p \in {1,…,P}$, then the coefficient for mortality will be overestimated if the lag variables are excluded from the model.

Hence, for year $t$, state $s$, and county $c$, we wish to estimate for $n$ # of states,

  1. $\text{HPI}_{sct} = \beta_0+\beta_1 + \text{Mortality}_{sct-1} + \sum_{j=1}^{n-1} + \beta_{j+1}I(\text{State}_s=j) +$

    $\sum_{p=1}^{6}\delta_{p} + \text{HPI}_{sct-p} + \eta_{sct}$

Where $\beta_1$ is the parameter of interest. Since we are only interested in the effect that 2020 COVID-19 mortality had on 2021 per-county housing prices, setting $t=2021$ yields a regression on a cross-sectional data set

Since we include only $n-1$ dummy variables for $n$ # of states, $X’X$ will be full rank and thus invertible. Since we control for state effects using a fixed effects model we further satisfy the assumption that $E[\eta_{sc}|X]=0$. This assumption is more reasonably satisfied with the inclusion of lagged effects on $\text{HPI}_{sct=2021}$. Since these lagged effects significantly affect the main response, $\text{HPI}_{sct=2021}$, including them in the model will isolate the estimated causal effect for $\beta_1$.

Findings

Using OLS to estimate $\beta_1$, we arrived at the following estimate (see A.2 for full model output):

Coefficient Estimate Std. Error T.Value P.Value 2.5 % 97.5 %
Mortality Rate -0.0031 0.0006105 -5.07407 0.00000041 -0.0043 -0.0019

Adjusted $R^2$ = 0.9936818

Controlling for the a state-fixed effects and lagged effects on $\text{HPI}_{sct=2021}$, the estimate for $\beta_1$ was found to be highly significant and negative.

Holding all else constant, an additional COVID-19 mortality per 1,000 residents decreases the county-average 2021 housing price index by an 0.0031, on average.

Holding all else constant, we are 95% confident that an additional COVID-19 mortality per 1,000 residents decreases the 2021 county-average housing price index between 0.0043 and 0.0019, on average.

Conclusions

The conclusions following the empirical analysis align with economic theory. On average, fewer consumers in the housing market lead to an average decrease in the price on average. Controlling for all other relevant effects, the increase in per-county COVID-19 mortality implies a decrease in per-county 2021 HPI, on average. Though the housing prices over the past six years have generally risen, there is significant evidence to believe that on the county-level, 2021 HPI would have risen higher if COVID-19 mortality could have been mitigated on the county-level as well.

Limitations

Though the econometric model (1) controlled for as much as bias as possible to obtain an unbiased estimate for $\beta_1$, the model suffers from missing data in $X$. Though the NYT’s public database is one of the most up to date data bases on per-county COVID mortalities, some counties were marked as “unknown”, due to some state health department procedures. In addition, as mentioned earlier, several geographical exceptions made it impractical to measure the impact of per-county COVID-19 mortalities on housing prices for larger jurisdictions that cover multiple counties such as New York and Kansas City. In the cases where multiple counties or other non-county geographies were grouped as a single county, the fips code was dropped making it impractical to join with per-county housing price data.

Additionally, housing price indices were aggregated over the county-specific zip-codes and computed as a mean for each county. A future analysis would weight these HPIs by the population weights in each zip code. This analysis assumes each zip code is weighted equally in population.

Appendix

A.1

I determined a sufficient # of lags ($P$) for the lagged effects on $\text{HPI}_{sct=2021}$ by regressing $\text{HPI}_{sct=2021}$ on $\text{HPI}_{sct-1},…,\text{HPI}_{sct-P}$ until the last the regression coefficient, $\delta_{P}$, was no longer significant for a chosen level of significance, $\alpha$. Let $\alpha=0.1$

Hence, let $P=6$.

A.2

Full regression output for estimates on (1). (See also log file)

Coefficient Estimate Std. Error T.Value P.Value 2.5 % 97.5 %
$\beta_0$ 9.08027 0.6515312 13.93682 0.00000000 7.80279 10.35776
Mortality Rate -0.00310 0.0006105 -5.07407 0.00000041 -0.00430 -0.00190
HPI 2015 -0.35359 0.0229598 -15.40042 0.00000000 -0.39861 -0.30857
HPI 2016 -0.02682 0.0452753 -0.59248 0.55357000 -0.11560 0.06195
HPI 2017 0.20687 0.0430709 4.80295 0.00000164 0.12242 0.29132
HPI 2018 -0.13237 0.0412066 -3.21236 0.00133018 -0.21317 -0.05157
HPI 2019 -0.59148 0.0437516 -13.51910 0.00000000 -0.67727 -0.50570
HPI 2020 1.86692 0.0324828 57.47427 0.00000000 1.80323 1.93061
Alaska 0.87602 0.7252829 1.20783 0.22720542 -0.54607 2.29811
Arizona 9.70355 0.8636497 11.23551 0.00000000 8.01015 11.39694
Arkansas 2.48756 0.4915096 5.06106 0.00000044 1.52384 3.45128
California 6.89466 0.6093635 11.31452 0.00000000 5.69985 8.08946
Colorado 2.87857 0.5502238 5.23163 0.00000018 1.79972 3.95741
Delaware 5.75894 1.7115600 3.36473 0.00077563 2.40302 9.11487
District of Columbia 0.44122 3.0003582 0.14705 0.88309895 -5.44171 6.32414
Florida 4.68318 0.5734236 8.16706 0.00000000 3.55885 5.80752
Georgia 0.42052 0.4279287 0.98268 0.32584226 -0.41854 1.25957
Hawaii 5.58311 1.4297505 3.90495 0.00009628 2.77974 8.38648
Idaho 10.65327 0.7255525 14.68297 0.00000000 9.23065 12.07589
Illinois 0.17282 0.4656942 0.37110 0.71058516 -0.74028 1.08593
Indiana 0.67766 0.4708831 1.43912 0.15021823 -0.24562 1.60094
Iowa -0.68887 0.4729804 -1.45645 0.14537018 -1.61627 0.23852
Kansas -0.08220 0.4611358 -0.17825 0.85854226 -0.98636 0.82197
Kentucky 0.40446 0.4493046 0.90018 0.36809361 -0.47651 1.28543
Louisiana -0.76242 0.5327908 -1.43100 0.15253188 -1.80709 0.28224
Maine 0.14863 0.8118053 0.18309 0.85473815 -1.44310 1.74037
Maryland 4.10228 0.7012481 5.84996 0.00000001 2.72731 5.47724
Massachusetts 2.26367 0.8603208 2.63120 0.00855122 0.57681 3.95054
Michigan 0.69564 0.4874590 1.42707 0.15366069 -0.26014 1.65142
Minnesota 1.33058 0.4866585 2.73410 0.00629108 0.37636 2.28479
Mississippi -0.22151 0.4857819 -0.45599 0.64842859 -1.17400 0.73098
Missouri 2.29708 0.4503887 5.10021 0.00000036 1.41398 3.18017
Montana 9.33223 0.6064473 15.38836 0.00000000 8.14314 10.52131
Nebraska 0.49024 0.4842142 1.01245 0.31140438 -0.45918 1.43966
Nevada 7.79373 0.8445802 9.22794 0.00000000 6.13773 9.44974
New Hampshire 1.98369 0.9927911 1.99809 0.04579509 0.03708 3.93029
New Jersey 3.80694 0.7612258 5.00107 0.00000060 2.31437 5.29951
New Mexico -0.03753 0.6253614 -0.06001 0.95215054 -1.26370 1.18864
New York 2.84983 0.5398307 5.27911 0.00000014 1.79136 3.90830
North Carolina 3.08102 0.4630017 6.65445 0.00000000 2.17320 3.98885
North Dakota 2.18345 0.6718277 3.25001 0.00116651 0.86617 3.50073
Ohio -0.90850 0.4762208 -1.90773 0.05651970 -1.84225 0.02524
Oklahoma 2.55833 0.4972734 5.14471 0.00000028 1.58330 3.53335
Oregon 5.49660 0.6729647 8.16774 0.00000000 4.17709 6.81611
Pennsylvania 1.16220 0.5061576 2.29612 0.02173633 0.16976 2.15464
Rhode Island 2.69825 1.3474705 2.00246 0.04532383 0.05621 5.34029
South Carolina 1.52320 0.5600854 2.71959 0.00657320 0.42502 2.62139
South Dakota 2.32386 0.5424788 4.28379 0.00001894 1.26020 3.38752
Tennessee 4.61973 0.4815199 9.59405 0.00000000 3.67559 5.56386
Texas 2.58799 0.4423724 5.85026 0.00000001 1.72062 3.45537
Utah 10.62956 0.6978804 15.23121 0.00000000 9.26120 11.99792
Vermont 0.94603 0.8657926 1.09268 0.27462088 -0.75156 2.64363
Virginia 2.44825 0.4504892 5.43464 0.00000006 1.56496 3.33154
Washington 4.22278 0.6794959 6.21458 0.00000000 2.89047 5.55510
West Virginia 0.55756 0.5466830 1.01990 0.30785529 -0.51434 1.62947
Wisconsin 0.32255 0.5009011 0.64393 0.51966771 -0.65959 1.30468
Wyoming 4.29918 0.7469335 5.75577 0.00000001 2.83464 5.76372