Health Calculation Functions

`health_impact_calcs.py`

The health_impact_calcs script file contains a number of functions that help calculate health impacts from exposure concentrations.

`create_hia_inputs`

Creates the hia_inputs object.

Inputs:
- pop: population object input
- load_file: a Boolean telling the program to load or not
- verbose: a Boolean telling the program to return additional log statements or not
- geodata: the geographic data from the ISRM
- incidence_fp: a string containing the filepath where the incidence data is stored
- debug_mode: a Boolean indicating whether or not to output debug statements
Outputs:
- a health data object ready for health calculations
Methodology
1. Allocates population to the ISRM grid using the population object and the ISRM geodata.
2. Initializes a health_data object from that allocated population.

`krewski`

Defines a Python function around the Krewski et al. (2009) function and endpoints

Inputs:
- verbose: a Boolean indicating whether or not detailed logging statements should be printed
- conc: a float with the exposure concentration for a given geography
- inc: a float with the background incidence for a given group in a given geography
- pop: a float with the population estimate for a given group in a given geography
- endpoint: a string containing either ‘ALL CAUSE’, ‘ISCHEMIC HEART DISEASE’, or ‘LUNG CANCER’
Outputs
- a float estimating the number of excess mortalities for the endpoint across the group in a given geography
Methodology:
1. Based on the endpoint, grabs a beta parameter from Krewski et al. (2009).
2. Estimates excess mortality using the following equation, where $\beta$ is the endpoint parameter from Krewski et al. (2009), $d$ is the disease endpoint, $C$ is the concentration of PM_2.5, $i$ is the grid cell, $I$ is the baseline incidence, $g$ is the group, and $P$ is the population estimate.
\[1 - ( \frac{1}{\exp(\beta_{d} \times C_{i})} ) \times I_{i,d,g} \times P_{i,g}\]

`create_logging_code`

Makes a global logging code for easier updating

Inputs: None
Outputs:
- logging_code: a dictionary that maps endpoint names to log statement codes
Methodology:
1. Defines a dictionary and returns it.

`calculate_excess_mortality`

Estimates excess mortality for a given endpoint and function

Inputs:
- conc: a float with the exposure concentration for a given geography
- health_data_obj: a health_data object as defined in the health_data.py supporting script
- endpoint: a string containing either ‘ALL CAUSE’, ‘ISCHEMIC HEART DISEASE’, or ‘LUNG CANCER’
- function: the health impact function of choice (currently only krewski is built out)
- verbose: a Boolean indicating whether or not detailed logging statements should be printed
- debug_mode: a Boolean indicating whether or not to output debug statements
Outputs
- pop_inc_conc: a dataframe containing excess mortality for the endpoint using the function provided
Methodology:
1. Creates clean, simplified copies of the detailed_conc method of the conc object and the pop_inc method of the health_data_obj.
2. Merges these two dataframes on the ISRM_ID field.
3. Estimates excess mortality on a row-by-row basis using the function.
4. Pivots the dataframe to get the individual races as columns.
5. Adds the geometry back in to make it geodata.
6. Updates the column names such that the excess mortality columns are ENDPOINT_GROUP.
7. Merges the population back into the dataframe.
8. Cleans up the dataframe.

`plot_total_mortality`

Creates a map image (PNG) of the excess mortality associated with an endpoint for a given group.

Inputs:
- hia_df: a dataframe containing excess mortality for the endpoint using the function provided
- ca_shp_fp: a filepath string of the California state boundary shapefile
- group: the racial/ethnic group name
- endpoint: a string containing either ‘ALL CAUSE’, ‘ISCHEMIC HEART DISEASE’, or ‘LUNG CANCER’
- output_dir: a filepath string of the location of the output directory
- f_out: the name of the file output category (will append additional information)
- verbose: a Boolean indicating whether or not detailed logging statements should be printed
- debug_mode: a Boolean indicating whether or not to output debug statements
Outputs
- fname: a string filename made by combining the f_out with the group and endpoint.
Methodology:
1. Sets a few formatting standards within seaborn and matplotlib.pyplot.
2. Creates the output file directory and name string using f_out, group, and endpoint.
3. Reads in the California boundary and projects the hia_df to match the coordinate reference system of the California dataset.
4. Clips the dataframe to the California boundary.
5. Adds area-normalized columns to the hia_df for more intuitive plotting.
6. Grabs the minimums and sets them to 10^-9 in order to avoid logarithm conversion errors.
7. Updates the ‘MORT_OVER_POP’ column to avoid 100% mortality that arises from the update in step 6.
8. Initializes the figure and plots four panes:
  1. Population density: plots the area-normalized population estimates for the group on a log-normal scale.
  2. PM_2.5 exposure concentrations: plots the exposure concentration on a log-normal scale.
  3. Excess mortality per area: plots the excess mortality per unit area on a log-normal scale.
  4. Excess mortality per population: plots the excess mortality per population for the group on a log-normal scale.
9. Performs a bit of clean-up and formatting before exporting.

`export_health_impacts`

Exports mortality as a shapefile

Inputs:
- hia_df: a dataframe containing excess mortality for the endpoint using the function provided
- group: the racial/ethnic group name
- endpoint: a string containing either ‘ALL CAUSE’, ‘ISCHEMIC HEART DISEASE’, or ‘LUNG CANCER’
- output_dir: a filepath string of the location of the output directory
- f_out: the name of the file output category (will append additional information)
- verbose: a Boolean indicating whether or not detailed logging statements should be printed
- debug_mode: a Boolean indicating whether or not to output debug statements
Outputs
- fname: a string filename made by combining the f_out with the group and endpoint.
Methodology:
1. Creates the output file path (fname) using inputs.
2. Creates endpoint short labels and updates column names since shapefiles can only have ten characters in column names.
3. Exports the geodataframe to shapefile.

`export_health_impacts_csv`

Exports mortality as a csv

Inputs:
- hia_df: a dataframe containing excess mortality for the endpoint using the function provided
- endpoint: a string containing either ‘ALL CAUSE’, ‘ISCHEMIC HEART DISEASE’, or ‘LUNG CANCER’
- output_dir: a filepath string of the location of the output directory
- f_out: the name of the file output category (will append additional information)
- verbose: a Boolean indicating whether or not detailed logging statements should be printed
- debug_mode: a Boolean indicating whether or not to output debug statements
Outputs
- fname: a string filename made by combining the f_out with the group and endpoint.
Methodology:
1. Creates the output file path (fname) using inputs.
2. Revises column names for clarity
3. Exports the geodataframe to csv.

`create_summary_hia`

Creates a summary table of health impacts by racial/ethnic group

Inputs:
- hia_df: a dataframe containing excess mortality for the endpoint using the function provided
- endpoint: a string containing either ‘ALL CAUSE’, ‘ISCHEMIC HEART DISEASE’, or ‘LUNG CANCER’
- verbose: a Boolean indicating whether or not detailed logging statements should be printed
- l: an intermediate string that has the endpoint label string (e.g., ACM_)
- endpoint_nice: an intermediate string that has a nicely formatted version of the endpoint (e.g., All Cause)
- debug_mode: a Boolean indicating whether or not to output debug statements
Outputs
- hia_summary: a summary dataframe containing population, excess mortality, and excess mortality rate per demographic group
Methodology:
1. Cleans up the hia_df by changing column names and splitting population and mortality
2. Gets total population and mortality by group
3. Combines into one dataframe and cleans it up for export

`visualize_and_export_hia`

Calls plot_total_mortality and export_health_impacts in one clean function call.

Inputs:
- hia_df: a dataframe containing excess mortality for the endpoint using the function provided
- ca_shp_fp: a filepath string of the California state boundary shapefile
- group: the racial/ethnic group name
- endpoint: a string containing either ‘ALL CAUSE’, ‘ISCHEMIC HEART DISEASE’, or ‘LUNG CANCER’
- output_dir: a filepath string of the location of the output directory
- f_out: the name of the file output category (will append additional information)
- shape_out: a filepath string for shapefiles
- verbose: a Boolean indicating whether or not detailed logging statements should be printed
- debug_mode: a Boolean indicating whether or not to output debug statements
Outputs
- hia_summary: a summary dataframe containing population, excess mortality, and excess mortality rate per demographic group
Methodology:
1. Calls plot_total_mortality.
2. Calls `export_health_impacts.

`combine_hia_summaries`

Combines the three endpoint summary tables into one export file

Inputs:
- acm_summary: a summary dataframe containing population, excess all-cause mortality, and all-cause mortality rates
- ihd_summary: a summary dataframe containing population, excess IHD mortality, and IHD mortality rates
- lcm_summary: a summary dataframe containing population, excess lung cancer mortality, and lung cancer mortality rates
- output_dir: a filepath string of the location of the output directory
- f_out: the name of the file output category (will append additional information)
- verbose: a Boolean indicating whether or not detailed logging statements should be printed
Outputs: None
Methodology:
1. Merges the summary dataframes together
2. Removes excess columns
3. Saves as CSV file

`create_rename_dict`

Makes a global rename code dictionary for easier updating

Inputs: None
Outputs:
- logging_code: a dictionary that maps endpoint names to log statement codes
Methodology:
1. Defines a dictionary and returns it.