health_impact_calcs.py
The `health_impact_calcs` script file contains a number of functions that help calculate health impacts from exposure concentrations.
create_hia_inputs
Creates the hia_inputs object.
- Inputs:
  - `pop`: population object input
  - `load_file`: a Boolean telling the program whether or not to load
  - `verbose`: a Boolean telling the program whether or not to return additional log statements
  - `geodata`: the geographic data from the ISRM
  - `incidence_fp`: a string containing the filepath where the incidence data is stored
  - `debug_mode`: a Boolean indicating whether or not to output debug statements
- Outputs:
- a `health_data` object ready for health calculations
- Methodology:
- Allocates population to the ISRM grid using the population object and the ISRM geodata.
- Initializes a health_data object from that allocated population.
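The allocation step can be pictured with the minimal sketch below. It assumes area-weighted allocation (each source polygon's population is split across ISRM cells in proportion to overlapping area), which may not be what the script actually does, and it takes precomputed overlap fractions instead of geometries; in practice those fractions could come from intersecting the two layers, for example with `geopandas.overlay`. The function name and input layout are hypothetical.

```python
from collections import defaultdict


def allocate_population_sketch(pop_by_tract, overlaps):
    """Hypothetical area-weighted allocation of population to ISRM cells.

    pop_by_tract: {tract_id: {group: population}}
    overlaps: iterable of (tract_id, isrm_id, fraction), where fraction is the
        share of the tract's area that falls inside that ISRM cell.
    Returns {isrm_id: {group: allocated population}}.
    """
    allocated = defaultdict(lambda: defaultdict(float))
    for tract_id, isrm_id, fraction in overlaps:
        for group, count in pop_by_tract[tract_id].items():
            allocated[isrm_id][group] += count * fraction
    # Convert the nested defaultdicts to plain dicts for easier inspection.
    return {cell: dict(groups) for cell, groups in allocated.items()}


if __name__ == "__main__":
    pop = {"A": {"TOTAL": 100.0, "ASIAN": 20.0}, "B": {"TOTAL": 50.0, "ASIAN": 10.0}}
    overlaps = [("A", 1, 0.25), ("A", 2, 0.75), ("B", 2, 1.0)]
    print(allocate_population_sketch(pop, overlaps))
    # {1: {'TOTAL': 25.0, 'ASIAN': 5.0}, 2: {'TOTAL': 125.0, 'ASIAN': 25.0}}
```

When each tract's fractions sum to one (the tract is fully covered by the grid), total population is conserved: 150 people in, 150 people out.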
krewski
Wraps the Krewski et al. (2009) health impact function and its endpoints in a Python function.
- Inputs:
  - `verbose`: a Boolean indicating whether or not detailed logging statements should be printed
  - `conc`: a float with the exposure concentration for a given geography
  - `inc`: a float with the background incidence for a given group in a given geography
  - `pop`: a float with the population estimate for a given group in a given geography
  - `endpoint`: a string containing one of `'ALL CAUSE'`, `'ISCHEMIC HEART DISEASE'`, or `'LUNG CANCER'`
- Outputs:
  - a float estimating the number of excess mortalities for the `endpoint` across the group in a given geography
- Methodology:
- Based on the `endpoint`, grabs a `beta` parameter from Krewski et al. (2009).
- Estimates excess mortality ($\Delta M$) using the following equation, where $\beta$ is the endpoint parameter from Krewski et al. (2009), $d$ is the disease endpoint, $C$ is the concentration of PM2.5, $i$ is the grid cell, $I$ is the baseline incidence, $g$ is the group, and $P$ is the population estimate:

$$\Delta M_{d,i,g} = \left(1 - e^{-\beta_d C_i}\right) \times I_{i,g} \times P_{i,g}$$
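The excess-mortality equation can be written as a small Python function. This is a sketch, not the script's implementation: `krewski_sketch` and `beta_for` are hypothetical names, and the hazard ratios (per 10 µg/m³ of PM2.5) used to derive each $\beta$ are assumed values that should be checked against Krewski et al. (2009) and the script before use. A hazard ratio per 10 µg/m³ converts to a per-µg/m³ coefficient as $\beta = \ln(\mathrm{HR})/10$.

```python
import math

# Assumed hazard ratios per 10 ug/m^3 of PM2.5, for illustration only.
# Verify against Krewski et al. (2009) and the values in the script before use.
ASSUMED_HAZARD_RATIOS = {
    "ALL CAUSE": 1.06,
    "ISCHEMIC HEART DISEASE": 1.24,
    "LUNG CANCER": 1.14,
}


def beta_for(endpoint: str) -> float:
    """Convert a hazard ratio per 10 ug/m^3 into a log-linear beta per ug/m^3."""
    return math.log(ASSUMED_HAZARD_RATIOS[endpoint]) / 10.0


def krewski_sketch(conc: float, inc: float, pop: float, endpoint: str) -> float:
    """Excess deaths for one group in one cell: (1 - exp(-beta * C)) * I * P."""
    beta = beta_for(endpoint)
    return (1.0 - math.exp(-beta * conc)) * inc * pop


if __name__ == "__main__":
    # Hypothetical cell: 10 ug/m^3, baseline incidence 0.008, 1,000 people.
    print(krewski_sketch(10.0, 0.008, 1000.0, "ALL CAUSE"))  # about 0.45
```

With a concentration of zero the factor $1 - e^{0}$ vanishes, so no excess deaths are attributed to that cell.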
create_logging_code
Makes a global logging-code dictionary so that log codes can be updated in one place.
- Inputs: None
- Outputs:
  - `logging_code`: a dictionary that maps endpoint names to log statement codes
- Methodology:
- Defines a dictionary and returns it.
calculate_excess_mortality
Estimates excess mortality for a given `endpoint` and `function`.
- Inputs:
  - `conc`: the exposure concentration object, whose `detailed_conc` holds the exposure concentrations by ISRM grid cell
  - `health_data_obj`: a `health_data` object as defined in the `health_data.py` supporting script
  - `endpoint`: a string containing one of `'ALL CAUSE'`, `'ISCHEMIC HEART DISEASE'`, or `'LUNG CANCER'`
  - `function`: the health impact function of choice (currently only `krewski` is built out)
  - `verbose`: a Boolean indicating whether or not detailed logging statements should be printed
  - `debug_mode`: a Boolean indicating whether or not to output debug statements
- Outputs:
  - `pop_inc_conc`: a dataframe containing excess mortality for the `endpoint` using the `function` provided
- Methodology:
- Creates clean, simplified copies of the `detailed_conc` method of the `conc` object and the `pop_inc` method of the `health_data_obj`.
- Merges these two dataframes on the `ISRM_ID` field.
- Estimates excess mortality on a row-by-row basis using the `function`.
- Pivots the dataframe to get the individual races as columns.
- Adds the geometry back in to make it geodata.
- Updates the column names such that the excess mortality columns are ENDPOINT_GROUP.
- Merges the population back into the dataframe.
- Cleans up the dataframe.
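The merge, row-by-row estimate, and pivot can be sketched in plain Python. The real function works on (geo)dataframes; here rows are dictionaries, and the column names `CONC`, `INCIDENCE`, `POPULATION`, and `GROUP`, the label `ACM`, the function names, and the stand-in `krewski_like` function (with an assumed all-cause $\beta$) are all hypothetical.

```python
import math
from collections import defaultdict


def krewski_like(conc, inc, pop, beta=math.log(1.06) / 10.0):
    """Stand-in health impact function; beta is an assumed all-cause value."""
    return (1.0 - math.exp(-beta * conc)) * inc * pop


def calculate_excess_mortality_sketch(conc_rows, pop_inc_rows, label, function=krewski_like):
    """Merge on ISRM_ID, apply `function` per row, then pivot groups to columns.

    conc_rows: [{"ISRM_ID": ..., "CONC": ...}]
    pop_inc_rows: [{"ISRM_ID": ..., "GROUP": ..., "POPULATION": ..., "INCIDENCE": ...}]
    Returns {isrm_id: {"<label>_<GROUP>": deaths, "<GROUP>": population, ...}}.
    """
    # Index concentrations by cell so the merge becomes a dictionary lookup.
    conc_by_cell = {row["ISRM_ID"]: row["CONC"] for row in conc_rows}
    wide = defaultdict(dict)
    for row in pop_inc_rows:
        cell = row["ISRM_ID"]
        if cell not in conc_by_cell:
            continue  # inner merge: drop cells without a concentration
        deaths = function(conc_by_cell[cell], row["INCIDENCE"], row["POPULATION"])
        # Pivot: one excess-mortality column per group, named ENDPOINT_GROUP.
        wide[cell][f"{label}_{row['GROUP']}"] = deaths
        # Merge the population back in alongside the mortality estimate.
        wide[cell][row["GROUP"]] = row["POPULATION"]
    return dict(wide)


if __name__ == "__main__":
    conc = [{"ISRM_ID": 1, "CONC": 10.0}]
    pop_inc = [{"ISRM_ID": 1, "GROUP": "TOTAL", "POPULATION": 1000.0, "INCIDENCE": 0.008}]
    print(calculate_excess_mortality_sketch(conc, pop_inc, "ACM"))
```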
plot_total_mortality
Creates a map image (PNG) of the excess mortality associated with an `endpoint` for a given `group`.
- Inputs:
  - `hia_df`: a dataframe containing excess mortality for the `endpoint` using the `function` provided
  - `ca_shp_fp`: a filepath string of the California state boundary shapefile
  - `group`: the racial/ethnic group name
  - `endpoint`: a string containing one of `'ALL CAUSE'`, `'ISCHEMIC HEART DISEASE'`, or `'LUNG CANCER'`
  - `output_dir`: a filepath string of the location of the output directory
  - `f_out`: the name of the file output category (additional information will be appended)
  - `verbose`: a Boolean indicating whether or not detailed logging statements should be printed
  - `debug_mode`: a Boolean indicating whether or not to output debug statements
- Outputs:
  - `fname`: a string filename made by combining `f_out` with the `group` and `endpoint`
- Methodology:
- Sets a few formatting standards within `seaborn` and `matplotlib.pyplot`.
- Creates the output file directory and name string using `f_out`, `group`, and `endpoint`.
- Reads in the California boundary and projects the `hia_df` to match the coordinate reference system of the California dataset.
- Clips the dataframe to the California boundary.
- Adds area-normalized columns to the `hia_df` for more intuitive plotting.
- Finds the minimum values and sets them to $10^{-9}$ to avoid errors when converting to a logarithmic scale.
- Updates the ‘MORT_OVER_POP’ column to avoid the artificial 100% mortality rate that the previous step can introduce.
- Initializes the figure and plots four panes:
- Population density: plots the area-normalized population estimates for the group on a logarithmic scale.
- PM2.5 exposure concentrations: plots the exposure concentration on a logarithmic scale.
- Excess mortality per area: plots the excess mortality per unit area on a logarithmic scale.
- Excess mortality per population: plots the excess mortality per population for the group on a logarithmic scale.
- Performs a bit of clean-up and formatting before exporting.
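The area-normalization and flooring steps can be sketched without any plotting library; a logarithmic color scale (such as `matplotlib.colors.LogNorm`) cannot represent zero, which is why small values are floored. This sketch assumes one possible way to protect `MORT_OVER_POP`: it computes the rate from the unfloored values, so a cell with no population and no deaths gets a rate at the floor rather than 100%. The column names `AREA_KM2`, `POP_DENS`, and `MORT_PER_AREA` and the function name are hypothetical, and the script may handle these steps differently.

```python
FLOOR = 1e-9  # smallest value kept so that log scales stay defined


def prepare_log_plot_columns(rows, mort_col, pop_col, area_col="AREA_KM2"):
    """Add area-normalized columns and floor values for log-scale plotting."""
    prepared = []
    for row in rows:
        area, mort, pop = row[area_col], row[mort_col], row[pop_col]
        new = dict(row)  # copy so the caller's rows are not modified
        new["POP_DENS"] = max(pop / area, FLOOR)
        new["MORT_PER_AREA"] = max(mort / area, FLOOR)
        # Use the raw values for the rate: flooring both mortality and
        # population to 1e-9 first would make their ratio exactly 1 (100%).
        rate = mort / pop if pop > 0 else 0.0
        new["MORT_OVER_POP"] = max(rate, FLOOR)
        new[mort_col] = max(mort, FLOOR)
        new[pop_col] = max(pop, FLOOR)
        prepared.append(new)
    return prepared


if __name__ == "__main__":
    rows = [
        {"AREA_KM2": 2.0, "ACM_TOTAL": 0.0, "TOTAL": 0.0},
        {"AREA_KM2": 4.0, "ACM_TOTAL": 0.5, "TOTAL": 1000.0},
    ]
    for row in prepare_log_plot_columns(rows, "ACM_TOTAL", "TOTAL"):
        print(row["POP_DENS"], row["MORT_PER_AREA"], row["MORT_OVER_POP"])
```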
export_health_impacts
Exports mortality as a shapefile
- Inputs:
  - `hia_df`: a dataframe containing excess mortality for the `endpoint` using the `function` provided
  - `group`: the racial/ethnic group name
  - `endpoint`: a string containing one of `'ALL CAUSE'`, `'ISCHEMIC HEART DISEASE'`, or `'LUNG CANCER'`
  - `output_dir`: a filepath string of the location of the output directory
  - `f_out`: the name of the file output category (additional information will be appended)
  - `verbose`: a Boolean indicating whether or not detailed logging statements should be printed
  - `debug_mode`: a Boolean indicating whether or not to output debug statements
- Outputs:
  - `fname`: a string filename made by combining `f_out` with the `group` and `endpoint`
- Methodology:
- Creates the output file path (`fname`) using the inputs.
- Creates endpoint short labels and updates column names, since shapefile column names are limited to ten characters.
- Exports the geodataframe to shapefile.
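Shapefiles store attributes in a dBASE table, which limits field names to ten characters. A minimal sketch of the renaming step follows, assuming the short labels `ACM`, `IHD`, and `LCM` (matching the `acm_summary`, `ihd_summary`, and `lcm_summary` names used elsewhere in this script) and hypothetical `ENDPOINT_GROUP` column names. The collision handling with numeric suffixes is an added assumption, not necessarily what the script does.

```python
ENDPOINT_SHORT = {
    "ALL CAUSE": "ACM",
    "ISCHEMIC HEART DISEASE": "IHD",
    "LUNG CANCER": "LCM",
}


def shapefile_safe_columns(columns, endpoint, max_len=10):
    """Return {old_name: new_name} with unique names of at most max_len chars."""
    short = ENDPOINT_SHORT[endpoint]
    mapping, used = {}, set()
    for col in columns:
        # Swap the long endpoint name for its short label, then truncate.
        name = col.replace(endpoint, short)[:max_len]
        # Truncation can create duplicates; resolve them with a numeric suffix.
        base, n = name, 1
        while name in used:
            suffix = str(n)
            name = base[: max_len - len(suffix)] + suffix
            n += 1
        used.add(name)
        mapping[col] = name
    return mapping


if __name__ == "__main__":
    print(shapefile_safe_columns(["ISRM_ID", "ALL CAUSE_ASIAN", "ALL CAUSE_WHITE"], "ALL CAUSE"))
```

The resulting mapping could then be passed to `GeoDataFrame.rename(columns=...)` before writing the file with `to_file`.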
export_health_impacts_csv
Exports mortality as a csv
- Inputs:
  - `hia_df`: a dataframe containing excess mortality for the `endpoint` using the `function` provided
  - `endpoint`: a string containing one of `'ALL CAUSE'`, `'ISCHEMIC HEART DISEASE'`, or `'LUNG CANCER'`
  - `output_dir`: a filepath string of the location of the output directory
  - `f_out`: the name of the file output category (additional information will be appended)
  - `verbose`: a Boolean indicating whether or not detailed logging statements should be printed
  - `debug_mode`: a Boolean indicating whether or not to output debug statements
- Outputs:
  - `fname`: a string filename made by combining `f_out` with the `group` and `endpoint`
- Methodology:
- Creates the output file path (`fname`) using the inputs.
- Revises column names for clarity.
- Exports the geodataframe to csv.
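A minimal sketch of the CSV export using only the standard library, assuming rows are dictionaries, that the geometry column is dropped (CSV has no native geometry type), and a hypothetical `f_out` plus endpoint naming pattern; the real script works on a geodataframe and its filename pattern may differ.

```python
import csv
import os
import tempfile


def export_csv_sketch(rows, endpoint, output_dir, f_out, rename=None):
    """Write rows (dicts) to CSV, dropping geometry and renaming columns."""
    rename = rename or {}
    # Hypothetical naming pattern: f_out plus the endpoint with spaces replaced.
    fname = os.path.join(output_dir, f"{f_out}_{endpoint.replace(' ', '_')}.csv")
    fields = [col for col in rows[0] if col != "geometry"]
    with open(fname, "w", newline="") as fh:
        writer = csv.writer(fh)
        # Header row uses the clearer names where a rename is provided.
        writer.writerow([rename.get(col, col) for col in fields])
        for row in rows:
            writer.writerow([row[col] for col in fields])
    return fname


if __name__ == "__main__":
    rows = [{"ISRM_ID": 1, "ACM_TOTAL": 0.45, "geometry": None}]
    with tempfile.TemporaryDirectory() as tmp:
        path = export_csv_sketch(rows, "ALL CAUSE", tmp, "example",
                                 {"ACM_TOTAL": "All Cause Excess Mortality (TOTAL)"})
        with open(path, newline="") as fh:
            print(fh.read())
```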
create_summary_hia
Creates a summary table of health impacts by racial/ethnic group
- Inputs:
  - `hia_df`: a dataframe containing excess mortality for the `endpoint` using the `function` provided
  - `endpoint`: a string containing one of `'ALL CAUSE'`, `'ISCHEMIC HEART DISEASE'`, or `'LUNG CANCER'`
  - `verbose`: a Boolean indicating whether or not detailed logging statements should be printed
  - `l`: an intermediate string that holds the endpoint label (e.g., `ACM_`)
  - `endpoint_nice`: an intermediate string that holds a nicely formatted version of the endpoint (e.g., `All Cause`)
  - `debug_mode`: a Boolean indicating whether or not to output debug statements
- Outputs:
  - `hia_summary`: a summary dataframe containing population, excess mortality, and excess mortality rate per demographic group
- Methodology:
- Cleans up the hia_df by changing column names and splitting population and mortality
- Gets total population and mortality by group
- Combines into one dataframe and cleans it up for export
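The summary step reduces to per-group totals and a rate. In the sketch below, rows are dictionaries with one population column per group and one `<label><GROUP>` mortality column per group; the column names, the function name, and the choice to express the rate per 100,000 people are assumptions, since the script's rate units are not stated here.

```python
def create_summary_sketch(rows, groups, label="ACM_"):
    """Total population and excess deaths by group, plus a rate per 100,000."""
    summary = []
    for group in groups:
        population = sum(row[group] for row in rows)
        deaths = sum(row[f"{label}{group}"] for row in rows)
        # Multiply before dividing to limit floating-point rounding.
        rate = deaths * 100_000 / population if population > 0 else 0.0
        summary.append({
            "GROUP": group,
            "POPULATION": population,
            "EXCESS_MORTALITY": deaths,
            "MORTALITY_RATE_PER_100K": rate,
        })
    return summary


if __name__ == "__main__":
    rows = [{"TOTAL": 1000.0, "ACM_TOTAL": 0.5}, {"TOTAL": 3000.0, "ACM_TOTAL": 1.5}]
    print(create_summary_sketch(rows, ["TOTAL"]))
```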
visualize_and_export_hia
Calls `plot_total_mortality` and `export_health_impacts` in one clean function call.
- Inputs:
  - `hia_df`: a dataframe containing excess mortality for the `endpoint` using the `function` provided
  - `ca_shp_fp`: a filepath string of the California state boundary shapefile
  - `group`: the racial/ethnic group name
  - `endpoint`: a string containing one of `'ALL CAUSE'`, `'ISCHEMIC HEART DISEASE'`, or `'LUNG CANCER'`
  - `output_dir`: a filepath string of the location of the output directory
  - `f_out`: the name of the file output category (additional information will be appended)
  - `shape_out`: a filepath string for shapefiles
  - `verbose`: a Boolean indicating whether or not detailed logging statements should be printed
  - `debug_mode`: a Boolean indicating whether or not to output debug statements
- Outputs:
  - `hia_summary`: a summary dataframe containing population, excess mortality, and excess mortality rate per demographic group
- Methodology:
- Calls `plot_total_mortality`.
- Calls `export_health_impacts`.
combine_hia_summaries
Combines the three endpoint summary tables into one export file
- Inputs:
  - `acm_summary`: a summary dataframe containing population, excess all-cause mortality, and all-cause mortality rates
  - `ihd_summary`: a summary dataframe containing population, excess IHD mortality, and IHD mortality rates
  - `lcm_summary`: a summary dataframe containing population, excess lung cancer mortality, and lung cancer mortality rates
  - `output_dir`: a filepath string of the location of the output directory
  - `f_out`: the name of the file output category (additional information will be appended)
  - `verbose`: a Boolean indicating whether or not detailed logging statements should be printed
- Outputs: None
- Methodology:
- Merges the summary dataframes together
- Removes excess columns
- Saves as CSV file
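The merge-and-save steps can be sketched with the standard library. Each summary is assumed to be a list of dictionaries keyed by a hypothetical `GROUP` column; the output column names and the `<f_out>_summary.csv` filename are also assumptions. Unlike the documented function, which returns nothing, this sketch returns the file path so the result is easy to check.

```python
import csv
import os
import tempfile


def combine_summaries_sketch(acm_summary, ihd_summary, lcm_summary, output_dir, f_out):
    """Merge per-endpoint summaries on GROUP, keep one POPULATION column, save CSV."""
    combined = {}
    for label, summary in (("ACM", acm_summary), ("IHD", ihd_summary), ("LCM", lcm_summary)):
        for row in summary:
            # Population repeats in every summary; keep only the first copy.
            out = combined.setdefault(
                row["GROUP"], {"GROUP": row["GROUP"], "POPULATION": row["POPULATION"]}
            )
            out[f"{label}_EXCESS_MORTALITY"] = row["EXCESS_MORTALITY"]
            out[f"{label}_RATE_PER_100K"] = row["MORTALITY_RATE_PER_100K"]
    rows = list(combined.values())
    fname = os.path.join(output_dir, f"{f_out}_summary.csv")
    with open(fname, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
    return fname


if __name__ == "__main__":
    s = [{"GROUP": "TOTAL", "POPULATION": 4000.0, "EXCESS_MORTALITY": 2.0,
          "MORTALITY_RATE_PER_100K": 50.0}]
    with tempfile.TemporaryDirectory() as tmp:
        with open(combine_summaries_sketch(s, s, s, tmp, "example"), newline="") as fh:
            print(fh.read())
```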
create_rename_dict
Makes a global rename code dictionary so that it can be updated in one place.
- Inputs: None
- Outputs:
  - `logging_code`: a dictionary that maps endpoint names to log statement codes
- Methodology:
- Defines a dictionary and returns it.