health_impact_calcs.py
The health_impact_calcs script file contains a number of functions that help calculate health impacts from exposure concentrations.
create_hia_inputs
Creates the hia_inputs object.
- Inputs:
- `pop`: population object input
- `load_file`: a Boolean telling the program to load or not
- `verbose`: a Boolean telling the program to return additional log statements or not
- `geodata`: the geographic data from the ISRM
- `incidence_fp`: a string containing the filepath where the incidence data is stored
- `debug_mode`: a Boolean indicating whether or not to output debug statements
- Outputs:
- a health data object ready for health calculations
- Methodology:
- Allocates population to the ISRM grid using the population object and the ISRM geodata.
- Initializes a health_data object from that allocated population.
krewski
Defines a Python function around the Krewski et al. (2009) function and endpoints
- Inputs:
- `verbose`: a Boolean indicating whether or not detailed logging statements should be printed
- `conc`: a float with the exposure concentration for a given geography
- `inc`: a float with the background incidence for a given group in a given geography
- `pop`: a float with the population estimate for a given group in a given geography
- `endpoint`: a string containing either ‘ALL CAUSE’, ‘ISCHEMIC HEART DISEASE’, or ‘LUNG CANCER’
- Outputs:
- a float estimating the number of excess mortalities for the `endpoint` across the group in a given geography
- Methodology:
- Based on the `endpoint`, grabs a `beta` parameter from Krewski et al. (2009).
- Estimates excess mortality using the following equation, where $\beta$ is the endpoint parameter from Krewski et al. (2009), $d$ is the disease endpoint, $C$ is the concentration of PM2.5, $i$ is the grid cell, $I$ is the baseline incidence, $g$ is the group, and $P$ is the population estimate.

$$ \Delta M = \left(1 - \frac{1}{\exp(\beta_{d} \times C_{i})}\right) \times I_{i,d,g} \times P_{i,g} $$
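A minimal sketch of this calculation, assuming the log-linear form $(1 - 1/\exp(\beta C)) \times I \times P$ and assuming that $\beta$ is derived from a relative risk reported per 10 µg/m³ of PM2.5. The name `krewski_sketch`, the `RELATIVE_RISK` table, and its values are illustrative assumptions rather than the script's own code; check them against Krewski et al. (2009) before use.

```python
import math

# Hypothetical relative risks per 10 ug/m3 of PM2.5 (illustrative placeholders;
# verify against Krewski et al. (2009) before relying on them).
RELATIVE_RISK = {
    "ALL CAUSE": 1.06,
    "ISCHEMIC HEART DISEASE": 1.24,
    "LUNG CANCER": 1.14,
}


def krewski_sketch(conc, inc, pop, endpoint):
    """Estimate excess mortality for one group in one grid cell."""
    # Convert the relative risk per 10 ug/m3 into a per-unit beta coefficient.
    beta = math.log(RELATIVE_RISK[endpoint]) / 10.0
    # Log-linear health impact function: (1 - 1/exp(beta * C)) * I * P
    return (1.0 - 1.0 / math.exp(beta * conc)) * inc * pop


# Example: 10 ug/m3 exposure, incidence of 0.008, population of 50,000.
excess = krewski_sketch(conc=10.0, inc=0.008, pop=50_000, endpoint="ALL CAUSE")
```

With zero concentration the function returns zero excess mortality, which is a quick sanity check on any implementation of this form.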
create_logging_code
Makes a global logging code for easier updating
- Inputs: None
- Outputs:
- `logging_code`: a dictionary that maps endpoint names to log statement codes
- Methodology:
- Defines a dictionary and returns it.
calculate_excess_mortality
Estimates excess mortality for a given endpoint and function
- Inputs:
- `conc`: a float with the exposure concentration for a given geography
- `health_data_obj`: a `health_data` object as defined in the `health_data.py` supporting script
- `endpoint`: a string containing either ‘ALL CAUSE’, ‘ISCHEMIC HEART DISEASE’, or ‘LUNG CANCER’
- `function`: the health impact function of choice (currently only `krewski` is built out)
- `verbose`: a Boolean indicating whether or not detailed logging statements should be printed
- `debug_mode`: a Boolean indicating whether or not to output debug statements
- Outputs:
- `pop_inc_conc`: a dataframe containing excess mortality for the `endpoint` using the `function` provided
- Methodology:
- Creates clean, simplified copies of the `detailed_conc` method of the `conc` object and the `pop_inc` method of the `health_data_obj`.
- Merges these two dataframes on the ISRM_ID field.
- Estimates excess mortality on a row-by-row basis using the `function`.
- Pivots the dataframe to get the individual races as columns.
- Adds the geometry back in to make it geodata.
- Updates the column names such that the excess mortality columns are ENDPOINT_GROUP.
- Merges the population back into the dataframe.
- Cleans up the dataframe.
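The merge, row-by-row estimate, pivot, and renaming steps can be sketched with pandas. The table contents, every column name other than ISRM_ID, the group names, the `ACM` label, and the stand-in `toy_function` are assumptions for illustration; the geometry and population merge steps are left out.

```python
import math

import pandas as pd


def toy_function(conc, inc, pop):
    # Stand-in health impact function (log-linear form, hypothetical beta).
    beta = math.log(1.06) / 10.0
    return (1.0 - 1.0 / math.exp(beta * conc)) * inc * pop


# Hypothetical simplified copies of the concentration and incidence tables.
conc = pd.DataFrame({"ISRM_ID": [1, 2], "CONC": [5.0, 10.0]})
pop_inc = pd.DataFrame({
    "ISRM_ID": [1, 1, 2, 2],
    "GROUP": ["GROUP_A", "GROUP_B", "GROUP_A", "GROUP_B"],
    "POPULATION": [100.0, 200.0, 300.0, 400.0],
    "INCIDENCE": [0.01, 0.02, 0.01, 0.02],
})

# Merge the two tables on the grid cell identifier.
merged = pop_inc.merge(conc, on="ISRM_ID", how="left")

# Estimate excess mortality row by row.
merged["MORT"] = merged.apply(
    lambda r: toy_function(r["CONC"], r["INCIDENCE"], r["POPULATION"]), axis=1
)

# Pivot so each group becomes a column, then rename to ENDPOINT_GROUP.
endpoint_label = "ACM"
wide = merged.pivot(index="ISRM_ID", columns="GROUP", values="MORT")
wide.columns = [f"{endpoint_label}_{group}" for group in wide.columns]
wide = wide.reset_index()
```

A row-wise `apply` mirrors the description above; for large grids a vectorized expression over the columns would usually be faster.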
plot_total_mortality
Creates a map image (PNG) of the excess mortality associated with an endpoint for a given group.
- Inputs:
- `hia_df`: a dataframe containing excess mortality for the `endpoint` using the `function` provided
- `ca_shp_fp`: a filepath string of the California state boundary shapefile
- `group`: the racial/ethnic group name
- `endpoint`: a string containing either ‘ALL CAUSE’, ‘ISCHEMIC HEART DISEASE’, or ‘LUNG CANCER’
- `output_dir`: a filepath string of the location of the output directory
- `f_out`: the name of the file output category (will append additional information)
- `verbose`: a Boolean indicating whether or not detailed logging statements should be printed
- `debug_mode`: a Boolean indicating whether or not to output debug statements
- Outputs:
- `fname`: a string filename made by combining the `f_out` with the `group` and `endpoint`.
- Methodology:
- Sets a few formatting standards within `seaborn` and `matplotlib.pyplot`.
- Creates the output file directory and name string using `f_out`, `group`, and `endpoint`.
- Reads in the California boundary and projects the `hia_df` to match the coordinate reference system of the California dataset.
- Clips the dataframe to the California boundary.
- Adds area-normalized columns to the `hia_df` for more intuitive plotting.
- Grabs the minimums and sets them to $10^{-9}$ in order to avoid logarithm conversion errors.
- Updates the ‘MORT_OVER_POP’ column to avoid the 100% mortality that arises from the minimum-value update in the previous step.
- Initializes the figure and plots four panes:
- Population density: plots the area-normalized population estimates for the group on a log-normal scale.
- PM2.5 exposure concentrations: plots the exposure concentration on a log-normal scale.
- Excess mortality per area: plots the excess mortality per unit area on a log-normal scale.
- Excess mortality per population: plots the excess mortality per population for the group on a log-normal scale.
- Performs a bit of clean-up and formatting before exporting.
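The minimum-value handling before log scaling can be sketched as follows. This is one possible approach, not necessarily how the script does it: the column names other than MORT_OVER_POP and the sample values are hypothetical.

```python
import pandas as pd

FLOOR = 1e-9  # smallest value allowed before log scaling

# Hypothetical area-normalized values for three grid cells.
df = pd.DataFrame({
    "POP_AREA_NORM": [0.0, 120.0, 35.0],
    "MORT_AREA_NORM": [0.0, 0.6, 0.07],
})

# Raise values below the floor so that log scaling never sees zero.
for col in ["POP_AREA_NORM", "MORT_AREA_NORM"]:
    df[col] = df[col].clip(lower=FLOOR)

# Where both numerator and denominator were floored, the ratio would be 1.0
# (100% mortality); set those cells back to the floor instead.
df["MORT_OVER_POP"] = df["MORT_AREA_NORM"] / df["POP_AREA_NORM"]
both_floored = (df["POP_AREA_NORM"] == FLOOR) & (df["MORT_AREA_NORM"] == FLOOR)
df.loc[both_floored, "MORT_OVER_POP"] = FLOOR
```

Cells with no population and no mortality then plot at the bottom of the color scale rather than at the top.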
export_health_impacts
Exports mortality as a shapefile
- Inputs:
- `hia_df`: a dataframe containing excess mortality for the `endpoint` using the `function` provided
- `group`: the racial/ethnic group name
- `endpoint`: a string containing either ‘ALL CAUSE’, ‘ISCHEMIC HEART DISEASE’, or ‘LUNG CANCER’
- `output_dir`: a filepath string of the location of the output directory
- `f_out`: the name of the file output category (will append additional information)
- `verbose`: a Boolean indicating whether or not detailed logging statements should be printed
- `debug_mode`: a Boolean indicating whether or not to output debug statements
- Outputs:
- `fname`: a string filename made by combining the `f_out` with the `group` and `endpoint`.
- Methodology:
- Creates the output file path (`fname`) using inputs.
- Creates endpoint short labels and updates column names since shapefiles can only have ten characters in column names.
- Exports the geodataframe to shapefile.
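The column renaming for the ten-character shapefile limit can be sketched in plain Python. The helper `shorten_columns`, the `IHD_` and `LCM_` labels, and the example column names are hypothetical; only the `ACM_` label appears in this documentation.

```python
# Hypothetical short labels for each endpoint.
ENDPOINT_LABELS = {
    "ALL CAUSE": "ACM_",
    "ISCHEMIC HEART DISEASE": "IHD_",
    "LUNG CANCER": "LCM_",
}


def shorten_columns(columns, endpoint, max_len=10):
    """Map long column names to names that fit the shapefile limit."""
    label = ENDPOINT_LABELS[endpoint]
    long_prefix = endpoint + "_"
    renamed = {}
    for col in columns:
        if col.startswith(long_prefix):
            # Swap the long endpoint prefix for its short label, then truncate.
            renamed[col] = (label + col[len(long_prefix):])[:max_len]
        else:
            renamed[col] = col[:max_len]
    return renamed


mapping = shorten_columns(
    ["ISRM_ID", "ALL CAUSE_TOTAL", "ALL CAUSE_HISPANIC"], "ALL CAUSE"
)
```

The resulting dictionary could be passed to a dataframe's `rename(columns=...)` before export. Truncation can produce duplicate names, so a real implementation would need to check for collisions.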
export_health_impacts_csv
Exports mortality as a csv
- Inputs:
- `hia_df`: a dataframe containing excess mortality for the `endpoint` using the `function` provided
- `endpoint`: a string containing either ‘ALL CAUSE’, ‘ISCHEMIC HEART DISEASE’, or ‘LUNG CANCER’
- `output_dir`: a filepath string of the location of the output directory
- `f_out`: the name of the file output category (will append additional information)
- `verbose`: a Boolean indicating whether or not detailed logging statements should be printed
- `debug_mode`: a Boolean indicating whether or not to output debug statements
- Outputs:
- `fname`: a string filename made by combining the `f_out` with the `group` and `endpoint`.
- Methodology:
- Creates the output file path (`fname`) using inputs.
- Revises column names for clarity.
- Exports the geodataframe to csv.
create_summary_hia
Creates a summary table of health impacts by racial/ethnic group
- Inputs:
- `hia_df`: a dataframe containing excess mortality for the `endpoint` using the `function` provided
- `endpoint`: a string containing either ‘ALL CAUSE’, ‘ISCHEMIC HEART DISEASE’, or ‘LUNG CANCER’
- `verbose`: a Boolean indicating whether or not detailed logging statements should be printed
- `l`: an intermediate string that has the endpoint label string (e.g., ACM_)
- `endpoint_nice`: an intermediate string that has a nicely formatted version of the endpoint (e.g., All Cause)
- `debug_mode`: a Boolean indicating whether or not to output debug statements
- Outputs:
- `hia_summary`: a summary dataframe containing population, excess mortality, and excess mortality rate per demographic group
- Methodology:
- Cleans up the hia_df by changing column names and splitting population and mortality
- Gets total population and mortality by group
- Combines into one dataframe and cleans it up for export
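The split, total, and combine steps can be sketched with pandas. The `POP_` prefix, the group names, the output column names, and the per-100,000 rate unit are assumptions for illustration.

```python
import pandas as pd

# Hypothetical wide table: one row per grid cell, with population and excess
# mortality columns per group.
hia_df = pd.DataFrame({
    "ISRM_ID": [1, 2],
    "POP_GROUP_A": [100.0, 300.0],
    "POP_GROUP_B": [200.0, 400.0],
    "ACM_GROUP_A": [0.5, 1.5],
    "ACM_GROUP_B": [1.0, 3.0],
})

label = "ACM_"  # endpoint label string, corresponding to the input `l`

# Split the columns into population and mortality, then total each by group.
pop_cols = [c for c in hia_df.columns if c.startswith("POP_")]
mort_cols = [c for c in hia_df.columns if c.startswith(label)]
pop_total = hia_df[pop_cols].sum().rename(lambda c: c[len("POP_"):])
mort_total = hia_df[mort_cols].sum().rename(lambda c: c[len(label):])

# Combine into one summary table and add a rate per 100,000 people.
hia_summary = pd.DataFrame(
    {"POPULATION": pop_total, "EXCESS_MORTALITY": mort_total}
)
hia_summary["RATE_PER_100K"] = (
    hia_summary["EXCESS_MORTALITY"] / hia_summary["POPULATION"] * 1e5
)
```

Stripping the prefixes before combining lets the two totals align on the group name as a shared index.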
visualize_and_export_hia
Calls plot_total_mortality and export_health_impacts in one clean function call.
- Inputs:
- `hia_df`: a dataframe containing excess mortality for the `endpoint` using the `function` provided
- `ca_shp_fp`: a filepath string of the California state boundary shapefile
- `group`: the racial/ethnic group name
- `endpoint`: a string containing either ‘ALL CAUSE’, ‘ISCHEMIC HEART DISEASE’, or ‘LUNG CANCER’
- `output_dir`: a filepath string of the location of the output directory
- `f_out`: the name of the file output category (will append additional information)
- `shape_out`: a filepath string for shapefiles
- `verbose`: a Boolean indicating whether or not detailed logging statements should be printed
- `debug_mode`: a Boolean indicating whether or not to output debug statements
- Outputs:
- `hia_summary`: a summary dataframe containing population, excess mortality, and excess mortality rate per demographic group
- Methodology:
- Calls `plot_total_mortality`.
- Calls `export_health_impacts`.
combine_hia_summaries
Combines the three endpoint summary tables into one export file
- Inputs:
- `acm_summary`: a summary dataframe containing population, excess all-cause mortality, and all-cause mortality rates
- `ihd_summary`: a summary dataframe containing population, excess IHD mortality, and IHD mortality rates
- `lcm_summary`: a summary dataframe containing population, excess lung cancer mortality, and lung cancer mortality rates
- `output_dir`: a filepath string of the location of the output directory
- `f_out`: the name of the file output category (will append additional information)
- `verbose`: a Boolean indicating whether or not detailed logging statements should be printed
- Outputs: None
- Methodology:
- Merges the summary dataframes together
- Removes excess columns
- Saves as CSV file
create_rename_dict
Makes a global rename code dictionary for easier updating
- Inputs: None
- Outputs:
- `logging_code`: a dictionary that maps endpoint names to log statement codes
- Methodology:
- Defines a dictionary and returns it.