clarite.analyze.ewas

clarite.analyze.ewas(outcome: str, covariates: List[str], data: Any, regression_kind: Union[str, Type[clarite.modules.analyze.regression.base.Regression], None] = None, **kwargs)

Run an Environment-Wide Association Study

Every variable in data other than the outcome and the covariates is tested individually. The regression class selected with regression_kind may behave slightly differently. Results are sorted by increasing pvalue.
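The per-variable loop described above can be sketched as follows. This is a minimal illustration, not CLARITE's implementation: it assumes numeric columns, ordinary least squares for every variable, complete-case rows per regression, and a Wald test with a large-sample normal approximation for the p-value. The function name `ewas_sketch` and the index level names `outcome` and `variable` are hypothetical.

```python
import math

import numpy as np
import pandas as pd


def ewas_sketch(outcome, covariates, data):
    """Illustrative EWAS: one OLS regression per tested variable.

    Assumptions (not CLARITE's implementation): numeric columns only,
    ordinary least squares for every variable, complete-case rows per
    regression, and a Wald test with a large-sample normal approximation
    for the two-sided p-value.
    """
    # Every column except the outcome and the covariates is tested
    tested = [c for c in data.columns if c != outcome and c not in covariates]
    rows = []
    for var in tested:
        # Complete cases for this regression only
        subset = data[[outcome, var, *covariates]].dropna()
        n = len(subset)
        y = subset[outcome].to_numpy(dtype=float)
        # Design matrix columns: intercept, tested variable, covariates
        X = np.column_stack(
            [np.ones(n)] + [subset[c].to_numpy(dtype=float) for c in [var, *covariates]]
        )
        pvalue, error = float("nan"), None
        if n <= X.shape[1]:
            error = "Not enough observations"
        else:
            try:
                beta, *_ = np.linalg.lstsq(X, y, rcond=None)
                resid = y - X @ beta
                sigma2 = resid @ resid / (n - X.shape[1])
                cov = sigma2 * np.linalg.inv(X.T @ X)
                z = beta[1] / math.sqrt(cov[1, 1])
                # Two-sided p-value under a standard normal reference
                pvalue = math.erfc(abs(z) / math.sqrt(2))
            except (np.linalg.LinAlgError, ValueError) as exc:
                error = str(exc)
        rows.append({"outcome": outcome, "variable": var, "N": n,
                     "pvalue": pvalue, "error": error, "warnings": []})
    results = pd.DataFrame(rows).set_index(["outcome", "variable"])
    # Smallest p-values first; NaN (failed) rows sort last
    return results.sort_values("pvalue")
```

For example, `ewas_sketch("logBMI", ["age"], df)` would test every column of `df` except `logBMI` and the hypothetical covariate `age`. A regression that fails keeps a NaN p-value and an error message, so it ends up at the bottom of the sorted results instead of aborting the whole study.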

Parameters:

outcome: string: The variable to be used as the outcome (dependent variable) of each regression.
covariates: list (strings): The variables to be used as covariates. Every variable in data that is neither the outcome nor listed as a covariate is regressed.
data: Any, usually pd.DataFrame: The data to be analyzed, including the outcome, covariates, and any variables to be regressed.
regression_kind: str or subclass of Regression: Either ‘glm’, ‘weighted_glm’, or ‘r_survey’ for a built-in Regression type, or a custom subclass of Regression. Defaults to None to maintain the existing API: ‘glm’ is used, or ‘weighted_glm’ if a SurveyDesignSpec is provided.
kwargs: Keyword arguments specific to the Regression being used.
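The rules for regression_kind amount to a small dispatch step. The sketch below is hypothetical: the stand-in classes, the `BUILTIN_KINDS` mapping, `resolve_regression_kind`, and the `survey_design_spec` argument name are assumptions for illustration, not CLARITE's actual internals.

```python
from typing import Optional, Type, Union


class Regression:
    """Hypothetical stand-in for the Regression base class."""


class GLMRegression(Regression):
    """Hypothetical stand-in for the built-in 'glm' kind."""


class WeightedGLMRegression(Regression):
    """Hypothetical stand-in for the built-in 'weighted_glm' kind."""


class RSurveyRegression(Regression):
    """Hypothetical stand-in for the built-in 'r_survey' kind."""


BUILTIN_KINDS = {
    "glm": GLMRegression,
    "weighted_glm": WeightedGLMRegression,
    "r_survey": RSurveyRegression,
}


def resolve_regression_kind(
    regression_kind: Union[str, Type[Regression], None],
    survey_design_spec: Optional[object] = None,
) -> Type[Regression]:
    """Map the documented regression_kind values to a Regression class."""
    if regression_kind is None:
        # Default kept for the existing API: weighted GLM only with a survey design
        return WeightedGLMRegression if survey_design_spec is not None else GLMRegression
    if isinstance(regression_kind, str):
        if regression_kind not in BUILTIN_KINDS:
            raise ValueError(f"Unknown regression_kind: {regression_kind!r}")
        return BUILTIN_KINDS[regression_kind]
    if isinstance(regression_kind, type) and issubclass(regression_kind, Regression):
        # Custom subclasses are used as-is
        return regression_kind
    raise TypeError("regression_kind must be None, a str, or a Regression subclass")
```

Accepting a class (rather than only strings) is what lets a custom subclass of Regression plug into the same loop without registering a new name.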

Returns:

df: pd.DataFrame: EWAS results with at least the columns [‘N’, ‘pvalue’, ‘error’, ‘warnings’], indexed by the outcome and by the variable assessed in each row.
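Because each row records its own error and the index carries the variable name, the returned DataFrame can be filtered directly. This is a minimal sketch that assumes successful regressions leave a missing value (None or NaN) in the error column and that the variable is the second index level; `passing_variables` is a hypothetical helper, and it applies only a nominal threshold with no multiple-testing correction.

```python
from typing import List

import pandas as pd


def passing_variables(results: pd.DataFrame, alpha: float = 0.05) -> List[str]:
    """Return tested variables whose regression ran and whose pvalue < alpha.

    Assumes an (outcome, variable) index and at least the columns
    N, pvalue, error, warnings, with a missing error value on success.
    """
    # Keep only regressions that did not record an error
    ok = results[results["error"].isna()]
    hits = ok[ok["pvalue"] < alpha]
    # The variable name is the second level of the index
    return list(hits.index.get_level_values(1))
```

Since the results are already sorted by increasing pvalue, the returned names keep that order.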

Examples

>>> ewas_discovery = clarite.analyze.ewas("logBMI", covariates, nhanes_discovery)
Running EWAS on a continuous variable