clarite.analyze.ewas

clarite.analyze.ewas(outcome: str, covariates: List[str], data: Any, regression_kind: Union[str, Type[clarite.modules.analyze.regression.base.Regression], None] = None, **kwargs)

Run an Environment-Wide Association Study

All variables in data other than the outcome and the covariates are tested individually. The regression class selected with regression_kind may behave slightly differently from the others. Results are sorted in order of increasing pvalue.
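
The selection and ordering described above can be sketched in plain Python. This is only an illustration, not CLARITE's implementation: the helper names select_tested_variables and sort_by_pvalue, the column names, and the p-values are all hypothetical.

```python
from typing import Dict, Iterable, List


def select_tested_variables(
    columns: Iterable[str], outcome: str, covariates: List[str]
) -> List[str]:
    """Return the columns that are neither the outcome nor a covariate."""
    excluded = {outcome, *covariates}
    return [name for name in columns if name not in excluded]


def sort_by_pvalue(pvalues: Dict[str, float]) -> List[str]:
    """Return variable names ordered by increasing p-value."""
    return sorted(pvalues, key=pvalues.get)


# Hypothetical column names standing in for the columns of `data`.
columns = ["logBMI", "age", "sex", "lead", "cotinine"]
tested = select_tested_variables(columns, outcome="logBMI", covariates=["age", "sex"])

# Made-up p-values, used only to show the ordering of the results.
pvalues = {"lead": 0.20, "cotinine": 0.003}
ordered = sort_by_pvalue(pvalues)
```

Here tested holds the variables that would each get their own regression, and ordered lists them with the smallest p-value first.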

Parameters:
outcome: string

The variable to be used as the output of the regressions

covariates: list of strings

The variables to be used as covariates. Any variable in data that is neither the outcome nor a covariate is regressed.

data: Any, usually pd.DataFrame

The data to be analyzed, including the outcome, covariates, and any variables to be regressed.

regression_kind: str, subclass of Regression, or None

This can be ‘glm’, ‘weighted_glm’, or ‘r_survey’ for built-in Regression types, or a custom subclass of Regression. It is None by default to maintain the existing API: glm is used unless a SurveyDesignSpec exists, in which case weighted_glm is used.

kwargs: Keyword arguments specific to the Regression being used
Returns:
df: pd.DataFrame

EWAS results DataFrame with at least these columns: [‘N’, ‘pvalue’, ‘error’, ‘warnings’]. It is indexed by the outcome and the variable being assessed in each row.

Examples

>>> ewas_discovery = clarite.analyze.ewas("logBMI", covariates, nhanes_discovery)
Running EWAS on a continuous variable
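
The default for regression_kind described under Parameters can be sketched as a small helper. This is a hedged illustration rather than the library's code: resolve_regression_kind and its has_survey_design flag are hypothetical, and the sketch handles only the built-in string names, not custom Regression subclasses.

```python
from typing import Optional

# Built-in names listed in the regression_kind parameter description.
BUILT_IN_KINDS = ("glm", "weighted_glm", "r_survey")


def resolve_regression_kind(
    regression_kind: Optional[str], has_survey_design: bool
) -> str:
    """Pick a built-in regression name, applying the documented default.

    has_survey_design is an assumed flag meaning "a SurveyDesignSpec exists".
    """
    if regression_kind is None:
        # Documented default: glm unless a SurveyDesignSpec exists.
        return "weighted_glm" if has_survey_design else "glm"
    if regression_kind not in BUILT_IN_KINDS:
        raise ValueError(f"Unknown regression_kind: {regression_kind!r}")
    return regression_kind


default_plain = resolve_regression_kind(None, has_survey_design=False)
default_survey = resolve_regression_kind(None, has_survey_design=True)
explicit = resolve_regression_kind("r_survey", has_survey_design=True)
```

Passing regression_kind explicitly makes the choice visible at the call site instead of depending on whether a survey design happens to be present.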