clarite.analyze.ewas

clarite.analyze.ewas(phenotype: str, covariates: List[str], data: pandas.core.frame.DataFrame, survey_design_spec: Union[clarite.modules.survey.survey_design.SurveyDesignSpec, NoneType] = None, cov_method: Union[str, NoneType] = 'stata', min_n: Union[int, NoneType] = 200)

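Run an EWAS (environment-wide association study) on a phenotype: each variable in the data that is neither the phenotype nor a covariate is regressed against the phenotype, adjusting for the covariates.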

Note:

Binary variables are treated as continuous features, with values of 0 and 1.
The results of a likelihood ratio test are used for categorical variables, so no beta or SE values are reported for them.
The regression family is automatically selected based on the type of the phenotype:
    * Continuous phenotypes use gaussian regression
    * Binary phenotypes use binomial regression (the larger of the two values is counted as “success”)
Categorical variables run with a survey design do not report diff_AIC.
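A minimal sketch of the family-selection rule in the note above, using only the standard library. The helper name `choose_family` is hypothetical and not part of CLARITE; it infers the phenotype type from the number of distinct non-missing values, whereas CLARITE may rely on its own variable-type annotations. It also applies the 0/1 coding for a binary phenotype, with the larger value counted as success.

```python
from typing import List, Optional, Sequence, Tuple


def choose_family(values: Sequence[Optional[float]]) -> Tuple[str, List[Optional[float]]]:
    """Hypothetical helper: pick a regression family and recode the phenotype.

    Returns the family name and the values as they would be fed to the model.
    Missing values (None) are passed through unchanged.
    """
    observed = sorted({v for v in values if v is not None})
    if len(observed) < 2:
        raise ValueError("phenotype needs at least two distinct non-missing values")
    if len(observed) == 2:
        # Binary phenotype: binomial regression, the larger value counts as "success" (1.0)
        high = observed[1]
        return "binomial", [None if v is None else float(v == high) for v in values]
    # Assumption: anything with more than two distinct values is treated as continuous
    return "gaussian", [None if v is None else float(v) for v in values]
```

For example, `choose_family([3, 7, 7, None, 3])` returns `("binomial", [0.0, 1.0, 1.0, None, 0.0])`, which is the same 0/1 coding the note describes for binary variables.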

Parameters:

phenotype: string: The variable to be used as the output of the regressions
covariates: list (strings): The variables to be used as covariates. Any variables in the DataFrame other than the phenotype and the covariates are regressed.
data: pd.DataFrame: The data to be analyzed, including the phenotype, covariates, and any variables to be regressed.
survey_design_spec: SurveyDesignSpec or None: A SurveyDesignSpec object used to create a SurveyDesign object for each regression. Defaults to None (no survey design).
cov_method: str or None: Covariance calculation method, used only if survey_design_spec is passed in: 'stata' or 'jackknife'. Defaults to 'stata'.
min_n: int or None: Minimum number of complete-case observations (no NA values for phenotype, covariates, variable, or weight) required to regress a variable. Defaults to 200.
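The sketch below illustrates, under stated assumptions, how these parameters interact: which columns get regressed and how the complete-case count for each variable is compared with `min_n`. All helper names (`variables_to_regress`, `complete_case_n`, `runnable_variables`) are hypothetical, rows are plain dicts with `None` standing in for NA, and the optional weight column is assumed to sit alongside the data, which simplifies how a survey design would actually supply weights.

```python
from typing import Dict, List, Optional, Sequence

# A row maps column name -> value; None marks a missing value (NA)
Row = Dict[str, Optional[float]]


def variables_to_regress(columns: Sequence[str], phenotype: str,
                         covariates: Sequence[str]) -> List[str]:
    """Every column that is neither the phenotype nor a covariate gets regressed."""
    excluded = {phenotype, *covariates}
    return [c for c in columns if c not in excluded]


def complete_case_n(rows: Sequence[Row], needed: Sequence[str]) -> int:
    """Count rows with no missing value in any of the needed columns."""
    return sum(all(row.get(c) is not None for c in needed) for row in rows)


def runnable_variables(rows: Sequence[Row], phenotype: str, covariates: Sequence[str],
                       weight: Optional[str] = None, min_n: int = 200) -> Dict[str, int]:
    """Hypothetical helper: {variable: complete-case N} for variables reaching min_n."""
    columns = list(rows[0].keys()) if rows else []
    # Assumption: a weight column stored with the data is not itself regressed
    candidates = [c for c in variables_to_regress(columns, phenotype, covariates)
                  if c != weight]
    kept: Dict[str, int] = {}
    for variable in candidates:
        # Complete case = phenotype, covariates, this variable and the weight all present
        needed = [phenotype, *covariates, variable] + ([weight] if weight else [])
        n = complete_case_n(rows, needed)
        if n >= min_n:
            kept[variable] = n
    return kept
```

Variables that fall short of `min_n` are simply left out here; what CLARITE reports for such variables is not specified on this page.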

Returns:

df: pd.DataFrame: EWAS results DataFrame with these columns: ['variable_type', 'N', 'beta', 'SE', 'var_pvalue', 'LRT_pvalue', 'diff_AIC', 'pvalue']
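For categorical variables the LRT_pvalue column comes from a likelihood ratio test. A standard-library sketch of that test follows: the statistic is twice the difference in log-likelihood between the model with the variable and the covariates-only model, referred to a chi-square distribution whose degrees of freedom equal the number of added parameters. The helpers are hypothetical and CLARITE's own implementation may differ. `reported_pvalue` encodes an assumption (inferred from the note on categorical variables, not confirmed by this page) that `pvalue` holds `LRT_pvalue` for categorical variables and `var_pvalue` otherwise. The hand-rolled `chi2_sf` loses precision for very small p-values; `scipy.stats.chi2.sf` is the usual choice in practice.

```python
import math
from typing import Optional


def chi2_sf(stat: float, df: int) -> float:
    """Upper-tail probability of a chi-square(df) distribution.

    Uses the series P(a, x) = sum_n e^{-x} x^{a+n} / Gamma(a+n+1) for the lower
    regularized gamma function, with a = df/2 and x = stat/2, then returns 1 - P.
    """
    if stat <= 0:
        return 1.0
    a, x = df / 2.0, stat / 2.0
    lower, n = 0.0, 0
    while True:
        # n-th series term, computed in log space to avoid overflow for large x
        term = math.exp(-x + (a + n) * math.log(x) - math.lgamma(a + n + 1))
        lower += term
        # Terms shrink monotonically once n > x, so stop when they become negligible
        if n > x and term < 1e-17:
            break
        n += 1
    return max(0.0, 1.0 - lower)


def lrt_pvalue(llf_full: float, llf_restricted: float, df_diff: int) -> float:
    """LRT p-value from the log-likelihoods of the full and covariates-only models."""
    stat = 2.0 * (llf_full - llf_restricted)
    return chi2_sf(stat, df_diff)


def reported_pvalue(variable_type: str, var_pvalue: Optional[float],
                    lrt_pvalue: Optional[float]) -> Optional[float]:
    # Assumption: categorical variables (type string "categorical" is hypothetical)
    # report the LRT p-value; all other variables report the coefficient p-value
    return lrt_pvalue if variable_type == "categorical" else var_pvalue
```

A categorical variable with k levels adds k - 1 dummy parameters, so `df_diff` would be k - 1 in this sketch.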

Examples

>>> ewas_discovery = clarite.analyze.ewas("logBMI", covariates, nhanes_discovery)
Running EWAS on a continuous variable