clarite.analyze.ewas

clarite.analyze.ewas(phenotype: str, covariates: List[str], data: pandas.core.frame.DataFrame, survey_design_spec: Union[clarite.modules.survey.survey_design.SurveyDesignSpec, NoneType] = None, cov_method: Union[str, NoneType] = 'stata', min_n: Union[int, NoneType] = 200)

Run an EWAS on a phenotype.

Note:
  • Binary variables are treated as continuous features, with values of 0 and 1.
  • The results of a likelihood ratio test are used for categorical variables, so no Beta values or SE are reported.
  • The regression family is automatically selected based on the type of the phenotype:
      * Continuous phenotypes use gaussian regression.
      * Binary phenotypes use binomial regression (the larger of the two values is counted as “success”).
  • Categorical variables run with a survey design do not report diff_AIC.
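
The family-selection rule in the notes above can be pictured with a small standalone sketch. This is not CLARITE's implementation: the function name select_family and the "continuous"/"binary" labels are hypothetical, chosen only to illustrate that a binary phenotype uses its larger value as “success”.

```python
from typing import Hashable, Optional, Sequence, Tuple


def select_family(
    phenotype_kind: str, values: Sequence[Optional[Hashable]]
) -> Tuple[str, Optional[Hashable]]:
    """Return (family, success_value) for a phenotype.

    Hypothetical helper: phenotype_kind is an illustrative label,
    not an argument that clarite.analyze.ewas accepts.
    """
    if phenotype_kind == "continuous":
        # Continuous phenotypes use gaussian regression; no success level.
        return "gaussian", None
    if phenotype_kind == "binary":
        # Ignore missing values, then take the two observed levels.
        levels = sorted({v for v in values if v is not None})
        if len(levels) != 2:
            raise ValueError("a binary phenotype needs exactly two values")
        # The larger of the two values is counted as "success".
        return "binomial", levels[-1]
    raise ValueError(f"unsupported phenotype kind: {phenotype_kind!r}")


family, success = select_family("binary", [0, 1, 1, 0])
```

With a phenotype coded as 1 and 2, the same rule would treat 2 as the success level.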
Parameters:
phenotype: string

The variable to be used as the output of the regressions

covariates: list of strings

The variables to be used as covariates. Any other variable in the DataFrame, apart from the phenotype, is regressed.
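
As a rough illustration of which columns end up being regressed, the sketch below filters a list of column names. The helper variables_to_regress and the column names are hypothetical and only mirror the description above; they are not part of CLARITE.

```python
from typing import List


def variables_to_regress(
    columns: List[str], phenotype: str, covariates: List[str]
) -> List[str]:
    """Return the columns that are neither the phenotype nor a covariate."""
    excluded = {phenotype, *covariates}
    # Preserve the original column order.
    return [col for col in columns if col not in excluded]


# Illustrative column names only.
columns = ["logBMI", "age", "sex", "exposure_a", "exposure_b"]
regressed = variables_to_regress(columns, "logBMI", ["age", "sex"])
print(regressed)  # ['exposure_a', 'exposure_b']
```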

data: pd.DataFrame

The data to be analyzed, including the phenotype, covariates, and any variables to be regressed.

survey_design_spec: SurveyDesignSpec or None

A SurveyDesignSpec object is used to create SurveyDesign objects for each regression.

cov_method: str or None

Covariance calculation method, used only if survey_design_spec is passed in. Either ‘stata’ or ‘jackknife’; defaults to ‘stata’.

min_n: int or None

Minimum number of complete-case observations (no NA values for phenotype, covariates, variable, or weight). Defaults to 200.
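
The complete-case count behind min_n can be sketched without any dependencies. The helpers below are hypothetical and represent rows as plain dictionaries with None or NaN for missing values. Treating min_n=None as "no minimum" is an assumption, and the sketch does not show what ewas does with a variable that falls below the threshold.

```python
import math
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]


def is_missing(value: Optional[float]) -> bool:
    """Treat None and NaN as missing (NA)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def complete_case_count(rows: List[Row], required: List[str]) -> int:
    """Count rows with no missing value in any required column."""
    return sum(
        all(not is_missing(row.get(col)) for col in required) for row in rows
    )


def passes_min_n(
    rows: List[Row], required: List[str], min_n: Optional[int] = 200
) -> bool:
    """Check the complete-case count against min_n.

    Assumption: min_n=None disables the check.
    """
    if min_n is None:
        return True
    return complete_case_count(rows, required) >= min_n


# Illustrative data: phenotype, one covariate, one variable, one weight.
rows = [
    {"logBMI": 3.1, "age": 40.0, "exposure_a": 0.2, "weight": 1.5},
    {"logBMI": 3.3, "age": None, "exposure_a": 0.4, "weight": 1.0},
    {"logBMI": 3.0, "age": 55.0, "exposure_a": float("nan"), "weight": 2.0},
    {"logBMI": 3.4, "age": 61.0, "exposure_a": 0.9, "weight": 1.2},
]
required = ["logBMI", "age", "exposure_a", "weight"]
n_complete = complete_case_count(rows, required)
```

Here only the first and last rows are complete cases, so this toy data set would fall far short of the default minimum of 200.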

Returns:
df: pd.DataFrame

EWAS results DataFrame with these columns: [‘variable_type’, ‘N’, ‘beta’, ‘SE’, ‘var_pvalue’, ‘LRT_pvalue’, ‘diff_AIC’, ‘pvalue’]

Examples

>>> ewas_discovery = clarite.analyze.ewas("logBMI", covariates, nhanes_discovery)
Running EWAS on a continuous variable