clarite.analyze.ewas
clarite.analyze.ewas(phenotype: str, covariates: List[str], data: pandas.core.frame.DataFrame, survey_design_spec: Union[clarite.modules.survey.survey_design.SurveyDesignSpec, NoneType] = None, cov_method: Union[str, NoneType] = 'stata', min_n: Union[int, NoneType] = 200)
Run an EWAS on a phenotype.
- Note:
- Binary variables are treated as continuous features, with values of 0 and 1.
- The results of a likelihood ratio test are used for categorical variables, so no Beta values or SE are reported.
- The regression family is automatically selected based on the type of the phenotype:
  * Continuous phenotypes use gaussian regression
  * Binary phenotypes use binomial regression (the larger of the two values is counted as “success”)
- Categorical variables run with a survey design will not report diff_AIC.
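The family selection described in the note can be illustrated with a minimal sketch. This is not CLARITE's implementation: `choose_family` is a hypothetical helper, and it infers the phenotype type from the number of distinct non-missing values, which is an assumption made only for illustration (missing values are represented as `None`).

```python
from typing import Iterable, Optional, Tuple


def choose_family(phenotype_values: Iterable[Optional[float]]) -> Tuple[str, Optional[float]]:
    """Return (family, success_value) for a phenotype column.

    Hypothetical helper: assumes missing values are None and that a
    phenotype with exactly two distinct values is binary.
    """
    observed = {v for v in phenotype_values if v is not None}
    if len(observed) < 2:
        raise ValueError("phenotype needs at least two distinct values")
    if len(observed) == 2:
        # Binary phenotype: the larger of the two values counts as "success".
        return "binomial", max(observed)
    # Anything else is treated as continuous in this sketch.
    return "gaussian", None


continuous = choose_family([3.1, 3.4, None, 2.9, 3.3])
binary = choose_family([1, 2, 2, 1, None])
```

Here `continuous` is `("gaussian", None)` and `binary` is `("binomial", 2)`, since 2 is the larger of the two observed values.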
Parameters:
- phenotype: string
The variable to be used as the output of the regressions
- covariates: list (strings)
The variables to be used as covariates. Every other variable in the DataFrame (anything that is neither the phenotype nor a covariate) is tested in its own regression.
- data: pd.DataFrame
The data to be analyzed, including the phenotype, covariates, and any variables to be regressed.
- survey_design_spec: SurveyDesignSpec or None
A SurveyDesignSpec object is used to create SurveyDesign objects for each regression.
- cov_method: str or None
Covariance calculation method, used only when survey_design_spec is passed in. Either 'stata' or 'jackknife'. Defaults to 'stata'.
- min_n: int or None
Minimum number of complete-case observations (no NA values for phenotype, covariates, variable, or weight). Defaults to 200.
Returns:
- df: pd.DataFrame
EWAS results DataFrame with these columns: ['variable_type', 'N', 'beta', 'SE', 'var_pvalue', 'LRT_pvalue', 'diff_AIC', 'pvalue']
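One plausible reading of the result columns, given the note that likelihood ratio test results are used for categorical variables, is that 'pvalue' holds 'LRT_pvalue' for categorical variables and 'var_pvalue' otherwise. The sketch below encodes that reading with plain dictionaries. The mapping, the `variable_type` labels, the `variable` key, and the numbers are all assumptions for illustration and are not taken from the library.

```python
from typing import Dict, List, Optional

Result = Dict[str, object]


def final_pvalue(result: Result) -> Optional[float]:
    """Pick the p-value that would be reported for one variable.

    Assumed rule: categorical variables use the LRT p-value, and all
    other variable types use the per-variable p-value.
    """
    if result["variable_type"] == "categorical":
        return result["LRT_pvalue"]  # type: ignore[return-value]
    return result["var_pvalue"]  # type: ignore[return-value]


# Made-up rows shaped like the documented columns (subset only).
results: List[Result] = [
    {"variable": "lead", "variable_type": "continuous", "var_pvalue": 0.03, "LRT_pvalue": None},
    {"variable": "smoker", "variable_type": "binary", "var_pvalue": 0.20, "LRT_pvalue": None},
    {"variable": "race", "variable_type": "categorical", "var_pvalue": None, "LRT_pvalue": 0.001},
]

# Rank variables from most to least significant under the assumed rule.
ranked = sorted(results, key=final_pvalue)
ranked_names = [r["variable"] for r in ranked]
```

With these toy values the ranking is `race`, `lead`, `smoker`.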
Examples
>>> ewas_discovery = clarite.analyze.ewas("logBMI", covariates, nhanes_discovery)
Running EWAS on a continuous variable