API Reference

CLARITE functions are organized into several modules:

Analyze

Functions used for analyses such as EWAS

ewas(outcome, covariates, data, …) Run an Environment-Wide Association Study

interaction_test(outcome_variable, …) Perform likelihood ratio tests (LRT) comparing a model with interaction terms to one without.

add_corrected_pvalues(data, pvalue, groupby, …) Calculate Bonferroni and FDR p-values and sort by increasing FDR (in-place).
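
As a rough illustration of what the two corrections compute, the sketch below adjusts a handful of p-values in plain Python. It is not CLARITE's implementation: the helper names bonferroni and fdr_bh, the variable names, and the p-values are made up for the example, and it assumes the Benjamini-Hochberg procedure for FDR, which this page does not actually specify.

```python
def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capping at 1."""
    m = len(pvalues)
    return [min(p * m, 1.0) for p in pvalues]


def fdr_bh(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never decrease with rank
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical EWAS p-values keyed by variable name (illustrative only)
results = {"var_a": 0.01, "var_b": 0.04, "var_c": 0.03, "var_d": 0.20}
names = list(results)
raw = [results[n] for n in names]

# One row per variable: (name, raw p-value, Bonferroni, FDR), sorted by increasing FDR
table = sorted(zip(names, raw, bonferroni(raw), fdr_bh(raw)), key=lambda row: row[3])
for name, p, p_bonf, p_fdr in table:
    print(f"{name}\t{p:.3f}\t{p_bonf:.3f}\t{p_fdr:.3f}")
```

Bonferroni controls the chance of any false positive and is the stricter of the two; the FDR adjustment controls the expected share of false positives among the reported hits, which is why results are usually ranked by it.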

Describe

Functions that are used to gather information about some data

correlations(data, threshold) Return variables with Pearson correlation above the threshold

freq_table(data) Return the count of each unique value for all binary and categorical variables.

get_types(data) Return the type of each variable

percent_na(data) Return the percent of observations that are NA for each variable

skewness(data, dropna) Return the skewness of each continuous variable

summarize(data) Print the number of each type of variable and the number of observations
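
To make the idea behind a correlation threshold concrete, here is a minimal standard-library sketch that reports pairs of variables whose Pearson correlation exceeds a cutoff. It is an illustration only, not how CLARITE computes it: the helpers pearson and correlated_pairs and the sample columns are hypothetical, and comparing the absolute value of the correlation is an assumption of this sketch.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlated_pairs(columns, threshold):
    """Return (name1, name2, r) for every pair of columns with |r| above the threshold."""
    pairs = []
    for (name1, col1), (name2, col2) in combinations(columns.items(), 2):
        r = pearson(col1, col2)
        # Assumption: strong negative correlations count as well
        if abs(r) > threshold:
            pairs.append((name1, name2, r))
    return pairs


# Hypothetical data: the two height columns carry the same information
columns = {
    "height_cm": [150, 160, 170, 180, 190],
    "height_in": [v / 2.54 for v in [150, 160, 170, 180, 190]],
    "noise": [3, 1, 4, 1, 5],
}
pairs = correlated_pairs(columns, threshold=0.75)
for name1, name2, r in pairs:
    print(f"{name1} ~ {name2}: r = {r:.3f}")
```

Highly correlated pairs like the two height columns are typical candidates for dropping one variable before running an analysis.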

Load

Load data from different formats or sources

from_tsv(filename, index_col, …) Load data from a tab-separated file into a DataFrame

from_csv(filename, index_col, …) Load data from a comma-separated file into a DataFrame

Modify

Functions used to filter and/or change some data, always taking in one set of data and returning one set of data.

categorize(data, cat_min, cat_max, cont_min) Classify variables into constant, binary, categorical, continuous, and ‘unknown’.

colfilter(data, skip, …) Remove some variables (skip) or keep only certain variables (only)

colfilter_percent_zero(data, filter_percent, …) Remove continuous variables which have <proportion> or more values of zero (excluding NA)

colfilter_min_n(data, n, skip, …) Remove variables which have fewer than <n> non-NA values

colfilter_min_cat_n(data, n, skip, …) Remove binary and categorical variables which have fewer than <n> occurrences of each unique value

make_binary(data, skip, …) Set variable types as Binary

make_categorical(data, skip, …) Set variable types as Categorical

make_continuous(data, skip, …) Set variable types as Numeric

merge_observations(top, bottom) Merge two datasets, keeping only the columns present in both.

merge_variables(left, …) Merge a list of dataframes with different variables side-by-side.

move_variables(left, right, …) Move one or more variables from one DataFrame to another

recode_values(data, replacement_dict, skip, …) Convert values in a dataframe.

remove_outliers(data, method[, cutoff]) Remove outliers from continuous variables by replacing them with np.nan

rowfilter_incomplete_obs(data, skip, …) Remove rows containing null values

transform(data, transform_method, skip, …) Apply a transformation function to a variable
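
The classification performed by categorize can be pictured as a rule on the number of distinct values per variable. The sketch below shows one plausible reading of the cat_min, cat_max, and cont_min parameters in plain Python. It is not CLARITE's code: the function name classify_by_unique_count, the default thresholds (3, 6, 15), the treatment of None as NA, and the sample columns are all assumptions made for illustration.

```python
def classify_by_unique_count(columns, cat_min=3, cat_max=6, cont_min=15):
    """Label each column by how many distinct non-missing values it has.

    The thresholds are illustrative defaults, not confirmed by this page.
    """
    types = {}
    for name, values in columns.items():
        # None stands in for NA in this sketch
        n_unique = len({v for v in values if v is not None})
        if n_unique == 1:
            types[name] = "constant"
        elif n_unique == 2:
            types[name] = "binary"
        elif cat_min <= n_unique <= cat_max:
            types[name] = "categorical"
        elif n_unique >= cont_min:
            types[name] = "continuous"
        else:
            # Too many values to be categorical, too few to be continuous
            types[name] = "unknown"
    return types


# Hypothetical columns with 20 observations each
columns = {
    "site": ["A"] * 20,
    "smoker": [0, 1] * 10,
    "education": [1, 2, 3, 4] * 5,
    "bmi": [18.0 + 0.5 * i for i in range(20)],
    "visits": list(range(10)) * 2,
}
types = classify_by_unique_count(columns)
for name, kind in types.items():
    print(f"{name}: {kind}")
```

Variables that land in the 'unknown' gap between cat_max and cont_min are the ones that would need a manual decision, for example with make_categorical or make_continuous.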

Plot

Functions that generate plots

histogram(data, column, figsize, …) Plot a histogram of the values in the given column.

distributions(data, filename, …) Create a pdf containing histograms for each binary or categorical variable, and one of several types of plots for each continuous variable.

manhattan(dfs, …) Create a Manhattan-like plot for a list of EWAS Results

manhattan_fdr(dfs, …) Create a Manhattan-like plot for a list of EWAS Results using FDR significance

manhattan_bonferroni(dfs, …) Create a Manhattan-like plot for a list of EWAS Results using Bonferroni significance

top_results(ewas_result, pvalue_name, …) Create a dotplot for EWAS Results showing pvalues and beta coefficients

Survey

Complex survey design

SurveyDesignSpec(survey_df, strata, cluster, …) Holds parameters for building a statsmodels SurveyDesign object