API Reference

CLARITE functions are organized into several modules:

Analyze

Functions used for analyses such as EWAS

ewas(outcome, covariates, data[, …])

Run an Environment-Wide Association Study

interaction_test(outcome_variable, …[, …])

Perform likelihood ratio tests (LRT) comparing a model with interaction terms to one without.
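
The arithmetic behind a likelihood ratio test can be sketched in plain Python. This is a minimal illustration, not CLARITE's implementation. It assumes you already have the maximized log-likelihoods of the restricted model (no interaction terms) and the full model (with interaction terms), plus the number of extra parameters. The helper names chi2_sf and lrt_pvalue are hypothetical, and the chi-square tail probability is computed with a hand-written incomplete gamma series instead of a statistics library.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Chi-square survival function, 1 - P(df/2, x/2).

    P(a, z) is the regularized lower incomplete gamma function, evaluated
    with its power series. Intended for moderate statistics (x below ~1000).
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a  # n = 0 term of sum_n z^n / (a (a+1) ... (a+n))
    total = term
    n = 0
    while abs(term) > 1e-15 * abs(total):
        n += 1
        term *= z / (a + n)
        total += term
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


def lrt_pvalue(llf_restricted: float, llf_full: float, df_diff: int) -> float:
    """P-value of a likelihood ratio test between two nested models."""
    # Under H0 the statistic is approximately chi-square with df_diff degrees of freedom
    stat = 2.0 * (llf_full - llf_restricted)
    return chi2_sf(stat, df_diff)
```

For example, lrt_pvalue(-120.4, -115.1, 2) tests whether two interaction coefficients jointly improve the fit.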

add_corrected_pvalues(data[, pvalue, groupby])

Calculate Bonferroni-corrected and FDR-adjusted p-values and sort by increasing FDR (in place).
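
Both corrections can be sketched without any dependencies. This is a hypothetical illustration, not the library's code: it assumes the FDR column uses the Benjamini-Hochberg procedure (the summary above only says "FDR"), takes a plain dict of variable name to p-value instead of a DataFrame, and returns new rows rather than modifying anything in place.

```python
from typing import Dict, List


def bonferroni_and_fdr(pvalues: Dict[str, float]) -> List[dict]:
    """Hypothetical helper: Bonferroni and Benjamini-Hochberg adjusted p-values."""
    m = len(pvalues)
    # Bonferroni: multiply each p-value by the number of tests, capped at 1
    bonf = {name: min(1.0, p * m) for name, p in pvalues.items()}
    # Benjamini-Hochberg: p_(i) * m / i, made monotone by taking a running
    # minimum from the largest p-value down to the smallest
    ranked = sorted(pvalues.items(), key=lambda kv: kv[1])
    fdr = {}
    running_min = 1.0
    for i in range(m, 0, -1):
        name, p = ranked[i - 1]
        running_min = min(running_min, p * m / i)
        fdr[name] = running_min
    rows = [
        {"variable": name, "pvalue": p, "bonferroni": bonf[name], "fdr": fdr[name]}
        for name, p in pvalues.items()
    ]
    rows.sort(key=lambda row: row["fdr"])  # increasing FDR
    return rows
```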


Describe

Functions used to gather information about a dataset

correlations(data[, threshold])

Return pairs of variables with a Pearson correlation above the threshold
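
A pairwise Pearson screen can be sketched as follows. It is a minimal illustration under stated assumptions: the data is a dict of complete numeric columns (no missing values), the absolute value of r is compared to the threshold, and the helper names pearson and correlated_pairs are hypothetical rather than part of CLARITE.

```python
import itertools
import math
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length lists; NaN if either is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


def correlated_pairs(data: Dict[str, List[float]], threshold: float) -> List[Tuple[str, str, float]]:
    """Hypothetical helper: every variable pair whose |r| exceeds the threshold."""
    results = []
    for a, b in itertools.combinations(data, 2):
        r = pearson(data[a], data[b])
        if abs(r) > threshold:  # NaN compares False, so constant columns are skipped
            results.append((a, b, r))
    return results
```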

freq_table(data)

Return the count of each unique value for all binary and categorical variables.

get_types(data)

Return the type of each variable

percent_na(data)

Return the percent of observations that are NA for each variable
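
The percentage of missing observations per variable is a single pass over each column. This sketch assumes a dict of column lists in which None or a float NaN marks a missing value, and reports percentages on a 0 to 100 scale; the helper name percent_missing is hypothetical.

```python
import math
from typing import Dict, List


def is_missing(value) -> bool:
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def percent_missing(data: Dict[str, List]) -> Dict[str, float]:
    """Hypothetical helper: percent of NA observations in each column."""
    return {
        name: 100.0 * sum(is_missing(v) for v in values) / len(values)
        for name, values in data.items()
    }
```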

skewness(data[, dropna])

Return the skewness of each continuous variable
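
One common skewness definition is the Fisher-Pearson coefficient g1 = m3 / m2^(3/2), built from the biased central moments. The sketch below uses that definition as an assumption (this reference does not say which estimator or bias correction CLARITE applies) and mimics a dropna flag by either skipping None values or returning NaN when any are present. The helper name skew_g1 is hypothetical.

```python
import math
from typing import List, Optional


def skew_g1(values: List[Optional[float]], dropna: bool = True) -> float:
    """Hypothetical helper: biased Fisher-Pearson skewness of one variable."""
    if dropna:
        values = [v for v in values if v is not None]
    elif any(v is None for v in values):
        return math.nan  # missing values present and not dropped
    n = len(values)
    if n == 0:
        return math.nan
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    if m2 == 0:
        return math.nan  # constant variable: skewness undefined
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5
```

A single large value pulls the result positive: skew_g1([1, 1, 1, 10]) is 2/sqrt(3), about 1.155.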

summarize(data)

Print the number of each type of variable and the number of observations


Load

Load data from different formats or sources

from_tsv(filename[, index_col])

Load data from a tab-separated file into a DataFrame

from_csv(filename[, index_col])

Load data from a comma-separated file into a DataFrame


Modify

Functions used to filter and/or modify data, typically taking in one dataset and returning one dataset.

categorize(data[, cat_min, cat_max, cont_min])

Classify variables into constant, binary, categorical, continuous, and ‘unknown’.
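
Classification by the number of unique non-missing values can be sketched as below. The thresholds are assumptions for illustration: one unique value is constant, two are binary, cat_min to cat_max are categorical, at least cont_min are continuous, and anything else (including an all-missing column) is unknown. The default values shown are illustrative, and the helper name classify_variable is hypothetical.

```python
from typing import List


def classify_variable(values: List, cat_min: int = 3, cat_max: int = 6, cont_min: int = 15) -> str:
    """Hypothetical helper: classify one variable by its count of unique values."""
    n_unique = len({v for v in values if v is not None})  # ignore missing values
    if n_unique == 1:
        return "constant"
    if n_unique == 2:
        return "binary"
    if cat_min <= n_unique <= cat_max:
        return "categorical"
    if n_unique >= cont_min:
        return "continuous"
    return "unknown"  # falls between the categorical and continuous ranges, or no data
```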

colfilter(data[, skip, only])

Remove some variables (skip) or keep only certain variables (only)

colfilter_percent_zero(data[, …])

Remove continuous variables which have <proportion> or more values of zero (excluding NA)
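
The zero-proportion filter amounts to comparing the share of zeros among non-missing values against the cutoff. This sketch assumes a dict of continuous columns with None for missing values; the helper name drop_mostly_zero is hypothetical, and no default for proportion is implied.

```python
from typing import Dict, List, Optional


def drop_mostly_zero(data: Dict[str, List[Optional[float]]], proportion: float) -> Dict[str, list]:
    """Hypothetical helper: drop columns whose share of zeros is >= proportion."""
    kept = {}
    for name, values in data.items():
        observed = [v for v in values if v is not None]  # exclude NA
        share_zero = sum(v == 0 for v in observed) / len(observed) if observed else 0.0
        if share_zero < proportion:
            kept[name] = values
    return kept
```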

colfilter_min_n(data[, n, skip, only])

Remove variables which have fewer than <n> non-NA values

colfilter_min_cat_n(data[, n, skip, only])

Remove binary and categorical variables which have fewer than <n> occurrences of each unique value
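
One reading of this filter is that a variable is kept only if every unique value occurs at least n times, so a single rare category removes the whole variable. The sketch below assumes that reading, uses a dict of columns with None for missing values, and drops all-missing columns; the helper name drop_sparse_categories is hypothetical.

```python
from collections import Counter
from typing import Dict, List


def drop_sparse_categories(data: Dict[str, List], n: int) -> Dict[str, List]:
    """Hypothetical helper: keep variables whose rarest value occurs at least n times."""
    kept = {}
    for name, values in data.items():
        counts = Counter(v for v in values if v is not None)
        if counts and min(counts.values()) >= n:
            kept[name] = values
    return kept
```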

make_binary(data[, skip, only])

Set variable types as Binary

make_categorical(data[, skip, only])

Set variable types as Categorical

make_continuous(data[, skip, only])

Set variable types as Numeric

merge_observations(top, bottom)

Merge two datasets, keeping only the columns present in both.
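
Stacking observations while keeping only the shared columns can be sketched with a dict of column lists standing in for a DataFrame. This is an illustration under that assumption, not CLARITE's code; the helper name stack_observations is hypothetical, and it does not reconcile variable types between the two inputs.

```python
from typing import Dict, List


def stack_observations(top: Dict[str, List], bottom: Dict[str, List]) -> Dict[str, List]:
    """Hypothetical helper: append bottom's rows under top's, shared columns only."""
    shared = [col for col in top if col in bottom]  # preserves top's column order
    return {col: top[col] + bottom[col] for col in shared}
```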

merge_variables(left, right[, how])

Merge two DataFrames containing different variables side by side.

move_variables(left, right[, skip, only])

Move one or more variables from one DataFrame to another

recode_values(data, replacement_dict[, …])

Replace values in a DataFrame according to replacement_dict.

remove_outliers(data[, method, cutoff, …])

Remove outliers from continuous variables by replacing them with np.nan
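
Two common outlier rules are sketched below for a single column: a Gaussian rule that masks values more than cutoff sample standard deviations from the mean, and an IQR rule that masks values outside [Q1 - cutoff * IQR, Q3 + cutoff * IQR]. The method names, the use of sample standard deviation, and the linearly interpolated quartiles are assumptions; the sketch marks outliers with None where the library uses np.nan, and the helper name mask_outliers is hypothetical.

```python
import statistics
from typing import List, Optional


def mask_outliers(values: List[Optional[float]], method: str = "gaussian", cutoff: float = 3.0) -> list:
    """Hypothetical helper: replace outliers in one column with None."""
    observed = [v for v in values if v is not None]
    if method == "gaussian":
        mean = statistics.fmean(observed)
        sd = statistics.stdev(observed)  # sample standard deviation (n - 1)
        low, high = mean - cutoff * sd, mean + cutoff * sd
    elif method == "iqr":
        # "inclusive" quartiles interpolate linearly between order statistics
        q1, _, q3 = statistics.quantiles(observed, n=4, method="inclusive")
        iqr = q3 - q1
        low, high = q1 - cutoff * iqr, q3 + cutoff * iqr
    else:
        raise ValueError(f"unknown method: {method!r}")
    # Keep missing values as they are; mask anything outside [low, high]
    return [v if v is None or low <= v <= high else None for v in values]
```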

rowfilter_incomplete_obs(data[, skip, only])

Remove rows containing null values
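
Dropping incomplete observations means keeping only the row positions where every column has a value. This sketch assumes a dict of equal-length column lists with None as the null marker and ignores skip/only; the helper name drop_incomplete_rows is hypothetical.

```python
from typing import Dict, List


def drop_incomplete_rows(data: Dict[str, List]) -> Dict[str, List]:
    """Hypothetical helper: keep only rows with no None in any column."""
    n_rows = len(next(iter(data.values()), []))
    keep = [i for i in range(n_rows) if all(col[i] is not None for col in data.values())]
    return {name: [col[i] for i in keep] for name, col in data.items()}
```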

transform(data, transform_method[, skip, only])

Apply a transformation function to a variable


Plot

Functions that generate plots

histogram(data, column[, figsize, title, figure])

Plot a histogram of the values in the given column.

distributions(data, filename[, …])

Create a PDF containing histograms for each binary or categorical variable, and one of several types of plots for each continuous variable.

manhattan(dfs[, categories, bonferroni, …])

Create a Manhattan-like plot for a list of EWAS Results

manhattan_fdr(dfs[, categories, cutoff, …])

Create a Manhattan-like plot for a list of EWAS Results using FDR significance

manhattan_bonferroni(dfs[, categories, …])

Create a Manhattan-like plot for a list of EWAS Results using Bonferroni significance

top_results(ewas_result[, pvalue_name, …])

Create a dot plot for EWAS Results showing p-values and beta coefficients


Survey

Support for complex survey designs

SurveyDesignSpec(survey_df, strata, cluster, …)

Holds parameters for building a statsmodels SurveyDesign object