API Reference
CLARITE functions are organized into several modules:
Analyze
Functions used for analyses such as EWAS.
ewas(outcome, covariates, data[, …])
    Run an Environment-Wide Association Study.
interaction_test(outcome_variable, …[, …])
    Perform likelihood ratio tests (LRT) comparing a model with interaction terms to one without.
add_corrected_pvalues(data[, pvalue, groupby])
    Calculate Bonferroni- and FDR-corrected p-values and sort by increasing FDR (in-place).
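To make the two corrections named in add_corrected_pvalues concrete, here is a minimal sketch in plain Python. It is not CLARITE's implementation: it assumes that "FDR" means the Benjamini-Hochberg procedure (the listing above does not say which method is used), and the variable names and p-values are hypothetical.

```python
def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capping at 1."""
    m = len(pvalues)
    return [min(p * m, 1.0) for p in pvalues]


def fdr_bh(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Assumption: the "FDR" correction is Benjamini-Hochberg.
    """
    m = len(pvalues)
    # Indices of the p-values from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never decrease with rank
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical results: variable name -> raw p-value
raw = {"var_a": 0.01, "var_b": 0.04, "var_c": 0.03, "var_d": 0.5}
names = list(raw)
pvals = [raw[name] for name in names]
bonf = bonferroni(pvals)
fdr = fdr_bh(pvals)

# Sort rows by increasing FDR-adjusted p-value
rows = sorted(zip(names, pvals, bonf, fdr), key=lambda row: row[3])
for name, p, p_bonf, p_fdr in rows:
    print(f"{name}: p={p:.3f} bonferroni={p_bonf:.3f} fdr={p_fdr:.3f}")
```

Bonferroni controls the family-wise error rate and is the stricter of the two; the FDR-adjusted value for a variable is never larger than its Bonferroni-adjusted value.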
Describe
Functions used to gather information about some data.
correlations(data[, threshold])
    Return variables with a Pearson correlation above the threshold.
freq_table(data)
    Return the count of each unique value for all binary and categorical variables.
get_types(data)
    Return the type of each variable.
percent_na(data)
    Return the percent of observations that are NA for each variable.
skewness(data[, dropna])
    Return the skewness of each continuous variable.
summarize(data)
    Print the number of each type of variable and the number of observations.
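The quantities reported by percent_na and freq_table can be illustrated with a small plain-Python sketch. This is not CLARITE's code: the dataset is hypothetical, None stands in for NA, and leaving missing values out of the frequency counts is an assumption rather than documented behavior.

```python
from collections import Counter

# Hypothetical dataset: column name -> values, with None standing in for NA
data = {
    "smoker": ["yes", "no", None, "no"],
    "bmi": [22.5, None, None, 31.0],
}


def percent_na(columns):
    """Percent of observations that are missing, per variable."""
    return {
        name: 100.0 * sum(value is None for value in values) / len(values)
        for name, values in columns.items()
    }


def freq_table(values):
    """Count of each unique value (assumption: missing values are not counted)."""
    return Counter(value for value in values if value is not None)


na = percent_na(data)
counts = freq_table(data["smoker"])
print(na)
print(dict(counts))
```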
Load
Load data from different formats or sources.
from_tsv(filename[, index_col])
    Load data from a tab-separated file into a DataFrame.
from_csv(filename[, index_col])
    Load data from a comma-separated file into a DataFrame.
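The two loaders differ only in the delimiter they expect, and index_col selects the column that identifies each observation. The sketch below shows that idea with the standard library's csv module. It returns nested dictionaries rather than a DataFrame, the helper name load_delimited is hypothetical, and treating index_col as a column position defaulting to 0 is an assumption.

```python
import csv
import io


def load_delimited(handle, delimiter, index_col=0):
    """Read delimited text into {index value: {column: value}}.

    Hypothetical helper: index_col is assumed to be a column position.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    index_name = header[index_col]
    table = {}
    for row in reader:
        record = dict(zip(header, row))
        # The index column becomes the key; the remaining columns are the variables
        key = record.pop(index_name)
        table[key] = record
    return table


# Hypothetical file contents, held in memory so the example is self-contained
tsv_text = "ID\tage\tsex\n1\t34\tF\n2\t51\tM\n"
csv_text = tsv_text.replace("\t", ",")

from_tab = load_delimited(io.StringIO(tsv_text), delimiter="\t")
from_comma = load_delimited(io.StringIO(csv_text), delimiter=",")
print(from_tab)
```

All values stay as strings here; inferring variable types is a separate step.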
Modify
Functions used to filter and/or change some data, always taking in one set of data and returning one set of data.
categorize(data[, cat_min, cat_max, cont_min])
    Classify variables into constant, binary, categorical, continuous, and ‘unknown’.
colfilter(data[, skip, only])
    Remove some variables (skip) or keep only certain variables (only).
colfilter_percent_zero(data[, …])
    Remove continuous variables which have <proportion> or more values of zero (excluding NA).
colfilter_min_n(data[, n, skip, only])
    Remove variables which have fewer than <n> non-NA values.
colfilter_min_cat_n(data[, n, skip, only])
    Remove binary and categorical variables which have fewer than <n> occurrences of each unique value.
make_binary(data[, skip, only])
    Set variable types as Binary.
make_categorical(data[, skip, only])
    Set variable types as Categorical.
make_continuous(data[, skip, only])
    Set variable types as Numeric.
merge_observations(top, bottom)
    Merge two datasets, keeping only the columns present in both.
merge_variables(left, right[, how])
    Merge a list of dataframes with different variables side-by-side.
move_variables(left, right[, skip, only])
    Move one or more variables from one DataFrame to another.
recode_values(data, replacement_dict[, …])
    Convert values in a dataframe.
remove_outliers(data[, method, cutoff, …])
    Remove outliers from continuous variables by replacing them with np.nan.
rowfilter_incomplete_obs(data[, skip, only])
    Remove rows containing null values.
transform(data, transform_method[, skip, only])
    Apply a transformation function to a variable.
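One plausible reading of how categorize uses cat_min, cat_max, and cont_min is a rule based on the number of unique values in each variable. The sketch below illustrates that reading in plain Python. The exact rule and the default thresholds (3, 6, and 15) are assumptions, not values confirmed by the listing above, and the example variables are hypothetical.

```python
def categorize(values, cat_min=3, cat_max=6, cont_min=15):
    """Classify one variable by its number of unique non-missing values.

    Assumed rule: <=1 unique value is constant, 2 is binary,
    cat_min..cat_max is categorical, >= cont_min is continuous,
    and anything in between is 'unknown'.
    """
    n_unique = len({value for value in values if value is not None})
    if n_unique <= 1:
        return "constant"
    if n_unique == 2:
        return "binary"
    if cat_min <= n_unique <= cat_max:
        return "categorical"
    if n_unique >= cont_min:
        return "continuous"
    return "unknown"


# Hypothetical variables, with None standing in for NA
variables = {
    "site": [1, 1, 1],
    "smoker": [0, 1, 0, None],
    "education": [1, 2, 3, 4],
    "pain_score": list(range(10)),
    "age": list(range(20)),
}
types = {name: categorize(values) for name, values in variables.items()}
for name, kind in types.items():
    print(f"{name}: {kind}")
```

Variables that land in the ‘unknown’ gap are the ones that would need a manual decision, which is what make_binary, make_categorical, and make_continuous are for.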
Plot
Functions that generate plots.
histogram(data, column[, figsize, title, figure])
    Plot a histogram of the values in the given column.
distributions(data, filename[, …])
    Create a PDF containing histograms for each binary or categorical variable, and one of several types of plots for each continuous variable.
manhattan(dfs[, categories, bonferroni, …])
    Create a Manhattan-like plot for a list of EWAS results.
manhattan_fdr(dfs[, categories, cutoff, …])
    Create a Manhattan-like plot for a list of EWAS results using FDR significance.
manhattan_bonferroni(dfs[, categories, …])
    Create a Manhattan-like plot for a list of EWAS results using Bonferroni significance.
top_results(ewas_result[, pvalue_name, …])
    Create a dot plot for EWAS results showing p-values and beta coefficients.
Survey
Complex survey design.
SurveyDesignSpec(survey_df, strata, cluster, …)
    Holds parameters for building a statsmodels SurveyDesign object.