CLI Reference¶

Once CLARITE is installed, the command line interface can be run using the clarite-cli command.

The command line interface has command groups that are the same as the modules in the package (except for survey).

The --help option will show documentation when run with any command or command group:

$ clarite-cli --help
Usage: clarite-cli [OPTIONS] COMMAND [ARGS]...

Options:
  --help  Show this message and exit.

Commands:
  analyze
  describe
  load
  modify
  plot

--skip and --only¶

Many commands in the CLI accept the --skip and --only options, which limit the command to specific variables. If --skip is specified, all variables except the specified ones are processed. If --only is specified, only the specified variables are processed.

Only one of the two options may be used in a single command. Variables may be passed in any combination of two ways:

As the name of a file containing one variable name per line
As the variable name specified directly in the terminal

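For example, passing one variable name directly and the file covars.txt (which lists eight variable names) to --only when running rowfilter-incomplete-obs results in: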

-------------------------------------------------------------------------------------------------------------------------
--only: 1 variable(s) specified directly
        8 variable(s) loaded from 'covars.txt'
=========================================================================================================================
Running rowfilter_incomplete_obs
-------------------------------------------------------------------------------------------------------------------------
Removed 3,687 of 22,624 observations (16.30%) due to NA values in any of 9 variables
=========================================================================================================================
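The way --skip and --only values are interpreted can be sketched in Python. This is a minimal illustration, not CLARITE's own code: the function names `resolve_variables` and `select_columns` are hypothetical, and the rule "a value that names an existing file is read as a list of names, anything else is a variable name" is an assumption based on the description above.

```python
import os


def resolve_variables(values):
    """Split option values into names given directly and names read from files.

    Assumption: a value that is the path of an existing file holds one
    variable name per line; any other value is itself a variable name.
    """
    direct, from_files = [], {}
    for value in values:
        if os.path.isfile(value):
            with open(value) as handle:
                from_files[value] = [line.strip() for line in handle if line.strip()]
        else:
            direct.append(value)
    return direct, from_files


def select_columns(columns, skip=None, only=None):
    """Return the columns a command would process under --skip or --only."""
    if skip and only:
        raise ValueError("--skip and --only may not be used together")
    if only:
        wanted = set(only)
        return [c for c in columns if c in wanted]
    if skip:
        unwanted = set(skip)
        return [c for c in columns if c not in unwanted]
    return list(columns)
```

Calling `select_columns(["a", "b", "c"], skip=["b"])` keeps `a` and `c`, while `only=["b"]` keeps just `b`.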

Commands¶

clarite-cli analyze¶

clarite-cli analyze [OPTIONS] COMMAND [ARGS]...

add-corrected-pvals¶

Get FDR-corrected and Bonferroni-corrected pvalues

clarite-cli analyze add-corrected-pvals [OPTIONS] EWAS_RESULT OUTPUT

Arguments

EWAS_RESULT¶: Required argument

OUTPUT¶: Required argument
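As a rough illustration of what the two corrections compute, the sketch below implements them in plain Python. It assumes that "FDR-corrected" refers to the Benjamini-Hochberg procedure, which this page does not state; the function names are hypothetical and this is not the package's implementation.

```python
def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capping at 1."""
    n = len(pvalues)
    return [min(p * n, 1.0) for p in pvalues]


def fdr_bh(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so that each adjusted value is never
    # larger than the adjusted value of any p-value ranked above it.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted
```

With four tests, a raw p-value of 0.005 becomes 0.02 under both corrections, while a raw p-value of 0.03 becomes 0.12 under Bonferroni but only about 0.04 under Benjamini-Hochberg, which is why the FDR correction is the less conservative of the two.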

ewas¶

Run an EWAS analysis

clarite-cli analyze ewas [OPTIONS] OUTCOME DATA OUTPUT

Options

-c, --covariate <covariate>¶: Covariates

--covariance-calc <covariance_calc>¶

Covariance calculation method

Options: stata | jackknife

--min-n <min_n>¶: Minimum number of complete cases needed to run a regression

--survey-data <survey_data>¶: Tab-separated data file with survey weights, strata IDs, and/or cluster IDs. Must have an ‘ID’ column.

--strata <strata>¶: Name of the strata column in the survey data

--cluster <cluster>¶: Name of the cluster column in the survey data

--nested, --not-nested¶: Whether survey data is nested or not

--weights-file <weights_file>¶: Tab-delimited data file with ‘Variable’ and ‘Weight’ columns to match weights from the survey data to specific variables

-w, --weight <weight>¶: Name of a survey weight column found in the survey data. This option can’t be used with --weights-file

--fpc <fpc>¶: Name of the finite population correction column in the survey data

--single-cluster <single_cluster>¶

How to handle singular clusters

Options: fail | adjust | average | certainty

Arguments

OUTCOME¶: Required argument

DATA¶: Required argument

OUTPUT¶: Required argument
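The file expected by --weights-file is described above as tab-delimited with ‘Variable’ and ‘Weight’ columns. The sketch below shows one way such a table could be written and read back with Python's csv module. The variable names and weight column names are placeholders invented for the example, and the helper functions are hypothetical.

```python
import csv
import io


def write_weights_table(mapping, handle):
    """Write a tab-delimited table with 'Variable' and 'Weight' columns."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["Variable", "Weight"])
    for variable, weight in mapping.items():
        writer.writerow([variable, weight])


def read_weights_table(handle):
    """Read the table back into a {variable: weight column} dictionary."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["Variable"]: row["Weight"] for row in reader}


# Placeholder names: each variable is matched to a weight column that would
# have to exist in the survey data file.
example = {"var_a": "weight_exam", "var_b": "weight_lab"}
buffer = io.StringIO()
write_weights_table(example, buffer)
buffer.seek(0)
round_trip = read_weights_table(buffer)
```

Writing to an `io.StringIO` buffer keeps the example free of side effects; passing an open file handle instead would produce a file on disk.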

ewas-r¶

Run an EWAS analysis using R

clarite-cli analyze ewas-r [OPTIONS] OUTCOME DATA OUTPUT

Options

-c, --covariate <covariate>¶: Covariates

--covariance-calc <covariance_calc>¶

Covariance calculation method

Options: stata | jackknife

--min-n <min_n>¶: Minimum number of complete cases needed to run a regression

--survey-data <survey_data>¶: Tab-separated data file with survey weights, strata IDs, and/or cluster IDs. Must have an ‘ID’ column.

--strata <strata>¶: Name of the strata column in the survey data

--cluster <cluster>¶: Name of the cluster column in the survey data

--nested, --not-nested¶: Whether survey data is nested or not

--weights-file <weights_file>¶: Tab-delimited data file with ‘Variable’ and ‘Weight’ columns to match weights from the survey data to specific variables

-w, --weight <weight>¶: Name of a survey weight column found in the survey data. This option can’t be used with --weights-file

--fpc <fpc>¶: Name of the finite population correction column in the survey data

--single-cluster <single_cluster>¶

How to handle singular clusters

Options: fail | adjust | average | certainty

Arguments

OUTCOME¶: Required argument

DATA¶: Required argument

OUTPUT¶: Required argument

get-significant¶

Filter out non-significant results

clarite-cli analyze get-significant [OPTIONS] EWAS_RESULT OUTPUT

Options

--fdr, --bonferroni¶: Use FDR (--fdr) or Bonferroni pvalues (--bonferroni). FDR by default.

-p, --pvalue <pvalue>¶: Keep results with a pvalue <= this value (0.05 by default)

Arguments

EWAS_RESULT¶: Required argument

OUTPUT¶: Required argument
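The filtering rule can be sketched as a simple inclusive comparison against the chosen corrected p-value. The column names `pvalue_fdr` and `pvalue_bonferroni` below are assumptions made for the example and may not match the columns in an actual EWAS result file.

```python
def get_significant(results, use_fdr=True, pvalue=0.05):
    """Keep rows whose corrected p-value is <= the cutoff.

    `results` is a list of dicts; the two column names are hypothetical.
    """
    column = "pvalue_fdr" if use_fdr else "pvalue_bonferroni"
    return [row for row in results if row[column] <= pvalue]


rows = [
    {"Variable": "x1", "pvalue_fdr": 0.01, "pvalue_bonferroni": 0.04},
    {"Variable": "x2", "pvalue_fdr": 0.05, "pvalue_bonferroni": 0.20},
    {"Variable": "x3", "pvalue_fdr": 0.30, "pvalue_bonferroni": 1.00},
]
kept_fdr = [row["Variable"] for row in get_significant(rows)]
kept_bonferroni = [row["Variable"] for row in get_significant(rows, use_fdr=False)]
```

Because the comparison is inclusive, a result whose corrected p-value is exactly equal to the cutoff is kept.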

clarite-cli describe¶

clarite-cli describe [OPTIONS] COMMAND [ARGS]...

correlations¶

Report top correlations between variables

clarite-cli describe correlations [OPTIONS] DATA OUTPUT

Options

-t, --threshold <threshold>¶: Report correlations with R >= this value

Arguments

DATA¶: Required argument

OUTPUT¶: Required argument

freq-table¶

Report the number of occurrences of each value for each variable

clarite-cli describe freq-table [OPTIONS] DATA OUTPUT

Arguments

DATA¶: Required argument

OUTPUT¶: Required argument

get-types¶

Get the type of each variable

clarite-cli describe get-types [OPTIONS] DATA OUTPUT

Arguments

DATA¶: Required argument

OUTPUT¶: Required argument

percent-na¶

Report the percent of observations that are NA for each variable

clarite-cli describe percent-na [OPTIONS] DATA OUTPUT

Arguments

DATA¶: Required argument

OUTPUT¶: Required argument

skewness¶

Report and test the skewness for each continuous variable

clarite-cli describe skewness [OPTIONS] DATA OUTPUT

Options

--dropna, --keepna¶: Omit NA values before calculating skew

Arguments

DATA¶: Required argument

OUTPUT¶: Required argument

clarite-cli load¶

clarite-cli load [OPTIONS] COMMAND [ARGS]...

from-csv¶

Load data from a comma-separated file and save it in the standard format

clarite-cli load from-csv [OPTIONS] INPUT OUTPUT

Options

-i, --index <index>¶: Name of the column to use as the index. Default is the first column.

-s, --skip <skip>¶: variables to skip. Either individual names, or a file containing one name per line.

-o, --only <only>¶: variables to process, skipping all others. Either individual names, or a file containing one name per line.

Arguments

INPUT¶: Required argument

OUTPUT¶: Required argument

from-tsv¶

Load data from a tab-separated file and save it in the standard format

clarite-cli load from-tsv [OPTIONS] INPUT OUTPUT

Options

-i, --index <index>¶: Name of the column to use as the index. Default is the first column.

-s, --skip <skip>¶: variables to skip. Either individual names, or a file containing one name per line.

-o, --only <only>¶: variables to process, skipping all others. Either individual names, or a file containing one name per line.

Arguments

INPUT¶: Required argument

OUTPUT¶: Required argument

clarite-cli modify¶

clarite-cli modify [OPTIONS] COMMAND [ARGS]...

categorize¶

Categorize data based on the number of unique values

clarite-cli modify categorize [OPTIONS] DATA OUTPUT

Options

--cat_min <cat_min>¶: Minimum number of unique values in a variable to make it a categorical type

--cat_max <cat_max>¶: Maximum number of unique values in a variable to make it a categorical type

--cont_min <cont_min>¶: Minimum number of unique values in a variable to make it a continuous type

Arguments

DATA¶: Required argument

OUTPUT¶: Required argument
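One way to picture how the three thresholds interact is the sketch below. The default values (3, 6, 15), the handling of variables with one or two unique values, and the "unknown" label for counts that fall between the ranges are all assumptions made for illustration; this page does not state them.

```python
def categorize(values, cat_min=3, cat_max=6, cont_min=15):
    """Assign a type label from the number of unique non-missing values.

    The defaults and the 'constant', 'binary' and 'unknown' labels are
    assumptions for this sketch.
    """
    n_unique = len({v for v in values if v is not None})
    if n_unique <= 1:
        return "constant"
    if n_unique == 2:
        return "binary"
    if cat_min <= n_unique <= cat_max:
        return "categorical"
    if n_unique >= cont_min:
        return "continuous"
    # More unique values than cat_max but fewer than cont_min.
    return "unknown"
```

Under these assumed defaults a variable with 10 unique values is neither clearly categorical nor clearly continuous, so it would be left for manual review with a command such as make-categorical or make-continuous.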

colfilter¶

Remove some variables from a dataset

clarite-cli modify colfilter [OPTIONS] DATA OUTPUT

Options

-s, --skip <skip>¶: variables to skip. Either individual names, or a file containing one name per line.

-o, --only <only>¶: variables to process, skipping all others. Either individual names, or a file containing one name per line.

Arguments

DATA¶: Required argument

OUTPUT¶: Required argument

colfilter-min-cat-n¶

Filter variables based on a minimum number of non-NA observations per category

clarite-cli modify colfilter-min-cat-n [OPTIONS] DATA OUTPUT

Options

-n <n>¶: Remove variables with fewer than this many non-NA observations in each category

-s, --skip <skip>¶: variables to skip. Either individual names, or a file containing one name per line.

-o, --only <only>¶: variables to process, skipping all others. Either individual names, or a file containing one name per line.

Arguments

DATA¶: Required argument

OUTPUT¶: Required argument

colfilter-min-n¶

Filter variables based on a minimum number of non-NA observations

clarite-cli modify colfilter-min-n [OPTIONS] DATA OUTPUT

Options

-n <n>¶: Remove variables with fewer than this many non-NA observations

-s, --skip <skip>¶: variables to skip. Either individual names, or a file containing one name per line.

-o, --only <only>¶: variables to process, skipping all others. Either individual names, or a file containing one name per line.

Arguments

DATA¶: Required argument

OUTPUT¶: Required argument

colfilter-percent-zero¶

Filter variables based on the fraction of observations with a value of zero

clarite-cli modify colfilter-percent-zero [OPTIONS] DATA OUTPUT

Options

-p, --filter-percent <filter_percent>¶: Remove variables when the percentage of observations equal to 0 is >= this value (0 to 100)

-s, --skip <skip>¶: variables to skip. Either individual names, or a file containing one name per line.

-o, --only <only>¶: variables to process, skipping all others. Either individual names, or a file containing one name per line.

Arguments

DATA¶: Required argument

OUTPUT¶: Required argument
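The rule behind --filter-percent can be sketched as follows. The helper names are hypothetical, the default of 90 is an assumption, and so is the choice to leave missing values out of the denominator.

```python
def percent_zero(values):
    """Percentage (0 to 100) of non-missing observations equal to zero."""
    observed = [v for v in values if v is not None]
    if not observed:
        return 0.0
    return 100.0 * sum(1 for v in observed if v == 0) / len(observed)


def colfilter_percent_zero(data, filter_percent=90.0):
    """Drop variables whose percentage of zeros is >= filter_percent.

    `data` maps a variable name to its list of values.
    """
    return {
        name: values
        for name, values in data.items()
        if percent_zero(values) < filter_percent
    }


data = {
    "mostly_zero": [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],  # 90% zeros
    "varied": [0, 1, 2, 3],                          # 25% zeros
}
kept = colfilter_percent_zero(data, filter_percent=90.0)
```

Since the comparison is `>=`, a variable that is exactly 90% zeros is removed when the threshold is 90.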

drop-extra-categories¶

Remove extra categories from categorical datatypes

clarite-cli modify drop-extra-categories [OPTIONS] DATA OUTPUT

Options

-s, --skip <skip>¶: variables to skip. Either individual names, or a file containing one name per line.

-o, --only <only>¶: variables to process, skipping all others. Either individual names, or a file containing one name per line.

Arguments

DATA¶: Required argument

OUTPUT¶: Required argument

make-binary¶

Set the type of variables to ‘binary’

clarite-cli modify make-binary [OPTIONS] DATA OUTPUT

Options

-s, --skip <skip>¶: variables to skip. Either individual names, or a file containing one name per line.

-o, --only <only>¶: variables to process, skipping all others. Either individual names, or a file containing one name per line.

Arguments

DATA¶: Required argument

OUTPUT¶: Required argument

make-categorical¶

Set the type of variables to ‘categorical’

clarite-cli modify make-categorical [OPTIONS] DATA OUTPUT

Options

-s, --skip <skip>¶: variables to skip. Either individual names, or a file containing one name per line.

-o, --only <only>¶: variables to process, skipping all others. Either individual names, or a file containing one name per line.

Arguments

DATA¶: Required argument

OUTPUT¶: Required argument

make-continuous¶

Set the type of variables to ‘continuous’

clarite-cli modify make-continuous [OPTIONS] DATA OUTPUT

Options

-s, --skip <skip>¶: variables to skip. Either individual names, or a file containing one name per line.

-o, --only <only>¶: variables to process, skipping all others. Either individual names, or a file containing one name per line.

Arguments

DATA¶: Required argument

OUTPUT¶: Required argument

merge-observations¶

Merge observations from two different datasets into one

clarite-cli modify merge-observations [OPTIONS] TOP BOTTOM OUTPUT

Arguments

TOP¶: Required argument

BOTTOM¶: Required argument

OUTPUT¶: Required argument

merge-variables¶

Merge variables from two different datasets into one

clarite-cli modify merge-variables [OPTIONS] LEFT RIGHT OUTPUT

Options

-h, --how <how>¶

Type of Merge

Options: left | right | inner | outer

Arguments

LEFT¶: Required argument

RIGHT¶: Required argument

OUTPUT¶: Required argument
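The four merge types differ only in which observations survive the merge. The sketch below illustrates that with plain dictionaries keyed by observation ID. It assumes the merge is performed on the index and that the option names carry their usual join meanings; `merge_variables` here is a hypothetical stand-in, not the package function.

```python
def merge_variables(left, right, how):
    """Merge two datasets keyed by observation ID.

    Each dataset maps an ID to a dict of {variable: value}.
    """
    if how == "left":
        ids = list(left)
    elif how == "right":
        ids = list(right)
    elif how == "inner":
        ids = [i for i in left if i in right]
    elif how == "outer":
        ids = list(left) + [i for i in right if i not in left]
    else:
        raise ValueError(f"unknown merge type: {how!r}")

    # Collect variable names in first-seen order so every row has every column.
    columns = []
    for dataset in (left, right):
        for row in dataset.values():
            for name in row:
                if name not in columns:
                    columns.append(name)

    merged = {}
    for i in ids:
        row = dict.fromkeys(columns)  # missing values start as None
        row.update(left.get(i, {}))
        row.update(right.get(i, {}))
        merged[i] = row
    return merged


left = {1: {"a": 10}, 2: {"a": 20}}
right = {2: {"b": 0.5}, 3: {"b": 0.7}}
```

With these inputs, `inner` keeps only observation 2, `left` keeps 1 and 2, `right` keeps 2 and 3, and `outer` keeps all three, filling variables that are absent for an observation with `None`.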

move-variables¶

Move variables from one dataset to another

clarite-cli modify move-variables [OPTIONS] LEFT RIGHT

Options

--output_left <output_left>¶

--output_right <output_right>¶

-s, --skip <skip>¶: variables to skip. Either individual names, or a file containing one name per line.

-o, --only <only>¶: variables to process, skipping all others. Either individual names, or a file containing one name per line.

Arguments

LEFT¶: Required argument

RIGHT¶: Required argument

recode-values¶

Replace values in the data with other values. The value being replaced (‘current’) and the new value (‘replacement’) are each specified with an option that names their type, and at most one option may be given for each. If no option is given, the value being replaced or inserted is None.

clarite-cli modify recode-values [OPTIONS] DATA OUTPUT

Options

--current-str <cs>¶: Replace occurrences of this string value

--current-int <ci>¶: Replace occurrences of this integer value

--current-float <cf>¶: Replace occurrences of this float value

--replacement-str <rs>¶: Insert this string value

--replacement-int <ri>¶: Insert this integer value

--replacement-float <rf>¶: Insert this float value

-s, --skip <skip>¶: variables to skip. Either individual names, or a file containing one name per line.

-o, --only <only>¶: variables to process, skipping all others. Either individual names, or a file containing one name per line.

Arguments

DATA¶: Required argument

OUTPUT¶: Required argument
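The "at most one typed option per side, otherwise None" rule can be sketched as below. Both helpers are hypothetical and only mirror the behaviour described above; they are not taken from the package.

```python
def pick_one(str_value=None, int_value=None, float_value=None):
    """Return the single typed value that was given, or None if none was."""
    given = [v for v in (str_value, int_value, float_value) if v is not None]
    if len(given) > 1:
        raise ValueError("specify at most one of the str, int and float options")
    return given[0] if given else None


def recode(values, current, replacement):
    """Replace every occurrence of `current` with `replacement`."""
    return [replacement if v == current else v for v in values]


# Equivalent in spirit to giving only an integer 'current' value: the
# replacement is left unspecified, so it is None.
current = pick_one(int_value=999)
replacement = pick_one()
recoded = recode([1, 999, 3], current, replacement)
```

Note that this sketch compares with Python's `==`, so an integer 1 also matches a float 1.0; a real implementation may be stricter about types.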

remove-outliers¶

Replace outlier values with NaN. Outliers are defined using a gaussian or IQR approach.

clarite-cli modify remove-outliers [OPTIONS] DATA OUTPUT

Options

-m, --method <method>¶

Options: gaussian | iqr

-c, --cutoff <cutoff>¶

-s, --skip <skip>¶: variables to skip. Either individual names, or a file containing one name per line.

-o, --only <only>¶: variables to process, skipping all others. Either individual names, or a file containing one name per line.

Arguments

DATA¶: Required argument

OUTPUT¶: Required argument
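The two outlier definitions can be sketched with Python's statistics module. The exact rules used here are assumptions: the gaussian method is taken to mean "more than cutoff standard deviations from the mean" and the IQR method "more than cutoff times the interquartile range beyond the first or third quartile". The default values in the signature are also illustrative only.

```python
import math
import statistics


def remove_outliers(values, method="gaussian", cutoff=3.0):
    """Return a copy of `values` with outliers replaced by NaN."""
    observed = [v for v in values if not math.isnan(v)]
    if method == "gaussian":
        center = statistics.mean(observed)
        spread = statistics.stdev(observed)  # sample standard deviation
        low, high = center - cutoff * spread, center + cutoff * spread
    elif method == "iqr":
        q1, _, q3 = statistics.quantiles(observed, n=4)
        iqr = q3 - q1
        low, high = q1 - cutoff * iqr, q3 + cutoff * iqr
    else:
        raise ValueError(f"unknown method: {method!r}")
    # Existing NaN values are passed through unchanged.
    return [v if math.isnan(v) or low <= v <= high else math.nan for v in values]


data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 100.0]
by_iqr = remove_outliers(data, method="iqr", cutoff=1.5)
by_gaussian = remove_outliers(data, method="gaussian", cutoff=2.0)
```

In this example the value 100 is flagged by both methods. With a gaussian cutoff of 3 it would be kept, because a single extreme value inflates the standard deviation, which is a common reason to prefer the IQR approach for skewed data.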

rowfilter¶

Select some rows from a dataset using a simple comparison, keeping rows where the comparison is True.

clarite-cli modify rowfilter [OPTIONS] DATA OUTPUT COLUMN

Options

--value-str <vs>¶: Compare values in the column to this string

--value-int <vi>¶: Compare values in the column to this integer

--value-float <vf>¶: Compare values in the column to this floating point number

-c, --comparison <comparison>¶

Keep rows where the value of the column is lt (<), lte (<=), eq (==), gte (>=), or gt (>) the specified value. Eq by default.

Options: lt | lte | eq | gte | gt

Arguments

DATA¶: Required argument

OUTPUT¶: Required argument

COLUMN¶: Required argument
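The five comparison names map naturally onto Python's operator module, as in the sketch below. The `rowfilter` function here is a hypothetical stand-in that works on a list of dicts rather than on a CLARITE data file.

```python
import operator

# Comparison names from the --comparison option mapped to Python operators.
COMPARISONS = {
    "lt": operator.lt,
    "lte": operator.le,
    "eq": operator.eq,
    "gte": operator.ge,
    "gt": operator.gt,
}


def rowfilter(rows, column, value, comparison="eq"):
    """Keep rows where `row[column] <comparison> value` is True."""
    compare = COMPARISONS[comparison]
    return [row for row in rows if compare(row[column], value)]


rows = [{"age": 17}, {"age": 18}, {"age": 40}]
adults = rowfilter(rows, "age", 18, comparison="gte")
exactly_18 = rowfilter(rows, "age", 18)
```

The column value is always the left-hand side of the comparison, so `gte` with a value of 18 keeps rows where the column is 18 or more.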

rowfilter-incomplete-obs¶

Filter out observations that are not complete cases (a complete case contains no NA values)

clarite-cli modify rowfilter-incomplete-obs [OPTIONS] DATA OUTPUT

Options

-s, --skip <skip>¶: variables to skip. Either individual names, or a file containing one name per line.

-o, --only <only>¶: variables to process, skipping all others. Either individual names, or a file containing one name per line.

Arguments

DATA¶: Required argument

OUTPUT¶: Required argument
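A complete-case filter restricted to a subset of variables can be sketched as follows. The function is a hypothetical illustration that represents NA as `None`; it is not the package's implementation.

```python
def rowfilter_incomplete_obs(rows, only=None):
    """Keep rows that have no missing value in the checked variables.

    `rows` is a list of dicts. If `only` is given, just those variables are
    checked; otherwise every variable in the row is checked.
    """
    kept = []
    for row in rows:
        checked = only if only is not None else list(row)
        if all(row[name] is not None for name in checked):
            kept.append(row)
    return kept


rows = [
    {"bmi": 22.5, "age": 30, "income": None},
    {"bmi": None, "age": 41, "income": 50000},
    {"bmi": 27.1, "age": 55, "income": 42000},
]
complete_everywhere = rowfilter_incomplete_obs(rows)
complete_in_covariates = rowfilter_incomplete_obs(rows, only=["bmi", "age"])
removed = len(rows) - len(complete_in_covariates)
```

Restricting the check with `only` retains the first row even though its `income` is missing, because only `bmi` and `age` are required to be present.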

transform-variable¶

Apply a function to each value of a variable

clarite-cli modify transform-variable [OPTIONS] DATA OUTPUT TRANSFORM_METHOD

Options

-s, --skip <skip>¶: variables to skip. Either individual names, or a file containing one name per line.

-o, --only <only>¶: variables to process, skipping all others. Either individual names, or a file containing one name per line.

Arguments

DATA¶: Required argument

OUTPUT¶: Required argument

TRANSFORM_METHOD¶: Required argument

clarite-cli plot¶

clarite-cli plot [OPTIONS] COMMAND [ARGS]...

distributions¶

Generate a pdf containing distribution plots for each variable

clarite-cli plot distributions [OPTIONS] DATA OUTPUT

Options

-k, --kind <kind>¶

Kind of plot used for continuous data. Non-continuous always shows a count plot.

Options: count | box | violin | qq

--nrows <nrows>¶: Number of rows per page

--ncols <ncols>¶: Number of columns per page

-q, --quality <quality>¶

Quality of the generated plots: low (150 dpi), medium (300 dpi), or high (1200 dpi).

Options: low | medium | high

--sort, --no-sort¶: Sort variables alphabetically

Arguments

DATA¶: Required argument

OUTPUT¶: Required argument

histogram¶

Create a histogram plot of a variable

clarite-cli plot histogram [OPTIONS] DATA OUTPUT VARIABLE

Arguments

DATA¶: Required argument

OUTPUT¶: Required argument

VARIABLE¶: Required argument

manhattan¶

Generate a manhattan plot of EWAS results

clarite-cli plot manhattan [OPTIONS] EWAS_RESULT OUTPUT

Options

-c, --categories <categories>¶: tab-separated file with two columns: ‘Variable’ and ‘category’

--bonferroni <bonferroni>¶: cutoff value to plot bonferroni-adjusted pvalue line

--fdr <fdr>¶: cutoff value to plot fdr-adjusted pvalue line

-o, --other <other>¶: other datasets to include in the plot

--nlabeled <nlabeled>¶: label top n points

--label <label>¶: label points by name

Arguments

EWAS_RESULT¶: Required argument

OUTPUT¶: Required argument

manhattan-bonferroni¶

Generate a manhattan plot of EWAS results showing Bonferroni-corrected pvalues

clarite-cli plot manhattan-bonferroni [OPTIONS] EWAS_RESULT OUTPUT

Options

-c, --categories <categories>¶: tab-separated file with two columns: ‘Variable’ and ‘category’

--cutoff <cutoff>¶: cutoff value for plotting the significance line

--fdr <fdr>¶: cutoff value to plot Bonferroni-adjusted pvalue line

-o, --other <other>¶: other datasets to include in the plot

--nlabeled <nlabeled>¶: label top n points

--label <label>¶: label points by name

Arguments

EWAS_RESULT¶: Required argument

OUTPUT¶: Required argument

manhattan-fdr¶

Generate a manhattan plot of EWAS results showing FDR-corrected pvalues

clarite-cli plot manhattan-fdr [OPTIONS] EWAS_RESULT OUTPUT

Options

-c, --categories <categories>¶: tab-separated file with two columns: ‘Variable’ and ‘category’

--cutoff <cutoff>¶: cutoff value for plotting the significance line

--fdr <fdr>¶: cutoff value to plot fdr-adjusted pvalue line

-o, --other <other>¶: other datasets to include in the plot

--nlabeled <nlabeled>¶: label top n points

--label <label>¶: label points by name

Arguments

EWAS_RESULT¶: Required argument

OUTPUT¶: Required argument