CLI Reference

Once CLARITE is installed, the command line interface can be run using the clarite-cli command.

The command line interface has command groups that are the same as the modules in the package (except for survey).

The --help option will show documentation when run with any command or command group:

$ clarite-cli --help
Usage: clarite-cli [OPTIONS] COMMAND [ARGS]...

Options:
--help  Show this message and exit.

Commands:
  analyze
  describe
  load
  modify
  plot

--skip and --only

Many commands in the CLI accept the --skip and --only options, which limit the command to specific variables. If --skip is specified, all variables except the specified ones are processed. If --only is specified, only the specified variables are processed.

Only one of the two options may be used in a single command. The chosen option may be repeated, and its values may be passed in any combination of two ways:

  1. As the name of a file containing one variable name per line

  2. As a variable name specified directly in the terminal

For example:

clarite-cli modify rowfilter-incomplete-obs 1_nhanes_w_sddsrvyr test -o covars.txt -o BMXBMI

results in:

-------------------------------------------------------------------------------------------------------------------------
--only: 1 variable(s) specified directly
        8 variable(s) loaded from 'covars.txt'
=========================================================================================================================
Running rowfilter_incomplete_obs
-------------------------------------------------------------------------------------------------------------------------
Removed 3,687 of 22,624 observations (16.30%) due to NA values in any of 9 variables
=========================================================================================================================
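The way --skip and --only values combine can be sketched in Python. This is a minimal illustration, not CLARITE's actual implementation: the helper names resolve_names and select_variables are hypothetical, and the sketch assumes a value is treated as a file whenever a file with that name exists.

```python
import os


def resolve_names(values):
    """Expand each value into variable names.

    Assumption: a value naming an existing file is read as one variable
    name per line; anything else is taken as a variable name directly.
    """
    names = []
    for value in values:
        if os.path.isfile(value):
            with open(value) as handle:
                names.extend(line.strip() for line in handle if line.strip())
        else:
            names.append(value)
    return names


def select_variables(all_variables, skip=(), only=()):
    """Return the variables a command would process, in their original order."""
    if skip and only:
        raise ValueError("--skip and --only cannot be used together")
    if only:
        wanted = set(resolve_names(only))
        return [v for v in all_variables if v in wanted]
    unwanted = set(resolve_names(skip))
    return [v for v in all_variables if v not in unwanted]
```

With the example above, the eight names in covars.txt plus BMXBMI would give the nine variables reported in the output.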

Commands

clarite-cli analyze

clarite-cli analyze [OPTIONS] COMMAND [ARGS]...

add-corrected-pvals

Get FDR-corrected and Bonferroni-corrected pvalues

clarite-cli analyze add-corrected-pvals [OPTIONS] EWAS_RESULT OUTPUT

Arguments

EWAS_RESULT

Required argument

OUTPUT

Required argument
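For readers unfamiliar with the two corrections, the following is a minimal sketch of how they are commonly computed. It assumes the FDR correction is the Benjamini-Hochberg procedure, which this reference does not state; the function names are illustrative and are not part of CLARITE.

```python
def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capping at 1."""
    n = len(pvalues)
    return [min(p * n, 1.0) for p in pvalues]


def fdr_bh(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so that each adjusted value
    # never exceeds the adjusted value of the next larger p-value.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted
```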

ewas

Run an EWAS analysis

clarite-cli analyze ewas [OPTIONS] OUTCOME DATA OUTPUT

Options

-c, --covariate <covariate>

Covariates

--covariance-calc <covariance_calc>

Covariance calculation method

Options

stata | jackknife

--min-n <min_n>

Minimum number of complete cases needed to run a regression

--survey-data <survey_data>

Tab-separated data file with survey weights, strata IDs, and/or cluster IDs. Must have an ‘ID’ column.

--strata <strata>

Name of the strata column in the survey data

--cluster <cluster>

Name of the cluster column in the survey data

--nested, --not-nested

Whether survey data is nested or not

--weights-file <weights_file>

Tab-delimited data file with ‘Variable’ and ‘Weight’ columns to match weights from the survey data to specific variables

-w, --weight <weight>

Name of a survey weight column found in the survey data. This option can’t be used with --weights-file

--fpc <fpc>

Name of the finite population correction column in the survey data

--single-cluster <single_cluster>

How to handle singular clusters

Options

fail | adjust | average | certainty

Arguments

OUTCOME

Required argument

DATA

Required argument

OUTPUT

Required argument

ewas-r

Run an EWAS analysis using R

clarite-cli analyze ewas-r [OPTIONS] OUTCOME DATA OUTPUT

Options

-c, --covariate <covariate>

Covariates

--covariance-calc <covariance_calc>

Covariance calculation method

Options

stata | jackknife

--min-n <min_n>

Minimum number of complete cases needed to run a regression

--survey-data <survey_data>

Tab-separated data file with survey weights, strata IDs, and/or cluster IDs. Must have an ‘ID’ column.

--strata <strata>

Name of the strata column in the survey data

--cluster <cluster>

Name of the cluster column in the survey data

--nested, --not-nested

Whether survey data is nested or not

--weights-file <weights_file>

Tab-delimited data file with ‘Variable’ and ‘Weight’ columns to match weights from the survey data to specific variables

-w, --weight <weight>

Name of a survey weight column found in the survey data. This option can’t be used with --weights-file

--fpc <fpc>

Name of the finite population correction column in the survey data

--single-cluster <single_cluster>

How to handle singular clusters

Options

fail | adjust | average | certainty

Arguments

OUTCOME

Required argument

DATA

Required argument

OUTPUT

Required argument

get-significant

Filter out non-significant results

clarite-cli analyze get-significant [OPTIONS] EWAS_RESULT OUTPUT

Options

--fdr, --bonferroni

Use FDR (--fdr) or Bonferroni pvalues (--bonferroni). FDR by default.

-p, --pvalue <pvalue>

Keep results with a pvalue <= this value (0.05 by default)

Arguments

EWAS_RESULT

Required argument

OUTPUT

Required argument
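The filtering rule described by the options above can be sketched as follows. The column names pvalue_fdr and pvalue_bonferroni are assumptions made for illustration, as is the representation of results as a list of dictionaries; the defaults mirror the ones stated above (FDR, 0.05).

```python
def get_significant(results, use_fdr=True, pvalue=0.05):
    """Keep rows whose corrected p-value is <= the cutoff.

    Assumption: each row is a dict holding hypothetical 'pvalue_fdr'
    and 'pvalue_bonferroni' keys.
    """
    column = "pvalue_fdr" if use_fdr else "pvalue_bonferroni"
    return [row for row in results if row[column] <= pvalue]
```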

clarite-cli describe

clarite-cli describe [OPTIONS] COMMAND [ARGS]...

correlations

Report top correlations between variables

clarite-cli describe correlations [OPTIONS] DATA OUTPUT

Options

-t, --threshold <threshold>

Report correlations with R >= this value

Arguments

DATA

Required argument

OUTPUT

Required argument

freq-table

Report the number of occurrences of each value for each variable

clarite-cli describe freq-table [OPTIONS] DATA OUTPUT

Arguments

DATA

Required argument

OUTPUT

Required argument

get-types

Get the type of each variable

clarite-cli describe get-types [OPTIONS] DATA OUTPUT

Arguments

DATA

Required argument

OUTPUT

Required argument

percent-na

Report the percent of observations that are NA for each variable

clarite-cli describe percent-na [OPTIONS] DATA OUTPUT

Arguments

DATA

Required argument

OUTPUT

Required argument

skewness

Report and test the skewness for each continuous variable

clarite-cli describe skewness [OPTIONS] DATA OUTPUT

Options

--dropna, --keepna

Omit NA values before calculating skew

Arguments

DATA

Required argument

OUTPUT

Required argument

clarite-cli load

clarite-cli load [OPTIONS] COMMAND [ARGS]...

from-csv

Load data from a comma-separated file and save it in the standard format

clarite-cli load from-csv [OPTIONS] INPUT OUTPUT

Options

-i, --index <index>

Name of the column to use as the index. Default is the first column.

-s, --skip <skip>

variables to skip. Either individual names, or a file containing one name per line.

-o, --only <only>

variables to process, skipping all others. Either individual names, or a file containing one name per line.

Arguments

INPUT

Required argument

OUTPUT

Required argument

from-tsv

Load data from a tab-separated file and save it in the standard format

clarite-cli load from-tsv [OPTIONS] INPUT OUTPUT

Options

-i, --index <index>

Name of the column to use as the index. Default is the first column.

-s, --skip <skip>

variables to skip. Either individual names, or a file containing one name per line.

-o, --only <only>

variables to process, skipping all others. Either individual names, or a file containing one name per line.

Arguments

INPUT

Required argument

OUTPUT

Required argument

clarite-cli modify

clarite-cli modify [OPTIONS] COMMAND [ARGS]...

categorize

Categorize data based on the number of unique values

clarite-cli modify categorize [OPTIONS] DATA OUTPUT

Options

--cat_min <cat_min>

Minimum number of unique values in a variable to make it a categorical type

--cat_max <cat_max>

Maximum number of unique values in a variable to make it a categorical type

--cont_min <cont_min>

Minimum number of unique values in a variable to make it a continuous type

Arguments

DATA

Required argument

OUTPUT

Required argument
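One plausible reading of how the three thresholds interact is sketched below. The threshold defaults (3, 6, 15) and the 'constant', 'binary', and 'unknown' labels are assumptions made for illustration and are not stated in this reference; check the categorize --help output for the real defaults.

```python
def categorize(n_unique, cat_min=3, cat_max=6, cont_min=15):
    """Assign a type from the number of unique values in a variable.

    Assumptions: fewer than two unique values is 'constant', exactly two
    is 'binary', and counts falling between cat_max and cont_min are left
    as 'unknown' for manual review.
    """
    if n_unique < 2:
        return "constant"
    if n_unique == 2:
        return "binary"
    if cat_min <= n_unique <= cat_max:
        return "categorical"
    if n_unique >= cont_min:
        return "continuous"
    return "unknown"
```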

colfilter

Remove some variables from a dataset

clarite-cli modify colfilter [OPTIONS] DATA OUTPUT

Options

-s, --skip <skip>

variables to skip. Either individual names, or a file containing one name per line.

-o, --only <only>

variables to process, skipping all others. Either individual names, or a file containing one name per line.

Arguments

DATA

Required argument

OUTPUT

Required argument

colfilter-min-cat-n

Filter variables based on a minimum number of non-NA observations per category

clarite-cli modify colfilter-min-cat-n [OPTIONS] DATA OUTPUT

Options

-n <n>

Remove variables with fewer than this many non-NA observations in each category

-s, --skip <skip>

variables to skip. Either individual names, or a file containing one name per line.

-o, --only <only>

variables to process, skipping all others. Either individual names, or a file containing one name per line.

Arguments

DATA

Required argument

OUTPUT

Required argument

colfilter-min-n

Filter variables based on a minimum number of non-NA observations

clarite-cli modify colfilter-min-n [OPTIONS] DATA OUTPUT

Options

-n <n>

Remove variables with fewer than this many non-NA observations

-s, --skip <skip>

variables to skip. Either individual names, or a file containing one name per line.

-o, --only <only>

variables to process, skipping all others. Either individual names, or a file containing one name per line.

Arguments

DATA

Required argument

OUTPUT

Required argument

colfilter-percent-zero

Filter variables based on the fraction of observations with a value of zero

clarite-cli modify colfilter-percent-zero [OPTIONS] DATA OUTPUT

Options

-p, --filter-percent <filter_percent>

Remove variables when the percentage of observations equal to 0 is >= this value (0 to 100)

-s, --skip <skip>

variables to skip. Either individual names, or a file containing one name per line.

-o, --only <only>

variables to process, skipping all others. Either individual names, or a file containing one name per line.

Arguments

DATA

Required argument

OUTPUT

Required argument
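The removal rule can be sketched in a few lines of Python. This is an illustration only: the data layout (a dict of column name to list of values, with None for NA) is hypothetical, and the sketch assumes the percentage is computed over non-missing observations, which this reference does not specify.

```python
def colfilter_percent_zero(columns, filter_percent):
    """Drop variables whose percentage of zero values is >= filter_percent.

    Assumption: missing values (None) are excluded before computing the
    percentage, and a variable with no observed values is kept.
    """
    kept = {}
    for name, values in columns.items():
        observed = [v for v in values if v is not None]
        zeros = sum(1 for v in observed if v == 0)
        percent_zero = 100.0 * zeros / len(observed) if observed else 0.0
        if percent_zero < filter_percent:
            kept[name] = values
    return kept
```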

drop-extra-categories

Remove extra categories from categorical datatypes

clarite-cli modify drop-extra-categories [OPTIONS] DATA OUTPUT

Options

-s, --skip <skip>

variables to skip. Either individual names, or a file containing one name per line.

-o, --only <only>

variables to process, skipping all others. Either individual names, or a file containing one name per line.

Arguments

DATA

Required argument

OUTPUT

Required argument

make-binary

Set the type of variables to ‘binary’

clarite-cli modify make-binary [OPTIONS] DATA OUTPUT

Options

-s, --skip <skip>

variables to skip. Either individual names, or a file containing one name per line.

-o, --only <only>

variables to process, skipping all others. Either individual names, or a file containing one name per line.

Arguments

DATA

Required argument

OUTPUT

Required argument

make-categorical

Set the type of variables to ‘categorical’

clarite-cli modify make-categorical [OPTIONS] DATA OUTPUT

Options

-s, --skip <skip>

variables to skip. Either individual names, or a file containing one name per line.

-o, --only <only>

variables to process, skipping all others. Either individual names, or a file containing one name per line.

Arguments

DATA

Required argument

OUTPUT

Required argument

make-continuous

Set the type of variables to ‘continuous’

clarite-cli modify make-continuous [OPTIONS] DATA OUTPUT

Options

-s, --skip <skip>

variables to skip. Either individual names, or a file containing one name per line.

-o, --only <only>

variables to process, skipping all others. Either individual names, or a file containing one name per line.

Arguments

DATA

Required argument

OUTPUT

Required argument

merge-observations

Merge observations from two different datasets into one

clarite-cli modify merge-observations [OPTIONS] TOP BOTTOM OUTPUT

Arguments

TOP

Required argument

BOTTOM

Required argument

OUTPUT

Required argument

merge-variables

Merge variables from two different datasets into one

clarite-cli modify merge-variables [OPTIONS] LEFT RIGHT OUTPUT

Options

-h, --how <how>

Type of Merge

Options

left | right | inner | outer

Arguments

LEFT

Required argument

RIGHT

Required argument

OUTPUT

Required argument
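The four merge types follow the usual join semantics on the observation index. The sketch below illustrates them with plain dictionaries (index mapped to a dict of variable values); the data layout and the function name are hypothetical, and CLARITE's own implementation may differ in details such as how missing values are represented.

```python
def merge_variables(left, right, how):
    """Join two datasets keyed by observation ID.

    left:  keep the IDs in the left dataset
    right: keep the IDs in the right dataset
    inner: keep only the IDs present in both
    outer: keep the IDs present in either
    """
    if how == "left":
        ids = list(left)
    elif how == "right":
        ids = list(right)
    elif how == "inner":
        ids = [i for i in left if i in right]
    elif how == "outer":
        ids = list(left) + [i for i in right if i not in left]
    else:
        raise ValueError(f"unknown merge type: {how}")

    # Collect every variable name so that unmatched rows get None.
    names = []
    for dataset in (left, right):
        for row in dataset.values():
            for name in row:
                if name not in names:
                    names.append(name)

    merged = {}
    for i in ids:
        row = {name: None for name in names}
        row.update(left.get(i, {}))
        row.update(right.get(i, {}))
        merged[i] = row
    return merged
```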

move-variables

Move variables from one dataset to another

clarite-cli modify move-variables [OPTIONS] LEFT RIGHT

Options

--output_left <output_left>

--output_right <output_right>

-s, --skip <skip>

variables to skip. Either individual names, or a file containing one name per line.

-o, --only <only>

variables to process, skipping all others. Either individual names, or a file containing one name per line.

Arguments

LEFT

Required argument

RIGHT

Required argument

recode-values

Replace values in the data with other values. The value being replaced (‘current’) and the new value (‘replacement’) are each specified with their type, and only one of each may be included. If either is not specified, it is treated as None.

clarite-cli modify recode-values [OPTIONS] DATA OUTPUT

Options

--current-str <cs>

Replace occurrences of this string value

--current-int <ci>

Replace occurrences of this integer value

--current-float <cf>

Replace occurrences of this float value

--replacement-str <rs>

Insert this string value

--replacement-int <ri>

Insert this integer value

--replacement-float <rf>

Insert this float value

-s, --skip <skip>

variables to skip. Either individual names, or a file containing one name per line.

-o, --only <only>

variables to process, skipping all others. Either individual names, or a file containing one name per line.

Arguments

DATA

Required argument

OUTPUT

Required argument

remove-outliers

Replace outlier values with NaN. Outliers are defined using a gaussian or IQR approach.

clarite-cli modify remove-outliers [OPTIONS] DATA OUTPUT

Options

-m, --method <method>

Options

gaussian | iqr

-c, --cutoff <cutoff>

-s, --skip <skip>

variables to skip. Either individual names, or a file containing one name per line.

-o, --only <only>

variables to process, skipping all others. Either individual names, or a file containing one name per line.

Arguments

DATA

Required argument

OUTPUT

Required argument
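The two approaches can be sketched as follows. This is an illustration under stated assumptions, not CLARITE's implementation: it assumes 'gaussian' flags values more than cutoff standard deviations from the mean, and 'iqr' flags values more than cutoff times the interquartile range below the first quartile or above the third. The exact quartile method is also an assumption (Python's statistics.quantiles default).

```python
import math
import statistics


def remove_outliers(values, method, cutoff):
    """Return a copy of values with outliers replaced by NaN."""
    observed = [v for v in values if not math.isnan(v)]
    if method == "gaussian":
        center = statistics.mean(observed)
        spread = statistics.stdev(observed)
        low, high = center - cutoff * spread, center + cutoff * spread
    elif method == "iqr":
        q1, _, q3 = statistics.quantiles(observed, n=4)
        iqr = q3 - q1
        low, high = q1 - cutoff * iqr, q3 + cutoff * iqr
    else:
        raise ValueError(f"unknown method: {method}")
    # Existing NaN values fail the range check and therefore stay NaN.
    return [v if low <= v <= high else math.nan for v in values]
```

A single extreme value inflates the standard deviation, so the gaussian approach may need a smaller cutoff than the IQR approach to flag the same point.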

rowfilter

Select some rows from a dataset using a simple comparison, keeping rows where the comparison is True.

clarite-cli modify rowfilter [OPTIONS] DATA OUTPUT COLUMN

Options

--value-str <vs>

Compare values in the column to this string

--value-int <vi>

Compare values in the column to this integer

--value-float <vf>

Compare values in the column to this floating point number

-c, --comparison <comparison>

Keep rows where the value of the column is lt (<), lte (<=), eq (==), gte (>=), or gt (>) the specified value. Eq by default.

Options

lt | lte | eq | gte | gt

Arguments

DATA

Required argument

OUTPUT

Required argument

COLUMN

Required argument
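The comparison names map directly onto Python's comparison operators. The sketch below shows that mapping on a list of dictionaries; the data layout and function name are hypothetical, and dropping rows whose column value is missing is an assumption.

```python
import operator

# Comparison names accepted by -c/--comparison and the operators they stand for.
COMPARISONS = {
    "lt": operator.lt,
    "lte": operator.le,
    "eq": operator.eq,
    "gte": operator.ge,
    "gt": operator.gt,
}


def rowfilter(rows, column, value, comparison="eq"):
    """Keep rows where `row[column] <comparison> value` is True.

    Assumption: rows with a missing (None) value in the column are dropped.
    """
    compare = COMPARISONS[comparison]
    return [
        row for row in rows
        if row[column] is not None and compare(row[column], value)
    ]
```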

rowfilter-incomplete-obs

Filter out observations that are not complete cases (a complete case contains no NA values)

clarite-cli modify rowfilter-incomplete-obs [OPTIONS] DATA OUTPUT

Options

-s, --skip <skip>

variables to skip. Either individual names, or a file containing one name per line.

-o, --only <only>

variables to process, skipping all others. Either individual names, or a file containing one name per line.

Arguments

DATA

Required argument

OUTPUT

Required argument
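A complete-case filter restricted to selected variables can be sketched as below. The data layout (a list of dictionaries with None for NA) and the function names are hypothetical; the summary line simply reproduces the message format shown in the --skip and --only example earlier in this reference.

```python
def rowfilter_incomplete_obs(rows, variables=None):
    """Keep rows with no missing (None) value in the checked variables.

    If `variables` is None, every variable in the row is checked.
    """
    kept = []
    for row in rows:
        names = variables if variables is not None else list(row)
        if all(row[name] is not None for name in names):
            kept.append(row)
    return kept


def summary(removed, total):
    """Format the removal count in the style of the CLI log message."""
    return f"Removed {removed:,} of {total:,} observations ({removed / total:.2%})"
```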

transform-variable

Apply a function to each value of a variable

clarite-cli modify transform-variable [OPTIONS] DATA OUTPUT TRANSFORM_METHOD

Options

-s, --skip <skip>

variables to skip. Either individual names, or a file containing one name per line.

-o, --only <only>

variables to process, skipping all others. Either individual names, or a file containing one name per line.

Arguments

DATA

Required argument

OUTPUT

Required argument

TRANSFORM_METHOD

Required argument

clarite-cli plot

clarite-cli plot [OPTIONS] COMMAND [ARGS]...

distributions

Generate a pdf containing distribution plots for each variable

clarite-cli plot distributions [OPTIONS] DATA OUTPUT

Options

-k, --kind <kind>

Kind of plot used for continuous data. Non-continuous always shows a count plot.

Options

count | box | violin | qq

--nrows <nrows>

Number of rows per page

--ncols <ncols>

Number of columns per page

-q, --quality <quality>

Quality of the generated plots: low (150 dpi), medium (300 dpi), or high (1200 dpi).

Options

low | medium | high

--sort, --no-sort

Sort variables alphabetically

Arguments

DATA

Required argument

OUTPUT

Required argument

histogram

Create a histogram plot of a variable

clarite-cli plot histogram [OPTIONS] DATA OUTPUT VARIABLE

Arguments

DATA

Required argument

OUTPUT

Required argument

VARIABLE

Required argument

manhattan

Generate a manhattan plot of EWAS results

clarite-cli plot manhattan [OPTIONS] EWAS_RESULT OUTPUT

Options

-c, --categories <categories>

tab-separated file with two columns: ‘Variable’ and ‘category’

--bonferroni <bonferroni>

cutoff value to plot bonferroni-adjusted pvalue line

--fdr <fdr>

cutoff value to plot fdr-adjusted pvalue line

-o, --other <other>

other datasets to include in the plot

--nlabeled <nlabeled>

label top n points

--label <label>

label points by name

Arguments

EWAS_RESULT

Required argument

OUTPUT

Required argument

manhattan-bonferroni

Generate a manhattan plot of EWAS results showing Bonferroni-corrected pvalues

clarite-cli plot manhattan-bonferroni [OPTIONS] EWAS_RESULT OUTPUT

Options

-c, --categories <categories>

tab-separated file with two columns: ‘Variable’ and ‘category’

--cutoff <cutoff>

cutoff value for plotting the significance line

--fdr <fdr>

cutoff value to plot Bonferroni-adjusted pvalue line

-o, --other <other>

other datasets to include in the plot

--nlabeled <nlabeled>

label top n points

--label <label>

label points by name

Arguments

EWAS_RESULT

Required argument

OUTPUT

Required argument

manhattan-fdr

Generate a manhattan plot of EWAS results showing FDR-corrected pvalues

clarite-cli plot manhattan-fdr [OPTIONS] EWAS_RESULT OUTPUT

Options

-c, --categories <categories>

tab-separated file with two columns: ‘Variable’ and ‘category’

--cutoff <cutoff>

cutoff value for plotting the significance line

--fdr <fdr>

cutoff value to plot fdr-adjusted pvalue line

-o, --other <other>

other datasets to include in the plot

--nlabeled <nlabeled>

label top n points

--label <label>

label points by name

Arguments

EWAS_RESULT

Required argument

OUTPUT

Required argument