CLI Reference
Once CLARITE is installed, the command line interface can be run using the clarite-cli command.
The command line interface has command groups that are the same as the modules in the package (except for survey).
The --help option will show documentation when run with any command or command group:
$ clarite-cli --help
Usage: clarite-cli [OPTIONS] COMMAND [ARGS]...
Options:
--help Show this message and exit.
Commands:
analyze
describe
load
modify
plot
--skip and --only
Many commands in the CLI have the --skip and --only options, which limit the command to specific variables. If --skip is specified, all variables except the specified ones will be processed. If --only is specified, only the specified variables will be processed.
Only one of the two options may be used in a single command. Values may be passed in any combination of two ways:
- As the name of a file containing one variable name per line
- As a variable name specified directly in the terminal
For example, running rowfilter-incomplete-obs with --only given one variable name directly and the file 'covars.txt' results in:
-------------------------------------------------------------------------------------------------------------------------
--only: 1 variable(s) specified directly
8 variable(s) loaded from 'covars.txt'
=========================================================================================================================
Running rowfilter_incomplete_obs
-------------------------------------------------------------------------------------------------------------------------
Removed 3,687 of 22,624 observations (16.30%) due to NA values in any of 9 variables
=========================================================================================================================
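The way --skip and --only values are interpreted can be sketched in Python. This is a minimal illustration, not CLARITE's implementation: the helper names resolve_names and select_variables, the example variable names, and the rule that a value naming an existing file is read as a list of names are all assumptions.

```python
import os
import tempfile


def resolve_names(values):
    """Expand option values into variable names.

    Assumption: a value naming an existing file is read as one variable
    name per line; any other value is taken as a variable name itself.
    """
    names = []
    for value in values:
        if os.path.isfile(value):
            with open(value) as handle:
                names.extend(line.strip() for line in handle if line.strip())
        else:
            names.append(value)
    return names


def select_variables(all_variables, skip=(), only=()):
    """Return the variables a command would process."""
    if skip and only:
        raise ValueError("--skip and --only cannot be used together")
    if only:
        wanted = set(resolve_names(only))
        return [v for v in all_variables if v in wanted]
    skipped = set(resolve_names(skip))
    return [v for v in all_variables if v not in skipped]


# Usage: one name given directly plus a file listing two more names.
with tempfile.TemporaryDirectory() as tmp:
    covars = os.path.join(tmp, "covars.txt")
    with open(covars, "w") as handle:
        handle.write("age\nsex\n")
    variables = ["age", "sex", "bmi", "income"]
    only_result = select_variables(variables, only=["bmi", covars])
    skip_result = select_variables(variables, skip=[covars])
```

Here only_result holds the three named variables and skip_result holds the two variables not listed in the file.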
Commands
clarite-cli analyze
clarite-cli analyze [OPTIONS] COMMAND [ARGS]...
add-corrected-pvals
Get FDR-corrected and Bonferroni-corrected pvalues
clarite-cli analyze add-corrected-pvals [OPTIONS] EWAS_RESULT OUTPUT
Arguments
- EWAS_RESULT: Required argument
- OUTPUT: Required argument
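To make the two corrections concrete, the sketch below computes Bonferroni-adjusted and FDR-adjusted p-values in plain Python. It assumes the FDR correction is the Benjamini-Hochberg procedure, which this page does not state; the function names and sample p-values are illustrative only.

```python
def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    n = len(pvalues)
    return [min(p * n, 1.0) for p in pvalues]


def fdr_bh(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is the
    # minimum over its own rank and all larger ranks (keeps monotonicity).
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.005]
bonf = bonferroni(pvals)
fdr = fdr_bh(pvals)
```

The Bonferroni values are never smaller than the FDR values for the same input, which is why the FDR correction is the less conservative of the two.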
ewas
Run an EWAS analysis
clarite-cli analyze ewas [OPTIONS] OUTCOME DATA OUTPUT
Options
- -c, --covariate <covariate>: Covariates
- --covariance-calc <covariance_calc>: Covariance calculation method. Options: stata | jackknife
- --min-n <min_n>: Minimum number of complete cases needed to run a regression
- --survey-data <survey_data>: Tab-separated data file with survey weights, strata IDs, and/or cluster IDs. Must have an ‘ID’ column.
- --strata <strata>: Name of the strata column in the survey data
- --cluster <cluster>: Name of the cluster column in the survey data
- --nested, --not-nested: Whether survey data is nested or not
- --weights-file <weights_file>: Tab-delimited data file with ‘Variable’ and ‘Weight’ columns to match weights from the survey data to specific variables
- -w, --weight <weight>: Name of a survey weight column found in the survey data. This option can’t be used with --weights-file
- --fpc <fpc>: Name of the finite population correction column in the survey data
- --single-cluster <single_cluster>: How to handle singular clusters. Options: fail | adjust | average | certainty
Arguments
- OUTCOME: Required argument
- DATA: Required argument
- OUTPUT: Required argument
ewas-r
Run an EWAS analysis using R
clarite-cli analyze ewas-r [OPTIONS] OUTCOME DATA OUTPUT
Options
- -c, --covariate <covariate>: Covariates
- --covariance-calc <covariance_calc>: Covariance calculation method. Options: stata | jackknife
- --min-n <min_n>: Minimum number of complete cases needed to run a regression
- --survey-data <survey_data>: Tab-separated data file with survey weights, strata IDs, and/or cluster IDs. Must have an ‘ID’ column.
- --strata <strata>: Name of the strata column in the survey data
- --cluster <cluster>: Name of the cluster column in the survey data
- --nested, --not-nested: Whether survey data is nested or not
- --weights-file <weights_file>: Tab-delimited data file with ‘Variable’ and ‘Weight’ columns to match weights from the survey data to specific variables
- -w, --weight <weight>: Name of a survey weight column found in the survey data. This option can’t be used with --weights-file
- --fpc <fpc>: Name of the finite population correction column in the survey data
- --single-cluster <single_cluster>: How to handle singular clusters. Options: fail | adjust | average | certainty
Arguments
- OUTCOME: Required argument
- DATA: Required argument
- OUTPUT: Required argument
get-significant
Filter out non-significant results
clarite-cli analyze get-significant [OPTIONS] EWAS_RESULT OUTPUT
Options
- --fdr, --bonferroni: Use FDR (--fdr) or Bonferroni pvalues (--bonferroni). FDR by default.
- -p, --pvalue <pvalue>: Keep results with a pvalue <= this value (0.05 by default)
Arguments
- EWAS_RESULT: Required argument
- OUTPUT: Required argument
clarite-cli describe
clarite-cli describe [OPTIONS] COMMAND [ARGS]...
correlations
Report top correlations between variables
clarite-cli describe correlations [OPTIONS] DATA OUTPUT
Options
- -t, --threshold <threshold>: Report correlations with R >= this value
Arguments
- DATA: Required argument
- OUTPUT: Required argument
freq-table
Report the number of occurrences of each value for each variable
clarite-cli describe freq-table [OPTIONS] DATA OUTPUT
Arguments
- DATA: Required argument
- OUTPUT: Required argument
get-types
Get the type of each variable
clarite-cli describe get-types [OPTIONS] DATA OUTPUT
Arguments
- DATA: Required argument
- OUTPUT: Required argument
clarite-cli load
clarite-cli load [OPTIONS] COMMAND [ARGS]...
from-csv
Load data from a comma-separated file and save it in the standard format
clarite-cli load from-csv [OPTIONS] INPUT OUTPUT
Options
- -i, --index <index>: Name of the column to use as the index. Default is the first column.
- -s, --skip <skip>: Variables to skip. Either individual names, or a file containing one name per line.
- -o, --only <only>: Variables to process, skipping all others. Either individual names, or a file containing one name per line.
Arguments
- INPUT: Required argument
- OUTPUT: Required argument
from-tsv
Load data from a tab-separated file and save it in the standard format
clarite-cli load from-tsv [OPTIONS] INPUT OUTPUT
Options
- -i, --index <index>: Name of the column to use as the index. Default is the first column.
- -s, --skip <skip>: Variables to skip. Either individual names, or a file containing one name per line.
- -o, --only <only>: Variables to process, skipping all others. Either individual names, or a file containing one name per line.
Arguments
- INPUT: Required argument
- OUTPUT: Required argument
clarite-cli modify
clarite-cli modify [OPTIONS] COMMAND [ARGS]...
categorize
Categorize data based on the number of unique values
clarite-cli modify categorize [OPTIONS] DATA OUTPUT
Options
- --cat_min <cat_min>: Minimum number of unique values in a variable to make it a categorical type
- --cat_max <cat_max>: Maximum number of unique values in a variable to make it a categorical type
- --cont_min <cont_min>: Minimum number of unique values in a variable to make it a continuous type
Arguments
- DATA: Required argument
- OUTPUT: Required argument
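The three thresholds can be read as a decision rule on the number of unique values. The sketch below is one possible reading, not CLARITE's code: the threshold values 3, 6, and 15 are illustrative rather than documented defaults, and both the two-value "binary" rule and the "unknown" label for counts that match no rule are assumptions.

```python
def categorize(values, cat_min=3, cat_max=6, cont_min=15):
    """Classify one variable by its number of unique non-missing values.

    The thresholds mirror --cat_min, --cat_max, and --cont_min; the
    numbers used here are illustrative, not documented defaults.
    """
    n_unique = len({v for v in values if v is not None})
    if n_unique == 2:
        return "binary"  # assumption: exactly two values means binary
    if cat_min <= n_unique <= cat_max:
        return "categorical"
    if n_unique >= cont_min:
        return "continuous"
    return "unknown"  # assumption: no rule matched, needs manual review


types = {
    "smoker": categorize([0, 1, 1, 0, None]),
    "education": categorize([1, 2, 3, 4, 2, 1]),
    "weight": categorize([60 + i * 0.5 for i in range(40)]),
    "visits": categorize(list(range(10))),
}
```

A variable such as "visits", with more unique values than cat_max but fewer than cont_min, falls between the rules and would need its type set explicitly, for example with make-categorical or make-continuous.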
colfilter
Remove some variables from a dataset
clarite-cli modify colfilter [OPTIONS] DATA OUTPUT
Options
- -s, --skip <skip>: Variables to skip. Either individual names, or a file containing one name per line.
- -o, --only <only>: Variables to process, skipping all others. Either individual names, or a file containing one name per line.
Arguments
- DATA: Required argument
- OUTPUT: Required argument
colfilter-min-cat-n
Filter variables based on a minimum number of non-NA observations per category
clarite-cli modify colfilter-min-cat-n [OPTIONS] DATA OUTPUT
Options
- -n <n>: Remove variables with fewer than this many non-NA observations in each category
- -s, --skip <skip>: Variables to skip. Either individual names, or a file containing one name per line.
- -o, --only <only>: Variables to process, skipping all others. Either individual names, or a file containing one name per line.
Arguments
- DATA: Required argument
- OUTPUT: Required argument
colfilter-min-n
Filter variables based on a minimum number of non-NA observations
clarite-cli modify colfilter-min-n [OPTIONS] DATA OUTPUT
Options
- -n <n>: Remove variables with fewer than this many non-NA observations
- -s, --skip <skip>: Variables to skip. Either individual names, or a file containing one name per line.
- -o, --only <only>: Variables to process, skipping all others. Either individual names, or a file containing one name per line.
Arguments
- DATA: Required argument
- OUTPUT: Required argument
colfilter-percent-zero
Filter variables based on the fraction of observations with a value of zero
clarite-cli modify colfilter-percent-zero [OPTIONS] DATA OUTPUT
Options
- -p, --filter-percent <filter_percent>: Remove variables when the percentage of observations equal to 0 is >= this value (0 to 100)
- -s, --skip <skip>: Variables to skip. Either individual names, or a file containing one name per line.
- -o, --only <only>: Variables to process, skipping all others. Either individual names, or a file containing one name per line.
Arguments
- DATA: Required argument
- OUTPUT: Required argument
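The filter rule described for --filter-percent can be sketched as follows. This is an illustration under stated assumptions rather than CLARITE's implementation: missing values (None here) are left out of the denominator, the threshold of 90 is an arbitrary example value, and the function and variable names are hypothetical.

```python
def percent_zero(values):
    """Percentage (0 to 100) of non-missing observations equal to zero."""
    observed = [v for v in values if v is not None]
    if not observed:
        return 0.0
    return 100.0 * sum(1 for v in observed if v == 0) / len(observed)


def colfilter_percent_zero(data, filter_percent):
    """Keep variables whose percentage of zeros is below filter_percent."""
    return {
        name: values
        for name, values in data.items()
        if percent_zero(values) < filter_percent
    }


data = {
    "a": [0, 0, 0, 0, 1],                 # 80% zero
    "b": [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],  # 90% zero
    "c": [1, 2, 0],                       # about 33% zero
}
kept = colfilter_percent_zero(data, filter_percent=90.0)
```

Because the comparison is >=, a variable sitting exactly at the threshold ("b" above) is removed.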
drop-extra-categories
Remove extra categories from categorical datatypes
clarite-cli modify drop-extra-categories [OPTIONS] DATA OUTPUT
Options
- -s, --skip <skip>: Variables to skip. Either individual names, or a file containing one name per line.
- -o, --only <only>: Variables to process, skipping all others. Either individual names, or a file containing one name per line.
Arguments
- DATA: Required argument
- OUTPUT: Required argument
make-binary
Set the type of variables to ‘binary’
clarite-cli modify make-binary [OPTIONS] DATA OUTPUT
Options
- -s, --skip <skip>: Variables to skip. Either individual names, or a file containing one name per line.
- -o, --only <only>: Variables to process, skipping all others. Either individual names, or a file containing one name per line.
Arguments
- DATA: Required argument
- OUTPUT: Required argument
make-categorical
Set the type of variables to ‘categorical’
clarite-cli modify make-categorical [OPTIONS] DATA OUTPUT
Options
- -s, --skip <skip>: Variables to skip. Either individual names, or a file containing one name per line.
- -o, --only <only>: Variables to process, skipping all others. Either individual names, or a file containing one name per line.
Arguments
- DATA: Required argument
- OUTPUT: Required argument
make-continuous
Set the type of variables to ‘continuous’
clarite-cli modify make-continuous [OPTIONS] DATA OUTPUT
Options
- -s, --skip <skip>: Variables to skip. Either individual names, or a file containing one name per line.
- -o, --only <only>: Variables to process, skipping all others. Either individual names, or a file containing one name per line.
Arguments
- DATA: Required argument
- OUTPUT: Required argument
merge-observations
Merge observations from two different datasets into one
clarite-cli modify merge-observations [OPTIONS] TOP BOTTOM OUTPUT
Arguments
- TOP: Required argument
- BOTTOM: Required argument
- OUTPUT: Required argument
merge-variables
Merge variables from two different datasets into one
clarite-cli modify merge-variables [OPTIONS] LEFT RIGHT OUTPUT
Options
- -h, --how <how>: Type of merge. Options: left | right | inner | outer
Arguments
- LEFT: Required argument
- RIGHT: Required argument
- OUTPUT: Required argument
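The four --how values follow the usual join vocabulary. The sketch below shows which observations each one keeps, assuming the two datasets are matched on their index (observation ID); the dict-of-dicts layout, the function name, and the sample values are illustrative and not how CLARITE stores data.

```python
def merge_variables(left, right, how):
    """Merge two tables given as {observation_id: {variable: value}}."""
    if how == "left":
        ids = list(left)
    elif how == "right":
        ids = list(right)
    elif how == "inner":
        ids = [i for i in left if i in right]
    elif how == "outer":
        ids = list(left) + [i for i in right if i not in left]
    else:
        raise ValueError(f"unknown merge type: {how}")

    # Collect every variable name so missing cells can be filled with None.
    variables = []
    for table in (left, right):
        for row in table.values():
            for name in row:
                if name not in variables:
                    variables.append(name)

    merged = {}
    for i in ids:
        row = dict.fromkeys(variables)  # every variable starts as None
        row.update(left.get(i, {}))
        row.update(right.get(i, {}))
        merged[i] = row
    return merged


left = {1: {"age": 30}, 2: {"age": 41}}
right = {2: {"bmi": 22.5}, 3: {"bmi": 27.0}}
inner = merge_variables(left, right, "inner")
outer = merge_variables(left, right, "outer")
```

With these inputs, "inner" keeps only observation 2, while "outer" keeps observations 1, 2, and 3 and fills the variables an observation lacks with None.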
move-variables
Move variables from one dataset to another
clarite-cli modify move-variables [OPTIONS] LEFT RIGHT
Options
- --output_left <output_left>
- --output_right <output_right>
- -s, --skip <skip>: Variables to skip. Either individual names, or a file containing one name per line.
- -o, --only <only>: Variables to process, skipping all others. Either individual names, or a file containing one name per line.
Arguments
- LEFT: Required argument
- RIGHT: Required argument
recode-values
Replace values in the data with other values. The value being replaced (‘current’) and the new value (‘replacement’) are specified with their type, and only one may be included for each. If either is not specified, it is taken to be None.
clarite-cli modify recode-values [OPTIONS] DATA OUTPUT
Options
- --current-str <cs>: Replace occurrences of this string value
- --current-int <ci>: Replace occurrences of this integer value
- --current-float <cf>: Replace occurrences of this float value
- --replacement-str <rs>: Insert this string value
- --replacement-int <ri>: Insert this integer value
- --replacement-float <rf>: Insert this float value
- -s, --skip <skip>: Variables to skip. Either individual names, or a file containing one name per line.
- -o, --only <only>: Variables to process, skipping all others. Either individual names, or a file containing one name per line.
Arguments
- DATA: Required argument
- OUTPUT: Required argument
remove-outliers
Replace outlier values with NaN. Outliers are defined using a Gaussian or IQR approach.
clarite-cli modify remove-outliers [OPTIONS] DATA OUTPUT
Options
- -m, --method <method>: Options: gaussian | iqr
- -c, --cutoff <cutoff>
- -s, --skip <skip>: Variables to skip. Either individual names, or a file containing one name per line.
- -o, --only <only>: Variables to process, skipping all others. Either individual names, or a file containing one name per line.
Arguments
- DATA: Required argument
- OUTPUT: Required argument
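This page does not define the two methods, so the sketch below uses their common textbook forms as an assumption: "gaussian" flags values more than cutoff sample standard deviations from the mean, and "iqr" flags values more than cutoff times the interquartile range below the first quartile or above the third. The function name, cutoff values, and data are illustrative, and the exact quartile method may differ from what CLARITE uses.

```python
import math
import statistics


def remove_outliers(values, method, cutoff):
    """Return a copy of values with outliers replaced by NaN."""
    if method == "gaussian":
        center = statistics.mean(values)
        spread = statistics.stdev(values)  # sample standard deviation
        low, high = center - cutoff * spread, center + cutoff * spread
    elif method == "iqr":
        q1, _, q3 = statistics.quantiles(values, n=4)
        iqr = q3 - q1
        low, high = q1 - cutoff * iqr, q3 + cutoff * iqr
    else:
        raise ValueError(f"unknown method: {method}")
    return [v if low <= v <= high else math.nan for v in values]


values = [10, 11, 12, 13, 14, 15, 16, 17, 18, 100]
by_iqr = remove_outliers(values, method="iqr", cutoff=1.5)
by_gaussian = remove_outliers(values, method="gaussian", cutoff=2.0)
```

In this data the single extreme value inflates the standard deviation, so the Gaussian rule only flags it at a fairly tight cutoff, while the IQR rule is less affected by the outlier itself.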
rowfilter
Select some rows from a dataset using a simple comparison, keeping rows where the comparison is True.
clarite-cli modify rowfilter [OPTIONS] DATA OUTPUT COLUMN
Options
- --value-str <vs>: Compare values in the column to this string
- --value-int <vi>: Compare values in the column to this integer
- --value-float <vf>: Compare values in the column to this floating point number
- -c, --comparison <comparison>: Keep rows where the value of the column is lt (<), lte (<=), eq (==), gte (>=), or gt (>) the specified value. eq by default. Options: lt | lte | eq | gte | gt
Arguments
- DATA: Required argument
- OUTPUT: Required argument
- COLUMN: Required argument
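The comparison names map directly onto Python's comparison operators, which the following sketch makes explicit. The list-of-dicts row layout, the function name, and the sample data are assumptions made for illustration; only the comparison names and the eq default come from the option description above.

```python
import operator

# Comparison names accepted by --comparison, mapped to Python operators.
COMPARISONS = {
    "lt": operator.lt,
    "lte": operator.le,
    "eq": operator.eq,
    "gte": operator.ge,
    "gt": operator.gt,
}


def rowfilter(rows, column, value, comparison="eq"):
    """Keep rows where `row[column] <comparison> value` is True."""
    compare = COMPARISONS[comparison]
    return [row for row in rows if compare(row[column], value)]


rows = [
    {"ID": 1, "age": 17},
    {"ID": 2, "age": 18},
    {"ID": 3, "age": 40},
]
adults = rowfilter(rows, "age", 18, comparison="gte")
exactly_18 = rowfilter(rows, "age", 18)
```

Note that the column value is always the left-hand side of the comparison, so "gte" with a value of 18 keeps rows whose age is 18 or more.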
rowfilter-incomplete-obs
Filter out observations that are not complete cases (complete cases contain no NA values)
clarite-cli modify rowfilter-incomplete-obs [OPTIONS] DATA OUTPUT
Options
- -s, --skip <skip>: Variables to skip. Either individual names, or a file containing one name per line.
- -o, --only <only>: Variables to process, skipping all others. Either individual names, or a file containing one name per line.
Arguments
- DATA: Required argument
- OUTPUT: Required argument
transform-variable
Apply a function to each value of a variable
clarite-cli modify transform-variable [OPTIONS] DATA OUTPUT TRANSFORM_METHOD
Options
- -s, --skip <skip>: Variables to skip. Either individual names, or a file containing one name per line.
- -o, --only <only>: Variables to process, skipping all others. Either individual names, or a file containing one name per line.
Arguments
- DATA: Required argument
- OUTPUT: Required argument
- TRANSFORM_METHOD: Required argument
clarite-cli plot
clarite-cli plot [OPTIONS] COMMAND [ARGS]...
distributions
Generate a pdf containing distribution plots for each variable
clarite-cli plot distributions [OPTIONS] DATA OUTPUT
Options
- -k, --kind <kind>: Kind of plot used for continuous data. Non-continuous always shows a count plot. Options: count | box | violin | qq
- --nrows <nrows>: Number of rows per page
- --ncols <ncols>: Number of columns per page
- -q, --quality <quality>: Quality of the generated plots: low (150 dpi), medium (300 dpi), or high (1200 dpi). Options: low | medium | high
- --sort, --no-sort: Sort variables alphabetically
Arguments
- DATA: Required argument
- OUTPUT: Required argument
histogram
Create a histogram plot of a variable
clarite-cli plot histogram [OPTIONS] DATA OUTPUT VARIABLE
Arguments
- DATA: Required argument
- OUTPUT: Required argument
- VARIABLE: Required argument
manhattan
Generate a manhattan plot of EWAS results
clarite-cli plot manhattan [OPTIONS] EWAS_RESULT OUTPUT
Options
- -c, --categories <categories>: Tab-separated file with two columns: ‘Variable’ and ‘category’
- --bonferroni <bonferroni>: Cutoff value to plot bonferroni-adjusted pvalue line
- --fdr <fdr>: Cutoff value to plot fdr-adjusted pvalue line
- -o, --other <other>: Other datasets to include in the plot
- --nlabeled <nlabeled>: Label top n points
- --label <label>: Label points by name
Arguments
- EWAS_RESULT: Required argument
- OUTPUT: Required argument
manhattan-bonferroni
Generate a manhattan plot of EWAS results showing Bonferroni-corrected pvalues
clarite-cli plot manhattan-bonferroni [OPTIONS] EWAS_RESULT OUTPUT
Options
- -c, --categories <categories>: Tab-separated file with two columns: ‘Variable’ and ‘category’
- --cutoff <cutoff>: Cutoff value for plotting the significance line
- --fdr <fdr>: Cutoff value to plot Bonferroni-adjusted pvalue line
- -o, --other <other>: Other datasets to include in the plot
- --nlabeled <nlabeled>: Label top n points
- --label <label>: Label points by name
Arguments
- EWAS_RESULT: Required argument
- OUTPUT: Required argument
manhattan-fdr
Generate a manhattan plot of EWAS results showing FDR-corrected pvalues
clarite-cli plot manhattan-fdr [OPTIONS] EWAS_RESULT OUTPUT
Options
- -c, --categories <categories>: Tab-separated file with two columns: ‘Variable’ and ‘category’
- --cutoff <cutoff>: Cutoff value for plotting the significance line
- --fdr <fdr>: Cutoff value to plot fdr-adjusted pvalue line
- -o, --other <other>: Other datasets to include in the plot
- --nlabeled <nlabeled>: Label top n points
- --label <label>: Label points by name
Arguments
- EWAS_RESULT: Required argument
- OUTPUT: Required argument