clarite.modify.make_binary

clarite.modify.make_binary(data: pandas.core.frame.DataFrame, skip: Union[str, List[str], NoneType] = None, only: Union[str, List[str], NoneType] = None)

Set variable types as Binary

Checks that each variable has at most 2 distinct values and converts the type to pd.Categorical.
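
The check-then-convert behavior described above can be sketched in plain pandas. This is a minimal illustration, not CLARITE's implementation: the helper name `to_binary_sketch` is hypothetical, and ignoring missing values when counting distinct values is an assumption of the sketch.

```python
import pandas as pd


def to_binary_sketch(series: pd.Series) -> pd.Series:
    """Hypothetical helper: validate a column, then convert it to a categorical."""
    # Assumption: missing values are not counted as a distinct value.
    n_unique = series.nunique(dropna=True)
    if n_unique > 2:
        raise ValueError(
            f"'{series.name}' has {n_unique} distinct values; expected at most 2"
        )
    return series.astype("category")


df = pd.DataFrame({"female": [0, 1, 1, 0], "age_group": [1, 2, 3, 1]})

female = to_binary_sketch(df["female"])  # passes the check: 2 distinct values

try:
    to_binary_sketch(df["age_group"])  # 3 distinct values, so this is rejected
    rejected = False
except ValueError:
    rejected = True
```

Here `female` ends up with a categorical dtype, while `age_group` fails validation because it has three distinct values.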

Note: When these variables are used in regression, they are ordered by value. For example, Sex (Male=1, Female=2) will encode “Male” as 0 and “Female” as 1 during the EWAS regression step.
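
The ordering in this note matches how pandas assigns category codes when it infers categories from sortable values. The sketch below shows only the pandas behavior; that the EWAS regression step uses the same 0/1 encoding is taken from the note above, not verified here.

```python
import pandas as pd

# Sex coded as Male=1, Female=2, as in the note above.
sex = pd.Series([1, 2, 2, 1], name="Sex").astype("category")

# pandas sorts inferred categories by value, so 1 comes before 2.
categories = list(sex.cat.categories)  # [1, 2]

# Each observation's code is the position of its value in the sorted categories.
codes = list(sex.cat.codes)  # [0, 1, 1, 0]
```

The value 1 ("Male") therefore maps to code 0 and the value 2 ("Female") maps to code 1.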

Parameters:
data: pd.DataFrame or pd.Series

Data to be processed

skip: str, list or None (default is None)

Variable name or list of variable names that should not be made binary

only: str, list or None (default is None)

Variable name or list of variable names; only these variables are made binary
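
As a rough illustration of how skip and only narrow the set of columns, the following sketch selects column names from a DataFrame. The helper `select_columns_sketch` is hypothetical and is not part of CLARITE; in particular, rejecting a call that passes both skip and only is an assumption of this sketch.

```python
from typing import List, Optional, Union

import pandas as pd


def select_columns_sketch(
    data: pd.DataFrame,
    skip: Optional[Union[str, List[str]]] = None,
    only: Optional[Union[str, List[str]]] = None,
) -> List[str]:
    """Hypothetical helper: return the column names a skip/only filter keeps."""
    # Assumption of this sketch: the two filters are mutually exclusive.
    if skip is not None and only is not None:
        raise ValueError("Specify 'skip' or 'only', not both")
    # A single name is treated as a one-element list.
    if isinstance(skip, str):
        skip = [skip]
    if isinstance(only, str):
        only = [only]
    if only is not None:
        return [col for col in data.columns if col in only]
    if skip is not None:
        return [col for col in data.columns if col not in skip]
    # With neither filter, every column is selected.
    return list(data.columns)


df = pd.DataFrame({"female": [0, 1], "black": [1, 0], "age": [34, 61]})

kept_only = select_columns_sketch(df, only=["female", "black"])
kept_skip = select_columns_sketch(df, skip="age")
```

In this toy DataFrame both calls select `female` and `black` and leave `age` untouched.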

Returns:
data: pd.DataFrame

DataFrame with the same data but validated and converted to binary types

Examples

>>> import clarite
>>> nhanes = clarite.modify.make_binary(nhanes, only=['female', 'black', 'mexican', 'other_hispanic'])
================================================================================
Running make_binary
--------------------------------------------------------------------------------
Set 4 of 970 variable(s) as binary, each with 22,624 observations