clarite.modify.colfilter_min_cat_n

clarite.modify.colfilter_min_cat_n(data, n: int = 200, skip: Union[str, List[str], NoneType] = None, only: Union[str, List[str], NoneType] = None)

Remove binary and categorical variables in which any unique value has fewer than <n> occurrences

Parameters:

data: pd.DataFrame: The DataFrame to be processed and returned
n: int, default 200: The minimum number of occurrences required of each unique value for a variable to be kept
skip: str, list or None (default is None): Variable name or list of variable names that the filter should not be applied to
only: str, list or None (default is None): Variable name or list of variable names that the filter should be applied to exclusively
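
To make the skip and only options concrete, here is a minimal, dependency-free sketch of how such a selection could work. It is not CLARITE's implementation: select_columns and the variable names are hypothetical, and treating a call that passes both options as an error is an assumption of this sketch.

```python
def select_columns(all_columns, skip=None, only=None):
    """Return the column names that a filter would be applied to."""
    # Accept either a single name or a list of names, as the signature allows.
    if isinstance(skip, str):
        skip = [skip]
    if isinstance(only, str):
        only = [only]
    # Assumption of this sketch: passing both options at once is an error.
    if skip is not None and only is not None:
        raise ValueError("Specify at most one of 'skip' and 'only'")
    if only is not None:
        return [c for c in all_columns if c in only]
    if skip is not None:
        return [c for c in all_columns if c not in skip]
    return list(all_columns)


cols = ["age_group", "sex", "smoker"]  # hypothetical variable names
print(select_columns(cols, skip="sex"))
print(select_columns(cols, only=["sex", "smoker"]))
```

Variables left out of the selection are neither tested nor removed by the filter.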

Returns:

data: pd.DataFrame: The filtered DataFrame
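
The filtering rule itself can be illustrated with a short plain-Python sketch. This is not CLARITE's code: it uses a dict of lists in place of a DataFrame so that it runs without dependencies, the helper names and toy data are hypothetical, and ignoring missing values (None) when counting is an assumption.

```python
from collections import Counter


def min_category_count(values):
    """Return the size of the smallest category, ignoring missing values (None)."""
    counts = Counter(v for v in values if v is not None)
    return min(counts.values()) if counts else 0


def filter_min_cat_n(columns, n=200):
    """Keep only the columns in which every observed category occurs at least n times."""
    return {
        name: values
        for name, values in columns.items()
        if min_category_count(values) >= n
    }


# Hypothetical toy data: "sex" is balanced, "rare_flag" has a category seen only twice.
columns = {
    "sex": ["M"] * 5 + ["F"] * 5,
    "rare_flag": [0] * 8 + [1] * 2,
}
kept = filter_min_cat_n(columns, n=3)
print(sorted(kept))  # ['sex']
```

With n=3, "rare_flag" is dropped because its smallest category has only 2 occurrences, while "sex" is kept because both of its categories have 5.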

Examples

>>> import clarite
>>> nhanes_filtered = clarite.modify.colfilter_min_cat_n(nhanes)
================================================================================
Running colfilter_min_cat_n
--------------------------------------------------------------------------------
WARNING: 36 variables need to be categorized into a type manually
Testing 362 of 362 binary variables
        Removed 248 (68.51%) tested binary variables which had a category with less than 200 values
Testing 47 of 47 categorical variables
        Removed 36 (76.60%) tested categorical variables which had a category with less than 200 values