clarite.modify.colfilter_min_n

clarite.modify.colfilter_min_n(data: pandas.core.frame.DataFrame, n: int = 200, skip: Union[str, List[str], NoneType] = None, only: Union[str, List[str], NoneType] = None)

Remove variables that have fewer than <n> non-NA values

Parameters:
    data: pd.DataFrame
        The DataFrame to be processed and returned
    n: int, default 200
        The minimum number of non-NA values required in order for a variable not to be filtered
    skip: str, list or None (default is None)
        List of variables that the filter should not be applied to
    only: str, list or None (default is None)
        List of variables that the filter should only be applied to

Returns:
    data: pd.DataFrame
        The filtered DataFrame
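
The filtering rule described above can be approximated in plain pandas. The following is a minimal sketch, not CLARITE's implementation: the helper name min_n_filter_sketch is hypothetical, it ignores variable types (binary, categorical, continuous), it prints no log output, and it assumes that skip and only cannot be combined.

```python
from typing import List, Optional, Union

import pandas as pd


def min_n_filter_sketch(
    data: pd.DataFrame,
    n: int = 200,
    skip: Optional[Union[str, List[str]]] = None,
    only: Optional[Union[str, List[str]]] = None,
) -> pd.DataFrame:
    """Hypothetical helper: drop columns with fewer than n non-NA values."""
    # Accept a single column name as shorthand for a one-item list
    if isinstance(skip, str):
        skip = [skip]
    if isinstance(only, str):
        only = [only]
    # Assumption: skip and only are mutually exclusive
    if skip is not None and only is not None:
        raise ValueError("Specify at most one of 'skip' and 'only'")

    # Choose which columns the filter is applied to
    if only is not None:
        tested = [c for c in data.columns if c in only]
    elif skip is not None:
        tested = [c for c in data.columns if c not in skip]
    else:
        tested = list(data.columns)

    # DataFrame.count() returns the number of non-NA values per column
    counts = data[tested].count()
    to_drop = counts[counts < n].index
    return data.drop(columns=to_drop)


# Toy data: "a" has 4 non-NA values, "b" has 1, "c" has 2
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [1.0, None, None, None],
    "c": [None, None, 1.0, 2.0],
})
filtered = min_n_filter_sketch(df, n=3)                 # tests every column
filtered_skip = min_n_filter_sketch(df, n=3, skip="b")  # "b" is exempt
```

With n=3, only "a" survives the first call, while the second call also keeps "b" because skipped variables are never tested.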

Examples

>>> import clarite
>>> nhanes_filtered = clarite.modify.colfilter_min_n(nhanes)
================================================================================
Running colfilter_min_n
--------------------------------------------------------------------------------
WARNING: 36 variables need to be categorized into a type manually
Testing 362 of 362 binary variables
        Removed 12 (3.31%) tested binary variables which had less than 200 non-null values
Testing 47 of 47 categorical variables
        Removed 8 (17.02%) tested categorical variables which had less than 200 non-null values
Testing 483 of 483 continuous variables
        Removed 8 (1.66%) tested continuous variables which had less than 200 non-null values