clarite.modify.colfilter_percent_zero

clarite.modify.colfilter_percent_zero(data: pandas.core.frame.DataFrame, filter_percent: float = 90.0, skip: Union[str, List[str], NoneType] = None, only: Union[str, List[str], NoneType] = None)

Remove continuous variables in which at least filter_percent percent of the non-NA values are zero.

Parameters:

data: pd.DataFrame: The DataFrame to be processed and returned
filter_percent: float, default 90.0: If the percentage of a variable's non-NA values that are equal to zero is greater than or equal to this value, the variable is filtered out.
skip: str, list or None (default is None): Variable name or list of variable names that the filter should not be applied to
only: str, list or None (default is None): Variable name or list of variable names; if given, the filter is applied to these variables only

Returns:

data: pd.DataFrame: The filtered DataFrame
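
The filtering rule described above can be illustrated with a minimal pandas sketch. This is not CLARITE's implementation: the names percent_zero, filter_percent_zero_sketch, and the demo DataFrame are hypothetical, the sketch assumes every tested column is numeric (it does not check for the continuous type), and the way skip and only are combined is an assumption.

```python
from typing import List, Optional, Union

import pandas as pd


def percent_zero(series: pd.Series) -> float:
    """Percentage of the non-NA values in `series` that are equal to zero."""
    non_na = series.dropna()
    if non_na.empty:
        return 0.0  # assumption: an all-NA column is never counted as zero
    return 100.0 * float((non_na == 0).sum()) / len(non_na)


def _as_list(value: Optional[Union[str, List[str]]]) -> List[str]:
    """Normalize None, a single name, or a list of names to a list."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return list(value)


def filter_percent_zero_sketch(
    data: pd.DataFrame,
    filter_percent: float = 90.0,
    skip: Optional[Union[str, List[str]]] = None,
    only: Optional[Union[str, List[str]]] = None,
) -> pd.DataFrame:
    """Hypothetical re-creation of the rule, not the library function."""
    only_list = _as_list(only)
    skip_list = _as_list(skip)
    # Test only the `only` columns if given, otherwise every column,
    # and never test a column listed in `skip`.
    candidates = only_list if only_list else list(data.columns)
    tested = [col for col in candidates if col not in skip_list]
    # Drop a tested column when its zero percentage reaches the threshold.
    to_drop = [col for col in tested if percent_zero(data[col]) >= filter_percent]
    return data.drop(columns=to_drop)


# Hypothetical demo data with 10 rows per column.
demo = pd.DataFrame(
    {
        "mostly_zero": [0] * 9 + [1],                      # 9 of 10 are zero
        "some_zero": [0, 0, 0, 0, 0, 1, 2, 3, 4, 5],       # 5 of 10 are zero
        "zero_with_na": [0.0] * 8 + [float("nan")] * 2,    # 8 of 8 non-NA are zero
    }
)

filtered = filter_percent_zero_sketch(demo)
print(list(filtered.columns))  # ['some_zero']
```

In this sketch, zero_with_na is removed because NA values are left out of the denominator: all 8 of its non-NA values are zero, whereas counting all 10 rows would give only 80%. The column mostly_zero is removed because the comparison is "greater than or equal to", so exactly 90% meets the default threshold.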

Examples

>>> import clarite
>>> nhanes_filtered = clarite.modify.colfilter_percent_zero(nhanes_filtered)
================================================================================
Running colfilter_percent_zero
--------------------------------------------------------------------------------
WARNING: 36 variables need to be categorized into a type manually
Testing 483 of 483 continuous variables
        Removed 30 (6.21%) tested continuous variables which were equal to zero in at least 90.00% of non-NA observations.