clarite.modify.colfilter_percent_zero

clarite.modify.colfilter_percent_zero(data: pandas.core.frame.DataFrame, filter_percent: float = 90.0, skip: Union[str, List[str], NoneType] = None, only: Union[str, List[str], NoneType] = None)

Remove continuous variables in which filter_percent percent or more of the non-NA values are zero.
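Excluding NA values changes the threshold test, and the snippet below shows how. It computes the zero percentage for a made-up variable in two ways. It is plain pandas, not a call into CLARITE. It assumes the denominator is the number of non-NA observations, which is what the log message in the example below suggests.

```python
import numpy as np
import pandas as pd

# Hypothetical variable: 9 zeros, 1 non-zero value, 5 missing values
values = pd.Series([0.0] * 9 + [1.0] + [np.nan] * 5)

# Denominator restricted to observed (non-NA) values: 9 / 10
non_na = values.dropna()
pct_zero_non_na = 100.0 * (non_na == 0).sum() / len(non_na)

# For contrast, a denominator that also counts the missing rows: 9 / 15
pct_zero_all_rows = 100.0 * (values == 0).sum() / len(values)

filter_percent = 90.0
print(pct_zero_non_na, pct_zero_non_na >= filter_percent)
print(pct_zero_all_rows, pct_zero_all_rows >= filter_percent)
```

With NA excluded, the variable reaches the default 90% threshold and would be removed. If the missing rows were counted, it would fall to 60% and be kept.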

Parameters:
data: pd.DataFrame

The DataFrame to be processed and returned

filter_percent: float, default 90.0

If the percentage of a variable's non-NA values that are equal to zero is greater than or equal to this value, the variable is filtered out.

skip: str, list or None (default is None)

A variable name or list of variable names that the filter should not be applied to

only: str, list or None (default is None)

A variable name or list of variable names; if given, the filter is applied only to these variables

Returns:
data: pd.DataFrame

The filtered DataFrame
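The sketch below shows how the parameters interact. It uses a hypothetical stand-in named colfilter_percent_zero_sketch and is not CLARITE's implementation. It makes three assumptions, each labeled in the code. First, numeric columns stand in for CLARITE's own "continuous" type. Second, skip and only are treated as mutually exclusive. Third, columns with no non-NA values are left untouched.

```python
from typing import List, Optional, Union

import pandas as pd


def colfilter_percent_zero_sketch(
    data: pd.DataFrame,
    filter_percent: float = 90.0,
    skip: Optional[Union[str, List[str]]] = None,
    only: Optional[Union[str, List[str]]] = None,
) -> pd.DataFrame:
    """Hypothetical re-creation of the documented behavior; not CLARITE's code."""
    # Accept a single variable name as well as a list, mirroring the documented types
    if isinstance(skip, str):
        skip = [skip]
    if isinstance(only, str):
        only = [only]
    if skip is not None and only is not None:
        # Assumption of this sketch: the two options are mutually exclusive
        raise ValueError("Specify 'skip' or 'only', not both")

    # Assumption: numeric columns stand in for CLARITE's "continuous" variables
    tested = list(data.select_dtypes(include="number").columns)
    if only is not None:
        tested = [c for c in tested if c in only]
    if skip is not None:
        tested = [c for c in tested if c not in skip]

    to_remove = []
    for column in tested:
        non_na = data[column].dropna()
        if len(non_na) == 0:
            continue  # Assumption: all-NA columns are left untouched here
        pct_zero = 100.0 * (non_na == 0).sum() / len(non_na)
        if pct_zero >= filter_percent:
            to_remove.append(column)

    return data.drop(columns=to_remove)


df = pd.DataFrame({
    "a": [0, 0, 0, 0, 1],                # 4 of 5 values are zero (80%)
    "b": [0, 0, 0, 0, 0],                # all values are zero (100%)
    "c": [1, 2, 3, 4, 5],                # no zeros (0%)
    "label": ["x", "y", "x", "y", "x"],  # non-numeric, never tested
})

print(list(colfilter_percent_zero_sketch(df).columns))                           # ['a', 'c', 'label']
print(list(colfilter_percent_zero_sketch(df, 80.0).columns))                     # ['c', 'label']
print(list(colfilter_percent_zero_sketch(df, 80.0, skip="a").columns))           # ['a', 'c', 'label']
print(list(colfilter_percent_zero_sketch(df, 80.0, only=["a", "c"]).columns))    # ['b', 'c', 'label']
```

In the last call, "b" survives even though it is entirely zero, because only restricts testing to "a" and "c".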

Examples

>>> import clarite
>>> nhanes_filtered = clarite.modify.colfilter_percent_zero(nhanes_filtered)
================================================================================
Running colfilter_percent_zero
--------------------------------------------------------------------------------
WARNING: 36 variables need to be categorized into a type manually
Testing 483 of 483 continuous variables
        Removed 30 (6.21%) tested continuous variables which were equal to zero in at least 90.00% of non-NA observations.