clarite.modify.colfilter_percent_zero
clarite.modify.colfilter_percent_zero(data: pandas.core.frame.DataFrame, filter_percent: float = 90.0, skip: Union[str, List[str], None] = None, only: Union[str, List[str], None] = None)
Remove continuous variables in which filter_percent percent or more of the values (excluding NA) are zero.
Parameters:
- data: pd.DataFrame
The DataFrame to be processed and returned
- filter_percent: float, default 90.0
If the percentage of non-NA observations of a variable that are equal to zero is greater than or equal to this value, the variable is filtered out.
- skip: str, list or None (default is None)
Variable name or list of variable names that the filter should not be applied to
- only: str, list or None (default is None)
Variable name or list of variable names; when given, the filter is applied only to these variables
Returns:
- data: pd.DataFrame
The filtered DataFrame
Examples
>>> import clarite
>>> nhanes_filtered = clarite.modify.colfilter_percent_zero(nhanes_filtered)
================================================================================
Running colfilter_percent_zero
--------------------------------------------------------------------------------
WARNING: 36 variables need to be categorized into a type manually
Testing 483 of 483 continuous variables
Removed 30 (6.21%) tested continuous variables which were equal to zero in at least 90.00% of non-NA observations.
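
To make the filtering rule concrete, the following is a minimal pandas-only sketch of the behaviour described above. It is not CLARITE's implementation: the helper name percent_zero_filter_sketch is hypothetical, and it assumes that "continuous" can be approximated by a numeric dtype, whereas CLARITE tracks variable types itself. It also does not reproduce the log output shown in the example.

```python
import numpy as np
import pandas as pd


def percent_zero_filter_sketch(data, filter_percent=90.0, skip=None, only=None):
    """Hypothetical helper (not part of CLARITE): drop numeric columns whose
    share of zeros among non-NA values is >= filter_percent."""
    # Accept a single name or a list, mirroring the documented str/list/None types
    skip = [skip] if isinstance(skip, str) else list(skip or [])
    only = [only] if isinstance(only, str) else list(only or [])

    # Assumption: numeric dtype stands in for "continuous" in this sketch
    candidates = [
        col for col in data.select_dtypes(include="number").columns
        if col not in skip and (not only or col in only)
    ]

    to_drop = []
    for col in candidates:
        values = data[col].dropna()  # NA values are excluded from the percentage
        if len(values) == 0:
            continue  # nothing to test in an all-NA column
        percent_zero = (values == 0).sum() * 100 / len(values)
        if percent_zero >= filter_percent:
            to_drop.append(col)
    return data.drop(columns=to_drop)


df = pd.DataFrame({
    "mostly_zero": [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],              # 9 of 10 values are zero
    "zero_with_na": [0, 0, 0, np.nan, np.nan, 0, 0, 0, 0, 0],   # all 8 non-NA values are zero
    "varied": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],                   # 1 of 10 values is zero
})

filtered = percent_zero_filter_sketch(df)
protected = percent_zero_filter_sketch(df, skip="mostly_zero")
restricted = percent_zero_filter_sketch(df, only="varied")
```

With the default threshold, mostly_zero sits exactly at 90% and is removed because the comparison is "greater than or equal to". zero_with_na is removed because its percentage is computed over the 8 non-NA values only. Passing skip="mostly_zero" keeps that column regardless of its zero share, and only="varied" restricts testing to a column that passes, so nothing is dropped.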