clarite.modify.colfilter_min_cat_n
clarite.modify.colfilter_min_cat_n(data, n: int = 200, skip: Union[str, List[str], None] = None, only: Union[str, List[str], None] = None)
Remove binary and categorical variables in which at least one unique value has fewer than <n> occurrences
Parameters: - data: pd.DataFrame
The DataFrame to be processed and returned
- n: int, default 200
The minimum number of occurrences that every unique value must have for a variable to be kept
- skip: str, list or None (default is None)
Variable name or list of variable names that the filter should not be applied to
- only: str, list or None (default is None)
Variable name or list of variable names that the filter should be restricted to
Returns: - data: pd.DataFrame
The filtered DataFrame
Examples
>>> import clarite
>>> nhanes_filtered = clarite.modify.colfilter_min_cat_n(nhanes)
================================================================================
Running colfilter_min_cat_n
--------------------------------------------------------------------------------
WARNING: 36 variables need to be categorized into a type manually
Testing 362 of 362 binary variables
Removed 248 (68.51%) tested binary variables which had a category with less than 200 values
Testing 47 of 47 categorical variables
Removed 36 (76.60%) tested categorical variables which had a category with less than 200 values
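The filtering rule can also be illustrated with plain pandas. The following is a minimal sketch under stated assumptions, not CLARITE's actual implementation: the helper name `min_cat_n_sketch` is hypothetical, it treats every column with the pandas `category` dtype as a variable to test (CLARITE uses its own binary/categorical type inference), it ignores missing values and unobserved categories when counting, and it does not model the `skip` and `only` parameters.

```python
import pandas as pd


def min_cat_n_sketch(data: pd.DataFrame, n: int = 200) -> pd.DataFrame:
    """Hypothetical helper (not part of CLARITE): drop category-dtype
    columns in which any observed value occurs fewer than n times."""
    to_drop = []
    for name in data.columns:
        col = data[name]
        if not isinstance(col.dtype, pd.CategoricalDtype):
            continue  # assumption: only category-dtype columns are tested
        counts = col.value_counts()  # missing values are excluded by default
        counts = counts[counts > 0]  # assumption: ignore unobserved categories
        if (counts < n).any():
            to_drop.append(name)
    return data.drop(columns=to_drop)


# Small made-up DataFrame, for illustration only
df = pd.DataFrame({
    "smoker": pd.Categorical(["yes"] * 3 + ["no"] * 3),
    "rare_flag": pd.Categorical(["a"] * 5 + ["b"]),
    "age": [30, 41, 52, 63, 74, 85],
})
filtered = min_cat_n_sketch(df, n=2)
print(list(filtered.columns))  # ['smoker', 'age']
```

Here `rare_flag` is dropped because its value "b" occurs only once, which is below the threshold of 2, while `smoker` is kept because both of its values occur three times. The non-categorical `age` column is never tested, so it is returned unchanged.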