clarite.modify.remove_outliers
clarite.modify.remove_outliers(data, method: str = 'gaussian', cutoff=3, skip: Union[str, List[str], NoneType] = None, only: Union[str, List[str], NoneType] = None)
Remove outliers from continuous variables by replacing them with np.nan
Parameters:
- data: pd.DataFrame
The DataFrame to be processed and returned
- method: string, 'gaussian' (default) or 'iqr'
Define outliers using a gaussian approach (standard deviations from the mean) or inter-quartile range
- cutoff: positive numeric, default of 3
Either the number of standard deviations from the mean (method='gaussian') or the multiple of the IQR (method='iqr'). Any values equal to or more extreme than the resulting bounds will be replaced with np.nan
- skip: str, list or None (default is None)
Variable name or list of variable names that the replacement should not be applied to
- only: str, list or None (default is None)
Variable name or list of variable names that the replacement should be restricted to
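The two cutoff rules described above can be sketched with plain pandas. This is only an illustration of the arithmetic, not CLARITE's actual implementation: the helper names `outlier_bounds` and `mask_outliers` are hypothetical, the sample standard deviation (pandas' default) is an assumption, and strict inequalities are used to match the wording of the log messages in the examples.

```python
import pandas as pd


def outlier_bounds(values: pd.Series, method: str = "gaussian", cutoff: float = 3.0):
    """Return (low, high) bounds for one continuous variable (hypothetical helper)."""
    if method == "gaussian":
        # Bounds are `cutoff` standard deviations on either side of the mean.
        # Assumption: sample standard deviation (ddof=1, the pandas default).
        center, spread = values.mean(), values.std()
        return center - cutoff * spread, center + cutoff * spread
    if method == "iqr":
        # Bounds are `cutoff` multiples of the IQR beyond the 1st and 3rd quartiles.
        q1, q3 = values.quantile(0.25), values.quantile(0.75)
        iqr = q3 - q1
        return q1 - cutoff * iqr, q3 + cutoff * iqr
    raise ValueError(f"Unknown method: {method!r}")


def mask_outliers(values: pd.Series, method: str = "gaussian", cutoff: float = 3.0) -> pd.Series:
    """Return a copy of `values` with entries outside the bounds replaced by NaN."""
    low, high = outlier_bounds(values, method, cutoff)
    # Series.mask replaces entries where the condition is True with NaN.
    return values.mask((values < low) | (values > high))


example = pd.Series([1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0, 4.0, 5.0, 100.0])
cleaned = mask_outliers(example, method="iqr", cutoff=1.5)
print(int(cleaned.isna().sum()))  # number of values replaced with NaN
```

On this small series the IQR rule flags the value 100.0, while the gaussian rule with a cutoff of 3 flags nothing, because the extreme value itself inflates the standard deviation. That difference is one reason to try both methods on skewed variables.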
Examples
>>> import clarite
>>> nhanes_rm_outliers = clarite.modify.remove_outliers(nhanes, method='iqr', cutoff=1.5, only=['DR1TVB1', 'URXP07', 'SMQ077'])
================================================================================
Running remove_outliers
--------------------------------------------------------------------------------
WARNING: 36 variables need to be categorized into a type manually
Removing outliers from 2 continuous variables with values < 1st Quartile - (1.5 * IQR) or > 3rd quartile + (1.5 * IQR)
Removed 0 low and 430 high IQR outliers from URXP07 (outside -153.55 to 341.25)
Removed 0 low and 730 high IQR outliers from DR1TVB1 (outside -0.47 to 3.48)
>>> nhanes_rm_outliers = clarite.modify.remove_outliers(nhanes, only=['DR1TVB1', 'URXP07'])
================================================================================
Running remove_outliers
--------------------------------------------------------------------------------
WARNING: 36 variables need to be categorized into a type manually
Removing outliers from 2 continuous variables with values more than 3 standard deviations from the mean
Removed 0 low and 42 high gaussian outliers from URXP07 (outside -1,194.83 to 1,508.13)
Removed 0 low and 301 high gaussian outliers from DR1TVB1 (outside -1.06 to 4.27)