Survey

Complex survey design

class clarite.survey.SurveyDesignSpec(survey_df: pandas.core.frame.DataFrame, strata: Optional[str] = None, cluster: Optional[str] = None, nest: bool = False, weights: Optional[Union[str, Dict[str, str]]] = None, fpc: Optional[str] = None, single_cluster: Optional[str] = 'fail', drop_unweighted: bool = False)
Holds parameters for building a statsmodels SurveyDesign object

Parameters

survey_df: pd.DataFrame
A DataFrame containing the cluster, strata, and/or weights data. It should include every observation in the data being analyzed; rows are matched by index value.

strata: string or None
The name of the strata variable in the survey_df

cluster: string or None
The name of the cluster variable in the survey_df

nest: bool, default False
Whether the clusters are nested within the strata (that is, the same cluster IDs are reused in different strata)

weights: string or dictionary(string:string)
The name of the weights variable in the survey_df, or a dictionary mapping variable names to weight names

fpc: string or None
The name of the variable in the survey_df that contains the finite population correction information. This reduces variance when a substantial portion of the population is sampled. May be specified as the total population size, or the fraction of the population that was sampled.

single_cluster: {‘fail’, ‘adjust’, ‘average’, ‘certainty’}
Setting controlling variance calculation in single-cluster (‘lonely PSU’) strata:
‘fail’: default, raise an error
‘adjust’: use the average of all observations (more conservative)
‘average’: use the average value of other strata
‘certainty’: that stratum doesn’t contribute to the variance (0 variance)

drop_unweighted: bool, default False
If True, drop observations that are missing a weight value; this may not be statistically sound. If False, the result is NULL for any variable that has observations with a non-missing value but a missing weight.
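
The nest and single_cluster parameters both depend on how cluster IDs are distributed across strata. The following is a minimal sketch in plain Python, not CLARITE's implementation: the rows and the helper names distinct_clusters and single_cluster_strata are hypothetical and exist only to illustrate what "nested" cluster IDs and "lonely PSU" strata look like in the data.

```python
from collections import defaultdict

# Hypothetical design rows: one (stratum, cluster) pair per observation.
# Cluster IDs 1 and 2 are reused across strata, as in a nested design.
rows = [
    ("A", 1), ("A", 1), ("A", 2),
    ("B", 1), ("B", 2), ("B", 2),
    ("C", 1),  # stratum C contains only one cluster
]


def distinct_clusters(rows, nest):
    """Count clusters; with nest=True an ID is only unique within its stratum."""
    if nest:
        return len({(stratum, cluster) for stratum, cluster in rows})
    return len({cluster for _, cluster in rows})


def single_cluster_strata(rows):
    """Return the strata that contain exactly one distinct cluster."""
    clusters_by_stratum = defaultdict(set)
    for stratum, cluster in rows:
        clusters_by_stratum[stratum].add(cluster)
    return sorted(s for s, ids in clusters_by_stratum.items() if len(ids) == 1)


print(distinct_clusters(rows, nest=False))  # 2
print(distinct_clusters(rows, nest=True))   # 5
print(single_cluster_strata(rows))          # ['C']
```

Without nesting, the reused IDs collapse into two clusters; with nesting, each (stratum, cluster) pair counts separately. Stratum C is the kind of stratum whose handling the single_cluster setting controls.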

Examples
>>> import clarite
>>> clarite.survey.SurveyDesignSpec(survey_df=survey_design_replication,
...                                 strata="SDMVSTRA",
...                                 cluster="SDMVPSU",
...                                 nest=True,
...                                 weights=weights_replication,
...                                 fpc=None,
...                                 single_cluster='fail')
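
The example above passes a weights_replication object that is not defined on this page. As a hedged illustration of the dictionary form of the weights parameter, the sketch below uses plain Python with made-up variable names, weight names, and observation IDs. The helpers weight_column and unweighted_ids are hypothetical, not part of CLARITE; they only show how a per-variable mapping resolves to a weight name and which observations drop_unweighted would concern.

```python
# Hypothetical mapping of analysis variables to weight names (illustrative only).
weights_by_variable = {
    "glucose": "WTSAF2YR",
    "cotinine": "WTMEC2YR",
}

# Hypothetical survey rows keyed by observation ID; None marks a missing weight.
survey_rows = {
    101: {"WTSAF2YR": 2500.0, "WTMEC2YR": 1800.0},
    102: {"WTSAF2YR": None, "WTMEC2YR": 2100.0},
    103: {"WTSAF2YR": 3100.0, "WTMEC2YR": 1950.0},
}


def weight_column(variable, weights):
    """Resolve the weight name for a variable.

    A plain string applies to every variable; a dictionary is looked up
    by variable name.
    """
    if isinstance(weights, str):
        return weights
    return weights[variable]


def unweighted_ids(variable, weights, rows):
    """Return the IDs of observations whose weight for this variable is missing."""
    column = weight_column(variable, weights)
    return sorted(obs_id for obs_id, row in rows.items() if row[column] is None)


print(weight_column("glucose", weights_by_variable))                 # WTSAF2YR
print(unweighted_ids("glucose", weights_by_variable, survey_rows))   # [102]
print(unweighted_ids("cotinine", weights_by_variable, survey_rows))  # []
```

In this sketch, observation 102 has no weight for "glucose", so it is the kind of row that drop_unweighted=True would remove before analysis of that variable.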