clarite.survey.SurveyDesignSpec

class clarite.survey.SurveyDesignSpec(survey_df: pandas.core.frame.DataFrame, strata: Optional[str] = None, cluster: Optional[str] = None, nest: bool = False, weights: Union[str, Dict[str, str]] = None, fpc: Optional[str] = None, single_cluster: Optional[str] = 'error')

Holds the parameters needed to build a statsmodels SurveyDesign object.

Parameters:
survey_df: pd.DataFrame

A DataFrame containing the cluster, strata, and/or weights data

strata: string or None

The name of the strata variable in the survey_df

cluster: string or None

The name of the cluster variable in the survey_df

nest: bool, default False

Whether the clusters are nested within the strata, i.e. the same cluster IDs are reused in different strata

weights: string or dictionary(string:string)

The name of the weights variable in the survey_df, or a dictionary mapping variable names to weight names

fpc: string or None

The name of the variable in the survey_df that contains the finite population correction information. This reduces variance when a substantial portion of the population is sampled. May be specified as the total population size, or the fraction of the population that was sampled.

single_cluster: str

Setting controlling variance calculation in single-cluster ('lonely PSU') strata:

'error': default, raise an error
'scaled': use the average value of the other strata
'centered': use the average of all observations
'certainty': that stratum does not contribute to the variance
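To make the nest and single_cluster descriptions above concrete, the sketch below counts clusters in a toy design and finds its single-cluster strata. It is plain Python for illustration only and is not CLARITE's implementation; the function names `count_clusters` and `lonely_strata` and the toy rows are assumptions made for this example.

```python
from collections import defaultdict

# Toy design (hypothetical data): one (stratum, cluster) pair per observation.
# Cluster IDs 1 and 2 are reused in strata "A" and "B"; stratum "C" has one cluster.
ROWS = [("A", 1), ("A", 1), ("A", 2), ("B", 1), ("B", 2), ("B", 2), ("C", 1)]


def count_clusters(rows, nest=False):
    """Count distinct clusters.

    With nest=True a cluster ID is only unique within its stratum, so a
    cluster is identified by the (stratum, cluster) pair.
    """
    if nest:
        return len({(stratum, cluster) for stratum, cluster in rows})
    return len({cluster for _, cluster in rows})


def lonely_strata(rows):
    """Return the strata that contain exactly one distinct cluster."""
    clusters = defaultdict(set)
    for stratum, cluster in rows:
        clusters[stratum].add(cluster)
    return sorted(s for s, ids in clusters.items() if len(ids) == 1)


print(count_clusters(ROWS))             # 2
print(count_clusters(ROWS, nest=True))  # 5
print(lonely_strata(ROWS))              # ['C']
```

Stratum "C" is the kind of case the single_cluster setting addresses: under the default 'error' it is reported as an error, while the other settings choose how (or whether) it contributes to the variance.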

Examples

>>> import clarite
>>> clarite.survey.SurveyDesignSpec(survey_df=survey_design_replication,
...                                 strata="SDMVSTRA",
...                                 cluster="SDMVPSU",
...                                 nest=True,
...                                 weights=weights_replication,
...                                 fpc=None,
...                                 single_cluster='scaled')
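The example above passes weights_replication without showing what it holds. According to the weights parameter, it may be either a single weight column name or a dictionary mapping variable names to weight column names. The sketch below shows both forms; the helper `weight_column_for` is hypothetical (it is not part of CLARITE), raising KeyError for an unmapped variable is this sketch's own choice, and the NHANES-style column names are illustrative placeholders.

```python
def weight_column_for(variable, weights):
    """Return the weight column to use for `variable`.

    `weights` is either one column name (applied to every variable) or a
    dict mapping variable names to weight column names.
    """
    if isinstance(weights, str):
        return weights
    if variable not in weights:
        # Assumption of this sketch: a variable without a weight is an error.
        raise KeyError(f"No weight specified for variable {variable!r}")
    return weights[variable]


# Form 1: the same weight column for every variable (illustrative name).
single_weight = "WTMEC2YR"

# Form 2: a different weight column per variable (illustrative names).
per_variable = {"LBXGLU": "WTSAF2YR", "BMXBMI": "WTMEC2YR"}

print(weight_column_for("LBXGLU", single_weight))  # WTMEC2YR
print(weight_column_for("LBXGLU", per_variable))   # WTSAF2YR
```

The dictionary form is useful when different variables were measured on different subsamples and therefore need different weights.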
__init__(self, survey_df: pandas.core.frame.DataFrame, strata: Union[str, NoneType] = None, cluster: Union[str, NoneType] = None, nest: bool = False, weights: Union[str, Dict[str, str]] = None, fpc: Union[str, NoneType] = None, single_cluster: Union[str, NoneType] = 'error')

Initialize self. See help(type(self)) for accurate signature.

Methods

__init__(self, survey_df, strata, …)               Initialize self.
get_survey_design(self, regression_variable, …)    Build a survey design based on the regression variable.