match module

This module implements several variants of matching: one-to-one matching, one-to-many matching, with or without a caliper, and with or without replacement. Variants of these methods are examined in Austin (2014).

Austin, P. C. (2014). A comparison of 12 algorithms for matching on the propensity score. Statist. Med., 33: 1057-1069.

class pscore_match.match.Match(groups, propensity)[source]
Parameters:
  • groups (array-like) – treatment assignments, must be 2 groups
  • propensity (array-like) – object containing propensity scores for each observation. Propensity and groups should be in the same order (matching indices)
create(method='one-to-one', **kwargs)[source]
Parameters:
  • method (string) – ‘one-to-one’ (default) or ‘many-to-one’
  • caliper_scale (string) – “propensity” (default) if caliper is a maximum difference in propensity scores, “logit” if caliper is a maximum SD of logit propensity, or “none” for no caliper
  • caliper (float) – specifies maximum distance (difference in propensity scores or SD of logit propensity)
  • replace (bool) – should individuals from the larger group be allowed to match multiple individuals in the smaller group? (default is False)
Returns:

  • A series containing the individuals in the control group matched to the treatment group. Note that with caliper matching, some treated individuals may have no match.
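
To make the caliper_scale, caliper, and replace options concrete, the following is a minimal sketch of greedy nearest-neighbour one-to-one matching in plain Python. It is not the implementation used by Match.create; the helper name greedy_match, the 0/1 coding of groups, and the choice to match each treated unit to controls in input order are all assumptions made for illustration.

```python
import math
import statistics


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def greedy_match(groups, propensity, caliper=0.05,
                 caliper_scale="propensity", replace=False):
    """Hypothetical helper: pair each treated unit (1) with its nearest control (0).

    Returns {treated_index: control_index}. Treated units with no control
    inside the caliper are left out of the result.
    """
    if caliper_scale == "logit":
        scores = [logit(p) for p in propensity]
        # Caliper is a multiple of the SD of the logit propensity score.
        max_dist = caliper * statistics.stdev(scores)
    elif caliper_scale == "propensity":
        scores = list(propensity)
        max_dist = caliper
    else:  # "none": any distance is acceptable
        scores = list(propensity)
        max_dist = math.inf

    treated = [i for i, g in enumerate(groups) if g == 1]
    available = [i for i, g in enumerate(groups) if g == 0]
    matches = {}
    for t in treated:
        if not available:
            break
        best = min(available, key=lambda c: abs(scores[c] - scores[t]))
        if abs(scores[best] - scores[t]) <= max_dist:
            matches[t] = best
            if not replace:
                # Without replacement, a control can be used only once.
                available.remove(best)
    return matches


groups = [1, 1, 1, 0, 0, 0, 0]
propensity = [0.80, 0.52, 0.30, 0.78, 0.50, 0.49, 0.10]
matched = greedy_match(groups, propensity, caliper=0.05)
```

In this toy data the third treated unit (score 0.30) has no control within 0.05, so matched contains only two pairs. This is the situation the note on caliper matching describes.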

plot_balance(covariates, test=['t', 'rank'], filename='balance-plot', **kwargs)[source]

Plot the p-values for covariate balance before and after matching

Parameters:
  • matches (Match) – Match class object with matches already fit
  • covariates (DataFrame) – DataFrame with all observations and one covariate per column.
  • test (array-like or str) – Statistical test to compare treatment and control covariate distributions. Options are ‘t’ for a two-sample t-test or ‘rank’ for the Wilcoxon rank sum test
  • filename (str) – Optional, name of file to save plot in. Default ‘balance-plot’
  • kwargs (dict) – Keyword arguments to pass into plotly.offline.plot
Returns:

  • None

Notes

Creates a file with the given filename.

pscore_match.match.rank_test(covariates, groups)[source]

Wilcoxon rank sum test for the distribution of treatment and control covariates.

Parameters:
  • covariates (DataFrame) – Dataframe with one covariate per column. If matches are with replacement, then duplicates should be included as additional rows.
  • groups (array-like) – treatment assignments, must be 2 groups
Returns:

  • A list of p-values, one for each column in covariates
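
As an illustration of what a per-column rank sum test computes, here is a minimal standard-library sketch that uses the large-sample normal approximation, with no tie or continuity correction. It is not the code behind rank_test; the helper names average_ranks and rank_sum_pvalue, the dict-of-lists stand-in for a DataFrame, and the 0/1 group coding are assumptions.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def rank_sum_pvalue(column, groups):
    """Two-sided p-value for the rank sum of group 1 (normal approximation)."""
    ranks = average_ranks(column)
    n1 = sum(1 for g in groups if g == 1)
    n0 = len(groups) - n1
    w = sum(r for r, g in zip(ranks, groups) if g == 1)
    mean_w = n1 * (n1 + n0 + 1) / 2.0
    sd_w = math.sqrt(n1 * n0 * (n1 + n0 + 1) / 12.0)
    z = (w - mean_w) / sd_w
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# One list per covariate, standing in for DataFrame columns.
covariates = {"age": [34, 45, 29, 51, 38, 42], "income": [10, 90, 20, 80, 30, 70]}
groups = [1, 0, 1, 0, 1, 0]
p_values = [rank_sum_pvalue(col, groups) for col in covariates.values()]
```

The result has one p-value per covariate, mirroring the shape of the return value described above. For small samples an exact test would be more appropriate than this approximation.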

pscore_match.match.t_test(covariates, groups)[source]

Two-sample t-test for the distribution of treatment and control covariates.

Parameters:
  • covariates (DataFrame) – Dataframe with one covariate per column. If matches are with replacement, then duplicates should be included as additional rows.
  • groups (array-like) – treatment assignments, must be 2 groups
Returns:

  • A list of p-values, one for each column in covariates
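
For intuition about the statistic behind each p-value, the sketch below computes a two-sample t statistic and its degrees of freedom for a single covariate using only the standard library. It uses the unequal-variance (Welch) form, which is an assumption: the source does not say whether t_test pools variances. The helper name welch_t and the 0/1 group coding are also hypothetical. Turning the statistic into a p-value requires a t-distribution CDF, which is omitted here.

```python
import math
import statistics


def welch_t(column, groups):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    treated = [x for x, g in zip(column, groups) if g == 1]
    control = [x for x, g in zip(column, groups) if g == 0]
    # Squared standard error of each group mean.
    v1 = statistics.variance(treated) / len(treated)
    v0 = statistics.variance(control) / len(control)
    t_stat = (statistics.mean(treated) - statistics.mean(control)) / math.sqrt(v1 + v0)
    df = (v1 + v0) ** 2 / (
        v1 ** 2 / (len(treated) - 1) + v0 ** 2 / (len(control) - 1)
    )
    return t_stat, df


t_stat, df = welch_t([1.0, 2.0, 3.0, 2.0, 3.0, 4.0], [1, 1, 1, 0, 0, 0])
```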

pscore_match.match.whichMatched(matches, data, show_duplicates=True)[source]

Simple function to convert the output of Match to a DataFrame of all matched observations.

Parameters:
  • matches (Match) – Match class object with matches already fit
  • data (DataFrame) – Dataframe with unique rows, for which we want to create new matched data. This may be a dataframe of covariates, treatment, outcome, or any combination.
  • show_duplicates (bool) – Should repeated matches be included as multiple rows? Default is True. If False, then duplicates appear as one row but a column of weights is added.
Returns:

  • DataFrame containing only the treatment group and matched controls, with the same columns as input data
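
The difference between the two show_duplicates settings can be sketched as follows: with replacement, a control may be matched more than once, and its row can either be repeated or kept once with a weight. This is an illustration of the idea only, not the code in whichMatched; the helper name collapse_duplicates and the list of matched row indices are assumptions.

```python
from collections import Counter


def collapse_duplicates(matched_ids):
    """Return (unique_ids, weights), keeping ids in first-seen order."""
    counts = Counter(matched_ids)  # preserves insertion order
    unique_ids = list(counts)
    weights = [counts[i] for i in unique_ids]
    return unique_ids, weights


# Row 5 is a control that was matched to two treated units.
matched_ids = [0, 1, 2, 5, 5, 7]
unique_ids, weights = collapse_duplicates(matched_ids)
```

Here matched_ids corresponds to the repeated-rows view, while unique_ids together with weights corresponds to the one-row-per-individual view with an added weight column.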