---
title: "Matched Set Objects"
author: "In Song Kim, Adam Rauh, Erik Wang, Kosuke Imai"
date: "`r Sys.Date()`"
output:
  html_document:
    df_print: paged
  pdf_document:
    number_sections: no
  fig_width: 6 
  fig_height: 4 
vignette: |
  %\VignetteIndexEntry{Matched Set Objects} 
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```


This vignette will take a more detailed look at the `matched.set` object, which is the core object within the package that captures all of the information associated with treated units and their matched control units. 

First, we will create a smaller subset of the `dem` data set, which is included in the package. This is just to make our results easier to read. We use the DisplayTreatment function to get a sense of treatment variation within the subset of data. 
```{r}
library(PanelMatch)

uid <-unique(dem$wbcode2)[1:10]
subdem <- dem[dem$wbcode2 %in% uid, ]
DisplayTreatment(unit.id = "wbcode2", time.id = "year", treatment = 'dem', data = subdem)
```


### PanelMatch and matched.set objects

We can use the `PanelMatch` function with `refinement.method` set to `none` to obtain a `PanelMatch` object, from which we will extract a `matched.set` object. `PanelMatch` returns an S3 object of the `PanelMatch` class. These objects are just lists with some additional attributes. Here, we will focus on one element contained within `PanelMatch` objects: `matched.set` objects. Within the `PanelMatch` object, this element is always named either `att` or `atc`. When `qoi = ate`, then there are two `matched.set` objects included in the resulting `PanelMatch` call. Specifically, there will be two matched sets named `att` and `atc`, respectively.

```{r}

PM.results <- PanelMatch(lag = 4, time.id = "year", unit.id = "wbcode2", 
                         treatment = "dem", refinement.method = "none", 
                         data = subdem, match.missing = TRUE, 
                         qoi = "att" ,outcome.var = "y",
                         lead = 0, forbid.treatment.reversal = FALSE)

#Extract the matched.set object 
msets <- PM.results$att                         

```

### What are matched.set objects? 
In implementation, the `matched.set` is just a named list with some added attributes (lag, names of treatment, unit, and time variables) and a structured name scheme. Each entry in the list is a vector containing the unit ids of control units that are in a matched set. Additionally, each entry corresponds to a time/unit id pair (the unit id of a treated unit and the time at which treatment occurred). This is reflected in the names of each element of the list, as the name scheme `[id varable]`.`[time variable]` is used. 

Matched set objects are implemented as lists, but the default printing behavior resembles that of a data frame. One can toggle a `verbose` option on the `print` method to print as a list and also display a less summarized version of the matched set data.


```{r}
# observe the naming scheme
names(msets)

#data frame printing view: useful as a summary view with large data sets
# first column is unit id variable, second is time variable, and 
# third is the number of controls in that matched set
print(msets)

# prints as a list, shows all data at once
print(msets, verbose = TRUE)
```

### Control unit weights

Note that in the verbose print view above, one can see that each control unit will have an associated weight. These are the weights that are assigned from the refinement process. In this example, no refinement has been applied so each control unit in a matched set will have equal weight. See the `Using PanelMatch` vignette for more about refinement. 

Weights are attributes for each element in the `matched.set` list. As such, they can also be extracted as follows:

```{r}
# weights for control units in first matched set
attr(msets[[1]], "weights")
```

Note that this returns a vector of weights. The names of each element in the vector corresponds to the control unit that weight is associated with. Weights are normalized should always sum to 1 within each matched set. 

### Subsetting matched.set objects.

The '[' and '[[' operators are implemented and should work intuitively. 

Using '[' returns a subsetted `matched.set` object (list). The additional attributes will be copied and transferred as well with the custom operator. Note how, by default, it prints like the full form of the `matched.set`. Using '[[' will return the unit ids of the control units in the specified matched set.

Since `matched.set` objects are just lists with attributes, you can expect the `[` and `[[` functions to work similarly to how they would with a list. So, for instance, users can extract information about matched sets using numerical indices or by taking advantage of the naming scheme. 

```{r}
msets[1]

#prints the control units in this matched set
msets[[1]]


msets["4.1992"] #equivalent to msets[1]

msets[["4.1992"]] #equivalent to msets[[1]]

```

### Plotting with matched.set objects

Calling `plot` on a `matched.set` object will display a histogram of the sizes of the matched sets. By default, the number of empty matched sets (treated unit/time id pairs with no suitable controls for a match) is noted with a vertical bar at x = 0. One can include empty sets in the histogram by setting the `include.empty.sets` argument to `TRUE`

```{r}
plot(msets, xlim = c(0, 4))

# Use full data
PM.results.full <- PanelMatch(lag = 4, time.id = "year", unit.id = "wbcode2", 
                         treatment = "dem", refinement.method = "none", 
                         data = dem, match.missing = TRUE, 
                         qoi = "att" ,outcome.var = "y",
                         lead = 0, forbid.treatment.reversal = FALSE)

#Extract the matched.set object 
plot(PM.results.full$att)
```

### Summarizing matched set data

The `summary` function provides a variety of information about the sizes of matched sets, the unit and time ids of treated units, the number of empty sets, and the lag window size. The `summary` function also has an option to print only the `overview` data frame. Toggle this by setting `verbose = FALSE`

```{r}
print(summary(msets))

print(summary(msets, verbose = FALSE))
```


### Matched sets with the `DisplayTreatment` function
Passing a matched set (one treated unit and its corresponding set of controls) to the `DisplayTreatment` function will visually highlight the lag window histories used to create that matched set. There is also an option to only display units from the matched set (and the treated unit), which can be achieved by setting `show.set.only` to `TRUE`.

```{r}
#pass matched.set object using the `[` operator
DisplayTreatment(unit.id = "wbcode2", time.id = "year", treatment = 'dem', data = subdem, matched.set = msets[1])

# only show matched set units
DisplayTreatment(unit.id = "wbcode2", time.id = "year", treatment = 'dem', 
					data = subdem, matched.set = msets[1], 
					show.set.only = TRUE, y.size = 15, x.size = 13)
```