Simulation Model Notes

Modeling assumptions

Summary

k must be large (roughly 20 to 30) for variation in p to influence:

  1. the marker distribution (relevant plots)
  2. the proportion of observations with 'small' treatment effects, i.e. the proportion of events near the boundary (relevant plot)
  3. the distribution of treatment effects (relevant plots)
  4. the value of theta (relevant plots)

This may be why the worst-performing coverage was not strongly associated with any specific value of p. I wonder whether minimizing over p to find poor coverage is biasing our estimate of coverage downward. Action: check whether CI coverage is associated with p when k is high.
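One way the action item could be checked is to tabulate empirical coverage by p, restricted to runs with large k. The sketch below is only an illustration: the record layout `(p, k, covered)`, the function name `coverage_by_p`, and the toy records are all assumptions and are not taken from the simulation code or its output. The default threshold of 20 follows the "roughly 20 to 30" range noted above.

```python
from collections import defaultdict

def coverage_by_p(results, k_min=20):
    """Empirical CI coverage for each value of p, using only runs with k >= k_min.

    `results` is an iterable of (p, k, covered) records, where `covered` is True
    when the confidence interval contained the true theta. This record layout is
    hypothetical; adapt it to however the simulation stores its runs.
    """
    hits = defaultdict(int)
    runs = defaultdict(int)
    for p, k, covered in results:
        if k >= k_min:
            runs[p] += 1
            hits[p] += int(covered)
    # Values of p with no qualifying runs are simply absent from the result.
    return {p: hits[p] / runs[p] for p in sorted(runs)}

# Toy records for illustration only (NOT simulation output).
toy_results = [
    (0.1, 30, True),
    (0.1, 30, False),
    (0.5, 30, True),
    (0.5, 4, False),  # dropped: k is below the threshold
]
toy_coverage = coverage_by_p(toy_results)
```

If coverage varies clearly across p in such a table when k is high but not when k is low, that would be consistent with the explanation suggested above.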

Risk curves are fun to look at, but it is hard to draw any conclusions from them. Points 1 and 2 are more important than 3 and 4.

Risk Curves

Risk as a function of F(y), given treatment (trt), by q0 and q1. The true value of theta is shown in grey.

\( p = 0.1 \), \( k = 30 \).

plot of chunk unnamed-chunk-2

\( p = 0.1 \), \( k = 4 \).

plot of chunk unnamed-chunk-3

Treatment Effect Distribution

Distribution of \( \Delta(y) = Pr(D=1|T=0, Y=y) - Pr(D=1|T=1, Y=y) \) by \( F(y) \).
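As a rough illustration of this definition, the sketch below evaluates \( \Delta(y) \) against \( F(y) \) under a made-up risk model. The logistic form, the coefficients `A0`, `B0`, `A1`, `B1`, and the standard normal marker distribution are all assumptions chosen for illustration; none of them comes from the simulation, and p and k are deliberately not modeled.

```python
import math

def expit(x):
    """Logistic function: maps a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical coefficients, for illustration only (not from the simulation).
A0, B0 = -1.0, 0.5    # untreated arm: logit Pr(D=1 | T=0, Y=y) = A0 + B0 * y
A1, B1 = -1.0, -0.5   # treated arm:   logit Pr(D=1 | T=1, Y=y) = A1 + B1 * y

def risk(y, trt):
    """Pr(D=1 | T=trt, Y=y) under the assumed logistic model."""
    a, b = (A1, B1) if trt == 1 else (A0, B0)
    return expit(a + b * y)

def delta(y):
    """Treatment effect: risk without treatment minus risk with treatment."""
    return risk(y, 0) - risk(y, 1)

def marker_cdf(y):
    """F(y), assuming a standard normal marker (an assumption, not the notes' model)."""
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))

# Pairs (F(y), delta(y)) on a small grid, i.e. the quantities plotted against each other.
curve = [(marker_cdf(y), delta(y)) for y in (-2.0, -1.0, 0.0, 1.0, 2.0)]
```

With these particular coefficients the two risk curves cross at y = 0, so \( \Delta(y) \) changes sign there; a different choice of coefficients would move or remove the crossing.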

\( k = 30 \)

plot of chunk unnamed-chunk-4

\( k = 4 \)

plot of chunk unnamed-chunk-5

Theta by p

Color denotes k.

plot of chunk unnamed-chunk-6

Marker Distribution

Density (pdf) of y, colored by p. The vertical black line marks the value of y where \( \Delta(y)=0 \).

\( k=30 \)

plot of chunk unnamed-chunk-7

\( k=4 \)

plot of chunk unnamed-chunk-8

Proportion of markers near the boundary

This is a plot of p (x-axis) against \( F(y_{\Delta=0} + 0.005) - F(y_{\Delta=0} - 0.005) \), where \( y_{\Delta=0} \) is the marker value at which \( \Delta(y_{\Delta=0}) = 0 \). The half-width of 0.005 is arbitrary, so this is only a rough measure of the proportion of observations near the boundary of interest.
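The quantity on the y-axis can be sketched in two steps: locate the zero of \( \Delta \), then take the difference of F across a window of half-width 0.005 around it. The code below is a minimal illustration only. The bisection root finder, the standard normal F, and the example `delta` (a difference of two logistic risks with made-up coefficients) are assumptions, not the method or model used for the plot.

```python
import math

def normal_cdf(y, mean=0.0, sd=1.0):
    """Normal CDF; stands in for F under an assumed normal marker distribution."""
    return 0.5 * (1.0 + math.erf((y - mean) / (sd * math.sqrt(2.0))))

def find_zero(f, lo, hi, tol=1e-10):
    """Bisection search for a zero of f on [lo, hi]; f must change sign on the interval."""
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("f must change sign on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid                 # sign change lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid    # sign change lies in [mid, hi]
    return 0.5 * (lo + hi)

def proportion_near_boundary(cdf, y_zero, half_width=0.005):
    """F(y_zero + half_width) - F(y_zero - half_width)."""
    return cdf(y_zero + half_width) - cdf(y_zero - half_width)

def expit(x):
    """Logistic function: maps a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def delta(y):
    """Hypothetical treatment effect with made-up coefficients (illustration only)."""
    return expit(-1.0 + 0.5 * y) - expit(-0.75)

y_zero = find_zero(delta, -5.0, 5.0)
prop = proportion_near_boundary(normal_cdf, y_zero)
```

Because the window is narrow, the result is close to 0.01 times the marker density at the boundary, which is why this measure tracks how much of the marker distribution sits near \( \Delta(y) = 0 \).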

Color denotes k.

plot of chunk unnamed-chunk-9