# Simulation Model Notes

Y is a mixture of two zero-mean normals with density \( f(y) = p\,\phi(y; 0, c_1) + (1-p)\,\phi(y; 0, c_2) \), with:

- \( p\in(0,1) \)
- \( k = c_2/c_1 \)
- \( var(Y) = 1 \).
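The three constraints above pin down the component scales once p and k are chosen. The following is a minimal sketch, assuming that c1 and c2 are variances (the notes do not say whether they are variances or standard deviations); under that assumption, var(Y) = p c1 + (1-p) c2 = 1 and c2 = k c1 give c1 = 1 / (p + (1-p) k). The function names `component_variances` and `sample_marker` are illustrative, not taken from the original simulation code.

```python
import math
import random


def component_variances(p, k):
    """Return (c1, c2) such that c2 = k * c1 and p*c1 + (1-p)*c2 = 1.

    Assumption: c1 and c2 are variances, not standard deviations.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if k <= 0:
        raise ValueError("k must be positive")
    c1 = 1.0 / (p + (1.0 - p) * k)
    return c1, k * c1


def sample_marker(n, p, k, seed=None):
    """Draw n values of Y from the two-component, zero-mean mixture."""
    rng = random.Random(seed)
    c1, c2 = component_variances(p, k)
    sd1, sd2 = math.sqrt(c1), math.sqrt(c2)
    # With probability p draw from the first component, otherwise the second.
    return [rng.gauss(0.0, sd1 if rng.random() < p else sd2) for _ in range(n)]
```

For example, `component_variances(0.1, 30)` gives a narrow component holding 10% of the mass and a wide component holding the rest, with overall variance 1.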

\( logit(Pr(D = 1 | T,Y)) = \beta_0 + \beta_1T +\psi_1Y + \psi_2YT \)

- \( \psi_1 = \psi_2 = 1 \), and \( \beta_0, \beta_1 \) are chosen to fix the risks at \( Y = 0 \) in each treatment arm:

- \( q_0 = expit(\beta_0) \)
- \( q_1 = expit(\beta_0 + \beta_1) \)
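Since q0 and q1 are the model risks at Y = 0 for T = 0 and T = 1, the two intercept terms follow by inverting expit: beta_0 = logit(q0) and beta_1 = logit(q1) - logit(q0). A minimal sketch of that step and of the resulting risk function is below; the names `solve_betas` and `risk` are hypothetical helpers, not part of the original code.

```python
import math


def logit(q):
    """Log-odds of a probability q in (0, 1)."""
    return math.log(q / (1.0 - q))


def expit(x):
    """Inverse logit, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def solve_betas(q0, q1):
    """Return (beta0, beta1) so that expit(beta0) = q0 and expit(beta0 + beta1) = q1."""
    beta0 = logit(q0)
    beta1 = logit(q1) - beta0
    return beta0, beta1


def risk(t, y, beta0, beta1, psi1=1.0, psi2=1.0):
    """Pr(D = 1 | T = t, Y = y) under the logistic model in these notes."""
    return expit(beta0 + beta1 * t + psi1 * y + psi2 * y * t)
```

As a check, `risk(0, 0.0, *solve_betas(0.2, 0.1))` recovers 0.2 and `risk(1, 0.0, *solve_betas(0.2, 0.1))` recovers 0.1, up to floating-point error.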

## Summary

k must be large (roughly 20 to 30) for variation in p to influence:

- the marker distribution (relevant plots)
- the proportion of observations with 'small' treatment effects, i.e., the proportion of events near the boundary (relevant plot)
- the distribution of treatment effects (relevant plots)
- the value of theta (relevant plots)

*This may be why the scenarios with the lowest coverage were not strongly associated with a specific value of p. I wonder if minimizing over p for poor coverage is biasing our estimation of coverage downward.* *Action: see if CI coverage is associated with p when k is high.*

Risk curves are fun to look at, but it is hard to draw any conclusions from them. The first two points above (the marker distribution and the proportion of small treatment effects) are more important than the last two.

## Risk Curves (top)

Risk as a function of \( F(y) \) for each treatment arm, by \( q_0 \) and \( q_1 \). The true value of theta is shown in grey.

### \( p = 0.1 \), \( k = 30 \).

### \( p = 0.1 \), \( k = 4 \).

Distribution of \( \Delta(y) = Pr(D=1|T=0, Y=y) - Pr(D=1|T=1, Y=y) \) by \( F(y) \).
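Taking \( \Delta(y) \) as the risk under T = 0 minus the risk under T = 1, the logistic model implies that the two risks are equal exactly when their linear predictors are equal, that is when \( \beta_1 + \psi_2 y = 0 \). With \( \psi_2 = 1 \) the boundary therefore sits at \( y = -\beta_1 \). The sketch below computes \( \Delta(y) \) and that boundary; the function names are illustrative only.

```python
import math


def expit(x):
    """Inverse logit, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def delta(y, beta0, beta1, psi1=1.0, psi2=1.0):
    """Risk difference Pr(D=1 | T=0, Y=y) - Pr(D=1 | T=1, Y=y)."""
    risk_t0 = expit(beta0 + psi1 * y)
    risk_t1 = expit(beta0 + beta1 + (psi1 + psi2) * y)
    return risk_t0 - risk_t1


def delta_zero(beta1, psi2=1.0):
    """Marker value at which delta(y) = 0, i.e. where beta1 + psi2 * y = 0."""
    return -beta1 / psi2
```

Under this sign convention and with \( \psi_2 > 0 \), \( \Delta(y) \) is positive below the boundary and negative above it.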

### \( k = 30 \)

### \( k = 4 \)

### k is denoted by color

PDF of \( y \), colored by \( p \). The vertical black line indicates where \( \Delta(y)=0 \).

### \( k=30 \)

### \( k=4 \)

This is a plot of p (x-axis) against \( F(y_{\Delta=0} + 0.005) - F(y_{\Delta=0} - 0.005) \), where \( y_{\Delta=0} \) is the marker value satisfying \( \Delta(y_{\Delta=0}) = 0 \). The window half-width of 0.005 is arbitrary, so this is only a rough measure of the proportion of observations near the boundary of interest.
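The quantity on the y-axis can be computed in closed form from the mixture CDF. The sketch below is one way to do it, again assuming that c1 and c2 are variances constrained by var(Y) = 1 and c2 = k c1; `mixture_cdf` and `proportion_near_boundary` are hypothetical names, and `y0` stands for \( y_{\Delta=0} \).

```python
import math


def normal_cdf(x, sd):
    """CDF of a zero-mean normal with standard deviation sd."""
    return 0.5 * (1.0 + math.erf(x / (sd * math.sqrt(2.0))))


def mixture_cdf(y, p, k):
    """F(y) for the two-component mixture with var(Y) = 1 and c2 = k * c1.

    Assumption: c1 and c2 are variances, not standard deviations.
    """
    c1 = 1.0 / (p + (1.0 - p) * k)
    c2 = k * c1
    return (p * normal_cdf(y, math.sqrt(c1))
            + (1.0 - p) * normal_cdf(y, math.sqrt(c2)))


def proportion_near_boundary(y0, p, k, half_width=0.005):
    """F(y0 + half_width) - F(y0 - half_width): mass within the window around y0."""
    return mixture_cdf(y0 + half_width, p, k) - mixture_cdf(y0 - half_width, p, k)
```

Evaluating `proportion_near_boundary(y0, p, k)` over a grid of p for each k would reproduce the shape of the plot described above, given the same boundary value.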

### k is denoted by color