rmda
The package rmda
(risk model decision analysis) provides tools to evaluate the value of using a risk prediction instrument to decide treatment or intervention (versus no treatment or intervention). Given one or more risk prediction instruments (risk models) that estimate the probability of a binary outcome, rmda
provides functions to estimate and display decision curves and other figures that help assess the population impact of using a risk model for clinical decision making. Here, “population” refers to the relevant patient population.
Decision curves display estimates of the (standardized) net benefit over a range of probability thresholds used to categorize observations as ‘high risk.’ The curves help evaluate a treatment policy that recommends treatment for patients who are estimated to be ‘high risk’ by comparing the population impact of a risk-based policy to “treat all” and “treat none” intervention policies. Curves can be estimated using data from a prospective cohort. In addition, rmda
can estimate decision curves using data from a case-control study if an estimate of the population outcome prevalence is available. Version 1.5 and higher of the package provides an alternative framing of the decision problem for situations where treatment is the standard-of-care and a risk model might be used to recommend that low-risk patients (i.e., patients below some risk threshold) opt-out of treatment.
Confidence intervals calculated using the bootstrap can be computed and displayed. A wrapper function to calculate cross-validated curves using k-fold cross-validation is also provided.
Key functions are:
decision_curve
: Estimate (standardized) net benefit curves with bootstrap confidence intervals.
plot_decision_curve
: Plot a decision curve or multiple curves.
plot_clinical_impact
and plot_roc_components
: Alternative plots for the output of decision_curve
showing measures of clinical impact or the components of the ROC curve (true/false positive rates) across a range of risk thresholds. See help files or tutorial for more info.
cv_decision_curve
: Calculate k-fold cross-validated estimates of a decision curve and its components.
The easiest way to get the package is directly from CRAN:
install.packages("rmda")
You may also download the current version of the package here:
https://github.com/mdbrown/rmda/releases
navigate to the source package and use
install.packages("../rmda_1.5.tar.gz", repos = NULL, type = "source")
or install the package directly from github using devtools.
## install.packages("devtools")
library(devtools)
install_github("mdbrown/rmda")
Load the package and the provided simulated data set.
library(rmda)
#load simulated data
data(dcaData)
head(dcaData)
## # A tibble: 6 x 6
## Age Female Smokes Marker1 Marker2 Cancer
## <int> <dbl> <lgl> <dbl> <dbl> <int>
## 1 33 1 FALSE 0.2453110 1.02108482 0
## 2 29 1 FALSE 0.9429660 -0.25576212 0
## 3 28 1 FALSE 0.7735939 0.33184401 0
## 4 27 0 FALSE 0.4063595 -0.00568574 0
## 5 23 1 FALSE 0.5075153 0.20753273 0
## 6 35 1 FALSE 0.1856706 1.41251020 0
First we use the function decision_curve
to create a decision curve object for a logistic model to predict cancer status using age, gender and smoking status. We then plot it using plot_decision_curve
.
set.seed(123)
#first use rmda with the default settings (set bootstraps = 50 here to reduce computation time).
baseline.model <- decision_curve(Cancer~Age + Female + Smokes, #fitting a logistic model
data = dcaData,
study.design = "cohort",
policy = "opt-in", #default
bootstraps = 50)
#plot the curve
plot_decision_curve(baseline.model, curve.names = "baseline model")
Next, we create a decision curve with two markers added to the original baseline model. We then pass a list with both decision curves to plot_decision_curve
to plot both curves.
set.seed(123)
full.model <- decision_curve(Cancer~Age + Female + Smokes + Marker1 + Marker2,
data = dcaData,
bootstraps = 50)
#since we want to plot more than one curve, we pass a list of 'decision_curve' objects to the plot
plot_decision_curve( list(baseline.model, full.model),
curve.names = c("Baseline model", "Full model"), xlim = c(0, 1), legend.position = "bottomright")
#see all available options using
?decision_curve
?plot_decision_curve
We include two other functions plot_roc_components
and plot_clinical_impact
.
plot_roc_components
plots the components of the ROC curve-true positive rate and false positive rates-over a range of high risk thresholds.
If we were to use the specified model to classify 1,000 hypothetical subjects, plot_clinical_impact
plots the number classified as high risk and the number with the outcome classified as high risk for a given high risk threshold.
#plot the components of the roc curve--true positive rate and false positive rate
plot_roc_components(full.model, xlim = c(0, 0.4),
col = c("black", "red"))
#plot the clinical impact
plot_clinical_impact(full.model, xlim = c(0, .4),
col = c("black", "blue"))
For many clinical contexts, a risk model is used in one of two ways. In the first setting, the standard-of-care for a population is to assign a particular ‘treatment’ to no one. Clinicians then use a risk model to categorize patients as ‘high-risk’, with the recommendation to treat high-risk patients with some intervention. We refer to this use of a risk model as an ‘opt-in’ policy. The relative costs and benefits of such a policy are specified as the ‘cost’ of treating an individual without the event (a control) compared to the ‘benefit’ of treating an observation with the event (a case).
The standardized net benefit for an ‘opt-in’ policy is defined in terms of the true and false positive rates, the outcome prevalence \(\rho\), and the risk threshold \(r\).
\[ sNB^{(opt-in)}(r) = TPR(r) - \Big(\frac{1-\rho}{\rho}\Big)\Big(\frac{r}{1-r}\Big)FPR(r) \] Alternatively, there are contexts where the standard-of-care is to recommend a treatment to an entire patient population. The potential use of a risk model in this setting is to identify patients who are ‘low-risk’ and recommend that those patients ‘opt-out’ of treatment.
The standardized net benefit for an ‘opt-out’ policy is defined in terms of the true and false negative rates, the outcome prevalence , and the risk threshold \(r\):
\[ sNB^{(opt-out)}(r) =TNR(r) - \Big(\frac{\rho}{ 1-\rho} \Big)\Big(\frac{1-r}{r}\Big)FNR(r) \]
So far in this tutorial we have calculated the net benefit curves using the default policy = 'opt-in'
setting. To calculate the net benefit for an ‘opt-out’ policy, we set policy = 'opt-out'
when using the decision_curve
function.
set.seed(123)
opt.out.dc <- decision_curve(Cancer~Age + Female + Smokes + Marker1 + Marker2,
data = dcaData,
bootstraps = 50,
policy = 'opt-out') #set policy = 'opt-out' (default is 'opt-in')
plot_decision_curve( opt.out.dc, xlim = c(0, 1), ylim = c(-.2,2),
standardize = FALSE,
curve.names = "model", legend = 'bottomright')
The clinical impact plots are also influenced by the policy
argument. When we set policy = ‘opt-out’ the clinical impact plot displays the number considered low risk at each threshold instead of the number high risk.
plot_clinical_impact( opt.out.dc, col = c("black", "blue"), legend = 'bottomright')
We show several examples of how one might change the default settings.
baseline.model <- decision_curve(Cancer~Age + Female + Smokes,
data = dcaData,
thresholds = seq(0, .4, by = .001),# calculate thresholds from 0-0.4 at every 0.001 increment.
bootstraps = 25)
full.model <- decision_curve(Cancer~Age + Female + Smokes + Marker1 + Marker2,
data = dcaData,
thresholds = seq(0, .4, by = .001),# calculate thresholds from 0-0.4 at every 0.001 increment.
bootstraps = 25)
plot_decision_curve( list(baseline.model, full.model),
curve.names = c("Baseline model", "Full model"),
col = c("blue", "red"),
lty = c(1,2),
lwd = c(3,2, 2, 1), # the first two correspond to the decision curves, then 'all' and then 'none'
legend.position = "bottomright") #adjust the legend position
plot_decision_curve( list(baseline.model, full.model),
curve.names = c("Baseline model", "Full model"),
col = c("blue", "red"),
confidence.intervals = FALSE, #remove confidence intervals
cost.benefit.axis = FALSE, #remove cost benefit axis
legend.position = "none") #remove the legend
plot_decision_curve( list(baseline.model, full.model),
curve.names = c("Baseline model", "Full model"),
col = c("blue", "red"),
cost.benefits = c("1:1000", "1:4", "1:9", "2:3", "1:3"), #set specific cost benefits
legend.position = "bottomright")
baseline.model <- decision_curve(Cancer~Age + Female + Smokes,
data = dcaData,
thresholds = seq(0, .4, by = .01),
confidence.intervals = 0.9, #calculate 90% confidence intervals
bootstraps = 25)
full.model <- decision_curve(Cancer~Age + Female + Smokes + Marker1 + Marker2,
data = dcaData,
thresholds = seq(0, .40, by = .01),
confidence.intervals = 0.9, #calculate 90% confidence intervals
bootstraps = 25)
plot_decision_curve( list(baseline.model, full.model),
curve.names = c("Baseline model", "Full model"),
col = c("blue", "red"),
ylim = c(-0.05, 0.15), #set ylim
lty = c(2,1),
standardize = FALSE, #plot Net benefit instead of standardized net benefit
legend.position = "topright")
If a risk model has already been specified, and so no model fitting is needed, the user can specify fitted.risks=TRUE
and provide them in the formula. No model fitting will be done and bootstrap confidence intervals will be conditional on the fitted model.
#helper function
expit <- function(xx) exp(xx)/ (1+exp(xx))
# Assume we have access to previously published models
# (or models built using a training set)
# that we can use to predict the risk of cancer.
# Basic model using demographic variables: Age, Female, Smokes.
dcaData$BasicModel <- with(dcaData, expit(-7.3 + 0.18*Age - 0.08*Female + 0.80*Smokes ) )
# Model using demographic + markers : Age, Female, Smokes, Marker1 and Marker2.
dcaData$FullModel <- with(dcaData, expit(-10.5 + 0.22*Age - 0.01*Female + 0.91*Smokes + 2.03*Marker1 - 1.56*Marker2))
full.model <- decision_curve(Cancer~FullModel,
data = dcaData,
fitted.risk = TRUE,
thresholds = seq(0, .4, by = .05),
bootstraps = 25)
plot_decision_curve(full.model, legend.position = "none")
decision_curve
also outputs all calculations and point estimates when assigned to an object.
full.model <- decision_curve(Cancer~Age + Female + Smokes + Marker1 + Marker2,
data = dcaData,
thresholds = seq(.01, .99, by = .05),
bootstraps = 25, policy = "opt-out")
The best way to look at point estimates (w/ci’s) is to use summary
.
summary(full.model, measure = "NB") #outputs standardized net benefit by default
##
## Net Benefit (95% Confidence Intervals):
## --------------------------------------------------------------------------------------------
## risk cost:benefit percent All Cancer ~ Age + Female + Smokes None
## threshold ratio low risk + Marker1 + Marker2
## ----------- -------------- -------------- -------- -------------------------------- --------
## 0.01 1:99 24.8 0 -0.152 -11
## (13.4, 44.4) (0, 0) (-0.532, 0.29)
##
## 0.06 3:47 60.2 0 0.335 -1
## (49.4, 72.2) (0, 0) (0.194, 0.555)
##
## 0.11 11:89 70.2 0 0.52 -0.091
## (63.2, 80.4) (0, 0) (0.341, 0.695)
##
## 0.16 4:21 75.6 0 0.618 0.25
## (70.4, 84.2) (0, 0) (0.504, 0.717)
##
## 0.21 21:79 79.6 0 0.653 0.429
## (75.2, 87.6) (0, 0) (0.558, 0.762)
##
## 0.26 13:37 84 0 0.655 0.538
## (79.2, 90.8) (0, 0) (0.615, 0.773)
##
## 0.31 31:69 86.2 0 0.688 0.613
## (82.6, 93.2) (0, 0) (0.639, 0.792)
##
## 0.36 9:16 89 0 0.718 0.667
## (84.8, 94.6) (0, 0) (0.665, 0.822)
##
## 0.41 41:59 92.4 0 0.763 0.707
## (86.4, 96.6) (0, 0) (0.703, 0.838)
##
## 0.46 23:27 93.8 0 0.768 0.739
## (89.8, 97.8) (0, 0) (0.717, 0.845)
##
## 0.51 51:49 95 0 0.785 0.765
## (91.2, 98) (0, 0) (0.742, 0.848)
##
## 0.56 14:11 96.2 0 0.798 0.786
## (93.2, 99.2) (0, 0) (0.758, 0.861)
##
## 0.61 61:39 96.8 0 0.817 0.803
## (94.2, 99.2) (0, 0) (0.771, 0.867)
##
## 0.66 33:17 97.2 0 0.827 0.818
## (95.2, 99.2) (0, 0) (0.786, 0.879)
##
## 0.71 71:29 97.8 0 0.837 0.831
## (95.6, 99.4) (0, 0) (0.795, 0.888)
##
## 0.76 19:6 98.4 0 0.845 0.842
## (96, 99.4) (0, 0) (0.809, 0.895)
##
## 0.81 81:19 98.8 0 0.855 0.852
## (97.2, 100) (0, 0) (0.82, 0.9)
##
## 0.86 43:7 99.2 0 0.862 0.86
## (98, 100) (0, 0) (0.83, 0.905)
##
## 0.91 91:9 100 0 0.868 0.868
## (98.2, 100) (0, 0) (0.84, 0.908)
##
## 0.96 24:1 100 0 0.875 0.875
## (100, 100) (0, 0) (0.848, 0.913)
## --------------------------------------------------------------------------------------------
you can also choose which measure to report, and how many decimal places to print. The options for the measure
argument are ‘TPR’, ‘FPR’, ‘sNB’, and ‘NB’.
#output true positive rate
summary(full.model, nround = 2, measure = "TPR")
##
## Sensitivity (TPR) (95% Confidence Intervals):
## ------------------------------------------------------------------------------------------
## risk cost:benefit percent All Cancer ~ Age + Female + Smokes None
## threshold ratio low risk + Marker1 + Marker2
## ----------- -------------- -------------- -------- -------------------------------- ------
## 0.01 1:99 24.8 1 0.97 0
## (13.4, 44.4) (1, 1) (0.94, 1)
##
## 0.06 3:47 60.2 1 0.87 0
## (49.4, 72.2) (1, 1) (0.81, 0.93)
##
## 0.11 11:89 70.2 1 0.83 0
## (63.2, 80.4) (1, 1) (0.75, 0.92)
##
## 0.16 4:21 75.6 1 0.82 0
## (70.4, 84.2) (1, 1) (0.72, 0.87)
##
## 0.21 21:79 79.6 1 0.75 0
## (75.2, 87.6) (1, 1) (0.56, 0.83)
##
## 0.26 13:37 84 1 0.6 0
## (79.2, 90.8) (1, 1) (0.51, 0.76)
##
## 0.31 31:69 86.2 1 0.55 0
## (82.6, 93.2) (1, 1) (0.31, 0.7)
##
## 0.36 9:16 89 1 0.48 0
## (84.8, 94.6) (1, 1) (0.25, 0.66)
##
## 0.41 41:59 92.4 1 0.45 0
## (86.4, 96.6) (1, 1) (0.23, 0.57)
##
## 0.46 23:27 93.8 1 0.35 0
## (89.8, 97.8) (1, 1) (0.19, 0.54)
##
## 0.51 51:49 95 1 0.3 0
## (91.2, 98) (1, 1) (0.17, 0.51)
##
## 0.56 14:11 96.2 1 0.23 0
## (93.2, 99.2) (1, 1) (0.08, 0.43)
##
## 0.61 61:39 96.8 1 0.23 0
## (94.2, 99.2) (1, 1) (0.08, 0.33)
##
## 0.66 33:17 97.2 1 0.2 0
## (95.2, 99.2) (1, 1) (0.08, 0.32)
##
## 0.71 71:29 97.8 1 0.17 0
## (95.6, 99.4) (1, 1) (0.05, 0.28)
##
## 0.76 19:6 98.4 1 0.12 0
## (96, 99.4) (1, 1) (0.05, 0.28)
##
## 0.81 81:19 98.8 1 0.1 0
## (97.2, 100) (1, 1) (0, 0.21)
##
## 0.86 43:7 99.2 1 0.07 0
## (98, 100) (1, 1) (0, 0.19)
##
## 0.91 91:9 100 1 0 0
## (98.2, 100) (1, 1) (0, 0.16)
##
## 0.96 24:1 100 1 0 0
## (100, 100) (1, 1) (0, 0)
## ------------------------------------------------------------------------------------------
You can also access the data directly. full.model$derived.data
holds the net benefit and it’s components.
head(full.model$derived.data)
## thresholds FPR FNR TPR TNR NB
## 1 0.01 0.7227273 0.03333333 0.9666667 0.2772727 -0.1520000
## 2 0.06 0.3340909 0.13333333 0.8666667 0.6659091 0.3353333
## 3 0.11 0.2250000 0.16666667 0.8333333 0.7750000 0.5201818
## 4 0.16 0.1659091 0.18333333 0.8166667 0.8340909 0.6185000
## 5 0.21 0.1295455 0.25000000 0.7500000 0.8704545 0.6531429
## 6 0.26 0.1000000 0.40000000 0.6000000 0.9000000 0.6553846
## sNB rho prob.high.risk prob.low.risk DP nonDP
## 1 -0.1727273 0.12 0.752 0.248 0.116 0.244
## 2 0.3810606 0.12 0.398 0.602 0.104 0.586
## 3 0.5911157 0.12 0.298 0.702 0.100 0.682
## 4 0.7028409 0.12 0.244 0.756 0.098 0.734
## 5 0.7422078 0.12 0.204 0.796 0.090 0.766
## 6 0.7447552 0.12 0.160 0.840 0.072 0.792
## model FPR_lower FPR_upper
## 1 Cancer ~ Age + Female + Smokes + Marker1 + Marker2 0.51746725 0.8430913
## 2 Cancer ~ Age + Female + Smokes + Marker1 + Marker2 0.22270742 0.4426230
## 3 Cancer ~ Age + Female + Smokes + Marker1 + Marker2 0.13537118 0.2974239
## 4 Cancer ~ Age + Female + Smokes + Marker1 + Marker2 0.10262009 0.2131148
## 5 Cancer ~ Age + Female + Smokes + Marker1 + Marker2 0.06986900 0.1615925
## 6 Cancer ~ Age + Female + Smokes + Marker1 + Marker2 0.05251641 0.1305361
## FNR_lower FNR_upper TPR_lower TPR_upper TNR_lower TNR_upper NB_lower
## 1 0.00000000 0.05970149 0.9402985 1.0000000 0.1569087 0.4825328 -0.5320000
## 2 0.06666667 0.18604651 0.8139535 0.9333333 0.5573770 0.7772926 0.1940000
## 3 0.08333333 0.25000000 0.7500000 0.9166667 0.7025761 0.8646288 0.3410909
## 4 0.13432836 0.27906977 0.7209302 0.8656716 0.7868852 0.8973799 0.5040000
## 5 0.17241379 0.43750000 0.5625000 0.8275862 0.8384075 0.9301310 0.5580000
## 6 0.24074074 0.48837209 0.5116279 0.7592593 0.8694639 0.9474836 0.6150769
## NB_upper sNB_lower sNB_upper rho_lower rho_upper prob.high.risk_lower
## 1 0.2900000 -0.6143187 0.3317865 0.084 0.146 0.556
## 2 0.5553333 0.2271663 0.6062591 0.084 0.146 0.278
## 3 0.6949091 0.3994039 0.7586344 0.084 0.146 0.196
## 4 0.7170000 0.5901639 0.7827511 0.084 0.146 0.158
## 5 0.7617143 0.6533958 0.8315658 0.084 0.146 0.124
## 6 0.7729231 0.7155800 0.8438025 0.084 0.146 0.092
## prob.high.risk_upper prob.low.risk_lower prob.low.risk_upper DP_lower
## 1 0.866 0.134 0.444 0.082
## 2 0.506 0.494 0.722 0.070
## 3 0.368 0.632 0.804 0.068
## 4 0.296 0.704 0.842 0.062
## 5 0.248 0.752 0.876 0.054
## 6 0.208 0.792 0.908 0.044
## DP_upper nonDP_lower nonDP_upper cost.benefit.ratio
## 1 0.146 0.134 0.442 1:99
## 2 0.132 0.476 0.712 3:47
## 3 0.130 0.600 0.792 11:89
## 4 0.122 0.672 0.822 4:21
## 5 0.116 0.716 0.852 21:79
## 6 0.100 0.746 0.866 13:37
If data is from a case-control study instead of an observational cohort, an estimate of the population level outcome prevalence is needed to calculate decision curves. Decision curves can be calculated by setting study.design = "case-control"
and setting the population.prevalence
. Once the decision_curve
is calculated, all other calls to the plot functions remain the same. Note that bootstrap sampling is done stratified by outcome status for these data.
#simulated case-control data with same variables as above
data(dcaData_cc)
#estimated from the population where the
#case-control sample comes from.
population.rho = 0.11
baseline.model_cc <- decision_curve(Cancer~Age + Female + Smokes,
data = dcaData_cc,
thresholds = seq(0, .4, by = .01),
bootstraps = 25,
study.design = "case-control",
population.prevalence = population.rho)
full.model_cc <- decision_curve(Cancer~Age + Female + Smokes + Marker1 + Marker2,
data = dcaData_cc,
thresholds = seq(0, .4, by = .01),
bootstraps = 25,
study.design = "case-control",
population.prevalence = population.rho)
plot_decision_curve( list(baseline.model_cc, full.model_cc),
curve.names = c("Baseline model", "Full model"),
col = c("blue", "red"),
lty = c(1,2),
lwd = c(3,2, 2, 1),
legend.position = "bottomright")
We provide a wrapper to perform k-fold cross-validation to obtain bias corrected decision curves. Once cv_decision_curve
is called, all plot and summary functions work the same as shown above for decision_curve
output. Confidence interval calculation is not available at this time for cross-validated curves.
full.model_cv <- cv_decision_curve(Cancer~Age + Female + Smokes + Marker1 + Marker2,
data = dcaData,
folds = 5,
thresholds = seq(0, .4, by = .01),
policy = "opt-out")
full.model_apparent <- decision_curve(Cancer~Age + Female + Smokes + Marker1 + Marker2,
data = dcaData,
thresholds = seq(0, .4, by = .01),
confidence.intervals = 'none', policy = "opt-out")
plot_decision_curve( list(full.model_apparent, full.model_cv),
curve.names = c("Apparent curve", "Cross-validated curve"),
col = c("red", "blue"),
lty = c(2,1),
lwd = c(3,2, 2, 1),
legend.position = "bottomright")
plot_clinical_impact(full.model_cv, legend.position = "bottomright")
Vickers AJ, Elkin EB. Decision curve analysis: a novel method for evaluating prediction models. Medical Decision Making. 2006 Nov-Dec;26(6):565-74.
Kerr KF, Brown MD, Zhu K, Janes H. Assessing the Clinical Impact of Risk Prediction Models With Decision Curves: Guidance for Correct Interpretation and Appropriate Use. J Clin Oncol. 2016; 34(21):2534-40.