Fast, low-dependency, underutilized machine learning algorithms in R. Useful for causal plug-ins (e.g., nuisance fits in DML) or simple predictive applications.
roadrunner ships C++-backed implementations of classical ML algorithms with thin, base-R-style interfaces. Six core fitters today:
- ares() -- Multivariate Adaptive Regression Splines (MARS).
- krls() -- Kernel Regularized Least Squares (KRLS).
- plda() -- Penalized Linear Discriminant Analysis (L1 / fused-lasso).
- ols() -- Ordinary and weighted least squares.
- logreg() -- Binary logistic regression by IRLS.
- bgam() -- Component-wise P-spline gradient boosting.
Plus meep() -- a cross-fitted, stacked ensemble of these built-in algorithms and optionally external learners (ranger random forests and dbarts BART), built for Double Machine Learning and causal-forest nuisance estimation.
- Low dependency. Only Rcpp, RcppArmadillo, and RcppParallel.
- Fast. C++ engines via Rcpp with multi-core scoring via RcppParallel.
- Deterministic. Fits are bit-for-bit identical across thread counts at a fixed seed.
- Simple API. Base-R style: formula and matrix interfaces and standard S3 methods (predict, print, summary, plot).
# install.packages("pak")
pak::pak("CetiAlphaFive/roadrunner")

library(roadrunner)
# Regression
fit <- ares(mpg ~ ., data = mtcars, degree = 2)
predict(fit, head(mtcars))
# Classification
y <- as.integer(mtcars$am)
fitc <- ares(as.matrix(mtcars[, -9]), y, family = "binomial")
predict(fitc)
# Hands-free hyperparameter tuning
fitt <- ares(mpg ~ ., data = mtcars, autotune = TRUE,
autotune.speed = "fast", seed.cv = 1995L)
fitt$autotune$degree

See the ares vignette for autotune, classification, weights, bagging, and prediction-interval examples.
krls() fits a Kernel Regularized Least Squares model (Hainmueller and Hazlett 2014) with closed-form leave-one-out selection of the ridge penalty and per-observation marginal effects.
set.seed(1995)
n <- 200
X <- matrix(rnorm(n * 3), n, 3)
y <- sin(X[, 1]) + 0.5 * X[, 2]^2 - 0.3 * X[, 3] + rnorm(n, sd = 0.2)
fit <- krls(X, y) # default sigma = ncol(X); lambda by LOO
fit$avgderivatives # average marginal effects per variable
predict(fit, X)$fit # in-sample fitted values
# Predict on new data with pointwise SEs
Xnew <- matrix(rnorm(20 * 3), 20, 3)
pr <- predict(fit, Xnew, se.fit = TRUE)
head(pr$fit); head(pr$se.fit)

krls() mirrors KRLS::krls() numerically (fits agree to ~1e-13 at matched sigma and lambda) and is 6-10x faster on benchmarks at n >= 500.
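The closed-form leave-one-out selection mentioned above rests on a standard kernel ridge identity: with G = (K + lambda * I)^{-1} and coefficients c = G y, the i-th leave-one-out residual is c_i / G_ii, so no refitting is needed. The base-R sketch below illustrates that identity only; it is not roadrunner's C++ code. The toy data, the input standardization, and the lambda grid are assumptions made for the example.

```r
# Illustrative base-R sketch; not roadrunner's implementation.
set.seed(1)
n <- 60
X <- matrix(rnorm(n * 2), n, 2)
y <- sin(X[, 1]) + 0.5 * X[, 2] + rnorm(n, sd = 0.2)

# Gaussian kernel with bandwidth sigma = ncol(X) on standardized inputs
# (the standardization is an assumption of this sketch).
Xs <- scale(X)
sigma <- ncol(X)
K <- unname(exp(-as.matrix(dist(Xs))^2 / sigma))

# Closed-form leave-one-out residuals: c_i / G_ii with G = (K + lambda I)^{-1}.
loo_resid <- function(lambda) {
  G <- solve(K + lambda * diag(n))
  drop(G %*% y) / diag(G)
}

# Choose lambda on a coarse (hypothetical) grid by minimizing the LOO sum of squares.
grid <- 10^seq(-3, 1, length.out = 25)
loo_sse <- vapply(grid, function(l) sum(loo_resid(l)^2), numeric(1))
lambda_hat <- grid[which.min(loo_sse)]

# Sanity check: actually refit without observation 1 and compare to the shortcut.
i <- 1
c_minus <- solve(K[-i, -i] + lambda_hat * diag(n - 1), y[-i])
brute <- y[i] - drop(K[i, -i] %*% c_minus)
all.equal(brute, loo_resid(lambda_hat)[i])
```

Each candidate lambda costs one matrix inverse, where a brute-force search would need n refits.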
plda() fits penalized Fisher's linear discriminant analysis (Witten & Tibshirani 2011) with L1 or fused-lasso penalties, multi-class support, and built-in cross-validation autotune.
fit <- plda(Species ~ ., data = iris)
predict(fit, iris) # factor of predicted Species

ols() fits ordinary and weighted least squares; logreg() fits binary logistic regression by IRLS. Both have C++ engines, classical and HC0-HC3 robust standard errors, optional bagging, and the standard formula and matrix interfaces.
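For readers unfamiliar with IRLS (iteratively reweighted least squares), here is a minimal base-R sketch of the iteration for binary logistic regression, checked against stats::glm(). It shows the textbook algorithm, not roadrunner's engine. The helper name irls_logit, the simulated data, and the tolerances are assumptions made for this example.

```r
# Illustrative base-R sketch of IRLS; irls_logit() is a hypothetical helper,
# not part of roadrunner.
irls_logit <- function(X, y, tol = 1e-10, max_iter = 50) {
  X <- cbind(1, X)                       # add an intercept column
  beta <- rep(0, ncol(X))
  for (iter in seq_len(max_iter)) {
    eta <- drop(X %*% beta)              # linear predictor
    mu  <- 1 / (1 + exp(-eta))           # fitted probabilities
    w   <- mu * (1 - mu)                 # IRLS weights
    # Weighted least squares on the working response z = eta + (y - mu) / w;
    # w * z is written as w * eta + (y - mu) to avoid dividing by tiny weights.
    beta_new <- drop(solve(crossprod(X, w * X),
                           crossprod(X, w * eta + (y - mu))))
    converged <- max(abs(beta_new - beta)) < tol
    beta <- beta_new
    if (converged) break
  }
  unname(beta)
}

set.seed(1995)
n <- 500
X <- matrix(rnorm(n * 2), n, 2)
y <- rbinom(n, 1, plogis(-0.5 + X[, 1] - 0.5 * X[, 2]))

b_irls <- irls_logit(X, y)
b_glm  <- unname(coef(glm(y ~ X, family = binomial,
                          control = glm.control(epsilon = 1e-12))))
all.equal(b_irls, b_glm, tolerance = 1e-6)
```

The package interfaces themselves are used as follows.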
fit <- ols(mpg ~ wt + hp, data = mtcars)
summary(fit, robust = "HC3") # HC3 robust standard errors
predict(fit, mtcars[1:3, ], interval = "confidence")
df <- data.frame(am = mtcars$am, mtcars[c("wt", "hp")])
lr <- logreg(am ~ wt + hp, data = df)
predict(lr, df[1:3, ], type = "response")

bgam() fits a smooth additive model by component-wise P-spline gradient boosting (Buehlmann & Yu 2003; Eilers & Marx 1996). The number of boosting iterations is tuned by cross-validation and doubles as built-in variable selection. Gaussian and binomial families.
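To make "component-wise" concrete: at each boosting step, every covariate's penalized smoother is fit to the current residuals, and only the best-fitting one is added, scaled by a small step size. Covariates that are never picked drop out of the model. The base-R sketch below shows the idea for the Gaussian case with a fixed number of steps (no cross-validation). It is not roadrunner's implementation. The basis size, penalty, step size, and iteration count are arbitrary assumptions, and the quantile-placed knots from splines::bs() stand in for the equally spaced knots of a textbook P-spline.

```r
library(splines)  # ships with R

# Illustrative sketch of component-wise L2 boosting; not roadrunner's code.
set.seed(1)
n <- 300
X <- matrix(runif(n * 4, -2, 2), n, 4)
y <- sin(2 * X[, 1]) + 0.5 * X[, 2]^2 + rnorm(n, sd = 0.3)  # columns 3, 4 are noise

# Penalized smoother matrix for one covariate: B-spline basis plus a
# second-order difference penalty on adjacent coefficients.
pspline_smoother <- function(x, df = 12, lambda = 10) {
  B <- bs(x, df = df, intercept = TRUE)
  P <- diff(diag(ncol(B)), differences = 2)
  B %*% solve(crossprod(B) + lambda * crossprod(P), t(B))
}
S <- lapply(seq_len(ncol(X)), function(j) pspline_smoother(X[, j]))

nu <- 0.1                 # shrinkage (step size)
f  <- rep(mean(y), n)     # start from the intercept-only fit
picked <- integer(0)
for (m in 1:60) {
  r    <- y - f                                        # current residuals
  fits <- lapply(S, function(Sj) drop(Sj %*% r))       # fit every base learner
  sse  <- vapply(fits, function(g) sum((r - g)^2), numeric(1))
  j    <- which.min(sse)                               # best single component
  f    <- f + nu * fits[[j]]                           # update only that one
  picked <- c(picked, j)
}

table(factor(picked, levels = 1:4))   # how often each covariate was selected
```

Stopping early keeps weak covariates out of the fit, which is why the number of iterations acts as the selection knob. The package call is shown next.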
fit <- bgam(mpg ~ ., data = mtcars) # CV-tuned number of boosting steps
predict(fit, head(mtcars))
plot(fit) # smooth partial-effect curves

meep() cross-fits an ensemble of base learners and returns out-of-fold
predictions designed to drop into Double Machine Learning (DoubleML) or
causal forest (grf) implementations. The default learners are ares(),
krls(), ols(), logreg(), and plda(), chosen automatically per
nuisance by family: regression nuisances use ares/krls/ols,
classification nuisances use ares/krls/logreg/plda. By default the
cross-fitted propensity is isotonically calibrated (van der Laan et al. 2023)
for more reliable DML/AIPW inference; disable with calibrate = "none".
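As a rough picture of what cross-fitting and isotonic calibration involve, the base-R sketch below uses lm() and glm() as stand-ins for the stacked learners. It calibrates the out-of-fold propensity with stats::isoreg(), then forms a residual-on-residual (partialling-out) DML estimate. The simulated data, the fold assignment, and calibrating on the pooled out-of-fold scores are assumptions of this sketch. meep()'s internals may differ.

```r
# Illustrative base-R sketch; lm()/glm() stand in for meep()'s ensemble.
set.seed(1995)
n <- 800
X <- matrix(runif(n * 4, -2, 2), n, 4)
D <- rbinom(n, 1, plogis(0.8 * X[, 1] - 0.5 * X[, 2]))   # binary treatment
Y <- D + 0.5 * X[, 1] + 0.4 * X[, 3] + rnorm(n)          # true effect of D is 1
dat <- data.frame(Y = Y, D = D, X)

# Cross-fitting: each observation is predicted by models trained on the
# other folds only.
K <- 5
fold <- sample(rep(seq_len(K), length.out = n))
y_hat <- d_hat <- numeric(n)
for (k in seq_len(K)) {
  test  <- fold == k
  fit_y <- lm(Y ~ . - D, data = dat[!test, ])
  fit_d <- glm(D ~ . - Y, data = dat[!test, ], family = binomial)
  y_hat[test] <- predict(fit_y, dat[test, ])
  d_hat[test] <- predict(fit_d, dat[test, ], type = "response")
}

# Isotonic calibration: monotone regression of D on the out-of-fold score.
o <- order(d_hat)
d_cal <- numeric(n)
d_cal[o] <- isoreg(d_hat[o], D[o])$yf

# Partialling-out DML estimate of the treatment effect.
ry <- Y - y_hat
rd <- D - d_cal
theta <- sum(rd * ry) / sum(rd^2)
theta
```

The package example below produces the same kind of out-of-fold vectors with the full ensemble.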
set.seed(1995)
n <- 800
X <- matrix(runif(n * 4, -2, 2), n, 4)
Dstar <- sin(X[, 1]) + 0.5 * X[, 2] + 0.3 * X[, 3]^2 + rnorm(n, sd = 1.5)
D <- as.integer(Dstar > median(Dstar)) # binary treatment
Y <- D + cos(X[, 1]) + 0.4 * X[, 2]^2 + 0.5 * X[, 3] + rnorm(n, sd = 1.5)
m <- meep(X, Y, treatment = D, folds = 5, seed = 1995)
m$y_hat_oof # cross-fitted E[Y | X]
m$d_hat_oof # cross-fitted E[D | X]
# hand the cross-fitted nuisances to a causal forest
# grf::causal_forest(X, Y, D, Y.hat = m$y_hat_oof, W.hat = m$d_hat_oof)

On smooth, structured signal the ensemble fits the nuisances more tightly than grf's built-in regression forests (out-of-bag vs out-of-fold R-squared on the toy above):
cf <- grf::causal_forest(X, Y, D, seed = 1995)
r2 <- function(p, a) 1 - sum((a - p)^2) / sum((a - mean(a))^2)
data.frame(
nuisance = c("E[Y|X]", "E[D|X]"),
grf_oob_r2 = c(r2(cf$Y.hat, Y), r2(cf$W.hat, D)),
meep_oof_r2 = c(r2(m$y_hat_oof, Y), r2(m$d_hat_oof, D))
)
#> nuisance  grf_oob_r2  meep_oof_r2
#> E[Y|X]         0.179        0.198
#> E[D|X]         0.162        0.180

Add random-forest and BART learners to the stack with extra.learners (the external packages stay optional -- you install them yourself), and use plot() for a quick read on each learner and the stack -- ROC curves for binary nuisances, OOF R-squared and observed-vs-predicted for continuous ones:
m2 <- meep(X, Y, treatment = D, folds = 5, seed = 1995,
extra.learners = c("forest", "BART")) # ranger + dbarts
plot(m2)

- Friedman, J. H. (1991). Multivariate Adaptive Regression Splines. Annals of Statistics, 19(1):1-67.
- Hainmueller, J. and Hazlett, C. (2014). Kernel Regularized Least Squares: Reducing Misspecification Bias with a Flexible and Interpretable Machine Learning Approach. Political Analysis, 22(2):143-168.
- Witten, D. M. and Tibshirani, R. (2011). Penalized classification using Fisher's linear discriminant. Journal of the Royal Statistical Society, Series B, 73(5):753-772.
- Buehlmann, P. and Yu, B. (2003). Boosting with the L2 loss: Regression and classification. Journal of the American Statistical Association, 98(462):324-339.
- Eilers, P. H. C. and Marx, B. D. (1996). Flexible smoothing with B-splines and penalties. Statistical Science, 11(2):89-121.
- Hofner, B., Mayr, A., Robinzonov, N. and Schmid, M. (2014). Model-based boosting in R: A hands-on tutorial using the R package mboost. Computational Statistics, 29(1-2):3-35.
MIT (c) Jack T. Rametta

