roadrunner

Fast, low-dependency, underutilized machine learning algorithms in R. Useful for causal plug-ins (e.g., nuisance fits in DML) or simple predictive applications.

roadrunner ships C++-backed implementations of classical ML algorithms with thin, base-R-style interfaces. There are six core fitters today:

ares() -- Multivariate Adaptive Regression Splines (MARS).
krls() -- Kernel Regularized Least Squares (KRLS).
plda() -- Penalized Linear Discriminant Analysis (L1 / fused-lasso).
ols() -- Ordinary and weighted least squares.
logreg() -- Binary logistic regression by IRLS.
bgam() -- Component-wise P-spline gradient boosting.

Plus meep() -- a cross-fitted, stacked ensemble of these built-in algorithms and, optionally, external learners (ranger random forests and dbarts BART). It is built for Double Machine Learning and causal-forest nuisance estimation.

Package design

Low dependency. Only Rcpp, RcppArmadillo, and RcppParallel.
Fast. C++ engines via Rcpp with multi-core scoring via RcppParallel.
Deterministic. Fits are bit-for-bit identical across thread counts at a fixed seed.
Simple API. Base-R style: formula and matrix interfaces and standard S3 methods (predict, print, summary, plot).

Install

# install.packages("pak")
pak::pak("CetiAlphaFive/roadrunner")

Usage

MARS via `ares()`

library(roadrunner)

# Regression
fit <- ares(mpg ~ ., data = mtcars, degree = 2)
predict(fit, head(mtcars))

# Classification
y <- as.integer(mtcars$am)
fitc <- ares(as.matrix(mtcars[, -9]), y, family = "binomial")
predict(fitc)

# Hands-free hyperparameter tuning
fitt <- ares(mpg ~ ., data = mtcars, autotune = TRUE,
             autotune.speed = "fast", seed.cv = 1995L)
fitt$autotune$degree

See the ares vignette for autotune, classification, weights, bagging, and prediction-interval examples.

KRLS via `krls()`

krls() fits a Kernel Regularized Least Squares model (Hainmueller and Hazlett 2014) with closed-form leave-one-out selection of the ridge penalty and per-observation marginal effects.

set.seed(1995)
n <- 200
X <- matrix(rnorm(n * 3), n, 3)
y <- sin(X[, 1]) + 0.5 * X[, 2]^2 - 0.3 * X[, 3] + rnorm(n, sd = 0.2)

fit <- krls(X, y)             # default sigma = ncol(X); lambda by LOO
fit$avgderivatives            # average marginal effects per variable
predict(fit, X)$fit           # in-sample fitted values

# Predict on new data with pointwise SEs
Xnew <- matrix(rnorm(20 * 3), 20, 3)
pr <- predict(fit, Xnew, se.fit = TRUE)
head(pr$fit); head(pr$se.fit)

krls() mirrors KRLS::krls() numerically (fits agree to ~1e-13 at matched sigma and lambda) and is 6-10x faster on benchmarks at n >= 500.
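
To give a sense of what "closed-form leave-one-out" means, the base-R sketch below computes LOO residuals for kernel ridge regression from a single matrix inverse and checks them against brute-force refitting. It is an illustration only, not roadrunner's C++ engine: the kernel form exp(-||xi - xj||^2 / sigma), the lambda grid, and the helper names loo_closed() and loo_brute() are assumptions made for this example.

```r
set.seed(1)
n <- 40
X <- scale(matrix(rnorm(n * 2), n, 2))
y <- sin(X[, 1]) + 0.5 * X[, 2]^2 + rnorm(n, sd = 0.2)

# Gaussian kernel matrix (assumed form; bandwidth sigma = ncol(X))
sigma <- ncol(X)
K <- unname(exp(-as.matrix(dist(X))^2 / sigma))

# Closed form: with G = K + lambda * I and c = G^{-1} y,
# the i-th leave-one-out residual equals c_i / (G^{-1})_ii.
loo_closed <- function(K, y, lambda) {
  Ginv <- solve(K + lambda * diag(nrow(K)))
  drop(Ginv %*% y) / diag(Ginv)
}

# Brute force: refit without observation i, then predict it.
loo_brute <- function(K, y, lambda) {
  n <- length(y)
  vapply(seq_len(n), function(i) {
    ci <- solve(K[-i, -i] + lambda * diag(n - 1), y[-i])
    y[i] - sum(K[i, -i] * ci)
  }, numeric(1))
}

# The two versions should agree up to numerical error.
max(abs(loo_closed(K, y, 0.5) - loo_brute(K, y, 0.5)))

# Pick lambda by minimizing the LOO sum of squares over a grid (illustrative grid).
lambdas <- 10^seq(-3, 1, length.out = 20)
loo_sse <- vapply(lambdas, function(l) sum(loo_closed(K, y, l)^2), numeric(1))
lambda_hat <- lambdas[which.min(loo_sse)]
lambda_hat
```

The closed form needs one n x n inverse per candidate lambda, while the brute-force version needs n of them, which is why the shortcut makes LOO tuning cheap.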

Penalized LDA via `plda()`

plda() fits penalized Fisher's linear discriminant analysis (Witten & Tibshirani 2011) with L1 or fused-lasso penalties, multi-class support, and built-in cross-validation autotune.

fit <- plda(Species ~ ., data = iris)
predict(fit, iris)        # factor of predicted Species

Linear models via `ols()` and `logreg()`

ols() fits ordinary and weighted least squares; logreg() fits binary logistic regression by IRLS. Both have C++ engines, classical and HC0-HC3 robust standard errors, optional bagging, and the standard formula and matrix interfaces.

fit <- ols(mpg ~ wt + hp, data = mtcars)
summary(fit, robust = "HC3")            # HC3 robust standard errors
predict(fit, mtcars[1:3, ], interval = "confidence")

df  <- data.frame(am = mtcars$am, mtcars[c("wt", "hp")])
lr  <- logreg(am ~ wt + hp, data = df)
predict(lr, df[1:3, ], type = "response")
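
For readers unfamiliar with IRLS, the following base-R sketch shows the iteration on simulated data and compares the result with stats::glm(). It is a minimal illustration of the technique, not the code behind logreg(): the function name irls_logit(), the zero starting values, and the stopping rule are assumptions made for this example.

```r
# IRLS (Newton-Raphson) for binary logistic regression, base R only.
irls_logit <- function(X, y, tol = 1e-10, max_iter = 50) {
  X <- cbind(1, X)                       # prepend an intercept column
  beta <- numeric(ncol(X))               # start at zero (assumed start)
  for (iter in seq_len(max_iter)) {
    eta <- drop(X %*% beta)              # linear predictor
    mu  <- plogis(eta)                   # fitted probabilities
    w   <- mu * (1 - mu)                 # IRLS working weights
    # Weighted least squares on the working response z = eta + (y - mu) / w.
    # w * z is expanded to w * eta + (y - mu) to avoid dividing by tiny weights.
    beta_new <- drop(solve(crossprod(X, w * X),
                           crossprod(X, w * eta + (y - mu))))
    converged <- max(abs(beta_new - beta)) < tol
    beta <- beta_new
    if (converged) break
  }
  beta
}

# Simulated data with a well-conditioned design
set.seed(1995)
n  <- 500
Xs <- matrix(rnorm(n * 2), n, 2)
ys <- rbinom(n, 1, plogis(-0.5 + Xs[, 1] - Xs[, 2]))

b_irls <- irls_logit(Xs, ys)
b_glm  <- coef(glm(ys ~ Xs, family = binomial()))
cbind(irls = b_irls, glm = unname(b_glm))   # columns should agree closely
```

Each pass solves one weighted least-squares problem, so the cost per iteration is that of a weighted ols() fit.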

Boosted additive models via `bgam()`

bgam() fits a smooth additive model by component-wise P-spline gradient boosting (Buehlmann & Yu 2003; Eilers & Marx 1996). The number of boosting iterations is tuned by cross-validation and doubles as built-in variable selection. Gaussian and binomial families are supported.

fit <- bgam(mpg ~ ., data = mtcars)   # CV-tuned number of boosting steps
predict(fit, head(mtcars))
plot(fit)                             # smooth partial-effect curves
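
The way early-stopped component-wise boosting doubles as variable selection can be seen in a small base-R sketch. To keep it short, it swaps in simple linear base learners where bgam() uses P-splines, and a fixed number of steps where bgam() uses cross-validation. The function cw_boost(), the step size nu, and the simulated data are assumptions made for this example.

```r
# Component-wise L2 boosting with one linear base learner per column.
cw_boost <- function(X, y, steps = 50, nu = 0.1) {
  beta   <- numeric(ncol(X))
  offset <- mean(y)
  r      <- y - offset                            # current residuals
  for (m in seq_len(steps)) {
    b   <- drop(crossprod(X, r)) / colSums(X^2)   # LS slope of r on each column
    sse <- vapply(seq_len(ncol(X)),
                  function(j) sum((r - X[, j] * b[j])^2), numeric(1))
    j   <- which.min(sse)                         # best-fitting component
    beta[j] <- beta[j] + nu * b[j]                # shrunken update of that one only
    r   <- r - nu * X[, j] * b[j]
  }
  list(offset = offset, beta = beta)
}

set.seed(1995)
n  <- 200
Xb <- scale(matrix(rnorm(n * 5), n, 5))
yb <- 2 * Xb[, 1] - Xb[, 3] + rnorm(n)            # only columns 1 and 3 matter

bfit <- cw_boost(Xb, yb, steps = 50)
round(bfit$beta, 3)    # columns never picked keep a coefficient of exactly zero
```

Because only one component is updated per step, stopping early leaves weak or irrelevant components at or near zero.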

Causal ensembles via `meep()`

meep() cross-fits an ensemble of base learners and returns out-of-fold predictions designed to drop into Double Machine Learning (DoubleML) or causal forest (grf) implementations. The default learners are ares(), krls(), ols(), logreg(), and plda(), chosen automatically per nuisance by family: regression nuisances use ares/krls/ols, classification nuisances use ares/krls/logreg/plda. By default the cross-fitted propensity is isotonically calibrated (van der Laan et al. 2023) for more reliable DML/AIPW inference; disable with calibrate = "none".

set.seed(1995)
n <- 800
X <- matrix(runif(n * 4, -2, 2), n, 4)
Dstar <- sin(X[, 1]) + 0.5 * X[, 2] + 0.3 * X[, 3]^2 + rnorm(n, sd = 1.5)
D <- as.integer(Dstar > median(Dstar))   # binary treatment
Y <- D + cos(X[, 1]) + 0.4 * X[, 2]^2 + 0.5 * X[, 3] + rnorm(n, sd = 1.5)

m  <- meep(X, Y, treatment = D, folds = 5, seed = 1995)
m$y_hat_oof   # cross-fitted E[Y | X]
m$d_hat_oof   # cross-fitted E[D | X]

# hand the cross-fitted nuisances to a causal forest
# grf::causal_forest(X, Y, D, Y.hat = m$y_hat_oof, W.hat = m$d_hat_oof)

On the smooth, structured toy signal above, the ensemble fits the nuisances somewhat more tightly than grf's built-in regression forests (grf out-of-bag vs meep out-of-fold R-squared):

cf <- grf::causal_forest(X, Y, D, seed = 1995)
r2 <- function(p, a) 1 - sum((a - p)^2) / sum((a - mean(a))^2)

data.frame(
  nuisance    = c("E[Y|X]", "E[D|X]"),
  grf_oob_r2  = c(r2(cf$Y.hat, Y),     r2(cf$W.hat, D)),
  meep_oof_r2 = c(r2(m$y_hat_oof, Y),  r2(m$d_hat_oof, D))
)
#>  nuisance grf_oob_r2 meep_oof_r2
#>    E[Y|X]      0.179       0.198
#>    E[D|X]      0.162       0.180

Add random-forest and BART learners to the stack with extra.learners (the external packages stay optional -- you install them yourself). Use plot() for a quick read on each learner and the stack: ROC curves for binary nuisances, OOF R-squared and observed-vs-predicted for continuous ones.

m2 <- meep(X, Y, treatment = D, folds = 5, seed = 1995,
           extra.learners = c("forest", "BART"))   # ranger + dbarts
plot(m2)
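
The two ideas meep() combines for the propensity nuisance, cross-fitting and isotonic calibration, can be sketched in base R with a single learner. This is a simplified illustration under stated assumptions, not meep()'s implementation: it uses one glm() in place of a stacked ensemble, fits the calibration on all out-of-fold predictions at once, and the names fold, d_hat_oof, and d_hat_cal are chosen for this example.

```r
set.seed(1995)
n   <- 300
x   <- matrix(runif(n * 2, -2, 2), n, 2)
d   <- rbinom(n, 1, plogis(x[, 1]))               # binary treatment
dat <- data.frame(d = d, x1 = x[, 1], x2 = x[, 2])

# Cross-fitting: every observation is predicted by a model that never saw it.
k    <- 5
fold <- sample(rep(seq_len(k), length.out = n))
d_hat_oof <- numeric(n)
for (j in seq_len(k)) {
  test <- fold == j
  fit_j <- glm(d ~ x1 + x2, data = dat[!test, ], family = binomial())
  d_hat_oof[test] <- predict(fit_j, newdata = dat[test, ], type = "response")
}

# Isotonic calibration: monotone regression of the outcome on the OOF scores.
iso       <- isoreg(d_hat_oof, d)
calibrate <- as.stepfun(iso)                      # step function of the score
d_hat_cal <- calibrate(d_hat_oof)

head(cbind(raw = d_hat_oof, calibrated = d_hat_cal))
```

The calibrated scores are a non-decreasing step function of the raw ones, so the ranking of units is preserved while the probability scale is adjusted toward observed treatment frequencies.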

References

Friedman, J. H. (1991). Multivariate Adaptive Regression Splines. Annals of Statistics, 19(1):1-67.
Hainmueller, J. and Hazlett, C. (2014). Kernel Regularized Least Squares: Reducing Misspecification Bias with a Flexible and Interpretable Machine Learning Approach. Political Analysis, 22(2):143-168.
Witten, D. M. and Tibshirani, R. (2011). Penalized classification using Fisher's linear discriminant. Journal of the Royal Statistical Society, Series B, 73(5):753-772.
Buehlmann, P. and Yu, B. (2003). Boosting with the L2 loss: Regression and classification. Journal of the American Statistical Association, 98(462):324-339.
Eilers, P. H. C. and Marx, B. D. (1996). Flexible smoothing with B-splines and penalties. Statistical Science, 11(2):89-121.
Hofner, B., Mayr, A., Robinzonov, N. and Schmid, M. (2014). Model-based boosting in R: A hands-on tutorial using the R package mboost. Computational Statistics, 29(1-2):3-35.

License

MIT (c) Jack T. Rametta
