For use in model formulas: a natural cubic spline, as in splines::ns, but with knot positions chosen using k-means rather than quantiles. Automatically uses fewer knots if there are insufficient distinct values.

well_knotted_spline(x, n_knots, verbose = TRUE)

Arguments

x

The predictor variable. A numeric vector.

n_knots

Number of knots to use.

verbose

If TRUE, produce a message about the knots chosen.

Value

A matrix of predictors, similar to ns.

This function supports "safe prediction" (see makepredictcall). Original knot locations will be used for prediction with predict.
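The effect of safe prediction can be illustrated with splines::ns, which this function is modelled on. This is a minimal sketch using only base R and the splines package; it does not call well_knotted_spline itself, and it assumes the same mechanism applies to it as stated above.

```r
library(splines)

# Fit on the full data; ns() picks its knots from these values.
fit <- lm(mpg ~ ns(wt, df = 3), data = mtcars)

# Predict on a subset. Without safe prediction, the knots would be
# recomputed from these five rows and the basis would change.
new_cars <- mtcars[1:5, ]
pred <- predict(fit, newdata = new_cars)

# Compare with the fitted values for the same rows.
all.equal(unname(pred), unname(fitted(fit)[1:5]))

# The knots recorded at fit time are stored with the model terms.
attr(terms(fit), "predvars")
```

The "predvars" attribute holds the spline call with its knots filled in, which is what predict reuses for new data.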

Details

Wong (1982, 1984) showed that, in one dimension, the asymptotic density of k-means cluster centers is proportional to the cube root of the density of x. Compared to using quantiles (the default for ns), choosing knots using k-means therefore produces a better spread of knot locations if the distribution of values is very uneven.

k-means is computed in an optimal, deterministic way using Ckmeans.1d.dp.
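The idea can be sketched as follows: cluster the predictor, use the sorted cluster centers as interior knots, and pass them to splines::ns. This is only an illustration, not the function's actual code. The helper name sketch_kmeans_knots is hypothetical, the rule for reducing the knot count (distinct values minus 2) is an assumption, and stats::kmeans is swapped in for Ckmeans.1d.dp, so the result is heuristic rather than optimal and deterministic.

```r
library(splines)

# Hypothetical helper for illustration; not the package's implementation.
sketch_kmeans_knots <- function(x, n_knots) {
  x <- x[!is.na(x)]
  # Assumption: cap the number of knots at (distinct values - 2).
  k <- min(n_knots, length(unique(x)) - 2)
  if (k < 1) return(numeric(0))
  # Swap-in: stats::kmeans is a heuristic. Several random restarts make a
  # good solution likely but do not guarantee the optimum.
  fit <- kmeans(x, centers = k, nstart = 20)
  sort(as.vector(fit$centers))
}

set.seed(1)
knots <- sketch_kmeans_knots(mtcars$wt, 3)

# Interior knots at the cluster centers, boundary knots at the data range.
basis <- ns(mtcars$wt, knots = knots, Boundary.knots = range(mtcars$wt))
dim(basis)  # one row per observation, length(knots) + 1 columns
```

Because k-means places centers according to the cube root of the density, sparse regions of x still receive knots, whereas quantile-based knots crowd into the dense regions.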

References

Wong, M. (1982). Asymptotic properties of univariate sample k-means clusters. Working paper #1341-82, Sloan School of Management, MIT. https://dspace.mit.edu/handle/1721.1/46876

Wong, M. (1984). Asymptotic properties of univariate sample k-means clusters. Journal of Classification, 1(1), 255–270. https://doi.org/10.1007/BF01890126

Examples

lm(mpg ~ well_knotted_spline(wt,3), data=mtcars)
#> wt range 1.513 5.424 knots 2.199364 3.485556 5.339667
#>
#> Call:
#> lm(formula = mpg ~ well_knotted_spline(wt, 3), data = mtcars)
#>
#> Coefficients:
#>                 (Intercept)  well_knotted_spline(wt, 3)1
#>                       32.13                       -17.46
#> well_knotted_spline(wt, 3)2  well_knotted_spline(wt, 3)3
#>                      -17.82                       -26.07
#> well_knotted_spline(wt, 3)4
#>                      -15.17
#>

# When insufficient unique values exist, fewer knots are used
lm(mpg ~ well_knotted_spline(gear,3), data=mtcars)
#> gear range 3 5 knots 3.6875
#>
#> Call:
#> lm(formula = mpg ~ well_knotted_spline(gear, 3), data = mtcars)
#>
#> Coefficients:
#>                   (Intercept)  well_knotted_spline(gear, 3)1
#>                       16.1067                        14.8746
#> well_knotted_spline(gear, 3)2
#>                        0.9145
#>

library(ggplot2)
ggplot(diamonds, aes(carat, price)) +
    geom_point() +
    geom_smooth(method="lm", formula=y~well_knotted_spline(x,10))
#> x range 0.20 5.01 knots 0.3084627 0.4023654 0.5362350 0.7209752 0.8934446 1.0360370 1.2324172 1.5597639 2.0635750 2.6274262