well_knotted_spline.Rd
A natural cubic spline basis for use in model formulas,
as in splines::ns,
but with knot positions chosen using
k-means rather than quantiles.
Automatically uses fewer knots if there are insufficient distinct values.
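How many knots were actually used can be read off the number of basis columns, assuming the result has the same shape as a splines::ns basis without an intercept, where k interior knots yield k + 1 columns. A small sketch using ns itself, with knot values picked by hand purely for illustration:

```r
library(splines)

# Three interior knots (hypothetical values) give four basis columns.
ncol(ns(mtcars$wt, knots = c(2.5, 3.2, 3.6)))

# A single interior knot gives two basis columns.
ncol(ns(mtcars$gear, knots = 4))
```

Under that assumption, the number of spline coefficients in a fitted model indicates how many knots were kept.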
well_knotted_spline(x, n_knots, verbose = TRUE)
Argument | Description |
---|---|
x | The predictor variable. A numeric vector. |
n_knots | Number of knots to use. |
verbose | If TRUE, produce a message about the knots chosen. |
A matrix of predictors, similar to ns.
This function supports "safe prediction" (see makepredictcall).
The original knot locations will be used for prediction with predict.
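Safe prediction can be illustrated with splines::ns, which supports the same mechanism. This is a minimal sketch using ns as a stand-in, not this function itself: predictions on a subset of rows agree with the original fitted values, which would not hold if the knots were re-chosen from the new data.

```r
library(splines)

fit <- lm(mpg ~ ns(wt, df = 3), data = mtcars)

# Predict for a handful of rows only. Re-evaluating ns(wt, df = 3) naively
# on these five values would pick different knots; safe prediction avoids that.
new_rows <- mtcars[1:5, ]
pred <- predict(fit, newdata = new_rows)

# Compare with the fitted values from the original fit for the same rows.
all.equal(unname(pred), unname(fitted(fit)[1:5]))
```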
Wong (1982, 1984) showed that the asymptotic density of k-means cluster centers in one dimension is
proportional to the cube root of the density of x.
Compared to using quantiles (the default for ns),
choosing knots using k-means produces a better spread of knot locations
if the distribution of values is very uneven.
k-means is computed in an optimal, deterministic way using Ckmeans.1d.dp.
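The difference between the two knot-placement strategies can be sketched on a skewed predictor. This example is an illustration under assumptions: it uses stats::kmeans (randomized, hence the seed and multiple starts) as a stand-in for Ckmeans.1d.dp, and it treats the cluster centers directly as interior knots, which may not match exactly how this function derives its knots.

```r
library(splines)

set.seed(1)
# Skewed predictor: most values are small, with a long right tail.
x <- rexp(500)

n_knots <- 5

# Quantile-based interior knots, placed at evenly spaced quantiles
# as ns() does when given df rather than knots.
quantile_knots <- unname(quantile(x, probs = seq_len(n_knots) / (n_knots + 1)))

# k-means-based knots. stats::kmeans is a randomized stand-in here;
# nstart > 1 reduces, but does not remove, dependence on starting points.
km <- kmeans(x, centers = n_knots, nstart = 20)
kmeans_knots <- sort(as.vector(km$centers))

round(quantile_knots, 2)
round(kmeans_knots, 2)

# Either set of knots can be passed to ns() explicitly.
basis <- ns(x, knots = kmeans_knots)
dim(basis)
```

On data like this, the quantile knots crowd into the dense region near zero, while the k-means knots extend further into the tail.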
Wong, M. (1982). Asymptotic properties of univariate sample k-means clusters. Working paper #1341-82, Sloan School of Management, MIT. https://dspace.mit.edu/handle/1721.1/46876
Wong, M. (1984). Asymptotic properties of univariate sample k-means clusters. Journal of Classification, 1(1), 255–270. https://doi.org/10.1007/BF01890126
lm(mpg ~ well_knotted_spline(wt, 3), data = mtcars)
#>
#>
#> Call:
#> lm(formula = mpg ~ well_knotted_spline(wt, 3), data = mtcars)
#>
#> Coefficients:
#>                 (Intercept)  well_knotted_spline(wt, 3)1
#>                       32.13                       -17.46
#> well_knotted_spline(wt, 3)2  well_knotted_spline(wt, 3)3
#>                      -17.82                       -26.07
#> well_knotted_spline(wt, 3)4
#>                      -15.17
#>

# When insufficient unique values exist, fewer knots are used
lm(mpg ~ well_knotted_spline(gear, 3), data = mtcars)
#>
#>
#> Call:
#> lm(formula = mpg ~ well_knotted_spline(gear, 3), data = mtcars)
#>
#> Coefficients:
#>                   (Intercept)  well_knotted_spline(gear, 3)1
#>                       16.1067                        14.8746
#> well_knotted_spline(gear, 3)2
#>                        0.9145
#>

library(ggplot2)
ggplot(diamonds, aes(carat, price)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ well_knotted_spline(x, 10))