Estimates on Training vs. Validation Samples

Let’s explore training vs. validation.
Before moving to cross-validation, it was natural to say, "I will burn (say) 50 percent of my data to train a model, and then use the remaining observations to fit the model." For instance, we can use the training data for variable selection (e.g. using some stepwise procedure in a logistic regression) and then, once the variables have been selected, fit the model on the remaining set of observations. A natural question is, "Does it really matter?"
In order to visualize this problem, consider my (simple) dataset.
Let us generate 100 training samples (each keeping about 50 percent of the observations). On each of them, we run a stepwise procedure and keep the estimates of the remaining variables (and their standard errors).
Then, for the 7 covariates (and the constant), we can compare the value of the coefficient in the model fitted on the training sample with its value in the model fitted on the validation sample (of course, only when the variable was retained by the stepwise procedure).
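The experiment described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the code behind the original post: the dataset is simulated (the names `X`, `y`, and `true_beta` are hypothetical), the logistic regression is fitted with a hand-written Newton-Raphson routine, and the "stepwise procedure" is replaced by a simple backward elimination that drops the covariate with the smallest Wald statistic until all remaining ones exceed 1.96 in absolute value.

```python
import numpy as np

def fit_logit(X, y, n_iter=25):
    """Logistic regression by Newton-Raphson; returns (coefficients, standard errors)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        hessian = X.T @ (X * w[:, None])
        beta = beta + np.linalg.solve(hessian, X.T @ (y - p))
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    w = p * (1.0 - p)
    cov = np.linalg.inv(X.T @ (X * w[:, None]))
    return beta, np.sqrt(np.diag(cov))

def backward_select(X, y, z_min=1.96):
    """Assumed selection rule: drop the weakest covariate while its |z| < z_min."""
    keep = list(range(X.shape[1]))  # column 0 is the intercept
    while True:
        beta, se = fit_logit(X[:, keep], y)
        z = np.abs(beta / se)
        z[0] = np.inf  # the intercept is never dropped
        worst = int(np.argmin(z))
        if z[worst] >= z_min:
            return keep
        keep.pop(worst)

# Simulated data (an assumption): 7 covariates, 4 of them with no effect.
rng = np.random.default_rng(0)
n, k = 1000, 7
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
true_beta = np.array([-0.5, 1.0, -1.0, 0.5, 0.0, 0.0, 0.0, 0.0])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)

# One row per (split, retained variable):
# (column index, training estimate, training s.e., validation estimate, validation s.e.)
results = []
for _ in range(100):
    in_train = rng.random(n) < 0.5  # keep about 50 percent for training
    keep = backward_select(X[in_train], y[in_train])
    b_tr, se_tr = fit_logit(X[in_train][:, keep], y[in_train])
    b_va, se_va = fit_logit(X[~in_train][:, keep], y[~in_train])
    for j, col in enumerate(keep):
        results.append((col, b_tr[j], se_tr[j], b_va[j], se_va[j]))
```

Filtering `results` on a given column index gives the paired training and validation estimates for that coefficient, for the splits in which the variable survived the selection step.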
For instance, for the intercept, we have the following,
where horizontal segments are confidence intervals for the parameter in the model fitted on the training sample, and vertical segments are those for the validation sample.
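As a small illustration of how such segments can be computed, the sketch below builds an interval from an estimate and its standard error and checks whether two intervals overlap. The source does not say which kind of interval or which level is used, so the 95 percent Wald interval here is an assumption, and the numbers in the example are hypothetical.

```python
from statistics import NormalDist

def wald_interval(estimate, std_error, level=0.95):
    """Wald confidence interval: estimate +/- z * standard error (assumed form)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return estimate - z * std_error, estimate + z * std_error

def intervals_overlap(a, b):
    """True when the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

# Hypothetical intercept estimates from one training/validation split.
train_ci = wald_interval(-0.52, 0.10)  # would be drawn as a horizontal segment
valid_ci = wald_interval(-0.41, 0.11)  # would be drawn as a vertical segment
print(train_ci, valid_ci, intervals_overlap(train_ci, valid_ci))
```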
