In regression analysis, bootstrapping is a method of statistical inference
that builds a sampling distribution by resampling the originally observed
data with replacement. The term
bootstrapping, proposed by Bradley Efron in his 1979 paper “Bootstrap methods:
another look at the jackknife”, derives from the cliché of ‘pulling oneself up
by one’s bootstraps’. In keeping with that idea, the sample data are treated as
if they were the population, and repeated samples are drawn from them to carry
out the statistical inference. The essential bootstrap analogy states that “the
population is to the sample as the sample is to the bootstrap samples”.
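
The resampling idea described above can be sketched in a few lines of Python. This is a minimal illustration rather than a prescribed implementation: the data values, the function name bootstrap_distribution, the choice of the mean as the statistic, and the fixed seed are all assumptions made for the example.

```python
import random
import statistics


def bootstrap_distribution(sample, statistic, n_resamples=1000, seed=0):
    """Return n_resamples values of `statistic`, each computed on a resample
    drawn from `sample` with replacement and of the same size as `sample`."""
    rng = random.Random(seed)  # seeded only so the sketch is reproducible
    k = len(sample)
    replicates = []
    for _ in range(n_resamples):
        # random.Random.choices samples WITH replacement
        resample = rng.choices(sample, k=k)
        replicates.append(statistic(resample))
    return replicates


# Hypothetical observed data, treated as if it were the population.
observed = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7]

# Empirical approximation to the sampling distribution of the mean.
boot_means = bootstrap_distribution(observed, statistics.mean)
print(len(boot_means), min(boot_means), max(boot_means))
```

The list boot_means plays the role of the sampling distribution: its spread indicates how much the mean would vary from sample to sample.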

The bootstrap comes in two types, parametric and nonparametric. Parametric
bootstrapping assumes that the original data set is drawn from a specific
distribution, e.g. the normal distribution, and the bootstrap samples are
generally drawn with the same size as the original data set. Nonparametric
bootstrapping is the approach described at the beginning, which draws the
bootstrap samples directly from the original data. Bootstrapping is quite
useful in non-linear regression and generalized linear models. For small
samples, the parametric bootstrapping method is strongly preferred. For large
samples, the nonparametric bootstrapping
method is generally preferred.

To illustrate nonparametric bootstrapping, suppose a sample A = {x1, x2, …, xk}
is randomly drawn from a population B = {X1, X2, …, XK}, where K is much larger
than k. The statistic T = t(A) is taken as an estimate of the corresponding
population parameter P = t(B). Nonparametric bootstrapping estimates the
sampling distribution of a statistic empirically; no assumptions about the form
of the population are necessary. Next, a sample of size k is drawn from the
elements of A with replacement,
which is denoted A*1 = {x*11, x*12, …, x*1k}.
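
As a rough sketch of the two types of bootstrap described above, the Python code below generates bootstrap replicates T* = t(A*) in both ways. The data in A, the helper names, the use of the median as t, the choice of R = 1000, and the fitting of a normal distribution by the sample mean and standard deviation are assumptions made for illustration; they are not taken from the text.

```python
import random
import statistics


def nonparametric_draw(A, rng):
    # A*: k values drawn from the elements of A with replacement
    return rng.choices(A, k=len(A))


def parametric_draw_normal(A, rng):
    # Assumption: the data follow a normal distribution. Fit its mean and
    # standard deviation to A, then simulate k new values from that fit.
    mu = statistics.mean(A)
    sigma = statistics.stdev(A)
    return [rng.gauss(mu, sigma) for _ in A]


def bootstrap_replicates(A, t, draw, R=1000, seed=0):
    # Compute T* = t(A*) for R bootstrap samples produced by `draw`.
    rng = random.Random(seed)  # seeded only for reproducibility
    return [t(draw(A, rng)) for _ in range(R)]


# Hypothetical sample A of size k = 8 and statistic t = median.
A = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7]
T = statistics.median(A)  # T = t(A), the estimate of P = t(B)

T_star_nonparametric = bootstrap_replicates(A, statistics.median, nonparametric_draw)
T_star_parametric = bootstrap_replicates(A, statistics.median, parametric_draw_normal)
print(T, len(T_star_nonparametric), len(T_star_parametric))
```

In this sketch the nonparametric replicates can only be built from values already present in A, whereas the parametric replicates come from newly simulated values and so depend on the normality assumption being reasonable.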

In the resampling, an asterisk (*) is added to distinguish resampled data from
the original data. Sampling with replacement is mandatory, since otherwise only
the original sample A would be reproduced. The resampling is repeated many
times, typically 1000 or 10000, a number that keeps growing as computing power
increases. The mean of the bootstrap estimates from these samples is
calculated to estimate the expectation of the bootstrapped statistics; this
mean minus T is the estimate of T’s bias. The variance of the bootstrap
estimates T*, the bootstrap variance estimate, estimates the sampling variance
of T as an estimator of the population parameter P. Bootstrap confidence
intervals can then be constructed using either the bootstrap percentile
interval approach or the normal theory interval approach. The bootstrap
percentile method uses the empirical quantiles of the bootstrap estimates as
the confidence limits, which is written as T*(lower)