In regression analysis, bootstrapping is a method for statistical inference
that builds a sampling distribution by resampling the originally observed data
with replacement. The term bootstrapping, proposed by Bradley Efron in his
paper “Bootstrap methods: another look at the jackknife”, published in 1979,
is taken from the cliché of ‘pulling oneself up by one’s bootstraps’. In
keeping with this idea, the sample data are treated as if they were the
population, and repeated samples are drawn from them to make statistical
inferences from the sample data. The essential bootstrap analogy states that
“the population is to the sample as the sample is to the bootstrap samples”.
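
To make the resampling idea concrete, here is a minimal sketch in plain Python using only the standard library. The data values, the seed, and the helper name bootstrap_means are illustrative assumptions and do not come from any package or from the study discussed later.

```python
import random
import statistics


def bootstrap_means(sample, n_resamples=1000, seed=0):
    """Return the mean of each bootstrap resample of `sample`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    k = len(sample)
    means = []
    for _ in range(n_resamples):
        # Treat the sample as the population: draw k values with replacement.
        resample = rng.choices(sample, k=k)
        means.append(statistics.mean(resample))
    return means


# Hypothetical observations, used only for illustration.
observed = [4.1, 5.3, 2.8, 6.0, 4.7, 5.1, 3.9, 4.4]
boot_means = bootstrap_means(observed)

# The spread of the resampled means approximates the standard error of the mean.
print("sample mean:", statistics.mean(observed))
print("bootstrap standard error:", statistics.stdev(boot_means))
```

Each resample has the same size as the original sample, so the variation among the resampled means stands in for the variation that would be seen across repeated samples from the population.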

                   
The bootstrap falls into two types, parametric and nonparametric. Parametric
bootstrapping assumes that the original data set is drawn from a specific
distribution, e.g. a normal distribution, and the samples are generally drawn
with the same size as the original data set. Nonparametric bootstrapping is the
approach described at the beginning, which draws the bootstrap samples directly
from the original data. Bootstrapping is quite useful in non-linear regression
and generalized linear models. For small sample sizes, parametric bootstrapping
is generally preferred; for large sample sizes, nonparametric bootstrapping is
preferred.

To clarify nonparametric bootstrapping further, suppose a sample data set
A = {x1, x2, …, xk} is randomly drawn from a population B = {X1, X2, …, XK},
where K is much larger than k. The statistic T = t(A) is an estimate of the
corresponding population parameter P = t(B). Nonparametric bootstrapping
estimates the sampling distribution of a statistic empirically; no assumption
about the form of the population is necessary. Next, a sample of size k is
drawn from the elements of A with replacement, denoted
A*1 = {x*11, x*12, …, x*1k}. The * is added to distinguish resampled data from
original data. Sampling with replacement is essential, since otherwise each
resample would simply reproduce the original sample A. The resampling is
typically repeated 1000 or 10000 times, a number that keeps growing as
computing power increases. The statistic is computed for each bootstrap sample,
and the mean of these bootstrap estimates T* estimates the expectation of the
bootstrapped statistic. This mean minus T is the estimate of T’s bias. The
variance of the bootstrap estimates T* is the bootstrap variance estimate,
which estimates the sampling variance of T. Bootstrap confidence intervals can
then be constructed using either the bootstrap percentile interval approach or
the normal-theory interval approach. The bootstrap percentile method uses the empirical
quantiles of the bootstrap estimates, written as T*(lower) < P < T*(upper). More specifically, it can be written as T − (T*(upper) − mean(T*)) ≤ P ≤ T + (mean(T*) − T*(lower)), where mean(T*) is the mean of the bootstrap estimates.

Bootstrapping is an effective method for double-checking the stability of model estimation results. It is often more accurate than intervals calculated from the sample variance under a normality assumption. Simplicity is another important benefit of bootstrapping: for complicated estimators, such as correlation coefficients and percentile points, and for complex parameters of the distribution, it is a simple way to obtain estimates of confidence intervals and standard errors. However, this simplicity is also a disadvantage, because it makes the important assumptions behind bootstrapping easy to neglect. Bootstrapping is also often over-optimistic and does not provide finite-sample guarantees.

There are several types of bootstrapping schemes for regression problems. One typical approach is to resample the residuals of the regression model. The main procedure is as follows: first, fit the model to the original data set to obtain the model estimates, β̂, and calculate the residuals, ε̂; second, randomly and repeatedly resample the residuals with replacement (typically 1000 or 10000 times) to obtain K sets of residuals of size k, where K here denotes the number of bootstrap replicates rather than the population size, and add each resampled residual to the corresponding fitted value to generate the bootstrapped Y*; finally, refit the model to each bootstrapped Y* to obtain the bootstrap estimates β̂*.

Another typical approach in the regression context is random-x resampling, also called case resampling. We can either apply a Monte Carlo algorithm, which repeatedly resamples the data with replacement at the same size as the original data set, or enumerate every possible resample of the data set. In our case, before fitting the regression model, the original predictor and response pairs (xi, yi), for i = 1, 2, …, k, are resampled to obtain K new data sets of size k. The regression model is then fit to each of these K new data sets, and β̂* is obtained from the K parameter estimates.

In the next section, I review the nonparametric bootstrapping package in R with some examples from my research area, population pharmacokinetic analysis. The R package "boot" provides facilities for bootstrapping either a single statistic or a vector of statistics. The boot() function in the boot library requires three arguments:

1) data, which can be a vector, matrix, or data frame to be resampled;
2) statistic, the function that produces the statistic to be bootstrapped. This function should take the data set and an indices parameter that gives the selection of cases for each resample;
3) R, the number of bootstrap replicates.

The function boot() calls the statistic function R times. In each call, it generates a set of random indices, sampled with replacement, to select a sample. The statistics calculated for each sample are then collected in the returned object, here called bootobject. So the function boot() is used as bootobject <- boot(data= , statistic= , R= , ...). Once the results look satisfactory, for example in plot(bootobject), we use boot.ci(bootobject, conf= , type= ) to get confidence intervals.

Bootstrapping is widely used in the population analysis of clinical trials in the pharmaceutical and biotech industries. It is a useful tool for assessing and controlling the stability of model analyses. A good example is how bootstrapping validates a population pharmacokinetic model for Triptan, a vasopressor used for the treatment of migraine attacks. A single oral dose of 50 mg was administered to 26 healthy Korean male subjects. Plasma concentration data were obtained from pre-dose through 12 hours post-dose.
Population pharmacokinetic analysis of Triptan was performed on the plasma concentration data using NONMEM, software that builds models using differential equations. A total of 364 plasma concentration observations were successfully described by a one-compartment model with first-order absorption with lag time, first-order elimination, and a combined transit compartment. The model scheme is shown in Figure 1.

Figure 1: The scheme of the final PK model of Triptan

The final model was validated through a 1000-replicate bootstrap, conducted with 1000 data sets resampled from the original data set with replacement. The median and 90% prediction intervals of the parameters are shown in Table 1 for comparison with the final parameter estimates.

Table 1: NONMEM estimated Parameters and Bootstrap Results

Results from the visual predictive check with 1000 simulations were assessed by visually comparing the gray area of the 90% prediction interval from the simulated data with an overlay of the circled raw data. Observed data points falling outside the gray area would indicate that the estimates were not reliable.

Figure 2: Visual predictive check plot of the model from time 0 to 12 h after a single oral administration of 50 mg Triptan. Circles represent the raw data, the gray area is the 90% prediction interval of the 1000 simulations, and the solid lines are the 5th, median, and 95th percentiles of the observed concentrations.

Our conclusion is that the final model and its estimated parameters were sufficiently robust and stable as assessed by bootstrapping. All estimated parameters from the final model were within the 95% bootstrap confidence intervals.
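
As an illustration of the case-resampling scheme and the percentile interval described earlier, the following is a minimal sketch in plain Python rather than R's boot package or NONMEM. The data values, the function names (fit_line, case_resample_slopes, percentile_interval), the seed, and the simple nearest-rank quantile rule are all assumptions made for illustration; this is not the analysis used in the Triptan study.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def case_resample_slopes(xs, ys, n_resamples=1000, seed=0):
    """Resample (x, y) pairs with replacement and refit the line each time."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    k = len(xs)
    slopes = []
    while len(slopes) < n_resamples:
        idx = [rng.randrange(k) for _ in range(k)]  # k row indices, with replacement
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(bx)) < 2:
            continue  # slope is undefined if every resampled x is identical
        slopes.append(fit_line(bx, by)[1])
    return slopes


def percentile_interval(values, conf=0.90):
    """Percentile interval from the empirical quantiles (nearest-rank rule)."""
    ordered = sorted(values)
    alpha = (1 - conf) / 2
    last = len(ordered) - 1
    return ordered[round(alpha * last)], ordered[round((1 - alpha) * last)]


# Hypothetical predictor/response pairs, used only for illustration.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [3.2, 4.9, 7.1, 9.3, 10.8, 13.1, 15.2, 16.8, 19.1, 21.0]

intercept, slope = fit_line(xs, ys)
slopes = case_resample_slopes(xs, ys)
lower, upper = percentile_interval(slopes, conf=0.90)
print("fitted slope:", slope)
print("90% percentile interval for the slope:", (lower, upper))
```

Whole (x, y) pairs are resampled together, so the relationship between predictor and response is preserved within each bootstrap data set. Resampling residuals would instead keep the x values fixed and only perturb the responses.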