Robust Nonlinear Regression: with Applications using R

by Hossein Riazoshams, Habshah Midi, Gebrenegus Ghilagaber

Wiley, 2018. ISBN 9781119010449, 264 pages.

1
Robust Statistics and its Application in Linear Regression


This introductory chapter gives the mathematical background to the robust statistics used in the rest of the book. The robust linear regression methods presented here are then generalized to nonlinear regression in later chapters.

The robust approach to linear regression is described in this chapter. It is the main motivation for extending the statistical inference approaches used in linear regression to nonlinear regression. This is done by considering the gradient of a nonlinear model as the design matrix of a linear regression. Outlier detection methods used in linear regression are also extended for use in nonlinear regression.

In this chapter the consistency and asymptotic distributions of robust estimators and robust linear regression are presented. The validity of the results requires certain regularity conditions, which are stated here. Proofs of the theorems are very technical and, since this book is about nonlinear regression, they have been omitted.

1.1 Robust Aspects of Data


Robust statistics were developed to interpret data for which classical assumptions, such as randomness, independence, distribution models, prior assumptions about parameters, and other prior hypotheses, do not apply. Robust statistics can be used in a wide range of problems.

The classical approach in statistics assumes that data are collected from a distribution function; that is, the observed values $x_1, \dots, x_n$ follow the joint distribution function $F(x_1, \dots, x_n)$. If the observations are independent and identically distributed (i.i.d.) with distribution $F$, we write $x_i \sim F$ (the tilde sign designates a distribution). In real-life data, these explicit or other implicit assumptions might not be true. The presence of outliers is an example of a situation in which the classical assumptions fail and robust statistics are required.

1.2 Robust Statistics and the Mechanism for Producing Outliers


Robust statistics were developed to analyse data drawn from a wide range of distributions, and particularly data that do not follow a normal distribution, for example when a normal distribution is mixed with another known statistical distribution:

$$F(x) = (1 - \epsilon)\,\Phi(x) + \epsilon\,H(x) \qquad (1.1)$$

where $\epsilon$ is a small value representing the proportion of outliers, $\Phi$ is the normal cumulative distribution function (CDF) with appropriate mean and variance, and $H$ belongs to a suitable class of CDFs. A normal distribution $H$ with a large variance can produce a wide distribution, such as:

$$F(x) = (1 - \epsilon)\,\Phi(x) + \epsilon\,\Phi(x/k)$$

for a large value of $k$ (see Figure 1.1b). A mixture of two normal distributions with a large difference in their means can be generated by:

$$F(x) = (1 - \epsilon)\,\Phi(x) + \epsilon\,\Phi\!\left(\frac{x - \mu_0}{\sigma_0}\right)$$

where the variance $\sigma_0^2$ is much smaller than the mean shift $\mu_0$, and $\mu_0$ is the mean of the shifted distribution (see Figure 1.1a). The models in this book will be used to interpret data sets with outliers. Figure 1.1a shows the CDF of a mixture of two normal distributions with different means, and Figure 1.1b shows the CDF of a mixture of two normal distributions with different variances, as given above.

Figure 1.1 Contaminated normal densities: (a) mixture of two normal distributions with different means; (b) mixture of two normal distributions with different variances.

Source: Maronna et al. (2006). Reproduced with permission of John Wiley and Sons.
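
The contamination mechanisms above are straightforward to simulate. The following base R sketch draws from both mixtures and overlays the two contaminated CDFs; the values $\epsilon = 0.1$, $k = 5$, and $\mu_0 = 4$ are illustrative choices only, not the settings used for Figure 1.1.

```r
## Illustrative simulation of the contaminated normal model (1.1).
## eps, k and mu0 below are arbitrary choices, not those of Figure 1.1.
set.seed(1)
n   <- 1000
eps <- 0.10   # proportion of outliers (epsilon)
k   <- 5      # inflated standard deviation for the variance mixture
mu0 <- 4      # mean of the shifted component for the mean mixture

## Draw from F = (1 - eps) Phi + eps H by first picking the component
outlier <- runif(n) < eps
x_var  <- ifelse(outlier, rnorm(n, 0, k),   rnorm(n, 0, 1))  # different variances
x_mean <- ifelse(outlier, rnorm(n, mu0, 1), rnorm(n, 0, 1))  # different means

## Theoretical contaminated CDFs on a grid
grid   <- seq(-6, 8, length.out = 201)
F_var  <- (1 - eps) * pnorm(grid) + eps * pnorm(grid / k)
F_mean <- (1 - eps) * pnorm(grid) + eps * pnorm(grid - mu0)

plot(grid, F_mean, type = "l", xlab = "x", ylab = "CDF")  # cf. Figure 1.1a
lines(grid, F_var, lty = 2)                               # cf. Figure 1.1b
lines(ecdf(x_mean), do.points = FALSE, col = "gray")      # simulated sample
```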

1.3 Location and Scale Parameters


In this section we discuss the location and scale models for random sample data. In later chapters these concepts will be extended to nonlinear regression. The location model can be viewed as a simple special case of a nonlinear regression model, and the scale parameter describes the nonconstant-variance case, which is common in nonlinear regression.

1.3.1 Location Parameter


Nonlinear regression, and linear regression as a special case, can be represented by a location model, a scale model, or by a location model and a scale model simultaneously (Maronna et al. 2006). Not only regression but also many other random models can be studied systematically using this probabilistic interpretation. We assume that an observation $x_i$ depends on the unknown true value $\mu$ and that a random process acts additively as

$$x_i = \mu + u_i, \qquad i = 1, \dots, n$$

where the errors $u_i$ are random variables. This is called the location model and was defined by Huber (1964). If the errors are independent with common distribution $F_0$, then the outcomes $x_i$ are independent, with common distribution function

$$F(x) = F_0(x - \mu)$$

and density function $f_0(x - \mu)$. An estimate $\hat{\mu} = \hat{\mu}(x_1, \dots, x_n)$ is a function of the observations. We are looking for estimates that, with high probability, satisfy $\hat{\mu} \approx \mu$. The maximum likelihood estimate (MLE) of $\mu$ is the function of the observations that maximizes the likelihood function (joint density):

$$L(x_1, \dots, x_n; \mu) = \prod_{i=1}^{n} f_0(x_i - \mu) \qquad (1.3)$$

The estimate of the location can be obtained from:

$$\hat{\mu} = \arg\max_{\mu} \prod_{i=1}^{n} f_0(x_i - \mu)$$

Since $f_0$ is positive and the logarithm is an increasing function, the MLE of the location can be calculated from the simpler maximization of the logarithm:

$$\hat{\mu} = \arg\max_{\mu} \sum_{i=1}^{n} \ln f_0(x_i - \mu) \qquad (1.4)$$
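
As a quick numerical illustration of (1.3) and (1.4), the R sketch below maximizes the log-likelihood with optimize() for a standard normal $f_0$; in that case the MLE coincides with the sample mean, which the output confirms. The simulated data are purely illustrative.

```r
## Numerical MLE of the location via the log-likelihood (1.4),
## with f0 taken to be the standard normal density (illustrative choice).
set.seed(2)
x <- rnorm(50, mean = 3)

neg_loglik <- function(mu) -sum(dnorm(x - mu, log = TRUE))  # -sum log f0(x - mu)
fit <- optimize(neg_loglik, interval = range(x))

c(mle = fit$minimum, mean = mean(x))  # for normal f0 the MLE is the mean
```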

If the distribution $F_0$ is known, then the MLE has desirable mathematical and optimality properties: among unbiased estimators it has the lowest variance, and it has an approximately normal distribution. In the presence of outliers, since the distribution, and in particular the mixture distribution (1.1), is unknown or only approximately known, these statistically optimal properties might not be achieved. In this situation, some nearly optimal estimates can still be found, however. Maronna et al. (2006, p. 22) state that, to achieve optimality, the goal is to find estimates that are:

  • nearly optimal when $F_0$ is normal
  • nearly optimal when $F_0$ is approximately normal.

To this end, since MLEs have good properties, such as sufficiency, a known asymptotic distribution, and minimum variance among unbiased estimators, but are sensitive to the distributional assumptions, an MLE-type estimate based on (1.4) can be defined. This is called an M-estimate. As well as the M-estimate for location, a more general definition can be developed. Let:

$$\rho(t) = -\ln f_0(t) \qquad (1.5)$$

The negative logarithm of the likelihood (1.3) can then be written as $\sum_{i=1}^{n} \rho(x_i - \mu)$.

A more sophisticated form of M-estimate can be defined by generalizing $\rho$ to give an estimator for a multidimensional unknown parameter $\boldsymbol{\theta}$ of an arbitrary model of a given random sample $x_1, \dots, x_n$.

Definition 1.1


If a random sample $x_1, \dots, x_n$ is given, and $\boldsymbol{\theta}$ is an unknown $p$-dimensional parameter of a statistical model describing the behavior of the data, any estimator $\hat{\boldsymbol{\theta}} = \hat{\boldsymbol{\theta}}(x_1, \dots, x_n)$ of $\boldsymbol{\theta}$ is a function of the random sample. The M-estimate of $\boldsymbol{\theta}$ can be defined in two different ways (the estimating equation and the functional form are represented together): by a minimization problem of the form

$$\hat{\boldsymbol{\theta}} = \arg\min_{\boldsymbol{\theta}} \sum_{i=1}^{n} \rho(x_i; \boldsymbol{\theta}) \qquad (1.6)$$

or as the solution of the equation with the functional form

$$\sum_{i=1}^{n} \psi(x_i; \hat{\boldsymbol{\theta}}) = 0 \qquad (1.7)$$

where the functional form means $\hat{\boldsymbol{\theta}} = T(F_n)$, $F_n$ is the empirical CDF, and $\rho$ (the robust loss function) and $\psi$ are arbitrary functions. If $\rho$ is partially differentiable, we can define the psi function as $\psi = \partial\rho/\partial\boldsymbol{\theta}$, or more generally any function proportional to this derivative, and the results of Equations (1.6) and (1.7) are then equal. In this section we are interested in the M-estimate of the location, for which $\rho(x; \mu) = \rho(x - \mu)$.
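
As an illustrative sketch of the minimization form (1.6) for a multidimensional $\boldsymbol{\theta}$, the R code below jointly estimates location and scale, $\boldsymbol{\theta} = (\mu, \log\sigma)$, by minimizing a Huber-type objective with optim(). The loss, the added $n\log\sigma$ term (a standard ML-type device that keeps the scale identifiable), the tuning constant, and the data are all illustrative choices, not prescribed by Definition 1.1.

```r
## Illustrative use of the minimization form (1.6) with a two-dimensional
## theta = (mu, log sigma); Huber-type loss, arbitrary simulated data.
rho_huber <- function(t, k = 1.345)
  ifelse(abs(t) <= k, t^2 / 2, k * abs(t) - k^2 / 2)

set.seed(6)
x <- c(rnorm(45, mean = 5, sd = 2), rnorm(5, mean = 20, sd = 2))  # 10% outliers

obj <- function(theta) {                 # sum_i rho(x_i; theta)
  mu    <- theta[1]
  sigma <- exp(theta[2])                 # log-parameterized to stay positive
  sum(rho_huber((x - mu) / sigma)) + length(x) * log(sigma)
}

optim(c(median(x), log(mad(x))), obj)$par  # robust starting values
```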

The M‐estimate was first introduced for the location parameter by Huber (1964). Later, Huber (1972) developed the general form of the M‐estimate, and subsequently studied the mathematical properties of the estimator (Huber 1973, 1981).

Definition 1.2


The M‐estimate of location is defined as the solution of the minimization problem:

$$\hat{\mu} = \arg\min_{\mu} \sum_{i=1}^{n} \rho(x_i - \mu) \qquad (1.8)$$

or as the solution of the equation:

$$\sum_{i=1}^{n} \psi(x_i - \hat{\mu}) = 0 \qquad (1.9)$$

If the function $\rho$ is differentiable, with derivative $\psi = \rho'$, the M‐estimate of location (1.8) can be computed from the implicit equation (1.9).

If $f_0$ is a normal density then, ignoring constants, the $\rho$ function is the quadratic function $\rho(t) = t^2$, and the parameter estimate is equivalent to the least squares estimate, given by:

$$\hat{\mu} = \arg\min_{\mu} \sum_{i=1}^{n} (x_i - \mu)^2$$

which has the average solution $\hat{\mu} = \bar{x}$.

If $f_0$ is a double exponential distribution with density $f_0(t) = \frac{1}{2} e^{-|t|}$, the rho function, apart from constants, is the absolute value function $\rho(t) = |t|$, and the parameter estimate is equivalent to the least absolute deviations ($L_1$) estimate, given by:

$$\hat{\mu} = \arg\min_{\mu} \sum_{i=1}^{n} |x_i - \mu| \qquad (1.10)$$

which has the median solution $\hat{\mu} = \mathrm{median}(x_1, \dots, x_n)$ (see Exercise 1). Apart from the mean and median, the exact distribution of an M-estimate is not known, but its convergence properties and asymptotic distribution can be derived. The M-estimate is defined under two different formulations: as a root of the estimating equation based on $\psi$, or by minimization of $\sum_{i=1}^{n} \rho(x_i - \mu)$, where $\rho$ is a primitive function (antiderivative) of $\psi$. The consistency and asymptotic properties of the M-estimate depend on a variety of regularity assumptions. The $\psi$-based estimating equation does not necessarily have a unique or explicit root, and a rule is required for selecting a root when multiple roots exist. A comparison of the mean, median, and a Huber-type M-estimate on a contaminated sample is sketched below.
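
In R, a Huber-type M-estimate of location can be contrasted with the two limiting cases just derived (the mean and the median). The sketch below uses huber() from the MASS package on an arbitrary simulated contaminated sample; the tuning constant k = 1.5 is MASS's default-style choice, not prescribed by the text.

```r
## Mean, median and a Huber M-estimate of location on a contaminated sample;
## huber() from the MASS package uses the MAD as the scale estimate.
library(MASS)

set.seed(3)
x <- c(rnorm(45, mean = 5), rnorm(5, mean = 20))  # 10% outliers near 20

c(mean   = mean(x),                 # least squares solution, pulled by outliers
  median = median(x),               # least absolute deviations solution
  huber  = huber(x, k = 1.5)$mu)    # M-estimate; k tunes the quadratic region
```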

Theorem 1.3


Let $\lambda(\mu) = \mathrm{E}_{F}\,\psi(x - \mu)$. Assume that:

Assumption A 1.4


  1. $\lambda(\mu)$ has a unique root $\mu_0$
  2. $\psi$ is continuous and either bounded or monotone.

Then the equation $\sum_{i=1}^{n} \psi(x_i - \mu) = 0$ has a sequence of roots $\hat{\mu}_n$ that converges in probability to $\mu_0$.
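
Huber's $\psi$, being bounded, continuous, and monotone, satisfies Assumption A 1.4, so the root of the estimating equation (1.9) can be found numerically. A minimal R sketch, assuming Huber's $\psi$ with the common tuning constant $k = 1.345$ and arbitrary simulated data:

```r
## Solving the estimating equation (1.9) with uniroot(); Huber's psi is
## bounded, continuous and monotone, so the root is unique.
huber_psi <- function(t, k = 1.345) pmin(pmax(t, -k), k)

set.seed(4)
x <- c(rnorm(45, mean = 5), rnorm(5, mean = 20))

score <- function(mu) sum(huber_psi(x - mu))  # decreasing in mu: sign change
uniroot(score, interval = range(x))$root      # the M-estimate of location
```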

In most cases, the equation does not have an explicit solution and has to be solved using numerical iteration methods. Starting from a consistent estimate $\hat{\mu}_0$, one step of the Newton–Raphson iteration is

$$\hat{\mu}_1 = \hat{\mu}_0 + \frac{\sum_{i=1}^{n} \psi(x_i - \hat{\mu}_0)}{\sum_{i=1}^{n} \psi'(x_i - \hat{\mu}_0)}$$

where $\psi' = \mathrm{d}\psi(t)/\mathrm{d}t$. The...
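
A sketch of this one-step update in R, again assuming Huber's $\psi$, with the median as the consistent starting estimate (both are illustrative choices):

```r
## One step of the Newton-Raphson update above, starting from the median
## (a consistent estimate); psi' is 1 inside [-k, k] and 0 outside.
huber_psi  <- function(t, k = 1.345) pmin(pmax(t, -k), k)
huber_dpsi <- function(t, k = 1.345) as.numeric(abs(t) <= k)

newton_step <- function(x, mu0)
  mu0 + sum(huber_psi(x - mu0)) / sum(huber_dpsi(x - mu0))

set.seed(5)
x <- c(rnorm(45, mean = 5), rnorm(5, mean = 20))
newton_step(x, median(x))  # one-step M-estimate of location
```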