An approach to multidimensional interpolation, approximation and machine learning on the basis of the theory of random functions.

Bahvalov J.N., Malinin A.P.

This paper investigates the problem of multivariate interpolation and approximation. It is shown that these problems can be solved on the basis of the theory of random functions. The paper proposes a method of machine learning that ensures optimal results in terms of the considered mathematical apparatus. The method is simple to implement and allows a variety of machine-learning tasks to be reduced to the solution of systems of linear equations.

Introduction.

This article is divided into two parts.

  • The first part is devoted to the foundations of the theory of random functions in relation to the problems of multivariate interpolation and approximation, as well as to machine learning and its theoretical basis. The purpose of the theoretical part is to show that machine learning in its "supervised learning" paradigm, being a multidimensional problem of interpolation and approximation, can be generalized within the theory of random functions.
  • The second part presents the practical conclusions of the first part in the form of a complete machine-learning method (a method of multivariate interpolation and approximation).
  • The proposed method yields an exact solution of multivariate interpolation and approximation ("supervised learning") that is guaranteed optimal, under certain assumptions, with respect to the criterion defined in the theoretical part. The method is simple and reduces to a system of linear equations, yet this does not limit its capabilities. Using the "special function" obtained in the theoretical part makes it possible to reduce the multidimensional problem of interpolation and approximation, and a variety of machine-learning tasks (under certain assumptions), to a system of linear equations, guaranteeing optimality of the solution and the absence of overfitting, oscillation of the interpolant and other undesirable effects.

Part 1.

Introduction.

The purpose of this paper is to show that machine learning in its "supervised learning" paradigm, being a multidimensional problem of interpolation and approximation, can be resolved on the basis of the theory of random functions.

Using the mathematical apparatus of random functions is convenient for the theoretical justification of machine-learning methods, and the notion of "probability of a realization" can serve as a criterion to justify the generalization ability of a training method.

Using the concept of a random function and the related theoretical constructs provides exact solutions of machine-learning problems posed as multidimensional interpolation (or approximation) and guarantees solutions that are optimal in terms of the probability criterion. There is a way, with minimal assumptions, to obtain an optimal solution of multivariate interpolation and approximation by solving a system of linear equations, together with a proof of its optimality. This yields an exact solution and eliminates the problems of finding a global optimum and of choosing a network topology, which occur in many existing iterative "supervised learning" algorithms for neural networks.

The theory of random processes is a mathematical science that studies the laws of random phenomena in the dynamics of their development. Just as the notion of a random process is a generalization of a random variable, a random function is a generalization of a random process, since functions may depend not on one but on several arguments.

Realizations of a random function are the functions whose form it takes under observation or during an experiment. A random function is the set of its realizations together with a corresponding probability distribution. Any function can be considered a realization of some random function without any additional assumptions.

In solving the problem of multidimensional interpolation or approximation, or in machine learning, we want to associate the values of input and output variables with a function. Solving these problems requires finding a suitable function satisfying the training vectors, and there may be infinitely many such functions.

Consider the problem of machine learning, or of multidimensional interpolation or approximation. Assume that the desired function can be regarded as a realization of a random function, the information about which is contained in the set of training vectors (or interpolation nodes).

In this approach the criterion for the search for the desired function is the probability of a realization. It is necessary to find the most probable realization of the random function that satisfies the training examples. This idea is the basis of the method proposed in this paper.

The adequacy of this approach and its range of applicability are entirely determined by which specific characteristics are given a priori to the random function.

Multidimensional interpolation.

Consider the problem of multidimensional interpolation, assuming that the interpolation nodes belong to a realization of a random function.

  • Assume that the function (1) can be regarded as a realization of a random function, which can be written in the form (2):

$$y = f(\vec{x}) \qquad (1)$$

$$f(\vec{x}) = m(\vec{x}) + \sum_i V_i\,\varphi_i(\vec{x}) \qquad (2)$$

where $\varphi_i(\vec{x})$ are the coordinate functions, $V_i$ are uncorrelated normal random variables with zero mean and unit variance, and $m(\vec{x})$ is the mathematical expectation.

  • The coordinate functions in (2) can be any arbitrary functions. The only condition is that there exist coordinate functions such that the considered function can be written through a random function of the form (2). The number of terms in (2) can be arbitrary. For example, as a special case of (2) one can consider the expansion into an infinite series of sines and cosines, in which case the realizations of the random function can be the set of all continuous real functions.
  • Assume that the variables $V_i$ in (2) have a normal distribution. The requirement of the normal law with zero mean and unit variance is not necessary (but is convenient in what follows). For the following transformations it would be enough to require that the joint probability distribution of the random variables in (2) be symmetric about the origin and decrease monotonically with increasing distance from the origin.
  • The requirements in (2) of zero mean and unit variance are made to simplify the transformations. If the variance of a random variable is not equal to one, it can be replaced, with the corresponding coordinate function simultaneously rescaled (multiplied by a factor), yielding an expression of the form (2). If the mean is not zero, the expectation function can be changed so as to obtain an expression of the form (2).
  • Consider the expectation function $m(\vec{x})$ in more detail. In the particular case it may be zero or a constant.
  • If this function is known beforehand, we can exclude it from further transformations by subtracting its values at the interpolation nodes and deleting it from expression (2). After the interpolation of the modified nodes is done, we can add this function back to the resulting solution to obtain the final result.
  • If this function is not known beforehand, we can treat it as a realization of another random function, which can again be expressed in the form (2), and repeat this procedure if necessary.
  • Eliminating the expectation function, we obtain an expression for the random function:

$$f(\vec{x}) = \sum_i V_i\,\varphi_i(\vec{x}) \qquad (3)$$

  • Introduce the notation $v_i$ for specific values of the random variables $V_i$ in (3). Then the probability density of realizations in the parameter space is written as (4):

$$p(v_1, v_2, \ldots) = \prod_i \frac{1}{\sqrt{2\pi}}\,e^{-v_i^2/2} \qquad (4)$$

Let the sequences

$$\vec{x}_1, \ldots, \vec{x}_m \quad \text{and} \quad y_1, \ldots, y_m \qquad (5)$$

be the coordinates of the interpolation nodes and the corresponding interpolation values.

  • Assuming that the interpolation nodes belong to one of the realizations of the random function (3), corresponding to a vector $\vec{v}$, they can be expressed in the form of equations (6):

$$\sum_i v_i\,\varphi_i(\vec{x}_k) = y_k, \quad k = 1, \ldots, m \qquad (6)$$

Then the most probable realization of the random function (3) is the one at which the probability density (4) takes its maximum value subject to the constraints (6).

Finding the maximum of (4) is equivalent to finding the minimum of (7):

$$\sum_i v_i^2 \to \min \qquad (7)$$

  • This shows that the search for the solution of multivariate interpolation as the most probable realization of the function (2) or (3) satisfying the interpolation nodes reduces to a quadratic programming problem with equality constraints.
  • To find the minimum of (7) under the system of constraints (6), we use the method of Lagrange multipliers.
  • Let us write the Lagrange function:

$$L = \sum_i v_i^2 + \sum_{k=1}^{m} \lambda_k \left( y_k - \sum_i v_i\,\varphi_i(\vec{x}_k) \right) \qquad (8)$$

Denote the solution by $\vec{v}$. From the condition $\partial L / \partial \lambda_k = 0$ we obtain equations (6). From the condition $\partial L / \partial v_i = 0$ we get:

$$v_i = \frac{1}{2} \sum_{k=1}^{m} \lambda_k\,\varphi_i(\vec{x}_k) \qquad (9)$$

  • Denote the sequence of values of the coordinate functions at a point as a vector $\vec{\varphi}(\vec{x}) = (\varphi_1(\vec{x}), \varphi_2(\vec{x}), \ldots)$.
  • Let us introduce the coefficients $a_k$, denoting $a_k = \lambda_k / 2$. Then (9) can be written in the new notation as a linear combination of vectors (10):

$$\vec{v} = \sum_{k=1}^{m} a_k\,\vec{\varphi}(\vec{x}_k) \qquad (10)$$

Expression (10) can be interpreted from a geometric point of view. Finding the minimum of (7) under the system of constraints (6) is equivalent to dropping a perpendicular from the origin onto the hyperplane formed by the intersection of the hyperplanes defined by equations (6); the solution is therefore a linear combination of the perpendiculars. The system of constraints (6) can be written in terms of scalar products of vectors (11):

$$(\vec{v}, \vec{\varphi}(\vec{x}_k)) = y_k, \quad k = 1, \ldots, m \qquad (11)$$

Replacing $\vec{v}$ by expression (10), we obtain the system of equations (12):

$$\sum_{j=1}^{m} a_j\,(\vec{\varphi}(\vec{x}_j), \vec{\varphi}(\vec{x}_k)) = y_k, \quad k = 1, \ldots, m \qquad (12)$$

Denote by $\vec{\varphi}(\vec{x})$ the vector of values of the coordinate functions at an arbitrary point $\vec{x}$. Then the most probable realization of the random function (3) can be expressed as (13):

$$f(\vec{x}) = (\vec{v}, \vec{\varphi}(\vec{x})) = \sum_{k=1}^{m} a_k\,(\vec{\varphi}(\vec{x}_k), \vec{\varphi}(\vec{x})) \qquad (13)$$

Consider the scalar product $(\vec{\varphi}(\vec{x}_1), \vec{\varphi}(\vec{x}_2))$ in more detail:

$$(\vec{\varphi}(\vec{x}_1), \vec{\varphi}(\vec{x}_2)) = \sum_i \varphi_i(\vec{x}_1)\,\varphi_i(\vec{x}_2) = K(\vec{x}_1, \vec{x}_2) \qquad (14)$$

which is nothing more than the canonical decomposition of the correlation function.

Then the system of equations (12) can be written as (15):

$$\sum_{j=1}^{m} a_j\,K(\vec{x}_j, \vec{x}_k) = y_k, \quad k = 1, \ldots, m \qquad (15)$$

and (13), for the most probable realization, in the form (16):

$$f(\vec{x}) = \sum_{k=1}^{m} a_k\,K(\vec{x}_k, \vec{x}) \qquad (16)$$

  • In this way it can be concluded that the most probable realization of the random function (3) satisfying the interpolation nodes is a linear combination of sections of its correlation function.
  • Therefore, knowing the correlation function and solving the system of equations (15), we obtain the most probable realization (16).
  • If interpolation is considered as a variant of machine learning, this result is easily generalized to the multi-output case: a single inverse matrix of the system (15) can be computed and used for interpolation with all possible values of the output variables (see the sketch after this list).
  • However, in order to carry out such calculations it is necessary to know the correlation function, as well as the mathematical expectation in (2), or to convert correctly to (3).
  • Consider the case where there is no a priori information about the random function with which to solve the interpolation problem. The uncertainty is eliminated by selecting the correlation function on the basis of certain assumptions.
  • Suppose that initially the potential interpolants (the unknown function) can be any continuous real functions and that the a priori probability distribution is unknown. Suppose that in this case all points, as well as all directions in the space of interpolation, are equivalent. Therefore, we require that the interpolation not depend on the choice of the coordinate system of the interpolation nodes: on rotations, on the position of the origin, and on the scale.
  • These requirements can be achieved by introducing a set of propositions on the a priori probabilities of the realizations of the random function. Assume that realizations transformed into each other by the operations of shift, rotation or scale change are a priori equiprobable.
  • The requirements outlined above are enough to uniquely calculate the correlation function needed for expressions (15) and (16).
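As an illustration of (14)–(16), a minimal numerical sketch follows. The small trigonometric family of coordinate functions is an arbitrary choice made only for this example (it is not the "special function" derived below), and all names in the code are illustrative.

```python
import numpy as np

# An explicit family of coordinate functions phi_i (an assumed,
# purely illustrative truncated trigonometric family).
def phi(x):
    freqs = np.arange(1, 6)
    return np.concatenate(([1.0], np.cos(freqs * x), np.sin(freqs * x)))

# (14): K(x1, x2) = sum_i phi_i(x1) * phi_i(x2)
def corr(x1, x2):
    return phi(x1) @ phi(x2)

x_nodes = np.array([0.0, 1.0, 2.5, 4.0])    # interpolation nodes
y_nodes = np.array([0.0, 1.0, -0.5, 0.3])   # values at the nodes

# (15): solve sum_j a_j K(x_j, x_k) = y_k for the coefficients a_j
K = np.array([[corr(xj, xk) for xj in x_nodes] for xk in x_nodes])
a = np.linalg.solve(K, y_nodes)

# (16): the most probable realization f(x) = sum_k a_k K(x_k, x)
def f(x):
    return sum(ak * corr(xk, x) for ak, xk in zip(a, x_nodes))

print(round(f(2.5), 6))  # reproduces the node value: -0.5
```

Since the matrix in (15) depends only on the node coordinates, a single factorization (or inverse) of it serves all output variables, as noted in the list above.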

Let us state the requirements as a set of propositions:

1. The probability density of realizations that are transformed into each other by a parallel translation of the coordinate system must be the same. Suppose there are two realizations $f_1$ and $f_2$ such that

$$f_2(\vec{x}) = f_1(\vec{x} + \vec{c}) \qquad (17)$$

Then the probability densities of the realizations $f_1$ and $f_2$ must be the same.

2. The probability density of realizations that are transformed into each other by the same scaling in all directions must be the same. Suppose there are two realizations $f_1$ and $f_2$ such that

$$f_2(\vec{x}) = f_1(k\,\vec{x}) \qquad (18)$$

where $k$ is a coefficient. Then the probability densities of the realizations $f_1$ and $f_2$ must be the same.

3. The probability density of realizations that are transformed into each other by a rotation of the coordinate system must be the same. Suppose there are two realizations $f_1$ and $f_2$ such that

$$f_2(\vec{x}) = f_1(A\,\vec{x}) \qquad (19)$$

where $A$ is a rotation matrix. Then the probability densities of the realizations $f_1$ and $f_2$ must be the same.

  • The fulfillment of the first proposition means that the random function is stationary. In this case it can be written as a multidimensional spectral decomposition. Denote its spectral density by $S(\vec{\omega})$ (a nonnegative real function, symmetric with respect to any frequencies $\vec{\omega}$ and $-\vec{\omega}$):

$$f(\vec{x}) = \int \sqrt{S(\vec{\omega})}\;V(\vec{\omega})\,e^{i(\vec{\omega},\vec{x})}\,d\vec{\omega} \qquad (20)$$

where $(\vec{\omega},\vec{x})$ is the scalar product.

  • In (20), $V(\vec{\omega})$ is a realization of a complex random function. Each of its values is an independent random variable having a normal distribution with unit variance and zero mean for both the real and the imaginary part. Expression (20) can be considered a special case of (3), in which the sum is replaced by an integral, the role of the coordinate functions is played by $\sqrt{S(\vec{\omega})}\,e^{i(\vec{\omega},\vec{x})}$, and $V(\vec{\omega})$ is the analog of the values of the random variables.
  • Impose an additional condition on $V(\vec{\omega})$. Because for the moment interpolation of real functions is considered, the values $V(\vec{\omega})$ and $V(-\vec{\omega})$ must be complex conjugate for all frequencies. Divide the frequency space into two parts $\Omega^{+}$ and $\Omega^{-}$. The division must be carried out in such a way that any frequencies $\vec{\omega}$ and $-\vec{\omega}$ lie in different parts (for example, on the two sides of any hyperplane passing through the origin). Then $V(\vec{\omega})$ can be written as:

$$V(\vec{\omega}) = V_1(\vec{\omega}) + i\,V_2(\vec{\omega}), \qquad V(-\vec{\omega}) = V_1(\vec{\omega}) - i\,V_2(\vec{\omega}), \qquad \vec{\omega} \in \Omega^{+} \qquad (21)$$

  • where $V_1(\vec{\omega})$ and $V_2(\vec{\omega})$ can be regarded as realizations of real random functions, each value of which is the same for the frequencies $\vec{\omega}$ and $-\vec{\omega}$. Each value is an independent random variable having a normal distribution with zero mean and unit variance.

If the condition for the realizations to be real is fulfilled, the random function (20) can be written as (22):

$$f(\vec{x}) = \int_{\Omega^{+}} \sqrt{S(\vec{\omega})}\,V(\vec{\omega})\,e^{i(\vec{\omega},\vec{x})}\,d\vec{\omega} + \int_{\Omega^{-}} \sqrt{S(\vec{\omega})}\,V(\vec{\omega})\,e^{i(\vec{\omega},\vec{x})}\,d\vec{\omega} \qquad (22)$$

Since $S(\vec{\omega})$ is symmetric with respect to $\vec{\omega}$ and $-\vec{\omega}$, in the second integral the domain of integration can be changed to $\Omega^{+}$ by replacing the sign of the frequency. In this case we obtain:

$$f(\vec{x}) = \int_{\Omega^{+}} \sqrt{S(\vec{\omega})}\left(V(\vec{\omega})\,e^{i(\vec{\omega},\vec{x})} + V(-\vec{\omega})\,e^{-i(\vec{\omega},\vec{x})}\right)d\vec{\omega} \qquad (23)$$

Performing further transformations, we obtain:

$$f(\vec{x}) = 2\int_{\Omega^{+}} \sqrt{S(\vec{\omega})}\left(V_1(\vec{\omega})\cos(\vec{\omega},\vec{x}) - V_2(\vec{\omega})\sin(\vec{\omega},\vec{x})\right)d\vec{\omega} \qquad (24)$$

  • Each value of $V(\vec{\omega})$ is an independent complex random variable with zero mean and unit variance; therefore it is possible to write (25), an analog of (7). When the constraints at the interpolation nodes are fulfilled, minimizing (25) provides the most probable realization:

$$\int_{\Omega^{+}} \left(V_1(\vec{\omega})^2 + V_2(\vec{\omega})^2\right)d\vec{\omega} \to \min \qquad (25)$$

For any two realizations of equal a priori probability, expression (25) must take the same value. Expression (25) can be written as (26):

$$\frac{1}{2}\int \left(V_1(\vec{\omega})^2 + V_2(\vec{\omega})^2\right)d\vec{\omega} \qquad (26)$$

where the integral is now taken over the whole frequency space.

Let the sequences $\vec{x}_1, \ldots, \vec{x}_m$ and $y_1, \ldots, y_m$ be the interpolation nodes. Then, using expression (24), the constraints at these nodes are represented by the system of equations:

$$2\int_{\Omega^{+}} \sqrt{S(\vec{\omega})}\left(V_1(\vec{\omega})\cos(\vec{\omega},\vec{x}_k) - V_2(\vec{\omega})\sin(\vec{\omega},\vec{x}_k)\right)d\vec{\omega} = y_k, \quad k = 1, \ldots, m \qquad (27)$$

Finding the most probable realization amounts to determining functions $V_1(\vec{\omega})$ and $V_2(\vec{\omega})$ such that (27) holds and the value of expression (26) is minimized.

Further arguments can be performed as in (8)–(16).
The Lagrange function becomes:

$$L = \int_{\Omega^{+}} \left(V_1(\vec{\omega})^2 + V_2(\vec{\omega})^2\right)d\vec{\omega} + \sum_{k=1}^{m} \lambda_k \left( y_k - 2\int_{\Omega^{+}} \sqrt{S(\vec{\omega})}\left(V_1(\vec{\omega})\cos(\vec{\omega},\vec{x}_k) - V_2(\vec{\omega})\sin(\vec{\omega},\vec{x}_k)\right)d\vec{\omega} \right) \qquad (28)$$

Differentiating (28) with respect to $\lambda_k$ gives the system (27). Since every value of the functions $V_1(\vec{\omega})$ and $V_2(\vec{\omega})$ is an independent variable for any $\vec{\omega}$, each of them can be differentiated independently. In this case we obtain:

$$V_1(\vec{\omega}) = \sqrt{S(\vec{\omega})}\,\sum_{k=1}^{m} \lambda_k \cos(\vec{\omega},\vec{x}_k) \qquad (29)$$

$$V_2(\vec{\omega}) = -\sqrt{S(\vec{\omega})}\,\sum_{k=1}^{m} \lambda_k \sin(\vec{\omega},\vec{x}_k) \qquad (30)$$

Substituting the solutions (29) and (30) into (24), we obtain the most probable realization:

$$f(\vec{x}) = 2\int_{\Omega^{+}} S(\vec{\omega}) \sum_{k=1}^{m} \lambda_k \left(\cos(\vec{\omega},\vec{x}_k)\cos(\vec{\omega},\vec{x}) + \sin(\vec{\omega},\vec{x}_k)\sin(\vec{\omega},\vec{x})\right)d\vec{\omega} \qquad (31)$$

Expression (31) is converted into (32):

$$f(\vec{x}) = 2\int_{\Omega^{+}} S(\vec{\omega}) \sum_{k=1}^{m} \lambda_k \cos(\vec{\omega}, \vec{x} - \vec{x}_k)\,d\vec{\omega} \qquad (32)$$

Changing the domain of integration to the whole frequency space:

$$f(\vec{x}) = \int S(\vec{\omega}) \sum_{k=1}^{m} \lambda_k \cos(\vec{\omega}, \vec{x} - \vec{x}_k)\,d\vec{\omega} \qquad (33)$$

Denote the coefficients:

$$a_k = \lambda_k \qquad (34)$$

The expression for the correlation function (which will now be proportional to the correlation function):

$$K(\vec{x}_1, \vec{x}_2) = \int S(\vec{\omega}) \cos(\vec{\omega}, \vec{x}_1 - \vec{x}_2)\,d\vec{\omega} \qquad (35)$$

As can be seen from (35), the spectral density of the random function plays the role of the spectrum of its correlation function. The correlation function itself turns into the autocorrelation function, as expected from stationarity:

$$K(\vec{x}_1, \vec{x}_2) = R(\vec{x}_1 - \vec{x}_2) = R(\vec{\tau}) = \int S(\vec{\omega}) \cos(\vec{\omega}, \vec{\tau})\,d\vec{\omega} \qquad (36)$$

  • The absence of sinusoidal components in (35)–(36) shows that the autocorrelation function must be symmetric about the origin. If the spectral density of the random function is known, the autocorrelation function can be calculated.

Then the most probable realization can be expressed as:

$$f(\vec{x}) = \sum_{k=1}^{m} a_k\,R(\vec{x} - \vec{x}_k) \qquad (37)$$

Let us find a system of equations for the coefficients in (37). Substituting (29) and (30) into (27):

$$2\int_{\Omega^{+}} S(\vec{\omega}) \sum_{j=1}^{m} a_j \cos(\vec{\omega}, \vec{x}_k - \vec{x}_j)\,d\vec{\omega} = y_k, \quad k = 1, \ldots, m \qquad (38)$$

Using (34) and (36) in (38), we obtain a system of equations for the coefficients of (37) (similar to (15)):

$$\sum_{j=1}^{m} a_j\,R(\vec{x}_k - \vec{x}_j) = y_k, \quad k = 1, \ldots, m \qquad (39)$$

  • Thus we have obtained expressions (37) and (39), the counterparts of (16) and (15), and expression (36) for the correlation function. To resolve the remaining uncertainty it remains to determine the spectral density of the random function.

Denote by $F(\vec{\omega})$ the Fourier spectrum of a realization of the considered random function. From expression (20), the spectral decomposition of the random function, it can be seen that the spectrum of a realization equals:

$$F(\vec{\omega}) = \sqrt{S(\vec{\omega})}\,V(\vec{\omega}) \qquad (40)$$

Now return to the second proposition (expression (18)). Denote by $F_1(\vec{\omega})$ the spectrum of the function $f_1(\vec{x})$ and by $F_2(\vec{\omega})$ the spectrum of the function $f_2(\vec{x})$. Assume that

$$f_2(\vec{x}) = f_1(k\,\vec{x}) \qquad (41)$$

Whereas

$$F(\vec{\omega}) = \frac{1}{(2\pi)^n} \int f(\vec{x})\,e^{-i(\vec{\omega},\vec{x})}\,d\vec{x} \qquad (42)$$

where $(\vec{\omega},\vec{x})$ is the scalar product and $n$ is the dimension of the space, we obtain

$$F_2(\vec{\omega}) = \frac{1}{k^n}\,F_1\!\left(\frac{\vec{\omega}}{k}\right) \qquad (43)$$

The ratio of the spectra of such functions can be derived:

$$\left|F_2(\vec{\omega})\right|^2 = \frac{1}{k^{2n}}\left|F_1\!\left(\frac{\vec{\omega}}{k}\right)\right|^2 \qquad (44)$$

For both of the considered functions, expression (25) must take the same value. The functional (25) can be expressed through the spectrum of the realization and the spectral density of the random function:

$$\frac{1}{2}\int \frac{\left|F(\vec{\omega})\right|^2}{S(\vec{\omega})}\,d\vec{\omega} \qquad (45)$$

Equating expression (25) for the two realizations considered earlier and using expressions (44) and (45), we obtain:

$$\int \frac{\left|F_1(\vec{\omega})\right|^2}{S(\vec{\omega})}\,d\vec{\omega} = \int \frac{\left|F_1(\vec{\omega}/k)\right|^2}{k^{2n}\,S(\vec{\omega})}\,d\vec{\omega} = \int \frac{\left|F_1(\vec{u})\right|^2}{k^{n}\,S(k\vec{u})}\,d\vec{u} \qquad (46)$$

From expression (46), comparing the beginning and the end of these equalities, an expression for the spectral density can be obtained:

$$S(\vec{\omega}) = k^{n}\,S(k\vec{\omega}) \qquad (47)$$

To satisfy the third proposition (19), the autocorrelation function must have radial symmetry; accordingly, the spectral density must have radial symmetry. Then from (47) an expression for the spectral density follows:

$$S(\vec{\omega}) = \frac{c}{\|\vec{\omega}\|^{n}} \qquad (48)$$

where $c$ is a coefficient, or (the same thing):

$$S(\vec{\omega}) = c\,\|\vec{\omega}\|^{-n} \qquad (49)$$

  • Thus, (48)–(49) give the desired spectral density of the random function. When a correlation function with such a spectrum is used and the system of linear equations (39) is solved, the most probable realization of the random function is guaranteed to be obtained (if propositions (17)–(19) are taken as the basis).
  • The coefficient $c$ in (48)–(49) can be arbitrary. If this coefficient is changed, the coefficients on the left side of system (39) and in (37) also change, and as a result the same function is obtained.
  • To evaluate the expression for the function with the spectrum (49), the symmetry property of the function can be used. To simplify, one can first consider the one-dimensional version and then obtain the multidimensional version using the requirement of symmetry.

In the one-dimensional version the required spectral density is:

$$S(\omega) = \frac{c}{|\omega|} \qquad (50)$$

Then the one-dimensional expression for the autocorrelation function (36), with the integration limited to the frequency band $[\varepsilon, W]$, becomes:

$$R(\tau) = 2c\int_{\varepsilon}^{W} \frac{\cos(\omega\tau)}{\omega}\,d\omega \qquad (51)$$

Expression (51) can be written as:

$$R(\tau) = 2c\int_{\varepsilon\tau}^{W\tau} \frac{\cos t}{t}\,dt \qquad (52)$$

Performing the transformations successively, we obtain:

$$R(\tau) = 2c\left(\operatorname{Ci}(W\tau) - \operatorname{Ci}(\varepsilon\tau)\right) \qquad (53)$$

The terms on the right in (53) are cosine integrals. Consider the cosine integral in more detail:

$$\operatorname{Ci}(x) = -\int_{x}^{\infty} \frac{\cos t}{t}\,dt \qquad (54)$$

The cosine integral may be expanded as:

$$\operatorname{Ci}(x) = \gamma + \ln x + \int_{0}^{x} \frac{\cos t - 1}{t}\,dt \qquad (55)$$

where $\gamma$ is Euler's constant.

Using (53) and (55), we obtain:

$$R(\tau) = 2c\left(\operatorname{Ci}(W\tau) - \gamma - \ln(\varepsilon\tau) - \int_{0}^{\varepsilon\tau} \frac{\cos t - 1}{t}\,dt\right) \qquad (56)$$

Thus we obtain the result (the last integral vanishes as $\varepsilon\tau \to 0$):

$$R(\tau) = 2c\left(\operatorname{Ci}(W\tau) - \gamma - \ln\varepsilon - \ln|\tau|\right) \qquad (57)$$

  • Although $\varepsilon = 0$ cannot be used in actual calculations, its value can be taken arbitrarily close to zero, thus bringing the spectrum closer to expression (50). For the calculations, instead of (57), the more simplified version (58) can be used.
  • Given that the coefficient $c$ can be chosen arbitrarily (take it equal to 0.5) and using the symmetry property, we obtain the version of the function for the multidimensional case:

$$K(\vec{x}_1, \vec{x}_2) = \operatorname{Ci}\left(W\,\|\vec{x}_1 - \vec{x}_2\|\right) - \gamma - \ln\varepsilon - \ln\|\vec{x}_1 - \vec{x}_2\| \qquad (58)$$

where $\|\vec{x}_1 - \vec{x}_2\|$ is the norm of the vector; for $\vec{x}_1 = \vec{x}_2$ the limiting value $\ln(W/\varepsilon)$ is taken.

Let us write out expressions (39) and (37) again:

$$\sum_{j=1}^{m} a_j\,K(\vec{x}_j, \vec{x}_k) = y_k, \quad k = 1, \ldots, m \qquad (59)$$

$$f(\vec{x}) = \sum_{k=1}^{m} a_k\,K(\vec{x}_k, \vec{x}) \qquad (60)$$

  • Thus we obtain the final expressions of the multivariate interpolation method in the form (58)–(60). In the limit of the function (58), the method gives the most probable realization of a random function satisfying propositions (17)–(19).
  • In actual computations (as shown by computational experiments) $1/\varepsilon$ and $W$ can be taken of approximately the same order and many orders of magnitude greater than the range of variation of the input values. We will return to this issue below.
  • As Figure 1 shows, the system of linear equations with the function (58) makes it possible to produce a high-quality interpolation of nonlinear functions (a numerical sketch follows).
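Below is a minimal numerical sketch of (58)–(60). The exact form of (58) is reconstructed above, so `special_K`, the parameter defaults `W` and `eps`, and all other names should be treated as assumptions for illustration, not as the authors' reference implementation.

```python
import numpy as np
from scipy.special import sici  # sici(x) returns (Si(x), Ci(x))

GAMMA = 0.5772156649015329  # Euler's constant

def special_K(r, W=1e4, eps=1e-4):
    """Autocorrelation function (58) in the form reconstructed above:
    K(r) = Ci(W r) - gamma - ln(eps) - ln(r), with the limiting
    value ln(W / eps) at r = 0. The constants are assumptions."""
    r = np.asarray(r, dtype=float)
    out = np.full(r.shape, np.log(W / eps))
    nz = r > 0
    out[nz] = sici(W * r[nz])[1] - GAMMA - np.log(eps) - np.log(r[nz])
    return out

def pairwise_norms(A, B):
    # matrix of distances ||a_i - b_j||
    return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)

def interpolate(X, y, Xq, W=1e4, eps=1e-4):
    # (59): solve sum_j a_j K(x_j, x_k) = y_k for the coefficients
    a = np.linalg.solve(special_K(pairwise_norms(X, X), W, eps), y)
    # (60): most probable realization f(x) = sum_k a_k K(x_k, x)
    return special_K(pairwise_norms(Xq, X), W, eps) @ a

# One-dimensional demonstration on a nonlinear function (cf. Figure 1)
X = np.linspace(0.0, 10.0, 8).reshape(-1, 1)
y = np.sin(X[:, 0])
Xq = np.array([[2.0], [5.0]])
print(interpolate(X, y, Xq))  # smooth values close to sin(2), sin(5)
```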

The variance of a random function. The set of realizations satisfying the interpolation nodes.

  • In the case above, the solution of the interpolation problem (58)–(60) means finding the most probable realization of a random function satisfying the interpolation nodes. However, the set of realizations that satisfy the interpolation nodes is infinite. Characteristics of this set are necessary in order to go from multidimensional interpolation to approximation.
  • Express the variance of the random function (3) at some arbitrary point $\vec{x}$, knowing that the variances of the random variables in (3) are equal to one:

$$D(\vec{x}) = \sum_i \varphi_i(\vec{x})^2 = (\vec{\varphi}(\vec{x}), \vec{\varphi}(\vec{x})) = K(\vec{x}, \vec{x}) \qquad (61)$$

  • As can be seen from (61), the variance of the random function (3) at the point $\vec{x}$ is equal to the value of the correlation function for two identical values of this point. For a random function with the autocorrelation function (51), (57) or (58), the variance is the same at all points and equal to the value at zero distance. For (58) the variance is equal to:

$$D = \ln\frac{W}{\varepsilon} \qquad (62)$$

  • Because in (58) the value $W/\varepsilon$ is directed to infinity, in the limit the variance of this random function is equal to infinity.

Consider what the value of a realization of the random function (3) would be at a point $\vec{x}^{*}$. Denote $\vec{\varphi}(\vec{x}^{*})$ as $\vec{\varphi}^{*}$ and the vector of random variables in (3) as $\vec{V}$. Then:

$$f(\vec{x}^{*}) = (\vec{V}, \vec{\varphi}^{*}) \qquad (63)$$

The value of the random function (3), or (63), for a certain specific $\vec{x}^{*}$ is a normal random variable with variance defined by (61)–(62).

  • Let the sequence $\vec{x}_1, \ldots, \vec{x}_m$ and the corresponding sequence $y_1, \ldots, y_m$ be the interpolation nodes.
  • Denote the sequence $\vec{\varphi}(\vec{x}_1), \ldots, \vec{\varphi}(\vec{x}_m)$ as $\vec{\varphi}_1, \ldots, \vec{\varphi}_m$.
  • Then the vector $\vec{\varphi}^{*}$ can be represented as the sum of two perpendicular vectors:

$$\vec{\varphi}^{*} = \vec{\varphi}_{\parallel} + \vec{\varphi}_{\perp} \qquad (64)$$

where

$$\vec{\varphi}_{\parallel} = \sum_{k=1}^{m} c_k\,\vec{\varphi}_k \qquad (65)$$

Moreover, require that the vector $\vec{\varphi}_{\perp}$ be perpendicular to each of the vectors $\vec{\varphi}_k$. Then the expansion (64) for each $\vec{x}^{*}$ is individual and unique. The coefficients $c_k$ in (65) can always be found so as to obtain the expansion (64). The vector $\vec{\varphi}_{\parallel}$ is the projection of the vector $\vec{\varphi}^{*}$ onto the hyperplane formed by the basis of vectors $\vec{\varphi}_k$, and the vector $\vec{\varphi}_{\perp}$ is the remaining part, perpendicular to this hyperplane. Then (63) can be written as:

$$f(\vec{x}^{*}) = (\vec{V}, \vec{\varphi}_{\parallel}) + (\vec{V}, \vec{\varphi}_{\perp}) \qquad (66)$$

(66) shows that in this representation the random variable $f(\vec{x}^{*})$ can be considered as the sum of two random variables, $(\vec{V}, \vec{\varphi}_{\parallel})$ and $(\vec{V}, \vec{\varphi}_{\perp})$, and that they are independent. Let us prove it.

  • Any set of random values of the vector $\vec{V}$ can also be decomposed into two perpendicular vectors:

$$\vec{V} = \vec{V}_{\parallel} + \vec{V}_{\perp} \qquad (67)$$

where

$$\vec{V}_{\parallel} = \sum_{k=1}^{m} d_k\,\vec{\varphi}_k \qquad (68)$$

Also require that the vector $\vec{V}_{\perp}$ be perpendicular to each of the vectors $\vec{\varphi}_k$. Then we obtain:

$$(\vec{V}, \vec{\varphi}_{\parallel}) = (\vec{V}_{\parallel} + \vec{V}_{\perp}, \vec{\varphi}_{\parallel}) = (\vec{V}_{\parallel}, \vec{\varphi}_{\parallel}) \qquad (69)$$

$$(\vec{V}, \vec{\varphi}_{\perp}) = (\vec{V}_{\parallel} + \vec{V}_{\perp}, \vec{\varphi}_{\perp}) = (\vec{V}_{\perp}, \vec{\varphi}_{\perp}) \qquad (70)$$

$$f(\vec{x}^{*}) = (\vec{V}_{\parallel}, \vec{\varphi}_{\parallel}) + (\vec{V}_{\perp}, \vec{\varphi}_{\perp}) \qquad (71)$$

Part of the terms in (69)–(70) vanish because the scalar product of perpendicular vectors is zero.

  • The coefficients $d_k$ from (68) are the coefficients $a_k$ from (10) or (15).
  • The vectors $\vec{V}_{\parallel}$ and $\vec{V}_{\perp}$ are perpendicular and sum to a vector having a multivariate normal distribution. Therefore, using the properties of the normal distribution, (66) can be considered as an expansion of the original random function into the sum of two independent random functions.

Consider $(\vec{V}, \vec{\varphi}_{\parallel})$:

$$(\vec{V}, \vec{\varphi}_{\parallel}) = \sum_{k=1}^{m} c_k\,(\vec{V}, \vec{\varphi}_k) \qquad (72)$$

For known values at the interpolation nodes, $(\vec{V}, \vec{\varphi}_k) = y_k$, and (72) becomes the most probable realization.

  • Thus, the random function (3) can be represented as the sum of two independent random functions: $f_1(\vec{x})$, a random function whose realizations are the most probable realizations of the original random function (3) computed by (15) and (16); and $f_2(\vec{x})$, a random function representing the difference (the "error") between the most probable realization and the actual realization of the random function, which is not known.
  • If $f_2(\vec{x})$ is considered as a random variable at a particular point $\vec{x}$, it is normally distributed with zero mean.
  • Since the expansion (64)–(65) does not depend on the specific values at the interpolation nodes, the probability of the interpolation error $f_2(\vec{x})$ at any given $\vec{x}$ does not depend on the values at the interpolation nodes, but does depend on the coordinates of the nodes themselves.

Suppose that the values of the unknown function at the interpolation nodes are known. Then (66) takes the following form:

$$f(\vec{x}) = \hat{f}(\vec{x}) + (\vec{V}_{\perp}, \vec{\varphi}_{\perp}) \qquad (73)$$

where $\hat{f}(\vec{x})$ is the most probable realization calculated by (15)–(16).

Expression (73) can be understood as a random function which expresses the set of realizations satisfying the interpolation nodes (with the corresponding probability distribution between them). The variance (61) of the random function, using (69)–(71), can be written as:

$$D(\vec{x}) = (\vec{\varphi}_{\parallel}, \vec{\varphi}_{\parallel}) + (\vec{\varphi}_{\perp}, \vec{\varphi}_{\perp}) \qquad (74)$$

The variance of the random function (73) is equal to:

$$D_2(\vec{x}) = (\vec{\varphi}_{\perp}, \vec{\varphi}_{\perp}) = (\vec{\varphi}^{*}, \vec{\varphi}^{*}) - (\vec{\varphi}_{\parallel}, \vec{\varphi}_{\parallel}) \qquad (75)$$

Consider $(\vec{\varphi}^{*}, \vec{\varphi}_k)$, where $\vec{x}_k$ is the coordinate of one of the interpolation nodes:

$$(\vec{\varphi}^{*}, \vec{\varphi}_k) = K(\vec{x}^{*}, \vec{x}_k) \qquad (76)$$

The system of equations for the coefficients $c_k$:

$$\sum_{j=1}^{m} c_j\,K(\vec{x}_j, \vec{x}_k) = K(\vec{x}^{*}, \vec{x}_k), \quad k = 1, \ldots, m \qquad (77)$$

Express the variance $(\vec{\varphi}_{\parallel}, \vec{\varphi}_{\parallel})$:

$$(\vec{\varphi}_{\parallel}, \vec{\varphi}_{\parallel}) = (\vec{\varphi}_{\parallel}, \vec{\varphi}^{*}) = \sum_{k=1}^{m} c_k\,K(\vec{x}_k, \vec{x}^{*}) \qquad (78)$$

Recall the function (58):

$$K(\vec{x}_1, \vec{x}_2) = \operatorname{Ci}\left(W\,\|\vec{x}_1 - \vec{x}_2\|\right) - \gamma - \ln\varepsilon - \ln\|\vec{x}_1 - \vec{x}_2\|$$

where $\|\vec{x}_1 - \vec{x}_2\|$ is the norm of the vector. If the obtained autocorrelation function (58) is used, the following system of equations results:

$$\sum_{j=1}^{m} c_j\,K(\vec{x}_j, \vec{x}_k) = K(\vec{x}^{*}, \vec{x}_k), \quad k = 1, \ldots, m \qquad (79)$$

As seen from (79), this is a system of equations whose left side coincides with (59), but on the right side, instead of the values at the interpolation nodes, there are values of the autocorrelation function. The expression for the variance $(\vec{\varphi}_{\parallel}, \vec{\varphi}_{\parallel})$:

$$D_1(\vec{x}^{*}) = \sum_{k=1}^{m} c_k\,K(\vec{x}_k, \vec{x}^{*}) \qquad (80)$$

Then the variance of the set of realizations that satisfy the interpolation nodes:

$$D_2(\vec{x}^{*}) = K(\vec{x}^{*}, \vec{x}^{*}) - \sum_{k=1}^{m} c_k\,K(\vec{x}_k, \vec{x}^{*}) \qquad (81)$$

  • The standard deviation of the set of realizations that satisfy the interpolation nodes:

$$\sigma(\vec{x}^{*}) = \sqrt{D_2(\vec{x}^{*})} = \sqrt{K(\vec{x}^{*}, \vec{x}^{*}) - \sum_{k=1}^{m} c_k\,K(\vec{x}_k, \vec{x}^{*})} \qquad (82)$$
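Continuing the sketch given after (60) (same assumed `special_K` and helpers, same illustrative `X` and `Xq`), the standard deviation (79)–(82) can be computed as follows; `posterior_std` is our own name, not the paper's.

```python
def posterior_std(Xq, X, W=1e4, eps=1e-4):
    """Standard deviation (82) of the set of realizations that satisfy
    the interpolation nodes (uses the assumed special_K from above)."""
    Kmat = special_K(pairwise_norms(X, X), W, eps)   # left side of (79)
    Kq = special_K(pairwise_norms(X, Xq), W, eps)    # right sides of (79)
    B = np.linalg.solve(Kmat, Kq)                    # coefficients c_k, one column per query point
    # (81): variance = K(x*, x*) - sum_k c_k K(x_k, x*)
    var = special_K(np.zeros(Xq.shape[0]), W, eps) - np.sum(Kq * B, axis=0)
    return np.sqrt(np.maximum(var, 0.0))             # (82); clip tiny negative round-off

print(posterior_std(Xq, X))  # small near the nodes, larger between them
```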

  • As noted earlier, multiplying the function in equations (15)–(16) or (58)–(60) by any finite non-zero coefficient does not change the most probable realization.
  • However, the variance is affected by the value of the autocorrelation function at zero. In addition, as will be seen below, this affects the transition to the multidimensional approximation problem.
  • Since multiplying the autocorrelation function by a finite non-zero coefficient has no effect on the results of interpolation, the variance of the autocorrelation function can be "calibrated" in accordance with a priori preferences.
  • The calibration can be carried out in various ways; below is one of the options.
  • The calibration is performed on the basis of a priori preferences by specifying the coefficient of the autocorrelation function.
Figure 2. A priori estimate of the variance at unit distance from the interpolation node.
  • Suppose the value of a realization of the random function is known at a single interpolation node (Fig. 2). If it is possible to estimate the variance of the realizations at unit distance from the node, then calibration can be performed.
  • Let the a priori random function satisfy the requirement that the variance of the realizations that satisfy the node be equal to some desired value $D_w$ at unit distance from the known node.
  • Assume that the function (58) is used with given values of $\varepsilon$ and $W$. Introduce the calibration coefficient:

$$K_c(\vec{x}_1, \vec{x}_2) = c_w\,K(\vec{x}_1, \vec{x}_2) \qquad (83)$$

where $c_w$ is the calibration coefficient.

Calculate the value of the calibration coefficient for given $\varepsilon$ and $W$ such that the variance at unit distance from the node equals the desired value $D_w$. From (79), for a single node $\vec{x}_1$, we obtain:

$$c_1\,K_c(\vec{x}_1, \vec{x}_1) = K_c(\vec{x}^{*}, \vec{x}_1) \qquad (84)$$

Substituting (83) into (84):

$$c_1\,c_w\,K(\vec{x}_1, \vec{x}_1) = c_w\,K(\vec{x}^{*}, \vec{x}_1) \qquad (85)$$

Express $c_1$:

$$c_1 = \frac{K(\vec{x}^{*}, \vec{x}_1)}{K(\vec{x}_1, \vec{x}_1)} \qquad (86)$$

Then:

$$D_2(\vec{x}^{*}) = c_w\left(K(\vec{x}^{*}, \vec{x}^{*}) - \frac{K(\vec{x}^{*}, \vec{x}_1)^2}{K(\vec{x}_1, \vec{x}_1)}\right) \qquad (87)$$

Express the desired variance at unit distance, $\|\vec{x}^{*} - \vec{x}_1\| = 1$, denoting by $K(0)$ and $K(1)$ the values of (58) at distances 0 and 1:

$$D_w = c_w\left(K(0) - \frac{K(1)^2}{K(0)}\right) \qquad (88)$$

Performing the conversions, we obtain:

$$D_w = c_w\,\frac{K(0)^2 - K(1)^2}{K(0)} \qquad (89)$$

From which the calibration coefficient can be calculated:

$$c_w = \frac{D_w\,K(0)}{K(0)^2 - K(1)^2} \qquad (90)$$

If, instead of the unit distance, it is more convenient to take another distance equal to $r$ and estimate the variance $D_w(r)$ there, calibration is also possible. Executing conversions similar to (84)–(90), we obtain an expression for the calibration coefficient:

$$c_w = \frac{D_w(r)\,K(0)}{K(0)^2 - K(r)^2} \qquad (90a)$$

Example: Consider the case of one-dimensional interpolation. Suppose that the interpolation is carried out in the range from 0 to 10, with given values of $\varepsilon$, $W$ and $D_w$ for the function (83). Compute the calibration coefficient by (90).
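The specific numeric values of this example did not survive in the text, so the sketch below (continuing the earlier Python sketches, with `special_K` defined there) uses assumed illustrative values of `eps`, `W` and `D_w` and evaluates (90) under the reconstruction above.

```python
def calibration_coefficient(D_w, W=1e4, eps=1e-4, r=1.0):
    """Calibration coefficient (90)/(90a) as reconstructed above:
    c_w = D_w * K(0) / (K(0)^2 - K(r)^2). All values are assumed,
    illustrative choices, not the paper's."""
    K0 = np.log(W / eps)                                # K at zero distance
    Kr = float(special_K(np.array([r]), W, eps)[0])     # K at distance r
    return D_w * K0 / (K0**2 - Kr**2)

c_w = calibration_coefficient(D_w=1.0)
print(c_w)  # scale factor applied to (58), as in (83)
```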

Figure 3. An example of one-dimensional interpolation.
  • A one-dimensional interpolation example is shown in Figure 3. The result was obtained using the function (83) and equations (59)–(60). As mentioned above, the calibration coefficient does not affect the result of the interpolation.
  • Let us find the standard deviation of the random function that represents the subset of realizations of the random function satisfying the interpolation nodes (expressions (79)–(82)). The result is shown in Figure 4.
Figure 4. The standard deviation of the set of realizations that satisfy the interpolation nodes.
  • The random function (73) for arbitrary $\vec{x}$ is a random variable having a normal distribution. With a known standard deviation, the resulting set of realizations can be displayed graphically (Fig. 5).
Figure 5. Graphic display of the set of realizations that satisfy the interpolation nodes. The intensity of the blue color symbolizes the probability of realizations passing through the points.
  • As seen from Figure 4, the standard deviation at the interpolation nodes is zero, which corresponds to expectations, since the values of the unknown function at these nodes are known and there can be no uncertainty.
  • Above was an example of one-dimensional interpolation and, for this particular case, the calculation of the standard deviation over the remaining set of realizations that satisfy the nodes. The one-dimensional case was considered because of the clarity of its graphical representation; the proposed transformations impose no restriction on the dimension of the interpolation.
  • If such an interpolation method is considered as machine learning, then the calculation of the standard deviation allows one not only to calculate the values of the output variables, but also to evaluate their reliability, that is, the potential spread of their values.
  • Although the computational experiments strongly support the efficiency of the method with finite values of $\varepsilon$ and $W$, the question of the impact of their values on the interpolation error, compared with the calculated "ideal function" (57)–(58), remains open.
  • Obviously, this error is associated with the deviation of the spectrum of the autocorrelation function actually used from the reference spectrum (49)–(50), which can also lead to a distortion of the resulting interpolant in the same frequency range. The main distortion falls on the lower frequencies, outside the used frequency range, which for an appropriate choice of $\varepsilon$ and $W$ and for the field of interpolation should not significantly affect the results.
  • However, the question of this error remains open; it is planned to devote a separate study to it in the future.

Multidimensional approximation.

  • In solving the problem of multidimensional interpolation it was assumed that the required realization must exactly match the interpolation nodes. In terms of machine learning, this is the variant of training in which the training error must be zero. In many cases it is not required to reproduce the training set exactly, or doing so may even harm the quality of generalization. The data may be inaccurate, contain errors, or even be contradictory. Consider the approximation of multidimensional data.
  • Suppose that the values at the interpolation nodes differ from the values of the realization by some random variable.
  • Let the sequence $\vec{x}_1, \ldots, \vec{x}_m$ and the corresponding sequence $y_1, \ldots, y_m$ be the interpolation nodes. Assume that the coordinates of the nodes are complex numbers to which an infinitesimal imaginary part has been added. Now the coordinates of the nodes cannot be exactly equal either to each other or to any other real number.
  • In this case, expression (3) can be written as follows:

$$f(\vec{x}) = \sum_i V_i\,\varphi_i(\vec{x}) + \sum_{k=1}^{m} U_k\,g_k(\vec{x}) \qquad (91)$$

where

$$g_k(\vec{x}) = \begin{cases}\sigma, & \vec{x} = \vec{x}'_k \\ 0, & \vec{x} \neq \vec{x}'_k\end{cases} \qquad (92)$$

and $\vec{x}'_k$ are the updated coordinates of the interpolation nodes.

  • In (91), random variables $U_k$ and a set of functions $g_k(\vec{x})$ have been added; these functions take the value $\sigma$ at the corresponding interpolation nodes and zero at all other points.

Expressions (91) and (92) completely mimic the presence of errors in the values at the interpolation nodes. Using the random function (91) is equivalent to the presence, in the interpolation values, of noise with a normal distribution and standard deviation equal to $\sigma$. At the same time, the functions $g_k(\vec{x})$ can be used as additional coordinate functions in expression (3), and all the transformations (3)–(16) can be performed without changes. The changes affect only the correlation function; denote it $K_a$:

$$K_a(\vec{x}_1, \vec{x}_2) = K(\vec{x}_1, \vec{x}_2) + \sum_{k=1}^{m} g_k(\vec{x}_1)\,g_k(\vec{x}_2) \qquad (93)$$

where $\vec{x}_1, \vec{x}_2$ are arbitrary values.

  • Then the correlation function at the interpolation nodes can be expressed as:

$$K_a(\vec{x}'_j, \vec{x}'_k) = K(\vec{x}'_j, \vec{x}'_k) + \sigma^2\,\delta_{jk} \qquad (94)$$

where $\delta_{jk}$ equals one for $j = k$ and zero otherwise.

  • For arbitrary real values the correlation function remains unchanged, as in (14):

$$K_a(\vec{x}_1, \vec{x}_2) = K(\vec{x}_1, \vec{x}_2) \qquad (95)$$

  • Using (94), we find that accounting for noise in the interpolation values with a normal distribution and standard deviation equal to $\sigma$ is equivalent to adding the value $\sigma^2$ to the main diagonal of the system (15).

For the autocorrelation function (58) or (83) we obtain the system of equations:

$$\sum_{j=1}^{m} a_j\,K(\vec{x}_j, \vec{x}_k) + a_k\,\sigma^2 = y_k, \quad k = 1, \ldots, m \qquad (96)$$

The equation for the most probable realization remains unchanged:

$$f(\vec{x}) = \sum_{k=1}^{m} a_k\,K(\vec{x}_k, \vec{x}) \qquad (97)$$
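Under the same assumptions as the earlier sketches (the reconstructed `special_K` and `pairwise_norms`), accounting for noise per (96), or per-node noise per (100) below, only adds $\sigma^2$ to the main diagonal:

```python
def approximate(X, y, Xq, sigma, W=1e4, eps=1e-4):
    """Most probable realization with noisy node values:
    (96)/(100) add sigma_k^2 to the main diagonal of (59).
    `sigma` may be a scalar or an array of per-node values."""
    Kmat = special_K(pairwise_norms(X, X), W, eps)
    Kmat[np.diag_indices_from(Kmat)] += np.square(sigma)
    a = np.linalg.solve(Kmat, y)
    return special_K(pairwise_norms(Xq, X), W, eps) @ a  # (97)

y_noisy = y + 0.05 * np.random.default_rng(0).standard_normal(y.shape)
print(approximate(X, y_noisy, Xq, sigma=0.05))
```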

Figure 6. Example of approximation.

A graphic display of the set of realizations corresponding to the approximation is shown in Figure 7.

Figure 7. Example of the set of realizations of the approximation.

All the arguments concerning the variance of the random function (61)–(62) remain valid. The changes affect only the system of equations (79), which can now be written as (98):

$$\sum_{j=1}^{m} c_j\,K(\vec{x}_j, \vec{x}_k) + c_k\,\sigma^2 = K(\vec{x}^{*}, \vec{x}_k), \quad k = 1, \ldots, m \qquad (98)$$

  • It should be noted that the value of the standard deviation of the realizations that satisfy the nodes (displayed in Figure 7) reflects only the distribution of the generating realizations of the random function, without regard to possible errors in their values.
  • In expression (92) the functions take a value equal to $\sigma$ or zero. However, if the distribution of the approximation noise is known, this distribution can be accounted for in (92):

$$g_k(\vec{x}) = \begin{cases}\sigma_k, & \vec{x} = \vec{x}'_k \\ 0, & \vec{x} \neq \vec{x}'_k\end{cases} \qquad (99)$$

where $\sigma_k$ is the value of the standard deviation of the noise at $\vec{x}'_k$.

Then expression (96) becomes:

$$\sum_{j=1}^{m} a_j\,K(\vec{x}_j, \vec{x}_k) + a_k\,\sigma_k^2 = y_k, \quad k = 1, \ldots, m \qquad (100)$$

And expression (98) becomes:

$$\sum_{j=1}^{m} c_j\,K(\vec{x}_j, \vec{x}_k) + c_k\,\sigma_k^2 = K(\vec{x}^{*}, \vec{x}_k), \quad k = 1, \ldots, m \qquad (101)$$

Example. Consider the one-dimensional case. Assume that the distribution of the noise (its standard deviation) in the approximation is given by the function (102), i.e. in the region near zero the noise is almost absent, and it increases with distance.

Figure 8. Approximation taking into account the noise distribution.
Figure 9. Graphic display of the set of realizations.
  • Figure 8 shows an approximation using equations (100) and the expression for the standard deviation (102).
  • Figure 9, for visual comparison with Figure 8, displays the set of realizations, including the noise (102).

Conclusions.

The arguments and transformations presented show that interpolation and approximation can be generalized within the theory of random functions as a search for the most probable realizations that satisfy the information about the nodes. The result is a method that can be viewed as a method of "machine learning" which has an exact solution, ensuring results that are optimal by the criterion of probability, subject to the conditions imposed on the transformations of reference systems.

Part 2.

The method of machine learning.

Let us write down the key expressions obtained in the first part in the form of a complete method.

  • Let the sequence $\vec{x}_1, \ldots, \vec{x}_m$ be the set of input vectors for training, and let the corresponding sequence $y_1, \ldots, y_m$ be the set of output values. If the output is not one value but a set of values (a vector), the transformation can be considered sequentially for each of the output parameters.
  • The method allows determining a function that relates the values at the input and the output. The method also allows, for arbitrary input values, determining the standard deviation of the errors.

The function that relates the input and output values of the training set is determined by expression (103):

$$f(\vec{x}) = \sum_{k=1}^{m} a_k\,K(\vec{x}_k, \vec{x}) \qquad (103)$$

The function $K$ in (103) is defined as (104):

$$K(\vec{x}_1, \vec{x}_2) = c_w\left(\operatorname{Ci}\left(W\,\|\vec{x}_1 - \vec{x}_2\|\right) - \gamma - \ln\varepsilon - \ln\|\vec{x}_1 - \vec{x}_2\|\right) \qquad (104)$$

where $\varepsilon$, $W$ and $c_w$ are coefficients and $\|\vec{x}_1 - \vec{x}_2\|$ is the norm of the vector; at zero distance the limiting value $c_w \ln(W/\varepsilon)$ is taken.

The coefficients $a_k$ in (103) are computed from the system of linear equations (105):

$$\sum_{j=1}^{m} a_j\,K(\vec{x}_j, \vec{x}_k) + a_k\,\sigma_k^2 = y_k, \quad k = 1, \ldots, m \qquad (105)$$

where $\sigma_k$ is the standard deviation of the noise at $\vec{x}_k$.

  • The values $\sigma_k$ at the interpolation nodes define a priori the anticipated level of noise (errors) in the training data and the degree of approximation with which the function (103) reproduces the training data.
  • If $\sigma_k = 0$, this corresponds to the special case when errors or any inconsistency in the data are absent and the desired function must reproduce the training set exactly, i.e. the problem of learning becomes a problem of multidimensional interpolation.
  • If it is assumed that the training data contain noise uniformly over the whole sample, with a normal distribution and standard deviation equal to $\sigma$, the task of training can be viewed as multidimensional approximation.
  • If the noise level and its distribution are uneven but known, its presence can be defined as a function of $\vec{x}$ (or, rather, only by its values $\sigma_k$ at the interpolation nodes). A consolidated sketch of the method follows this list.
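Below is the consolidated sketch of (103)–(110) announced above, under the assumptions already stated: the kernel form is the reconstruction of (58)/(104), and the class and parameter names are ours, not a reference implementation. The factorization is computed once and reused for (103) and (109), as suggested later in the text.

```python
import numpy as np
from scipy.special import sici            # sici(x) returns (Si(x), Ci(x))
from scipy.linalg import lu_factor, lu_solve

GAMMA = 0.5772156649015329                # Euler's constant

class MIAModel:
    """Sketch of the method (103)-(110); kernel form and defaults are assumptions."""

    def __init__(self, W=1e4, eps=1e-4, c_w=1.0):
        self.W, self.eps, self.c_w = W, eps, c_w

    def _K(self, A, B):
        # (104): c_w * (Ci(W r) - gamma - ln(eps) - ln(r)); K(0) = c_w ln(W/eps)
        r = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
        out = np.full(r.shape, np.log(self.W / self.eps))
        nz = r > 0
        out[nz] = sici(self.W * r[nz])[1] - GAMMA - np.log(self.eps) - np.log(r[nz])
        return self.c_w * out

    def fit(self, X, y, sigma=0.0):
        self.X = np.asarray(X, float)
        Kmat = self._K(self.X, self.X)
        Kmat[np.diag_indices_from(Kmat)] += np.square(sigma)   # (105)
        self.lu = lu_factor(Kmat)        # factor once, reuse for (103) and (109)
        self.a = lu_solve(self.lu, y)    # coefficients a_k of (103)
        return self

    def predict(self, Xq):
        return self._K(np.asarray(Xq, float), self.X) @ self.a          # (103)

    def predict_std(self, Xq):
        Kq = self._K(self.X, np.asarray(Xq, float))                     # right sides of (109)
        B = lu_solve(self.lu, Kq)                                       # coefficients b_k
        var = self.c_w * np.log(self.W / self.eps) - np.sum(Kq * B, axis=0)  # (107)
        return np.sqrt(np.maximum(var, 0.0))

X = np.linspace(0, 10, 8).reshape(-1, 1)
model = MIAModel().fit(X, np.sin(X[:, 0]), sigma=0.0)
print(model.predict(np.array([[2.0]])), model.predict_std(np.array([[2.0]])))
```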

Now consider the coefficients in the expression (104).

  • The coefficients $1/\varepsilon$ and $W$ in the "ideal case" should be roughly equal and tend to infinity. In actual calculations, values several orders of magnitude greater than the range of variation of the input values can be used (provided the input values are normalized to the range [-1,1]). With increasing $1/\varepsilon$ and $W$, the difference arising from using the function (104) instead of the "ideal function" moves into the region of lower frequencies (in the decomposition of the desired function into a spectrum), so the difference has less impact on the learning results in the region of the parameter space where the training set lies.

"Calibration" coefficient associated with the properties of a priori random function. Although the random functions directly in the representation of the expression (103) - (105) are not used, these expressions are derived on the basis of their. (If then the value can be taken arbitrarily, by solving the system (105) and calculating the function (103) it will decrease.) The calculation procedure (proposed version):

Figure 10. Assessment of the desired variance at unit distance.
  • Calibration is required to find a balance between possible errors, inaccuracies and contradictions in the training data on the one hand and the possible nonlinearity of the function on the other.
  • Suppose it is known that the unknown function (103) passes through a node. For the calibration, the desired level of variance $D_w$ of the set of possible realizations at unit distance from the node can be set a priori. With increasing $D_w$, a "bigger role" in the calculations (103)–(105) is assigned to the possible nonlinearity of the function; with decreasing $D_w$, to the presence of inaccuracies in the training data. With a certain $D_w$, the required coefficient $c_w$ can be calculated:

$$c_w = \frac{D_w\,K_0(0)}{K_0(0)^2 - K_0(1)^2} \qquad (106)$$

where $K_0$ denotes the function (104) with $c_w = 1$.

  • If, instead of the unit distance, it is more convenient to take another distance equal to $r$ and estimate the variance $D_w(r)$ there, calibration is also possible. Executing conversions similar to (84)–(90), we obtain an expression for the calibration coefficient:

$$c_w = \frac{D_w(r)\,K_0(0)}{K_0(0)^2 - K_0(r)^2} \qquad (106a)$$

Figure 11. An example of one-dimensional interpolation.
  • Figure 11 shows an example of one-dimensional interpolation. As can be seen from the figure, the interpolation is of high quality (there are no oscillations such as those that often occur, for example, when using the Lagrange interpolation polynomial).
  • Such an interpolation is easily converted into an approximation (Fig. 12).
Figure 12. An example of one-dimensional approximation.
Figure 13. An example of two-dimensional interpolation.
Figure 14. An example of a two-dimensional approximation.
  • Figures 11–14 are examples of interpolation and approximation in the one- and two-dimensional cases, for illustration of the results. Expressions (103)–(105) may be used without restriction for a space of any dimension.
  • To determine the standard deviation of the errors, the following expressions were obtained:

$$\sigma(\vec{x}) = \sqrt{K(\vec{x}, \vec{x}) - \sum_{k=1}^{m} b_k\,K(\vec{x}_k, \vec{x})} \qquad (107)$$

  • The values $K(\vec{x}_k, \vec{x})$ are determined by the expression:

$$K(\vec{x}_k, \vec{x}) = c_w\left(\operatorname{Ci}\left(W\,\|\vec{x}_k - \vec{x}\|\right) - \gamma - \ln\varepsilon - \ln\|\vec{x}_k - \vec{x}\|\right) \qquad (108)$$

  • The coefficients $b_k$ of (107) are the solution of the system of equations (109):

$$\sum_{j=1}^{m} b_j\,K(\vec{x}_j, \vec{x}_k) + b_k\,\sigma_k^2 = K(\vec{x}, \vec{x}_k), \quad k = 1, \ldots, m \qquad (109)$$

  • It should be noted that when calculating the standard deviation, for each value of $\vec{x}$ it is necessary to calculate an individual right-hand side and re-solve the system of equations (109) (although the calculations can be simplified by computing the inverse matrix, or a factorization, once).
Figure 15. An example of one-dimensional interpolation.
Figure 16. The standard deviation of the distribution of possible errors as a random variable, for the function obtained in Fig. 15.
  • The expected error of the function (103), as a random variable, has a normal distribution with the standard deviation calculated from (107) and zero expectation. Knowing this, it can be displayed graphically, as shown in Figure 17.
Figure 17. Possible expected error, or the distribution of realizations of a random function satisfying the training data.
  • On the other hand, what can be understood as a possible error is the distribution of the values of the realizations of the random function which satisfy the training data.
Figure 18. An example of one-dimensional approximation.
Figure 19. The distribution of possible realizations, or the expected error.
  • The standard deviation in (107) reflects the range of values associated with the possible distribution of realizations. If the possible noise in the data itself needs to be taken into account, the standard deviation can be calculated using the expression:

$$\sigma(\vec{x}) = \sqrt{K(\vec{x}, \vec{x}) + \sigma^2(\vec{x}) - \sum_{k=1}^{m} b_k\,K(\vec{x}_k, \vec{x})} \qquad (110)$$

where $\sigma^2(\vec{x})$ is the noise variance at the point $\vec{x}$.

Example. Consider the one-dimensional case.

  • Assume that the distribution of the noise (its standard deviation) in the approximation is given by the same function (102) as before, i.e. in the region near zero the noise is almost absent, and it increases with distance.
Figure 20. Approximation taking into account the noise distribution.
Figure 21. Graphic display of the set of realizations (with noise).

  3. Bahvalov J.N., Potapov A.S. A statistical model of interpolation and its application to texture segmentation. - Eighth International Workshop on Nondestructive Testing and Computer Simulations in Science and Engineering, Alexander I. Melker, Editor, Proceedings of SPIE, 2004, Vol. 5831, p. 191-198.