Approximation of experimental data. The least squares method


Example.

Experimental data on the values of the variables x and y are given in a table.

As a result of aligning them, an approximating function is obtained.

Using the least squares method, approximate these data with a linear dependence y = ax + b (find the parameters a and b). Find out which of the two lines better (in the sense of the least squares method) aligns the experimental data. Make a drawing.

The essence of the least squares method (LSM).

The task is to find the coefficients of the linear dependence for which the function of the two variables a and b, F(a, b) = Σᵢ (yᵢ − (a·xᵢ + b))², takes the smallest value. That is, for such a and b the sum of squared deviations of the experimental data from the found straight line will be the smallest. This is the whole point of the least squares method.

Thus, solving the example comes down to finding the extremum of a function of two variables.

Deriving the formulas for finding the coefficients.

A system of two equations with two unknowns is compiled and solved. We find the partial derivatives of the function F(a, b) with respect to the variables a and b and equate these derivatives to zero.

We solve the resulting system of equations by any method (for example, by substitution or by Cramer's rule) and obtain the formulas for finding the coefficients by the least squares method (LSM).

For these a and b the function F(a, b) takes the smallest value. The proof of this fact is given below.

That is the whole least squares method. The formula for finding the parameter a contains the sums Σxᵢ, Σyᵢ, Σxᵢ·yᵢ, Σxᵢ², and the parameter n, the number of experimental data points. We recommend calculating these sums separately. The coefficient b is found after a has been calculated.
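For reference, solving this system gives the standard least squares formulas for a straight line:

a = (n·Σxᵢyᵢ − Σxᵢ·Σyᵢ) / (n·Σxᵢ² − (Σxᵢ)²),
b = (Σyᵢ − a·Σxᵢ) / n.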

It's time to remember the original example.

Solution.

In our example n = 5. We fill out the table for the convenience of calculating the sums that appear in the formulas for the required coefficients.

The values in the fourth row of the table are obtained by multiplying the values of the 2nd row by the values of the 3rd row for each number i.

The values in the fifth row of the table are obtained by squaring the values in the 2nd row for each number i.

The values in the last column of the table are the sums of the values across the rows.

We use the least squares formulas to find the coefficients a and b. We substitute the corresponding values from the last column of the table into them:

Hence, y = 0.165x + 2.184 is the desired approximating straight line.

It remains to find out which of the two lines better approximates the original data, that is, to make an estimate using the least squares method.

Error estimation for the least squares method.

To do this, you need to calculate the sum of squared deviations of the original data from each of these lines; the smaller value corresponds to the line that better approximates the original data in the sense of the least squares method.

Since this sum turns out to be smaller for the first line, the straight line y = 0.165x + 2.184 better approximates the original data.

Graphical illustration of the least squares (LS) method.

Everything is clearly visible in the graphs. The red line is the found straight line y = 0.165x + 2.184, the blue line is the second approximating function, and the pink dots are the original data.

Why is this needed, why all these approximations?

I personally use it to solve problems of data smoothing and problems of interpolation and extrapolation (in the original example, one might be asked to find the value of the observed quantity y at x = 3 or at x = 6 using the least squares method). But we will talk more about this later in another section of the site.



The least squares method is a mathematical (statistical) technique used to align time series, identify the form of correlation between random variables, and so on. It consists in approximating the function that describes a given phenomenon by a simpler function. Moreover, the latter is chosen in such a way that the standard deviation (see Dispersion) of the actual levels of the function at the observed points from the aligned levels is the smallest.

For example, from the available data (xᵢ, yᵢ) (i = 1, 2, ..., n), a curve y = a + bx is constructed for which the minimum of the sum of squared deviations

S(a, b) = Σᵢ (yᵢ − (a + b·xᵢ))²

is achieved, i.e., a function depending on two parameters is minimized: a is the intercept on the ordinate axis and b is the slope of the straight line.

The equations giving the necessary conditions for minimizing the function S(a, b) are called normal equations. As approximating functions, not only the linear one (alignment along a straight line) is used, but also quadratic, parabolic, exponential and other functions. For an example of aligning a time series along a straight line, see Fig. M.2, where the sum of squared distances (y₁ − ȳ₁)² + (y₂ − ȳ₂)² + ... is the smallest, and the resulting straight line best reflects the trend of a dynamic series of observations of some indicator over time.

For OLS estimates to be unbiased, it is necessary and sufficient that the most important condition of regression analysis be fulfilled: the mathematical expectation of the random error, conditional on the factors, must be equal to zero. This condition is satisfied, in particular, if: 1. the mathematical expectation of the random errors is zero, and 2. the factors and the random errors are independent random variables. The first condition can be considered always fulfilled for models with a constant, since the constant absorbs any non-zero mathematical expectation of the errors. The second condition, the condition of exogeneity of the factors, is fundamental. If this property is not met, then we can assume that almost any estimates will be extremely unsatisfactory: they will not even be consistent (that is, even a very large amount of data does not allow us to obtain high-quality estimates in this case).

The most common method of statistically estimating the parameters of regression equations is the least squares method. This method is based on a number of assumptions about the nature of the data and the results of the model. The main ones are a clear division of the original variables into dependent and independent, the uncorrelatedness of the factors included in the equations, the linearity of the relationship, the absence of autocorrelation of the residuals, the equality of their mathematical expectations to zero, and constant variance.

One of the main hypotheses of OLS is the assumption that the variances of the deviations εᵢ are equal, i.e., their spread around the average (zero) value of the series should be stable. This property is called homoscedasticity. In practice, the variances of the deviations are quite often unequal, that is, heteroscedasticity is observed. This may be due to various reasons. For example, there may be errors in the source data. Occasional inaccuracies in the source information, such as errors in the order of magnitude of numbers, can have a significant impact on the results. Often a larger spread of the deviations εᵢ is observed at large values of the dependent variable (or variables). If the data contain a significant error, then, naturally, the deviation of the model value calculated from the erroneous data will also be large. In order to get rid of this error, we need to reduce the contribution of these data to the calculation results by assigning them a smaller weight than all the others. This idea is implemented in weighted OLS.
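As a minimal illustration of this idea (my own sketch, not part of the source; it assumes numpy is available and uses made-up data and error levels), weighted least squares can be obtained by rescaling each observation by the reciprocal of its assumed error standard deviation and then applying ordinary least squares:

```python
import numpy as np

# Hypothetical data: x, y observations and an assumed per-observation
# error standard deviation sigma (larger sigma -> noisier point).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.3, 2.6, 2.8, 3.4, 3.5])
sigma = np.array([0.1, 0.1, 0.1, 0.5, 0.5])  # last points are less reliable

# Design matrix for y = a*x + b.
X = np.column_stack([x, np.ones_like(x)])

# Weighted LS: divide each row (and y) by sigma, i.e. use weight w_i = 1/sigma_i^2,
# then apply ordinary least squares to the transformed data.
w = 1.0 / sigma
coef, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)
a, b = coef
print(f"a = {a:.4f}, b = {b:.4f}")
```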

The least squares method

In the final lesson of the topic, we will get acquainted with the most famous application of functions of several variables, which finds the widest application in various fields of science and practical activity. This could be physics, chemistry, biology, economics, sociology, psychology, and so on and so forth. By the will of fate, I often have to deal with economics, and therefore today I will issue you a ticket to an amazing country called Econometrics =) ...How can you not want that?! It's very good there, you just need to make up your mind! ...But what you probably definitely want is to learn how to solve problems by the least squares method. And especially diligent readers will learn to solve them not only accurately, but also VERY QUICKLY ;-) But first, the general statement of the problem and an accompanying example:

Let us study indicators in a certain subject area that have a quantitative expression. At the same time, there is every reason to believe that one indicator depends on another. This assumption can be either a scientific hypothesis or based on basic common sense. Let's leave science aside, however, and explore more appetizing areas, namely grocery stores. Let's denote by:

x – the retail area of a grocery store, sq. m,
y – the annual turnover of a grocery store, million rubles.
It is absolutely clear that the larger the store area, the greater in most cases its turnover will be.

Suppose that after carrying out observations/experiments/calculations/dances with a tambourine we have numerical data at our disposal:

With grocery stores, I think everything is clear: x₁ is the area of the 1st store, y₁ is its annual turnover, x₂ is the area of the 2nd store, y₂ is its annual turnover, and so on. By the way, it is not at all necessary to have access to classified materials: a fairly accurate estimate of turnover can be obtained by means of mathematical statistics. However, let's not get distracted, the commercial espionage course is already paid for =)

Tabular data can also be written in the form of points and depicted in the familiar Cartesian coordinate system.

Let us answer an important question: how many points are needed for a qualitative study?

The more, the better. The minimum acceptable set consists of 5-6 points. In addition, when the amount of data is small, "anomalous" results must not be included in the sample. For example, a small elite store can earn orders of magnitude more than "its colleagues", thereby distorting the general pattern that you need to find!



To put it very simply, we need to select a function whose graph passes as close as possible to the points. Such a function is called an approximating (approximation means "bringing closer") or theoretical function. Generally speaking, an obvious "contender" immediately appears here: a polynomial of high degree whose graph passes through ALL the points. But this option is complicated and often simply incorrect (since the graph will "loop" all the time and poorly reflect the main trend).

Thus, the sought function must be quite simple and at the same time adequately reflect the dependence. As you might guess, one of the methods for finding such functions is called the least squares method. First, let's look at its essence in general form. Let some function approximate the experimental data:


How do we evaluate the accuracy of this approximation? Let us calculate the differences (deviations) between the experimental and the functional values (we study the drawing). The first thought that comes to mind is to estimate how large their sum is, but the problem is that the differences can be negative, and with such a summation the deviations will cancel each other out. Therefore, as an estimate of the accuracy of the approximation, it suggests itself to take the sum of the moduli of the deviations:

or, in collapsed form: Σᵢ |yᵢ − f(xᵢ)| (in case anyone doesn't know: Σ is the summation sign, and i is an auxiliary "counter" variable that takes values from 1 to n).

Approximating the experimental points with various functions, we will obtain different values of this sum, and obviously the function for which this sum is smaller is more accurate.

Such a method exists, and it is called the least moduli method. However, in practice the least squares method has become much more widespread; here the possible negative values are eliminated not by the modulus but by squaring the deviations:

σ = Σᵢ (yᵢ − f(xᵢ))²,

after which the efforts are aimed at selecting a function such that this sum of squared deviations is as small as possible. This is, in fact, where the name of the method comes from.

And now we come back to another important point: as noted above, the selected function should be quite simple, but there are also many such functions: linear, hyperbolic, exponential, logarithmic, quadratic, etc. And, of course, one would immediately like to "narrow the field of activity" here. Which class of functions should be chosen for the research? A primitive but effective technique:

– The easiest way is to plot the points on a drawing and analyze their location. If they tend to lie along a straight line, then you should look for the equation of a line y = ax + b with optimal values of a and b. In other words, the task is to find SUCH coefficients that the sum of squared deviations is the smallest.

If the points are located, for example, along a hyperbola, then it is clear that a linear function will give a poor approximation. In this case, we look for the most "favorable" coefficients for the hyperbola equation y = a/x + b, those that give the minimum sum of squares.

Now note that in both cases we are talking about a function of two variables whose arguments are the sought parameters of the dependence:

And essentially we need to solve a standard problem: to find the minimum of a function of two variables.

Let's recall our example: suppose that the "store" points tend to lie along a straight line and there is every reason to believe a linear dependence of the turnover on the retail space. Let's find SUCH coefficients a and b that the sum of squared deviations F(a, b) = Σᵢ (yᵢ − (a·xᵢ + b))² is the smallest. Everything is as usual: first the 1st-order partial derivatives. According to the linearity rule, you can differentiate right under the summation sign:

If you want to use this information for an essay or coursework - I will be very grateful for the link in the list of sources; you will find such detailed calculations in few places:

Let's create the standard system:

∂F/∂a = −2·Σᵢ xᵢ·(yᵢ − a·xᵢ − b) = 0,
∂F/∂b = −2·Σᵢ (yᵢ − a·xᵢ − b) = 0.

We cancel the "two" in each equation and, in addition, "break up" the sums:

Note: analyze on your own why a and b can be taken outside the summation sign. By the way, formally this can also be done with the sum Σᵢ b = n·b.

Let's rewrite the system in "applied" form:

a·Σᵢ xᵢ² + b·Σᵢ xᵢ = Σᵢ xᵢyᵢ,
a·Σᵢ xᵢ + b·n = Σᵢ yᵢ,

after which the algorithm for solving our problem begins to emerge:

Do we know the coordinates of the points? We do. Can we find the sums? Easily. We set up the simplest system of two linear equations with two unknowns (a and b). We solve the system, for example, by Cramer's method, and obtain a stationary point. Checking the sufficient condition for an extremum, we can verify that at this point the function reaches exactly a minimum. The check involves additional calculations, and therefore we will leave it behind the scenes (if necessary, the missing frame can be viewed here). We draw the final conclusion:
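A minimal sketch of this algorithm in Python (my own illustration, not code from the source; the data points below are hypothetical):

```python
# Fit y = a*x + b by the least squares method: accumulate the four sums,
# then solve the 2x2 normal system by Cramer's rule.
def fit_line(xs, ys):
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    # Normal system:
    #   a*sum_x2 + b*sum_x = sum_xy
    #   a*sum_x  + b*n     = sum_y
    det = sum_x2 * n - sum_x * sum_x           # main determinant
    a = (sum_xy * n - sum_x * sum_y) / det     # Cramer's rule
    b = (sum_x2 * sum_y - sum_x * sum_xy) / det
    return a, b

# Hypothetical data points
xs = [1, 2, 3, 4, 5]
ys = [2.4, 2.5, 2.8, 2.9, 3.0]
a, b = fit_line(xs, ys)
print(f"y = {a:.3f}*x + {b:.3f}")
```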

The found function best of all (at least compared to any other linear function) approximates the experimental points. Roughly speaking, its graph passes as close as possible to these points. In the tradition of econometrics, the resulting approximating function is also called the paired linear regression equation.

The problem under consideration is of great practical importance. In our example situation, the equation y = ax + b allows you to predict what turnover (y) the store will have at one or another value of the sales area (one or another value of x). Yes, the resulting forecast will be only a forecast, but in many cases it will turn out to be quite accurate.

I will analyze just one problem with "real" numbers, since there are no difficulties in it: all the calculations are at the level of the school curriculum for grades 7-8. In 95 percent of cases you will be asked to find just a linear function, but at the very end of the article I will show that it is no more difficult to find the equations of the optimal hyperbola, exponential and some other functions.

In fact, all that remains is to hand out the promised goodies, so that you can learn to solve such examples not only accurately, but also quickly. We carefully study the standard example:

Task

As a result of studying the relationship between two indicators, the following pairs of numbers were obtained:

Using the least squares method, find the linear function that best approximates the empirical (experimental) data. Make a drawing on which to plot, in a Cartesian rectangular coordinate system, the experimental points and the graph of the approximating function. Find the sum of squared deviations between the empirical and theoretical values. Find out whether a second proposed function would (from the point of view of the least squares method) approximate the experimental points better.

Please note that the "x" values are natural numbers, and this has a characteristic meaningful interpretation, which I will talk about a little later; but they, of course, can also be fractional. In addition, depending on the content of a particular task, both the "x" and the "y" values can be completely or partially negative. Well, we have been given a "faceless" task, and we begin its solution:

We find the coefficients of the optimal function as a solution to the system:

For the purpose of a more compact notation, the "counter" variable can be omitted, since it is already clear that the summation is carried out from 1 to n.

It is more convenient to calculate the required sums in tabular form:


Calculations can be carried out on a microcalculator, but it is much better to use Excel - both faster and without errors; watch a short video:

Thus, we get the following system:

Here you can multiply the second equation by 3 and subtract the 2nd from the 1st equation term by term. But that is luck: in practice, systems are often not a gift, and in such cases Cramer's method saves the day:
Since the determinant is non-zero, the system has a unique solution.

Let's check. I understand that you don't want to, but why skip errors where they absolutely cannot be missed? We substitute the found solution into the left side of each equation of the system:

The right-hand sides of the corresponding equations are obtained, which means that the system is solved correctly.

Thus, the desired approximating function: of all linear functions, it is the one that best approximates the experimental data.

Unlike the direct dependence of the store's turnover on its area, the found dependence is inverse (the principle "the more, the less"), and this fact is immediately revealed by the negative slope. The function tells us that with an increase in a certain indicator by 1 unit, the value of the dependent indicator decreases on average by 0.65 units. As they say, the higher the price of buckwheat, the less of it is sold.

To plot the graph of the approximating function, we find its two values:

and execute the drawing:

The constructed straight line is called trend line (namely, a linear trend line, i.e. in the general case, a trend is not necessarily a straight line). Everyone is familiar with the expression “to be in trend,” and I think that this term does not need additional comments.

Let's calculate the sum of squared deviations between the empirical and theoretical values. Geometrically, this is the sum of the squares of the lengths of the crimson segments (two of which are so small that they are not even visible).

Let's summarize the calculations in a table:


Again, they can be done manually; just in case, I’ll give an example for the 1st point:

but it is much more efficient to do it in the already familiar way:

We repeat once again: what is the meaning of the obtained result? Of all linear functions, it is for the found function that the indicator σ is the smallest, that is, within its family it is the best approximation. And here, by the way, the final question of the problem is not accidental: what if the proposed exponential function approximates the experimental points better?

Let's find the corresponding sum of squared deviations; to distinguish them, I will denote it by the letter "epsilon". The technique is exactly the same:


And again, just in case, the calculations for the 1st point:

In Excel we use the standard function EXP (syntax can be found in Excel Help).

Conclusion: the sum of squared deviations turns out to be larger, which means that the exponential function approximates the experimental points worse than the straight line does.

But here it should be noted that "worse" does not yet mean "bad". Now I have built a graph of this exponential function, and it also passes close to the points, so much so that without analytical research it is difficult to say which function is more accurate.

This concludes the solution, and I return to the question of the natural values of the argument. In various studies, as a rule economic or sociological ones, natural "x" values are used to number months, years or other equal time intervals. Consider, for example, the following problem:

The following data is available on the store’s retail turnover for the first half of the year:

Using analytical straight line alignment, determine the volume of turnover for July.

Yes, no problem: we number the months 1, 2, 3, 4, 5, 6 and use the usual algorithm, as a result of which we obtain an equation; the only thing is that when it comes to time, the letter "t" is usually used (although this is not critical). The resulting equation shows that in the first half of the year turnover increased by an average of 27.74 units per month. To get the forecast for July (month no. 7), we substitute t = 7 into the equation; the answer is in monetary units.

And there are countless tasks like this. Those who wish can use an additional service, namely my Excel calculator (demo version), which solves the analyzed problem almost instantly! A working version of the program is available in exchange or for a symbolic fee.

At the end of the lesson, some brief information on finding dependencies of some other types. Actually, there is not much to tell, since the fundamental approach and the solution algorithm remain the same.

Let us assume that the arrangement of the experimental points resembles a hyperbola. Then, to find the coefficients of the best hyperbola y = a/x + b, you need to find the minimum of the function F(a, b) = Σᵢ (yᵢ − (a/xᵢ + b))²; anyone who wishes can carry out the detailed calculations and arrive at a similar system:

From a formal technical point of view, it is obtained from the "linear" system (let's denote it with an asterisk) by replacing x with 1/x. As for the sums Σᵢ 1/xᵢ, Σᵢ 1/xᵢ² and Σᵢ yᵢ/xᵢ, calculate them, and then the optimal coefficients a and b are close at hand.
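For completeness, under this substitution the normal system takes the form:

a·Σᵢ 1/xᵢ² + b·Σᵢ 1/xᵢ = Σᵢ yᵢ/xᵢ,
a·Σᵢ 1/xᵢ + b·n = Σᵢ yᵢ.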

If there is every reason to believe that the points lie along a logarithmic curve y = a·ln x + b, then to find the optimal values we look for the minimum of the function Σᵢ (yᵢ − (a·ln xᵢ + b))². Formally, in system (*) x needs to be replaced with ln x:

When performing the calculations in Excel, use the LN function. I confess that it would not be particularly difficult for me to create calculators for each of the cases under consideration, but it would still be better if you "programmed" the calculations yourself. The lesson videos will help.

With an exponential dependence, for example y = a·e^(b·x), the situation is a little more complicated. To reduce the matter to the linear case, we take the logarithm of the function and use the properties of logarithms:

ln y = ln a + b·x.

Now, comparing the resulting function with the linear one, we come to the conclusion that in system (*) y must be replaced by ln y and a by ln a. For convenience, let us denote A = ln a:

Please note that the system is solved with respect to A and b, and therefore, after finding the roots, you must not forget to recover the coefficient a itself (a = e^A).
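A small sketch of this log-linearization trick in Python (my own illustration with hypothetical data, assuming the exponential model y = a·e^(b·x)):

```python
import math

# Fit y = a * exp(b*x) by taking logarithms: ln(y) = ln(a) + b*x,
# which is an ordinary linear least squares problem in (x, ln y).
def fit_exponential(xs, ys):
    n = len(xs)
    ln_ys = [math.log(y) for y in ys]        # requires y > 0
    sum_x = sum(xs)
    sum_ly = sum(ln_ys)
    sum_x_ly = sum(x * ly for x, ly in zip(xs, ln_ys))
    sum_x2 = sum(x * x for x in xs)

    det = sum_x2 * n - sum_x * sum_x
    b = (sum_x_ly * n - sum_x * sum_ly) / det            # slope in log scale
    ln_a = (sum_x2 * sum_ly - sum_x * sum_x_ly) / det    # intercept = ln(a)
    return math.exp(ln_a), b                             # recover a = e^(ln a)

# Hypothetical data, roughly following y = e^x
xs = [1, 2, 3, 4, 5]
ys = [2.7, 7.4, 20.1, 54.6, 148.4]
a, b = fit_exponential(xs, ys)
print(f"y = {a:.3f} * exp({b:.3f} * x)")
```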

To approximate the experimental points with the optimal parabola, one should find the minimum of a function of three variables. After performing the standard actions, we obtain the following "working" system:

Yes, of course, there are more sums here, but there are no difficulties at all when using your favorite application. And finally, I will tell you how to quickly perform a check in Excel and build the desired trend line: create a scatter plot, select any of the points with the mouse, right-click and choose the option "Add trend line". Next, select the chart type and, on the "Options" tab, activate the option "Show equation on chart". OK

As always, I would like to end the article with some beautiful phrase, and I almost typed "Be trendy!" But I changed my mind in time. And not because it is clichéd. I don't know about anyone else, but I don't really want to follow the promoted American and especially European trend =) Therefore, I wish each of you to stick to your own line!

http://www.grandars.ru/student/vysshaya-matematika/metod-naimenshih-kvadratov.html

The least squares method is one of the most common and most developed methods for estimating the parameters of linear econometric models, owing to its simplicity and efficiency. At the same time, some caution should be observed when using it, since models constructed with it may not satisfy a number of requirements on the quality of their parameters and, as a result, may not reflect the patterns of the process's development "well" enough.

Let us consider in more detail the procedure for estimating the parameters of a linear econometric model using the least squares method. In general form, such a model can be represented by equation (1.2):

y_t = a_0 + a_1·x_1t + ... + a_n·x_nt + ε_t.

The initial data for estimating the parameters a_0, a_1, ..., a_n are the vector of values of the dependent variable y = (y_1, y_2, ..., y_T)' and the matrix of values of the independent variables,

in which the first column, consisting of ones, corresponds to the constant term of the model.

The least squares method received its name from the basic principle that the parameter estimates obtained with it must satisfy: the sum of the squared model errors must be minimal.

Examples of solving problems using the least squares method

Example 2.1. A trading enterprise has a network of 12 stores, information on whose activities is presented in Table 2.1.

The management of the enterprise would like to know how the annual turnover depends on the retail space of a store.

Table 2.1

Store number | Annual turnover, million rubles | Retail area, thousand m²
1 | 19.76 | 0.24
2 | 38.09 | 0.31
3 | 40.95 | 0.55
4 | 41.08 | 0.48
5 | 56.29 | 0.78
6 | 68.51 | 0.98
7 | 75.01 | 0.94
8 | 89.05 | 1.21
9 | 91.13 | 1.29
10 | 91.26 | 1.12
11 | 99.84 | 1.29
12 | 108.55 | 1.49

Least squares solution. Let us denote by y_t the annual turnover of the t-th store, million rubles, and by x_1t the retail area of the t-th store, thousand m².

Fig. 2.1. Scatterplot for Example 2.1

To determine the form of the functional relationship between the variables, we construct a scatter diagram (Fig. 2.1).

Based on the scatter diagram, we can conclude that the annual turnover depends positively on the retail space (i.e., y increases as x_1 increases). The most suitable form of the functional relationship is linear.

Information for the further calculations is presented in Table 2.2. Using the least squares method, we estimate the parameters of the linear one-factor econometric model y_t = a_0 + a_1·x_1t + ε_t.

Table 2.2

t | y_t | x_1t | y_t² | x_1t² | x_1t·y_t
1 | 19.76 | 0.24 | 390.4576 | 0.0576 | 4.7424
2 | 38.09 | 0.31 | 1450.8481 | 0.0961 | 11.8079
3 | 40.95 | 0.55 | 1676.9025 | 0.3025 | 22.5225
4 | 41.08 | 0.48 | 1687.5664 | 0.2304 | 19.7184
5 | 56.29 | 0.78 | 3168.5641 | 0.6084 | 43.9062
6 | 68.51 | 0.98 | 4693.6201 | 0.9604 | 67.1398
7 | 75.01 | 0.94 | 5626.5001 | 0.8836 | 70.5094
8 | 89.05 | 1.21 | 7929.9025 | 1.4641 | 107.7505
9 | 91.13 | 1.29 | 8304.6769 | 1.6641 | 117.5577
10 | 91.26 | 1.12 | 8328.3876 | 1.2544 | 102.2112
11 | 99.84 | 1.29 | 9968.0256 | 1.6641 | 128.7936
12 | 108.55 | 1.49 | 11783.1025 | 2.2201 | 161.7395
Σ | 819.52 | 10.68 | 65008.554 | 11.4058 | 858.3991
Average | 68.29 | 0.89 | | |

Thus,

Therefore, with an increase in retail space by 1 thousand m2, other things being equal, the average annual turnover increases by 67.8871 million rubles.
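As a quick cross-check (my own sketch, not part of the source solution), the coefficient can be reproduced from the column sums of Table 2.2:

```python
# Verify the one-factor estimates of Example 2.1 from the sums in Table 2.2.
n = 12
sum_x = 10.68        # sum of x_1t (retail area, thousand m^2)
sum_y = 819.52       # sum of y_t (annual turnover, million rubles)
sum_xy = 858.3991    # sum of x_1t * y_t
sum_x2 = 11.4058     # sum of x_1t^2

a1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a0 = sum_y / n - a1 * sum_x / n
print(f"a1 = {a1:.4f}")   # approximately 67.8871, matching the text
print(f"a0 = {a0:.4f}")   # intercept implied by the same sums
```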

Example 2.2. The company's management noticed that the annual turnover depends not only on the store's sales area (see Example 2.1), but also on the average number of visitors. The relevant information is presented in Table 2.3.

Table 2.3

Solution. Let us denote by x_2t the average number of visitors to the t-th store per day, thousand people.

To determine the form of the functional relationship between the variables, we construct a scatter diagram (Fig. 2.2).

Based on the scatterplot, we can conclude that the annual turnover depends positively on the average number of visitors per day (i.e., y increases as x_2 increases). The form of the functional relationship is linear.

Fig. 2.2. Scatterplot for Example 2.2

Table 2.4

t | x_2t | x_2t² | y_t·x_2t | x_1t·x_2t
1 | 8.25 | 68.0625 | 163.02 | 1.98
2 | 10.24 | 104.8575 | 390.0416 | 3.1744
3 | 9.31 | 86.6761 | 381.2445 | 5.1205
4 | 11.01 | 121.2201 | 452.2908 | 5.2848
5 | 8.54 | 72.9316 | 480.7166 | 6.6612
6 | 7.51 | 56.4001 | 514.5101 | 7.3598
7 | 12.36 | 152.7696 | 927.1236 | 11.6184
8 | 10.81 | 116.8561 | 962.6305 | 13.0801
9 | 9.89 | 97.8121 | 901.2757 | 12.7581
10 | 13.72 | 188.2384 | 1252.0872 | 15.3664
11 | 12.27 | 150.5529 | 1225.0368 | 15.8283
12 | 13.92 | 193.7664 | 1511.016 | 20.7408
Σ | 127.83 | 1410.44 | 9160.9934 | 118.9728
Average | 10.65 | | |

In general, it is necessary to determine the parameters of the two-factor econometric model

y_t = a_0 + a_1·x_1t + a_2·x_2t + ε_t.

The information required for the further calculations is presented in Table 2.4.

Let us estimate the parameters of the linear two-factor econometric model using the least squares method.

Thus,

The estimate of the coefficient a_1 = 61.6583 shows that, other things being equal, with an increase in retail space by 1 thousand m², the annual turnover will increase by an average of 61.6583 million rubles.

The estimate of the coefficient a_2 = 2.2748 shows that, other things being equal, with an increase in the average number of visitors by 1 thousand people per day, the annual turnover will increase by an average of 2.2748 million rubles.
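A sketch of how these two-factor estimates could be reproduced with numpy (my own illustration; it assembles the y and x_1 columns from Table 2.2 and the x_2 column from Table 2.4, and should agree with the quoted estimates up to rounding):

```python
import numpy as np

# y  - annual turnover, million rubles (Table 2.2)
# x1 - retail area, thousand m^2 (Table 2.2)
# x2 - average number of visitors per day, thousand people (Table 2.4)
y = np.array([19.76, 38.09, 40.95, 41.08, 56.29, 68.51,
              75.01, 89.05, 91.13, 91.26, 99.84, 108.55])
x1 = np.array([0.24, 0.31, 0.55, 0.48, 0.78, 0.98,
               0.94, 1.21, 1.29, 1.12, 1.29, 1.49])
x2 = np.array([8.25, 10.24, 9.31, 11.01, 8.54, 7.51,
               12.36, 10.81, 9.89, 13.72, 12.27, 13.92])

# Design matrix with a column of ones for the constant a0.
X = np.column_stack([np.ones_like(x1), x1, x2])
a0, a1, a2 = np.linalg.lstsq(X, y, rcond=None)[0]
print(f"a0 = {a0:.4f}, a1 = {a1:.4f}, a2 = {a2:.4f}")
```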

Example 2.3. Using the information presented in Tables 2.2 and 2.4, estimate the parameter of the one-factor econometric model in centered variables,

where ỹ_t is the centered value of the annual turnover of the t-th store, million rubles, and x̃_2t is the centered value of the average daily number of visitors to the t-th store, thousand people (see Examples 2.1-2.2).

Solution. The additional information necessary for the calculations is presented in Table 2.5.

Table 2.5

t | ỹ_t | x̃_2t | x̃_2t² | ỹ_t·x̃_2t
1 | -48.53 | -2.40 | 5.7720 | 116.6013
2 | -30.20 | -0.41 | 0.1702 | 12.4589
3 | -27.34 | -1.34 | 1.8023 | 36.7084
4 | -27.21 | 0.36 | 0.1278 | -9.7288
5 | -12.00 | -2.11 | 4.4627 | 25.3570
6 | 0.22 | -3.14 | 9.8753 | -0.6809
7 | 6.72 | 1.71 | 2.9156 | 11.4687
8 | 20.76 | 0.16 | 0.0348 | 3.2992
9 | 22.84 | -0.76 | 0.5814 | -17.413
10 | 22.97 | 3.07 | 9.4096 | 70.4503
11 | 31.55 | 1.62 | 2.6163 | 51.0267
12 | 40.26 | 3.27 | 10.6766 | 131.5387
Sum | | | 48.4344 | 431.0566

Using formula (2.35), i.e., the ratio of the sums in the last two columns of Table 2.5, we obtain the estimate 431.0566 / 48.4344 ≈ 8.90.

Thus, the parameter of the centered one-factor model is estimated at approximately 8.90.

http://www.cleverstudents.ru/articles/mnk.html


Proof.

In order for the function F(a, b) = Σᵢ (yᵢ − (a·xᵢ + b))² to take its smallest value at the found a and b, it is necessary that at this point the matrix of the quadratic form of the second-order differential of the function be positive definite. Let us show this.

The second-order differential has the form:

d²F = ∂²F/∂a²·da² + 2·∂²F/∂a∂b·da·db + ∂²F/∂b²·db².

That is,

d²F = 2Σᵢxᵢ²·da² + 4Σᵢxᵢ·da·db + 2n·db².

Therefore, the matrix of the quadratic form has the form

( 2Σᵢxᵢ²   2Σᵢxᵢ )
( 2Σᵢxᵢ    2n    )

and the values of its elements do not depend on a and b.

Let us show that the matrix is positive definite. For this, its angular (leading principal) minors must be positive.

The first-order angular minor is 2Σᵢxᵢ² > 0. The inequality is strict, since the points do not all coincide. The second-order angular minor is 4n·Σᵢxᵢ² − 4(Σᵢxᵢ)², which is positive for the same reason (by the Cauchy-Bunyakovsky inequality), so the matrix is positive definite and the found a and b indeed deliver the minimum of F(a, b).

The method of least squares is a mathematical procedure for constructing a linear equation that best fits a set of ordered pairs by finding the values of a and b, the coefficients in the equation of the line. The goal of least squares is to minimize the total squared error between the values of y and ŷ. If for each point we determine the error y − ŷ, the least squares method minimizes the sum of these squared errors over all n ordered pairs, which places the line as closely as possible to the data.

This concept is illustrated in the figure.

Based on the figure, the line that best fits the data, the regression line, minimizes the total squared error of the four points on the graph. I will show you how to determine this using least squares with the following example.

Imagine a young couple who have recently moved in together and share a vanity table in the bathroom. The young man began to notice that half of his table was inexorably shrinking, losing ground to hair mousses and soy complexes. Over the past few months, the guy had been closely monitoring the rate at which the number of objects on her side of the table was increasing. The table below shows the number of items the girl has accumulated on her bathroom vanity over the past few months.

Since our goal is to find out whether the number of items increases over time, “Month” will be the independent variable, and “Number of items” will be the dependent variable.

Using the least squares method, we determine the equation that best fits the data by calculating the values ​​of a, the y-intercept, and b, the slope of the line:

a = y_avg − b·x_avg,

where x_avg is the average value of x, the independent variable, and y_avg is the average value of y, the dependent variable.
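The slope b is computed first; the standard formula for it is

b = (Σxy − n·x_avg·y_avg) / (Σx² − n·x_avg²),

after which a follows from a = y_avg − b·x_avg.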

The table below summarizes the calculations required for these equations.

The line of best fit for our bathroom example is given by the following equation:

Since our equation has a positive slope of 0.976, the guy has evidence that the number of items on the table increases over time at an average rate of about 1 item per month. The graph shows the line of best fit together with the ordered pairs.

The forecast for the number of items six months ahead (month 16) would be calculated as follows:

ŷ = 5.13 + 0.976x = 5.13 + 0.976·16 ≈ 20.7, i.e., about 21 items

So, it's time for our hero to take some action.

TREND function in Excel

As you probably already guessed, Excel has a function for calculating values by the least squares method. This function is called TREND. Its syntax is as follows:

TREND(known y-values; known x-values; new x-values; const)

known y-values – the array of the dependent variable, in our case the number of items on the table

known x-values – the array of the independent variable, in our case the months

new x-values – the new x-values (months) for which the TREND function returns the expected values of the dependent variable (the number of items)

const – optional. A Boolean value that specifies whether the constant b is required to be 0.

For example, the figure shows the TREND function used to determine the expected number of items on the bathroom vanity for the 16th month.
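For readers working outside Excel, here is a rough Python equivalent of the same forecast (my own sketch; the month/item arrays below are hypothetical stand-ins, since the example's table is not reproduced here):

```python
import numpy as np

# Hypothetical stand-ins for the example's data: month numbers (independent
# variable) and the number of items observed in each month (dependent variable).
months = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
items = np.array([6, 7, 9, 9, 10, 11, 13, 13, 14, 15])

# polyfit with degree 1 returns (slope, intercept) of the least-squares line.
slope, intercept = np.polyfit(months, items, 1)

# Forecast for month 16, analogous to Excel's TREND(known_y, known_x, 16).
forecast = intercept + slope * 16
print(f"y = {intercept:.2f} + {slope:.3f}*x, forecast for month 16: {forecast:.1f}")
```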

The least squares method

The least squares method (OLS, Ordinary Least Squares) is one of the basic methods of regression analysis for estimating the unknown parameters of regression models from sample data. The method is based on minimizing the sum of squares of the regression residuals.

It should be noted that the least squares method itself can be called a method for solving a problem in any area if the solution consists of, or satisfies, some criterion of minimizing the sum of squares of some functions of the sought variables. Therefore, the least squares method can also be used for an approximate representation (approximation) of a given function by other (simpler) functions, when finding a set of quantities satisfying equations or constraints whose number exceeds the number of these quantities, and so on.

The essence of OLS

Let some (parametric) model of a probabilistic (regression) relationship between the (explained) variable y and a set of factors (explanatory variables) x be given,

y = f(x, b) + ε,

where b is the vector of unknown model parameters and ε is the random model error.

Let there also be sample observations of the values of these variables. Let t be the observation number (t = 1, ..., n). Then y_t and x_t are the values of the variables in the t-th observation. Then, for given values of the parameters b, the theoretical (model) values of the explained variable y can be calculated:

ŷ_t = f(x_t, b).

The size of the residuals e_t = y_t − ŷ_t depends on the values of the parameters b.

The essence of the least squares method (ordinary, classical) is to find the parameters b for which the sum of the squares of the residuals (Residual Sum of Squares) is minimal:

RSS(b) = Σ_t e_t² = Σ_t (y_t − f(x_t, b))² → min over b.

In the general case, this problem can be solved by numerical optimization (minimization) methods. In that case one speaks of nonlinear least squares (NLS or NLLS, Non-Linear Least Squares). In many cases an analytical solution can be obtained. To solve the minimization problem, it is necessary to find the stationary points of the function by differentiating it with respect to the unknown parameters b, equating the derivatives to zero and solving the resulting system of equations:

If the random errors of the model are normally distributed, have the same variance and are uncorrelated, OLS parameter estimates coincide with maximum likelihood (ML) estimates.

OLS in the case of a linear model

Let the regression dependence be linear:

y_t = b_1·x_t1 + ... + b_k·x_tk + ε_t.

Let y be the column vector of observations of the explained variable, and X the matrix of factor observations (the rows of the matrix are the vectors of factor values in a given observation, the columns are the vector of values of a given factor in all observations). The matrix representation of the linear model is:

y = Xb + ε.

Then the vector of estimates of the explained variable and the vector of regression residuals will be equal to

ŷ = Xb,  e = y − ŷ = y − Xb.

Accordingly, the sum of squares of the regression residuals will be equal to

RSS = e'e = (y − Xb)'(y − Xb).

Differentiating this function with respect to the parameter vector and equating the derivatives to zero, we obtain a system of equations (in matrix form).

The solution of this system of equations gives the general formula for the least squares estimates of the linear model:
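In standard matrix notation, the system and its solution are:

(X'X)·b = X'y,  so that  b̂ = (X'X)⁻¹X'y = (X'X/n)⁻¹·(X'y/n).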

For analytical purposes, the last representation of this formula turns out to be useful. If the data in the regression model are centered, then in this representation the first matrix has the meaning of the sample covariance matrix of the factors, and the second is the vector of covariances of the factors with the dependent variable. If, in addition, the data are also normalized by the standard deviation (that is, ultimately standardized), then the first matrix has the meaning of the sample correlation matrix of the factors, and the second vector is the vector of sample correlations of the factors with the dependent variable.

An important property of OLS estimates for models with a constant is that the line of the constructed regression passes through the center of gravity of the sample data, that is, the fitted value at the point of the sample means equals the sample mean of the dependent variable.

In particular, in the extreme case when the only regressor is a constant, we find that the OLS estimate of the single parameter (the constant itself) is equal to the mean value of the explained variable. That is, the arithmetic mean, known for its good properties from the laws of large numbers, is also a least squares estimate: it satisfies the criterion of the minimum sum of squared deviations from it.

Example: simplest (pairwise) regression

In the case of paired linear regression, the calculation formulas are simplified (you can do without matrix algebra):
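For completeness, the standard paired-regression formulas are (with x̄ and ȳ the sample means):

b̂ = Σᵢ (xᵢ − x̄)(yᵢ − ȳ) / Σᵢ (xᵢ − x̄)²,  â = ȳ − b̂·x̄,

so that the fitted line is ŷ = â + b̂·x.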

Properties of OLS estimators

First of all, we note that for linear models, OLS estimates are linear estimates, as follows from the above formula. For unbiased OLS estimates, it is necessary and sufficient to fulfill the most important condition of regression analysis: the mathematical expectation of a random error, conditional on the factors, must be equal to zero. This condition, in particular, is satisfied if

  1. the mathematical expectation of random errors is zero, and
  2. factors and random errors are independent random variables.

The second condition, the condition of exogeneity of the factors, is fundamental. If this property is not met, then we can assume that almost any estimates will be extremely unsatisfactory: they will not even be consistent (that is, even a very large amount of data does not allow obtaining high-quality estimates in this case). In the classical case, a stronger assumption is made, namely the determinism of the factors as opposed to the randomness of the error, which automatically means that the exogeneity condition is met. In the general case, for consistency of the estimates it is sufficient that the exogeneity condition be satisfied together with the convergence of the matrix X'X/n to some non-singular matrix as the sample size increases to infinity.

In order for the (ordinary) least squares estimates to be, in addition to consistent and unbiased, also efficient (the best in the class of linear unbiased estimates), additional properties of the random error must hold: the variance of the random errors must be constant (identical) in all observations (no heteroscedasticity), and the random errors in different observations must be uncorrelated (no autocorrelation).

These assumptions can be formulated for the covariance matrix of the random error vector: V(ε) = σ²·I.

A linear model satisfying these conditions is called classical. OLS estimates for classical linear regression are unbiased, consistent and the most efficient estimates in the class of all linear unbiased estimates (in the English literature the abbreviation BLUE (Best Linear Unbiased Estimator) is sometimes used; in the Russian literature the Gauss-Markov theorem is more often cited). As is easy to show, the covariance matrix of the vector of coefficient estimates will be equal to V(b̂) = σ²·(X'X)⁻¹.

Generalized OLS

The least squares method allows for a broad generalization. Instead of minimizing the sum of squares of the residuals, one can minimize some positive definite quadratic form of the vector of residuals, e'We, where W is some symmetric positive definite weight matrix. Ordinary least squares is a special case of this approach, in which the weight matrix is proportional to the identity matrix. As is known from the theory of symmetric matrices (or operators), for such matrices there is a decomposition W = P'P. Consequently, the specified functional can be represented as (Pe)'(Pe), that is, this functional can be represented as the sum of the squares of some transformed "residuals". Thus, we can distinguish a whole class of least squares methods: LS methods (Least Squares).

It has been proved (Aitken's theorem) that for a generalized linear regression model (in which no restrictions are imposed on the covariance matrix of the random errors), the most efficient estimates (in the class of linear unbiased estimates) are the estimates of so-called generalized least squares (GLS, Generalized Least Squares), i.e. the LS method with a weight matrix equal to the inverse of the covariance matrix V of the random errors: W = V⁻¹.

It can be shown that the formula for the GLS estimates of the parameters of a linear model has the form

The covariance matrix of these estimates will accordingly be equal to
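Writing these out in the standard form (with V the covariance matrix of the random errors):

b̂_GLS = (X'V⁻¹X)⁻¹·X'V⁻¹y,  V(b̂_GLS) = (X'V⁻¹X)⁻¹.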

In fact, the essence of GLS lies in a certain (linear) transformation (P) of the original data and the application of ordinary OLS to the transformed data. The purpose of this transformation is that for the transformed data, the random errors already satisfy the classical assumptions.

Weighted OLS

In the case of a diagonal weight matrix (and hence a diagonal covariance matrix of the random errors), we have so-called weighted least squares (WLS). In this case, the weighted sum of squares of the model residuals is minimized, that is, each observation receives a "weight" inversely proportional to the variance of the random error in that observation. In fact, the data are transformed by weighting the observations (dividing by an amount proportional to the assumed standard deviation of the random errors), and ordinary OLS is applied to the weighted data.

Some special cases of using OLS in practice

Approximation of a linear dependence

Let us consider the case when, as a result of studying the dependence of a certain scalar quantity on another scalar quantity (this could be, for example, the dependence of voltage on current strength: U = I·R, where R is a constant value, the resistance of the conductor), measurements of these quantities were carried out, as a result of which the values xᵢ and the corresponding values yᵢ were obtained. The measurement data must be recorded in a table.

Table. Measurement results (measurements no. 1 through 6 with the corresponding measured values).

The question is: what value of the coefficient k can be chosen to best describe the dependence y = k·x? According to the least squares method, this value should be such that the sum of the squared deviations of the values yᵢ from the values k·xᵢ,

S(k) = Σᵢ (yᵢ − k·xᵢ)²,

is minimal.

The sum of squared deviations has a single extremum, a minimum, which allows us to use this condition. To find the value of the coefficient from it, we differentiate with respect to k, equate the derivative to zero and transform the resulting equality as follows:

The last formula allows us to find the value of the coefficient k, which is what was required in the problem.
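The resulting expression is the standard formula for a proportional fit through the origin:

k = Σᵢ xᵢ·yᵢ / Σᵢ xᵢ².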

History

Before the beginning of the 19th century, scientists did not have definite rules for solving a system of equations in which the number of unknowns is less than the number of equations; until that time, particular techniques were used that depended on the type of the equations and on the wit of the calculators, and therefore different calculators, starting from the same observational data, came to different conclusions. Gauss (1795) was the first to apply the method, and Legendre (1805) independently discovered and published it under its modern name (French: Méthode des moindres quarrés). Laplace related the method to probability theory, and the American mathematician Adrain (1808) considered its probability-theoretic applications. The method was spread and improved by the further research of Encke, Bessel, Hansen and others.

Alternative uses of OLS

The idea of the least squares method can also be used in other cases not directly related to regression analysis. The point is that the sum of squares is one of the most common proximity measures for vectors (the Euclidean metric in finite-dimensional spaces).

One application is the "solution" of systems of linear equations in which the number of equations is greater than the number of variables,

A·x = b,

where the matrix A is not square but rectangular.

Such a system of equations generally has no solution (if the rank is actually greater than the number of variables). Therefore, this system can be "solved" only in the sense of choosing a vector x that minimizes the "distance" between the vectors A·x and b. To do this, one can apply the criterion of minimizing the sum of squares of the differences between the left-hand and right-hand sides of the system's equations. It is easy to show that solving this minimization problem leads to solving the following system of equations:
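In standard matrix notation these are the normal equations A'A·x = A'b, so that x = (A'A)⁻¹A'b when A'A is invertible; numerically, this is exactly the problem solved by routines such as numpy.linalg.lstsq.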
