Chapter 9
PROJECTIONS II - MULTIPLE CORRELATION
by: josavere
Introduction
With the Digital Revolution, predictive analytics looks for future results using data from the past; the models use different methodologies and mathematics with a very similar general objective; there are some techniques that are specific to classification (the results of the model are binary; a yes or no, in the form of 0 and 1) and others that are regression techniques that allow a value to be predicted. It can also be applied to any type of unknown event from the past, present, or future. The Digital Revolution provides BIG DATA with an abundance of structured variables, such as data tables, and unstructured variables, such as texts, images, or videos, offering new possibilities for prediction and bringing about a change of disposition. Flexible and heterogeneous prediction rules are now being built with a proven ability to predict well data different from those used to estimate them; the final predictor used combines different models, procedures, and data types. Decision Trees, Neural Networks, Support Vector Machines, Bayesian Analysis, Logistic Regression, Linear Regression, Time Series and Data Mining, K-Nearest Neighbors, Ensemble Models, Gradient Boosting, Incremental Response Models, replace, introducing multiple parameters extracted from Big Data, with many advantages, the models traditionally used by statistics. Big Data Analytics is the technology used to analyze an enormous amount of structured and unstructured data that are gathered, organized, and interpreted by software, transforming them into useful information for decision making and to generate ideas about market trends. In addition, it contributes to the generation of ideas for new products and services, customer attraction, audience understanding, security, and more benefits to making strategic decisions.
1. APPLICATION TO SALES. USE OF THE COMPUTER
X2: the interest rate.
X3: the employment level.
With the provided data (in this case, twelve years previous to the prediction) build a matrix, using software Excel, offered by Microsoft.
Years |
sales in million Y |
monthly entrance in million of $ X_{1} |
interest rate (%) X_{2} |
level of use (%) X_{3} |
1 |
1.800 |
2.0 |
48.0 |
92.0 |
2 |
2.000 |
2.0 |
42.0 |
92.5 |
3 |
2.320 |
2.4 |
39.4 |
92.6 |
4 |
2.540 |
2.6 |
37.3 |
93.1 |
5 |
2.712 |
2.7 |
32.8 |
93.6 |
6 |
2.850 |
2.9 |
29.0 |
93.9 |
7 |
3.020 |
3.2 |
26.3 |
94.0 |
8 |
3.430 |
3.5 |
22.0 |
94.3 |
9 |
3.600 |
3.7 |
21.2 |
94.7 |
10 |
3.800 |
3.9 |
20.0 |
94.8 |
11 |
3.960 |
4.1 |
19.8 |
95.0 |
12 |
4.200 |
4.4 |
19.5 |
95.2 |
STEP 1: enter the Excel program and prepare the matrix with the information available:
STEP 2: slow down the independent variables.
Multiple regressions must be made with the values of the previous period.
Note: observe that for a matrix of n rows, (n+1) observations are needed.
STEP 3: line of regression calculation.
Enter the menu "tools" and then to the submenu” Analysis of Data".
If Excel does not have the Analysis of data option, enter the menu "tools" and then to the sub menu complements, there choose the option tools for Analysis, and then click on accept.
Click on the Analysis of data option, this will show you a window of dialogue; select the option” Regression”.
Click on Accept, and you will observe the following window:
On the icon "Range Y of entrance" enter the data of the dependent variable, that for our example would be sales, (is not necessary to close the window of Regression, it is enough to indicate the data of sales, using the left key of the mouse):
Thereafter, in the square "Range X of entrance", enter the data of the independent variables (see box), using the mouse and the red guides on the right:
As it can be observed, in the superior circle, the confidence level that has been requested is 95%; the maximum projection error of data accepted is 5%. In addition to this, notice that the option "on a new sheet" is selected; you can see this in the circle on the left-hand side at the key of the page and the result of the data will be shown in a new sheet of Excel clicking on Accept.
The confidence level depends on the willingness of the analyst. It is recommended to work with levels of 95% or superior to that.
The result is the following:
STEP 4: Interpretation of the data.
The multiple coefficients of correlation indicate the percentage of the total sales of the company that is explained by a linear relation of the independent variables (salary, interest rate, and level of employment); in our example is 99.8%
The coefficient of determination of 99,6%, indicates the level of adjustment of the model; the closer the value of R2 is to 1 the greater the amount of total variation that can be explained by the terms that appear in the model.
The adjusted R2 is a measure that includes in the most accurate way the variations that the model calculates in a direct way, adjusting or correcting the number of explanatory variables.
A standard mistake is equivalent to a typical deviation, which indicates that in 95% of the cases each of the observations stands between the expected value, more or less: 51.8.
Observations are the number of analyzed periods, (12). Remember that for practical effects one should work with (n+1) observations, (13).
STEP 5: preparation of the equation.
Taking the intercept (-5282,68), and the coefficients of X1 (772,86); X2 (-10,47) and X3 (68,31) we now process the multiple regression equation:
Y = - 5282,687201 + 772.8693637 X1 - 10,47165995 X2 + 68.31834232 X3
Note: The reader can extend the knowledge of statistics in a book specialized on the subject.
STEP 6: Test of the Model.
It is done by the verification of the goodness of the adjustment, as it is indicated next.
Years | sales | wage | interest rate. (%) | rate of work (%) | y considered | [(Y-y)/Y]% |
1 | 1800 | 1,8 | 48,5 | 91 | 1817,57 | -0,98 |
2 | 2000 | 2 | 48 | 92 | 2045,7 | -2,28 |
3 | 2320 | 2,2 | 42 | 92,5 | 2297,26 | 0,98 |
4 | 2540 | 2,4 | 39,4 | 92,6 | 2485,89 | 2,13 |
5 | 2712 | 2,6 | 37,3 | 93,1 | 2696,62 | 0,57 |
6 | 2850 | 2,7 | 32,8 | 93,6 | 2855,19 | -0,18 |
7 | 3020 | 2,9 | 29 | 93,9 | 3070,05 | -1,66 |
8 | 3430 | 3,2 | 26,3 | 94 | 3337,01 | 2,71 |
9 | 3600 | 3,5 | 22 | 94,3 | 3634,4 | -0,96 |
10 | 3800 | 3,7 | 21,2 | 94,7 | 3824,68 | -0,65 |
11 | 3960 | 3,9 | 20 | 94,8 | 3998,65 | -0,98 |
12 | 4200 | 4,1 | 19,8 | 95 | 4168,98 | 0,74 |
Observed that the model adjusts perfectly, and therefore it is applicable to the prediction of future values.
According to the macroeconomic projections, that can be obtained from data provided by the government; or by projections estimated within the same company, the following indicators are expected:
Year |
wage |
interest rate. (%) |
rate of work (%) |
projected sales |
13 |
4,3 |
20,5 |
94,6 |
13 |
14 |
4,35 |
22,3 |
94 |
14 |
15 |
4,4 |
23,8 |
93,7 |
15 |
16 |
4,45 |
26,4 |
92,8 |
16 |
17 |
4,5 |
29 |
90 |
17 |
We now apply the multiple regression equation that we had already found and that we indicate here:
The projection would then be:
Year |
wage |
interest rate. (%) |
rate of work (%) |
projected sales |
% of variation |
13 |
4,30 |
20,50 |
94,60 |
4288,90 |
-2,117 |
14 |
4,35 |
22,30 |
94,00 |
4267,70 |
-0,494 |
15 |
4,40 |
23,80 |
93,70 |
4270,14 |
0,057 |
16 |
4,45 |
26,40 |
92,80 |
4220,07 |
-1,173 |
17 |
4,50 |
29,00 |
90,00 |
4040,20 |
-4,262 |
Y_{13}= 4288,90; Y_{14}= 4267,70; Y_{15}= 4270,14; Y_{16}= 4220,07; Y_{17}= 4040,20
With this, the integrated graph of real values of sales (the first 12 periods) plus the projected values (the next five periods, from the 13Th to the 17Th) shows the possible behavior of the sales, this in order to accomplish our goal: the PLAN OF GENERATION OF VALUE.(josavere)