9. Projections II - multiple correlation

# IntroductionWith the Digital Revolution, predictive analytics looks for future results using data from the past; the models use different methodologies and mathematics with a very similar general objective; there are some techniques that are specific to classification (the results of the model are binary; a yes or no, in the form of 0 and 1) and others that are regression techniques that allow a value to be predicted. It can also be applied to any type of unknown event from the past, present, or future. The Digital Revolution provides BIG DATA with an abundance of structured variables, such as data tables, and unstructured variables, such as texts, images, or videos, offering new possibilities for prediction and bringing about a change of disposition. Flexible and heterogeneous prediction rules are now being built with a proven ability to predict well data different from those used to estimate them; the final predictor used combines different models, procedures, and data types. Decision Trees, Neural Networks, Support Vector Machines, Bayesian Analysis, Logistic Regression, Linear Regression, Time Series and Data Mining, K-Nearest Neighbors, Ensemble Models, Gradient Boosting, Incremental Response Models, replace, introducing multiple parameters extracted from Big Data, with many advantages, the models traditionally used by statistics. Big Data Analytics is the technology used to analyze an enormous amount of structured and unstructured data that are gathered, organized, and interpreted by software, transforming them into useful information for decision making and to generate ideas about market trends. In addition, it contributes to the generation of ideas for new products and services, customer attraction, audience understanding, security, and more benefits to making strategic decisions.

1. APPLICATION TO SALES. USE OF THE COMPUTER

In the great majority of cases, the change in the sales obeys to the combination of a great number of variables, among which we could mention: the available familiar income, the employment level of the economy, the interest rates, the elasticity of the demand, policies of the company, etc.
For these cases, we have a model of regression and correlations multiple that are indicated in the construction of the following example, indicated step by step:
The Y, the dependent variable (sales in constant dollars), which depends on other independent variables, for our example will be: monthly income, the interest rate, and the level of employment of our niche of the market, with which:
X 1: monthly income available for the market niche,
X2: the interest rate.
X3: the employment level.
With the provided data (in this case, twelve years previous to the prediction) build a matrix, using software Excel, offered by Microsoft.
 Years sales in million Y monthly entrance in million of \$ X1 interest rate (%) X2 level of use (%) X3 1 1.800 2.0 48.0 92.0 2 2.000 2.0 42.0 92.5 3 2.320 2.4 39.4 92.6 4 2.540 2.6 37.3 93.1 5 2.712 2.7 32.8 93.6 6 2.850 2.9 29.0 93.9 7 3.020 3.2 26.3 94.0 8 3.430 3.5 22.0 94.3 9 3.600 3.7 21.2 94.7 10 3.800 3.9 20.0 94.8 11 3.960 4.1 19.8 95.0 12 4.200 4.4 19.5 95.2

STEP 1: enter the Excel program and prepare the matrix with the information available:

STEP 2: slow down the independent variables.
Multiple regressions must be made with the values of the previous period.

Note: observe that for a matrix of n rows, (n+1) observations are needed.

STEP 3: line of regression calculation.
Enter the menu "tools" and then to the submenu” Analysis of Data".

If Excel does not have the Analysis of data option, enter the menu "tools" and then to the sub menu complements, there choose the option tools for Analysis, and then click on accept.
Click on the Analysis of data option, this will show you a window of dialogue; select the option” Regression”.

Click on Accept, and you will observe the following window:

On the icon "Range Y of entrance" enter the data of the dependent variable, that for our example would be sales, (is not necessary to close the window of Regression, it is enough to indicate the data of sales, using the left key of the mouse):

Thereafter, in the square "Range X of entrance", enter the data of the independent variables  (see box), using the mouse and the red guides on the right:

As it can be observed, in the superior circle, the confidence level that has been requested is 95%; the maximum projection error of data accepted is 5%. In addition to this, notice that the option "on a new sheet" is selected; you can see this in the circle on the left-hand side at the key of the page and the result of the data will be shown in a new sheet of Excel clicking on Accept.

The confidence level depends on the willingness of the analyst. It is recommended to work with levels of 95% or superior to that.
The result is the following:

STEP 4: Interpretation of the data.

The multiple coefficients of correlation indicate the percentage of the total sales of the company that is explained by a linear relation of the independent variables (salary, interest rate, and level of employment); in our example is 99.8%
The coefficient of determination of 99,6%, indicates the level of adjustment of the model; the closer the value of R2 is to 1 the greater the amount of total variation that can be explained by the terms that appear in the model.
The adjusted R2 is a measure that includes in the most accurate way the variations that the model calculates in a direct way, adjusting or correcting the number of explanatory variables.
A standard mistake is equivalent to a typical deviation, which indicates that in 95% of the cases each of the observations stands between the expected value, more or less: 51.8.
Observations are the number of analyzed periods, (12).   Remember that for practical effects one should work with (n+1) observations, (13).

STEP 5: preparation of the equation.
Taking the intercept (-5282,68), and the coefficients of  X1 (772,86); X2 (-10,47) and X3 (68,31) we now process the multiple regression equation:

Y = - 5282,687201 + 772.8693637 X1 - 10,47165995 X2 + 68.31834232 X3

Note: The reader can extend the knowledge of statistics in a book specialized on the subject.

STEP 6: Test of the Model.
It is done by the verification of the goodness of the adjustment, as it is indicated next.

 Years sales wage interest rate. (%) rate of work (%) y considered [(Y-y)/Y]% 1 1800 1,8 48,5 91 1817,57 -0,98 2 2000 2 48 92 2045,7 -2,28 3 2320 2,2 42 92,5 2297,26 0,98 4 2540 2,4 39,4 92,6 2485,89 2,13 5 2712 2,6 37,3 93,1 2696,62 0,57 6 2850 2,7 32,8 93,6 2855,19 -0,18 7 3020 2,9 29 93,9 3070,05 -1,66 8 3430 3,2 26,3 94 3337,01 2,71 9 3600 3,5 22 94,3 3634,4 -0,96 10 3800 3,7 21,2 94,7 3824,68 -0,65 11 3960 3,9 20 94,8 3998,65 -0,98 12 4200 4,1 19,8 95 4168,98 0,74

Observed that the model adjusts perfectly, and therefore it is applicable to the prediction of future values.

Graphically, the real and estimated sales for the regression equation, which is calculated using the resulting equation by which we visualize the degree of adjustment:

Let us make a projection of sales for the next five years with the following indicators.

According to the macroeconomic projections, that can be obtained from data provided by the government; or by projections estimated within the same company, the following indicators are expected:

 Year wage interest rate. (%) rate of work (%) projected sales 13 4,3 20,5 94,6 13 14 4,35 22,3 94 14 15 4,4 23,8 93,7 15 16 4,45 26,4 92,8 16 17 4,5 29 90 17

We now apply the multiple regression equation that we had already found and that we indicate here:

Y = - 5282,687201 + 772.8693637 X1 - 10,47165995 X2 + 68.31834232 X3

The projection would then be:
 Year wage interest rate. (%) rate of work (%) projected sales % of variation 13 4,30 20,50 94,60 4288,90 -2,117 14 4,35 22,30 94,00 4267,70 -0,494 15 4,40 23,80 93,70 4270,14 0,057 16 4,45 26,40 92,80 4220,07 -1,173 17 4,50 29,00 90,00 4040,20 -4,262

Y13= 4288,90; Y14= 4267,70; Y15= 4270,14; Y16= 4220,07; Y17= 4040,20

Interpretation: for the 13Th year, according to the model, the estimated sales fluctuated between [4288, 90 - 51.84] and [4288.90 + 51.84], that is to say between [\$4237 and \$4340, millions of dollars], with a level of confidence of 95%. With the extreme values, pessimistic and optimistic calculations can be made and with the average value, we can achieve the most probable option.

With this, the integrated graph of real values of sales (the first 12 periods) plus the projected values (the next five periods, from the 13Th to the 17Th) shows the possible behavior of the sales, this in order to accomplish our goal:  the PLAN OF GENERATION OF VALUE.(josavere)

As it can be observed a period of strong recession is expected concerning the macroeconomic variables of the country being this reflected in the sales of the next five years, the reason why the management must apply itself to look for new alternatives, or in its defect, to prepare them for the crisis.