ANALYSIS OF WELL-BEING IN OECD COUNTRIES THROUGH STATIS METHODOLOGY

F. J. RIVADENEIRA, A. M. S. FIGUEIREDO, F. O. S. FIGUEIREDO, S. M. CARVAJAL and R. A. RIVADENEIRA
Facultad de Ciencias Informáticas, Universidad Laica Eloy Alfaro de Manabí – ULEAM
Faculdade de Economia, Universidade do Porto and INESC – TEC Porto
Faculdade de Economia, Universidade do Porto and CEAUL
Facultad de Economía, Universidad Técnica de Manabí – UTM
Facultad de Ingeniería Química, Universidad Técnica de Manabí – UTM
fabricio.rivadeneira@uleam.edu.ec

This article presents the main concepts and results of a Master's thesis in Data Analysis whose objective is to analyze the evolution of some developed countries, and also of some emerging countries, that are members of the Organisation for Economic Co-operation and Development (OECD), with respect to a set of well-being indicators or variables during the period 2011-2015, using the STATIS methodology (Structuring Three-way data sets in Statistics). This methodology makes it possible to analyze the presence of a common structure in several data tables obtained over time, to identify the differences and similarities across the study period according to the well-being indicators included in the OECD "Better Life Index", and to analyze the trajectories of the countries.

INTRODUCTION
There is currently special interest in the joint analysis of multiple data tables, known as multi-block or multi-way analysis. Most of these methods are extensions of Principal Component Analysis. There is also global interest in analyzing the well-being and progress of countries, as reflected in the Better Life Index created by the Organisation for Economic Co-operation and Development (OECD, 2016) and the Social Progress Index created by the Social Progress Imperative organization (Social Progress Imperative, 2016).
Thus, the methodology chosen in this paper is STATIS ('Structuration des Tableaux À Trois Indices de la Statistique' in French, or 'Structuring Three-way data sets in Statistics' in English), which is one of the methods that have served as a basis for developing other, more complex techniques for the joint analysis of several data sets. It is applied here to the analysis of OECD countries using well-being indicators.
The OECD's online statistical database platform (OECD, 2015) includes data tables for analyzing the well-being of societies. Each table contains between 17 and 24 indicators or quantitative variables, depending on the information that the countries were able to gather. These indicators focus on eleven aspects or dimensions of life that matter to people and represent key factors: housing, income, jobs, community, education, environment, civic engagement, health, life satisfaction, safety and work-life balance (OECD, 2013). Each of these eleven dimensions is currently based on one to four indicators. All these OECD data can be presented in a multi-block data structure.
Since the OECD country data are essentially multi-block data tables, multi-block component methods can be used to analyze differences and similarities between the OECD tables. Through the joint analysis of multiple data tables using the STATIS methodology, this paper analyzes the set of tables used to calculate the Better Life Index in OECD countries, in order to assess the performance of the 34 member countries and their trends over the 2011-2015 period. To that end, we used several tables, each containing a set of well-being indicators of the OECD countries for a specific year.
The main objective of this paper is therefore to obtain a common structure of the data tables that best represents the differences and similarities among the years according to the performance of the OECD countries on the well-being indicators. The aim is to summarize the information contained in the various data tables and, additionally, to analyze the trajectories of the countries through the years, identifying and explaining which countries are responsible for the differences detected between the data tables.

Research questions
Several data tables from OECD countries, corresponding to different years, are considered. This study was therefore shaped by the following research questions:
• How to handle several data tables that measure sets of well-being indicators collected on the same countries (observations) in different years?
• How to analyze several data tables collected at different moments in time to determine a common structure associated with the OECD countries that best represents the similarities between the data tables?
• How to compare the data tables globally, and which countries are responsible for the differences detected between them?

Some basic definitions
Data sets take different names according to their particular structure; Rivadeneira (2016) presents some common structures and names.

Multi-Block data tables
Multi-block data tables are several data tables that share a common dimension, i.e. either the same rows or the same columns, but not necessarily both. Each group of variables, or each matrix, is usually called a block or a configuration and is in general measured on the same observations, as shown in Figure 1a. In Figure 1a, for each year k there is a data table consisting of measurements of J_k attributes; the number of attributes J_k can vary from year to year, while the number of observations I remains the same.

Three-way data tables
Some data tables can be presented in a three-way, three-mode or third-order data structure, as shown in Figure 1b: for each year k there is a data table consisting of measurements of J attributes on I observations. In most cases, multi-way data tables contain the same number of rows and the same number of columns.
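The difference between the two structures can be illustrated with a minimal Python sketch. It is not part of the original study (which used SPAD, R and Excel); the values are random placeholders, and the table sizes are borrowed from the 34-country tables described later in the paper only to make the shapes concrete.

```python
import numpy as np

rng = np.random.default_rng(0)
n_countries = 34                      # I observations, shared by every table

# Multi-block data: one table per year; the number of columns J_k may differ.
columns_per_year = {2011: 17, 2012: 24, 2013: 24, 2014: 24, 2015: 24}
multi_block = {
    year: rng.normal(size=(n_countries, n_cols))   # placeholder values only
    for year, n_cols in columns_per_year.items()
}

# Three-way data: only possible when every table has the same J columns,
# so the tables can be stacked into a single (I, J, K) array.
same_width = [multi_block[year] for year in (2012, 2013, 2014, 2015)]
three_way = np.stack(same_width, axis=2)

print({year: table.shape for year, table in multi_block.items()})
print(three_way.shape)
```

A dictionary of arrays is used for the multi-block case precisely because the blocks cannot be stacked when their widths differ; this is the situation that STATIS is designed to handle.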

Overview of joint Analysis methods of tables
It is important to consider the structure of the data in order to decide which method of analysis should be applied, such as multi-block methods, three-way analysis methods, methods based on instrumental variables, multi-way analysis, and so on. Researchers can therefore consider many possible methods for the analysis of multiple data tables. Rivadeneira (2016) divided them into two categories: methods for the analysis of multiple tables (multi-block) and three-way data tables, and methods for two or more multi-block and multi-way data sets, as shown in Table 1.
Another classification of analysis methods for multi-group data is given in Eslami et al. (2013).

Related works about STATIS
Using the STATIS methodology as a common framework, several methods for the joint analysis of tables have been developed, such as DO-ACT, STATIS-4 and others (Abdi et al., 2012). The method has also been applied in several areas. Gonçalves (2010) studied the evolution of economic activities in Portugal, analyzing the information collected over the years by the Bank of Portugal and identifying differences and similarities between years as well as trends over time for those activities. Brás (2012) used information provided by the National Statistical Institute of Portugal (INE) to analyze the evolution of the construction sector in Portugal, in order to offer a better understanding of the sector over time. Lourenço (2013) analyzed the vulnerability indicators present in the Early Warning Systems (EWS) of European countries, detecting the main economic weaknesses that contribute to predicting the occurrence of a crisis within a certain time horizon. Stanimirova et al. (2004) applied STATIS to the exploration of three-way environmental data and compared its performance with the Tucker3 and PARAFAC2 methods. González et al. (2005) analyzed the consumption of electrical power in a hotel during the months in which environmental conditions differ the most, to determine appropriate energy-saving actions. Chaya et al. (2004) applied the methodology to time-intensity profiling data, with sensory attributes of ranch salad dressing as variables and a set of products as objects. Amendola et al. (2006) studied the causes of the socio-economic disparities among the European regions. Figueiredo et al. (2012) analyzed the dynamics and evolution of the structural economic reforms during the period 1989-1996, when the privatization of state-owned enterprises was taking place in the Portuguese banking sector.
Almeida (2012) applied a variant of this methodology called Dual STATIS to a data set recording information about the cycles of couples with an infertility diagnosis at the Assisted Medical Reproduction Center in Oporto Hospital, to understand which variables contribute the most to the differences between the groups of couples. The method revealed a greater proximity between groups composed of couples who did not become pregnant and a greater distance between the groups of couples who became pregnant. Also, Coquet et al. (1996) adapted STATIS, obtaining a significant acceleration, to study and characterize internal molecular motions and conformations from a large number of sets of coordinates generated by molecular dynamics simulations in solution.

METHODOLOGY
The STATIS methodology was first developed at the Statistics and Probability Laboratory of the University of Montpellier II by Escoufier (1973) and his team and by L'Hermier des Plantes (1976), and was later developed further by Lavit (1988) and Lavit et al. (1994). It extracts information from multidimensional data collected in different situations or at different time instants.
The STATIS methodology requires the observations (countries, in our case) to be the same in all data tables. It can be seen as an extension of Principal Component Analysis to multiple data tables that measure sets of variables collected on the same observations. STATIS does not require the data tables to have the same number of columns. A sketch of the steps of the method is provided in Figure 2.
The five data tables used for measuring the well-being of societies cover the years 2011 to 2015 and contain the same countries, described by seventeen to twenty-four quantitative variables. They are formed as follows:
• The data table for 2011 has 34 countries in rows and 17 indicators in columns.
• Each data table from 2012 to 2015 has 34 countries in rows and 24 indicators in columns.
This work was carried out using the data analysis software SPAD version 8.0, the R language and Excel to implement the STATIS methodology.

RESULTS AND DISCUSSIONS
The results are presented following the steps of the STATIS methodology, which allows the analysis of a possible common structure of the data tables that best represents the similarities among the years, as well as the evolution of the OECD countries described by the indicators considered in the study. The data were centered and scaled to unit variance because the variables are heterogeneous, with different units.
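As an illustration of this preprocessing step, the following Python sketch centers and scales each column of a table. It is only a sketch under stated assumptions: the function name `standardize` is hypothetical, the population standard deviation is assumed (the paper does not say which divisor was used), and the input values are random placeholders rather than OECD data.

```python
import numpy as np

def standardize(table):
    """Center each column and scale it to unit variance (population std assumed)."""
    table = np.asarray(table, dtype=float)
    centered = table - table.mean(axis=0)
    scale = table.std(axis=0)
    if np.any(scale == 0):
        raise ValueError("a constant column cannot be scaled to unit variance")
    return centered / scale

# Three placeholder indicators on very different scales, for 34 countries.
rng = np.random.default_rng(1)
raw = rng.normal(loc=50.0, scale=[1.0, 10.0, 100.0], size=(34, 3))
z = standardize(raw)

print(z.mean(axis=0).round(6), z.std(axis=0).round(6))
```

After this step every indicator has the same weight in the cross-product matrices, regardless of its original unit.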

Interstructure
The first phase of the STATIS method computes, for each data table, the cross-product matrix between countries from their well-being indicators; this matrix serves as the representative object of the table, i.e. of the corresponding year under study. A global comparison between data tables is then made using the RV coefficient, from which we conclude which years are most similar and which are most different.
From Tables 2a and 2b, which contain the RV coefficients and the Euclidean distances respectively, we can conclude that the pairs of years 2012 and 2013, and 2014 and 2015, are the closest, with an RV coefficient of 0.98 and distances of 0.19 and 0.18 respectively, while the pairs of years 2011 and 2014, and 2011 and 2015, are the most different, with an RV coefficient of 0.92 and a distance of 0.41. By diagonalizing the matrix of RV coefficients, we obtain a system of axes associated with the eigenvalues, as well as the percentage of inertia explained by each axis. In the plane defined by the first and second axes, Figure 3 shows the short distance between the years 2012 and 2013, and between 2014 and 2015, which indicates proximity or similarity between these years, while the years 2011 and 2014, and 2011 and 2015, are more distant from each other, in agreement with the results of Tables 2a and 2b.
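The interstructure quantities can be sketched in Python as follows. This is a minimal illustration, not the SPAD/R implementation used in the study: the function names are hypothetical, equal masses for the countries are assumed, the tables are random placeholders (left unstandardized for brevity), and the distance is assumed to be the Euclidean (Frobenius) distance between objects scaled to unit norm. Under that assumption the squared distance equals 2(1 - RV), so RV values of 0.98 and 0.92 correspond to distances of about 0.20 and 0.40, in line with the reported 0.18-0.19 and 0.41 once the rounding of RV to two decimals is taken into account.

```python
import numpy as np

def cross_product(table):
    """Country-by-country cross-product matrix W = X X^T of one table."""
    x = np.asarray(table, dtype=float)
    return x @ x.T

def rv_coefficient(w_a, w_b):
    """RV coefficient: cosine between two objects for the Frobenius inner product."""
    inner = np.sum(w_a * w_b)
    return inner / np.sqrt(np.sum(w_a * w_a) * np.sum(w_b * w_b))

def normed_distance(w_a, w_b):
    """Euclidean distance between the two objects after scaling each to unit norm."""
    a = w_a / np.linalg.norm(w_a)
    b = w_b / np.linalg.norm(w_b)
    return np.linalg.norm(a - b)

# Two placeholder tables on the same 34 countries with different widths (17 and 24).
rng = np.random.default_rng(2)
latent = rng.normal(size=(34, 3))
table_a = latent @ rng.normal(size=(3, 17)) + 0.3 * rng.normal(size=(34, 17))
table_b = latent @ rng.normal(size=(3, 24)) + 0.3 * rng.normal(size=(34, 24))

w_a, w_b = cross_product(table_a), cross_product(table_b)
rv = rv_coefficient(w_a, w_b)
dist = normed_distance(w_a, w_b)
print(round(rv, 3), round(dist, 3), round(2 * (1 - rv), 3), round(dist ** 2, 3))
```

Because each object is a 34 x 34 matrix whatever the number of indicators, tables with 17 and 24 columns can be compared directly.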

Intrastructure
In this step, we compute the compromise matrix, defined as a linear combination of the objects weighted by the coordinates of the objects on the first axis of the interstructure. Table 3 contains the scalar products (correlations) between the normed objects and the compromise, and the Euclidean distances between the objects and the compromise, indicating which years are closest to and most distant from the compromise. From the scalar products and the Euclidean distances, we can conclude that the years are highly correlated with the compromise, because in general the distances are low and the scalar products are high, which shows that it is possible to find a common structure. The year 2013 has the highest correlation with the compromise and the year 2011 the lowest. Applying PCA to the compromise object, we retained the first five axes, which explain 70.97% of the total inertia. Figures 4, 5, 6 and 7 are the graphical representations for these five axes; they show the countries' compromise Euclidean image in the planes defined by the first and second axes [1,2], the first and third axes [1,3], the first and fourth axes [1,4], and the first and fifth axes [1,5], respectively.
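The construction of the compromise and its PCA can be sketched as below. This is a hedged illustration rather than the procedure implemented in SPAD or R: the function names are hypothetical, equal country masses are assumed, the weights are rescaled to sum to one (one common convention; other normalisations exist), and the five tables are simulated placeholders sharing a latent structure.

```python
import numpy as np

def standardize(table):
    """Center and scale the columns of a table."""
    return (table - table.mean(axis=0)) / table.std(axis=0)

def compromise(objects):
    """Weighted sum of normed objects; weights come from the first RV eigenvector."""
    normed = [w / np.linalg.norm(w) for w in objects]
    rv = np.array([[np.sum(a * b) for b in normed] for a in normed])
    _, vectors = np.linalg.eigh(rv)            # eigenvalues in ascending order
    first = np.abs(vectors[:, -1])             # leading eigenvector, sign removed
    weights = first / first.sum()              # assumption: weights sum to one
    w_c = sum(alpha * w for alpha, w in zip(weights, normed))
    return w_c, weights

def compromise_pca(w_c, n_axes=5):
    """Eigen-decomposition of the compromise: country coordinates and inertia shares."""
    values, vectors = np.linalg.eigh(w_c)
    values = np.clip(values, 0.0, None)        # remove tiny negative round-off
    order = np.argsort(values)[::-1][:n_axes]
    coords = vectors[:, order] * np.sqrt(values[order])
    explained = values[order] / values.sum()
    return coords, explained

# Five placeholder tables (34 countries; 17 then 24 indicators) with shared structure.
rng = np.random.default_rng(3)
latent = rng.normal(size=(34, 4))
tables = [
    standardize(latent @ rng.normal(size=(4, j)) + 0.5 * rng.normal(size=(34, j)))
    for j in (17, 24, 24, 24, 24)
]
objects = [t @ t.T for t in tables]

w_c, weights = compromise(objects)
coords, explained = compromise_pca(w_c)
print(weights.round(3), explained.round(3), explained.sum().round(3))
```

Tables that agree more with the others receive larger weights, so a year with an atypical structure influences the compromise less.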
In these figures, the countries farthest from the center are those that contribute most to the formation of the axis; they are selected so that the sum of their contributions to the axis is about 80%. Additionally, all the countries selected for an axis have a contribution greater than the average contribution of a country and are well represented on that axis. The coordinates and the absolute and relative contributions of the countries on the first five axes were taken into account for the interpretation of the axes; in addition, we determined the linear correlations between the initial variables and the compromise axes.
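The selection rule described above can be sketched in Python. The exact rule applied in the study is not given, so the code below is an assumption: countries are taken in decreasing order of contribution, kept only while their contribution exceeds the average 1/I, and the selection stops once the cumulative contribution reaches the target of about 80%. Equal masses are assumed, and the coordinates and labels A to F are invented toy values, not results from the paper.

```python
import numpy as np

def axis_contributions(coords):
    """Absolute contributions: share of each axis' inertia due to each country."""
    squared = np.asarray(coords, dtype=float) ** 2
    return squared / squared.sum(axis=0)

def squared_cosines(coords):
    """Relative contributions: quality of representation of each country per axis."""
    squared = np.asarray(coords, dtype=float) ** 2
    return squared / squared.sum(axis=1, keepdims=True)

def select_countries(contrib_axis, labels, target=0.80):
    """Hypothetical rule: above-average contributors, until ~target is reached."""
    average = 1.0 / len(contrib_axis)
    chosen, running = [], 0.0
    for idx in np.argsort(contrib_axis)[::-1]:
        if contrib_axis[idx] <= average or running >= target:
            break
        chosen.append(labels[idx])
        running += contrib_axis[idx]
    return chosen, running

# Toy coordinates of six hypothetical countries on two axes.
labels = ["A", "B", "C", "D", "E", "F"]
coords = np.array([[2.0, 0.5], [1.5, -0.2], [-2.5, 0.3],
                   [-2.2, -1.0], [-1.0, 1.2], [0.1, -0.8]])

contrib = axis_contributions(coords)
cos2 = squared_cosines(coords)
chosen, running = select_countries(contrib[:, 0], labels)
print(chosen, round(running, 3))
```

In this toy case three countries are retained for the first axis, two with negative and one with positive coordinates, which is the kind of opposition used to interpret an axis.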
Figure 4 shows that the countries with the greatest importance on the first axis are Switzerland (che), Canada (can), Turkey (tur), Mexico (mex) and Chile (chl). The first axis distinguishes Turkey, Mexico and Chile (all with negative coordinates) from Switzerland and Canada (with positive coordinates).
The first axis is positively correlated with the variables Rooms per person (RmPS), Household net adjusted disposable income (HDIn), Employment rate (Empl), Personal earnings (PEar), Quality of support network (QSNw), Water quality (WatQ) and Life expectancy (LfEx), and negatively correlated with Dwellings without basic facilities (DwoF), over the whole period. Therefore, the first axis opposes Switzerland and Canada, with high values of RmPS, HDIn, Empl, PEar, QSNw, WatQ and LfEx and low values of DwoF, to Turkey, Mexico and Chile, with low values of RmPS, HDIn, Empl, PEar, QSNw, WatQ and LfEx and high values of DwoF.
The fourth axis (see Figure 6) opposes Mexico (mex), with a negative coordinate, to countries with positive coordinates such as Korea and Japan. The fourth axis is negatively correlated with the variable Homicide rate (Homd) over the whole period. The fourth axis therefore opposes Mexico to Korea and Japan because Mexico has high values of Homd, while Korea and Japan have low values. Finally, the variable most correlated with the fifth axis is Consultation on rule-making (CoRl), and the correlation is positive; thus, in Figure 7, this axis opposes Chile (chl), Israel (isr) and Japan (jpn) (positive coordinates), with high values of CoRl, to New Zealand (nzl) and Australia (aus) (negative coordinates), with low values of CoRl.
The decomposition of the squared distances between pairs of normed objects (Table 4) highlights which countries contributed most to the differences between pairs of years. Greece is responsible for these differences for every pair of years considered, with its highest contribution between 2012 and 2014 (9.13%) and its lowest between 2011 and 2012 (3.74%). Other countries that generally contribute to the structural differences are Turkey and Mexico, in particular between 2011 and 2013 (8.67% and 8.37% respectively).
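One way to obtain such a decomposition is sketched below, assuming equal country masses and the Frobenius distance between normed objects: the squared distance is a sum over all cells of the difference matrix, so grouping the cells by row gives one term per country. The function name is hypothetical and the data are placeholders in which a single country (index 7) is deliberately perturbed between the two tables.

```python
import numpy as np

def distance_decomposition(w_a, w_b):
    """Split the squared distance between two normed objects into per-country shares."""
    a = w_a / np.linalg.norm(w_a)
    b = w_b / np.linalg.norm(w_b)
    per_country = np.sum((a - b) ** 2, axis=1)   # one term per row (country)
    total = per_country.sum()                    # squared distance between objects
    return per_country / total, total

# Placeholder tables: identical except for the country in row 7.
rng = np.random.default_rng(5)
table_a = rng.normal(size=(34, 6))
table_b = table_a.copy()
table_b[7] += 3.0

shares, squared_distance = distance_decomposition(table_a @ table_a.T,
                                                  table_b @ table_b.T)
top_country = int(np.argmax(shares))
print(top_country, shares[top_country].round(3), round(squared_distance, 4))
```

The shares sum to one, so they can be read as percentages in the same way as the contributions reported in Table 4.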
Figure 8 shows the trajectories in the plane [1,2], which explains 50.18% of the total variance. Although the representation of the trajectories is approximate, their irregularities are clearly visible.
The first axis is positively correlated with the variables Rooms per person, Household net adjusted disposable income, Employment rate, Personal earnings, Quality of support network, Water quality and Life expectancy, and negatively correlated with Dwellings without basic facilities. Thus, since the trajectories of Estonia, Germany and Iceland move from left to right, they may indicate progress in the OECD well-being conceptual dimensions, in contrast to Greece, Israel and Mexico, whose trajectories move from right to left. Hungary and Korea have a more elongated bottom-to-top trajectory along the second axis, which may indicate a reduction in the number of people unemployed for one year or more, since the second axis is negatively correlated with the variable Long-term unemployment rate, differentiating them from countries like Greece and Spain.
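Trajectories of this kind can be computed by projecting each year's object onto the axes of the compromise. The following Python sketch assumes equal country masses; the function name is hypothetical, the data are simulated placeholders, and equal weights are used for brevity where STATIS would use the weights from the interstructure. With this projection, the weighted average of a country's yearly positions coincides with its compromise position.

```python
import numpy as np

def trajectories(objects, weights, n_axes=2):
    """Project each year's normed object onto the first axes of the compromise."""
    normed = [w / np.linalg.norm(w) for w in objects]
    w_c = sum(alpha * w for alpha, w in zip(weights, normed))
    values, vectors = np.linalg.eigh(w_c)
    order = np.argsort(values)[::-1][:n_axes]
    lam, u = values[order], vectors[:, order]
    compromise_coords = u * np.sqrt(lam)                 # countries in the compromise
    yearly_coords = [w @ (u / np.sqrt(lam)) for w in normed]
    return compromise_coords, yearly_coords

# Five placeholder yearly tables for 34 countries, drifting away from a common base.
rng = np.random.default_rng(4)
latent = rng.normal(size=(34, 3))
objects = []
for year_index in range(5):
    drift = 0.2 * year_index * rng.normal(size=(34, 3))
    table = (latent + drift) @ rng.normal(size=(3, 24))
    table = table - table.mean(axis=0)
    objects.append(table @ table.T)

weights = np.full(5, 1 / 5)        # assumption: equal weights, for brevity
compromise_coords, yearly_coords = trajectories(objects, weights)

# Trajectory of the first country: one point in the plane [1,2] per year.
path_first_country = np.array([coords[0] for coords in yearly_coords])
print(path_first_country.round(3))
```

Joining the yearly points of a country in chronological order gives its trajectory; movement along an axis is then read through the variables correlated with that axis.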

CONCLUSIONS
This study analyzed, between 2011 and 2015, the similarities and differences between the thirty-four members of the OECD, identifying a common structure and analyzing, through the STATIS methodology, well-being as described by seventeen to twenty-four indicators or quantitative variables. Indicators that characterize the quality of life and material conditions of these countries were averaged with equal weights and collected in five Better Life Index data tables containing data relevant to measuring the well-being of societies, taken from the OECD's online statistical database platform. Through the decomposition of the sum of squared distances between normed objects, we identified the countries that contributed most to the differences over the whole 2011-2015 period: Greece, Turkey, Mexico, Spain, Estonia, Chile, Korea, Israel and Slovak Republic. The countries that contributed least were Finland, Ireland and Sweden.
Finally, the trajectories of Estonia, Germany and Iceland indicated progress in the conceptual dimensions of OECD well-being (Quality of life and Material conditions), in contrast to Greece and Mexico.
As a final point, this study highlights the important contribution of the STATIS methodology to the joint analysis of multiple data tables, in that it allows information collected at different time instants to be analyzed jointly. It also has the major advantage of reducing the size of the initial data set and of providing a set of graphical representations that indicate the relationships between the variables and the similarities or oppositions between individuals, as well as their evolution.

Figure 1: General structure of data - a) multi-block data or multiple tables; b) three-way data sets or third-order tensor.

Figure 2: The main steps of STATIS methodology.


Figure 7: Countries' trajectories in the plane [1, 2]

Using the matrix of RV coefficients, we concluded that the pairs of years 2012 and 2013, and 2014 and 2015, are the closest or most similar, while the pairs of years 2011 and 2014, and 2011 and 2015, are the most different or most distant. In general, the STATIS methodology opposed Switzerland and Canada to Turkey, Mexico and Chile, because Switzerland and Canada present high values of the variables Rooms per person, Household net adjusted disposable income, Employment rate, Personal earnings, Quality of support network, Water quality and Life expectancy, and low values of the variable Dwellings without basic facilities, while Turkey, Mexico and Chile present high values of the variable Dwellings without basic facilities, over the whole period. Slovak Republic (svk), Hungary (hun) and Greece (grc), which present low values in the number of persons who have been unemployed for one year or more, are opposed to Mexico, which has more persons unemployed for one year or more. Spain is opposed to Korea and Japan on the variable Student skills: Spain has low values of this variable, while Korea and Japan have high values, over the whole period. Mexico, with high values of Homicide rate, is opposed to countries like Korea and Japan, with low values of Homicide rate. Chile, Israel and Japan have high values of the variable Consultation on rule-making, while New Zealand and Australia have low values of this variable.