R Code: How to visualise more than 2 dimensions?

In the post "How to visualise more than 2 dimensions?", several plots and techniques were shown for plotting data with many dimensions. This post shows and explains the R code behind them.

The data come from the World Happiness Report 2016 data set on Kaggle. They are well structured and need little processing.

Reading the data

First, create a project and put the data file from Kaggle in its directory. You can then read the data and take a look at it.

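require(data.table)
# Read the Kaggle CSV from the project directory and convert it to a data.table
HappinessData <- data.table(read.csv('2016.csv'))
# Show the first rows to check that the file was read correctly
print(head(HappinessData))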
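Some of the column names used later in this post contain runs of dots (for example Economy..GDP.per.Capita.). This is most likely because read.csv turns the CSV headers into syntactic R names with make.names. The sketch below illustrates the mechanism; the raw header strings are an assumption about what the file contains, not something checked against it.

```r
# Assumed raw CSV headers (hypothetical; the real file may differ slightly).
raw_headers <- c("Happiness Score",
                 "Economy (GDP per Capita)",
                 "Health (Life Expectancy)")

# read.csv() uses check.names = TRUE by default, which applies make.names():
# every character that is not valid in an R name is replaced by a dot.
clean_headers <- make.names(raw_headers)
print(clean_headers)
```

Knowing this helps when a column name copied from the Kaggle page does not match the name R reports.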

The data set has 13 columns: the first two are categorical and the other 11 are numerical. Variables 3 to 6 (“Happiness.Rank”, “Happiness.Score”, “Lower.Confidence.Interval”, “Upper.Confidence.Interval”) are strongly correlated since they all depend on the happiness score.
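The claim about correlated columns can be checked with cor. The sketch below uses a small made-up data frame that only borrows the four column names, so it runs without the Kaggle file; the values are random and purely illustrative.

```r
# Toy stand-in for columns 3 to 6 of the happiness data (values are made up).
set.seed(1)
score <- runif(20, min = 3, max = 7.5)
toy <- data.frame(
  Happiness.Rank            = rank(-score),  # rank 1 = highest score
  Happiness.Score           = score,
  Lower.Confidence.Interval = score - 0.1,
  Upper.Confidence.Interval = score + 0.1
)

# Keep only the numeric columns and compute their pairwise correlations.
numeric_cols <- names(toy)[sapply(toy, is.numeric)]
corr <- cor(toy[, numeric_cols])
print(round(corr, 2))
```

On the real data, the equivalent call would be something like cor(HappinessData[, 3:6, with = FALSE]). The rank is negatively correlated with the score because rank 1 is the happiest country.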

Quantitative variables plot

1. Color and size to encode additional quantitative variables

require(ggplot2)
# x and y carry two variables; color and point size carry two more
ggplot(HappinessData,aes(x=Economy..GDP.per.Capita.,y=Happiness.Score,color=Freedom,size=Health..Life.Expectancy.))+
  geom_point(alpha=0.4)+
  xlab('GDP per capita')+ylab('Happiness score')

This is where the ggplot2 magic happens: a single short statement is enough to draw a fairly complex plot showing four variables at once.
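As a self-contained illustration of the same idea, the sketch below maps four variables to x, y, color and size on a made-up data frame. The column names and values are hypothetical, and labs is used here only as one possible way to give the legends readable titles.

```r
library(ggplot2)

# Made-up data: four numeric variables for 30 fictitious countries.
set.seed(42)
toy <- data.frame(
  gdp     = runif(30, 0, 1.8),
  freedom = runif(30, 0, 0.6),
  health  = runif(30, 0, 1)
)
toy$score <- 3 + 2 * toy$gdp + rnorm(30, sd = 0.4)

# Position encodes two variables, color and point size two more.
p <- ggplot(toy, aes(x = gdp, y = score, color = freedom, size = health)) +
  geom_point(alpha = 0.4) +
  labs(x = "GDP per capita", y = "Happiness score",
       color = "Freedom", size = "Life expectancy")
print(p)
```

Keeping the plot in a variable such as p makes it easy to add further layers or to save it later.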

2. Scatter plot matrix

require(GGally)  # ggpairs() comes from the GGally package
ggpairs(HappinessData[,c(4,7:10),with=FALSE])

Again, this is gg magic: the plot needs only one call.
You can go further, for instance by using a different color for each region:

ggpairs(HappinessData[,c(2,4,7:10),with=FALSE],mapping=aes(color=Region,alpha=0.3))
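Selecting columns by position is compact but breaks silently if the column order changes. As a hedged alternative, ggpairs also accepts column names through its columns argument. The sketch below uses the built-in iris data set as a stand-in so that it runs without the Kaggle file.

```r
library(GGally)

# Columns are selected by name instead of by position.
cols <- c("Sepal.Length", "Sepal.Width", "Petal.Length")

# Species plays the role that Region plays in the happiness data.
p <- ggpairs(iris, columns = cols,
             mapping = ggplot2::aes(color = Species, alpha = 0.3))
print(p)
```

On the happiness data, the same call would take names such as Happiness.Score and Economy..GDP.per.Capita. in place of the iris columns.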

3. Parallel axis plot

The package GGally provides a function returning a parallel axis ggplot.

require('GGally')
# Columns 4 and 7 to 10 become the axes; column 2 (Region) sets the line color
ggparcoord(HappinessData,c(4,7:10),alphaLines=0.5,groupColumn=2)+
  ggtitle('Parallel axis diagram of happiness')
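Variables measured on different scales can share a parallel axis plot because ggparcoord rescales them first; its default scale = "std" setting subtracts the mean and divides by the standard deviation of each variable. The base R sketch below shows that transformation on a tiny made-up data frame (hypothetical names and values).

```r
# Made-up values on two different scales.
toy <- data.frame(score = c(7.5, 6.0, 4.5, 3.0),
                  gdp   = c(1.4, 1.1, 0.6, 0.2))

# scale() centers each column on 0 and rescales it to unit standard deviation.
standardised <- as.data.frame(scale(toy))
print(round(standardised, 2))
```

After this step each axis shows how far a country is from the average, in standard deviations, which is what makes the lines comparable across axes.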

Since the function returns a ggplot object, you can also easily add a facet wrap:

# One panel per region; the legend is hidden because each line is colored by country
ggparcoord(HappinessData,c(4,7:10),alphaLines=0.5,groupColumn=1)+
  ggtitle('Parallel axis diagram of happiness')+
  facet_wrap(~as.character(Region),ncol = 2)+
  theme(legend.position="none")

4. Correlation plot

To plot a correlation matrix, the easiest way is to combine the cor function with the corrplot package:

Correlation plot with hierarchical clustering of variables

require(corrplot)
corrplot(cor(HappinessData[,c(4,7:13),with=F]),order = 'hclust',addrect = 3)

The order option sets the criterion used to order the variables; here, hierarchical clustering is used. The addrect = 3 option draws rectangles around the three groups of variables that are closest to each other according to this clustering.
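To give an idea of what such an ordering involves, the base R sketch below clusters the variables of a correlation matrix and cuts the tree into three groups. It uses the built-in mtcars data as a stand-in, and it assumes that 1 minus the correlation is used as the distance; corrplot's internals may differ in detail.

```r
# Correlation matrix of a few numeric variables from a built-in data set.
corr <- cor(mtcars[, c("mpg", "disp", "hp", "wt", "qsec", "drat")])

# Assumed distance: strongly positively correlated variables are close together.
hc <- hclust(as.dist(1 - corr))

# Order of the variables along the dendrogram, and membership in 3 groups.
ordered_vars <- colnames(corr)[hc$order]
groups <- cutree(hc, k = 3)
print(ordered_vars)
print(groups)
```

Variables that end up in the same group are the ones a rectangle would enclose in the plot.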

Sequential data and hierarchical data

To plot these two types of data, the D3partitionR package was used, together with its random option, which generates random data for the plots.

require(D3partitionR)
##Sequential plot
D3partitionR(T,type = 'sunburst',trail = T)
D3partitionR(T,type = 'collapsibleTree',trail = T)

##Hierarchical plot
D3partitionR(T,type = 'treeMap',trail = T)
D3partitionR(T,type = 'circleTreeMap',trail = T)

If you have other packages or ways to do the plot, please share and comment.
Thanks for reading.