How to create heatmap and correlogram charts in R

  

Heatmap

A heatmap is a graphical representation of data where the individual values contained in a matrix are represented as colors. This page walks through several examples built with R.

Using the heatmap() function


The heatmap() function is natively provided in R. It produces high-quality heatmaps and offers statistical tools to normalize the input data, run a clustering algorithm and visualize the result with dendrograms. It is one of the very rare cases where I prefer base R to ggplot2.

Building heatmap with R

How to do it: below is the most basic heatmap you can build in base R, using the heatmap() function with no extra arguments. Note that it takes a matrix as input. If you have a data frame, you can convert it to a matrix with as.matrix(), but you need numeric variables only.

How to read it: each column is a variable. Each observation is a row. Each square is a value: with the heat.colors() palette that older versions of R use by default, the closer to yellow the higher the value (recent versions default to a yellow-to-red palette in which darker red means higher). You can transpose the matrix with t(data) to swap the X and Y axes.

Note: as you can see this heatmap is not very insightful: all the variation is absorbed by the hp and disp variables that have very high values compared to the others.

# The mtcars dataset:
data <- as.matrix(mtcars)

# Default Heatmap
heatmap(data)
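If your data frame mixes numeric and non-numeric columns, as.matrix() returns a character matrix and heatmap() stops with an error because it needs a numeric matrix. A minimal sketch, using the built-in iris dataset as a stand-in for your own data (the names numeric_cols and iris_mat are illustrative), keeps only the numeric columns first:

```r
# iris has four numeric columns and one factor column (Species)
numeric_cols <- sapply(iris, is.numeric)

# Keep only the numeric columns, then convert to a matrix
iris_mat <- as.matrix(iris[, numeric_cols])

# Scale by column because the variables have different ranges
heatmap(iris_mat, scale = "column")
```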

 

Normalization

Normalizing the matrix is done using the scale argument of the heatmap() function. It accepts "row" (the default), "column" or "none". Here the column option is chosen, since we need to absorb the differences in magnitude between columns.

# Use 'scale' to normalize

heatmap(data, scale="column")
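To inspect what column scaling does to the values, the same kind of transformation can be reproduced with base R's scale() function, which centers each column on its mean and divides it by its standard deviation. This is a sketch for inspection only; mat and mat_scaled are illustrative names.

```r
mat <- as.matrix(mtcars)

# Before scaling: standard deviations differ by orders of magnitude
round(apply(mat, 2, sd), 1)

# Center and scale every column
mat_scaled <- scale(mat)

# After scaling: every column has mean ~0 and standard deviation 1
round(colMeans(mat_scaled), 10)
apply(mat_scaled, 2, sd)
```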

 Dendrogram and Reordering

You may have noticed that the order of both rows and columns differs from the original mtcars matrix. This is because heatmap() reorders both variables and observations using a clustering algorithm: it computes the distance between each pair of rows and each pair of columns and tries to order them by similarity.

Moreover, the corresponding dendrograms are drawn beside the heatmap. To skip both the reordering and the dendrograms and just visualize the raw matrix, set the Rowv and Colv arguments to NA as follows.

heatmap(data, Colv = NA, Rowv = NA, scale="column")
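The row reordering can be reproduced by hand, which helps to see where the order comes from. The sketch below assumes the defaults of heatmap(): Euclidean distance via dist(), complete-linkage clustering via hclust(), and branches reordered by row means. The names mat, row_dend and manual_order are illustrative.

```r
mat <- as.matrix(mtcars)

# Cluster the rows: Euclidean distance + hierarchical clustering
row_dend <- as.dendrogram(hclust(dist(mat)))

# Reorder the branches by row means, as far as the tree allows
row_dend <- reorder(row_dend, rowMeans(mat))
manual_order <- order.dendrogram(row_dend)

# heatmap() invisibly returns the row and column order it used
hm <- heatmap(mat, scale = "column")

# First rows of the heatmap, in the order returned by heatmap()
head(rownames(mat)[hm$rowInd])

# Compare with the manual ordering
all(hm$rowInd == manual_order)
```

Note that the clustering runs on the unscaled matrix; the scale argument only affects the colors.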

Color palette

There are several ways to customize the color palette:

  • use the native palettes of R: terrain.colors(), rainbow(), heat.colors(), topo.colors() or cm.colors()
  • use the palettes provided by RColorBrewer.


# 1: native palette from R
heatmap(data, scale="column", col = cm.colors(256))
heatmap(data, scale="column", col = terrain.colors(256))

# 2: RColorBrewer palette
library(RColorBrewer)
coul <- colorRampPalette(brewer.pal(8, "PiYG"))(25)
heatmap(data, scale="column", col = coul)
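The colorRampPalette() call used above is a base R function: it returns a new function that interpolates between the colors you give it. A small sketch without RColorBrewer, using three arbitrarily chosen color names, shows the two steps separately:

```r
# colorRampPalette() returns a function...
blue_white_red <- colorRampPalette(c("navy", "white", "firebrick"))

# ...and calling that function with n returns n interpolated colors
my_palette <- blue_white_red(25)
head(my_palette)

heatmap(as.matrix(mtcars), scale = "column", col = my_palette)
```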

 

Custom Layout

 

You can customize the main title and axis titles with the usual main and xlab/ylab arguments.

You can also change the labels with labRow/labCol and their size with cexRow/cexCol.

# Add classic arguments like main title and axis title
heatmap(data, Colv = NA, Rowv = NA, scale="column", col = coul,
        xlab="variable", ylab="car", main="heatmap")

# Custom x and y labels with cexRow and labRow (cexCol and labCol for columns)
heatmap(data, scale="column", cexRow=1.5,
        labRow=paste("new_", rownames(data), sep=""),
        col=colorRampPalette(brewer.pal(8, "Blues"))(25))
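The column counterparts work the same way. As a small sketch, the upper-case labels below are made up for illustration and are passed through labCol, with cexCol and cexRow controlling the text size:

```r
mat <- as.matrix(mtcars)

# Hypothetical column labels: upper-case versions of the variable names
col_labels <- toupper(colnames(mat))

heatmap(mat, scale = "column",
        labCol = col_labels,  # custom column labels
        cexCol = 1.2,         # column label size
        cexRow = 0.6)         # smaller row labels so the car names overlap less
```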

 

Add color beside heatmap

Often, a heatmap is meant to compare the observed structure with an expected one.

You can add a vector of colors beside the heatmap to represent the expected structure using the RowSideColors argument.

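# Example: grouping from the first letter of the car name
my_group <- as.numeric(as.factor(substr(rownames(data), 1, 1)))

# The car names start with 10 distinct letters, so the palette needs at least
# 10 colors ("Set1" stops at 9, which would leave one group without a color)
colSide <- brewer.pal(10, "Paired")[my_group]
colMain <- colorRampPalette(brewer.pal(8, "Blues"))(25)
heatmap(data, Colv = NA, Rowv = NA, scale="column", RowSideColors=colSide, col=colMain)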
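A grouping that carries more meaning than the first letter is the number of cylinders. The sketch below uses base R only; the three color names are arbitrary choices, and the row dendrogram is kept so that the clusters can be compared with the side colors.

```r
mat <- as.matrix(mtcars)

# Expected structure: number of cylinders (4, 6 or 8)
cyl_group <- factor(mtcars$cyl)

# One color per level; indexing by a factor uses its integer codes
group_colors <- c("tomato", "gold", "steelblue")
side_colors <- group_colors[cyl_group]

# Keep the row clustering, drop the column clustering
heatmap(mat, Colv = NA, scale = "column", RowSideColors = side_colors)
```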

 

 Correlogram

A correlogram, or correlation matrix, lets you analyse the relationship between each pair of numeric variables in a dataset. It gives a quick overview of the whole dataset and is used more for exploration than for explanation.
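Every chart in this section is a visual encoding of the same underlying table of pairwise correlations. As a quick sketch, base R's cor() computes it directly; the four mtcars variables are picked arbitrarily to keep the output small.

```r
# Pairwise Pearson correlations for a few mtcars variables
vars <- mtcars[, c("mpg", "disp", "hp", "wt")]
cor_mat <- cor(vars)

# The matrix is symmetric, with 1 on the diagonal
round(cor_mat, 2)
```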

Using the GGally package

The GGally package offers great options to build correlograms. The ggpairs() function builds a classic correlogram with scatterplots, correlation coefficients and variable distributions. On top of that, it is possible to inject ggplot2 code, for instance to color categories.

Correlation matrix with GGally

Scatterplot matrix with ggpairs()

The ggpairs() function of the GGally package builds a great scatterplot matrix.

Scatterplots of each pair of numeric variables are drawn in the lower-left part of the figure. Pearson correlations are displayed in the upper-right part. Variable distributions are shown on the diagonal.

# Quick display of two capabilities of GGally: assessing the distribution and the correlation of variables
library(GGally)

# Create data
data <- data.frame(var1 = 1:100 + rnorm(100, sd=20), v2 = 1:100 + rnorm(100, sd=27), v3 = rep(1, 100) + rnorm(100, sd = 1))
data$v4 <- data$var1 ^ 2
data$v5 <- -(data$var1 ^ 2)

# Check correlations (as scatterplots), distributions and print correlation coefficients
ggpairs(data, title="correlogram with ggpairs()")

Visualize correlation with ggcorr()

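The ggcorr() function visualizes the correlation of each pair of variables as a colored square. Note that the method argument takes two values: the first says how missing values are handled and the second picks the correlation type you want ("pearson", "kendall" or "spearman").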

# Visualize the correlation matrix with ggcorr()
library(GGally)

# Create data
data <- data.frame(var1 = 1:100 + rnorm(100, sd=20), v2 = 1:100 + rnorm(100, sd=27), v3 = rep(1, 100) + rnorm(100, sd = 1))
data$v4 <- data$var1 ^ 2
data$v5 <- -(data$var1 ^ 2)

# Check correlation between variables
#cor(data)

# Nice visualization of correlations
ggcorr(data, method = c("everything", "pearson"))
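The choice of correlation type matters when a relationship is monotonic but not linear. The sketch below uses base R's cor() on made-up data rather than ggcorr() itself; it assumes that ggcorr() hands the second element of method to the same kind of computation.

```r
# Made-up data: y grows with x, but exponentially rather than linearly
x <- 1:50
y <- exp(x / 10)

# Pearson measures linear association, so it stays clearly below 1
pearson <- cor(x, y, method = "pearson")

# Spearman works on ranks, so any strictly increasing relationship gives 1
spearman <- cor(x, y, method = "spearman")

c(pearson = pearson, spearman = spearman)
```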

Split by group

It is possible to use ggplot2 aesthetics on the chart, for instance to color each category.

# Color each species with a ggplot2 aesthetic
library(GGally)

# From the help page:
data(flea)
ggpairs(flea, columns = 2:4, ggplot2::aes(colour=species))

Change plot types

You can change the type of plot used in each part of the correlogram. This is done with the upper and lower arguments.

# Choose the plot type for the upper and lower parts of the figure
library(GGally)

# From the help page (the tips dataset is loaded from the reshape package, which must be installed):
data(tips, package = "reshape")
ggpairs(
  tips[, c(1, 3, 4, 2)],
  upper = list(continuous = "density", combo = "box_no_facet"),
  lower = list(continuous = "points", combo = "dot_no_facet")
)

Using the corrgram package

The corrgram package is another great alternative for building correlograms. You can choose what to display in the upper, lower and diagonal parts of the figure: scatterplots, pie charts, text, ellipses and more.

Correlogram with the corrgram library

The corrgram package builds correlograms. The output lets you check the relationship between each pair of numeric variables in a set.

Relationships can be visualized with different methods:

  • panel.ellipse to display ellipses
  • panel.shade for coloured squares
  • panel.pie for pie charts
  • panel.pts for scatterplots

# Corrgram library
library(corrgram)

# mtcars dataset is natively available in R
# head(mtcars)

# First
corrgram(mtcars, order=TRUE, lower.panel=panel.shade, upper.panel=panel.pie, text.panel=panel.txt, main="Car Mileage Data in PC2/PC1 Order")

# Second
corrgram(mtcars, order=TRUE, lower.panel=panel.ellipse, upper.panel=panel.pts, text.panel=panel.txt, diag.panel=panel.minmax, main="Car Mileage Data in PC2/PC1 Order")

# Third
corrgram(mtcars, order=NULL, lower.panel=panel.shade, upper.panel=NULL, text.panel=panel.txt, main="Car Mileage Data (unsorted)")
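The "PC2/PC1 Order" in the titles refers to sorting the variables by the angle their loadings form on the first two principal components, so that similar variables end up next to each other. The sketch below illustrates that idea in base R; it is an approximation of the principle, and corrgram's own implementation may differ in details. The names cor_mat, loadings, angle and pc_order are illustrative.

```r
cor_mat <- cor(mtcars)

# Loadings of each variable on the first two principal components
loadings <- eigen(cor_mat)$vectors[, 1:2]
e1 <- loadings[, 1]
e2 <- loadings[, 2]

# Angle of each variable in the PC1/PC2 plane
angle <- ifelse(e1 > 0, atan(e2 / e1), atan(e2 / e1) + pi)

# Variables sorted by angle
pc_order <- order(angle)
colnames(cor_mat)[pc_order]
```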

 

Correlogram with the ellipse package


The ellipse package builds a correlogram with the plotcorr() function.

First of all, you have to compute the correlation matrix of your dataset using the cor() function of R. Each correlation is then represented as an ellipse by the plotcorr() function. Color, shape and orientation depend on the correlation value.

# Libraries
library(ellipse)
library(RColorBrewer)

# Use the mtcars data provided by R
data <- cor(mtcars)

# Build a palette of 100 colors with RColorBrewer
my_colors <- brewer.pal(5, "Spectral")
my_colors <- colorRampPalette(my_colors)(100)

# Order the correlation matrix
ord <- order(data[1, ])
data_ord <- data[ord, ord]
plotcorr(data_ord, col=my_colors[data_ord*50+50], mar=c(1,1,1,1))
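The expression data_ord*50+50 turns each correlation into a position in the 100-color palette: values near -1 land at the start and values near 1 at the end. It produces non-integer positions, which R truncates, and a correlation of exactly -1 would give index 0, which R silently drops. A slightly more defensive mapping, sketched here in base R with an arbitrary three-color palette and illustrative names, always stays between 1 and 100:

```r
palette_100 <- colorRampPalette(c("firebrick", "white", "navy"))(100)
cor_mat <- cor(mtcars)

# Map the interval [-1, 1] linearly onto the integer indices 1..100
color_index <- round((cor_mat + 1) / 2 * 99) + 1
range(color_index)

# One color per cell of the correlation matrix
ellipse_colors <- palette_100[color_index]
```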

 Correlogram with the car package

Scatterplot matrix with scatterplotMatrix()

This is a scatterplot matrix built with the scatterplotMatrix() function of the car package.

Note the |cyl syntax: it means that the categories available in the cyl variable must be represented distinctly (color, shape, size...).

# Packages
library(car)
library(RColorBrewer) # for the color palette

# Let's use the mtcars dataset natively available in R
data <- mtcars

# Make the plot (regLine and smooth are the argument names in car >= 3.0;
# older versions used reg.line and smoother)
my_colors <- brewer.pal(nlevels(as.factor(data$cyl)), "Set2")
scatterplotMatrix(~mpg+disp+drat|cyl, data=data,
      regLine=FALSE, smooth=FALSE, col=my_colors,
      cex=1.5,
      pch=c(15,16,17),
      main="Scatter plot with Three Cylinder Options"
      )

 
