How to create heatmap and correlogram charts in R
Heatmap
A heatmap is a graphical representation of data where the individual values contained in a matrix are represented as colors. This page displays many examples built with R, both static and interactive.
Using the heatmap() function
The heatmap() function is natively provided in R. It produces high-quality matrix plots and offers statistical tools to normalize the input data, run a clustering algorithm and visualize the result with dendrograms. It is one of the very rare cases where I prefer base R to ggplot2.
Building heatmap with R
How to do it: below is the most basic heatmap you can build in base R, using the heatmap() function with no parameters. Note that it takes as input a matrix. If you have a data frame, you can convert it to a matrix with as.matrix(), but you need numeric variables only.
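How to read it: each column is a variable. Each row is an observation. Each square is a value: with the heat.colors() palette, the closer to yellow the higher, whereas the default palette of R 3.6.0 and later runs from light yellow (low) to dark red (high). You can transpose the matrix with t(data) to swap the X and Y axes.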
Note: as you can see this heatmap is not very insightful: all the variation is absorbed by the hp and disp variables that have very high values compared to the others.
# The mtcars dataset:
data <- as.matrix(mtcars)
# Default Heatmap
heatmap(data)
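If the data frame mixes numeric and non-numeric columns, the conversion mentioned above needs a filtering step first. Here is a minimal sketch, assuming the built-in iris dataset as a stand-in (it is not used elsewhere on this page):

```r
# iris has four numeric columns and one factor column (Species)
num_cols <- vapply(iris, is.numeric, logical(1))

# Keep the numeric columns only, then convert to a matrix
iris_mat <- as.matrix(iris[, num_cols])

# The result is a valid input for heatmap()
heatmap(iris_mat)
```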
Normalization
Normalizing the matrix is done using the scale argument of the heatmap() function. It can be applied by row (the default), by column, or switched off with "none". Here the column option is chosen, since we need to absorb the variation between columns.
# Use 'scale' to normalize
heatmap(data, scale="column")
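To see what this normalization amounts to, the sketch below applies base R's scale() to the same matrix. This should match what the column option does before colors are assigned; the names mat and scaled are only used here for illustration.

```r
mat <- as.matrix(mtcars)

# Center each column on 0 and divide it by its standard deviation
scaled <- scale(mat)

# Every column now has mean 0 and standard deviation 1,
# so no single variable dominates the color scale
round(colMeans(scaled), 10)
round(apply(scaled, 2, sd), 10)
```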
You may have noticed that the order of both rows and columns is different compared to the original mtcars matrix. This is because heatmap() reorders both variables and observations using a clustering algorithm: it computes the distance between each pair of rows and columns and tries to order them by similarity.
Moreover, the corresponding dendrograms are displayed beside the heatmap. We can avoid this and just visualize the raw matrix: use the Rowv and Colv arguments as follows.
heatmap(data, Colv = NA, Rowv = NA, scale="column")
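The reordering described above can be approximated by hand with dist() and hclust(), which are the defaults heatmap() relies on. This is only a sketch: heatmap() also reorders the dendrogram branches by row and column means, so the exact order it displays may differ.

```r
mat <- as.matrix(mtcars)

# Hierarchical clustering of the observations (rows)
row_clust <- hclust(dist(mat))

# Hierarchical clustering of the variables (columns): transpose first
col_clust <- hclust(dist(t(mat)))

# Row and column names in the order found by the clustering
rownames(mat)[row_clust$order]
colnames(mat)[col_clust$order]
```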
Color palette
There are several ways to customize the color palette:
- use the native palettes of R: terrain.colors(), rainbow(), heat.colors(), topo.colors() or cm.colors()
- use the palettes proposed by RColorBrewer.
# 1: native palette from R
heatmap(data, scale="column", col = cm.colors(256))
heatmap(data, scale="column", col = terrain.colors(256))
# 2: RColorBrewer palette
library(RColorBrewer)
coul <- colorRampPalette(brewer.pal(8, "PiYG"))(25)
heatmap(data, scale="column", col = coul)
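If RColorBrewer is not installed, colorRampPalette() from base R can interpolate between any set of colors. The three colors below are an arbitrary choice for illustration:

```r
mat <- as.matrix(mtcars)

# Build a 50-step diverging palette from three anchor colors
my_palette <- colorRampPalette(c("navy", "white", "firebrick"))(50)

heatmap(mat, scale = "column", col = my_palette)
```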
Custom Layout
You can customize the main title and the axis titles with the usual main and xlab/ylab arguments.
You can also change the labels with labRow/labCol and their size with cexRow/cexCol.
# Add classic arguments like main title and axis title
heatmap(data, Colv = NA, Rowv = NA, scale="column", col = coul, xlab="variable", ylab="car", main="heatmap")
# Custom x and y labels with cexRow and labRow (col respectively)
heatmap(data, scale="column", cexRow=1.5, labRow=paste("new_", rownames(data),sep=""), col= colorRampPalette(brewer.pal(8, "Blues"))(25))
Add color beside heatmap
Often, a heatmap is meant to compare the observed structure with an expected one.
You can add a vector of colors beside the heatmap to represent the expected structure using the RowSideColors argument.
# Example: grouping from the first letter:
my_group <- as.numeric(as.factor(substr(rownames(data), 1 , 1)))
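# Set1 has only 9 colors but there are 10 distinct first letters, so interpolate the palette
colSide <- colorRampPalette(brewer.pal(9, "Set1"))(max(my_group))[my_group]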
colMain <- colorRampPalette(brewer.pal(8, "Blues"))(25)
heatmap(data, Colv = NA, Rowv = NA, scale="column" , RowSideColors=colSide, col=colMain )
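Grouping by first letter is arbitrary. As a variation, the sketch below colors the rows by number of cylinders (the cyl column of mtcars) and keeps the row clustering, to check whether it recovers those groups. The three color names are an assumption chosen for illustration.

```r
mat <- as.matrix(mtcars)

# One color per cylinder count (4, 6 or 8)
cyl_group <- factor(mtcars$cyl)
cyl_colors <- c("gold", "darkorange", "firebrick")[cyl_group]

# Rows stay clustered; the side bar shows the expected grouping
heatmap(mat, Colv = NA, scale = "column", RowSideColors = cyl_colors)
```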
Correlogram
A correlogram, or correlation matrix, shows the relationship between each pair of numeric variables in a dataset. It gives a quick overview of the whole dataset and is used more for exploration than for explanation.
Using the GGally package
The GGally package offers great options to build correlograms. The ggpairs() function builds a classic correlogram with scatterplots, correlation coefficients and variable distributions. On top of that, it is possible to inject ggplot2 code, for instance to color categories.
Correlation matrix with GGally
Scatterplot matrix with ggpairs()
The ggpairs() function of the GGally package builds a scatterplot matrix.
Scatterplots of each pair of numeric variables are drawn on the left part of the figure. The Pearson correlation is displayed on the right. The distribution of each variable is shown on the diagonal.
# Quick display of two capabilities of GGally: assessing the distribution and the correlation of variables
library(GGally)
# Create data
data <- data.frame( var1 = 1:100 + rnorm(100,sd=20), v2 = 1:100 + rnorm(100,sd=27), v3 = rep(1, 100) + rnorm(100, sd = 1))
data$v4 <- data$var1^2
data$v5 <- -(data$var1^2)
# Check correlations (as scatterplots), distributions and print the correlation coefficients
ggpairs(data, title="correlogram with ggpairs()")
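In the example above v5 is exactly the opposite of v4, so the coefficient printed for that pair should be -1. This can be checked with base R's cor(). The data frame name demo and the seed are only there to keep the sketch self-contained and reproducible.

```r
set.seed(1)
demo <- data.frame(var1 = 1:100 + rnorm(100, sd = 20))
demo$v4 <- demo$var1^2
demo$v5 <- -(demo$var1^2)

# Pearson correlation between a variable and its negation
cor(demo$v4, demo$v5)
```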
Visualize correlation with ggcorr()
The ggcorr() function visualizes the correlation of each pair of variables as a square. Note that the method argument takes two values: the first sets how missing values are handled and the second picks the correlation type (pearson, kendall or spearman).
# Visualize the correlation matrix with ggcorr()
library(GGally)
# Create data
data <- data.frame( var1 = 1:100 + rnorm(100,sd=20), v2 = 1:100 + rnorm(100,sd=27), v3 = rep(1, 100) + rnorm(100, sd = 1))
data$v4 <- data$var1^2
data$v5 <- -(data$var1^2)
# Check correlation between variables
#cor(data)
# Nice visualization of correlations
ggcorr(data, method = c("everything", "pearson"))
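The two values passed to method play the same role as the use and method arguments of base R's cor(). Here is a short sketch of how the correlation type changes the result, using a made-up variable and its cube (the name toy is hypothetical):

```r
set.seed(42)
x <- 1:100 + rnorm(100, sd = 20)
toy <- data.frame(x = x, x_cubed = x^3)

# Pearson measures linear association, so it stays below 1 here
cor(toy, use = "everything", method = "pearson")

# Spearman works on ranks, so a strictly increasing relationship gives 1
cor(toy, use = "everything", method = "spearman")
```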
Split by group
It is possible to use ggplot2 aesthetics on the chart, for instance to color each category.
# Color each species with a ggplot2 aesthetic
library(GGally)
# From the help page:
data(flea)
ggpairs(flea, columns = 2:4, ggplot2::aes(colour=species))
Change plot types
You can change the type of plot used in each part of the correlogram. This is done with the upper and lower arguments.
# Choose the plot type shown in the upper and lower panels
library(GGally)
# From the help page:
data(tips, package = "reshape")
ggpairs(
  tips[, c(1, 3, 4, 2)],
  upper = list(continuous = "density", combo = "box_no_facet"),
  lower = list(continuous = "points", combo = "dot_no_facet")
)
Using the corrgram package
The corrgram package is another great alternative to build correlograms. You can choose what to display in the upper, lower and diagonal parts of the figure: scatterplot, pie chart, text, ellipse and more.
Correlogram with the corrgram library
The corrgram package builds correlograms. The output lets you check the relationship between each pair of variables in a set of numeric variables.
Relationships can be visualized with different methods:
- panel.ellipse to display ellipses
- panel.shade for coloured squares
- panel.pie for pie charts
- panel.pts for scatterplots
# Corrgram library
library(corrgram)
# mtcars dataset is natively available in R
# head(mtcars)
# First
corrgram(mtcars, order=TRUE, lower.panel=panel.shade, upper.panel=panel.pie, text.panel=panel.txt, main="Car Mileage Data in PC2/PC1 Order")
# Second
corrgram(mtcars, order=TRUE, lower.panel=panel.ellipse, upper.panel=panel.pts, text.panel=panel.txt, diag.panel=panel.minmax, main="Car Mileage Data in PC2/PC1 Order")
# Third
corrgram(mtcars, order=NULL, lower.panel=panel.shade, upper.panel=NULL, text.panel=panel.txt, main="Car Mileage Data (unsorted)")
Correlogram with the ellipse package
The ellipse package builds a correlogram with the plotcorr() function.
First of all, you have to compute the correlation matrix of your dataset using the cor() function of R. Each correlation is then represented as an ellipse by the plotcorr() function. Color, shape and orientation depend on the correlation value.
# Libraries
library(ellipse)
library(RColorBrewer)
# Use of the mtcars data proposed by R
data <- cor(mtcars)
# Build a palette of 100 colors with RColorBrewer
my_colors <- brewer.pal(5, "Spectral")
my_colors <- colorRampPalette(my_colors)(100)
# Order the correlation matrix
ord <- order(data[1, ])
data_ord <- data[ord, ord]
plotcorr(data_ord , col=my_colors[data_ord*50+50] , mar=c(1,1,1,1) )
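One caveat with the index data_ord*50+50: R truncates fractional indices and silently drops an index of 0, which is what a correlation of exactly -1 would produce. A possible alternative mapping is sketched below on a few hand-picked correlation values, with an arbitrary three-color palette:

```r
r <- c(-1, -0.5, 0, 0.5, 1)
palette_100 <- colorRampPalette(c("firebrick", "white", "navy"))(100)

# Map the interval [-1, 1] linearly onto the integer indices 1..100
idx <- round((r + 1) / 2 * 99) + 1

# One color per correlation value, none dropped
palette_100[idx]
```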
Scatterplot matrix with the car package
This is a scatterplot matrix built with the scatterplotMatrix() function of the car package.
Note the |cyl syntax: it means that categories available in the cyl variable must be represented distinctly (color, shape, size..).
# Packages
library(car)
library(RColorBrewer) # for the color palette
data <- mtcars
my_colors <- brewer.pal(nlevels(as.factor(data$cyl)), "Set2")
# regLine and smooth are the argument names in car 3.0 and later (older versions used reg.line and smoother)
scatterplotMatrix(~mpg+disp+drat|cyl, data=data ,
  regLine=FALSE , smooth=FALSE , col=my_colors ,
  cex=1.5 ,
  pch=c(15,16,17) ,
  main="Scatter plot with Three Cylinder Options"
)
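If the car package is not available, a similar grouped scatterplot matrix can be drawn with the native pairs() function. This is a rough base R equivalent rather than the same chart, and the color names are arbitrary.

```r
# One color per number of cylinders (4, 6 or 8)
cyl_group <- factor(mtcars$cyl)
point_cols <- c("seagreen", "darkorange", "steelblue")[cyl_group]

pairs(mtcars[, c("mpg", "disp", "drat")],
      col = point_cols, pch = 19, cex = 1.5,
      main = "Scatterplot matrix colored by cyl")
```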