How to create a density plot using ggplot2 in R programming

 

Density plot

A density plot shows the distribution of a numeric variable. It uses a kernel density estimate to show the probability density function of the variable. In ggplot2, the geom_density() function takes care of the kernel density estimation and plots the result. Only one numeric variable is needed as input.
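To see what geom_density() computes, the sketch below builds a kernel density estimate by hand with base R's stats::density() and draws it as a dashed line over geom_density(). It assumes the built-in faithful dataset (not used in the original post) and relies on both functions defaulting to a Gaussian kernel with the bw.nrd0() bandwidth, so the two curves should essentially coincide. geom_density() also accepts an adjust argument that multiplies this bandwidth; adjust = 1.5 is used later in this post for smoother curves.

```r
library(ggplot2)

# Built-in data used only for this sketch (the post itself uses Airbnb prices)
x <- faithful$eruptions

# Kernel density estimate by hand: Gaussian kernel, bandwidth from bw.nrd0()
kde <- density(x, kernel = "gaussian", bw = "nrd0")
kde_df <- data.frame(x = kde$x, y = kde$y)

# geom_density() (filled) with the hand-made estimate drawn on top (dashed)
p_kde <- ggplot(data.frame(x = x), aes(x = x)) +
  geom_density(fill = "#69b3a2", alpha = 0.5) +
  geom_line(data = kde_df, aes(x = x, y = y), linetype = "dashed")
p_kde
```

Because it is a probability density, the area under the estimated curve is (numerically) close to 1.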

Arguments of theme_ipsum()

These are the arguments of theme_ipsum() from the hrbrthemes package, which is used to style the charts later in this post.

base_family, base_size

base font family and size

plot_title_family, plot_title_face, plot_title_size, plot_title_margin

plot title family, face, size and margin

subtitle_family, subtitle_face, subtitle_size

plot subtitle family, face and size

subtitle_margin

plot subtitle margin bottom (single numeric value)

strip_text_family, strip_text_face, strip_text_size

facet label font family, face and size

caption_family, caption_face, caption_size, caption_margin

plot caption family, face, size and margin

axis_title_family, axis_title_face, axis_title_size

axis title font family, face and size

axis_title_just

axis title font justification, one of [blmcrt]

plot_margin

plot margin (specify with ggplot2::margin)

grid

panel grid (TRUE, FALSE, or a combination of X, x, Y, y)

axis

add x or y axes? TRUE, FALSE, "xy"

ticks

if TRUE, add ticks

To create a density plot in R using ggplot2, we use the geom_density() function of the ggplot2 package.

Syntax: ggplot(data, aes(x)) + geom_density(fill, color, alpha)

Parameters:

· fill: the fill color of the area under the density curve

· color: the color of the density line

· alpha: the transparency of the filled area, from 0 (fully transparent) to 1 (opaque)
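As a self-contained illustration of these three parameters, the sketch below uses the built-in faithful dataset (an assumption: the original example downloads Airbnb prices instead, which requires an internet connection) and stores the plot in p so it can be inspected or printed.

```r
library(ggplot2)

# Density of Old Faithful eruption durations (built-in dataset)
p <- ggplot(faithful, aes(x = eruptions)) +
  geom_density(
    fill  = "#69b3a2",  # area under the curve
    color = "#e9ecef",  # line drawn along the curve
    alpha = 0.8         # transparency of the filled area
  )
p
```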

 

# Libraries
library(ggplot2)
library(dplyr)

# Load dataset from github
data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/1_OneNum.csv", header=TRUE)

# Make the density plot (prices under 300 only)
data %>%
  filter( price<300 ) %>%
  ggplot( aes(x=price)) +
    geom_density(fill="#69b3a2", color="#e9ecef", alpha=0.8)


Customization with theme_ipsum

 

The hrbrthemes package offers a set of pre-built themes for your charts. I am personally a big fan of theme_ipsum(): it is easy to use and makes your chart look more professional. The function is set up in such a way that you can build your own variant just by wrapping the call and changing its parameters.

# Libraries
library(ggplot2)
library(dplyr)
library(hrbrthemes)
# Load dataset from github
data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/1_OneNum.csv", header=TRUE)
# Make the density plot, styled with theme_ipsum()
data %>%
  filter( price<300 ) %>%
  ggplot( aes(x=price)) +
    geom_density(fill="#69b3a2", color="#e9ecef", alpha=0.8) +
    ggtitle("Night price distribution of Airbnb apartments") +
    theme_ipsum()
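Following the suggestion to wrap the call, here is a minimal sketch of a personal theme built on top of theme_ipsum(). The name theme_my_ipsum() and the chosen values are hypothetical; the arguments themselves (base_family, base_size, plot_title_size, grid, plot_margin, axis_title_just) are the theme_ipsum() arguments listed at the top of this post. Using base_family = "sans" is an assumption made to avoid depending on the Arial Narrow font that theme_ipsum() uses by default.

```r
library(ggplot2)
library(hrbrthemes)

# Hypothetical house theme: theme_ipsum() with a few defaults changed.
# Each default can still be overridden, and any other theme_ipsum()
# argument can be passed through `...`.
theme_my_ipsum <- function(base_family = "sans", base_size = 12,
                           plot_title_size = 16, grid = "Y",
                           plot_margin = margin(10, 10, 10, 10), ...) {
  theme_ipsum(
    base_family     = base_family,  # generic font family instead of Arial Narrow
    base_size       = base_size,
    plot_title_size = plot_title_size,
    grid            = grid,         # "Y": keep only the major horizontal grid lines
    plot_margin     = plot_margin,
    ...
  )
}

# Usage on a built-in dataset
p <- ggplot(faithful, aes(x = eruptions)) +
  geom_density(fill = "#69b3a2", color = "#e9ecef", alpha = 0.8) +
  ggtitle("Old Faithful eruption durations") +
  theme_my_ipsum(axis_title_just = "cm")
p
```

Making the changed defaults formal arguments (rather than hard-coding them inside the call) lets a caller override them without a "matched by multiple actual arguments" error.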

 

Mirror density chart with ggplot2

A density chart is built with the geom_density geom of ggplot2 (see the basic example above). It is possible to plot this density upside down by specifying y = -after_stat(density) (written y = -..density.. in older versions of ggplot2). It is advised to use labels (geom_label) to indicate variable names.

# Libraries
library(ggplot2)
library(hrbrthemes)
# Dummy data (set.seed makes the random draws reproducible)
set.seed(1)
data <- data.frame(
  var1 = rnorm(1000),
  var2 = rnorm(1000, mean=2)
)
# Chart
# annotate("label", ...) draws a single label; geom_label(aes(...)) would draw one per row
p <- ggplot(data) +
  # Top
  geom_density( aes(x = var1, y = after_stat(density)), fill="#69b3a2" ) +
  annotate("label", x=4.5, y=0.25, label="variable1", color="#69b3a2") +
  # Bottom: negate the density to draw it upside down
  geom_density( aes(x = var2, y = -after_stat(density)), fill= "#404080") +
  annotate("label", x=4.5, y=-0.25, label="variable2", color="#404080") +
  theme_ipsum() +
  xlab("value of x")
p

 

Mirror histogram with geom_histogram

Of course it is possible to apply exactly the same technique using geom_histogram instead of geom_density to get a mirror histogram:

# Chart (reuses the dummy data and libraries from the previous example)
p <- ggplot(data) +
  geom_histogram( aes(x = var1, y = after_stat(density)), fill="#69b3a2" ) +
  annotate("label", x=4.5, y=0.25, label="variable1", color="#69b3a2") +
  geom_histogram( aes(x = var2, y = -after_stat(density)), fill= "#404080") +
  annotate("label", x=4.5, y=-0.25, label="variable2", color="#404080") +
  theme_ipsum() +
  xlab("value of x")
p
 

Density chart with several groups

A multi-density chart is a density chart in which several groups are represented, which makes it possible to compare their distributions. The issue with this kind of chart is that it easily gets cluttered: groups overlap each other and the figure becomes unreadable.

An easy workaround is to use transparency. However, it won't solve the issue completely, and it is often better to consider the alternatives shown later in this post.

# Libraries
library(ggplot2)
library(hrbrthemes)
library(dplyr)
library(tidyr)
library(viridis)
# The diamonds dataset ships with ggplot2.
# Without transparency
# (adjust = 1.5 multiplies the default bandwidth, giving smoother curves)
p1 <- ggplot(data=diamonds, aes(x=price, group=cut, fill=cut)) +
    geom_density(adjust=1.5) +
    theme_ipsum()
p1

# With transparency
p2 <- ggplot(data=diamonds, aes(x=price, group=cut, fill=cut)) +
    geom_density(adjust=1.5, alpha=.4) +
    theme_ipsum()
p2

Here is an example with another dataset where this approach works much better: the groups have very distinct distributions, so they are easy to tell apart even on the same chart.

# Load dataset from github
data <- read.table("https://raw.githubusercontent.com/zonination/perceptions/master/probly.csv", header=TRUE, sep=",")
data <- data %>%
  gather(key="text", value="value") %>%
  mutate(text = gsub("\\.", " ",text)) %>%
  mutate(value = round(as.numeric(value),0))
# A dataframe for annotations
annot <- data.frame(
  text = c("Almost No Chance", "About Even", "Probable", "Almost Certainly"),
  x = c(5, 53, 65, 79),
  y = c(0.15, 0.4, 0.06, 0.1)
)
# Plot
data %>%
  filter(text %in% c("Almost No Chance", "About Even", "Probable", "Almost Certainly")) %>%
  ggplot( aes(x=value, color=text, fill=text)) +
    geom_density(alpha=0.6) +
    scale_fill_viridis(discrete=TRUE) +
    scale_color_viridis(discrete=TRUE) +
    geom_text( data=annot, aes(x=x, y=y, label=text, color=text), hjust=0, size=4.5) +
    theme_ipsum() +
    theme(
      legend.position="none"
    ) +
    ylab("") +
    xlab("Assigned Probability (%)")
 

Small multiples with facet_wrap()

facet_wrap() makes a long ribbon of panels (generated by any number of variables) and wraps it into 2d. This is useful if you have a single variable with many levels and want to arrange the plots in a more space-efficient manner.

You can control how the ribbon is wrapped into a grid with ncol, nrow, as.table and dir. ncol and nrow control how many columns and rows there are (you only need to set one). as.table controls whether the facets are laid out like a table (TRUE), with the highest values at the bottom-right, or like a plot (FALSE), with the highest values at the top-right. dir controls the direction of wrap: horizontal or vertical.
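As a sketch of these options, the code below wraps the five diamond cuts into two columns and fills the grid column by column with dir = "v". The values ncol = 2 and dir = "v" are arbitrary choices for illustration.

```r
library(ggplot2)

# One density per cut, wrapped into a 2-column grid, filled vertically
p_wrap <- ggplot(diamonds, aes(x = price, fill = cut)) +
  geom_density(adjust = 1.5) +
  facet_wrap(~cut, ncol = 2, dir = "v") +
  theme(legend.position = "none")
p_wrap
```

With five panels and ncol = 2, ggplot2 needs three rows, so the last cell of the grid stays empty.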

facet_grid() lays out plots in a 2d grid, as defined by a formula:

· . ~ a spreads the values of a across the columns. This direction facilitates comparisons of y position, because the vertical scales are aligned.
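For density plots, putting the groups in rows instead (a formula of the form b ~ .) is often handy: all panels then share one horizontal price axis, so the distributions line up vertically. A minimal sketch with the same diamonds data:

```r
library(ggplot2)

# cut ~ . : one row per cut, all panels sharing the same x (price) axis
p_grid <- ggplot(diamonds, aes(x = price, fill = cut)) +
  geom_density(adjust = 1.5) +
  facet_grid(cut ~ .) +
  theme(legend.position = "none")
p_grid
```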

 

# Small multiples: one panel per cut
ggplot(data=diamonds, aes(x=price, group=cut, fill=cut)) +
    geom_density(adjust=1.5) +
    theme_ipsum() +
    facet_wrap(~cut) +
    theme(
      legend.position="none",
      panel.spacing = unit(0.1, "lines"),
      axis.ticks.x=element_blank()
    )

 

Stacked density chart

Another solution is to stack the groups. This shows which group is the most frequent for a given value, but it makes it hard to read the distribution of any group that is not at the bottom of the chart.


# Stacked density plot: position="fill" rescales the groups so they sum to 1 at each price
p <- ggplot(data=diamonds, aes(x=price, group=cut, fill=cut)) +
    geom_density(adjust=1.5, position="fill") +
    theme_ipsum()
p
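Because position = "fill" rescales the groups so they add up to 1 at every price, the chart above shows proportions rather than densities. If you would rather stack the groups on a count scale, a common alternative (shown here as a sketch, not taken from the original post) is to map y to after_stat(count), which stat_density computes as the density multiplied by the number of observations in the group, and use position = "stack":

```r
library(ggplot2)

# Stack the groups on a count scale: each area is proportional to the group size
p_stack <- ggplot(diamonds, aes(x = price, y = after_stat(count), fill = cut)) +
  geom_density(adjust = 1.5, position = "stack")
p_stack
```

Here a large group such as "Ideal" takes up more vertical space than a small one such as "Fair", which the density scale hides.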

 
