How to create a density plot using ggplot2 in R programming
Density plot
A density plot is a representation of the distribution of a numeric variable. It uses a kernel density estimate to show the probability density function of the variable.
In ggplot2, the geom_density() function takes care of the kernel density estimation and plots the result.
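To make the kernel density estimate concrete, here is a minimal base-R sketch, assuming a Gaussian kernel and R's default bandwidth rule bw.nrd0() (both are also the defaults of geom_density()). The helper kde_by_hand() is hypothetical and written only for illustration: it evaluates the textbook formula directly and compares the result with stats::density(), which geom_density() calls internally.

```r
# Hypothetical helper: Gaussian kernel density estimate evaluated at points `at`
# f_hat(t) = 1 / (n * h) * sum_i K((t - x_i) / h), with K the standard normal density
kde_by_hand <- function(x, at, h = bw.nrd0(x)) {
  sapply(at, function(t) mean(dnorm((t - x) / h)) / h)
}

set.seed(42)
x <- rnorm(200)   # one numeric variable is all the estimate needs

# stats::density() evaluates the same estimate on a grid of 512 points
d <- density(x, bw = bw.nrd0(x), kernel = "gaussian", n = 512)
manual <- kde_by_hand(x, d$x)

# Largest gap between the direct sum and density()'s binned/FFT approximation
max(abs(manual - d$y))
```

density() is faster because it bins the data and convolves with the kernel through an FFT, so its values differ from the direct sum only by a small approximation error.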
Density plots are built in ggplot2 with the geom_density() geom. Only one numeric variable is needed as input.
Arguments of theme_ipsum() (hrbrthemes package, used later in this post)
· base_family, base_size: base font family and size
· plot_title_family, plot_title_face, plot_title_size, plot_title_margin: plot title family, face, size and margin
· subtitle_family, subtitle_face, subtitle_size: plot subtitle family, face and size
· subtitle_margin: plot subtitle bottom margin (single numeric value)
· strip_text_family, strip_text_face, strip_text_size: facet label font family, face and size
· caption_family, caption_face, caption_size, caption_margin: plot caption family, face, size and margin
· axis_title_family, axis_title_face, axis_title_size: axis title font family, face and size
· axis_title_just: axis title font justification, one of [blmcrt]
· plot_margin: plot margin (specify with ggplot2::margin())
· grid: panel grid (TRUE, FALSE, or a combination of X, x, Y, y)
· axis: add x or y axes? TRUE, FALSE, or "xy"
· ticks: if TRUE, add ticks
To create a
density plot in R using ggplot2, we use the geom_density() function of the
ggplot2 package.
Syntax: ggplot(data, aes(x)) + geom_density(fill, color, alpha)
Parameters:
· fill: color of the area under the density curve
· color: color of the density curve (outline)
· alpha: transparency of the filled area, from 0 (fully transparent) to 1 (opaque)
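As a self-contained sketch of this syntax, the example below uses R's built-in faithful dataset (an assumption chosen only so that nothing has to be downloaded) and sets the three parameters directly on geom_density(). layer_data() then exposes the x grid and density values that ggplot2 computed for the layer.

```r
library(ggplot2)

# faithful ships with base R: eruption durations (minutes) of Old Faithful
p <- ggplot(faithful, aes(x = eruptions)) +
  geom_density(fill = "#69b3a2",   # color of the area under the curve
               color = "#e9ecef",  # color of the curve outline
               alpha = 0.8)        # 0 = fully transparent, 1 = opaque
p

# Statistics ggplot2 computed for the layer (one row per grid point)
d <- layer_data(p)
head(d[, c("x", "density")])
```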
# Libraries
library(ggplot2)
library(dplyr)

# Load dataset from github
data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/1_OneNum.csv",
                   header = TRUE)

# Make the density plot (keep prices below 300 to focus on the bulk of the distribution)
# Note: "+" must end a line; a line starting with "+" is parsed as a new expression.
data %>%
  filter(price < 300) %>%
  ggplot(aes(x = price)) +
  geom_density(fill = "#69b3a2", color = "#e9ecef", alpha = 0.8)
Customize with theme_ipsum
theme_ipsum() is set up so that you can build your own customized theme by wrapping the call and changing its parameters (see the arguments listed above).
The hrbrthemes package offers a set of pre-built themes for your charts. I am personally a big fan of theme_ipsum(): it is easy to use and makes your chart look more professional:
# Libraries
library(ggplot2)
library(dplyr)
library(hrbrthemes)

# Load dataset from github
data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/1_OneNum.csv",
                   header = TRUE)

# Make the density plot
data %>%
  filter(price < 300) %>%
  ggplot(aes(x = price)) +
  geom_density(fill = "#69b3a2", color = "#e9ecef", alpha = 0.8) +
  ggtitle("Night price distribution of Airbnb apartments") +
  theme_ipsum()
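Following the wrapping idea described above, here is a minimal sketch of a personal theme built on top of theme_ipsum(). The name theme_my_ipsum() and the chosen values are hypothetical; only documented theme_ipsum() arguments are changed, and `...` forwards any other argument unchanged. The faithful dataset is used only to keep the example self-contained.

```r
library(ggplot2)
library(hrbrthemes)

# Hypothetical helper: a house style built by wrapping theme_ipsum()
theme_my_ipsum <- function(base_size = 13, ...) {
  theme_ipsum(
    base_size = base_size,
    plot_title_size = 16,     # override the title size (pt)
    axis_title_just = "cc",   # center both axis titles (x, then y) instead of the default "rt"
    grid = "Y",               # keep only the major horizontal grid lines
    ...                       # any other theme_ipsum() argument passes through
  )
}

p <- ggplot(faithful, aes(x = eruptions)) +
  geom_density(fill = "#69b3a2", color = "#e9ecef", alpha = 0.8) +
  ggtitle("Old Faithful eruption durations") +
  theme_my_ipsum()
p
```

Depending on your system, theme_ipsum() may warn if its default font (Arial Narrow) is not installed; passing base_family to the wrapper overrides it.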
Mirror density chart with ggplot2
A density chart is built with the geom_density() geom of ggplot2 (see the basic example above). It is possible to plot this density upside down by specifying y = -..density.. (or y = -after_stat(density), the spelling preferred since ggplot2 3.4.0 deprecated the dot-dot notation). Using geom_label() to indicate the variable names is advised.
# Libraries
library(ggplot2)
library(hrbrthemes)

# Dummy data
data <- data.frame(
  var1 = rnorm(1000),
  var2 = rnorm(1000, mean = 2)
)

# Chart (after_stat(density) is the current spelling of ..density..)
p <- ggplot(data) +
  # Top
  geom_density(aes(x = var1, y = after_stat(density)), fill = "#69b3a2") +
  geom_label(aes(x = 4.5, y = 0.25, label = "variable1"), color = "#69b3a2") +
  # Bottom: the negated density is drawn below zero
  geom_density(aes(x = var2, y = -after_stat(density)), fill = "#404080") +
  geom_label(aes(x = 4.5, y = -0.25, label = "variable2"), color = "#404080") +
  theme_ipsum() +
  xlab("value of x")
p
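Because the geom_label() calls above map constant values inside aes() against the 1000-row data frame, each label is in fact drawn once per row, i.e. 1000 times on top of itself. A minimal alternative sketch (a swapped-in technique, not the original gallery code) uses annotate("label", ...), which draws each label exactly once. theme_ipsum() is left out here only to avoid the hrbrthemes dependency; it can be added back unchanged.

```r
library(ggplot2)

set.seed(1)
data <- data.frame(
  var1 = rnorm(1000),
  var2 = rnorm(1000, mean = 2)
)

p <- ggplot(data) +
  # Top: density drawn upwards
  geom_density(aes(x = var1, y = after_stat(density)), fill = "#69b3a2") +
  annotate("label", x = 4.5, y = 0.25, label = "variable1", color = "#69b3a2") +
  # Bottom: negating the density mirrors it below zero
  geom_density(aes(x = var2, y = -after_stat(density)), fill = "#404080") +
  annotate("label", x = 4.5, y = -0.25, label = "variable2", color = "#404080") +
  xlab("value of x")
p
```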
Histogram with geom_histogram
Of course, exactly the same technique works with geom_histogram() instead of geom_density() to get a mirror histogram:
# Chart (reuses the data frame from the previous example)
p <- ggplot(data) +
  # Top
  geom_histogram(aes(x = var1, y = after_stat(density)), fill = "#69b3a2") +
  geom_label(aes(x = 4.5, y = 0.25, label = "variable1"), color = "#69b3a2") +
  # Bottom
  geom_histogram(aes(x = var2, y = -after_stat(density)), fill = "#404080") +
  geom_label(aes(x = 4.5, y = -0.25, label = "variable2"), color = "#404080") +
  theme_ipsum() +
  xlab("value of x")
p
Density chart with several groups
A multi-density chart is a density chart where several groups are represented, which lets you compare their distributions. The issue with this kind of chart is that it easily gets cluttered: groups overlap each other and the figure becomes unreadable.
An easy workaround is to use transparency. However, it won't solve the issue completely, and it is often better to consider the alternatives suggested further down in this document.
# Libraries
library(ggplot2)
library(hrbrthemes)
library(dplyr)
library(tidyr)
library(viridis)

# The diamonds dataset ships with ggplot2.

# Without transparency (left)
p1 <- ggplot(data = diamonds, aes(x = price, group = cut, fill = cut)) +
  geom_density(adjust = 1.5) +
  theme_ipsum()
p1

# With transparency (right)
p2 <- ggplot(data = diamonds, aes(x = price, group = cut, fill = cut)) +
  geom_density(adjust = 1.5, alpha = .4) +
  theme_ipsum()
p2
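The adjust = 1.5 used above widens the smoothing bandwidth. In stats::density(), which geom_density() calls under the hood, the bandwidth actually used is adjust times the bandwidth chosen by the rule (bw.nrd0() by default). A quick base-R check of that scaling, on simulated data chosen only for illustration:

```r
set.seed(7)
x <- rexp(500)

d_default <- density(x)                 # bandwidth from bw.nrd0(x)
d_smooth  <- density(x, adjust = 1.5)   # same rule, multiplied by 1.5

c(default = d_default$bw, adjusted = d_smooth$bw)
```

Larger values give smoother curves, which can make overlapping groups easier to read, at the cost of blurring fine structure such as narrow peaks.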
Here is an example with another dataset where it works much better. The groups have very distinct distributions, so they are easy to tell apart even on the same chart.
# Load dataset from github
data <- read.table("https://raw.githubusercontent.com/zonination/perceptions/master/probly.csv", header = TRUE, sep = ",")
data <- data %>%
  gather(key = "text", value = "value") %>%
  mutate(text = gsub("\\.", " ", text)) %>%
  mutate(value = round(as.numeric(value), 0))

# A data frame for annotations
annot <- data.frame(
  text = c("Almost No Chance", "About Even", "Probable", "Almost Certainly"),
  x = c(5, 53, 65, 79),
  y = c(0.15, 0.4, 0.06, 0.1)
)

# Plot
data %>%
  filter(text %in% c("Almost No Chance", "About Even", "Probable", "Almost Certainly")) %>%
  ggplot(aes(x = value, color = text, fill = text)) +
  geom_density(alpha = 0.6) +
  scale_fill_viridis(discrete = TRUE) +
  scale_color_viridis(discrete = TRUE) +
  geom_text(data = annot, aes(x = x, y = y, label = text, color = text), hjust = 0, size = 4.5) +
  theme_ipsum() +
  theme(legend.position = "none") +
  ylab("") +
  xlab("Assigned Probability (%)")
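gather() in the code above still works, but tidyr marks it as superseded in favor of pivot_longer(). A minimal sketch of the same reshaping step, run on a small made-up stand-in for the survey table (the values are illustrative, not taken from the real CSV):

```r
library(dplyr)
library(tidyr)

# Hypothetical miniature of the survey table: one column per phrase
wide <- data.frame(
  Almost.No.Chance = c("2", "5"),
  About.Even       = c("50", "52.4"),
  stringsAsFactors = FALSE
)

long <- wide %>%
  pivot_longer(everything(), names_to = "text", values_to = "value") %>%
  mutate(text  = gsub("\\.", " ", text),          # "About.Even" -> "About Even"
         value = round(as.numeric(value), 0))

long
```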
Small Multiple with facet_wrap()
facet_wrap() makes
a long ribbon of panels (generated by any number of variables) and wraps it
into 2d. This is useful if you have a single variable with many levels and want
to arrange the plots in a more space efficient manner.
You can control how the ribbon is
wrapped into a grid with ncol, nrow, as.table and dir. ncol and nrow control how
many columns and rows (you only need to set one). as.table controls
whether the facets are laid out like a table (TRUE), with highest values at the
bottom-right, or a plot (FALSE),
with the highest values at the top-right. dir controls the direction of
wrap: horizontal or vertical.
facet_grid() lays out plots in a 2d grid, as defined by a formula:
· . ~ a spreads the values of a across the columns. This direction facilitates comparisons of y position, because the vertical scales are aligned.
# Using small multiples
ggplot(data = diamonds, aes(x = price, group = cut, fill = cut)) +
  geom_density(adjust = 1.5) +
  theme_ipsum() +
  facet_wrap(~cut) +
  theme(legend.position = "none",
        panel.spacing = unit(0.1, "lines"),
        axis.ticks.x = element_blank()
  )
Stacked density chart
Another solution is to stack the groups. With position = "fill", the group densities are stacked and rescaled so that they sum to 1 at every value of x. This shows which group is the most frequent at a given value, but it makes it hard to understand the distribution of any group that is not at the bottom of the chart.
# Stacked density plot:
p <- ggplot(data = diamonds, aes(x = price, group = cut, fill = cut)) +
  geom_density(adjust = 1.5, position = "fill") +
  theme_ipsum()
p
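To see what position = "fill" does to the numbers, here is a minimal sketch on two simulated groups (made-up data, used only for illustration). layer_data() returns the stacked values, and the top of the stack (the largest ymax) equals 1 at every evaluation point, so the y axis shows each group's share rather than a density.

```r
library(ggplot2)

set.seed(3)
df <- data.frame(
  value = c(rnorm(300), rnorm(300, mean = 2)),
  group = rep(c("a", "b"), each = 300)
)

# Densities are stacked, then rescaled so the stack sums to 1 at each x
p_fill <- ggplot(df, aes(x = value, fill = group)) +
  geom_density(position = "fill")
p_fill

ld <- layer_data(p_fill)
top <- tapply(ld$ymax, ld$x, max)   # height of the full stack at each x
summary(top)
```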