How to create a violin chart using ggplot2 in R
Violin Chart
Violin charts can be produced with ggplot2 thanks to the dedicated geom_violin() function.
The first chart of the series below describes its basic usage and explains how to build a
violin chart from different input formats.
The most basic version maps a categorical variable to the X axis and a numeric variable to
the Y axis:
# Library
library(ggplot2)

# Create a dataset
data <- data.frame(
  name = c(rep("A", 500), rep("B", 500), rep("B", 500), rep("C", 20), rep("D", 100)),
  value = c(rnorm(500, 10, 5), rnorm(500, 13, 1), rnorm(500, 18, 1), rnorm(20, 25, 4), rnorm(100, 12, 1))
)

# Most basic violin chart
# fill = name automatically assigns a color to each group
p <- ggplot(data, aes(x = name, y = value, fill = name)) +
  geom_violin()
p
ggplot2 expects input data to be in long format: each row is
dedicated to one observation. Your input needs 2 columns:
- a categorical variable for the X axis: it should have the class factor
(a character column also works, since ggplot2 treats it as discrete)
- a numeric variable for the Y axis: it needs to have the
class numeric
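The two requirements above can be checked before plotting. The following is a minimal sketch, assuming a made-up data frame called df with hypothetical columns group and value (none of these names come from the original examples). It inspects the column classes, converts the grouping column to an explicit factor to control the order of the violins, and coerces the measurement column to numeric.

```r
library(ggplot2)

# Hypothetical long-format data: one row per observation
set.seed(42)
df <- data.frame(
  group = rep(c("A", "B", "C"), times = c(50, 50, 30)),
  value = c(rnorm(50, 10, 2), rnorm(50, 13, 1), rnorm(30, 18, 3)),
  stringsAsFactors = FALSE
)

# Inspect the column classes before plotting
str(df)

# Make the grouping column an explicit factor; the level order
# chosen here is the order of the violins along the X axis
df$group <- factor(df$group, levels = c("C", "B", "A"))

# The Y variable must be numeric; coerce defensively in case it was read as text
df$value <- as.numeric(df$value)

p_check <- ggplot(df, aes(x = group, y = value, fill = group)) +
  geom_violin()
p_check
```

Setting the levels by hand is optional, but it is a simple way to decide which group appears first on the axis.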
From long format
Your data is already in the right format, so plotting it with
geom_violin() is straightforward:
# Libraries
library(ggplot2)
library(dplyr)

# Create data
data <- data.frame(
  name = c(rep("A", 500), rep("B", 500), rep("B", 500), rep("C", 20), rep("D", 100)),
  value = c(rnorm(500, 10, 5), rnorm(500, 13, 1), rnorm(500, 18, 1), rnorm(20, 25, 4), rnorm(100, 12, 1)) %>% round(2)
)
| name | value |
| A | 11.02 |
| A | 5.54 |
| A | 18.05 |
| A | 6.57 |
# Basic violin
ggplot(data, aes(x = name, y = value, fill = name)) +
  geom_violin()
From wide format
In this case we need to reshape the input into long format. This is
possible thanks to the gather() function of the tidyr package, which is part of the tidyverse.
# Let's use the iris dataset as an example:
data_wide <- iris[, 1:4]
| Sepal.Length | Sepal.Width | Petal.Length | Petal.Width |
| 5.1 | 3.5 | 1.4 | 0.2 |
| 4.9 | 3.0 | 1.4 | 0.2 |
| 4.7 | 3.2 | 1.3 | 0.2 |
| 4.6 | 3.1 | 1.5 | 0.2 |
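library(tidyr)
library(ggplot2)
library(dplyr)

# Stack the four columns into a key column (MesureType) and a value column (Val)
data_wide %>%
  gather(key = "MesureType", value = "Val") %>%
  ggplot(aes(x = MesureType, y = Val, fill = MesureType)) +
  geom_violin()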
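The tidyr documentation marks gather() as superseded by pivot_longer(). As a sketch, assuming tidyr 1.0.0 or later is installed, the same wide-to-long reshape of the iris columns could be written as follows. The intermediate name data_long is introduced here only for illustration.

```r
library(tidyr)
library(ggplot2)

# Same wide input as above: the four numeric columns of iris
data_wide <- iris[, 1:4]

# Stack every column into a name column and a value column
data_long <- pivot_longer(
  data_wide,
  cols = everything(),
  names_to = "MesureType",
  values_to = "Val"
)

ggplot(data_long, aes(x = MesureType, y = Val, fill = MesureType)) +
  geom_violin()
```

Both functions give one row per measurement here; pivot_longer() simply makes the selected columns and the output column names explicit.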
Horizontal violin plot with ggplot2
Building a violin plot with ggplot2 is pretty straightforward thanks to the dedicated
geom_violin() function. Here, calling coord_flip() swaps the X and Y axes and
thus gives a horizontal version of the chart. Note also the use of theme_ipsum() from the
hrbrthemes package, which improves the general appearance.
# Libraries
library(ggplot2)
library(dplyr)
library(tidyr)
library(forcats)
library(hrbrthemes)
library(viridis)

# Load dataset from github
data <- read.table("https://raw.githubusercontent.com/zonination/perceptions/master/probly.csv", header = TRUE, sep = ",")

# Data is in wide format, we need to make it 'tidy' or 'long'
data <- data %>%
  gather(key = "text", value = "value") %>%
  mutate(text = gsub("\\.", " ", text)) %>%
  mutate(value = round(as.numeric(value), 0)) %>%
  filter(text %in% c("Almost Certainly", "Very Good Chance", "We Believe", "Likely",
                     "About Even", "Little Chance", "Chances Are Slight", "Almost No Chance"))

# Plot
p <- data %>%
  mutate(text = fct_reorder(text, value)) %>% # Reorder data
  ggplot(aes(x = text, y = value, fill = text, color = text)) +
  geom_violin(width = 2.1, size = 0.2) +
  scale_fill_viridis(discrete = TRUE) +
  scale_color_viridis(discrete = TRUE) +
  theme_ipsum() +
  theme(legend.position = "none") +
  coord_flip() + # This switches the X and Y axes and gives the horizontal version
  xlab("") +
  ylab("Assigned Probability (%)")
p
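A horizontal violin can also be obtained without coord_flip(). The sketch below assumes ggplot2 3.3.0 or later, where geom_violin() infers its orientation from the aesthetic mapping, so mapping the categorical variable to y and the numeric variable to x is enough. The data frame probs and its values are simulated for illustration and are not the survey data loaded above.

```r
library(ggplot2)

# Simulated stand-in data (hypothetical values, not the real survey)
set.seed(1)
probs <- data.frame(
  text = rep(c("Likely", "About Even", "Little Chance"), each = 40),
  value = c(rnorm(40, 70, 8), rnorm(40, 50, 4), rnorm(40, 12, 6))
)

# Order the groups by their median value, using base R reorder()
probs$text <- reorder(probs$text, probs$value, FUN = median)

# Numeric variable on x, categorical variable on y: no coord_flip() needed
p_h <- ggplot(probs, aes(x = value, y = text, fill = text)) +
  geom_violin() +
  theme(legend.position = "none") +
  labs(x = "Assigned Probability (%)", y = "")
p_h
```

With this approach the axis labels are set on the axes where they are actually drawn, which avoids the mental swap that coord_flip() requires.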
Violin plot with included boxplot and sample size in ggplot2
Building a violin plot with ggplot2 is pretty straightforward thanks
to the dedicated geom_violin() function. Adding geom_boxplot() with a small
width draws a boxplot inside each violin, which provides summary statistics.
Note also a small trick for showing the sample size of each group on the X axis:
a new column called myaxis is created and then used for the X axis.
# Libraries
library(ggplot2)
library(dplyr)
library(hrbrthemes)
library(viridis)

# Create a dataset
data <- data.frame(
  name = c(rep("A", 500), rep("B", 500), rep("B", 500), rep("C", 20), rep("D", 100)),
  value = c(rnorm(500, 10, 5), rnorm(500, 13, 1), rnorm(500, 18, 1), rnorm(20, 25, 4), rnorm(100, 12, 1))
)

# Sample size of each group
sample_size <- data %>%
  group_by(name) %>%
  summarize(num = n())

# Plot
data %>%
  left_join(sample_size, by = "name") %>%
  mutate(myaxis = paste0(name, "\n", "n=", num)) %>% # Axis label: group name plus sample size
  ggplot(aes(x = myaxis, y = value, fill = name)) +
  geom_violin(width = 1.4) +
  geom_boxplot(width = 0.1, color = "grey", alpha = 0.2) +
  scale_fill_viridis(discrete = TRUE) +
  theme_ipsum() +
  theme(
    legend.position = "none",
    plot.title = element_text(size = 11)
  ) +
  ggtitle("A Violin wrapping a boxplot") +
  xlab("")
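An alternative to creating a new column is to leave the data untouched and relabel the axis instead. The sketch below is one possible way to do it, assuming only ggplot2 and base R. The names violin_df, n_per_group and axis_labels are hypothetical helpers introduced for this example; a named character vector is passed to the labels argument of scale_x_discrete().

```r
library(ggplot2)

# Same kind of simulated data as above (group B has 1000 observations)
set.seed(123)
violin_df <- data.frame(
  name = c(rep("A", 500), rep("B", 1000), rep("C", 20), rep("D", 100)),
  value = c(rnorm(500, 10, 5), rnorm(500, 13, 1), rnorm(500, 18, 1),
            rnorm(20, 25, 4), rnorm(100, 12, 1))
)

# Count observations per group with base R
n_per_group <- table(violin_df$name)

# Named vector: names are the original groups, values are the axis labels
axis_labels <- setNames(
  paste0(names(n_per_group), "\nn=", as.integer(n_per_group)),
  names(n_per_group)
)

p_n <- ggplot(violin_df, aes(x = name, y = value, fill = name)) +
  geom_violin() +
  geom_boxplot(width = 0.1, color = "grey", alpha = 0.2) +
  scale_x_discrete(labels = axis_labels) +
  theme(legend.position = "none") +
  xlab("")
p_n
```

Because the x aesthetic still maps to name, the group order and the fill colors stay tied to the original variable, and only the displayed text changes.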
Grouped violin chart with ggplot2
A grouped violin plot displays
the distribution of a numeric variable for groups and subgroups. Here, groups
are days of the week, and subgroups are males and females. ggplot2 allows this
kind of representation thanks to the position="dodge" option of the geom_violin() function.
Groups must be provided to x, and subgroups must be provided to fill.
# Libraries
library(ggplot2)
library(dplyr)
library(hrbrthemes)
library(viridis)

# Load dataset from github and express the tip as a percentage of the bill
data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/10_OneNumSevCatSubgroupsSevObs.csv", header = TRUE, sep = ",") %>%
  mutate(tip = round(tip / total_bill * 100, 1))

# Grouped
data %>%
  mutate(day = factor(day, levels = c("Thur", "Fri", "Sat", "Sun"))) %>% # Fix the order of the days
  ggplot(aes(fill = sex, y = tip, x = day)) +
  geom_violin(position = "dodge", alpha = 0.5) + # geom_violin() draws no outliers, so no outlier option is needed
  scale_fill_viridis(discrete = TRUE, name = "") +
  theme_ipsum() +
  xlab("") +
  ylab("Tip (%)") +
  ylim(0, 40)
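The grouped layout can also be tried without downloading anything. The following sketch uses a simulated data frame, tips_sim, whose values are made up for illustration and do not come from the dataset above. It applies the same mapping: the group goes to x, the subgroup goes to fill, and the violins are dodged side by side.

```r
library(ggplot2)

# Simulated stand-in data: 30 hypothetical tips per day and sex
set.seed(2024)
tips_sim <- expand.grid(
  day = c("Thur", "Fri", "Sat", "Sun"),
  sex = c("Female", "Male"),
  rep = 1:30,
  stringsAsFactors = FALSE
)
tips_sim$tip <- round(rnorm(nrow(tips_sim), mean = 16, sd = 4), 1)

# Fix the order of the days along the X axis
tips_sim$day <- factor(tips_sim$day, levels = c("Thur", "Fri", "Sat", "Sun"))

# Groups on x, subgroups on fill, violins placed side by side
p_grouped <- ggplot(tips_sim, aes(x = day, y = tip, fill = sex)) +
  geom_violin(position = position_dodge(width = 0.9), alpha = 0.5) +
  labs(x = "", y = "Tip (%)", fill = "")
p_grouped
```

Using position_dodge() with an explicit width, rather than the "dodge" string, makes it possible to tune the gap between the two violins of a group.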