Plotly is an excellent graphing package available for R and R Shiny. It produces clean, colourful charts that look modern and eye-catching. On top of its visual appeal, Plotly’s main benefit over base R graphs, and over the staple graphing package ggplot2, is its interactivity. These graphs are ideal for use in R Shiny data apps or in HTML markdown documents, as they allow for user interaction, elevating static reports into dynamic tools.
Plotly can, however, be a bit of a beast to wrangle, and I have often found myself searching the web for answers to seemingly simple questions. In this series of blog posts, I will share a few of the discoveries I have made that have enhanced my data visualisations.
This series starts with the humble bar chart, the bread and butter of any analyst. Here we’ll look at the simple task of creating a bar chart and the not-so-obvious, yet still simple, task of controlling the order in which our categorical data is plotted.
A simple bar chart
Plotly’s bar charts allow for much customisation, depending on the desired outcome. For instance, should we want to compare two discrete data series across a number of categories, we could use a grouped bar chart. Or if we wanted to visualise the breakdown of a metric, we could use a stacked bar chart.
The following is a very simple bar chart, plotting the number of games played by each of the 5 top scorers at the 2018 World Cup. First, we will create the data frame (plotData) we will use for our bar charts:
plotData <- data.frame(
  player = c("Kane", "Griezmann", "Lukaku", "Cheryshev", "Ronaldo"),
  goalsScored = c(6, 4, 4, 4, 4),
  assists = c(0, 2, 1, 0, 0),
  gamesPlayed = c(6, 7, 6, 5, 4)
)
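In table form, the data looks like this:
player      goalsScored  assists  gamesPlayed
Kane        6            0        6
Griezmann   4            2        7
Lukaku      4            1        6
Cheryshev   4            0        5
Ronaldo     4            0        4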
Now we can plot our bar chart using the following code:
library(plotly)

plot_ly(
  data = plotData,
  x = ~player,
  y = ~gamesPlayed,
  type = "bar"
) %>%
  layout(
    title = "Games played per top scorer",
    xaxis = list(title = ""),
    yaxis = list(title = "Games Played")
  )
Which gives us this output:
Note that in our data we ordered our players firstly by the number of goals that they scored, and secondly by the number of assists they made. However, Plotly has rearranged our data in order to plot our goal scorers in alphabetical order. Whilst this was an understandable attempt by Plotly to make my graph more readable, it was not my intention to plot this data alphabetically.
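To make the mismatch concrete, here is a minimal base R sketch comparing the row order of the data with the alphabetical order described above. The variable names intendedOrder and alphabeticalOrder are my own, and stringsAsFactors = FALSE is an assumption added so that the player column is a plain character vector in any version of R.

```r
# Same data as above, with players kept as character strings (assumption)
plotData <- data.frame(
  player = c("Kane", "Griezmann", "Lukaku", "Cheryshev", "Ronaldo"),
  goalsScored = c(6, 4, 4, 4, 4),
  assists = c(0, 2, 1, 0, 0),
  gamesPlayed = c(6, 7, 6, 5, 4),
  stringsAsFactors = FALSE
)

# Row positions when sorting by goals scored, then assists (both descending).
# Ties keep their original row order.
intendedOrder <- order(-plotData$goalsScored, -plotData$assists)

# The order we intended: the players exactly as they appear in the data frame
plotData$player[intendedOrder]

# The alphabetical order that ends up on the x axis
alphabeticalOrder <- sort(plotData$player)
alphabeticalOrder
```

The first result is simply the data frame's own row order, while the second starts with Cheryshev and ends with Ronaldo.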
So how can we create a plot with our intended order?
Ordering the bar chart
Luckily for us, there is a helpful (but barely documented) pair of Plotly attributes that will help us achieve our goal: categoryorder and categoryarray. Details for these attributes are buried deep within the Plotly for R reference guide (see further reading) with hardly any other mentions outside of Stack Overflow, so you would be forgiven for not knowing about their existence!
The categoryorder attribute tells Plotly how to order our categorical values. Accepted values include “category ascending” and “category descending”, and the option we will choose: “array”. With “array”, Plotly takes the ordering from the categoryarray attribute, to which we supply our own predetermined order as a data frame column, a list or a vector.
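The ordering does not have to match the row order of the data. As a minimal sketch, assuming we wanted the bars sorted by games played instead, the vector can be built with base R. The names playerOrder and xAxisSettings are hypothetical and used only for illustration.

```r
# Cut-down copy of the data, with players kept as character strings (assumption)
plotData <- data.frame(
  player = c("Kane", "Griezmann", "Lukaku", "Cheryshev", "Ronaldo"),
  gamesPlayed = c(6, 7, 6, 5, 4),
  stringsAsFactors = FALSE
)

# Players sorted by games played, most first; ties keep their row order
playerOrder <- plotData$player[order(-plotData$gamesPlayed)]

# Axis settings pairing categoryorder = "array" with the ordering vector
xAxisSettings <- list(
  title = "",
  categoryorder = "array",
  categoryarray = playerOrder
)

playerOrder
```

A list like xAxisSettings is the kind of value an axis argument expects, so the ordering logic can be kept separate from the plotting code.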
These attributes should be set in the layout function, within the relevant axis. For instance, our graph requires the categories within the x axis to be ordered, but if we’d plotted horizontal bars we would order the y axis.
See below for an example of this in action:
plot_ly(
  data = plotData,
  x = ~player,
  y = ~gamesPlayed,
  type = "bar"
) %>%
  layout(
    title = "Games played per top scorer",
    xaxis = list(title = "", categoryorder = "array", categoryarray = ~player),
    yaxis = list(title = "Games Played")
  )
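The horizontal case mentioned earlier works the same way, only on the y axis. The following is a rough sketch rather than a definitive recipe: it assumes the bar trace's orientation = "h" setting, and the names yOrder and horizontalPlot are my own. Reversing the vector reflects my understanding that a categorical y axis is drawn from the bottom up, so treat that step as an assumption to check against your own output.

```r
library(plotly)

# Cut-down copy of the data, with players kept as character strings (assumption)
plotData <- data.frame(
  player = c("Kane", "Griezmann", "Lukaku", "Cheryshev", "Ronaldo"),
  gamesPlayed = c(6, 7, 6, 5, 4),
  stringsAsFactors = FALSE
)

# Reverse the row order so that the first player ends up at the top of the
# chart (assumes the y axis places its first category at the bottom)
yOrder <- rev(plotData$player)

# Horizontal bars: the values go on x, the categories on y
horizontalPlot <- plot_ly(
  data = plotData,
  x = ~gamesPlayed,
  y = ~player,
  type = "bar",
  orientation = "h"
) %>%
  layout(
    title = "Games played per top scorer",
    xaxis = list(title = "Games Played"),
    yaxis = list(title = "", categoryorder = "array", categoryarray = yOrder)
  )

horizontalPlot
```

Passing the vector directly, rather than a formula, keeps the ordering explicit and easy to inspect before plotting.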
Plotly graphs are great, but customising them can often be a pain. This is not because of a lack of options, but a lack of instruction about how to access these options. The Plotly for R reference guide is an exhaustive (but dense) list of all of the attributes that can be changed within a plot, so if your Google searches fail you, this is the place to look.
by Jon Willis
- Jul 24, 2018 Plotly in R: How to order a Plotly bar chart