Plotly is an excellent graphing package available for R and R Shiny. The package allows for clean, colourful charts to be created that look modern and eye-catching. On top of its heightened attractiveness, Plotly’s main benefit over base R graphs, and the staple graphing package ggplot2, is its interactivity. These graphs are ideal for use in R Shiny data apps, or in HTML markdown documents, as they allow for user interaction, elevating static reports into dynamic tools.
Plotly can, however, be a bit of a beast to wrangle, and I have often found myself searching the web for answers to seemingly simple questions. In this series of blogs, I will share a few of the discoveries I have made that have enhanced my data visualisations.
This blog series starts with the humble bar chart, the bread and butter of any analyst. Here we’ll look at the simple task of creating a bar chart and the not-so-obvious, yet still simple, task of controlling the order in which our categorical data is plotted.
A simple bar chart
Plotly’s bar charts allow for much customisation, depending on the desired outcome. For instance, should we want to compare two discrete data series across a number of categories, we could use a grouped bar chart. Or if we wanted to visualise the breakdown of a metric, we could use a stacked bar chart.
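As a rough sketch of those two variants, the only difference between a grouped and a stacked bar chart is the barmode setting in the layout. The data frame below (sales, with its region, online and inStore columns) is hypothetical and exists purely for illustration:

```r
library(plotly)

# Hypothetical data, made up purely for illustration
sales <- data.frame(
  region = c("North", "South", "East"),
  online = c(12, 9, 15),
  inStore = c(7, 11, 5)
)

# One bar trace per data series, sharing the same x axis categories
basePlot <- plot_ly(data = sales, x = ~region, y = ~online,
                    type = "bar", name = "Online") %>%
  add_trace(y = ~inStore, name = "In store")

# Grouped: the two series sit side by side within each category
groupedPlot <- basePlot %>% layout(barmode = "group")

# Stacked: the two series are piled on top of each other
stackedPlot <- basePlot %>% layout(barmode = "stack")

groupedPlot
```

Printing stackedPlot instead shows the stacked version of the same data.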
The following is a very simple bar chart, plotting the number of games played by each of the 5 top scorers at the 2018 World Cup. First, we will create the data frame (plotData) we will use for our bar charts:
plotData <- data.frame(
  player = c("Kane", "Griezmann", "Lukaku", "Cheryshev", "Ronaldo"),
  goalsScored = c(6, 4, 4, 4, 4),
  assists = c(0, 2, 1, 0, 0),
  gamesPlayed = c(6, 7, 6, 5, 4)
)
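Which in table form looks like this:
     player goalsScored assists gamesPlayed
1      Kane           6       0           6
2 Griezmann           4       2           7
3    Lukaku           4       1           6
4 Cheryshev           4       0           5
5   Ronaldo           4       0           4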
Now we can plot our bar chart using the following code:
library(plotly)

plot_ly(
  data = plotData,
  x = ~player,
  y = ~gamesPlayed,
  type = "bar"
) %>%
  layout(
    title = "Games played per top scorer",
    xaxis = list(title = ""),
    yaxis = list(title = "Games Played")
  )
Which gives us this output:
Note that in our data we ordered our players firstly by the number of goals that they scored, and secondly by the number of assists they made. However, Plotly has rearranged our data in order to plot our goal scorers in alphabetical order. Whilst this was an understandable attempt by Plotly to make our graph more readable, it was not our intention to plot this data alphabetically.
So how can we create a plot with our intended order?
Ordering the bar chart
Luckily for us, there is a helpful (but barely documented) pair of Plotly attributes that will help us achieve our goal: categoryorder and categoryarray. Details for these attributes are buried deep within the Plotly for R reference guide (see further reading) with hardly any other mentions outside of Stack Overflow, so you would be forgiven for not knowing about their existence!
The categoryorder parameter allows us to tell Plotly exactly how we would like it to handle the ordering of our categorical values. Accepted values include “category ascending” and “category descending”, and the option we will choose: “array”. This allows us to supply our own predetermined ordering to the categoryarray attribute, which will accept a data frame column, a list or a vector.
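To get a feel for the built-in options before turning to “array”, here is a minimal sketch using “category descending”. It rebuilds a cut-down copy of the data frame so that it can be run on its own; the name descendingPlot is only illustrative:

```r
library(plotly)

# Cut-down copy of the data frame used in this post
plotData <- data.frame(
  player = c("Kane", "Griezmann", "Lukaku", "Cheryshev", "Ronaldo"),
  gamesPlayed = c(6, 7, 6, 5, 4)
)

# "category descending" sorts the x axis labels in reverse alphabetical order;
# no categoryarray is needed for this option
descendingPlot <- plot_ly(data = plotData, x = ~player, y = ~gamesPlayed,
                          type = "bar") %>%
  layout(
    xaxis = list(title = "", categoryorder = "category descending"),
    yaxis = list(title = "Games Played")
  )

descendingPlot
```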
These attributes should be set in the layout function, within the relevant axis. For instance, our graph requires the categories within the x axis to be ordered, but if we’d plotted horizontal bars we would order the y axis.
See below for an example of this in action:
plot_ly(
  data = plotData,
  x = ~player,
  y = ~gamesPlayed,
  type = "bar"
) %>%
  layout(
    title = "Games played per top scorer",
    xaxis = list(title = "", categoryorder = "array", categoryarray = ~player),
    yaxis = list(title = "Games Played")
  )
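As mentioned above, horizontal bars need the ordering applied to the y axis instead. The following is a sketch of how that might look, again with a cut-down copy of the data so that it runs on its own. It assumes that Plotly draws y axis categories from the bottom upwards, which is why the vector is reversed to keep Kane at the top; the name horizontalPlot is only illustrative:

```r
library(plotly)

# Cut-down copy of the data frame used in this post
plotData <- data.frame(
  player = c("Kane", "Griezmann", "Lukaku", "Cheryshev", "Ronaldo"),
  gamesPlayed = c(6, 7, 6, 5, 4)
)

# Reverse the intended order, assuming the first category is drawn
# at the bottom of the y axis
playerOrder <- rev(as.character(plotData$player))

horizontalPlot <- plot_ly(data = plotData, x = ~gamesPlayed, y = ~player,
                          type = "bar", orientation = "h") %>%
  layout(
    title = "Games played per top scorer",
    xaxis = list(title = "Games Played"),
    yaxis = list(title = "", categoryorder = "array", categoryarray = playerOrder)
  )

horizontalPlot
```

If the bars appear in the opposite order to the one you wanted, dropping the rev() call should flip them.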
Plotly graphs are great, but customising them can often be a pain. This is not because of a lack of options, but a lack of instruction about how to access these options. The Plotly for R reference guide is an exhaustive (but dense) list of all of the attributes that can be changed within a plot, so if your Google searches fail you, this is the place to look.
by Jon Willis