Lorenzo Nieri - August 29, 2021

Simple bar charts are among the most common charts in data visualization. They represent one or more quantitative variables for different categories using rectangles that must have the same base and heights proportional to the magnitude of the quantitative variables analysed.

History

The origins of this chart can be traced to the end of the 18th century, when William Playfair (1759-1823), a Scottish political economist, published “The Commercial and Political Atlas” (1786), which included the chart below. The chart shows the imports and exports of Scotland in 1781. In language more familiar to BI professionals, this chart compares 2 measures (imports and exports) for 17 attributes of the country dimension.

Chart representing imports and exports of Scotland in 1781

Examples

As mentioned in the introduction, bar charts can analyse one or more quantitative variables across categories, so we will look at the two cases separately: a single quantitative variable and multiple quantitative variables.

Single quantitative variable

Now let’s consider the table below, and let’s build a bar chart from it.

Chart comparing sales for different colours of product

Looking at the chart, the eye can capture multiple insights without making any calculation on the data. In fact, the eye can break the bars into pieces and draw imaginary lines between the midpoints of the bars to make better comparisons.

Representation and description of the visual analysis for a single variable simple bar chart
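
To make the idea of bars proportional to the values concrete, here is a minimal sketch that draws a text bar chart for a single variable. The sales figures and the `text_bar_chart` helper are hypothetical, made up for illustration only, and the bars are drawn horizontally simply because that is easier to print as text.

```python
# Hypothetical sales per product colour (illustrative values, not the article's data).
sales = {"Green": 420_000, "Blue": 380_000, "Red": 350_000, "Pink": 320_000}


def text_bar_chart(values, width=40):
    """Return one text line per category; bar length is proportional to the value."""
    largest = max(values.values())
    label_width = max(len(name) for name in values)
    lines = []
    for name, value in values.items():
        # The baseline is 0, so the bar length scales directly with the value.
        bar = "#" * round(width * value / largest)
        lines.append(f"{name:<{label_width}} | {bar} {value:,}")
    return lines


chart = text_bar_chart(sales)
print("\n".join(chart))
```

Even in this rough form, the eye can compare the bar lengths without reading the numbers, which is the kind of visual analysis described above.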

Multiple quantitative variables

In this case it is mandatory to use a legend to distinguish the different variables/measures.

Let’s take the previous dataset and break down the sales by profit type (High, Medium, and Low). As said, this generates 3 different sales variables evaluated for the different categories.

Chart comparing sales of different categories in different product colours

Now the chart gives us the opportunity to compare the data either inside the categories or across them. In fact, the eye can focus on each of the three variables and observe the differences between the categories, or focus on the comparison inside each category. Nonetheless, the overall comparison is no longer possible, hence adding a breakdown category to a single variable bar chart leads to some loss of information.

Representation and description of the visual analysis for a multiple variable simple bar chart
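
The two kinds of comparison, and the loss of the overall comparison, can be sketched on a small dataset. The numbers and variable names below are hypothetical and only serve to illustrate the reasoning.

```python
# Hypothetical sales split by profit type for each product colour
# (illustrative values, not the article's data).
sales_by_profit = {
    "Green": {"High": 200_000, "Medium": 150_000, "Low": 70_000},
    "Blue": {"High": 120_000, "Medium": 180_000, "Low": 80_000},
    "Pink": {"High": 90_000, "Medium": 110_000, "Low": 120_000},
}

# Comparison across categories: one variable, all colours.
high_across_colours = {
    colour: parts["High"] for colour, parts in sales_by_profit.items()
}

# Comparison inside a category: all variables, one colour.
inside_pink = sales_by_profit["Pink"]

# The total per colour is no longer drawn as a bar; it has to be recomputed.
totals = {colour: sum(parts.values()) for colour, parts in sales_by_profit.items()}

print(high_across_colours)
print(inside_pink)
print(totals)
```

The totals are still in the data, but the reader of the chart would have to add up three bars mentally to get them.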

Parameters and best practices

a) Horizontal vs Vertical

Bar charts can be drawn with either horizontal or vertical bars.

Bar charts with horizontal and vertical bars

Generally, neither orientation is preferable, though vertical bars are mandatory when the categorical axis is time.

b) Labels

Labels in a bar chart are the numbers placed above or below each bar. They indicate the value represented by the bar. In my opinion labels are usually redundant when the value axis is shown, since they don’t add anything new to the chart. If the bars are drawn without a value axis, instead, the labels are necessary.

Bar charts comparison for the preference in the use of labels, two cases are shown

c) Baseline

The baseline is the starting value for the value axis. The baseline of a chart must be set to 0. Let’s check a practical example to see the dangers of violating this principle.

Two bar charts having a different baseline

In the example above we show two charts, one with the baseline starting at 0 and one with the baseline at €300,000. For the left chart, with the baseline at 0, the ratio between the heights of two bars equals the ratio between the values they represent.

This means that the proportions of the visual correspond to the proportions in the data.

In the right chart, the bar for green appears nearly six times taller than the bar for pink. This means that the proportions of the visual do not correspond to the proportions in the data, hence the chart gives a message that differs from the original dataset.
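
The distortion can be checked with a few lines of arithmetic. This is a minimal sketch: the `visual_ratio` helper and the two sales values are hypothetical, chosen only so that the truncated baseline gives a ratio of about six as in the example above.

```python
def visual_ratio(value_a, value_b, baseline=0):
    """Ratio of the drawn bar heights when the value axis starts at `baseline`."""
    return (value_a - baseline) / (value_b - baseline)


# Hypothetical sales values in euros (not the article's data).
green, pink = 420_000, 320_000

true_ratio = green / pink
ratio_from_zero = visual_ratio(green, pink)
ratio_truncated = visual_ratio(green, pink, baseline=300_000)

print(f"ratio in the data:       {true_ratio:.2f}")       # 1.31
print(f"bars with baseline 0:    {ratio_from_zero:.2f}")  # 1.31
print(f"bars with baseline 300k: {ratio_truncated:.2f}")  # 6.00
```

With the baseline at 0 the bars keep the ratio found in the data, while the truncated axis turns a difference of roughly 30% into bars that differ by a factor of six.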

d) Order of bars

The bars can be sorted in different ways:

  • Alphabetically sorted
  • Sorted in ascending or descending order based on the quantitative variable.

Three bar charts sorted differently

In this case there is no general rule on the best practice to adopt.
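
As a minimal sketch, the sort orders listed above can be obtained as follows before drawing the bars. The dataset is hypothetical.

```python
# Hypothetical sales per product colour (illustrative values, not the article's data).
sales = {"Pink": 320_000, "Green": 420_000, "Red": 350_000, "Blue": 380_000}

# Alphabetical order of the categories.
alphabetical = sorted(sales)

# Ascending and descending order based on the quantitative variable.
ascending = sorted(sales, key=sales.get)
descending = sorted(sales, key=sales.get, reverse=True)

print(alphabetical)  # ['Blue', 'Green', 'Pink', 'Red']
print(ascending)     # ['Pink', 'Red', 'Blue', 'Green']
print(descending)    # ['Green', 'Blue', 'Red', 'Pink']
```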

e) Space between the bars

The space between the bars is mostly an aesthetic issue, so it does not need much attention unless an extreme spacing (too narrow or too wide) is used. The major risk is that a small space between bars makes the chart look like a histogram, as shown in the figure below.

Two bar charts. One with normal bars, the other with too large bars

f) Number of bars

This parameter indicates how many bars should be shown. The question may seem strange, since the answer depends on the dataset. Nonetheless, we can answer it by introducing an “others/rest of” category. This category groups all the categories that are not shown individually and is used to create a chart that can fit on a page. Imagine a dataset with 500 categories: it would be impossible to make a readable bar chart out of it. There is no general rule on what percentage of the total the “rest of” category should represent. Two principles can guide us:

  • 80-20 Pareto rule: Applying this principle, the categories shown individually should account for approximately 80% of the total, leaving the remainder to “others”.
  • Aesthetics: In this case a principle could be to set the percentage of “others” based on the first categories, which should each be greater than “others”. Furthermore, it is hard to fit on a page a bar chart with more than 50 categories, so that could also guide the choice of the percentage for “others”.
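
One possible way to build the “others” bar is sketched below. It assumes the Pareto principle is read as "keep the largest categories until they cover about 80% of the total", and it uses the 50 bar limit as a cap. The `group_others` function, its parameters and the dataset are hypothetical, not a standard recipe.

```python
def group_others(values, share=0.8, max_bars=50, label="Others"):
    """Keep the largest categories until they cover `share` of the total
    (or `max_bars` is reached) and merge the remainder into a single bar."""
    total = sum(values.values())
    ranked = sorted(values.items(), key=lambda item: item[1], reverse=True)
    kept, running = {}, 0
    for name, value in ranked:
        if running >= share * total or len(kept) >= max_bars:
            break
        kept[name] = value
        running += value
    rest = total - running
    if rest > 0:
        kept[label] = rest
    return kept


# Hypothetical dataset with seven categories (illustrative values only).
categories = {"A": 500, "B": 200, "C": 120, "D": 80, "E": 50, "F": 30, "G": 20}

grouped = group_others(categories)
print(grouped)  # {'A': 500, 'B': 200, 'C': 120, 'Others': 180}
```

In this example the first bar is still larger than “Others”, in line with the aesthetic principle; with other data the `share` value may need adjusting to keep that true.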

Two bar charts. One with normal bars width, the other with too large bars width