Simple bar charts are among the most common charts in data visualization. They represent one or more quantitative variables across different categories using rectangles that share the same base width and whose heights are proportional to the magnitude of the quantitative variables analysed.
The origins of this chart can be traced to the end of the 18th century, when William Playfair (1759-1823), a Scottish political economist, published “The Commercial and Political Atlas” (1786), which included the chart below. The chart shows the imports and exports of Scotland in 1781. In language more familiar to BI professionals, this chart compares 2 measures (imports and exports) across 17 attributes of the country dimension.
As mentioned in the introduction, bar charts can analyse one or more quantitative variables across attributes, so we will look at the two cases separately: a single quantitative variable and multiple quantitative variables.
Now let’s consider the table below and build a bar chart from it.
Looking at the chart, the eye can capture multiple insights without performing any calculations on the data. In fact, the eye can break the bars into pieces and draw imaginary lines between the midpoints of the bars to make better comparisons.
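As a minimal sketch of the proportionality rule, the snippet below draws a text bar chart using only the Python standard library. The category names and sales figures are made up for illustration and are not the values from the table above; `bar_lengths` and `render_bars` are hypothetical helper names.

```python
def bar_lengths(values, max_width=40):
    """Scale each value to a bar length proportional to its magnitude.

    The largest value gets `max_width` characters and the baseline is 0.
    """
    largest = max(values.values())
    return {name: round(value / largest * max_width) for name, value in values.items()}


def render_bars(values, max_width=40):
    """Return a horizontal text bar chart, one line per category."""
    lengths = bar_lengths(values, max_width)
    label_width = max(len(name) for name in values)
    return "\n".join(
        f"{name:<{label_width}} | {'#' * lengths[name]} {values[name]}"
        for name in values
    )


# Hypothetical sales per category (illustrative numbers only).
sales = {"Furniture": 200, "Technology": 400, "Office": 100}
print(render_bars(sales))
```

Because every bar starts at 0 and is scaled by the same factor, a bar that is twice as long always stands for a value that is twice as large, which is what lets the eye compare categories without doing any arithmetic.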
When multiple quantitative variables are plotted, a legend is mandatory to distinguish the different variables/measures.
Let’s take the previous dataset and break the sales down by profit type (High, Medium, and Low). As said, this generates 3 different sales variables evaluated across the categories.
Now the chart lets us compare the data either within the categories or across them. The eye can focus on each of the three variables and observe the differences between the categories, or focus on the comparison inside each category. However, the overall comparison of total sales is no longer possible at a glance, so splitting a single-variable bar chart by a category leads to some loss of information.
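The two kinds of comparison, and the information that gets lost, can be illustrated with a small data-shaping sketch. The records are hypothetical (they are not the article's dataset), and `group_by_category` and `totals` are illustrative helper names.

```python
from collections import defaultdict

# Hypothetical long-format records: (category, profit type, sales).
records = [
    ("Furniture", "High", 50), ("Furniture", "Medium", 90), ("Furniture", "Low", 60),
    ("Technology", "High", 180), ("Technology", "Medium", 150), ("Technology", "Low", 70),
    ("Office", "High", 20), ("Office", "Medium", 30), ("Office", "Low", 50),
]


def group_by_category(rows):
    """Pivot rows into {category: {profit_type: sales}}, the shape a grouped bar chart needs."""
    grouped = defaultdict(dict)
    for category, profit_type, sales in rows:
        grouped[category][profit_type] = sales
    return dict(grouped)


def totals(grouped):
    """Recover the single-variable view by summing the series inside each category."""
    return {category: sum(series.values()) for category, series in grouped.items()}


grouped = group_by_category(records)

# Comparison inside one category: which profit type sells most in Technology?
best_in_technology = max(grouped["Technology"], key=grouped["Technology"].get)

# Comparison across categories for a single series.
high_by_category = {category: series["High"] for category, series in grouped.items()}

print(best_in_technology)
print(high_by_category)
print(totals(grouped))
```

The grouped chart draws the values in `grouped` directly, while the totals have to be computed separately. That extra step is the overall comparison the reader can no longer make by eye.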
Bar charts can be drawn either horizontally or vertically.
Generally, neither orientation is preferable, though vertical bars are mandatory when the categorical axis is time.
Labels in a bar chart are the text annotations placed above or below the bars. They indicate the number represented by each bar. In my opinion, labels are usually redundant when the value axis is shown, since they add nothing new to the chart. If instead the bars are drawn without a value axis, the labels are necessary.
The baseline is the starting value of the value axis. The baseline of a bar chart must be set to 0. Let’s look at a practical example to see the dangers of violating this principle.
In the example above we show two charts, one with the baseline at 0 and one with the baseline at 300.000€. For the left chart, with the baseline at 0, we have the following equality:
This means that the proportions of the visual correspond to the proportions in the data.
In the right chart, the green bar appears nearly six times taller than the pink bar. This means that the proportions of the visual do not correspond to the proportions in the data, so the chart conveys a message that differs from the original dataset.
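The distortion can be checked with a few lines of arithmetic. The sketch below compares the ratio of the drawn bar heights with the ratio of the underlying values. The two amounts are hypothetical, chosen only so that a 300.000€ baseline produces a visual ratio close to six; `visual_ratio` is an illustrative helper name.

```python
def visual_ratio(value_a, value_b, baseline=0):
    """Ratio of the drawn bar heights when the value axis starts at `baseline`."""
    if min(value_a, value_b) <= baseline:
        raise ValueError("both values must exceed the baseline")
    return (value_a - baseline) / (value_b - baseline)


# Hypothetical values in euros (not taken from the original figure).
green, pink = 358_000, 310_000

data_ratio = green / pink                                       # what the data says
ratio_at_zero = visual_ratio(green, pink, baseline=0)           # left chart
ratio_truncated = visual_ratio(green, pink, baseline=300_000)   # right chart

print(f"data ratio:            {data_ratio:.2f}")
print(f"visual ratio, base 0:  {ratio_at_zero:.2f}")
print(f"visual ratio, base 300k: {ratio_truncated:.2f}")
```

With a baseline of 0 the visual ratio equals the data ratio (about 1.15 for these numbers). With the truncated baseline it becomes 5.8, even though the green value is only about 15% larger than the pink one.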
The bars can be sorted in different ways:
In this case there is no general rule on the best practice to adopt.
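As a rough illustration, the snippet below applies three orderings to the same data. The orderings (descending value, ascending value, alphabetical) are common choices assumed here for the example, and the sales figures are hypothetical.

```python
# Hypothetical sales per category (illustrative numbers only).
sales = {"Furniture": 200, "Technology": 400, "Office": 100}

# Largest bar first: emphasises ranking.
by_value_desc = sorted(sales.items(), key=lambda item: item[1], reverse=True)

# Smallest bar first.
by_value_asc = sorted(sales.items(), key=lambda item: item[1])

# Alphabetical by category name: makes a specific category easy to find.
by_name = sorted(sales.items())

for title, ordering in [
    ("descending", by_value_desc),
    ("ascending", by_value_asc),
    ("alphabetical", by_name),
]:
    print(title, [name for name, _ in ordering])
```

Which ordering works best depends on what the reader should look for: a ranking, or a particular category.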
The space between the bars is mostly an aesthetic matter, so it deserves little attention unless an extreme spacing (too narrow or too wide) is used. The major risk is that the chart is confused with a histogram when the space between the bars is too small, as shown in the figure below.
This parameter indicates how many bars should be shown. The question may seem strange and useless, since the answer depends on the dataset. Nonetheless, we can address it by introducing the “others/rest of” category. This category aggregates all the categories that are not shown individually in the chart and is used to create a chart that fits on a page. Imagine a chart with 500 categories: it would be impossible to make a readable bar chart out of it. There is no general rule on what % of the total the “rest of” category should represent. Two principles can guide us: