Frequently Asked Questions

What is the algebraic structure of AlgebraOfGraphics?

AlgebraOfGraphics is based on two operators, + and *. These two operators induce a semiring structure, with a small caveat. Addition is commutative only up to the drawing order. For example, visual(Lines) + visual(Scatter) is slightly different from visual(Scatter) + visual(Lines), in that the former draws the scatter on top of the lines, and the latter draws the lines on top of the scatter. As a consequence, only right distributivity holds with full generality, whereas left distributivity only holds up to the drawing order.

Why is the mapping pair syntax different from DataFrames?

The transformations passed within a mapping, e.g. mapping(:x => log => "log(x)"), are applied element-wise. Operations that require the whole column are not supported on purpose. An important reason to prefer element-wise operations (other than performance) is that whole-column operations can be error prone in this setting, especially when

the data is grouped or
different datasets are used.

If you do need column-wise transformations, consider implementing a custom analysis, such as density, which takes the whole data as input, or apply the transformation directly in your data before passing it to AlgebraOfGraphics.

See also Pair syntax for a detailed description of the pair syntax within a mapping.

What is the difference between axis scales and data transformations?

There are two overlapping but distinct ways to rescale data.

Keep the data as is and use a nonlinear scale, e.g. axis=(xscale=log,).
Transform the data directly, e.g. mapping(:x => log => "log(x)").

Note that the resulting plots may "look different" in some cases. Consider for instance the following example.

using AlgebraOfGraphics
using AlgebraOfGraphics: density
df = (x = exp.(randn(1000)),)
kde1 = data(df) * mapping(:x) * density()
draw(kde1, axis=(width=225, height=225, xscale=log,))

df = (x = exp.(randn(1000)),)
kde2 = data(df) * mapping(:x => log => "log(x)") * density()
draw(kde2, axis=(width=225, height=225))

The two plots look different. The first represents the pdf of x in a log scale, while the second represents the pdf of log(x) in a linear scale. The two curves differ by a factor 1 / x, the derivative of log(x). See e.g. this post for some mathematical background on the topic.

In general, the second approach (plotting the density of log(x)) could be considered more principled, as it preserves the proportionality between area and probability mass. On the contrary, the first approach (plotting the density of x in a log scale) breaks this proportionality relationship.

A similar reasoning applies to histograms:

using AlgebraOfGraphics
df = (x = exp.(rand(1000)),)
hist1 = data(df) * mapping(:x) * histogram()
draw(hist1, axis=(width=225, height=225, xscale=log))

df = (x = exp.(rand(1000)),)
hist2 = data(df) * mapping(:x => log => "log(x)") * histogram()
draw(hist2, axis=(width=225, height=225))

The data transformation approach is preferable as it produces uniform bins, which are easier to interpret.