Tutorial

Here we will see what are the basic building blocks of AlgebraOfGraphics, and how to combine them to create complex plots based on tables or other formats.

Basic building blocks

The most important functions are mapping, and visual. mapping determines the mappings from data to plot. Its positional arguments correspond to the x, y or z axes of the plot, whereas the keyword arguments correspond to plot attributes that can vary continuously or discretely, such as color or markersize. Variables in mapping are split according to the categorical attributes in it, and then converted to plot attributes using a default palette. Finally visual can be used to give data-independent visual information about the plot (plotting function or attributes).

mapping and visual work in various context. In the following we will explore DataContext, which is introduced doing data(df) for any tabular mapping structure df. In this context, mapping accepts symbols and integers, which correspond to columns of the data.

Operations

The outputs of mapping, visual, and data can be combined with + or *, to generate an AlgebraicList object, which can then be plotted using the function draw. The actual drawing is done by AbstractPlotting.

The operation + is used to create separate layer. a + b has as many layers as la + lb, where la and lb are the number of layers in a and b respectively.

The operation a * b create la * lb layers, where la and lb are the number of layers in a and b respectively. Each layer of a * b contains the combined information of the corresponding layer in a and the corresponding layer in b. In simple cases, however, both a and b will only have one layer, and a * b simply combines the information.

Working with tables

using RDatasets: dataset
using AlgebraOfGraphics, CairoMakie
mpg = dataset("ggplot2", "mpg");
cols = mapping(:Displ, :Hwy);
grp = mapping(color = :Cyl => categorical);
scat = visual(Scatter)
pipeline = cols * scat
data(mpg) * pipeline |> draw

Now let's simply add grp to the pipeline to color according to :Cyl.

data(mpg) * grp * pipeline |> draw

Traces can be added together with +.

using AlgebraOfGraphics: linear
pipenew = cols * (scat + linear)
data(mpg) * pipenew |> draw

We can put grouping in the pipeline (we get a warning because of a degenerate group).

data(mpg) * grp * pipenew |> draw
┌ Warning: Linear fit not possible for the given data
└ @ AlgebraOfGraphics ~/work/AlgebraOfGraphics.jl/AlgebraOfGraphics.jl/src/analysis/smooth.jl:33

This is a more complex example, where we split the scatter plot, but do the linear regression with all the data. Moreover, we pass weights to linear to compute the regression line with weighted least squares.

different_grouping = grp * scat + linear * mapping(wts=:Hwy)
data(mpg) * cols * different_grouping |> draw

Different analyses are also possible, always with the same syntax:

using AlgebraOfGraphics: smooth, density, frequency, reducer
data(mpg) * cols * grp * (scat + smooth(span = 0.8)) |> draw

data(mpg) * cols * density |> draw

data(mpg) * mapping(:Cyl => categorical) * frequency |> draw

data(mpg) * mapping(:Cty, :Hwy) * reducer(agg = +) |> draw

We can also add visual information that only makes sense in one recipe (e.g. markersize) by multiplying them:

newmapping = mapping(markersize = :Cyl) * visual(markersize = (0.1, 5))
data(mpg) * cols * (scat * newmapping + smooth(span = 0.8)) |> draw

Layout

Thanks to the MakieLayout package it is possible to create plots where categorical variables inform the layout.

iris = dataset("datasets", "iris")
cols = mapping(:SepalLength, :SepalWidth)
grp = mapping(layout_x = :Species)
geom = visual(Scatter) + linear
data(iris) * cols * grp * geom |> draw

iris = dataset("datasets", "iris")
cols = mapping(:SepalLength)
grp = mapping(layout_x = :Species)
geom = AlgebraOfGraphics.histogram
data(iris) * cols * grp * geom |> draw

Non tabular mapping (slicing context)

The framework is not specific to tables, but can be used in different contexts. For instance, dims() introduces a context where each entry of the array corresponds to a trace.

x = [-pi..0, 0..pi]
y = [sin cos] # We use broadcasting semantics on `tuple.(x, y)`.
dims() * mapping(x, y, color = dims(1), linestyle = dims(2)) |> draw

using Distributions
distributions = InverseGaussian.(1:4, [6 10])
dims() * mapping(fill(0..5), distributions, color = dims(1), linestyle = dims(2)) |> draw

More generally, one can pass arguments to dims to implement the "slices are series" approach.

s = dims(1) * mapping(rand(50, 3), rand(50, 3, 2))
grp = mapping(color = dims(2), layout_x = dims(3))
s * grp * visual(Scatter) |> draw

This approach can be used in combination with the tabular context to work with "wide" data, where grouping is done by column.

iris = dataset("datasets", "iris")
cols = mapping([:SepalLength, :SepalWidth], [:PetalLength :PetalWidth])
grp = mapping(layout_x = dims(1), layout_y = dims(2), color = :Species)
geom = visual(Scatter) + linear
data(iris) * cols * grp * geom |> draw


This page was generated using Literate.jl.