Mapping

Mappings determine how the data is translated into a plot. For example, this mapping maps columns weight and height to positional arguments 1 and 2, and age to the markersize attribute of the Scatter plotting function:

mapping(:weight, :height, markersize = :age)

AlgebraOfGraphics.mapping — Function

mapping(positional...; named...)

Create a Layer with positional and named selectors. These selectors will be translated into input data for the Makie plotting function or AlgebraOfGraphics analysis that is chosen to visualize the Layer.

A Layer created with mapping does not have a data source by default. You can add one by multiplying it with the output of the data function.

The positional and named selectors of mapping are converted to actual input data for the plotting function that will be selected via visual. The translation from selector to data differs according to the data source.
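As a minimal sketch of how these pieces combine, the following attaches a data source to the introductory mapping and selects a plotting function with visual. The table df and its values are made up for illustration.

```julia
using AlgebraOfGraphics
using CairoMakie

# Hypothetical example table; any Tables.jl-compatible object should work here.
df = (;
    weight = [60.0, 72.5, 81.0, 55.0],
    height = [1.62, 1.75, 1.80, 1.58],
    age = [25, 40, 33, 19],
)

# The mapping alone carries no data. Multiplying with `data(df)` attaches the table,
# and `visual(Scatter)` selects the plotting function that receives the selectors.
layer = data(df) * mapping(:weight, :height, markersize = :age) * visual(Scatter)
fg = draw(layer)
```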

Tabular data

When a mapping is combined with a data(tabular) where tabular is some Tables.jl-compatible object, each argument will be interpreted as a column selector. Additionally, it's allowed to specify columns outside of the dataset directly by wrapping the values in direct. The values can either be vectors that have to match the number of rows from the tabular data, or scalars that will be expanded as if they were a column filled with the same value.

mapping(
    :x,                        # column named "x"
    "a column";                # column named "a column"
    color = 1,                 # first column
    marker = direct("abc"),    # a new column filled with the string "abc"
    linestyle = direct(1:3),   # a new column, length must match the table
)

nothing

If no data is set, each entry of mapping should be an AbstractVector that specifies a column of data directly. Scalars, such as strings, are expanded as if they were a column filled with the same value. This is useful when a legend should be shown but there is only one group.

mapping(
    1:3,               # a column with values 1 to 3
    [4, 5, 6],         # a column with values 4 to 6
    color = "group 1", # a column with repeated value "group 1"
)
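To turn the mapping above into a complete plot, it can be combined with a visual and drawn without any data(...) call. This is a sketch that assumes an AlgebraOfGraphics version supporting the data-less mode described here. The choice of Scatter is arbitrary.

```julia
using AlgebraOfGraphics
using CairoMakie

# No `data(...)` factor: the vectors are used as columns directly,
# and the scalar string is expanded to one value per row.
layer = mapping(1:3, [4, 5, 6], color = "group 1") * visual(Scatter)
fg = draw(layer)
```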

Pregrouped

With data(Pregrouped()) * mapping(...) or the shortcut pregrouped(...), each element in mapping specifies input data directly, like with nothing. However, in this mode, the data must already be grouped when it is passed in. Categorical variables should come as a vector of categories, while numerical variables should come as a vector of vectors of values, with as many inner vectors as there are groups in the categorical variables.

pregrouped(
    [[1, 2, 3], [4, 5]], # two grouped vectors, of length 3 and 2
    color = ["A", "B"]   # a vector with two categorical group values
)
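A slightly fuller sketch of the pregrouped mode adds a second positional argument so that each group has x and y values. The numbers and the choice of Lines are illustrative only.

```julia
using AlgebraOfGraphics
using CairoMakie

# One inner vector per group; x and y must agree in length within each group.
x = [[1, 2, 3], [4, 5]]
y = [[2, 3, 1], [5, 4]]

# One categorical value per group, in the same order as the inner vectors.
layer = pregrouped(x, y, color = ["A", "B"]) * visual(Lines)
fg = draw(layer)
```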


Aesthetics

The structure of a mapping is always directly tied to the signature of the plotting function (or analysis) that it is being connected with. What visual aspects of the plot the positional or keyword arguments affect depends on the plotting function in use.

To be used with AlgebraOfGraphics, a plotting function has to declare which aesthetics (like X, Y, Color, MarkerSize, LineStyle) its arguments map to. This mechanism allows AlgebraOfGraphics to correctly convert the raw input data into visual attributes for each plotting function and to correctly create and label axes, colorbars and legends.

Aesthetics can also change depending on attributes passed to visual.

For example, for a BarPlot, args 1 and 2 correspond to the X and Y aesthetics by default. But if you change the direction in the visual, then axis labels shift accordingly because the aesthetic mapping has changed to 1 = Y, 2 = X:

using AlgebraOfGraphics
using CairoMakie

df = (; name = ["Anna", "Beatrix", "Claire"], height_meters = [1.55, 1.76, 1.63])
m = mapping(:name, :height_meters)
spec1 = data(df) * m * visual(BarPlot)
spec2 = data(df) * m * visual(BarPlot, direction = :x)

f = Figure()
draw!(f[1, 1], spec1)
draw!(f[1, 2], spec2)
f

Hardcoded aesthetics

Most aesthetics are tied to specific attributes of plot types, for example AesColor to strokecolor of Scatter. A few aesthetics, however, are hardcoded to belong to certain mapping keywords, independent of the plot type in use.

These are layout, row and col for faceting, group for creating a separate plot for each group (like separate lines instead of one long line), and dodge_x and dodge_y for dodging.
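The following sketch shows two of these keywords together: group splits the data into one line per subject, and col places each condition in its own facet. The table and its column names are invented for illustration.

```julia
using AlgebraOfGraphics
using CairoMakie

# Hypothetical long-format data: four subjects, five time points each,
# two subjects per condition.
df = (;
    t = repeat(1:5, 4),
    value = [1, 2, 3, 4, 5, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1, 6, 5, 4, 3, 2],
    subject = repeat(["s1", "s2", "s3", "s4"], inner = 5),
    condition = repeat(["control", "treatment"], inner = 10),
)

# `group` draws a separate line per subject without adding a legend entry;
# `col` creates one facet column per condition.
plt = data(df) *
    mapping(:t, :value, group = :subject, col = :condition) *
    visual(Lines)
fg = draw(plt)
```

Without group = :subject, the two subjects in each facet would be connected into a single line.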

Dodging

Dodging refers to the shifting of plots on a (usually categorical) scale depending on the group they belong to. It is used to avoid overlaps. Some plot types, like BarPlot, have their own dodge keyword because their dodging logic additionally needs to transform the visual elements (for example, dodging a bar plot makes thinner bars). For all other plot types, you can use the generic dodge_x and dodge_y keywords.

They work by shifting each categorical group by some value that depends on the chosen "dodge width". The dodge width refers to the width that all dodged elements in a group add up to at a given point. Some plot types have an inherent width, like barplots. Others have no width, like scatters or errorbars. For those plot types that have no width to use for dodging, you have to specify one manually in scales.

Here's an example of a manual width selection:

using AlgebraOfGraphics
using CairoMakie

df = (
   x = repeat(1:10, inner = 2),
   y = cos.(range(0, 2pi, length = 20)),
   ylow = cos.(range(0, 2pi, length = 20)) .- 0.2,
   yhigh = cos.(range(0, 2pi, length = 20)) .+ 0.3,
   dodge = repeat(["A", "B"], 10)
)

f = Figure()
plt = data(df) * (
   mapping(:x, :y, dodge_x = :dodge, color = :dodge) * visual(Scatter) +
   mapping(:x, :ylow, :yhigh, dodge_x = :dodge, color = :dodge) * visual(Rangebars)
)
draw!(f[1, 1], plt, scales(DodgeX = (; width = 1)), axis = (; title = "width = 1"))
draw!(f[1, 2], plt, scales(DodgeX = (; width = 0.75)), axis = (; title = "width = 0.75"))
draw!(f[2, 1], plt, scales(DodgeX = (; width = 0.5)), axis = (; title = "width = 0.5"))
draw!(f[2, 2], plt, scales(DodgeX = (; width = 0.25)), axis = (; title = "width = 0.25"))
f

A common scenario is plotting errorbars on top of barplots. In this case, AlgebraOfGraphics can detect the inherent dodging width of the barplots and adjust accordingly for the errorbars. Note in this example how choosing a manual dodging width only applies to the errorbars (because the barplot plot type handles this internally) and potentially leads to a misalignment between the different plot elements:

using AlgebraOfGraphics
using CairoMakie

df = (
   x = repeat(1:10, inner = 2),
   y = cos.(range(0, 2pi, length = 20)),
   ylow = cos.(range(0, 2pi, length = 20)) .- 0.2,
   yhigh = cos.(range(0, 2pi, length = 20)) .+ 0.3,
   dodge = repeat(["A", "B"], 10)
)

f = Figure()
plt = data(df) * (
   mapping(:x, :y, dodge = :dodge, color = :dodge) * visual(BarPlot) +
   mapping(:x, :ylow, :yhigh, dodge_x = :dodge) * visual(Rangebars)
)
draw!(f[1, 1], plt, axis = (; title = "No width specified, auto-determined by AlgebraOfGraphics"))
draw!(f[2, 1], plt, scales(DodgeX = (; width = 0.25)), axis = (; title = "Manually specifying width = 0.25 leads to a mismatch"))
f

Pair syntax

The Pair operator => can be used for three different purposes within mapping:

renaming columns
transforming columns by row
mapping data to a custom scale

Renaming columns

using AlgebraOfGraphics
using CairoMakie

data((; name = ["Anna", "Beatrix", "Claire"], height_meters = [1.55, 1.76, 1.63])) *
   mapping(:name => "Name", :height_meters => "Height (m)") *
   visual(BarPlot) |> draw

Transforming columns

If a Function is paired to the column selector, it is applied by row to the data. Often, you will want to also assign a new name that fits the transformed data, in which case you can use the three-element column => transformation => name syntax:

using AlgebraOfGraphics
using CairoMakie

data((; name = ["Anna", "Beatrix", "Claire"], height_meters = [1.55, 1.76, 1.63])) *
   mapping(:name => (n -> n[1] * "."), :height_meters => (x -> x * 100) => "Height (cm)") *
   visual(BarPlot) |> draw

Row-by-row versus whole-column operations

The pair syntax acts row by row, unlike, e.g., DataFrames.transform. This has several advantages:

Simpler for the user in most cases.
Less error-prone, especially
- with grouped data (should a column operation apply to each group or the whole dataset?)
- when several datasets are used

Naturally, this also incurs some downsides, as whole-column operations, such as z-score standardization, are not supported: they should be done by adding a new column to the underlying dataset beforehand.
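As a sketch of that workaround, the z-score can be computed once on the whole column and stored as a new column before the data reaches mapping. The column name height_z is a hypothetical choice, and merge is used here only because the example table is a NamedTuple.

```julia
using AlgebraOfGraphics
using CairoMakie
using Statistics

df = (; name = ["Anna", "Beatrix", "Claire"], height_meters = [1.55, 1.76, 1.63])

# Whole-column operation: needs the mean and standard deviation of all rows,
# so it cannot be expressed as a row-by-row function inside `mapping`.
mu, sigma = mean(df.height_meters), std(df.height_meters)
df_z = merge(df, (; height_z = (df.height_meters .- mu) ./ sigma))

# The precomputed column is then selected like any other column.
plt = data(df_z) * mapping(:name, :height_z => "Height (z-score)") * visual(BarPlot)
fg = draw(plt)
```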

Functions of several arguments

In the case of functions of several arguments, such as isequal, the input variables must be passed as a Tuple.

accuracy = (:species, :predicted_species) => isequal => "accuracy"
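A runnable sketch of the same pattern with a numerical result: two columns are passed as a Tuple to a two-argument function, which is applied row by row. The weight_kg column and the helper bmi are made up for this example.

```julia
using AlgebraOfGraphics
using CairoMakie

df = (;
    name = ["Anna", "Beatrix", "Claire"],
    weight_kg = [52.0, 68.0, 60.0],
    height_meters = [1.55, 1.76, 1.63],
)

# Hypothetical helper: receives one value from each selected column per row.
bmi(w, h) = w / h^2

plt = data(df) *
    mapping(:name, (:weight_kg, :height_meters) => bmi => "BMI") *
    visual(BarPlot)
fg = draw(plt)
```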

Helper functions

Some helper functions are provided, which can be used within the pair syntax to either rename and reorder unique values of a categorical column on the fly or to signal whether a numerical column should be treated as categorical.

The complete API of helper functions is available at Mapping helpers, but here are a few examples:

# column `train` has two unique values, `true` and `false`
:train => renamer([true => "training", false => "testing"]) => "Dataset"
# column `price` has three unique values, `"low"`, `"medium"`, and `"high"`
:price => sorter(["low", "medium", "high"])
# column `age` is expressed in integers and we want to treat it as categorical
:age => nonnumeric
# column `labels` is expressed in strings and we do not want to treat it as categorical
:labels => verbatim
# wrap categorical values to signal that the order from the data source should be respected
:weight => presorted
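To show how these helpers slot into a full specification, here is a sketch that combines sorter and renamer in one mapping. The table, its values and the choice of Scatter are assumptions for illustration.

```julia
using AlgebraOfGraphics
using CairoMakie

# Hypothetical data using the `price` and `train` columns described above.
df = (;
    price = ["low", "high", "medium", "low", "high", "medium"],
    sales = [3, 9, 5, 4, 8, 6],
    train = [true, true, true, false, false, false],
)

plt = data(df) *
    mapping(
        # order the categories explicitly instead of alphabetically
        :price => sorter(["low", "medium", "high"]) => "Price",
        :sales => "Sales",
        # relabel the two unique values and give the legend a title
        color = :train => renamer([true => "training", false => "testing"]) => "Dataset",
    ) *
    visual(Scatter)
fg = draw(plt)
```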

Custom scales

All columns mapped to the same aesthetic type are represented using the same scale by default. This is evident if you plot two different datasets with two different plot types.

In the following example, both Scatter and HLines use the Color aesthetic, Scatter for the strokecolor keyword and HLines for color. A single merged legend is rendered for both, which does not have a title because it derives from two differently named columns.

using AlgebraOfGraphics
using CairoMakie

df_a = (; x = 1:9, y = [1, 2, 3, 5, 6, 7, 9, 10, 11], group = repeat(["A", "B", "C"], inner = 3))

spec1 = data(df_a) * mapping(:x, :y, strokecolor = :group) * visual(Scatter, color = :transparent, strokewidth = 3, markersize = 15)

df_b = (; y = [4, 8], threshold = ["first", "second"])

spec2 = data(df_b) * mapping(:y, color = :threshold) * visual(HLines)

draw(spec1 + spec2)

If we want to have separate legends for both, we can assign a custom scale identifier to either the strokecolor or the color mapping. The name can be chosen freely; it serves only to disambiguate.

spec2_custom_scale = data(df_b) * mapping(:y, color = :threshold => scale(:color2)) * visual(HLines)

draw(spec1 + spec2_custom_scale)

Each scale can be customized further by passing configuration options via scales as the second argument of the draw function. More information on scale options can be found under Scale options.

As an example, we can pass separate colormaps using the palette keyword:

spec2_custom_scale = data(df_b) * mapping(:y, color = :threshold => scale(:color2)) * visual(HLines)

draw(
   spec1 + spec2_custom_scale,
   scales(
      Color = (; palette = [:red, :green, :blue]),
      color2 = (; palette = [:gray30, :gray80]),
   )
)