Mapping

Mappings determine how the data is translated into a plot. For example, this mapping maps columns weight and height to positional arguments 1 and 2, and age to the markersize attribute of the Scatter plotting function:

mapping(:weight, :height, markersize = :age)
AlgebraOfGraphics.mapping (Function)
mapping(positional...; named...)

Create a Layer with positional and named selectors. These selectors will be translated into input data for the Makie plotting function or AlgebraOfGraphics analysis that is chosen to visualize the Layer.

A Layer created with mapping does not have a data source by default; you can add one by multiplying it with the output of the data function.

The positional and named selectors of mapping are converted to actual input data for the plotting function that is selected via visual. How a selector is translated into data depends on the data source.
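
As a sketch of that workflow, the following builds the mapping from the opening example on its own and only afterwards attaches a data source. The table df and its values are hypothetical, and CairoMakie is assumed as the Makie backend.

```julia
using AlgebraOfGraphics
using CairoMakie

# A mapping on its own only stores selectors; it has no data source yet
m = mapping(:weight, :height, markersize = :age)

# Hypothetical example table: a NamedTuple of vectors is Tables.jl-compatible
df = (; weight = [60.0, 72.5, 81.0], height = [1.62, 1.75, 1.83], age = [25, 40, 33])

# Multiplying by `data(df)` attaches the table; `visual` chooses the plot type
layer = data(df) * m * visual(Scatter)

fig = draw(layer)
```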

Tabular data

When a mapping is combined with data(tabular), where tabular is some Tables.jl-compatible object, each argument is interpreted as a column selector. Columns that are not part of the dataset can also be specified directly by wrapping the values in direct. The values can either be vectors, which must match the number of rows of the tabular data, or scalars, which are expanded as if they were a column filled with the same value.

mapping(
    :x,                        # column named "x"
    "a column";                # column named "a column"
    color = 1,                 # first column
    marker = direct("abc"),    # a new column filled with the string "abc"
    linestyle = direct(1:3),   # a new column, length must match the table
)

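
The selectors above are only fragments. A complete version of the direct mechanism could look like the following sketch; the table and the values passed to direct are made up for illustration, and CairoMakie is assumed as the backend.

```julia
using AlgebraOfGraphics
using CairoMakie

df = (; x = 1:3, y = [2.0, 4.0, 3.0])

spec = data(df) * mapping(
    :x,
    :y,
    color = direct("measured"),        # scalar: expanded to a column of "measured"
    marker = direct(["a", "b", "c"]),  # vector: length must equal the 3 rows of df
) * visual(Scatter)

draw(spec)
```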

If no data is set, each entry of mapping should be an AbstractVector that specifies a column of data directly. Scalars, such as strings, are expanded as if they were a column filled with the same value. This is useful when a legend should be shown even though there is only one group.

mapping(
    1:3,               # a column with values 1 to 3
    [4, 5, 6],         # a column with values 4 to 6 
    color = "group 1", # a column with repeated value "group 1"         
)
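
Combined with a plot type, such a data-free mapping can be drawn directly. This minimal sketch assumes Scatter as the plot type and CairoMakie as the backend; the single color value should produce a legend with one entry.

```julia
using AlgebraOfGraphics
using CairoMakie

# No `data` call: the vectors are the columns, and "group 1" is expanded
# to a column of the same length (3) filled with that string
spec = mapping(1:3, [4, 5, 6], color = "group 1") * visual(Scatter)

draw(spec)
```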

Pregrouped

With data(Pregrouped()) * mapping(...), or the shortcut pregrouped(...), each element in mapping specifies input data directly, as when no data is set. In this mode, however, the data must already be grouped. Categorical variables should come as a vector of categories, while numerical variables should come as a vector of vectors of values, with as many inner vectors as there are groups in the categorical variables.

pregrouped(
    [[1, 2, 3], [4, 5]], # two grouped vectors, of length 3 and 2
    color = ["A", "B"]   # a vector with two categorical group values
)
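
A fuller sketch of the pregrouped mode, assuming Lines as the plot type and CairoMakie as the backend, adds a second positional argument with y values. Each positional argument holds one inner vector per group, the inner x and y vectors of a group have matching lengths, and color holds one category per group.

```julia
using AlgebraOfGraphics
using CairoMakie

spec = pregrouped(
    [[1, 2, 3], [4, 5]],             # x: group A has 3 points, group B has 2
    [[1.0, 4.0, 9.0], [16.0, 25.0]], # y: same group structure as x
    color = ["A", "B"],              # one categorical value per group
) * visual(Lines)

draw(spec)
```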

Aesthetics

The structure of a mapping is always directly tied to the signature of the plotting function (or analysis) that it is being connected with. What visual aspects of the plot the positional or keyword arguments affect depends on the plotting function in use.

To be used with AlgebraOfGraphics, a plotting function has to declare which aesthetics (like X, Y, Color, MarkerSize, LineStyle) its arguments map to. This mechanism allows AlgebraOfGraphics to correctly convert the raw input data into visual attributes for each plotting function and to correctly create and label axes, colorbars and legends.

Aesthetics can also change depending on attributes passed to visual.

For example, for a BarPlot, arguments 1 and 2 correspond to the X and Y aesthetics by default. But if you set direction = :x in visual, the axis labels swap accordingly, because the aesthetic mapping has changed to 1 = Y, 2 = X:

using AlgebraOfGraphics
using CairoMakie

df = (; name = ["Anna", "Beatrix", "Claire"], height_meters = [1.55, 1.76, 1.63])
m = mapping(:name, :height_meters)
spec1 = data(df) * m * visual(BarPlot)
spec2 = data(df) * m * visual(BarPlot, direction = :x)

f = Figure()
draw!(f[1, 1], spec1)
draw!(f[1, 2], spec2)
f

Pair syntax

The Pair operator => can be used for three different purposes within mapping:

  • renaming columns
  • transforming columns by row
  • mapping data to a custom scale

Renaming columns

using AlgebraOfGraphics
using CairoMakie

data((; name = ["Anna", "Beatrix", "Claire"], height_meters = [1.55, 1.76, 1.63])) *
   mapping(:name => "Name", :height_meters => "Height (m)") *
   visual(BarPlot) |> draw

Transforming columns

If a Function is paired with the column selector, it is applied by row to the data. Often, you will also want to assign a new name that fits the transformed data, in which case you can use the three-element column => transformation => name syntax:

using AlgebraOfGraphics
using CairoMakie

data((; name = ["Anna", "Beatrix", "Claire"], height_meters = [1.55, 1.76, 1.63])) *
   mapping(:name => (n -> n[1] * "."), :height_meters => (x -> x * 100) => "Height (cm)") *
   visual(BarPlot) |> draw

Row-by-row versus whole-column operations

The pair syntax acts row by row, unlike, e.g., DataFrames.transform. This has several advantages.

  • Simpler for the user in most cases.
  • Less error-prone, especially
    • with grouped data (should a column operation apply to each group or the whole dataset?)
    • when several datasets are used

The downside is that whole-column operations, such as z-score standardization, are not supported. They should be done beforehand, by adding a new column to the underlying dataset.
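
A minimal sketch of that workaround, assuming the Statistics standard library for mean and std and a hypothetical height_z column: the z-scores are computed over the whole column before the data reaches AlgebraOfGraphics, and mapping then only selects the precomputed column.

```julia
using AlgebraOfGraphics
using CairoMakie
using Statistics: mean, std

heights = [1.55, 1.76, 1.63]

# Whole-column operation done up front, outside of `mapping`
z = (heights .- mean(heights)) ./ std(heights)

df = (; name = ["Anna", "Beatrix", "Claire"], height_meters = heights, height_z = z)

spec = data(df) * mapping(:name, :height_z => "Height (z-score)") * visual(BarPlot)
draw(spec)
```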

Functions of several arguments

For functions of several arguments, such as isequal, the input columns must be passed as a Tuple of selectors:

accuracy = (:species, :predicted_species) => isequal => "accuracy"
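
The same tuple form should work for any function of several arguments. In this sketch, the weight_kg column and its values are hypothetical additions to the height example; the anonymous function receives one value from each selected column per row and returns the body mass index.

```julia
using AlgebraOfGraphics
using CairoMakie

df = (;
    name = ["Anna", "Beatrix", "Claire"],
    weight_kg = [52.0, 68.0, 60.0],      # hypothetical values
    height_meters = [1.55, 1.76, 1.63],
)

# `=>` is right-associative: (columns) => (function => label)
bmi = (:weight_kg, :height_meters) => ((w, h) -> w / h^2) => "BMI"

spec = data(df) * mapping(:name, bmi) * visual(BarPlot)
draw(spec)
```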

Helper functions

Some helper functions are provided, which can be used within the pair syntax either to rename and reorder the unique values of a categorical column on the fly or to signal how a column should be interpreted, for example whether a numerical column should be treated as categorical.

The complete API of helper functions is available at Mapping helpers, but here are a few examples:

# column `train` has two unique values, `true` and `false`
:train => renamer([true => "training", false => "testing"]) => "Dataset"
# column `price` has three unique values, `"low"`, `"medium"`, and `"high"`
:price => sorter(["low", "medium", "high"])
# column `age` is expressed in integers and we want to treat it as categorical
:age => nonnumeric
# column `labels` is expressed in strings and we do not want to treat it as categorical
:labels => verbatim
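
Put together in a complete layer, these helpers might be used as follows. The dataset is hypothetical, and Scatter with CairoMakie is an assumed choice of plot type and backend; sorter fixes the order of the price categories on the x axis, and renamer relabels the Boolean train values in the legend.

```julia
using AlgebraOfGraphics
using CairoMakie

df = (;
    price = ["high", "low", "medium", "low"],
    sales = [3, 10, 6, 8],
    train = [true, false, true, false],
)

spec = data(df) * mapping(
    :price => sorter(["low", "medium", "high"]) => "Price",  # fixed category order
    :sales => "Sales",
    color = :train => renamer([true => "training", false => "testing"]) => "Dataset",
) * visual(Scatter)

draw(spec)
```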

Custom scales

All columns mapped to the same aesthetic type are represented using the same scale by default. This is evident if you plot two different datasets with two different plot types.

In the following example, both Scatter and HLines use the Color aesthetic, Scatter for the strokecolor keyword and HLines for color. A single merged legend is rendered for both, which does not have a title because it derives from two differently named columns.

using AlgebraOfGraphics
using CairoMakie

df_a = (; x = 1:9, y = [1, 2, 3, 5, 6, 7, 9, 10, 11], group = repeat(["A", "B", "C"], inner = 3))

spec1 = data(df_a) * mapping(:x, :y, strokecolor = :group) * visual(Scatter, color = :transparent, strokewidth = 3, markersize = 15)

df_b = (; y = [4, 8], threshold = ["first", "second"])

spec2 = data(df_b) * mapping(:y, color = :threshold) * visual(HLines)

draw(spec1 + spec2)

If we want separate legends for the two, we can assign a custom scale identifier to either the strokecolor or the color mapping. The name can be chosen freely; it serves only to disambiguate.

spec2_custom_scale = data(df_b) * mapping(:y, color = :threshold => scale(:color2)) * visual(HLines)

draw(spec1 + spec2_custom_scale)

Each scale can be customized further by passing configuration options via scales as the second argument of the draw function. More information on scale options can be found under Scale options.

As an example, we can pass separate color palettes using the palette keyword:

spec2_custom_scale = data(df_b) * mapping(:y, color = :threshold => scale(:color2)) * visual(HLines)

draw(
   spec1 + spec2_custom_scale,
   scales(
      Color = (; palette = [:red, :green, :blue]),
      color2 = (; palette = [:gray30, :gray80]),
   )
)