Intro to AoG - IV - Data transformations

In the previous chapters, we have seen two different features of the mapping function, column selection and labelling:

julia
using AlgebraOfGraphics
using CairoMakie
using DataFrames

penguins = DataFrame(AlgebraOfGraphics.penguins())

layer = data(penguins) *
    mapping(
        :bill_length_mm => "Bill length (mm)",
        :bill_depth_mm => "Bill depth (mm)",
        color = :species => "Species",
    ) *
    visual(Scatter)
draw(layer)

Row transformations

There is another feature which allows you to transform your input data before it is passed on to AoG's grouping and drawing machinery. If you pair a column selector with a Function, this function will be applied to each row of that column. For example, we can convert mm to cm in our scatter plot:

julia
layer_cm = data(penguins) *
    mapping(
        :bill_length_mm => (x -> x / 10) => "Bill length (cm)",
        :bill_depth_mm => (x -> x / 10) => "Bill depth (cm)",
        color = :species => "Species",
    ) *
    visual(Scatter)
draw(layer_cm)

A transformation does not change the label: by default it stays the same as for the untransformed column. Take care to supply a corrected label yourself, for example when the label contains units.
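As a rough mental model, pairing a column with a function behaves like broadcasting that function over the column's values. The sketch below uses plain Julia with made-up numbers; `to_cm` and the vector names are illustrative and not part of AoG.

```julia
# Illustrative values only, not taken from the penguins dataset.
bill_length_mm = [39.1, 46.5, 50.0]

# Hypothetical helper: converts a single value from mm to cm.
to_cm(x) = x / 10

# Broadcasting calls `to_cm` once per element, which mirrors what
# `:bill_length_mm => to_cm` does row by row inside `mapping`.
bill_length_cm = to_cm.(bill_length_mm)
```

The function only ever sees one value at a time, which is why it cannot depend on column-wide statistics.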

Note that AoG offers no mechanism to transform input data by column instead of by row. An example of a by-column transformation is a z-transform, which requires the mean and standard deviation of the whole column. You have to apply such transformations to your data before plotting.
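One way to do this, sketched here under the assumption that the two bill columns contain no missing values, is to compute the z-scores with `transform` from DataFrames.jl and then plot the new columns. The helper `zscore` and the column names `bill_length_z` and `bill_depth_z` are made up for this example.

```julia
using AlgebraOfGraphics
using CairoMakie
using DataFrames
using Statistics: mean, std

penguins = DataFrame(AlgebraOfGraphics.penguins())

# Hypothetical helper: operates on a whole column, because it needs
# the column mean and standard deviation.
zscore(v) = (v .- mean(v)) ./ std(v)

# `transform` passes each full column to `zscore` and stores the result
# in a new column, leaving the original columns untouched.
penguins_z = transform(
    penguins,
    :bill_length_mm => zscore => :bill_length_z,
    :bill_depth_mm => zscore => :bill_depth_z,
)

layer_z = data(penguins_z) *
    mapping(
        :bill_length_z => "Bill length (z-score)",
        :bill_depth_z => "Bill depth (z-score)",
        color = :species => "Species",
    ) *
    visual(Scatter)
draw(layer_z)
```

Note that this standardizes across all species together; a per-species z-score would need a grouped transformation before plotting.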

Transforming multiple columns

We can actually select two or more columns together in a mapping by wrapping them in a tuple, and we can then apply a transformation function that returns a single value. For example, we can calculate an approximation of "bill area" by multiplying bill depth and length (we have to wrap the standalone * operator in parentheses so this parses correctly):

julia
area_spec = data(penguins) *
    mapping(
        :species,
        (:bill_length_mm, :bill_depth_mm) => (*) => "bill area",
    ) *
    visual(Violin, orientation = :horizontal)

draw(area_spec)
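Any function that accepts one argument per selected column can take the place of `*`. As a sketch, the hypothetical helper `bill_area_cm2` below converts both measurements to cm before multiplying them, so the label can state a unit. It assumes `penguins` is loaded as at the top of this chapter.

```julia
using AlgebraOfGraphics
using CairoMakie
using DataFrames

penguins = DataFrame(AlgebraOfGraphics.penguins())

# Hypothetical helper: called once per row with one value from each
# selected column, in the order the columns appear in the tuple.
bill_area_cm2(length_mm, depth_mm) = (length_mm / 10) * (depth_mm / 10)

area_cm_spec = data(penguins) *
    mapping(
        :species,
        (:bill_length_mm, :bill_depth_mm) => bill_area_cm2 => "Bill area (cm²)",
    ) *
    visual(Violin, orientation = :horizontal)

draw(area_cm_spec)
```

A named function like this also avoids the parentheses that a bare operator needs.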

Special transformations

AoG contains a number of special functions which are intended as a way to conveniently control certain aspects of plotting, rather than numerically transforming data. The functions presented here are renamer, sorter, presorted, nonnumeric and verbatim. When used as transformation functions, all of these wrap the input data in special types, and AoG's machinery dispatches on those types to enable special behavior.

renamer

Calling renamer(vec), where vec is a vector of pairs, returns a function that renames the first element of each pair to the second element.

julia
layer_renamed = data(penguins) *
    mapping(
        :bill_length_mm,
        :bill_depth_mm,
        color = :species => renamer(["Adelie" => "Ad.", "Chinstrap" => "Ch.", "Gentoo" => "Ge."]),
    ) *
    visual(Scatter)
draw(layer_renamed)

This renaming affects the order and the labelling of those groups but not the grouping itself. In the next example, note how Ad. moves to the back because it is now listed last, and how the Chinstrap and Gentoo groups do not merge although we label them both "Ch/Ge":

julia
layer_renamed = data(penguins) *
    mapping(
        :bill_length_mm,
        :bill_depth_mm,
        color = :species => renamer(["Chinstrap" => "Ch/Ge", "Gentoo" => "Ch/Ge", "Adelie" => "Ad."]),
    ) *
    visual(Scatter)
draw(layer_renamed)

sorter

The sorter utility is similar to renamer but it only affects the order of elements and doesn't rename them:

julia
layer_sorted = data(penguins) *
    mapping(
        :bill_length_mm,
        :bill_depth_mm,
        color = :species => sorter(["Gentoo", "Chinstrap", "Adelie"]),
    ) *
    visual(Scatter)
draw(layer_sorted)

presorted

The presorted function wraps values so that AlgebraOfGraphics remembers the order in which it encountered them. Normally, AoG sorts categories before plotting, but sometimes the desired order is not the sorted one but the order in which the categories appear in the source data.

This usually applies when the source data is already sorted meaningfully, for example a list of animals that is ordered by weight. By default, the animals are sorted alphabetically when plotting, which destroys the weight order:

julia
animals = DataFrame(
    Animal=["Elephant", "Hippo", "Dolphin", "Gorilla", "Cheetah", "Beaver", "Armadillo", "Flamingo"],
    Weight_kg=[4000, 1500, 200, 160, 60, 25, 6, 2.5]
)

animals_autosorted = data(animals) *
    mapping(:Animal, :Weight_kg) *
    visual(BarPlot)

draw(animals_autosorted)

In order to leave it in its original order, we transform :Animal with presorted:

julia
animals_presorted = data(animals) *
    mapping(:Animal => presorted, :Weight_kg) *
    visual(BarPlot)

draw(animals_presorted)

nonnumeric

As mentioned before, numeric columns are treated as continuous data by default. If we want to use such columns for grouping, we can wrap them with nonnumeric:

julia
population = DataFrame(
    subject = repeat(1:12, inner = 30),
    time = repeat(1:30, 12),
    measurement = reduce(vcat, [cumsum(randn(30)) for _ in 1:12])
)

spec_nonnumeric = data(population) *
    mapping(
        :time,
        :measurement,
        layout = :subject => nonnumeric,
    ) *
    visual(Lines)

draw(spec_nonnumeric)

verbatim

As you already know, string columns are treated as categorical by AlgebraOfGraphics. But sometimes we may want to just pass strings as they are to Makie's plotting functions. The main situation where this happens is when we are adding Text plots because the Text plot type expects a vector of string-like objects as its text attribute.

For example, we can label a couple of our penguins with their body weight, by making a subset dataframe, and applying both a string formatting function and the verbatim function to the body_mass_g column:

julia
smallest_largest = sort(penguins, :body_mass_g)[[1, end], :]

base_mapping = mapping(:bill_length_mm, :bill_depth_mm, color = :species)

penguin_layer = data(penguins) * base_mapping * visual(Scatter, alpha = 0.2)

text_layer = data(smallest_largest) *
    base_mapping *
    mapping(
        text = :body_mass_g => x -> verbatim("$(x)g"),
    ) *
    visual(Makie.Text, fontsize = 20)

draw(penguin_layer + text_layer)

Summary

This concludes our chapter on data transformations. You have learned how to transform your data by row on the fly, and how the special transformation functions that AoG offers can make your life easier.

The next chapter will go beyond the long-format tables that we have used so far and demonstrate what other data formats AlgebraOfGraphics can handle.