Mapping
Mappings determine how the data is translated into a plot. Positional mappings correspond to the x, y or z axes of the plot, whereas the keyword arguments correspond to plot attributes that can vary continuously or discretely, such as color or markersize.
The data is split according to the categorical attributes in the mapping, and these are then converted to plot attributes using a default palette.
using AlgebraOfGraphics
mapping(:weight_mm => "weight (mm)", :height_mm => "height (mm)", marker = :gender)
AlgebraOfGraphics.Layer((), nothing, (:weight_mm => "weight (mm)", :height_mm => "height (mm)"), (marker = :gender,))
Pair syntax
A convenient pair-based syntax can be used to transform variables on the fly and rename the respective column.
Let us assume the table df contains a column called bill_length_mm. We can apply an element-wise transformation and rename the column on the fly as follows.
data(df) * mapping(:bill_length_mm => (t -> t / 10) => "bill length (cm)")
A possible alternative, if df is a DataFrame, would be to store a renamed, modified column directly in df, which can be achieved in the following way:
df.var"bill length (cm)" = map(t -> t / 10, df.bill_length_mm)
data(df) * mapping("bill length (cm)") # strings are also accepted for column names
Row-by-row versus whole-column operations
The pair syntax acts row by row, unlike, e.g., DataFrames.transform. This has several advantages.
- Simpler for the user in most cases.
- Less error-prone, especially
  - with grouped data (should a column operation apply to each group or the whole dataset?)
  - when several datasets are used
Naturally, this also incurs some downsides, as whole-column operations, such as z-score standardization, are not supported: they should be done by adding a new column to the underlying dataset beforehand.
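As a minimal sketch of that workaround, the snippet below standardizes a column before plotting. The table contents and the column name bill_length_z are made up for illustration, and a NamedTuple of vectors stands in for the table on the assumption that any Tables.jl-compatible table is accepted by data.

```julia
using AlgebraOfGraphics
using Statistics: mean, std

# Hypothetical table: a NamedTuple of vectors used as a column table.
df = (; bill_length_mm = [39.1, 46.5, 50.0, 42.3])

# Whole-column operation: it needs the mean and standard deviation
# of the entire column, so it cannot be expressed row by row.
z = (df.bill_length_mm .- mean(df.bill_length_mm)) ./ std(df.bill_length_mm)

# Store the result as a new column (the name is an assumption),
# then map it like any other column.
df = merge(df, (; bill_length_z = z))
spec = data(df) * mapping(:bill_length_z => "bill length (z-score)")
```

With grouped data, the same approach makes the choice explicit: the standardization is computed either per group or over the whole dataset before the table reaches mapping.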
Functions of several arguments
In the case of functions of several arguments, such as isequal, the input variables must be passed as a Tuple.
accuracy = (:species, :predicted_species) => isequal => "accuracy"
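To make the row-by-row behavior concrete, here is a small sketch with a made-up table. The column values and the choice of color as the target attribute are assumptions for illustration only; the plain map call shows, conceptually, what the function receives for each row.

```julia
using AlgebraOfGraphics

# Hypothetical table with true and predicted labels.
df = (;
    bill_length_mm = [39.1, 46.5, 50.0],
    species = ["Adelie", "Gentoo", "Chinstrap"],
    predicted_species = ["Adelie", "Adelie", "Chinstrap"],
)

# Two input columns are passed as a Tuple; `isequal` is called once per row.
accuracy = (:species, :predicted_species) => isequal => "accuracy"
spec = data(df) * mapping(:bill_length_mm, color = accuracy)

# Conceptually, the transformation is equivalent to this row-wise call.
rowwise = map(isequal, df.species, df.predicted_species)
```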
Partial pair syntax
The "triple-pair" syntax is not required: one can also pass only the column name, a column name => function pair, or a column name => new label pair.
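The three partial forms can be sketched side by side as follows. The column name bill_length_mm is reused from the earlier example, and the variable names are placeholders.

```julia
using AlgebraOfGraphics

# Column name only: no transformation, default label.
name_only = mapping(:bill_length_mm)

# Column name => function: transform each value, keep the default label.
name_and_function = mapping(:bill_length_mm => (t -> t / 10))

# Column name => new label: rename only, values are left untouched.
name_and_label = mapping(:bill_length_mm => "bill length (mm)")
```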
Helper functions
Some helper functions are provided, which can be used within the pair syntax to either rename and reorder unique values of a categorical column on the fly or to signal whether a numerical column should be treated as categorical.
The complete API of helper functions is available at Mapping helpers.
Examples
# column `train` has two unique values, `true` and `false`
:train => renamer([true => "training", false => "testing"]) => "Dataset"
# column `price` has three unique values, `"low"`, `"medium"`, and `"high"`
:price => sorter(["low", "medium", "high"])
# column `age` is expressed in integers and we want to treat it as categorical
:age => nonnumeric
# column `labels` is expressed in strings and we do not want to treat it as categorical
:labels => verbatim