Mappings
Mappings determine how the data is translated into a plot. Positional mappings correspond to the `x`, `y` or `z` axes of the plot, whereas the keyword arguments correspond to plot attributes that can vary continuously or discretely, such as `color` or `markersize`.
The data is split into groups according to the categorical variables in the mapping, and each group is then converted to plot attributes using a default palette.
```julia-repl
julia> using AlgebraOfGraphics

julia> mapping(:weight_mm => "weight (mm)", :height_mm => "height (mm)", marker = :gender)
AlgebraOfGraphics.Layer((), nothing, (:weight_mm => "weight (mm)", :height_mm => "height (mm)"), (marker = :gender,))
```
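The plain-Julia sketch below gives some intuition for what happens with a categorical mapping such as `marker = :gender`. It collects the unique values of a categorical column and gives each one an entry from a palette. The data and the `palette` vector are hypothetical. This only illustrates the idea and is not how AlgebraOfGraphics implements it internally.

```julia
# Hypothetical categorical column, standing in for df.gender
gender = ["male", "female", "female", "male", "female"]

# Hypothetical marker palette; the package's default palette may differ
palette = [:circle, :utriangle, :rect]

# Split: collect the unique values (sorted here) of the categorical column
levels = sort(unique(gender))

# Convert: give each unique value one palette entry, cycling if there are more values than entries
marker_for = Dict(l => palette[mod1(i, length(palette))] for (i, l) in enumerate(levels))

# Every row receives the marker of its group
markers = [marker_for[g] for g in gender]
println(levels)   # ["female", "male"]
println(markers)  # [:utriangle, :circle, :circle, :utriangle, :circle]
```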
Pair syntax
A convenience pair-based syntax can be used to transform variables on the fly and rename the respective column.
Let us assume the table `df` contains a column called `bill_length_mm`. We can apply an element-wise transformation and rename the column on the fly as follows.
```julia
data(df) * mapping(:bill_length_mm => (t -> t / 10) => "bill length (cm)")
```
A possible alternative, if `df` is a `DataFrame`, would be to store a renamed, modified column directly in `df`, which can be achieved in the following way:
```julia
df.var"bill length (cm)" = map(t -> t / 10, df.bill_length_mm)
data(df) * mapping("bill length (cm)") # strings are also accepted for column names
```
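A pair such as `:bill_length_mm => (t -> t / 10) => "bill length (cm)"` (millimeters to centimeters) is an ordinary nested Julia `Pair`. Because `=>` is right-associative, it parses as `column => (function => label)`. The sketch below takes the pair apart in plain Julia and applies the function element-wise with `map`. The column values are made up.

```julia
spec = :bill_length_mm => (t -> t / 10) => "bill length (cm)"

# `=>` is right-associative: spec == :bill_length_mm => ((t -> t / 10) => "bill length (cm)")
column = spec.first             # :bill_length_mm
conversion = spec.second.first  # the anonymous function t -> t / 10
label = spec.second.second      # "bill length (cm)"

# Made-up values for the bill_length_mm column
bill_length_mm = [40.0, 45.0, 38.5]

# The function is applied to each value separately, one row at a time
bill_length_cm = map(conversion, bill_length_mm)
println(label, ": ", bill_length_cm)  # bill length (cm): [4.0, 4.5, 3.85]
```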
Row-by-row versus whole-column operations
The pair syntax acts row by row, unlike, e.g., `DataFrames.transform`. This has several advantages.
- Simpler for the user in most cases.
- Less error-prone, especially
  - with grouped data (should a column operation apply to each group or to the whole dataset?),
  - when several datasets are used.
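The grouped-data point can be checked in plain Julia. An element-wise function gives the same result whether it runs on the whole column or on each group separately. A whole-column operation, such as dividing by the column maximum, does not. The column, the group labels and the helper `by_group` are hypothetical and exist only for this sketch.

```julia
# Made-up measurements and a made-up grouping column
x = [1.0, 2.0, 4.0, 8.0]
group = ["a", "a", "b", "b"]

# Hypothetical helper: apply `op` to each group separately, keeping the original row order
function by_group(op, x, group)
    out = similar(x)
    for g in unique(group)
        idx = findall(==(g), group)
        out[idx] = op(x[idx])
    end
    return out
end

rowwise(v) = map(t -> t / 10, v)  # element-wise, like the pair syntax
wholecol(v) = v ./ maximum(v)     # depends on the entire vector it receives

println(rowwise(x) == by_group(rowwise, x, group))    # true
println(wholecol(x) == by_group(wholecol, x, group))  # false
```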
Naturally, this also has a downside: whole-column operations, such as z-score standardization, are not supported. They should instead be performed beforehand, by adding a new column to the underlying dataset.
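For instance, a z-score column can be computed once on the whole column and stored before plotting. This sketch assumes the data is a `NamedTuple` of equal-length vectors, which the Tables.jl interface treats as a column table. The column name `bill_length_z` and the values are hypothetical.

```julia
using Statistics  # provides mean and std

# Hypothetical column table: a NamedTuple of equal-length vectors
tbl = (bill_length_mm = [40.0, 45.0, 50.0],)

# Whole-column operation: uses the mean and (sample) standard deviation of the entire column
x = tbl.bill_length_mm
z = (x .- mean(x)) ./ std(x)

# Store the result as a new column; it can then be mapped like any other column, e.g.
# data(tbl) * mapping(:bill_length_z)   (requires AlgebraOfGraphics)
tbl = merge(tbl, (bill_length_z = z,))
println(tbl.bill_length_z)  # [-1.0, 0.0, 1.0]
```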
Functions of several arguments
In the case of functions of several arguments, such as `isequal`, the input variables must be passed as a `Tuple`.
```julia
accuracy = (:species, :predicted_species) => isequal => "accuracy"
```
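Applied row by row, this mapping calls `isequal` on the two values in each row. The plain-Julia sketch below reproduces that computation with `map`, which accepts several collections and iterates over them in lockstep. The species values are made up.

```julia
# Made-up values for the :species and :predicted_species columns
species = ["Adelie", "Gentoo", "Chinstrap", "Adelie"]
predicted_species = ["Adelie", "Adelie", "Chinstrap", "Adelie"]

# For each row i, isequal(species[i], predicted_species[i]) is evaluated
accuracy_values = map(isequal, species, predicted_species)  # true, false, true, true
```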
Partial pair syntax
The full "triple-pair" syntax is not required: one can also pass only the column name, a `column => function` pair, or a `column => new label` pair.
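Spelled out, the accepted shapes look as follows. They are ordinary Julia values, so the block runs without AlgebraOfGraphics. Passing them to `mapping`, shown only in a comment, is what requires the package. `to_cm` is a hypothetical helper that converts millimeters to centimeters.

```julia
to_cm(t) = t / 10  # hypothetical helper: millimeters to centimeters

only_column   = :bill_length_mm                                 # column name only
with_function = :bill_length_mm => to_cm                        # column => function
with_label    = :bill_length_mm => "bill length (mm)"           # column => new label
triple_pair   = :bill_length_mm => to_cm => "bill length (cm)"  # column => function => label

# e.g. data(df) * mapping(with_label)   (requires AlgebraOfGraphics and a table df)
```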
Helper functions
Some helper functions are provided that can be used within the pair syntax, either to rename and reorder the unique values of a categorical column on the fly, or to signal that a numerical column should be treated as categorical.
AlgebraOfGraphics.renamer
```julia
renamer(arr::Union{AbstractArray, Tuple})
```
Utility to rename a categorical variable, as in `renamer([value1 => label1, value2 => label2])`. The keys of all pairs should cover all the unique values of the categorical variable, and the values should be the corresponding labels. The order of `arr` is respected in the legend.
Examples
```julia-repl
julia> r = renamer(["class 1" => "Class One", "class 2" => "Class Two"])
AlgebraOfGraphics.Renamer{Vector{String}, Vector{String}}(["class 1", "class 2"], ["Class One", "Class Two"])

julia> println(r("class 1"))
Class One
```
Alternatively, a sequence of pair arguments may be passed.
```julia-repl
julia> r = renamer("class 1" => "Class One", "class 2" => "Class Two")
AlgebraOfGraphics.Renamer{Tuple{String, String}, Tuple{String, String}}(("class 1", "class 2"), ("Class One", "Class Two"))

julia> println(r("class 1"))
Class One
```
If `arr` does not contain `Pair`s, elements of `arr` are assumed to be labels, and the unique values of the categorical variable are taken to be the indices of the array. This is particularly useful for `dims` mappings.
Examples
```julia-repl
julia> r = renamer(["Class One", "Class Two"])
AlgebraOfGraphics.Renamer{Nothing, Vector{String}}(nothing, ["Class One", "Class Two"])

julia> println(r(2))
Class Two
```
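The toy re-implementation below, in plain Julia, makes the renaming and ordering behavior concrete. `ToyRenamer` and `toy_renamer` are hypothetical names, and this is not the package's code. Each value is mapped to its new label together with its position in the argument list. Sorting on that position reproduces the order the user gave, not the alphabetical order of the labels. Both the pair form and the plain vector-of-labels form (indices as values) are covered.

```julia
# Hypothetical toy version of a renamer; not AlgebraOfGraphics' implementation
struct ToyRenamer{K,V}
    keys::K    # original values, or `nothing` to use integer indices
    labels::V  # new labels, in the desired legend order
end

toy_renamer(ps::Pair...) = ToyRenamer([first(p) for p in ps], [last(p) for p in ps])
toy_renamer(ps::AbstractVector{<:Pair}) = toy_renamer(ps...)
toy_renamer(labels::AbstractVector) = ToyRenamer(nothing, labels)

# Return the position (used for ordering) together with the new label
function (r::ToyRenamer)(x)
    i = r.keys === nothing ? x : findfirst(isequal(x), r.keys)
    i === nothing && throw(ArgumentError("value $x is not covered by the renamer"))
    return (position = i, label = r.labels[i])
end

r = toy_renamer("b" => "Zeta", "a" => "Alpha")
renamed = sort(unique(r.(["a", "b", "a"])); by = p -> p.position)
println([p.label for p in renamed])                        # ["Zeta", "Alpha"]
println(toy_renamer(["Class One", "Class Two"])(2).label)  # Class Two
```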
AlgebraOfGraphics.sorter
```julia
sorter(ks...)
```
Utility to reorder a categorical variable, as in `sorter("low", "medium", "high")`. `ks` should include all the unique values of the categorical variable. The order of `ks` is respected in the legend.
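In plain Julia, the effect of an explicit order can be imitated by sorting the unique values on their position in `ks`. For comparison, this sketch assumes that without `sorter` the values would be ordered alphabetically. The column values are made up.

```julia
ks = ["low", "medium", "high"]                    # desired order, as in sorter("low", "medium", "high")
price = ["high", "low", "medium", "low", "high"]  # made-up column values

alphabetical = sort(unique(price))                                 # ["high", "low", "medium"]
custom = sort(unique(price); by = v -> findfirst(isequal(v), ks))  # ["low", "medium", "high"]
```

In this sketch, a value that is missing from `ks` makes `findfirst` return `nothing`, and the sort then fails with an error. This is a reminder that `ks` must cover every unique value.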
AlgebraOfGraphics.nonnumeric
```julia
nonnumeric(x)
```
Transform `x` into a non-numeric type that is printed and sorted in the same way.
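The idea behind `nonnumeric` can be illustrated with a small wrapper type. It is not a subtype of `Number`, but it forwards printing and ordering to the wrapped value. `ToyNonNumeric` is a hypothetical name used only for this sketch and is not the package's type.

```julia
# Hypothetical wrapper: not a Number, but prints and sorts like the wrapped value
struct ToyNonNumeric{T}
    x::T
end

Base.isless(a::ToyNonNumeric, b::ToyNonNumeric) = isless(a.x, b.x)
Base.show(io::IO, a::ToyNonNumeric) = print(io, a.x)

ages = ToyNonNumeric.([30, 25, 40])
sorted_ages = sort(ages)
println(ages[1] isa Number)        # false
println(join(sorted_ages, ", "))   # 25, 30, 40
```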
Examples
```julia
# column `train` has two unique values, `true` and `false`
:train => renamer(true => "training", false => "testing") => "Dataset"
# column `price` has three unique values, `"low"`, `"medium"`, and `"high"`
:price => sorter("low", "medium", "high")
# column `age` is expressed in integers and we want to treat it as categorical
:age => nonnumeric
```