Data Transformations

Histogram

AlgebraOfGraphics.histogram - Function
histogram(; bins=automatic, weights=automatic, normalization=:none)

Compute a histogram. bins can be an Int to create that number of equal-width bins over the range of values. Alternatively, it can be a sorted iterable of bin edges. The histogram can be normalized by setting normalization. Possible values are:

  • :pdf: Normalize by sum of weights and bin sizes. The resulting histogram has norm 1 (it integrates to 1) and represents a PDF.
  • :density: Normalize by bin sizes only. The resulting histogram represents the count density of the input and does not have norm 1.
  • :probability: Normalize by sum of weights only. The resulting histogram represents the fraction of probability mass in each bin and does not have norm 1.
  • :none: Do not normalize.

Weighted data is supported via the keyword weights.

Note

Normalizations are computed within groups. For example, with normalization=:pdf, each group's histogram is normalized separately, so every group has norm 1 on its own.
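
To make the four normalization modes concrete, here is a minimal sketch in plain Julia. The helper `binned` is hypothetical and is not part of AlgebraOfGraphics. It assumes left-closed bins `[edges[i], edges[i+1])` and drops values outside the edges, which may differ from what the package does internally.

```julia
# Hypothetical helper, not part of AlgebraOfGraphics: bin `xs` into the
# half-open intervals [edges[i], edges[i+1]) and apply one normalization mode.
function binned(xs, edges; weights=ones(length(xs)), normalization=:none)
    counts = zeros(length(edges) - 1)
    for (x, w) in zip(xs, weights)
        i = searchsortedlast(edges, x)            # index of the bin's left edge
        1 <= i < length(edges) && (counts[i] += w)
    end
    widths = diff(collect(edges))
    total = sum(counts)                           # sum of the binned weights
    normalization === :pdf         && return counts ./ (total .* widths)
    normalization === :density     && return counts ./ widths
    normalization === :probability && return counts ./ total
    return counts                                 # :none
end

xs = [0.1, 0.2, 0.7, 1.5, 1.6, 1.9]
edges = [0.0, 0.5, 1.0, 2.0]                      # unequal bin widths on purpose

raw  = binned(xs, edges)                          # weighted counts per bin
dens = binned(xs, edges; normalization=:density)  # counts divided by bin width
prob = binned(xs, edges; normalization=:probability)
pdf  = binned(xs, edges; normalization=:pdf)

sum(prob)                  # bin values sum to 1
sum(pdf .* diff(edges))    # bin values times bin widths sum to 1
```

With unequal bin widths the modes give visibly different bar heights. To mimic the per-group behaviour described in the note, such a helper would be called once per group.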

using AlgebraOfGraphics, CairoMakie
set_aog_theme!()

df = (x=randn(1000), y=randn(1000), z=rand(["a", "b", "c"], 1000))
specs = data(df) * mapping(:x, layout=:z) * histogram(bins=range(-2, 2, length=15))
draw(specs)
specs = data(df) * mapping(:x, dodge=:z, color=:z) * histogram(bins=range(-2, 2, length=15))
draw(specs)
specs = data(df) * mapping(:x, stack=:z, color=:z) * histogram(bins=range(-2, 2, length=15))
draw(specs)
data(df) * mapping(:x, :y, layout=:z) * histogram(bins=15) |> draw

Density

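df = (x=randn(5000), y=randn(5000), z=rand(["a", "b", "c", "d"], 5000))
# The name is qualified because Makie also exports a `density` function.
# One-dimensional density estimate of x, one panel per level of z.
data(df) * mapping(:x, layout=:z) * AlgebraOfGraphics.density() |> draw
# Two-dimensional density estimate of (x, y); npoints sets the resolution of the evaluation grid.
data(df) * mapping(:x, :y, layout=:z) * AlgebraOfGraphics.density(npoints=50) |> draw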
specs = data(df) * mapping(:x, :y, layout=:z) *
    AlgebraOfGraphics.density(npoints=50) * visual(Surface)

draw(specs, axis=(type=Axis3, zticks=0:0.1:0.2, limits=(nothing, nothing, (0, 0.2))))

Frequency

df = (x=rand(["a", "b", "c"], 100), y=rand(["a", "b", "c"], 100), z=rand(["a", "b", "c"], 100))
specs = data(df) * mapping(:x, layout=:z) * frequency()
draw(specs)
specs = data(df) * mapping(:x, layout=:z, color=:y, stack=:y) * frequency()
draw(specs)
specs = data(df) * mapping(:x, :y, layout=:z) * frequency()
draw(specs)

Expectation

df = (x=rand(["a", "b", "c"], 100), y=rand(["a", "b", "c"], 100), z=rand(100), c=rand(["a", "b", "c"], 100))
specs = data(df) * mapping(:x, :z, layout=:c) * expectation()
draw(specs)
specs = data(df) * mapping(:x, :z, layout=:c, color=:y, dodge=:y) * expectation()
draw(specs)
specs = data(df) * mapping(:x, :y, :z, layout=:c) * expectation()
draw(specs)

Linear

AlgebraOfGraphics.linear - Function
linear(; interval)

Compute a linear fit of y ~ 1 + x. An optional named mapping weights determines the weights. Use interval to specify what type of interval the shaded band should represent. Valid values of interval are :confidence, which delimits the uncertainty of the predicted relationship, and :prediction, which delimits estimated bounds for new data points.
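
For readers unfamiliar with the formula notation, the model y ~ 1 + x is an intercept plus a slope times x. The sketch below fits it by (optionally weighted) least squares in plain Julia. The helper `fit_line` is hypothetical, is not the package's implementation, and does not compute the interval band.

```julia
# Hypothetical helper, not the package's implementation: least-squares fit of
# y ~ 1 + x, optionally weighted.
function fit_line(x, y; weights=ones(length(x)))
    X = [ones(length(x)) x]          # design matrix: intercept column, then x
    sw = sqrt.(weights)              # scaling rows by sqrt(w) gives weighted least squares
    intercept, slope = (sw .* X) \ (sw .* y)
    return (; intercept, slope)
end

x = collect(1:0.5:10)
y = 1.2 .* x .+ 3.0                  # noise-free data, so the fit recovers the line
fit = fit_line(x, y)
fit.intercept, fit.slope
```

Points with larger weights pull the fitted line toward themselves. With equal weights this reduces to ordinary least squares.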

x = 1:0.05:10
a = rand(1:7, length(x))
y = 1.2 .* x .+ a .+ 0.5 .* randn.()
df = (; x, y, a)
specs = data(df) * mapping(:x, :y, color=:a => nonnumeric) * (linear() + visual(Scatter))
draw(specs)

Smoothing

AlgebraOfGraphics.smooth - Function
smooth(span=0.75, degree=2)

Fit a loess model. span is the degree of smoothing, typically in [0,1]. Smaller values result in smaller local context in fitting. degree is the polynomial degree used in the loess model.
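
To illustrate how span and degree interact, here is a rough sketch of a loess-style local fit at a single point, written in plain Julia. The helper `local_fit` and the choice of a tricube kernel are assumptions made for illustration. The package relies on a dedicated loess implementation whose details may differ, and the sketch assumes the x values are distinct.

```julia
# Tricube kernel: weight falls from 1 at u = 0 to 0 at |u| >= 1.
tricube(u) = abs(u) < 1 ? (1 - abs(u)^3)^3 : 0.0

# Hypothetical helper: evaluate a loess-style local polynomial fit at x0.
function local_fit(x, y, x0; span=0.75, degree=2)
    n = length(x)
    k = clamp(ceil(Int, span * n), degree + 2, n)  # number of neighbours in the window
    d = abs.(x .- x0)
    h = sort(d)[k]                                 # distance to the k-th nearest neighbour
    w = tricube.(d ./ h)                           # points beyond h get weight 0
    # Weighted least squares on the basis 1, (x - x0), ..., (x - x0)^degree.
    X = [(xi - x0)^p for xi in x, p in 0:degree]
    sw = sqrt.(w)
    coef = (sw .* X) \ (sw .* y)
    return coef[1]                                 # local polynomial evaluated at x0
end

x = collect(0:0.1:5)
y = 1 .+ 2 .* x .+ x .^ 2            # exact quadratic, so degree=2 reproduces it
local_fit(x, y, 2.0)
local_fit(x, y, 2.0; span=0.3)       # smaller span: fewer neighbours enter the fit
```

A smaller span shrinks the window around x0, so the curve follows local wiggles more closely. A larger span averages over more points and gives a smoother curve.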

x = 1:0.05:10
a = rand(1:7, length(x))
y = sin.(x) .+ a .+ 0.1 .* randn.()
df = (; x, y, a)
specs = data(df) * mapping(:x, :y, color=:a => nonnumeric) * (smooth() + visual(Scatter))
draw(specs)

This page was generated using Literate.jl.