Analyses

Histogram

AlgebraOfGraphics.histogram - Function
histogram(; bins=automatic, datalimits=automatic, closed=:left, normalization=:none)

Compute a histogram.

The attribute bins can be an Integer, an AbstractVector (in particular, a range), or a Tuple of either integers or abstract vectors (useful for 2- or 3-dimensional histograms). When bins is an Integer, it denotes the approximate number of equal-width intervals used to compute the histogram. In that case, the range covered by the intervals is defined by datalimits (it defaults to the extrema of the whole data). The keyword argument datalimits can be a tuple of two values, e.g. datalimits=(0, 10), or a function to be applied group by group, e.g. datalimits=extrema. When bins is an AbstractVector, it denotes the intervals directly.

closed determines whether the intervals are closed to the left (closed=:left, the default) or to the right (closed=:right).
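The effect of closed shows up only for values that fall exactly on a bin edge. The minimal sketch below illustrates the two conventions in plain Julia. The helper bin_counts is hypothetical and written for this illustration; it is not how AlgebraOfGraphics computes histograms internally, and dropping values that fall outside every interval is an assumption of the sketch.

```julia
# Illustration only: count values per bin under the two half-open conventions.
# `bin_counts` is a hypothetical helper, not part of AlgebraOfGraphics.
function bin_counts(edges, xs; closed=:left)
    counts = zeros(Int, length(edges) - 1)
    for x in xs
        # :left  -> intervals [a, b): a value on an edge goes to the bin starting there
        # :right -> intervals (a, b]: a value on an edge goes to the bin ending there
        i = closed === :left ? searchsortedlast(edges, x) : searchsortedfirst(edges, x) - 1
        # Assumption of this sketch: values outside all intervals are dropped.
        if 1 <= i <= length(counts)
            counts[i] += 1
        end
    end
    return counts
end

edges = [0.0, 1.0, 2.0, 3.0]
xs = [0.0, 1.0, 1.0, 2.5]

left = bin_counts(edges, xs; closed=:left)    # [1, 2, 1]
right = bin_counts(edges, xs; closed=:right)  # [2, 0, 1]
```

The two values equal to 1.0 move from the second bin to the first when switching from :left to :right, and 0.0 no longer belongs to any interval.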

The histogram can be normalized by setting normalization. Possible values are:

  • :pdf: Normalize by sum of weights and bin sizes. Resulting histogram has norm 1 and represents a PDF.
  • :density: Normalize by bin sizes only. Resulting histogram represents count density of input and does not have norm 1.
  • :probability: Normalize by sum of weights only. Resulting histogram represents the fraction of probability mass for each bin and does not have norm 1.
  • :none: Do not normalize.
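The four modes can be checked by hand on a small example. The sketch below is plain Julia and does not call AlgebraOfGraphics; the helper normalizations and the bin edges and counts are made up for illustration, and "norm" is taken to mean the sum of bin heights multiplied by bin widths.

```julia
# Illustration only: the four normalizations computed by hand.
# `normalizations` is a hypothetical helper, not part of AlgebraOfGraphics.
function normalizations(edges, counts)
    widths = diff(edges)   # bin sizes
    total = sum(counts)    # sum of weights
    return (
        none = counts,
        density = counts ./ widths,          # by bin sizes only
        probability = counts ./ total,       # by sum of weights only
        pdf = counts ./ (total .* widths),   # by both
        widths = widths,
    )
end

# Two bins of widths 1 and 2 holding (weighted) counts 2 and 6.
r = normalizations([0.0, 1.0, 3.0], [2.0, 6.0])

r.density                        # [2.0, 3.0]
r.probability                    # [0.25, 0.75]
r.pdf                            # [0.25, 0.375]
sum(r.pdf .* r.widths)           # 1.0, so :pdf has norm 1
sum(r.probability .* r.widths)   # 1.75, so :probability does not
```

With bins of unequal width, :probability sums to 1 across bins but does not integrate to 1, which is why only :pdf is described as having norm 1.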

Weighted data is supported via the keyword weights (passed to mapping).
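A minimal sketch of a weighted histogram follows. The column name w and the data are assumptions made for this example; the only point being illustrated is that weights goes into mapping rather than into histogram.

```julia
using AlgebraOfGraphics, CairoMakie

# Hypothetical data: every observation x carries a non-negative weight w.
df = (x=randn(1000), w=rand(1000))

# `weights` is a keyword of `mapping`, not of `histogram`.
specs = data(df) * mapping(:x, weights=:w) * histogram(bins=20, normalization=:probability)
fg = draw(specs)
```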

Note

Normalizations are computed within groups. For example, with normalization=:pdf, the histogram of each group has norm 1 on its own.

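using AlgebraOfGraphics, CairoMakie
set_aog_theme!()

# Sample data: two numeric columns and one categorical grouping column.
df = (x=randn(5000), y=randn(5000), z=rand(["a", "b", "c"], 5000))
# One panel per group; 15 edges give 14 equal-width bins on [-2, 2].
specs = data(df) * mapping(:x, layout=:z) * histogram(bins=range(-2, 2, length=15))
draw(specs)
# Same bins, with the groups drawn side by side.
specs = data(df) * mapping(:x, dodge=:z, color=:z) * histogram(bins=range(-2, 2, length=15))
draw(specs)
# Same bins, with the groups stacked on top of each other.
specs = data(df) * mapping(:x, stack=:z, color=:z) * histogram(bins=range(-2, 2, length=15))
draw(specs)
# Shift group "b" by 5 and let each group compute its own limits via datalimits=extrema.
specs = data(df) *
    mapping((:x, :z) => ((x, z) -> x + 5 * (z == "b")) => "new x", col=:z) *
    histogram(datalimits=extrema, bins=20)
draw(specs, facet=(linkxaxes=:minimal,))
# Two positional columns give a 2-dimensional histogram.
data(df) * mapping(:x, :y, layout=:z) * histogram(bins=15) |> draw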

Density

AlgebraOfGraphics.density - Function
density(; datalimits=automatic, kernel=automatic, bandwidth=automatic, npoints=200)

Fit a kernel density estimate to the data.

Here, datalimits specifies the range over which the density is calculated (it defaults to the extrema of the whole data). The keyword argument datalimits can be a tuple of two values, e.g. datalimits=(0, 10), or a function to be applied group by group, e.g. datalimits=extrema. The keyword arguments kernel and bandwidth are forwarded to KernelDensity.kde. npoints is the number of points used by Makie to draw the line.
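As a minimal sketch of these keywords, the example below sets an explicit bandwidth and a fixed evaluation range. The data and the value bandwidth=0.3 are assumptions chosen for illustration. The function is called as AlgebraOfGraphics.density because Makie also exports a function named density, and datalimits is written in the nested-tuple form used by the examples in this section.

```julia
using AlgebraOfGraphics, CairoMakie

# Hypothetical data: one numeric column and one grouping column.
df = (x=randn(1000), z=rand(["a", "b"], 1000))

# Evaluate each group's density on [-3, 3] at 100 points,
# with the bandwidth forwarded to KernelDensity.kde.
specs = data(df) * mapping(:x, color=:z) *
    AlgebraOfGraphics.density(datalimits=((-3, 3),), bandwidth=0.3, npoints=100)
fg = draw(specs)
```

A smaller bandwidth follows the sample more closely and gives a noisier curve, while a larger one smooths it out.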

Weighted data is supported via the keyword weights (passed to mapping).

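# Sample data with four groups.
df = (x=randn(5000), y=randn(5000), z=rand(["a", "b", "c", "d"], 5000))
# Evaluate every group's density on the fixed interval [-2.5, 2.5].
specs = data(df) * mapping(:x, layout=:z) * AlgebraOfGraphics.density(datalimits=((-2.5, 2.5),))
draw(specs)
# Shift groups "b" and "d" by 5 and let each group compute its own limits via datalimits=extrema.
specs = data(df) *
    mapping((:x, :z) => ((x, z) -> x + 5 * (z ∈ ["b", "d"])) => "new x", layout=:z) *
    AlgebraOfGraphics.density(datalimits=extrema)
draw(specs, facet=(linkxaxes=:minimal,))
# Two positional columns give a 2-dimensional density, here with fewer evaluation points.
data(df) * mapping(:x, :y, layout=:z) * AlgebraOfGraphics.density(npoints=50) |> draw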